&copy; 2025 by Deitel & Associates, Inc. All Rights Reserved. https://deitel.com

<hr/>

In [None]:
%%html
<style>
    h1 {color:#BB0000}
    h2 {color:purple}
    h3 {color:#0099ff}
    hr {    
        border: 0;
        height: 3px;
        background: #333;
        background-image: linear-gradient(to right, #ccc, black, #ccc);
    }
</style>

In [None]:
# enable high-res images in notebook 
%config InlineBackend.figure_format = 'retina'

# What is Generative AI?
* AI that creates content
    * Text, images, audio, video, music, poetry, code, ...
    * Makes content creation accessible to everyone
* Can be tailored to specific requirements
* References
    * https://en.wikipedia.org/wiki/Generative_artificial_intelligence
    * https://www.techtarget.com/searchenterpriseai/definition/generative-AI
    * https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-generative-ai

<hr/>

# What is a Large Language Model (LLM)?
* AI that understands/generates natural language 
* "Large" — Neural nets with **billions**, and now **trillions**, of parameters
    * As neural nets learn, they tune parameters in an effort to produce better results
    * Parameters help models understand/generate language patterns 
* Trained on massive amounts of text
    * books, articles, the Internet, code, ...
* Great at natural language tasks
    * Translation, summarization, answering questions, sentiment analysis, creative writing, ...
    * Also generating code, explaining code, finding errors in code, ...
* Can generate text that is often hard to distinguish from human writing
* Some continue to learn and adapt, improving effectiveness over time
* Many LLMs use **generative pretrained transformers (GPTs)**
* References
    * https://en.wikipedia.org/wiki/Large_language_model
    * https://www.techtarget.com/whatis/definition/large-language-model-LLM
    * https://machinelearningmastery.com/what-are-large-language-models/
* There are other techniques, like **Bidirectional Encoder Representations from Transformers (BERT)**
    * https://en.wikipedia.org/wiki/BERT_(language_model)
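To make "tuning parameters" concrete, here is a minimal sketch that learns a single parameter `w` by gradient descent. It is only an illustration: the data, the function name `train_one_parameter`, and the learning rate are assumptions made for this example, and a real LLM adjusts billions of parameters with far more elaborate training procedures.

```python
def train_one_parameter(xs, ys, learning_rate=0.05, steps=200):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0  # the single tunable parameter
    for _ in range(steps):
        # Gradient of mean((w*x - y)^2) with respect to w
        gradient = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * gradient  # nudge w to reduce the error
    return w

# Hypothetical training data generated by y = 3x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

w = train_one_parameter(xs, ys)
print(round(w, 3))
```

Each step moves `w` slightly in the direction that shrinks the error, which is the same basic idea a neural network applies to all of its parameters at once.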

<hr/>

# What is a Transformer? 
* Neural network that processes sequential data, like text
* Can understand/recognize relationships between distant words in a sentence 
* Also recognizes the relative importance of words 
* References for Transformers
    * https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)  
    * https://blogs.nvidia.com/blog/what-is-a-transformer-model/
    * https://towardsdatascience.com/transformers-141e32e69591

<hr/>

# What is Self-Attention? 
* Key aspect of transformers
* When you try to understand a word in a sentence, words around it (its context) can help you understand the meaning
    * "good" is positive, but preceding it with "not" makes the context negative
* Transformers use self-attention to understand each word by looking at other words around it
    * Words are not equally important 
    * Self-attention weighs which words to pay attention to when trying to understand each word
* Unlike humans, transformers can consider massive numbers of words in parallel to understand how words are connected
* References for Self-Attention 
    * https://towardsdatascience.com/illustrated-self-attention-2d627e33b20a
    * https://en.wikipedia.org/wiki/Attention_(machine_learning)
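The weighting idea described above can be sketched in a few lines of plain Python. This is a simplified illustration, not how any production model is implemented: real transformers use learned query, key and value projections, whereas here each word vector plays all three roles. The two-dimensional embeddings for "not" and "good" are made-up values.

```python
import math

def softmax(scores):
    """Convert raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def self_attention(vectors):
    """Return (weights, outputs) for a list of equal-length word vectors.

    Simplification: each vector serves as its own query, key and value.
    """
    size = len(vectors[0])
    scale = math.sqrt(size)
    # One row of attention weights per word
    weights = [softmax([dot(q, k) / scale for k in vectors]) for q in vectors]
    # Each output is a weighted blend of all the word vectors
    outputs = [
        [sum(w * v[i] for w, v in zip(row, vectors)) for i in range(size)]
        for row in weights
    ]
    return weights, outputs

# Hypothetical 2-dimensional embeddings for the words "not" and "good"
words = ['not', 'good']
embeddings = [[1.0, 0.0], [0.8, 0.6]]
weights, outputs = self_attention(embeddings)

for word, row in zip(words, weights):
    print(word, [round(w, 3) for w in row])
```

Each row of `weights` shows how much attention one word pays to every word in the sequence, including itself.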

<hr/>

# Tokens, Context and Context Windows 
* Token 
    * Unit of processing in language models
    * Words, parts of words, punctuation
    * Tokenization splits text into tokens and maps each one to a numerical ID models can process
    * Models typically have token limits
* Context
    * Sequence of tokens that comes before (and sometimes after) a word/phrase
    * Understanding context helps models generate more accurate responses
* Context window
    * Maximum range of tokens (both input and output) model can consider at once
    * Larger == more context to the conversation == more relevant/coherent responses
    * Google Gemini 1 million tokens
    > https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#context-window
    * Claude.ai Pro 200K tokens
    * GPT-4 Turbo (OpenAI) 128K tokens
    > https://platform.openai.com/docs/models/overview
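The following toy sketch illustrates tokens, token IDs and a context window. It is an assumption-laden simplification: real models use subword tokenizers with fixed vocabularies, while `toy_tokenize`, `build_vocab` and `fit_to_window` are hypothetical helpers that split on words and punctuation and keep only the most recent tokens.

```python
import re

def toy_tokenize(text):
    """Split text into word and punctuation tokens (a simplification)."""
    return re.findall(r"\w+|[^\w\s]", text)

def build_vocab(tokens):
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab = {}
    for token in tokens:
        vocab.setdefault(token, len(vocab))
    return vocab

def fit_to_window(token_ids, max_tokens):
    """Keep only the most recent max_tokens IDs, as a toy context window."""
    return token_ids[-max_tokens:] if max_tokens > 0 else []

text = "Today was a beautiful day. Tomorrow looks like bad weather."
tokens = toy_tokenize(text)
vocab = build_vocab(tokens)
token_ids = [vocab[t] for t in tokens]

print(tokens)
print(token_ids)
print(fit_to_window(token_ids, 5))
```

Anything that falls outside the window is invisible to the model, which is why a larger window helps in long conversations.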
    

<hr/>

# Prompts and Prompt Engineering
* Poor prompts lead to poor results — "garbage in, garbage out"
* Prompt engineering is the process of designing prompts to obtain the best responses
* OpenAI provides various strategies
    * https://platform.openai.com/docs/guides/prompt-engineering
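As a small illustration of prompt design, the sketch below contrasts a vague instruction with a more specific one and sets the user's text off with delimiters. The helper `build_messages` and both prompts are hypothetical examples; the messages list simply follows the role/content shape used in the chat examples in this notebook.

```python
def build_messages(instructions, user_text, delimiter='"""'):
    """Return a chat-style messages list with the user text set off by delimiters."""
    return [
        {'role': 'system', 'content': instructions.strip()},
        {'role': 'user',
         'content': f'{delimiter}\n{user_text.strip()}\n{delimiter}'},
    ]

# A vague prompt leaves length, audience and format up to the model
vague = build_messages('Summarize this.', 'Transformers process sequences.')

# A specific prompt states the task, the length and the audience
specific = build_messages(
    """Summarize the text delimited by triple quotes in exactly
    one sentence for a reader new to machine learning.""",
    'Transformers process sequences.')

for message in specific:
    print(message['role'], '->', message['content'])
```

Either list could be passed as the `messages` argument of a chat request; the specific version gives the model far less to guess.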

<hr/>

# Issues with Generative AI
* Ethics — e.g., misinformation, realistic images/videos
* Biases
* **Hallucinations** — They sometimes make up facts
* Copyrights
    * NYT Sues OpenAI: https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html
* Picks up human traits? 
    * ChatGPT gets lazy (“winter break hypothesis”): https://arstechnica.com/information-technology/2023/12/is-chatgpt-becoming-lazier-because-its-december-people-run-tests-to-find-out/
* ...

<hr/>

# OpenAI 
* https://openai.com/
* One of many generative AI providers 
* Creator of ChatGPT, Dall-E
* APIs for
    * Audio — speech-to-text, text-to-speech, translation
    * Chat (ChatGPT) 
    * Fine-tuning — tailoring to your own data
    * Images (Dall-E)
    * Models — explains all the available models
    * Moderations — checks content policy violations
    * Assistants — create assistants that can perform step-by-step tasks
    * more...
* OpenAI Cookbook — loaded with examples: https://cookbook.openai.com/
* OpenAI Docs — tutorials, guides, ...: https://platform.openai.com/docs/overview
* OpenAI API Reference: https://platform.openai.com/docs/api-reference 

<hr/>

# Installing the OpenAI Python Module
* `pip install --upgrade openai`

<hr/>

# Get an OpenAI Developer Account
* Sign up: https://platform.openai.com/signup
* Pricing — small free credit depending on location 
> https://openai.com/pricing
* Rate Limits 
> https://platform.openai.com/docs/guides/rate-limits/rate-limits
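A common way to cope with rate limits is to retry a failed request after an exponentially growing delay. The sketch below shows the idea without calling any API: `call_with_backoff` and `flaky_request` are hypothetical names, the simulated failure stands in for a rate-limit error, and the delays are recorded rather than slept so the example runs instantly.

```python
import random
import time

def call_with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying with exponentially growing, jittered delays."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:  # in real code, catch only the rate-limit error
            if attempt == max_attempts - 1:
                raise  # out of attempts, so let the error propagate
            # Delay doubles each attempt; jitter spreads out retries
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)

# Hypothetical stand-in for an API call that fails twice, then succeeds
attempts = {'count': 0}

def flaky_request():
    attempts['count'] += 1
    if attempts['count'] < 3:
        raise RuntimeError('simulated rate limit')
    return 'ok'

delays = []  # record the delays instead of actually sleeping
result = call_with_backoff(flaky_request, sleep=delays.append)
print(result, attempts['count'], [round(d) for d in delays])
```

In practice, you would pass a function that makes the real request and leave `sleep` at its default.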

<hr/>

## OpenAI Developer API Key
* While signed into your account
    * Go to https://platform.openai.com/docs/overview
    * Hover over the icons at the left of the page and click the **API keys** icon (looks like a padlock)
    * Click **Create new secret key**
    * Optionally name your key
    * Click **Create secret key**
    * Copy the lengthy alphanumeric key
    * Follow instructions for your platform at https://platform.openai.com/docs/quickstart to store the API key in an environment variable
    * **Restart your command line before launching IPython or JupyterLab**

## Importing the Modules for This Demo

In [None]:
from openai import OpenAI 
# from docx import Document # for creating a Microsoft Word docx file

## Creating the `OpenAI` Client Object
* Provides access to the OpenAI APIs

In [None]:
import keys
#client = OpenAI() # if stored in environment variable OPENAI_API_KEY
client = OpenAI(api_key=keys.OPENAI_API_KEY)

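* The preceding cell reads the key from a local `keys.py` module that defines `OPENAI_API_KEY`
* Alternatively, `client = OpenAI()` reads the key from the environment variable `OPENAI_API_KEY`
* If you used a different environment variable name, `import os` and replace the preceding statement with
> `client = OpenAI(api_key=os.environ.get("CUSTOM_ENV_NAME"))`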
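When the key comes from an environment variable, a missing variable can otherwise surface later as a confusing authentication error. The sketch below is one possible way to fail early with a clear message. The helper name `get_api_key` and the variable `DEMO_OPENAI_KEY` are assumptions for illustration, and the value shown is a placeholder, not a real key.

```python
import os

def get_api_key(env_name='OPENAI_API_KEY'):
    """Return the API key stored in env_name or raise a descriptive error."""
    key = os.environ.get(env_name, '').strip()
    if not key:
        raise RuntimeError(
            f'Set the {env_name} environment variable, then restart '
            'your command line and Jupyter.')
    return key

# Demonstrate with a placeholder value, not a real key
os.environ['DEMO_OPENAI_KEY'] = 'sk-placeholder'
print(get_api_key('DEMO_OPENAI_KEY'))
```

With a real key in place, the client could then be created with `client = OpenAI(api_key=get_api_key())`.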

<hr/>

# Text Summarization
## Summary Abstract

In [None]:
from pathlib import Path

path = Path('resources/transcript.txt')
transcript = path.read_text()

# Summary Abstract
response = client.chat.completions.create(
    model='gpt-4o',
    messages=[
        {'role': 'system', 
         'content': """Given a transcript of a technical presentation, create a 
             concise, clear summary abstract in paragraph form, written 
             in a direct style that avoids prepositional phrases and 
             uses straightforward sentence structures. Focus on the key 
             points without referring to the speaker. Capture the key 
             ideas, so that a person can understand it without reading the 
             full transcript."""},
        {'role': 'user', 'content': transcript}
    ]
)

In [None]:
print(response.choices[0].message.content)

## Key Points

In [None]:
response = client.chat.completions.create(
    model='gpt-4o',
    messages=[
        {'role': 'system', 
         'content': """Given a transcript of a technical presentation,  
             identify the top 5 key points and present them as a numbered list.  
             For each point, use a concise, direct, clear, and
             straightforward sentence. Avoid prepositional phrases."""},
        {'role': 'user', 'content': transcript}
    ]
)

In [None]:
print(response.choices[0].message.content)
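Because the key points come back as plain text, you may want them as a Python list for further processing. The following sketch assumes the reply uses lines that begin with a number followed by a period or parenthesis; `parse_numbered_list` is a hypothetical helper and `sample_reply` is made-up text standing in for `response.choices[0].message.content`.

```python
import re

def parse_numbered_list(text):
    """Return the items of a '1. ...' style numbered list as strings."""
    items = []
    for line in text.splitlines():
        # Match an optional indent, a number, '.' or ')', then the item text
        match = re.match(r'\s*\d+[.)]\s+(.*\S)', line)
        if match:
            items.append(match.group(1))
    return items

# Hypothetical reply text; a real reply comes from
# response.choices[0].message.content
sample_reply = """Here are the key points:
1. Neural networks learn from data.
2. Deep learning uses many layers.
3. GPUs speed up training."""

key_points = parse_numbered_list(sample_reply)
print(key_points)
```

Model output formats can vary from run to run, so treat any parsing like this as best effort.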

# Sentiment Analysis

In [None]:
response = client.chat.completions.create(
    model='gpt-4o',
    messages=[
        {'role': 'system', 
         'content': """You are an expert in sentiment analysis. Analyze 
             the following presentation transcript and state whether
             the sentiment is positive, negative, or neutral.
             Explain your analysis."""},
        {'role': 'user', 'content': transcript}
    ]
)

In [None]:
print(response.choices[0].message.content)

# Image Descriptions

In [None]:
import base64

In [None]:
import mimetypes

def describe_image(prompt, path_to_image):
    # Detect the MIME type from the file extension (e.g., image/png);
    # fall back to JPEG if it cannot be determined
    mime_type = mimetypes.guess_type(path_to_image)[0] or 'image/jpeg'

    # Encode the image bytes as base64 text for embedding in a data URL
    with open(path_to_image, 'rb') as img_file:
        base64_image = base64.b64encode(img_file.read()).decode('utf-8')

    return client.chat.completions.create(
        model='gpt-4o',
        messages=[
            {'role': 'system', 
             'content': """You are an expert at creating accessible descriptions
                 of images per the World Wide Web Consortium's (W3C's) 
                 guidelines. Given an image, explain it in detail for 
                 people who are blind or have low vision."""},
            {'role': 'user',
             'content': [
                {'type': 'text', 'text': prompt},
                {'type': 'image_url',
                 'image_url': {
                     'url': f"data:{mime_type};base64,{base64_image}", 
                     'detail': 'high'}}]
            }
        ]
    )

## Describe UML diagram
![ForLoop.png](attachment:12c22ad2-4b72-4d88-b1bd-b01268405fc1.png)

In [None]:
response = describe_image(
    """The attached image is a UML activity diagram showing the
    flow of control through a Java 'for' loop with UML 
    notes for additional context.""", 
    './resources/ForLoop.png')

In [None]:
print(response.choices[0].message.content)

## Describe a beach scene
![beach.jpg](attachment:4d865d06-e69f-4299-8bc8-cd04c55b6d93.jpg)

In [None]:
response = describe_image(
    """Describe in detail the attached photo of a beach scene,
    including every object in the photo.""", 
    './resources/beach.jpg')

In [None]:
print(response.choices[0].message.content)

# Text Translation

In [None]:
def translate(text, language):
    response = client.chat.completions.create(
        model='gpt-4o',
        messages=[
            {'role': 'system',
             'content': f"""You are an expert in natural language translation.
                 Translate the following text into {language}."""},
            {'role': 'user',
             'content': text}
        ]
    )

    return response.choices[0].message.content

In [None]:
english_text = "Today was a beautiful day. Tomorrow looks like bad weather."

## Translate to Spanish and Japanese
* Automatically figures out source language

In [None]:
spanish_text = translate(english_text, 'Spanish')
spanish_text

In [None]:
japanese_text = translate(english_text, 'Japanese')
japanese_text

## Translate from Spanish and Japanese Back to English

In [None]:
translate(spanish_text, 'English')

In [None]:
translate(japanese_text, 'English')
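A round trip through another language rarely returns the exact original wording. One rough way to quantify the difference is a string-similarity ratio from the standard library's `difflib`. In this sketch the `similarity` helper is hypothetical and `round_trip` is a made-up back-translation; actual output from `translate` will vary.

```python
from difflib import SequenceMatcher

def similarity(original, round_trip):
    """Return a 0-to-1 ratio of how closely two strings match, ignoring case."""
    return SequenceMatcher(
        None, original.lower(), round_trip.lower()).ratio()

original = "Today was a beautiful day. Tomorrow looks like bad weather."
# Hypothetical back-translation; real output from translate() will vary
round_trip = "Today was a beautiful day. Tomorrow the weather looks bad."

print(f'{similarity(original, original):.2f}')
print(f'{similarity(original, round_trip):.2f}')
```

A character-level ratio measures wording, not meaning, so two translations with the same meaning can still score well below 1.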

# Code Generation

In [None]:
# Generate Python code with OpenAI API
response = client.chat.completions.create(
    model='gpt-4o',
    messages=[
        {'role': 'system', 
         'content': "You are an expert Python programmer."},
        {'role': 'user', 
         'content': """Write Python code that uses the word_cloud 
             Python library to generate a rainbow color word cloud from 
             the top 200 words in the file 'RomeoAndJuliet.txt' which 
             is in the current folder. Remove both modern English and 
             old English stop words. Use the mask image named mask_heart.png, 
             which is also in the current folder. 
             Display the generated word cloud."""}
    ]
)

In [None]:
print(response.choices[0].message.content)

## Try the code
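Chat models often wrap generated code in Markdown fences surrounded by explanatory text. Before trying the code, you might extract just the fenced portion. The sketch below assumes that format: `extract_code_blocks` is a hypothetical helper and `sample_reply` is made-up text standing in for `response.choices[0].message.content`.

```python
import re

FENCE = '`' * 3  # three backticks, the Markdown code-fence marker

def extract_code_blocks(markdown_text):
    """Return the contents of every fenced code block in markdown_text."""
    # Opening fence, optional language name, newline, body, closing fence
    pattern = re.compile(
        FENCE + r'[\w+-]*[ \t]*\n(.*?)' + FENCE, re.DOTALL)
    return [block.strip() for block in pattern.findall(markdown_text)]

# Hypothetical reply; a real one comes from
# response.choices[0].message.content
sample_reply = (
    'Here is the code:\n'
    f'{FENCE}python\n'
    "print('hello')\n"
    f'{FENCE}\n'
    'Run it in a new cell.'
)

blocks = extract_code_blocks(sample_reply)
print(blocks[0])
```

Always read generated code before running it; it may need missing libraries or contain errors.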

# Named Entity Recognition

In [None]:
import json

path = Path("resources/web.txt")
text = path.read_text()
text

In [None]:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {'role': 'system', 
         'content': 
             """You are an expert in named entity recognition (NER). 
             Analyze the user-supplied text and extract the named 
             entities. Return the result as a JSON object with the 
             following format:
             {
                "entities": [
                   {"text": "Entity name", "tag": "Entity type"}
                ]
             }
             Ensure the output is strictly in JSON format with 
             no additional text."""},
        {'role': 'user', 'content': text}
    ],
    response_format={"type": "json_object"}
)

json_response = json.loads(response.choices[0].message.content)

In [None]:
print(json.dumps(json_response, indent=2))
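Once the entities arrive as JSON, it can be handy to check the structure and group the entity names by tag. The sketch below works on a hypothetical sample in the format the system prompt requests. The `group_entities` helper and the sample entities are assumptions for illustration, not actual model output.

```python
import json
from collections import defaultdict

def group_entities(json_text):
    """Group entity texts by tag; raise ValueError if the shape is unexpected."""
    data = json.loads(json_text)
    entities = data.get('entities') if isinstance(data, dict) else None
    if not isinstance(entities, list):
        raise ValueError("expected an 'entities' list")

    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity['tag']].append(entity['text'])
    return dict(grouped)

# Hypothetical model output in the format the system prompt requests
sample = """{"entities": [
    {"text": "Ada Lovelace", "tag": "PERSON"},
    {"text": "London", "tag": "LOCATION"},
    {"text": "Charles Babbage", "tag": "PERSON"}
]}"""

print(group_entities(sample))
```

With a real response, you could pass `response.choices[0].message.content` as the argument.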

# Speech-to-Text

In [None]:
def speech_to_text(audio_path):
    with open(audio_path, 'rb') as audio_file:
        transcript = client.audio.transcriptions.create( 
            model='whisper-1', file=audio_file)
    return transcript.text 

In [None]:
transcript = speech_to_text('resources/deep_learning_intro.m4a')

In [None]:
transcript

# Text-to-Speech

In [None]:
def text_to_speech(text, filename):
    response = client.audio.speech.create(
        model='tts-1-hd',
        voice='onyx',
        input=text
    )

    response.write_to_file(filename)

In [None]:
text_to_speech(english_text, './resources/outputs/english_speech.mp3')

In [None]:
text_to_speech(spanish_text, './resources/outputs/spanish_speech.mp3')

In [None]:
text_to_speech(japanese_text, './resources/outputs/japanese_speech.mp3')

In [None]:
import IPython
IPython.display.Audio(filename='./resources/outputs/english_speech.mp3')

In [None]:
import IPython
IPython.display.Audio(filename='./resources/outputs/spanish_speech.mp3')

In [None]:
import IPython
IPython.display.Audio(filename='./resources/outputs/japanese_speech.mp3')

# Image Generation


In [None]:
def create_image(prompt):
    response = client.images.generate(
        model='dall-e-3',
        prompt=prompt,
        quality='hd'
    )
    return response.data[0].url

In [None]:
create_image(
    """Havanese dog as a Japanese anime character
    in neon colors against a black background""")

In [None]:
create_image('Havanese dog in the style of Vincent Van Gogh')

In [None]:
create_image('Havanese dog in the style of Leonardo da Vinci')

# Content Moderation

In [None]:
while True:
    prompt = input("\nEnter prompt (exit to terminate): ")
    
    if prompt.lower() == "exit":
        break

    response = client.moderations.create(input=prompt)
    result = response.results[0]

    if not result.flagged:
        print("Prompt not flagged for offensive content")
    else:
        print("Offensive content categories & scores:")
        
        # Convert the pydantic result objects to plain dictionaries
        flags = result.categories.model_dump()
        scores = result.category_scores.model_dump()

        for cat, flag in flags.items():
            score = scores.get(cat) or 0.0
            print(f"{cat}: {bool(flag)} {score:.2f}")
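Printing every category can bury the ones that matter. As a possible refinement, the sketch below keeps only the categories whose score meets a threshold and sorts them from highest to lowest. It works on a plain dictionary: `top_categories`, the 0.5 threshold and the sample scores are all assumptions made for this example.

```python
def top_categories(scores, threshold=0.5):
    """Return (category, score) pairs at or above threshold, highest first."""
    hits = [(cat, score) for cat, score in scores.items()
            if score is not None and score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Hypothetical scores shaped like a plain dict of category -> score
sample_scores = {'harassment': 0.81, 'hate': 0.12,
                 'violence': 0.64, 'self_harm': None}

for cat, score in top_categories(sample_scores):
    print(f'{cat}: {score:.2f}')
```

The right threshold depends on your application, so treat 0.5 as a placeholder rather than a recommendation.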
