<img src="images/inmas.png" width=130x align='right' />

# Notebook 23 - OpenAI
This notebook is a fun tutorial on using the API of OpenAI. It covers:

- Using the OpenAI API
- Using chat completions
- Speech-to-text and text-to-speech generation
- Image generation

### Prerequisite
Notebook 22

### This notebook relies on the *chatgpt* environment installed in the previous notebook
The *chatgpt* environment should have been created in the first exercise of the *Virtual Environments* notebook, which is reproduced here:

#### Exercise 1.
<small>
This exercise will create the environment needed for the next notebook on using the OpenAI API.

Using a terminal and the command line, create a new environment called chatgpt:

- Make sure that the environment is visible in Jupyter
- Add the following packages: openai, playsound, and pvrecorder
- Test if you can load these modules from an empty Jupyter Notebook
</small>

#### Enabling the right kernel
- To enable the *chatgpt* environment in Jupyter:
    - Select *Kernel -> Change Kernel -> Python [conda env:chatgpt]*

###  Loading the required modules
The following modules should be available from the *chatgpt* virtual environment

In [None]:
import os
from openai import OpenAI                        # The API to OpenAI
from playsound import playsound                  # To play the generated speech
from IPython.display import Image, display       # To show generated images

# To silence deprecation warnings of streaming_response
import warnings
warnings.filterwarnings('ignore', category=DeprecationWarning)

### Obtaining credits from OpenAI
To run this notebook successfully, we need an OpenAI account with some credits:

- If you haven't already, sign up on [platform.openai.com](https://platform.openai.com)
- Go to billing and buy \$5 of non-renewable credits - do not select auto-refill

### The OpenAI API requires an account and some credits to be able to make requests to the platform
- You will not be able to run this notebook successfully without an OpenAI account and credits
- Each query costs about a penny

### Obtaining an API key
Before we can use the OpenAI API, we need to obtain an API key:


- Go on *Quickstart* to *Create and export API key*
- Give a name to your key, say 'myFirstKey'
- Once the key is generated, click *Copy* and paste it into the cell below
- Save the key somewhere safe, as there is no way to see it again on the website

In [None]:
APIkey = 'YOUR_API_KEY_HERE'
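
Pasting the key into the notebook is convenient, but it is easy to share the notebook later with the key still in it. As an alternative, here is a minimal sketch that reads the key from an environment variable. The helper name `load_api_key` is hypothetical and not part of the `openai` package; `OPENAI_API_KEY` is the variable name that the `openai` package also looks for by default.

```python
import os

def load_api_key(var_name='OPENAI_API_KEY'):
    '''Return the API key stored in an environment variable (hypothetical helper).'''
    key = os.environ.get(var_name, '').strip()
    if not key:
        raise RuntimeError('Set the %s environment variable before running this notebook.' % var_name)
    return key

# Usage, assuming the variable was set before starting Jupyter:
# APIkey = load_api_key()
```

With this approach, the key never appears in the notebook file itself.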

### Generating a client object with our newly-obtained API key
We are now ready to create a client object

In [None]:
client = OpenAI(api_key=APIkey)

We can now directly interact with OpenAI through this `client` object 

### OpenAI offers different APIs and models


They include:
- Many models for chat completion - GPT
    - gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo, etc.
- Image generation from text - DALL-E
    - dall-e-3, dall-e-2
- Text-to-speech conversion - TTS
    - tts-1, tts-1-hd
- Audio-to-text transcription - Whisper
    - whisper-1

A list of all the models contained in the API can be found [here](https://platform.openai.com/docs/models). 
Take a moment to browse through them.


### Anatomy of a request

We will use the chat completion API and craft our first request

Let's analyse the syntax of a call before running it

We need to specify the `model` to be used and the `messages` argument as a list of dictionaries, each containing a role and content:
```python
response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=[ 
      {'role': 'system', 'content': 'You are a math professor answering with dark humor.'},
      {'role': 'user', 'content': 'Who are you?'},
    ])
```

### What are the different roles?
- **system**
    - delivers tasks and sets overall tone
- **user**
    - passes messages from the user to the model
- **assistant**
    - contains earlier AI-generated messages, passed back for context
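
To make the roles concrete, here is a small sketch of what the `messages` list could look like for a second question in the same conversation. The helper `make_message` is hypothetical, and it only accepts the three roles covered in this notebook. The assistant text is a placeholder, not a real reply.

```python
def make_message(role, content):
    '''Build one entry of the `messages` list (hypothetical helper).'''
    allowed_roles = ('system', 'user', 'assistant')
    if role not in allowed_roles:
        raise ValueError('role must be one of %s' % (allowed_roles,))
    return {'role': role, 'content': content}

# A second-turn request: the earlier assistant reply is sent back as context.
messages = [
    make_message('system', 'You are a math professor answering with dark humor.'),
    make_message('user', 'Who are you?'),
    make_message('assistant', '(the reply generated for the first question goes here)'),
    make_message('user', 'What class do you teach this semester?'),
]

for message in messages:
    print('%-9s | %s' % (message['role'], message['content']))
```

The system message appears once at the start, while user and assistant messages alternate after it.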


### Running our first request
Run the following cell:

In [None]:
response = client.chat.completions.create(
    model = 'gpt-4o-mini',
    messages=[ {'role': 'system', 'content': 'You are a math professor answering with dark humor.'},
               {'role': 'user', 'content': 'Who are you?'},
    ]
)

The response is a `ChatCompletion` object, which contains other objects and attributes:

In [None]:
response

To get only the generated text, we use:

In [None]:
response.choices[0].message.content
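
Besides the generated text, a few other attributes of the response are often useful, such as the reason the generation stopped and the number of tokens used. The sketch below gathers them in a dictionary. The helper `summarize_response` is hypothetical, and `fake_response` is a stand-in object with made-up values so that the example runs without calling the API; with a real `response`, the same attribute names apply.

```python
from types import SimpleNamespace

def summarize_response(response):
    '''Collect a few commonly used fields of a ChatCompletion-like object (hypothetical helper).'''
    choice = response.choices[0]
    return {
        'text': choice.message.content,
        'finish_reason': choice.finish_reason,
        'total_tokens': response.usage.total_tokens,
    }

# Stand-in object with made-up values, only to show the attribute layout.
fake_response = SimpleNamespace(
    choices=[SimpleNamespace(
        message=SimpleNamespace(role='assistant', content='(generated text)'),
        finish_reason='stop')],
    usage=SimpleNamespace(total_tokens=42),
)

print(summarize_response(fake_response))
```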

### The current interface does not keep context
Each request is independent: to hold a conversation, we must send back everything that was said before

We will define a class for that purpose, which accumulates the context:

In [None]:
class ChatBot:
    '''A class to interact with OpenAI API and keep track of context.'''
    def __init__(self, client, model):
        self.client = client
        self.model = model
        self.context = [{'role': 'system', 'content': 'You are a math professor with a dark sense of humor.'}]

    def chat(self, question):
        self.context.append({'role': 'user', 'content': question})
        response = self.client.chat.completions.create(model=self.model, messages=self.context)
        response_content = response.choices[0].message.content
        self.context.append({'role': 'assistant', 'content': response_content})
        self.print_chat()
        
    def print_chat(self):
        for message in self.context:
            if message['role'] == 'user':
                print('USER: %s' % message['content'])
            elif message['role'] == 'assistant':
                print('BOT: %s' % message['content'])
    

### Testing our ChatBot class
The cell below creates an instance of our ChatBot class, using the client that we initialized with our API key

We will use the gpt-4o model:

In [None]:
chatbot = ChatBot(client, 'gpt-4o')
chatbot.chat('Hello. Who are you?')

### Adding a voice to our bot with TTS
- We will now add a method to our class that generates speech from text
- The path of the `speech_file` is built with `os.path.join`, so the same code runs on Linux, macOS, and Windows

```python
def speak(self, message, index=0):
    speech_file = os.path.join(os.getcwd(), '_speech_%03d.mp3' % index)
    response = self.client.audio.speech.create(model='tts-1-hd', voice='echo', input=message)
    response.stream_to_file(speech_file)
    playsound(speech_file)
```
- The text-to-speech model requested is *tts-1-hd*, which has [6 voices](https://platform.openai.com/docs/guides/text-to-speech/quickstart) to choose from:
    - alloy, echo, fable, onyx, nova, and shimmer
    
We now add this function to our ChatBot class, and add `speak()` at the end of our `chat()` function:

In [None]:
class ChatBot:
    '''A class to interact with OpenAI API and keep track of context.'''
    def __init__(self, client, model):
        self.client = client
        self.model = model
        self.context = [{'role': 'system', 'content': 'You are a math professor with a dark sense of humor.'}]

    def chat(self, question):
        self.context.append({'role': 'user', 'content': question})
        response = self.client.chat.completions.create(model=self.model, messages=self.context)
        response_content = response.choices[0].message.content
        self.context.append({'role': 'assistant', 'content': response_content})
        self.print_chat()
        self.speak(response_content, len(self.context) // 2)

    def speak(self, message, index=0):
        speech_file = os.path.join(os.getcwd(), '_speech_%03d.mp3' % index)
        response = self.client.audio.speech.create(model='tts-1-hd', voice='echo', input=message)
        if os.path.exists(speech_file):
            os.remove(speech_file)
        response.stream_to_file(speech_file)
        playsound(speech_file)
        
    def print_chat(self):
        for message in self.context:
            if message['role'] == 'user':
                print('USER: %s' % message['content'])
            elif message['role'] == 'assistant':
                print('BOT: %s' % message['content'])
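
The `speak()` method calls `stream_to_file` directly on the response, which is the call behind the deprecation warnings silenced at the top of this notebook. As a possible alternative, the sketch below uses the `with_streaming_response` variant of the same request. The helper name `speak_to_file` and the `VOICES` tuple are hypothetical, the voice check only covers the six voices listed in this notebook, and the code assumes an installed `openai` version that provides `with_streaming_response`.

```python
import os

# The six voices listed in this notebook.
VOICES = ('alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer')

def speak_to_file(client, message, speech_file, voice='echo', model='tts-1-hd'):
    '''Generate speech for `message` and write it to `speech_file` (hypothetical helper).'''
    if voice not in VOICES:
        raise ValueError('voice must be one of %s' % (VOICES,))
    # Remove an older file of the same name before writing the new one.
    if os.path.exists(speech_file):
        os.remove(speech_file)
    # Assumption: the installed openai version provides `with_streaming_response`.
    with client.audio.speech.with_streaming_response.create(
            model=model, voice=voice, input=message) as response:
        response.stream_to_file(speech_file)
    return speech_file

# Usage, assuming `client` was created as shown earlier:
# playsound(speak_to_file(client, 'Hello!', os.path.join(os.getcwd(), '_speech_000.mp3')))
```

Passing the `client` as an argument keeps the helper independent of any global variable.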
    

### Testing the voice of our bot
With the new class definition, let's generate a new instance and run the same request again:

In [None]:
chatbot = ChatBot(client, 'gpt-4o')
chatbot.chat('Hello. Who are you?')

Let's ask a follow-up question:

In [None]:
chatbot.chat("What class do you teach this semester?")

Your turn to ask a question (edit and run):

In [None]:
chatbot.chat("...")

### Converting speech to text
OpenAI also offers a model to convert speech to text. To use it, we record our voice and pass the recording to the *Whisper* model for transcription.
This is left as an exercise.

We will now turn to image generation

### Getting a prompt for image generation
- We will use GPT to generate a text description of an image and pass it to DALL-E for image generation

- We set this up with a system message that describes the task:

In [None]:
# Make a query to a system describing an image as a prompt to the image generator DALL-E
response = client.chat.completions.create(
    model='gpt-4o',
    messages=[{'role': 'system', 'content': '''
    You are a face-describing system which describes the face of a math professor responding to a question.
    You receive the text the person is saying, please describe the face that would be fitting the response given
    as a prompt to the image generation AI DALL-E.'''},
             {'role': 'user', 'content': 'I am a math professor who has a dark sense of humor.'}])

# Get the text generated
image_description = response.choices[0].message.content
image_description

### Generating the image
We now pass the `image_description` to the image generation model, and display the result:

In [None]:
response = client.images.generate(model='dall-e-3', prompt=image_description,
                                 size='1024x1024', quality='standard', n=1)

- Images can only be generated in one of these sizes, depending on the model:
    - '256x256', '512x512', '1024x1024', '1024x1792', '1792x1024'
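
Since the accepted sizes depend on the model, it can help to check the combination before sending a request. The following is a minimal sketch: the names `IMAGE_SIZES` and `check_image_size` are hypothetical, and the mapping reflects the sizes documented for each model at the time of writing, so it should be checked against the current OpenAI documentation.

```python
# Assumed mapping of model name to accepted sizes; verify against the current documentation.
IMAGE_SIZES = {
    'dall-e-2': ('256x256', '512x512', '1024x1024'),
    'dall-e-3': ('1024x1024', '1024x1792', '1792x1024'),
}

def check_image_size(model, size):
    '''Return True if `size` is accepted by `model` according to IMAGE_SIZES (hypothetical helper).'''
    if model not in IMAGE_SIZES:
        raise ValueError('unknown model: %s' % model)
    return size in IMAGE_SIZES[model]

for model, size in [('dall-e-3', '1024x1024'), ('dall-e-3', '512x512'), ('dall-e-2', '512x512')]:
    print(model, size, check_image_size(model, size))
```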
    
We show the results on the next slide:

In [None]:
display(Image(url=response.data[0].url))

### Integrating the image generation in the ChatBot class
We finally add a method called `show_face()` to our ChatBot class, using the code that we presented.

It is now becoming slightly larger than a single slide:

In [None]:
class ChatBot:
    '''A class to interact with OpenAI API and keep track of context.'''
    def __init__(self, client, model):
        self.client = client
        self.model = model
        self.context = [{'role': 'system', 'content': 'You are a math professor with a dark sense of humor.'}]

    def chat(self, question):
        self.context.append({'role': 'user', 'content': question})
        response = self.client.chat.completions.create(model=self.model, messages=self.context)
        response_content = response.choices[0].message.content
        self.context.append({'role': 'assistant', 'content': response_content})
        self.show_face(response_content)
        self.print_chat()
        self.speak(response_content, len(self.context) // 2)

    def show_face(self, message):
        response = self.client.chat.completions.create(
            model='gpt-4o',
            messages=[{'role': 'system', 'content': '''
            You are a face-describing system which describes the face of a math professor responding to a question.
            You receive the text the person is saying, please describe the face that would be fitting the response given
            as a prompt to the image generation AI DALL-E.'''},
                     {'role': 'user', 'content': message}])
        image_description = response.choices[0].message.content
        response = self.client.images.generate(model='dall-e-3', prompt=image_description,
                                 size='1024x1024', quality='standard', n=1)
        display(Image(url=response.data[0].url))

    def speak(self, message, index=0):
        speech_file = os.path.join(os.getcwd(), '_speech_%03d.mp3' % index)
        response = self.client.audio.speech.create(model='tts-1-hd', voice='echo', input=message)
        if os.path.exists(speech_file):
            os.remove(speech_file)
        response.stream_to_file(speech_file)
        playsound(speech_file)
        
    def print_chat(self):
        for message in self.context:
            if message['role'] == 'user':
                print('USER: %s' % message['content'])
            elif message['role'] == 'assistant':
                print('BOT: %s' % message['content'])
    

### Testing the final class with a voice and a face

In [None]:
chatbot = ChatBot(client, 'gpt-4o')
chatbot.chat("I'm a math PhD student. Should I stay in academia or work in the industry?")

Your chance to ask a follow-up question (edit and run):

In [None]:
chatbot.chat("One more question: ...")

### Key Points
- The OpenAI API gives access to many different models
    - chat completion, image generation, text to speech, and speech to text
- Interacting with the models in Python is relatively easy
- Results are impressive and fun!

### Further Reading
- Ask ChatGPT!
- Reference on OpenAI API [here](https://platform.openai.com)

### What's Next?
- Complete the exercises in this associated exercise notebook [X-23-ChatGPT.ipynb](X-23-ChatGPT.ipynb)
- Next notebook is [N-24-FinalProject.ipynb](N-24-FinalProject.ipynb)