# Lesson 1 Project: Introduction to Multimodal AI

## Introduction

Welcome to the first lesson on multimodal AI! This course focuses primarily on images and speech, but it also involves working with text. You should already know about text generation from previous courses, including how to access the OpenAI text generation API, but a refresher never hurts. By the end of this lesson, you will be able to:
- Access the OpenAI text generation API
- Request structured outputs so that API responses follow a schema you define

These skills are the foundation for working with multimodal AI systems.

## Setting Up OpenAI Development Environment

In [6]:
# Install the libraries
!pip install openai pydantic python-dotenv

# Load the OpenAI library
from openai import OpenAI

# Set up relevant environment variables
# Make sure OPENAI_API_KEY=... exists in .env
from dotenv import load_dotenv

load_dotenv()

# Create the OpenAI connection object
client = OpenAI()




## Making an API Request

To make a request to the OpenAI text generation API, you can use the following code:

In [3]:
# Make an API request
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are an English grammar checker."},
        {
            "role": "user",
            "content": "Check the grammar: 'Alice eat an apple every day.'"
        }
    ]
)

# Print the response
print(completion.choices[0].message)

ChatCompletionMessage(content="The correct sentence should be: 'Alice eats an apple every day.'", refusal=None, role='assistant', function_call=None, tool_calls=None)
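
Printing `message` shows the whole `ChatCompletionMessage` object. In most cases you only need the generated text, which is stored in its `content` attribute. The sketch below shows that access pattern. To stay runnable without an API key, it uses a stand-in object built with `SimpleNamespace` that mimics the shape of the response above; the helper name `first_reply_text` is hypothetical.

```python
from types import SimpleNamespace


def first_reply_text(completion) -> str:
    """Return the text of the first choice, or "" if the message has no content."""
    message = completion.choices[0].message
    return message.content or ""


# Stand-in that mimics the response shape printed above (not a real API response).
fake_completion = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(
                role="assistant",
                content="The correct sentence should be: 'Alice eats an apple every day.'",
            )
        )
    ]
)

print(first_reply_text(fake_completion))
```

With the real `completion` object from the cell above, the equivalent call is `print(completion.choices[0].message.content)`.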


## Structured Outputs

Structured outputs let you require that the response generated by the API adheres to a JSON schema, defined here as a Pydantic model. This makes it easier to extract information, because you no longer have to parse a raw string yourself. To create an API request with structured outputs, use the following code:

In [5]:
# Define the response schema as a Pydantic model
from pydantic import BaseModel

class GrammarChecking(BaseModel):
    wrong_sentence: str
    correct_sentence: str
    is_correct: bool

# Make an API request that must return data matching the schema
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are an English grammar checker."},
        {"role": "user", "content": "Check the grammar: 'Alice eat an apple every day.'"},
    ],
    response_format=GrammarChecking,
)

# Print the response
print(completion.choices[0].message)

ParsedChatCompletionMessage[GrammarChecking](content='{"wrong_sentence":"Alice eat an apple every day.","correct_sentence":"Alice eats an apple every day.","is_correct":false}', refusal=None, role='assistant', function_call=None, tool_calls=[], parsed=GrammarChecking(wrong_sentence='Alice eat an apple every day.', correct_sentence='Alice eats an apple every day.', is_correct=False))
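
In the output above, `parsed` already holds a `GrammarChecking` instance, so in the notebook the fields can be read directly, for example `completion.choices[0].message.parsed.correct_sentence`. To illustrate what the schema guarantees, the sketch below performs a comparable check by hand on the JSON string from `content`, using only the standard library. The `GrammarResult` dataclass and `parse_grammar_json` helper are hypothetical stand-ins for the Pydantic model, not part of the OpenAI SDK.

```python
import json
from dataclasses import dataclass


@dataclass
class GrammarResult:  # hypothetical stdlib mirror of GrammarChecking
    wrong_sentence: str
    correct_sentence: str
    is_correct: bool


def parse_grammar_json(raw: str) -> GrammarResult:
    """Parse the JSON string and check that each field has the expected type."""
    data = json.loads(raw)
    result = GrammarResult(**data)  # raises TypeError on missing or extra keys
    if not isinstance(result.wrong_sentence, str):
        raise TypeError("wrong_sentence must be a string")
    if not isinstance(result.correct_sentence, str):
        raise TypeError("correct_sentence must be a string")
    if not isinstance(result.is_correct, bool):
        raise TypeError("is_correct must be a boolean")
    return result


# JSON string copied from the `content` field of the output above.
raw = (
    '{"wrong_sentence":"Alice eat an apple every day.",'
    '"correct_sentence":"Alice eats an apple every day.",'
    '"is_correct":false}'
)

result = parse_grammar_json(raw)
print(result.correct_sentence)
print(result.is_correct)
```

With structured outputs, this validation happens for you, which is why reading from `parsed` is usually preferable to parsing `content` by hand.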


## Conclusion

In this lesson, you’ve refreshed your knowledge of the OpenAI text generation API. You’ve learned how to:
- Set up the OpenAI client
- Make an API request
- Use structured outputs

Although this multimodal AI course focuses on working with images and audio, you’ll also need to work with text. Text is an essential component of multimodal AI.