# Building Multimodal AI Applications with LangChain & the OpenAI API 

## Goals 

Videos can be full of useful information, but getting at it is slow: you either watch the whole thing or skip around and hope to land on the right part. It is often much faster to use a bot to ask questions about the transcript.

Download a video from Internet Archive, transcribe the audio, and create a simple Q&A bot to ask questions about the content.

- Understanding the building blocks of working with Multimodal AI projects
- Working with some of the fundamental concepts of LangChain  
- How to use the Whisper API to transcribe audio to text 
- How to combine LangChain and the Whisper API to ask questions about any YouTube video 

## Setup

The project requires several packages that need to be installed into Workspace.

- `langchain` is a framework for developing generative AI applications.
- `yt_dlp` lets you download YouTube videos.
- `tiktoken` converts text into tokens.
- `docarray` makes it easier to work with multimodal data (in this case mixing audio and text).

In [24]:
# Install the openai package, locked to version 1.27
!pip install openai==1.27

# Install the langchain package, locked to version 0.1.19
!pip install langchain==0.1.19

# Install the langchain-openai package, locked to version 0.1.6
!pip install langchain-openai==0.1.6

# Install the yt_dlp package, locked to version 2024.4.9
!pip install yt_dlp==2024.4.9

# Install the tiktoken package, locked to version 0.6.0
!pip install tiktoken==0.6.0

# Install the docarray package, locked to version 0.40.0
!pip install docarray==0.40.0

!pip install pydub 

Defaulting to user installation because normal site-packages is not writeable
Collecting pydub
  Downloading pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)
Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)
Installing collected packages: pydub
Successfully installed pydub-0.25.1


## Import The Required Libraries 

### Import the following packages.

- Import `os`. 
- Import `glob`.
- Import `openai`.
- Import `yt_dlp` with the alias `youtube_dl`.
- From the `yt_dlp` package, import `DownloadError`.
- From the `pydub` package, import `AudioSegment`.
- Assign `openai_api_key` to `os.getenv("OPENAI_API_KEY")`.

In [25]:
# Import the os package
import os

# Import the glob package
import glob

# Import the openai package 
import openai

# Import the yt_dlp package as youtube_dl
import yt_dlp as youtube_dl

# Import DownloadError from yt_dlp
from yt_dlp import DownloadError 

# Import DocArray 
import docarray 

# Import AudioSegment
from pydub import AudioSegment


In [13]:
openai_api_key = os.getenv("OPENAI_API_KEY")
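
`os.getenv` returns `None` when the variable is missing, so a forgotten key only surfaces later, at the first API call. A minimal sketch of an earlier check, assuming a hypothetical helper named `require_env` (it is not part of the original notebook or of any library):

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message.

    `require_env` is a hypothetical helper introduced for illustration.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "add it to the environment before running the notebook."
        )
    return value


# Example usage (commented out so the sketch runs without a real key):
# openai_api_key = require_env("OPENAI_API_KEY")
```

Failing at this point keeps the error close to its cause instead of inside a later download or transcription cell.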

## Download the Video

In [17]:


# Target video URL
video_url = "https://archive.org/details/0769_So_Youre_Going_to_High_School_18_16_46_00"

# Output directory
output_dir = "files/audio/"
os.makedirs(output_dir, exist_ok=True)

# yt-dlp config for MP3 extraction
ydl_opts = {
    "format": "bestaudio/best",
    "postprocessors": [
        {
            "key": "FFmpegExtractAudio",
            "preferredcodec": "mp3",
            "preferredquality": "192",
        }
    ],
    "outtmpl": os.path.join(output_dir, "%(title)s.%(ext)s"),
    "quiet": False,  # Show progress
    "noplaylist": True,
    "force_generic_extractor": True,  # Force yt-dlp's generic extractor for this page
}

# Run downloader; retry once if the first attempt raises a DownloadError
try:
    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
        ydl.download([video_url])
except DownloadError:
    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
        ydl.download([video_url])

print("Audio extraction complete!")

[generic] Extracting URL: https://archive.org/details/0769_So_Youre_Going_to_High_School_18_16_46_00
[generic] 0769_So_Youre_Going_to_High_School_18_16_46_00: Downloading webpage




[generic] 0769_So_Youre_Going_to_High_School_18_16_46_00: Extracting information
[info] 0769_So_Youre_Going_to_High_School_18_16_46_00_3mb: Downloading 1 format(s): 0
[download] Destination: files/audio/So, You're Going to High School ： Free Download, Borrow, and Streaming ： Internet Archive.ogv
[download] 100% of   77.47MiB in 00:00:49 at 1.57MiB/s   
[ExtractAudio] Destination: files/audio/So, You're Going to High School ： Free Download, Borrow, and Streaming ： Internet Archive.mp3
Deleting original file files/audio/So, You're Going to High School ： Free Download, Borrow, and Streaming ： Internet Archive.ogv (pass -k to keep)
✅ Audio extraction complete!


Find the audio file in the output directory.

- Find all the MP3 audio files in the output directory by joining the output directory to the pattern `*.mp3` and using glob to list them.
- Select the last file in the list and assign it to `audio_filename`.
- Print `audio_filename`.

In [18]:
# Find the audio file in the output directory

# Find all the audio files in the output directory
audio_file = glob.glob(os.path.join(output_dir, '*.mp3'))

# Select the last audio file in the list
audio_filename = audio_file[-1]

# Print the name of the selected audio file
print(audio_filename)

files/audio/So, You're Going to High School ： Free Download, Borrow, and Streaming ： Internet Archive.mp3
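
`glob.glob` does not guarantee any particular order, so taking the last element is only reliable when the directory holds a single MP3. A hedged alternative is to pick the most recently modified match. The helper name `newest_file` is an assumption made for this sketch:

```python
import glob
import os
from typing import Optional


def newest_file(directory: str, pattern: str = "*.mp3") -> Optional[str]:
    """Return the most recently modified file matching `pattern`, or None.

    `newest_file` is a hypothetical helper introduced for illustration.
    """
    matches = glob.glob(os.path.join(directory, pattern))
    if not matches:
        return None
    # Sort by modification time instead of relying on glob's arbitrary order
    return max(matches, key=os.path.getmtime)


# Example usage with the notebook's directory (commented out; the path may not exist):
# audio_filename = newest_file("files/audio/")
```

Returning `None` for an empty directory also avoids the `IndexError` that `audio_file[-1]` raises when no MP3 was downloaded.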


## Transcribe the Video using GPT-4o-Mini-Transcribe

In [36]:

# Define variables
audio_file = audio_filename  # Make sure this variable exists (e.g. "files/audio/myfile.mp3")
output_file = "files/transcripts/text.txt"
model = "gpt-4o-mini-transcribe"

# Initialize OpenAI client
client = openai.OpenAI()  # Reads the API key from the OPENAI_API_KEY environment variable

# Ensure transcript output directory exists
# (a separate name keeps the audio `output_dir` from the download cell intact)
transcript_dir = os.path.dirname(output_file)
if not os.path.exists(transcript_dir):
    os.makedirs(transcript_dir)
    print(f"Created directory: {transcript_dir}")

# Check if the audio file actually exists
if not os.path.exists(audio_file):
    raise FileNotFoundError(f"Audio file not found: {audio_file}")

# Check file size before sending to API
max_size_bytes = 25 * 1024 * 1024  # 25 MB
file_size = os.path.getsize(audio_file)

if file_size > max_size_bytes:
    print(
        f"Audio file is too large ({file_size} bytes). "
        f"Maximum allowed size is {max_size_bytes} bytes (25 MB)."
    )
    # Automatically trim the audio to fit the size limit
    print("Trimming audio to fit the size limit...")
    audio = AudioSegment.from_file(audio_file)
    duration_ms = len(audio)
    bytes_per_ms = file_size / duration_ms
    max_duration_ms = int(max_size_bytes / bytes_per_ms)
    trimmed_audio = audio[:max_duration_ms]
    trimmed_audio_file = "temp_trimmed_audio.mp3"
    trimmed_audio.export(trimmed_audio_file, format="mp3")
    audio_file = trimmed_audio_file
    print(f"Trimmed audio saved as {audio_file}")
else:
    print("Audio file size is within the allowed limit.")

print("\n Converting audio to text...")

# Transcribe
with open(audio_file, "rb") as audio:
    response = client.audio.transcriptions.create(
        model=model,
        file=audio
    )

# Extract and print transcript
transcript = response.text
print("\nTRANSCRIPT:\n")
print(transcript)




Audio file is too large (26401689 bytes). Maximum allowed size is 26214400 bytes (25 MB).
Trimming audio to fit the size limit...
Trimmed audio saved as temp_trimmed_audio.mp3

 Converting audio to text...

TRANSCRIPT:

Many of us won't go to college but want to go to business. The commercial course prepares you for business. You learn typing, bookkeeping and other things. Some of the fellas who are planning to take the industrial course really open their eyes at the shops we saw. Mmm, the boys were glad to see the girls learning how to make those pies in homemaking class. The general course is a lot of interesting subjects in it. Some of us visited the special high schools. One was a technical high school. It has wonderful laboratories. They work very carefully in that school. First they make their blueprint, then they build their house from them. It's a real life-size house. They learn about a lot of different things before that house is built. They not only learn how a house is buil
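
The cell above trims the audio to fit the 25 MB limit, so anything after the cut is never transcribed. One alternative is to split the audio into several spans that each stay under the limit and transcribe them one by one. The sketch below only computes the span boundaries. It makes the same assumption as the trimming code (file size grows roughly linearly with duration), and the names `chunk_boundaries` and `safety` are hypothetical:

```python
from typing import List, Tuple


def chunk_boundaries(
    duration_ms: int,
    file_size: int,
    max_size_bytes: int = 25 * 1024 * 1024,
    safety: float = 0.9,
) -> List[Tuple[int, int]]:
    """Split [0, duration_ms) into spans whose estimated size stays under the limit.

    Assumes size is roughly proportional to duration (constant bitrate).
    `safety` leaves headroom because the size of a re-encoded chunk is an estimate.
    """
    if duration_ms <= 0 or file_size <= 0:
        return []
    bytes_per_ms = file_size / duration_ms
    max_chunk_ms = max(1, int(max_size_bytes * safety / bytes_per_ms))
    return [
        (start, min(start + max_chunk_ms, duration_ms))
        for start in range(0, duration_ms, max_chunk_ms)
    ]


# Small illustrative numbers: a 1000 ms file of 1000 bytes with a 400-byte limit
spans = chunk_boundaries(duration_ms=1000, file_size=1000, max_size_bytes=400, safety=1.0)
```

With `pydub`, each span could then be exported as `audio[start:end]` and sent to `client.audio.transcriptions.create`, with the returned texts joined in order.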

Save the transcript to a text file.


In [38]:
# Create the directory for the output file if it doesn't exist
if output_file is not None:
    os.makedirs(os.path.dirname(output_file), exist_ok=True)

    # Write the transcript to the output file
    with open(output_file, 'w', encoding='utf-8') as file:
        file.write(transcript)

print(transcript)

Many of us won't go to college but want to go to business. The commercial course prepares you for business. You learn typing, bookkeeping and other things. Some of the fellas who are planning to take the industrial course really open their eyes at the shops we saw. Mmm, the boys were glad to see the girls learning how to make those pies in homemaking class. The general course is a lot of interesting subjects in it. Some of us visited the special high schools. One was a technical high school. It has wonderful laboratories. They work very carefully in that school. First they make their blueprint, then they build their house from them. It's a real life-size house. They learn about a lot of different things before that house is built. They not only learn how a house is built, but they learn about the metals that go into buildings. We saw them test steel rods to see how strong they were. We were startled when the steel rod snapped. There was a class called industrial chemistry lab. I always

## Create a TextLoader using LangChain 

In [40]:
# From the langchain.document_loaders module, import TextLoader
from langchain.document_loaders import TextLoader

# Create a `TextLoader`, passing the path of the transcript file. Assign to `loader`.
loader = TextLoader('./files/transcripts/text.txt')

# Use the TextLoader to load the documents. Assign to docs.

docs = loader.load()


In [41]:
# Show the first element of docs to verify it has been loaded 
docs[0]

Document(page_content="Many of us won't go to college but want to go to business. The commercial course prepares you for business. You learn typing, bookkeeping and other things. Some of the fellas who are planning to take the industrial course really open their eyes at the shops we saw. Mmm, the boys were glad to see the girls learning how to make those pies in homemaking class. The general course is a lot of interesting subjects in it. Some of us visited the special high schools. One was a technical high school. It has wonderful laboratories. They work very carefully in that school. First they make their blueprint, then they build their house from them. It's a real life-size house. They learn about a lot of different things before that house is built. They not only learn how a house is built, but they learn about the metals that go into buildings. We saw them test steel rods to see how strong they were. We were startled when the steel rod snapped. There was a class called industrial 

## Create an In-Memory Vector Store 

In [42]:
# Import the tiktoken package
import tiktoken

## Create the Document Search 



- Import `RetrievalQA` from `langchain.chains` - this chain first retrieves documents from an assigned retriever and then runs a QA chain to answer questions over those documents.
- Import `ChatOpenAI` from `langchain.chat_models` - this is the chat model that we will use to query the data.
- Import `DocArrayInMemorySearch` from `langchain.vectorstores` - this gives the ability to search over the vector store we create.
- Import `OpenAIEmbeddings` from `langchain.embeddings` - this creates embeddings for the data stored in the vector store.
- Import `display` and `Markdown` from `IPython.display` - this will create formatted responses to the queries.

In [44]:
# Import the RetrievalQA class from the langchain.chains module
from langchain.chains import RetrievalQA

# Import the ChatOpenAI class from the langchain.chat_models module
from langchain.chat_models import ChatOpenAI

# Import the DocArrayInMemorySearch class from the langchain.vectorstores module
from langchain.vectorstores import DocArrayInMemorySearch

# Import the OpenAIEmbeddings class from the langchain.embeddings module
from langchain.embeddings import OpenAIEmbeddings


Create a vector store with `DocArrayInMemorySearch`, which searches over the embeddings produced by `OpenAIEmbeddings`.

In [45]:
# Create a new DocArrayInMemorySearch instance from the specified documents and embeddings
db = DocArrayInMemorySearch.from_documents(
    docs,
    OpenAIEmbeddings()
)
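
Conceptually, a vector store keeps one embedding per document and answers a query by ranking the stored embeddings by similarity to the query's embedding. The sketch below illustrates that idea in plain Python. The three-dimensional vectors are made-up stand-ins for real embeddings, cosine similarity is an assumed metric, and `cosine_similarity` and `top_k` are hypothetical helpers rather than LangChain or DocArray functions:

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], store: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k stored texts most similar to the query vector."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings"; real ones have hundreds or thousands of dimensions
toy_store = {
    "commercial course": [1.0, 0.0, 0.0],
    "technical high school": [0.0, 1.0, 0.0],
    "homemaking class": [0.0, 0.0, 1.0],
}
results = top_k([0.9, 0.1, 0.0], toy_store, k=1)
```

In the notebook, `OpenAIEmbeddings` produces the vectors and `DocArrayInMemorySearch` performs the ranking, so none of this needs to be written by hand.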

Create a retriever from `db` so the chain can fetch the documents most relevant to a query, and use the `ChatOpenAI` model as the LLM.



In [46]:
# Convert the DocArrayInMemorySearch instance to a retriever
retriever = db.as_retriever()

# Create a new ChatOpenAI instance with a temperature of 0.0
llm = ChatOpenAI(temperature=0.0)


Create the `RetrievalQA` chain. This chain takes:  
- The `llm` we want to use.
- The `chain_type`, which controls how the retrieved documents are passed to the model. Here we use a _stuff_ chain, where all the documents are stuffed into a single prompt. It is the simplest type, but it only works when you have a few small documents.
- The `retriever` that we have created.
- An option called `verbose` that prints details of each step of the chain.

In [48]:
# Create a new RetrievalQA instance with the specified parameters
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type='stuff',
    retriever=retriever,
    verbose=True
)
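
To make the "stuff" strategy concrete, the sketch below concatenates the retrieved documents into one prompt string. The prompt wording is an illustration only, not LangChain's actual template, and `build_stuff_prompt` is a hypothetical helper:

```python
from typing import List


def build_stuff_prompt(question: str, documents: List[str]) -> str:
    """Place every retrieved document into a single prompt, followed by the question."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say that you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_stuff_prompt(
    "what is the topic of discussion",
    [
        "The commercial course prepares you for business.",
        "One was a technical high school.",
    ],
)
```

Because every document lands in the same prompt, the combined text has to fit in the model's context window, which is why this chain type suits a few small documents.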


## Create the Queries 

In [50]:
# Set the query to be used for the QA system
query = "what is the topic of discussion"

# Invoke the query through the RetrievalQA instance. Assign to response.
response = qa_stuff.run(query)

# Print the response to the console
print(response)





> Entering new RetrievalQA chain...

> Finished chain.
The topic of discussion is about high school choices, courses, vocational training, and career planning for students. It covers various aspects of exploring different high schools, courses, and occupations to help students make informed decisions about their future.


In [58]:
# Set the query to be used for the QA system
query = "how can students make informed decision about their future"

# Invoke the query through the RetrievalQA instance and store the response
response = qa_stuff.run(query)

# Print the response to the console
print(response)



> Entering new RetrievalQA chain...

> Finished chain.
Students can make informed decisions about their future by engaging in various activities and processes. Some of the ways students can make informed decisions about their future include:

1. **Researching**: Students can research different career paths, industries, and educational opportunities to understand what options are available to them.

2. **Seeking Guidance**: Students can seek guidance from teachers, counselors, and professionals in different fields to get insights and advice on potential career paths.

3. **Exploring Interests**: Students can explore their interests through extracurricular activities, internships, volunteering, and part-time jobs to get a sense of what they enjoy and excel at.

4. **Taking Aptitude Tests**: Aptitude tests can help students understand their strengths, weaknesses, and interests, which can guide them towards suitable career paths.

5. **Visiting Schools and Industries**: V

In [59]:
# Set the query to be used for the QA system
query = "Who should watch this lesson?"

# Invoke the query through the RetrievalQA instance and store the response
response = qa_stuff.run(query)

# Print the response
print(response)



> Entering new RetrievalQA chain...

> Finished chain.
This lesson seems to be about high school students exploring different career paths and making decisions about their future education. It would be beneficial for high school students who are in the process of choosing a high school or a career path to watch this lesson.


Keep asking questions, including some that the video cannot answer, to see how the model responds when the transcript does not contain the answer.

In [60]:
# Set the query to be used for the QA system
query = "Who is the greatest football team on earth?"

# Invoke the query through the RetrievalQA instance and store the response
response = qa_stuff.run(query)

# Print the response to the console
print(response)



> Entering new RetrievalQA chain...

> Finished chain.
I don't know.


In [61]:
# Set the query to be used for the QA system
query = "How long is the circumference of the earth?"

# Invoke the query through the RetrievalQA instance and store the response
response = qa_stuff.run(query)

# Print the response to the console
print(response)



> Entering new RetrievalQA chain...

> Finished chain.
I don't know.


In [12]:
import pkg_resources

# Collect installed packages
installed_packages = pkg_resources.working_set
with open("requirements.txt", "w") as f:
    for pkg in installed_packages:
        f.write(f"{pkg.key}=={pkg.version}\n")

print("requirements.txt generated successfully!")


requirements.txt generated successfully!
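
`pkg_resources` is deprecated in recent `setuptools` releases. A minimal sketch of the same idea using the standard library's `importlib.metadata` follows. The function name `pinned_requirements` is an assumption made for this example:

```python
from importlib import metadata
from typing import List


def pinned_requirements() -> List[str]:
    """Return sorted 'name==version' lines for every installed distribution."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # Skip distributions whose metadata has no name
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


requirements = pinned_requirements()

# To write the file, as the cell above does:
# with open("requirements.txt", "w") as f:
#     f.write("\n".join(requirements) + "\n")
```

Like the cell above, this pins everything installed in the environment, not only the packages the notebook imports.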


## Prepare and Commit to GitHub

In [9]:
!git config --global user.name "kokojnr"
!git config --global user.email "fkokori50@gmail.com"


In [10]:
# Check files
!ls -R
!git --version

.:
files		       notebook.ipynb		requirements.txt
getting-started.ipynb  notebook-solution.ipynb	temp_trimmed_audio.mp3
images		       README.md

./files:
audio  text  transcripts

./files/audio:
'Python Machine Learning Tutorial ｜ Splitting Your Data ｜ Databytes.mp3'
"So, You're Going to High School ： Free Download, Borrow, and Streaming ： Internet Archive.mp3"

./files/transcripts:
text.txt  transcript.txt

./images:
openai-add-payment-method.png	      pinecone-create-api-key.png
openai-create-account.jpeg	      stars-icon.png
openai-create-account.png	      workspace-add-environment-variables.png
openai-get-started.png		      workspace-environment.png
openai-new-secret-key.png	      workspace-environment-variables.png
openai-payment-method.png	      workspace-env-var-connect.png
pinecone-api-key-creation-dialog.png  workspace-env-var-details.png
pinecone-api-keys-navbar.png	      workspace-integrations.png
pinecone-create-account.png	      workspace-new-integration.png
git version 2

In [11]:
# Read the GitHub token from the GITHUB_API environment variable
import os
githuhb_api = os.environ["GITHUB_API"]

In [12]:
!git remote add origin https://kokojnr:{os.environ["GITHUB_API"]}@github.com/kokojnr/Multimodal-LLMs-With-Langchain-OpenAI.git



error: remote origin already exists.


In [14]:
!git add .


In [17]:
!git commit -m "Add multimodal LLM project with notebook, audio, transcripts, and README"


[master (root-commit) f603335] Add multimodal LLM project with notebook, audio, transcripts, and README
 29 files changed, 624 insertions(+)
 create mode 100644 README.md
 create mode 100644 "files/audio/Python Machine Learning Tutorial \357\275\234 Splitting Your Data \357\275\234 Databytes.mp3"
 create mode 100644 "files/audio/So, You're Going to High School \357\274\232 Free Download, Borrow, and Streaming \357\274\232 Internet Archive.mp3"
 create mode 100644 files/text
 create mode 100644 files/transcripts/text.txt
 create mode 100644 files/transcripts/transcript.txt
 create mode 100644 getting-started.ipynb
 create mode 100644 images/openai-add-payment-method.png
 create mode 100644 images/openai-create-account.jpeg
 create mode 100644 images/openai-create-account.png
 create mode 100644 images/openai-get-started.png
 create mode 100644 images/openai-new-secret-key.png
 create mode 100644 images/openai-payment-method.png
 create mode 100644 images/pinecone-api-key-creation-dialog

In [24]:
!git branch -M main



In [6]:
!git remote set-url origin https://kokojnr:{githuhb_api}@github.com/kokojnr/Multimodal-LLMs-With-Langchain-OpenAI.git


In [8]:
!git push -u origin main


remote: Invalid username or token. Password authentication is not supported for Git operations.
fatal: Authentication failed for 'https://github.com/kokojnr/Multimodal-LLMs-With-Langchain-OpenAI.git/'
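
A credentialed remote URL is easy to get subtly wrong inside a shell command. One hedged option is to build it in Python first, percent-encoding the username and token, and then hand the finished string to git. This sketch assumes the token is a GitHub personal access token with permission to push; `build_remote_url` is a hypothetical helper and the token shown is a placeholder:

```python
from urllib.parse import quote


def build_remote_url(user: str, token: str, repo_path: str) -> str:
    """Build an HTTPS remote URL with percent-encoded credentials."""
    # Encoding keeps characters such as "@" or "/" in a credential from breaking the URL
    safe_user = quote(user, safe="")
    safe_token = quote(token, safe="")
    return f"https://{safe_user}:{safe_token}@github.com/{repo_path}"


# Placeholder token for illustration only; never hard-code a real one
example_url = build_remote_url(
    "kokojnr",
    "example-token",
    "kokojnr/Multimodal-LLMs-With-Langchain-OpenAI.git",
)
```

In a notebook, a variable holding the result (say `url`) can be passed to git with `!git remote set-url origin {url}`, since IPython expands `{url}` before running the command.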
