# Building Multimodal AI Applications with LangChain & the OpenAI API 

- Understanding the building blocks of working with Multimodal AI projects
- Working with some of the fundamental concepts of LangChain  
- How to use the Whisper API to transcribe audio to text 
- How to combine LangChain and the Whisper API to ask questions of any YouTube video

The project requires several packages that need to be installed into Workspace.

- `langchain` is a framework for developing generative AI applications.
- `yt_dlp` lets you download YouTube videos.
- `tiktoken` converts text into tokens.
- `docarray` makes it easier to work with multimodal data (in this case mixing audio and text).

In [1]:
import os 
import glob
import openai 
import yt_dlp as youtube_dl
from yt_dlp import DownloadError 
import docarray 

We will also read the environment variable `OPENAI_API_KEY` into the variable `openai_api_key`. This helps keep our key secure and removes the need to write it in the code here.

In [2]:
openai_api_key = os.getenv("OPENAI_API_KEY")
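
Note that `os.getenv` returns `None` when the variable is not set, so a missing key would only surface later, when the API is first called. A minimal sketch of an early check follows; the helper name `require_env` is hypothetical and not part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value


# Hypothetical usage: fail fast instead of discovering a missing key mid-run.
# openai_api_key = require_env("OPENAI_API_KEY")
```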

With the setup complete, the first step is to download the video from YouTube and convert it to an audio file (.mp3).

We'll download a video about five questions every data scientist should ask.

We will do this by setting one variable for the `youtube_url` and another for the `output_dir` where we want the file to be stored.

The `yt_dlp` package allows us to download and convert the video in a few steps, but it does require some configuration. This code is provided to you.

Lastly, we will search the `output_dir` for any .mp3 files and store them in a list called `audio_files`, which will be used later to send a file to the Whisper model for transcription.

In [10]:
# An example YouTube tutorial video
youtube_url = "https://www.youtube.com/watch?v=jGn95KDWZMU&list=WL&index=13"
# Directory to store the downloaded video
output_dir = "files/audio/"

# Config for yt-dlp (imported above as youtube_dl)
ydl_config = {
    "format": "bestaudio/best",
    "postprocessors": [
        {
            "key": "FFmpegExtractAudio",
            "preferredcodec": "mp3",
            "preferredquality": "192",
        }
    ],
    "outtmpl": os.path.join(output_dir, "%(title)s.%(ext)s"),
    "verbose": True
}

# Check if the output directory exists, if not create it
if not os.path.exists(output_dir): 
    os.makedirs(output_dir)


# Print a message indicating which video is being downloaded

print(f"Downloading video from {youtube_url}")


# Attempt to download the video using the specified configuration
# If a DownloadError occurs, attempt to download the video again

try: 
    with youtube_dl.YoutubeDL(ydl_config) as ydl: 
        ydl.download([youtube_url])
except DownloadError: 
    with youtube_dl.YoutubeDL(ydl_config) as ydl: 
        ydl.download([youtube_url])

[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out UTF-8 (No ANSI), error UTF-8 (No ANSI), screen UTF-8 (No ANSI)
[debug] yt-dlp version stable@2023.07.06 [b532a3481] (pip) API
[debug] params: {'format': 'bestaudio/best', 'postprocessors': [{'key': 'FFmpegExtractAudio', 'preferredcodec': 'mp3', 'preferredquality': '192'}], 'outtmpl': 'files/audio/%(title)s.%(ext)s', 'verbose': True, 'compat_opts': set()}
[debug] Python 3.12.1 (CPython x86_64 64bit) - Linux-6.7.0-204.fsync.fc39.x86_64-x86_64-with-glibc2.38 (OpenSSL 3.2.1 30 Jan 2024, glibc 2.38)
[debug] exe versions: ffmpeg 6.0.1 (fdk,setts), ffprobe 6.0.1
[debug] Optional libraries: Cryptodome-3.20.0, brotli-1.1.0, certifi-2023.11.17, mutagen-1.47.0, sqlite3-2.6.0, websockets-12.0
[debug] Proxy map: {}
[debug] Loaded 1855 extractors


Downloading video from https://www.youtube.com/watch?v=jGn95KDWZMU&list=WL&index=13
[youtube:tab] Extracting URL: https://www.youtube.com/watch?v=jGn95KDWZMU&list=WL&index=13
[youtube:tab] Downloading playlist WL - add --no-playlist to download just the video jGn95KDWZMU
[youtube:tab] WL: Downloading webpage




[youtube] Extracting URL: https://www.youtube.com/watch?v=jGn95KDWZMU
[youtube] jGn95KDWZMU: Downloading webpage
[youtube] jGn95KDWZMU: Downloading ios player API JSON
[youtube] jGn95KDWZMU: Downloading android player API JSON
[youtube] jGn95KDWZMU: Downloading player a1d7d0f8


[debug] Saving youtube-nsig.a1d7d0f8 to cache
[debug] [youtube] Decrypted nsig 1J19f-KCYbVpaBQ5G => m6EB7vAtsB5fAg


[youtube] jGn95KDWZMU: Downloading m3u8 information


[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id


[info] jGn95KDWZMU: Downloading 1 format(s): 251


[debug] Invoking http downloader on "https://rr1---sn-cu-cime7.googlevideo.com/videoplayback?expire=1706996134&ei=Rl2-ZemiDYughcIP0ve4qAE&ip=95.149.40.138&id=o-AEkIywv3dR5VSbATNqlr1FMPrPOVGg2wqaxDGQjQWkZy&itag=251&source=youtube&requiressl=yes&xpc=EgVo2aDSNQ%3D%3D&mh=_s&mm=31%2C29&mn=sn-cu-cime7%2Csn-cu-c9i6&ms=au%2Crdu&mv=m&mvi=1&pl=25&initcwndbps=1687500&vprv=1&svpuc=1&mime=audio%2Fwebm&gir=yes&clen=7581236&dur=477.941&lmt=1705032951893615&mt=1706973920&fvip=5&keepalive=yes&fexp=24007246&c=ANDROID&txp=5318224&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cxpc%2Cvprv%2Csvpuc%2Cmime%2Cgir%2Cclen%2Cdur%2Clmt&sig=AJfQdSswRgIhANuTSy8UUDNl_ZoXAWJs1s-copsAoVy-9lbVF5mjtGkbAiEA5U_rf66WfG8wnnNtQJrSoFAwj5Hi4wrHLx88EB6JJ54%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AAO5W4owRgIhAONbxHSDrIqL5xJ8edPpFPEVDzAd6JJbKXQsGV_qeGSNAiEAoe1QL6puiE9NDZ4bk7xxlHWaemo7Ssw-WKawcJ3G-qA%3D"


[download] Destination: files/audio/5 Questions Every Data Scientist Should Hardcode into Their Brain.webm
[download] 100% of    7.23MiB in 00:00:00 at 7.25MiB/s   


[debug] ffmpeg command line: ffprobe -show_streams 'file:files/audio/5 Questions Every Data Scientist Should Hardcode into Their Brain.webm'


[ExtractAudio] Destination: files/audio/5 Questions Every Data Scientist Should Hardcode into Their Brain.mp3


[debug] ffmpeg command line: ffmpeg -y -loglevel repeat+info -i 'file:files/audio/5 Questions Every Data Scientist Should Hardcode into Their Brain.webm' -vn -acodec libmp3lame -b:a 192.0k -movflags +faststart 'file:files/audio/5 Questions Every Data Scientist Should Hardcode into Their Brain.mp3'


Deleting original file files/audio/5 Questions Every Data Scientist Should Hardcode into Their Brain.webm (pass -k to keep)
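
The log above shows `yt_dlp` first treating the URL as a playlist ("Downloading playlist WL") before falling back to the single video, because the URL carries the `list=WL&index=13` parameters. One way to avoid that detour is to strip those parameters before downloading. The sketch below uses only the standard library; the helper name `strip_playlist_params` is hypothetical. Setting `"noplaylist": True` in the `yt_dlp` config is another option worth checking against the `yt_dlp` documentation.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_playlist_params(url, drop=("list", "index")):
    """Return `url` without the query parameters named in `drop`."""
    parts = urlsplit(url)
    # Keep every query parameter except the playlist-related ones.
    kept = [(key, value) for key, value in parse_qsl(parts.query) if key not in drop]
    return urlunsplit(parts._replace(query=urlencode(kept)))


clean_url = strip_playlist_params(
    "https://www.youtube.com/watch?v=jGn95KDWZMU&list=WL&index=13"
)
print(clean_url)  # https://www.youtube.com/watch?v=jGn95KDWZMU
```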


To find the audio files, we will use the `glob` module, which looks in the `output_dir` for any .mp3 files and returns them as a list called `audio_files`. We then select the first file in that list, which will be sent to the Whisper model for transcription.

In [11]:
# Find all the audio files in the output directory
audio_files = glob.glob(os.path.join(output_dir, "*.mp3"))

# Select the first audio file in the list
audio_filename = audio_files[0]

# Print the name of the selected audio file
print(audio_filename)

files/audio/5 Questions Every Data Scientist Should Hardcode into Their Brain.mp3
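
Two details of this step are easy to miss: `glob.glob` returns matches in arbitrary order, and indexing `audio_files[0]` raises an `IndexError` if the download produced no .mp3 file. The sketch below shows one way to make the lookup deterministic and to see the empty case. The helper name `find_audio_files` is hypothetical, and a temporary directory stands in for `output_dir` so the example runs anywhere.

```python
import glob
import os
import tempfile


def find_audio_files(directory: str, extension: str = "mp3") -> list[str]:
    """Return a sorted list of files in `directory` with the given extension."""
    pattern = os.path.join(directory, f"*.{extension}")
    return sorted(glob.glob(pattern))


# Demonstrate on a throwaway directory with two audio files and one other file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.mp3", "a.mp3", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(path) for path in find_audio_files(tmp)]
    print(found)  # ['a.mp3', 'b.mp3']

    # A directory that does not exist simply yields an empty list.
    empty = find_audio_files(os.path.join(tmp, "missing"))
    print(empty)  # []
```

Checking `if not audio_files:` before indexing gives a clearer error message than the bare `IndexError`.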


In [12]:
# Define the transcription parameters
audio_file = audio_filename
output_file = "files/transcripts/transcript.txt"
model = "whisper-1"

# Print the name of the audio file
print(audio_file)

# Transcribe the audio file to text using OpenAI API
print("converting audio to text...")

with open(audio_file, "rb") as audio:
    response = openai.Audio.transcribe(model, audio)

# Extract the transcript from the response
transcript = response["text"]

files/audio/5 Questions Every Data Scientist Should Hardcode into Their Brain.mp3
converting audio to text...
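
One practical detail when sending audio to the API is file size. The sketch below checks a file against an upload limit before transcription. The 25 MB figure is an assumption based on the limit OpenAI documented for the Whisper API at the time of writing, so verify it against the current documentation. The helper name `fits_upload_limit` is hypothetical, and a small temporary file stands in for the real .mp3.

```python
import os
import tempfile

# Assumption: 25 MB upload limit, as documented for the Whisper API at the
# time of writing. Check the current OpenAI documentation before relying on it.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def fits_upload_limit(path: str, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit


# Create a 1 KiB placeholder file to exercise the check.
with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp:
    tmp.write(b"\0" * 1024)
    sample_path = tmp.name

small_ok = fits_upload_limit(sample_path)
fits_tiny_limit = fits_upload_limit(sample_path, limit=512)
os.remove(sample_path)
print(small_ok, fits_tiny_limit)  # True False
```

Files that exceed the limit would need to be split or re-encoded at a lower bitrate before transcription.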


To save the transcript to a text file, we will use the code provided below:

In [13]:
# If an output file is specified, save the transcript to a .txt file

if output_file is not None:
    # Create the directory for the output file if it doesn't exist
    os.makedirs(os.path.dirname(output_file), exist_ok=True)
    # Write the transcript to the output file
    with open(output_file, "w") as file:
        file.write(transcript)

# Print the transcript to the console to verify it worked 
print(transcript)

Data science is more than just building fancy machine learning models. When you boil it down, the key objective of data science is to solve problems. The trouble, however, is at the outset of most data science projects, we rarely have a well-defined problem. In these situations, the role of a data scientist isn't to have all the answers, but rather to ask the right questions. In this video, I'll share five questions that every data scientist should hard-code into their brain to make identifying and defining business problems second nature. And if you're new here, I'm Shah. I make content about data science and entrepreneurship. And if you enjoyed this video, please consider subscribing. That's a great no-cost way to support me in all the content that I make. Before diving into the questions, I wanna give some context for where they are coming from. Like many others, when I started my data science journey, I was hyper-focused on learning tools and technologies. While this technical foun

In [18]:
# Import the TextLoader class from the langchain.document_loaders module
from langchain.document_loaders import TextLoader

# Create a new instance of the TextLoader class, specifying the path to the transcript file
loader = TextLoader("./files/transcripts/transcript.txt")

# Load the transcript as a list of Document objects using the TextLoader instance
docs = loader.load()

In [19]:
# Show the first element of docs to verify it has been loaded 
docs[0]

Document(page_content="Data science is more than just building fancy machine learning models. When you boil it down, the key objective of data science is to solve problems. The trouble, however, is at the outset of most data science projects, we rarely have a well-defined problem. In these situations, the role of a data scientist isn't to have all the answers, but rather to ask the right questions. In this video, I'll share five questions that every data scientist should hard-code into their brain to make identifying and defining business problems second nature. And if you're new here, I'm Shah. I make content about data science and entrepreneurship. And if you enjoyed this video, please consider subscribing. That's a great no-cost way to support me in all the content that I make. Before diving into the questions, I wanna give some context for where they are coming from. Like many others, when I started my data science journey, I was hyper-focused on learning tools and technologies. Wh

Now that we have created a Document from the transcription, we will store that Document in a vector store. A vector store holds an embedding (a numeric vector) for each piece of text, which makes it possible to find the text most similar to a query based on the distance between their vectors.
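
To make "similarity based on distance" concrete, here is a minimal sketch using cosine similarity, one common measure for comparing embeddings. The vectors and chunk names are made up for illustration; real embeddings have far more dimensions, and this is not the code the vector store itself runs.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors `a` and `b`."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings" for a query and three text chunks.
query_vec = [1.0, 0.0, 1.0]
chunks = {
    "chunk_a": [1.0, 0.0, 1.0],  # same direction as the query
    "chunk_b": [0.0, 1.0, 0.0],  # orthogonal to the query
    "chunk_c": [1.0, 1.0, 0.0],  # partially aligned with the query
}

# Rank chunks from most to least similar to the query.
ranked = sorted(
    chunks,
    key=lambda name: cosine_similarity(query_vec, chunks[name]),
    reverse=True,
)
print(ranked)  # ['chunk_a', 'chunk_c', 'chunk_b']
```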

For large amounts of data, it is best to use a dedicated vector database. Since we are only using one transcript for this tutorial, we can create an in-memory vector store using the `docarray` package.

We will also tokenize our queries using the `tiktoken` package. This means that each query is separated into smaller parts, such as phrases, words, or characters. Each of these parts is assigned a token, which helps the model "understand" the text and its relationships with other tokens.
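
The idea of splitting text into parts and assigning each part an ID can be shown with a toy example. This is a deliberate simplification: `tiktoken` uses byte-pair encoding with a fixed, pre-trained vocabulary, whereas the hypothetical helpers below (`toy_tokenize`, `build_vocab`) split on words and punctuation and number the parts in order of first appearance.

```python
import re


def toy_tokenize(text):
    """Split text into lowercase words and individual punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def build_vocab(tokens):
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab = {}
    for token in tokens:
        vocab.setdefault(token, len(vocab))
    return vocab


pieces = toy_tokenize("What is this video about?")
vocab = build_vocab(pieces)
ids = [vocab[piece] for piece in pieces]
print(pieces)  # ['what', 'is', 'this', 'video', 'about', '?']
print(ids)     # [0, 1, 2, 3, 4, 5]
```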

In [20]:
# Import the tiktoken package
import tiktoken

We will now use LangChain to complete some important operations to create the question-and-answer experience. Let's import the following:

- Import `RetrievalQA` from `langchain.chains` - this chain first retrieves documents from an assigned retriever and then runs a QA chain to answer questions over those documents.
- Import `ChatOpenAI` from `langchain.chat_models` - this imports the ChatOpenAI model that we will use to query the data.
- Import `DocArrayInMemorySearch` from `langchain.vectorstores` - this gives the ability to search over the vector store we have created.
- Import `OpenAIEmbeddings` from `langchain.embeddings` - this will create embeddings for the data stored in the vector store.
- Import `display` and `Markdown` from `IPython.display` - this will create formatted responses to the queries.

In [21]:
# Import the RetrievalQA class from the langchain.chains module
from langchain.chains import RetrievalQA

# Import the ChatOpenAI class from the langchain.chat_models module
from langchain.chat_models import ChatOpenAI

# Import the DocArrayInMemorySearch class from the langchain.vectorstores module
from langchain.vectorstores import DocArrayInMemorySearch

# Import the OpenAIEmbeddings class from the langchain.embeddings module
from langchain.embeddings import OpenAIEmbeddings

Now we will create a vector store using `DocArrayInMemorySearch`, which will search through the embeddings created by `OpenAIEmbeddings`.

In [22]:
# Create a new DocArrayInMemorySearch instance from the specified documents and embeddings
db = DocArrayInMemorySearch.from_documents(
    docs, 
    OpenAIEmbeddings()
)

In [23]:
# Convert the DocArrayInMemorySearch instance to a retriever
retriever = db.as_retriever()

# Create a new ChatOpenAI instance with a temperature of 0.0
llm = ChatOpenAI(temperature = 0.0)

In [24]:
# Create a new RetrievalQA instance with the specified parameters
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm,            # The ChatOpenAI instance to use for generating responses
    chain_type="stuff", # The type of chain to use for the QA system
    retriever=retriever, # The retriever to use for retrieving relevant documents
    verbose=True        # Whether to print verbose output during retrieval and generation
)
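
The `"stuff"` chain type places all retrieved documents into a single prompt, together with the question, and sends that prompt to the model in one call. The sketch below illustrates the idea in plain Python. The function name `build_stuff_prompt` and the template wording are hypothetical and are not LangChain's actual prompt.

```python
def build_stuff_prompt(question, documents):
    """Concatenate all documents into one context block and append the question."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Made-up snippets standing in for the documents a retriever would return.
retrieved = [
    "Data science is about solving problems.",
    "Ask the right questions.",
]
prompt = build_stuff_prompt("What is this video about?", retrieved)
print(prompt)
```

Because everything goes into one prompt, this approach suits short inputs such as a single transcript; very long inputs may not fit in the model's context window.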

Now we are ready to create queries about the YouTube video and read the responses from the LLM. This is done by first creating a query and then passing it to the RetrievalQA chain we set up in the last step.

In [25]:
# Set the query to be used for the QA system
query = "What is this video about?"

# Run the query through the RetrievalQA instance and store the response
response = qa_stuff.run(query)

# Print the response to the console
response

> Entering new RetrievalQA chain...

> Finished chain.


"This video is about the importance of asking the right questions in data science projects. The speaker shares five key questions that every data scientist should ask to identify and define business problems effectively. These questions help in understanding the client's needs, motivations, and desired outcomes, ultimately leading to successful data science projects."

We can continue creating queries, including ones that we know the video does not answer, to see how the model responds.

In [26]:
# Set the query to be used for the QA system
query = "What is the difference between a training set and test set?"

# Run the query through the RetrievalQA instance and store the response
response = qa_stuff.run(query)

# Print the response to the console
response

> Entering new RetrievalQA chain...

> Finished chain.


'The video does not provide information about the difference between a training set and a test set.'

In [27]:
# Set the query to be used for the QA system
query = "Who should watch this lesson?"

# Run the query through the RetrievalQA instance and store the response
response = qa_stuff.run(query)

# Print the response to the console
response

> Entering new RetrievalQA chain...

> Finished chain.


'This lesson is primarily targeted towards data scientists or individuals interested in data science. It provides valuable insights and questions that can help data scientists in identifying and defining business problems.'

In [28]:
# Set the query to be used for the QA system
query = "Who is the greatest football team on earth?"

# Run the query through the RetrievalQA instance and store the response
response = qa_stuff.run(query)

# Print the response to the console
response

> Entering new RetrievalQA chain...

> Finished chain.


"I don't know the answer to that question as it is subjective and can vary depending on personal opinions and criteria for determining greatness."

In [30]:
# Set the query to be used for the QA system
query = "What are the main points in this video?"

# Run the query through the RetrievalQA instance and store the response
response = qa_stuff.run(query)

# Print the response to the console
print(response)

> Entering new RetrievalQA chain...

> Finished chain.
The main points in this video are:

1. The key objective of data science is to solve problems, and data scientists should focus on asking the right questions.
2. It is important to gain a deep understanding of the business problem before writing any code.
3. The video presents five important questions that every data scientist should ask during problem discovery conversations with stakeholders and clients.
4. The questions include: What problem are you trying to solve? Why is this important to your business? What's your dream outcome? What have you tried so far? Why me?
5. The video emphasizes the importance of practicing and developing the intuition to naturally ask these questions during conversations.
6. The video concludes with the importance of listening more than talking and waiting until the end of the conversation to offer recommendations and next steps.
