# Building a Multimodal AI Application with LangChain and the OpenAI API
## Contents
1. Goals
2. Setting up
3. Download the YouTube Video
4. Transcribe the video using Whisper
5. Creating an In-Memory Vector Store
6. Create the Document Search
7. Create the Queries

# 1. Goals

- Transcribe the content of a YouTube video with AI, converting speech to text with Whisper
- Then use GPT to ask questions about the video's content
- Practice with the LangChain library

# 2. Setting up

In [None]:
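# openai is imported below; the openai.Audio.transcribe call used later needs a pre-1.0 release
%pip install langchain "openai<1" yt_dlp tiktoken docarray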

In [None]:
import os
import glob

import openai
import yt_dlp as youtube_dl
from yt_dlp import DownloadError
import docarray

In [None]:
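# read the API key from the environment and pass it to the openai client
openai_api_key = os.getenv("OPENAI_API_KEY")
openai.api_key = openai_api_key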

# 3. Download the YouTube Video

- After setup, we download a YouTube video and convert it to an audio file (.mp3)


In [None]:
youtube_url = "https://www.youtube.com/watch?v=M53H-zwHNxs"
output_dir = "./files_audio"

# config for yt-dlp (imported above as youtube_dl)
ydl_config = {
    "format": "bestaudio/best",
    "postprocessors": [{
        "key": "FFmpegExtractAudio",
        "preferredcodec": "mp3",
        "preferredquality": "192",
    }],
    "outtmpl": os.path.join(output_dir, "%(title)s.%(ext)s"),
    "verbose": True,
}

# create the output directory if it does not exist yet
os.makedirs(output_dir, exist_ok=True)

print(f"Downloading video from {youtube_url}")

# download the audio from YouTube; if a download error occurs, report it
try:
    with youtube_dl.YoutubeDL(ydl_config) as ydl:
        ydl.download([youtube_url])
except DownloadError as e:
    print(f"Download failed: {e}")


In [None]:
# collect the downloaded .mp3 files and use the first match
audio_files = glob.glob(os.path.join(output_dir, "*.mp3"))
audio_file_name = audio_files[0]
print(audio_file_name)
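
Indexing the first glob match raises an `IndexError` when the download failed, and picks an arbitrary file when the directory holds several. A minimal sketch of a more defensive lookup follows, assuming the most recently modified `.mp3` is the one wanted. `find_latest_audio` is a hypothetical helper name, not part of yt-dlp.

```python
import glob
import os


def find_latest_audio(directory, extension="mp3"):
    """Return the most recently modified file with `extension` in `directory`."""
    matches = glob.glob(os.path.join(directory, f"*.{extension}"))
    if not matches:
        # Fail with a clear message instead of an IndexError further down
        raise FileNotFoundError(f"No .{extension} files found in {directory!r}")
    # Assumption: the newest file is the one that was just downloaded
    return max(matches, key=os.path.getmtime)
```

With this helper, the cell above could be written as `audio_file_name = find_latest_audio(output_dir)`.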

# 4. Transcribe the video using Whisper

In [None]:
audio_file = audio_file_name
output_file = "files/transcripts/transcript.txt"
model = "whisper-1"

# Transcribe the audio file to text using OpenAI API
print("Converting audio to text...")

with open(audio_file, "rb") as audio:
    response = openai.Audio.transcribe(model, audio)

# Extract the transcript from the response
transcript = response["text"]
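
Long videos can produce audio files that the transcription endpoint refuses. The sketch below checks the file size before uploading, assuming an upload limit of about 25 MB; that limit is an assumption to verify against the current OpenAI documentation. `check_upload_size` and `MAX_UPLOAD_BYTES` are hypothetical names introduced only for this illustration.

```python
import os

# Assumption: the transcription endpoint rejects uploads above roughly 25 MB
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def check_upload_size(path, limit=MAX_UPLOAD_BYTES):
    """Return the file size in bytes, or raise ValueError if it exceeds `limit`."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(
            f"{path} is {size / 1_048_576:.1f} MB, "
            f"above the {limit / 1_048_576:.0f} MB limit"
        )
    return size
```

Calling `check_upload_size(audio_file)` before opening the file surfaces the problem locally rather than as an API error. Files that are too large would need to be split or re-encoded at a lower bitrate.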

In [None]:
# save the transcript to disk so it can be loaded as a document later
if output_file is not None:
    os.makedirs(os.path.dirname(output_file), exist_ok=True)
    with open(output_file, "w", encoding="utf-8") as file:
        file.write(transcript)

print(transcript)

In [None]:
from langchain.document_loaders import TextLoader

# load the transcript from the same path it was written to
loader = TextLoader(output_file)

docs = loader.load()

In [None]:
docs[0]

# 5. Creating an In-Memory Vector Store

In [None]:
import tiktoken
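
Conceptually, an in-memory vector store keeps each text next to its embedding vector and answers a query by returning the texts whose vectors are closest to the query vector. The toy sketch below illustrates that idea in plain Python with hand-written two-dimensional vectors. `TinyVectorStore` is a hypothetical class, not how `DocArrayInMemorySearch` is implemented, and real embeddings would come from an embedding model rather than being typed in by hand.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical toy store: keeps (vector, text) pairs in a Python list."""

    def __init__(self):
        self._items = []

    def add(self, vector, text):
        self._items.append((list(vector), text))

    def search(self, query_vector, k=1):
        """Return the k texts whose vectors are most similar to the query."""
        ranked = sorted(
            self._items,
            key=lambda item: cosine_similarity(query_vector, item[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]


store = TinyVectorStore()
store.add([1.0, 0.0], "segment about downloading audio")
store.add([0.0, 1.0], "segment about transcription")
print(store.search([0.9, 0.1], k=1))  # ['segment about downloading audio']
```

A linear scan like this is fine for a single transcript; larger collections usually rely on an index built for approximate nearest-neighbour search.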

# 6. Create the Document Search

In [None]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.vectorstores import DocArrayInMemorySearch
from langchain.embeddings import OpenAIEmbeddings

In [None]:
# embed the transcript with OpenAI embeddings and index it in an in-memory vector store
db = DocArrayInMemorySearch.from_documents(docs, OpenAIEmbeddings())

In [None]:
# convert the DocArrayInMemorySearch instance to a retriever
retriever = db.as_retriever()

# create a new ChatOpenAI instance with a temperature of 0.0
llm = ChatOpenAI(temperature=0.0)

In [None]:
# create a new RetrievalQA instance with the specified parameters
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    verbose=True,
)
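
The `"stuff"` chain type places all retrieved documents into a single prompt and sends that prompt to the model in one call. The sketch below shows the idea in plain Python. `build_stuff_prompt` and the prompt wording are hypothetical and are not LangChain's actual template.

```python
def build_stuff_prompt(question, documents):
    """Concatenate all documents into one context block followed by the question."""
    # Every retrieved document is "stuffed" into the same prompt
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_stuff_prompt(
    "What is this tutorial about?",
    ["First transcript chunk.", "Second transcript chunk."],
)
print(prompt)
```

Because everything goes into one prompt, this approach only works while the combined documents fit within the model's context window.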

# 7. Create the Queries

To ask the model questions, complete the following steps:
- Create a variable called query and assign it the string value "What is this tutorial about?"
- Create a response variable that stores the result of qa_stuff.run(query)
- Show the response

In [None]:
# Set the query to be used for the QA system
query = "What is this tutorial about?"

# Run the query through the RetrievalQA instance and store the response
response = qa_stuff.run(query)

# Display the response
response

In [None]:
# Set the query to be used for the QA system
query = "Any question you want to ask"

# Run the query through the RetrievalQA instance and store the response
response = qa_stuff.run(query)

# Display the response
response

All done!