Starting by loading the environment variables

In [25]:
import os

from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

YOUTUBE_VIDEO = "https://www.youtube.com/watch?v=5t1vTLU7s40"
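
`os.getenv` returns `None` when a variable is missing, so a missing key only surfaces later, at the first API call. A minimal sketch of an early check, assuming a hypothetical helper named `require_env` (it is not part of python-dotenv or LangChain):

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail early.

    `require_env` is a hypothetical helper used for illustration only.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set; check your .env file.")
    return value


# Demo with a throwaway variable so the sketch runs without a real key.
os.environ["DEMO_API_KEY"] = "demo-value"
print(require_env("DEMO_API_KEY"))
```

In the notebook, the same helper could wrap the lookup of `OPENAI_API_KEY` after `load_dotenv()` has run.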

Setting up the LLM that we'll use (gpt-3.5-turbo, accessed through ChatOpenAI)


In [7]:
from langchain_openai.chat_models import ChatOpenAI

model = ChatOpenAI(openai_api_key = OPENAI_API_KEY, model = "gpt-3.5-turbo")

Asking a random question to test if the model works

In [8]:
model.invoke("Who is the owner of OPEN AI?")

AIMessage(content='OpenAI is not owned by any individual or company in the traditional sense. It is a research organization that was founded as a non-profit in December 2015, later becoming a for-profit company in 2019. It is governed by its board of directors and leadership team, and its mission is to ensure that artificial general intelligence benefits all of humanity.', response_metadata={'token_usage': {'completion_tokens': 71, 'prompt_tokens': 15, 'total_tokens': 86}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-8979871e-7c70-4e6f-84c2-e4f6dfbc35b1-0', usage_metadata={'input_tokens': 15, 'output_tokens': 71, 'total_tokens': 86})

The result from the model is an AIMessage instance containing the answer. Using LangChain's output parsers, we can extract the answer as a plain string by chaining the model with a parser.

In [9]:
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

chain = model | parser
chain.invoke("Who is the owner of OPEN AI?") 

'OpenAI is owned by the OpenAI LP, which is structured as a non-profit organization. The founders of OpenAI include Elon Musk, Sam Altman, Greg Brockman, Ilya Sutskever, Wojciech Zaremba, and John Schulman.'
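
To make the `|` operator less magical, here is a toy sketch of how piping can be built with Python's `__or__` method. The `Step` class, `fake_model` and `fake_parser` are hypothetical stand-ins; this is not LangChain's actual implementation, only the general idea of feeding one step's output into the next.

```python
class Step:
    """Toy stand-in for a chainable component (not LangChain's Runnable)."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # `a | b` builds a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical stand-ins: a "model" that wraps text in a message-like dict
# and a "parser" that pulls the text back out.
fake_model = Step(lambda question: {"content": f"Echo: {question}"})
fake_parser = Step(lambda message: message["content"])

toy_chain = fake_model | fake_parser
print(toy_chain.invoke("hello"))  # prints "Echo: hello"
```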

Initialising the prompt template that we'll add to the chain.

In [16]:
from langchain.prompts import ChatPromptTemplate

template = """
Answer the question based on the context below.
If you can't answer, reply "Can't find the answer in the video"

Context : {context}

Question : {question}
"""

prompt = ChatPromptTemplate.from_template(template)
#prompt.format(context = "Mary's sister is Susan", question = "Who is Mary's Sister?")


'Human: \nAnswer the question based on the context below.\nIf you can\'t answer, reply "Can\'t find the answer in the video"\n\nContext : Mary\'s sister is ROshan\n\nQuestion : Who is Mary\'s Sister?\n'
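
The placeholders in the template are filled in much like Python's own `str.format`. The sketch below uses only plain string formatting to show the substitution step; `ChatPromptTemplate` additionally wraps the result in a human message, which is where the `Human:` prefix in the output above comes from.

```python
template = """
Answer the question based on the context below.
If you can't answer, reply "Can't find the answer in the video"

Context : {context}

Question : {question}
"""

# Plain-Python substitution of the two placeholders.
filled = template.format(
    context="Mary's sister is Susan",
    question="Who is Mary's Sister?",
)
print(filled)
```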

Adding the prompt to the chain and testing whether the model gives the correct answer for a sample context and question

In [23]:
chain = prompt | model | parser
chain.invoke({
    "context": "Mary's sister is Susan", 
    "question": "Who is Mary's Sister?"
})

'Susan'

Just to test the power of chains, we will create a new prompt which translates the output of the first chain to whichever language is specified

In [21]:
translation_prompt = ChatPromptTemplate.from_template(
    "Translate {answer} to {language}"
    )

Creating a translation chain: the answer from the original chain and the target language are passed to the translation prompt, and the result is the translated answer

In [22]:
from operator import itemgetter

translation_chain = (
    {"answer": chain, "language": itemgetter("language")} | translation_prompt | model | parser
)

translation_chain.invoke(
    {
            "context": "Mary's sister is Susan", 
            "question": "Who is Mary's Sister?",
            "language": "Hindi"
    }

)

'मेरी बहन सुजान है।'
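
The dictionary at the start of the translation chain applies each of its values to the same input and collects the results under the matching keys. A rough plain-Python sketch of that step, where `fake_answer_chain` is a hypothetical stand-in for the first chain (a real chain would call the model):

```python
from operator import itemgetter

inputs = {
    "context": "Mary's sister is Susan",
    "question": "Who is Mary's Sister?",
    "language": "Hindi",
}


def fake_answer_chain(data):
    # Hypothetical stand-in for the first chain; no model call is made here.
    return "Susan"


# Each value in the mapping is a callable applied to the same input dict.
mapping = {"answer": fake_answer_chain, "language": itemgetter("language")}
translation_inputs = {key: step(inputs) for key, step in mapping.items()}
print(translation_inputs)  # {'answer': 'Susan', 'language': 'Hindi'}
```

The resulting dictionary has exactly the keys that `translation_prompt` expects: `answer` and `language`.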

Back to the task at hand: transcribing the YouTube video using OpenAI's Whisper.


In [28]:
import tempfile
import whisper
from pytube import YouTube


# Let's do this only if we haven't created the transcription file yet.
if not os.path.exists("transcription.txt"):
    youtube = YouTube(YOUTUBE_VIDEO)
    audio = youtube.streams.filter(only_audio=True).first()

    # Let's load the base model. This is not the most accurate
    # model but it's fast.
    whisper_model = whisper.load_model("base")

    with tempfile.TemporaryDirectory() as tmpdir:
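        # Use a distinct name for the downloaded audio path so it is not
        # shadowed by the file handle opened below.
        audio_path = audio.download(output_path=tmpdir)
        transcription = whisper_model.transcribe(audio_path, fp16=False)["text"].strip()

        with open("transcription.txt", "w") as file:
            file.write(transcription)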

In [29]:
with open("transcription.txt") as file:
    transcription = file.read()

transcription[:100]

'I see the danger of this concentration of power to proprietary AI systems as a much bigger danger th'

If we pass the entire transcription as context to the chain, the request fails with an error (Error code: 400, "This model's maximum context length is 16385 tokens.").

Large language models support limited context sizes, and the transcription of this video is too long for the model to handle. To overcome this, we will split the text into smaller chunks and, when we invoke the model, send only the chunks relevant to the question.

In [30]:
try: 
    chain.invoke({
        "context": transcription,
        "question": "What is Yann LeCun's stance on the potential dangers of AGI"
    })
except Exception as e: 
    print(e)

Error code: 400 - {'error': {'message': "This model's maximum context length is 16385 tokens. However, your messages resulted in 30770 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
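
A quick size check before calling the model can catch this problem early. The sketch below is only a rough estimate: the ratio of about 4 characters per token is an assumed rule of thumb for English text, not the model's real tokenizer, and `estimate_tokens` and `fits_in_context` are hypothetical helper names.

```python
MAX_CONTEXT_TOKENS = 16385  # limit reported in the error message above


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude token estimate; the 4-characters-per-token ratio is an assumption."""
    return int(len(text) / chars_per_token)


def fits_in_context(text: str, limit: int = MAX_CONTEXT_TOKENS) -> bool:
    """Return True if the estimated token count is within the limit."""
    return estimate_tokens(text) <= limit


sample = "word " * 40000  # 200,000 characters of placeholder text
print(estimate_tokens(sample), fits_in_context(sample))
```

For an exact count, a real tokenizer would be needed; the estimate is only meant to flag inputs that are clearly too large.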


First we load the text

In [39]:
from langchain_community.document_loaders import TextLoader

loader = TextLoader("transcription.txt")
text_documents = loader.load()
text_documents


[Document(page_content="I see the danger of this concentration of power to proprietary AI systems as a much bigger danger than everything else. What works against this is people who think that for reasons of security, we should keep AI systems under lock and key because it's too dangerous to put it in the hands of everybody. That would lead to a very bad future in which all of our information diet is controlled by a small number of companies who proprietary systems. I believe that people are fundamentally good and so if AI, especially open source AI, can make them smarter and just empower the goodness in humans. So I share that feeling, okay, I think people are fundamentally good. And in fact a lot of domers are domers because they don't think that people are fundamentally good. The following is a conversation with Jan LeCoon. His third time on this podcast, he is the Chief AI scientist at Meta, Professor at NYU, Touring Award Winner, and one of the seminal figures in the history of ar

Splitting the transcription into chunks of 1000 characters with an overlap of 20 characters

In [45]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap = 20)
documents = text_splitter.split_documents(text_documents)
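
To see what chunk size and overlap mean, here is a simplified sketch of a fixed-width splitter. It is not how `RecursiveCharacterTextSplitter` works internally (that class also tries to split on separators such as paragraphs and sentences); `split_with_overlap` is a hypothetical function that only shows how consecutive chunks share characters.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Naive fixed-width splitter; each chunk repeats the tail of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


demo_chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(demo_chunks)  # ['abcd', 'defg', 'ghij']
```

The overlap keeps a little shared text between neighbouring chunks, so a sentence cut at a chunk boundary is less likely to lose its context entirely.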

Embedding the chunks and loading them into Pinecone. Pinecone then uses similarity search to return only the most relevant documents

In [63]:
import os
from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings

os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY
# Placeholder value: replace with your real Pinecone API key (or load it from .env).
os.environ['PINECONE_API_KEY'] = 'PINECONE_API_KEY'

index_name = "youtube-index"
embeddings = OpenAIEmbeddings()

pinecone = PineconeVectorStore.from_documents(
    documents, embeddings, index_name = index_name
)

We test similarity_search on the Pinecone vector store to check that it works

In [64]:
pinecone.similarity_search("What is Yann LeCun's stance on the potential dangers of AGI (Artificial General Intelligence)")[:3]

[Document(page_content="intelligence. He and Meta AI have been big proponents of open sourcing AI development and have been walking the walk by open sourcing many of their biggest models, including Lama II and eventually Lama III. Also, Jan has been an outspoken critic of those people in the AI community who warned about the looming danger and existential threat of AGI. He believes the AGI will be created one day, but it will be good. It will not escape human control, nor will it dominate and kill all humans. At this moment of rapid AI development, this happens to be somewhat a controversial position. So it's been fun seeing Jan get into a lot of intense and fascinating discussions online as we do in this very conversation. This is the Lexington podcast that supported, please check out our sponsors in the description. And now dear friends, here's Jan LeCoon. You've had some strong statements, technical statements about the future of artificial intelligence recently. Throughout your car
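
Conceptually, a similarity search compares the embedding of the query with the embedding of every stored chunk and returns the closest ones. A minimal sketch, assuming cosine similarity as the metric (the metric actually used depends on how the Pinecone index was configured) and using made-up 3-dimensional vectors instead of real embeddings:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, store, k=2):
    """Return the k texts whose vectors are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Made-up vectors for illustration; real embeddings have many more dimensions.
toy_store = [
    ("chunk about AGI risk", [0.9, 0.1, 0.0]),
    ("chunk about sponsors", [0.0, 0.2, 0.9]),
    ("chunk about open source AI", [0.7, 0.6, 0.1]),
]
print(top_k([1.0, 0.2, 0.0], toy_store))
```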

We can create a map with the two inputs by using the RunnableParallel and RunnablePassthrough classes. This will allow us to pass the context and question to the prompt as a map with the keys "context" and "question."

In [69]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

setup = RunnableParallel(
    context = pinecone.as_retriever(), question = RunnablePassthrough()
    )

chain = setup | prompt | model | parser
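
The setup step can be pictured as running two branches on the same question and collecting the results in a dictionary. The sketch below uses plain functions as hypothetical stand-ins: `fake_retriever` returns canned text instead of querying Pinecone, and `passthrough` mimics what `RunnablePassthrough` does with the question.

```python
def fake_retriever(question):
    # Hypothetical stand-in for pinecone.as_retriever(); returns canned chunks.
    return ["Mary's sister is Susan"]


def passthrough(value):
    # Returns its input unchanged, as RunnablePassthrough does for the question.
    return value


def run_setup(question):
    # Both branches receive the same input; results are collected under their keys.
    branches = {"context": fake_retriever, "question": passthrough}
    return {key: branch(question) for key, branch in branches.items()}


print(run_setup("Who is Mary's sister?"))
```

The dictionary produced this way has the `context` and `question` keys that the prompt template needs.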

In [72]:
chain.invoke("Who is Tanmay?")

"Can't find the answer in the video"