### Requirements:

In [1]:
# !pip install openai
# !pip install langchain
# !pip install langchain-openai
# !pip install langchain_pinecone
# !pip install langchain[docarray]
# !pip install docarray
# !pip install pydantic==1.10.8
# !pip install pytube 
# !pip install python-dotenv
# !pip install tiktoken 
# !pip install pinecone-client 
# !pip install scikit-learn
# !pip install ruff
# !pip install langchain-ollama
# !pip install -U langchain langchain-core
# !pip install -U openai-whisper
# !pip install --upgrade pytube
# !pip install yt-dlp
# !pip install langchain-community
# !pip install --upgrade langchain-community docarray

Load environment variables and pick a video link whose transcript or captions will be loaded.
This transcript acts as the source for building the vector DB.

In [2]:
import os
from dotenv import load_dotenv
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
YOUTUBE_VIDEO = "https://www.youtube.com/watch?v=oVtlp72f9NQ"

In [4]:
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
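Both keys are read with `os.getenv`, which silently returns `None` when a variable is missing, so a typo in `.env` only surfaces later as an authentication error from the API. A minimal fail-fast sketch; `require_env` is a hypothetical helper, not part of `python-dotenv`:

```python
import os


def require_env(name: str) -> str:
    """Return the environment variable `name`, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        # Missing or empty: stop here instead of failing at the first API call.
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

After `load_dotenv()`, it would be used as `OPENAI_API_KEY = require_env("OPENAI_API_KEY")`.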

Load models with LangChain

OpenAI: GPT-3.5 model for testing

In [5]:
from langchain_openai.chat_models import ChatOpenAI
openai_model = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model="gpt-3.5-turbo")


In [3]:
openai_model.invoke("Can Retrieval Augmented Generation solve hallucination problems?")

AIMessage(content='Retrieval Augmented Generation (RAG) is a machine learning model that combines the benefits of retrieval-based and generation-based models. While RAG can improve the quality of generated text by incorporating information from external knowledge sources, it is not specifically designed to address hallucination problems.\n\nHallucination in the context of natural language processing refers to the generation of text that is not grounded in reality or is not supported by the input data. While RAG can improve the coherence and relevance of generated text by incorporating external knowledge during the generation process, it may still produce hallucinations if the external knowledge is inaccurate or misleading.\n\nTherefore, while RAG may help reduce hallucination to some extent by improving the quality of generated text, it is not a guaranteed solution to hallucination problems. Researchers are actively working on developing new techniques and models to address hallucinati

Ollama model: Llama 3

#### Before running this, Llama 3 must be available locally through Ollama (for example, run `ollama pull llama3` with the Ollama server running), whereas the OpenAI model is accessed through an API.

In [18]:
from langchain_ollama import ChatOllama

llama_model = ChatOllama(
    model="llama3",
    temperature=0.8,
    num_predict=256
)
    

In [None]:
llama_model.invoke("Can RAG solve hallucination problems?")

Use an output parser to extract only the string content from the AIMessage object the model returns.

In [6]:
from langchain_core.output_parsers import StrOutputParser
parser = StrOutputParser()

chain = openai_model | parser  # pipes the model's output into the parser; `chain` is a new Runnable, not a string
# equivalent:
# output = openai_model.invoke(input)  # returns an AIMessage
# parsed_output = parser.invoke(output)  # extracts the AIMessage's content as a str
chain.invoke("Can Retrieval Augmented Generation solve hallucination problems?")


'Retrieval Augmented Generation (RAG) is a natural language processing model that combines the benefits of retrieval-based and generation-based approaches to improve text generation tasks. While RAG can help improve the quality and coherence of generated text, it is unlikely to directly solve hallucination problems.\n\nHallucination in language generation refers to the generation of text that is not grounded in the input context or is factually incorrect. While RAG can improve the relevance of generated text by retrieving relevant information from a knowledge base, it may still struggle with generating text that is accurate and coherent.\n\nTo address hallucination problems in text generation, additional techniques such as fact checking, knowledge verification, and context-aware generation strategies may be necessary. While RAG can be a valuable tool in improving text generation, it is not a complete solution to hallucination problems.'
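The `|` operator composes runnables left to right: each step's `invoke` output becomes the next step's input, and the composed object is itself a runnable. The toy classes below are hypothetical stand-ins, not LangChain's implementation; they mimic `model | parser` with a fake model that returns a message object:

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Step:
    """Toy runnable: wraps a function and supports `|` composition."""
    fn: Callable[[Any], Any]

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: "Step") -> "Step":
        # The composed step runs self first, then feeds its output to other.
        return Step(lambda value: other.invoke(self.invoke(value)))


@dataclass
class FakeMessage:
    """Stand-in for an AIMessage: the text lives in `.content`."""
    content: str


fake_model = Step(lambda prompt: FakeMessage(content=f"Answer to: {prompt}"))
str_parser = Step(lambda message: message.content)  # plays the role of StrOutputParser

toy_chain = fake_model | str_parser
print(toy_chain.invoke("Can RAG solve hallucination problems?"))
```

Because the composed `Step` is itself composable, longer pipelines such as `prompt | model | parser` work the same way.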

Create prompt template

In [7]:
from langchain.prompts import ChatPromptTemplate
template = """
Answer the question based on the context below. If you can't 
answer the question, reply "I don't know".

Context: {context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
prompt.format(context="Lewis Hamilton is eight time formula 1 world champion", question="How many times has Lewis won Formula 1 championship?")


'Human: \nAnswer the question based on the context below. If you can\'t \nanswer the question, reply "I don\'t know".\n\nContext: Lewis Hamilton is eight time formula 1 world champion\n\nQuestion: How many times has Lewis won Formula 1 championship?\n'

In [8]:
type(prompt)

langchain_core.prompts.chat.ChatPromptTemplate

The prompt now expects two parameters: context and question. Chain the prompt with the model, and feed the model's output into the parser.
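`ChatPromptTemplate.from_template` uses `str.format`-style `{placeholders}` by default, and the placeholders it finds become the template's required inputs (exposed as `prompt.input_variables`). A sketch of how those names can be discovered with the standard library's `string.Formatter`; `placeholder_names` is a hypothetical helper for illustration:

```python
from string import Formatter

template = """
Answer the question based on the context below. If you can't
answer the question, reply "I don't know".

Context: {context}

Question: {question}
"""


def placeholder_names(text: str) -> list[str]:
    """Return the {field} names in a format string, in order, without duplicates."""
    names = (field for _, field, _, _ in Formatter().parse(text) if field)
    return list(dict.fromkeys(names))


print(placeholder_names(template))  # ['context', 'question']

# Filling both placeholders produces the text that is sent to the model.
filled = template.format(
    context="Lewis Hamilton is eight time formula 1 world champion",
    question="How many times has Lewis won Formula 1 championship?",
)
```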

In [9]:
# the prompt's output becomes the model's input, and the model's output becomes the parser's input; `chain` holds the composed Runnable
chain = prompt | openai_model | parser
chain.invoke({
    "context": "Lewis Hamilton is eight time formula 1 world champion",
    "question": "How many times has Lewis won Formula 1 championship?"
})

'Lewis Hamilton has won the Formula 1 championship eight times.'

Combine a second prompt (translation) with the existing chain:

(context, question)->model->parser->(parser_output, language)->model->parser

In [10]:
# new prompt chain
translation_prompt = ChatPromptTemplate.from_template(
    "Translate {answer} to {language}"
)
# here answer will be the output of the previous chain and language will be the language to translate to given as input to the new chain

In [11]:
from operator import itemgetter
translation_chain = (
    {"answer": chain, "language": itemgetter("language")} | translation_prompt | openai_model | parser
)
# itemgetter("language") reads the "language" key from the dict passed to translation_chain.invoke()

translation_chain.invoke(
    {
        "context": "Lewis is a formula 1 driver. Lewis Hamilton is eight time formula 1 world champion",
        "question": "How many times has Lewis won Formula 1 championship?",
        "language": "Tamil",
    }
)

'லூயிஸ் ஹேமில்டன் ஃபார்முலா 1 சாதனை எட்டு முறைகளில் வெற்றி பெற்றுள்ளார்.'
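When a plain dict starts a LangChain pipeline, it is treated as a parallel step: every value receives the same input dict, and the results are collected under the same keys. That is why `itemgetter("language")` reads its key from the dict passed to `translation_chain.invoke`, while `chain` receives the whole dict and uses `context` and `question`. A pure-Python sketch of that fan-out, with a hypothetical stand-in function in place of the real chain (no model calls):

```python
from operator import itemgetter
from typing import Any, Callable


def run_parallel(steps: dict[str, Callable[[dict], Any]], inputs: dict) -> dict:
    """Apply every step to the same input and collect the results by key."""
    return {key: step(inputs) for key, step in steps.items()}


def fake_answer_chain(inputs: dict) -> str:
    # Stand-in for `prompt | openai_model | parser`; it ignores "language".
    return f"Answer to '{inputs['question']}'"


request = {
    "context": "Lewis Hamilton is eight time formula 1 world champion",
    "question": "How many times has Lewis won Formula 1 championship?",
    "language": "Tamil",
}
translation_inputs = run_parallel(
    {"answer": fake_answer_chain, "language": itemgetter("language")}, request
)
# This dict is what fills {answer} and {language} in translation_prompt.
print(translation_inputs)
```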

Transcribe the audio of a test YouTube video using OpenAI's Whisper model

In [12]:
import os
import subprocess
import whisper

YOUTUBE_VIDEO = "https://www.youtube.com/watch?v=u47GtXwePms"

if not os.path.exists("transcription.txt"):
    # Download audio using yt-dlp
    subprocess.run(
        ["yt-dlp", "-x", "--audio-format", "mp3", "-o", "downloaded_audio.%(ext)s", YOUTUBE_VIDEO],
        check=True,  # raise CalledProcessError if the download fails
    )

    # Transcribe with Whisper
    whisper_model = whisper.load_model("base")
    transcription = whisper_model.transcribe("downloaded_audio.mp3", fp16=False)["text"].strip()

    with open("transcription.txt", "w") as f:
        f.write(transcription)


In [13]:
with open("transcription.txt") as file:
    transcription = file.read()

transcription[:100]

"I think it's possible that physics has exploits and we should be trying to find them. arranging some"

Pass the entire transcript in as context:

In [14]:
try:
    chain.invoke({
        "context": transcription,
        "question": "Is RAG a good idea?"
    })
except Exception as e:
    print(e)

Error code: 400 - {'error': {'message': "This model's maximum context length is 16385 tokens. However, your messages resulted in 47047 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}


If the transcript is large, invoking the model directly fails: the request exceeds the model's maximum context length (here 47,047 tokens against a 16,385-token limit for gpt-3.5-turbo).
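The transcript is roughly three times too large for a single `gpt-3.5-turbo` call. Exact counts come from a tokenizer such as `tiktoken` (installed above); for a quick pre-check, the sketch below uses OpenAI's rule of thumb of about 4 characters per English token. The heuristic and the `reserved_for_answer` budget are assumptions, not exact values:

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; real counts depend on the model's tokenizer."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(text: str, context_limit: int = 16_385,
                    reserved_for_answer: int = 500) -> bool:
    """True if the estimated prompt still leaves room for the model's answer."""
    return estimate_tokens(text) + reserved_for_answer <= context_limit


# Minimum number of context-sized pieces the 47,047-token request would need.
print(math.ceil(47_047 / 16_385))  # 3
```

For an exact count, `len(tiktoken.encoding_for_model("gpt-3.5-turbo").encode(text))` can replace the estimate.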

Load the transcript with TextLoader so the full text can be split into smaller documents

In [15]:
from langchain_community.document_loaders import TextLoader

loader = TextLoader("transcription.txt")
text_documents = loader.load()
type(text_documents)

list

Use a text splitter to split the loaded transcript into chunks

In [16]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)  # consecutive chunks can share up to 20 characters
text_splitter.split_documents(text_documents)[:5]

[Document(metadata={'source': 'transcription.txt'}, page_content="I think it's possible that physics has exploits and we should be trying to find them. arranging some"),
 Document(metadata={'source': 'transcription.txt'}, page_content='arranging some kind of a crazy quantum mechanical system that somehow gives you buffer overflow,'),
 Document(metadata={'source': 'transcription.txt'}, page_content='buffer overflow, somehow gives you a rounding error in the floating point. Synthetic intelligences'),
 Document(metadata={'source': 'transcription.txt'}, page_content="intelligences are kind of like the next stage of development. And I don't know where it leads to."),
 Document(metadata={'source': 'transcription.txt'}, page_content='where it leads to. Like at some point, I suspect the universe is some kind of a puzzle. These')]
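`RecursiveCharacterTextSplitter` tries natural boundaries first (by default paragraphs, then lines, then spaces) and only cuts mid-word as a last resort. That is why the chunks above end on word boundaries and repeat a few words (`arranging some`, `buffer overflow`) from the previous chunk. The simpler fixed-window version below ignores separators and only illustrates what `chunk_size` and `chunk_overlap` mean; `split_with_overlap` is a hypothetical helper, not LangChain's algorithm:

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters; consecutive windows
    share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```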

In [17]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
documents = text_splitter.split_documents(text_documents)

Now we have separate chunks of documents:

    Challenge: Based on the user's question, we need to decide which chunks to send to the model.
    
    Solution: Embed every chunk and the question into the same vector space, then use a similarity search to find the chunks most relevant to the question (vector embedding similarity search).
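The retrieval step amounts to ranking every chunk by how similar its embedding is to the question's embedding and keeping the best few. A sketch with tiny hand-made 3-dimensional vectors standing in for the 1536-dimensional OpenAI embeddings; the vectors and chunk texts are made up for illustration:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top_k(query_vec: list[float], chunks: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k chunk texts whose vectors are most similar to the query."""
    ranked = sorted(chunks, key=lambda text: cosine(query_vec, chunks[text]), reverse=True)
    return ranked[:k]


# Made-up vectors: pretend each axis loosely encodes a topic (F1, physics, movies).
chunk_vectors = {
    "Lewis Hamilton won the title again.": [0.9, 0.1, 0.0],
    "Physics might have exploits.": [0.0, 0.95, 0.1],
    "Hollywood will generate scenes with AI.": [0.1, 0.0, 0.9],
}
question_vec = [0.8, 0.2, 0.1]  # pretend embedding of "Who is Lewis Hamilton?"
print(top_k(question_vec, chunk_vectors, k=1))
```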

Generate an embedding for the question:

In [18]:
from langchain_openai.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
embedded_query = embeddings.embed_query("Who is Lewis Hamilton?")
print(f"Embedding length: {len(embedded_query)}")
print(embedded_query[:10])

Embedding length: 1536
[-0.005083421710878611, 0.0008213773253373802, -0.022214777767658234, -0.01773599162697792, -0.013884236104786396, 0.00518899317830801, -0.025490690022706985, 0.01019243709743023, -0.02641203999519348, -0.019796233624219894]


Test the cosine similarity between the question's embedding and the embeddings of two related sentences

In [19]:
sentence1 = embeddings.embed_query("Lewis is a formula 1 driver.")
sentence2 = embeddings.embed_query("Lewis Hamilton is eight time formula 1 world champion.")

In [20]:
from sklearn.metrics.pairwise import cosine_similarity

query_sentence1_similarity = cosine_similarity([embedded_query], [sentence1])[0][0]
query_sentence2_similarity = cosine_similarity([embedded_query], [sentence2])[0][0]

query_sentence1_similarity, query_sentence2_similarity

(0.8923234163986097, 0.8839253717303879)

For the question "Who is Lewis Hamilton?", both sentences score high against the query embedding (about 0.89 and 0.88).
Values closer to 1 indicate a better match.
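For reference, `cosine_similarity` computes the cosine of the angle between the two vectors. It ranges from -1 to 1, and values near 1 mean the vectors point in almost the same direction. OpenAI embeddings are normalized to length 1, so for them the formula reduces to a plain dot product:

```latex
\operatorname{sim}(\mathbf{q}, \mathbf{s})
  = \frac{\mathbf{q} \cdot \mathbf{s}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{s} \rVert}
  = \frac{\sum_{i=1}^{n} q_i s_i}{\sqrt{\sum_{i=1}^{n} q_i^2} \, \sqrt{\sum_{i=1}^{n} s_i^2}},
  \qquad n = 1536

\lVert \mathbf{q} \rVert = \lVert \mathbf{s} \rVert = 1
  \;\Rightarrow\;
  \operatorname{sim}(\mathbf{q}, \mathbf{s}) = \sum_{i=1}^{n} q_i s_i
```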

### Set up Vector DB:
Use the embedding model to compute embeddings and store them in a vector DB. The vector DB can then run a similarity match between the question and the transcript chunks.

In [21]:
len(documents)

221

In [22]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_community.vectorstores import DocArrayInMemorySearch

vector_store = DocArrayInMemorySearch.from_documents(documents, embeddings)






In [23]:
chain = (
    {"context": vector_store.as_retriever(), "question": RunnablePassthrough()}
    | prompt
    | openai_model
    | parser
)
chain.invoke("What is synthetic intelligence?")

'Synthetic intelligence is described as the next stage of development in artificial intelligence, where synthetic AIs uncover and solve puzzles in the universe.'
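In this chain the retriever returns a list of `Document` objects, and the prompt renders that list with `str()`, so the model also sees the `metadata=` and `page_content=` wrappers. A common refinement is to join only the page contents before they reach `{context}`; in LangChain a plain function can be piped in as `vector_store.as_retriever() | format_docs`. A sketch, with a small dataclass standing in for `Document` so it runs on its own:

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for langchain_core.documents.Document (same field names)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs: list) -> str:
    """Join retrieved chunks into one context string, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


retrieved = [
    Doc("Synthetic intelligences are kind of like the next stage of development.",
        {"source": "transcription.txt"}),
    Doc("I suspect the universe is some kind of a puzzle.",
        {"source": "transcription.txt"}),
]
print(format_docs(retrieved))
```

In the chain, the context entry would become `{"context": vector_store.as_retriever() | format_docs, "question": RunnablePassthrough()}`.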

    Set up Pinecone, a vector DB, to store the embeddings

In [24]:
from langchain_pinecone import PineconeVectorStore

index_name = "transcript-embedding"

pinecone = PineconeVectorStore.from_documents(
    documents, embeddings, index_name=index_name
)
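`PineconeVectorStore.from_documents` writes into an existing index; if `transcript-embedding` has not been created in the Pinecone project, it raises an error. A minimal sketch that creates the index on first run, assuming the v3+ `pinecone` client (`Pinecone`, `ServerlessSpec`) and a serverless index on AWS `us-east-1`. The cloud, region, and `cosine` metric are assumptions; the dimension must match the 1536-dimensional OpenAI embeddings above:

```python
def ensure_index(pc, name: str, dimension: int = 1536, metric: str = "cosine") -> bool:
    """Create the Pinecone index `name` if it is missing.

    Returns True if an index was created, False if it already existed.
    """
    if name in pc.list_indexes().names():
        return False
    # Imported lazily so the helper can be exercised without the package.
    from pinecone import ServerlessSpec

    pc.create_index(
        name=name,
        dimension=dimension,  # must equal the embedding length (1536 here)
        metric=metric,
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # assumed cloud/region
    )
    return True
```

Usage, before calling `from_documents`: `from pinecone import Pinecone; pc = Pinecone(api_key=PINECONE_API_KEY); ensure_index(pc, index_name)`.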

Test a similarity search using the embeddings stored in Pinecone

In [25]:
pinecone.similarity_search("What is Hollywood going to start doing?")[:3]
# similarity_search returns the k=4 most similar Documents by default; [:3] keeps the top 3

[Document(id='49d62313-64d3-41e3-90c8-595461da42e7', metadata={'source': 'transcription.txt'}, page_content="It's like high quality audio and you're speaking usually pretty clearly. I don't know what open AI's plans are either. Yeah, there's always fun projects basically. And stable diffusion also is opening up a huge amount of experimentation. I would say in the visual realm and generating images and videos and movies. I'll think like videos now. And so that's going to be pretty crazy. That's going to almost certainly work and it's going to be really interesting when the cost of content creation is going to fall to zero. You used to need a painter for a few months to paint a thing and now it's going to be speak to your phone to get your video. So Hollywood will start using it to generate scenes, which completely opens up. Yeah, so you can make a movie like Avatar eventually for under a million dollars. Much less. Maybe just by talking to your phone. I mean, I know it sounds kind of cr

In [26]:
chain = (
    {"context": pinecone.as_retriever(), "question": RunnablePassthrough()}
    | prompt
    | openai_model
    | parser
)

chain.invoke("What is Hollywood going to start doing?")

'Hollywood is going to start using AI to generate scenes, which will open up new possibilities for content creation.'


    To summarize the overall flow:
    1.  An embedding is generated for the question.
    2.  Based on this embedding, the most similar transcript chunks are retrieved from the vector DB (Pinecone).
    3.  These chunks (as context) and the question are filled into the prompt and sent to the LLM.
    4.  The parser extracts the model's answer from the output as a string.
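Steps 1 to 3 can be traced end to end without any API calls. The sketch below swaps in a toy bag-of-words embedding (word counts over a tiny fixed vocabulary, scaled to unit length) for `OpenAIEmbeddings`, and stops at the filled-in prompt instead of calling the LLM and parser; the vocabulary, chunks, and helper names are all hypothetical:

```python
import math
import re

VOCAB = ["hollywood", "scenes", "video", "physics", "exploits", "universe"]

TEMPLATE = (
    "Answer the question based on the context below. If you can't\n"
    'answer the question, reply "I don\'t know".\n\n'
    "Context: {context}\n\nQuestion: {question}\n"
)


def embed(text: str) -> list[float]:
    """Toy embedding: word counts over VOCAB, scaled to unit length."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = [float(words.count(term)) for term in VOCAB]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


def build_prompt(question: str, chunks: list[str], k: int = 1) -> str:
    q = embed(question)  # 1. embed the question
    # 2. rank chunks by dot product (cosine, since all vectors are unit length)
    ranked = sorted(chunks, key=lambda c: sum(a * b for a, b in zip(q, embed(c))),
                    reverse=True)
    context = "\n\n".join(ranked[:k])
    return TEMPLATE.format(context=context, question=question)  # 3. prompt for the LLM


chunks = [
    "Physics might have exploits, like a buffer overflow in the universe.",
    "Hollywood will start using it to generate scenes from a video prompt.",
]
print(build_prompt("What is Hollywood going to start doing?", chunks))
```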