In [34]:
!pip install -q youtube-transcript-api langchain_community langchain_openai faiss_cpu tiktoken python-dotenv

In [35]:
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate


# Step 1a - Indexing (Document Ingestion)

In [36]:
video_id = "JV3pL1_mn2M"
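try:
  # Fetch the English captions; each entry is a dict with a "text" field.
  transcript_list = YouTubeTranscriptApi.get_transcript(video_id, languages=['en'])
  # Flatten the caption entries into one plain-text string.
  transcript = " ".join(chunk["text"] for chunk in transcript_list)
  print(transcript)
except TranscriptsDisabled:
  # Catch only the "captions disabled" case so unrelated errors are not hidden.
  print("No captions available for this video")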

hey everyone today we're diving into the book AI engineering by chip win 800 pages of really great content about this in demand field that's offering salaries of $300,000 or more in this video I'm summarizing everything from the book to help you get a highle overview of the field we'll talk about Foundation models prompt engineering rag fine tuning agents how to build a system improving inference and more I also want to mention this is a super highlevel overview of a very detailed technical book don't expect to learn all the details just from watching this video I really recommend using this is a way to get an overview of what the field looks like and use it as a jumping off point for your own research and exploration so what exactly is AI engineering and how is it different from traditional machine learning let's break it down AI engineering has exploded recently for two simple reasons AI models have gotten dramatically better at solving real problems while the barrier to building wit

In [37]:
#transcript_list

# Step 1b - Indexing (Text Splitting)

In [38]:
splitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap = 200)
chunks = splitter.create_documents([transcript])
len(chunks)

115
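
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a minimal sketch of a fixed-size sliding window in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and spaces); the function name `sliding_window_chunks` and the toy input are assumptions made for illustration only.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Hypothetical toy input: the alphabet, windows of 10 characters with 4 shared.
demo_chunks = sliding_window_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=4)
print(demo_chunks)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which is the reason for the `chunk_overlap = 200` setting above.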

# Step 1c & 1d - Indexing (Embedding Generation and Storing in a Vector Store)

In [39]:
embeddings = OpenAIEmbeddings(model = 'text-embedding-ada-002')
vector_store = FAISS.from_documents(chunks, embeddings)
#vector_store.index_to_docstore_id

# Step 2 - Retrieval

In [40]:
retriever = vector_store.as_retriever(search_type = 'similarity', search_kwargs = {'k': 4})

In [41]:
retriever

VectorStoreRetriever(tags=['FAISS', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x7c4efaf2a450>, search_kwargs={'k': 4})

In [42]:
retriever.invoke("What is the current trend in AI")

[Document(id='d8eb449e-9421-4535-a5dd-09d7d4d38d47', metadata={}, page_content="machine learning let's break it down AI engineering has exploded recently for two simple reasons AI models have gotten dramatically better at solving real problems while the barrier to building with them has gotten much lower this perfect storm has created one of the fastest growing engineering disciplines today at its core AI engineering is about building applications on top of foundation models those massive AI systems trained by companies like open AI or Google unlike traditional machine learning Engineers who build models from scratch AI Engineers leverage existing ones focusing Less on training and more on adaptation these Foundation models work through a process called self-supervision instead of requiring humans to painstakingly label data these models can learn by predicting parts of their input data this breakthrough solved the data labeling bottleneck that held back AI for years as these models sc
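
Conceptually, a similarity retriever with `k` set to 4 embeds the query and returns the 4 stored chunks whose vectors are closest to it. The sketch below shows that idea with hand-written 3-dimensional vectors and cosine similarity. The vectors, the helper names `cosine_similarity` and `top_k`, and the choice of cosine are all assumptions for illustration; the FAISS index used above may rank by a different metric (such as Euclidean distance) and works on much larger embedding vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k vectors most similar to query_vec, best first."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy "embeddings"; real ones have hundreds or thousands of dimensions.
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query = [1.0, 0.2, 0.0]

nearest = top_k(toy_query, toy_vectors, k=2)
print(nearest)
```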

# Step 3 - Augmentation

In [43]:
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.2)
prompt = PromptTemplate(
    template="""
    You are a helpful assistant.
    Answer ONLY from the provided transcript context.
    If the context is insufficient, just say you don't know.
    {context}
    Question: {question}
    """,
    input_variables=["context", "question"]
)

In [58]:
question = "Is the topic of RAG discussed in the video? If yes, what was discussed, give in clear pointers."
retrieved_docs = retriever.invoke(question)
print(retrieved_docs)

[Document(id='dca1371a-3ee5-4e50-8c05-edf63afa0ad0', metadata={}, page_content="let's explore how to give Foundation models access to information beyond what they were trained on to solve a task effectively a model needs two things instructions on how to perform the task and the necessary information to complete it two dominant patterns have emerged for providing models with the information they need retrieval augmented generation or rag and the agentic pattern rag allows models to retrieve relevant information from external data sources while the agentic pattern enables models to use tools like web search and apis to gather information actively while rag is primarily used for context construction the agentic pattern can do much more let's start with rag first so what is rag retrieval augmented generation is a technique that enhances a model's generation capabilities by retrieving relevant information from external memory sources these sources could be an internal database a user's pre

In [59]:
context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)
context_text

"let's explore how to give Foundation models access to information beyond what they were trained on to solve a task effectively a model needs two things instructions on how to perform the task and the necessary information to complete it two dominant patterns have emerged for providing models with the information they need retrieval augmented generation or rag and the agentic pattern rag allows models to retrieve relevant information from external data sources while the agentic pattern enables models to use tools like web search and apis to gather information actively while rag is primarily used for context construction the agentic pattern can do much more let's start with rag first so what is rag retrieval augmented generation is a technique that enhances a model's generation capabilities by retrieving relevant information from external memory sources these sources could be an internal database a user's previous chat sessions or even the internet you can think of rag as a technique to

In [60]:
final_prompt = prompt.invoke({"context":context_text, "question":question})
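
The augmentation step amounts to string substitution: the retrieved text and the question are dropped into the `{context}` and `{question}` slots of the template. A rough equivalent with plain `str.format` is sketched below. `TEMPLATE`, `fill_prompt`, and the sample inputs are hypothetical names and values for illustration, and `PromptTemplate.invoke` returns a prompt value object rather than a bare string.

```python
# Same wording as the template above, written as a plain Python string.
TEMPLATE = (
    "You are a helpful assistant.\n"
    "Answer ONLY from the provided transcript context.\n"
    "If the context is insufficient, just say you don't know.\n"
    "{context}\n"
    "Question: {question}\n"
)


def fill_prompt(context: str, question: str) -> str:
    """Substitute the retrieved context and the user question into the template."""
    return TEMPLATE.format(context=context, question=question)


# Hypothetical inputs standing in for context_text and question.
demo_prompt = fill_prompt("rag retrieves relevant information", "What is RAG?")
print(demo_prompt)
```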


# Step 4 - Generation

In [61]:
answer = llm.invoke(final_prompt)
print(answer.content)

Yes, the topic of RAG (Retrieval Augmented Generation) is discussed in the video. Here are the key points:

1. **Definition of RAG**: RAG is a technique that enhances a model's generation capabilities by retrieving relevant information from external memory sources, such as internal databases, previous chat sessions, or the internet.

2. **Components of RAG**: A RAG system consists of two main components:
   - **Retriever**: Fetches information from external memory sources.
   - **Generator**: The foundation model that produces a response based on the retrieved information.

3. **Training of Components**: In many cases, the retriever and generator are trained separately, but fine-tuning the entire RAG system from end to end can significantly improve performance.

4. **Functions of the Retriever**: The retriever performs two main functions:
   - **Indexing**: Processing data for quick retrieval, which includes adding metadata like tags and keywords.
   - **Querying**: Retrieving relevant

# Building a Chain

In [62]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser

In [64]:
def format_docs(retrieved_docs):
  context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)
  return context_text


In [65]:
parallel_chain = RunnableParallel({
    'context': retriever | RunnableLambda(format_docs),
    'question': RunnablePassthrough()
})


In [66]:
parallel_chain.invoke('Explain Advanced RAG and the future scope in fine tuning')

{'context': "you've maximized performance gains from prompting choosing between Rag and fine-tuning depends on whether your model's failures are information based or behavior-based if the model fails because it lacks information like private company data or recent events rag gives the model better access to that information if the model has behavioral issues which I think is very funny to say like outputs that are factually correct but irrelevant or they're in the wrong format fine tuning might help more if your model has both issues start with rag because it's easier begin with a simple term-based solution and evolve from there in many cases combining rag and fine tuning will give you the biggest performance boost so the workflow to adapt a model to a task might be first design evaluation criteria and an evaluation pipeline then try to get the model to perform the task with prompting alone add more examples to the prompt from there at that point if the model continues to have informat

In [67]:
parser = StrOutputParser()

In [68]:
main_chain = parallel_chain | prompt | llm | parser
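
To see what the `|` composition and the parallel step are doing, here is a toy re-implementation in plain Python. It is only a sketch of the idea: the `Step` class, the `parallel` helper, and the stand-in retriever, prompt, and "model" are all hypothetical, and LangChain's runnables do considerably more (batching, streaming, async, and so on).

```python
class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # Composing two steps yields a new step that runs them left to right.
        return Step(lambda value: other.invoke(self.invoke(value)))


def parallel(**branches):
    """Feed the same input to every branch and collect the results in a dict."""
    return Step(lambda value: {name: step.invoke(value) for name, step in branches.items()})


# Hypothetical stand-ins for the retriever, formatter, prompt, and model.
toy_retriever = Step(lambda q: [f"chunk about {q}", "another chunk"])
toy_format = Step(lambda docs: "\n\n".join(docs))
toy_passthrough = Step(lambda q: q)
toy_prompt = Step(lambda d: f"{d['context']}\nQuestion: {d['question']}")
toy_llm = Step(lambda p: p.upper())  # placeholder "model" that just upper-cases

toy_chain = parallel(context=toy_retriever | toy_format, question=toy_passthrough) | toy_prompt | toy_llm
result = toy_chain.invoke("rag")
print(result)
```

The question string enters once, fans out to both branches of the parallel step, and the resulting dict feeds the prompt, mirroring the shape of `main_chain`.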

In [69]:
main_chain.invoke('Summarize the video in clear explainable pointers.')

'1. The creator enjoyed making the video and plans to produce more technical content in the future.\n2. Viewers are encouraged to suggest which book they would like summarized next in the comments.\n3. Subscribers are reminded to subscribe to not miss future videos.\n4. The video emphasizes the importance of clear instructions when working with models, including specifying output formats and providing examples.\n5. It discusses evaluating models based on domain-specific capabilities, general capabilities, instruction-following abilities, and cost/latency.\n6. The creator suggests breaking complex tasks into simpler subtasks to improve performance and monitoring.\n7. Techniques like Chain of Thought prompting and self-critique are recommended to enhance model reasoning.\n8. The video highlights the need for clear scoring systems and the potential benefits of adopting a specific persona for responses.'