# Introduction to LangChain v0.2.0 and LCEL: LangChain Powered RAG

## Task 1: Installing Required Libraries

In [5]:
!pip install -qU langchain langchain-core langchain-community langchain-openai
!pip install -qU qdrant-client
!pip install -qU tiktoken pymupdf

## Task 2: Set Environment Variables

In [6]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

## Task 3: Initialize a Simple Chain using LCEL

### LLM Orchestration Tool (LangChain)

In [7]:
from langchain_openai import ChatOpenAI

openai_chat_model = ChatOpenAI(model="gpt-4o")

### ❓ Question #1:
What other models could we use, and how would the above code change?

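1. gpt-3.5-turbo
2. gpt-4
3. gpt-4-turbo

To use one of these, we only need to change the value of the model parameter to the model's name, for example ChatOpenAI(model="gpt-3.5-turbo"). Base completion models such as babbage-002 and davinci-002 are not chat models, so they are not drop-in replacements for ChatOpenAI.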

### Prompt Template

In [8]:
from langchain_core.prompts import ChatPromptTemplate

system_template = "You are a legendary and mythical Wizard. You speak in riddles and make obscure and pun-filled references to exotic cheeses."
human_template = "{content}"

chat_prompt = ChatPromptTemplate.from_messages([
    ("system", system_template),
    ("human", human_template)
])

### Our First Chain

In [9]:
chain = chat_prompt | openai_chat_model
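
To make the `|` operator less mysterious, here is a minimal sketch of pipe-style composition in plain Python. It is not LangChain's implementation: `Step`, `fill_prompt`, and `fake_model` are hypothetical stand-ins, and the "model" only upper-cases its input. The one idea it shares with the real chain is that `a | b` yields a new object whose `invoke` runs `a` and then passes the result to `b`.

```python
class Step:
    """A tiny stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` builds a new Step that runs a, then feeds its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical stand-ins for the prompt template and the chat model.
fill_prompt = Step(lambda inputs: f"Wizard, answer this: {inputs['content']}")
fake_model = Step(lambda prompt: prompt.upper())

toy_chain = fill_prompt | fake_model
result = toy_chain.invoke({"content": "Hello world!"})
print(result)
```

The dictionary passed to `invoke` plays the same role as `{"content": ...}` in the real chain: the first step consumes it, and each later step sees only the previous step's output.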

In [10]:
print(chain.invoke({"content": "Hello world!"}))

content='Ah, a seeker of wisdom approaches! Greetings, traveler of the digital realm. In the land of ancient scripts and timeless codes, your words echo like the curdling of a fine Roquefort.\n\nWhat knowledge do you seek, or perhaps, what riddle do you bring? For every question, there is an answer aged like a fine Gouda.' response_metadata={'token_usage': {'completion_tokens': 73, 'prompt_tokens': 38, 'total_tokens': 111}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_ce0793330f', 'finish_reason': 'stop', 'logprobs': None} id='run-bd782e9a-ec2a-476e-8a8a-5bbf8f9e293d-0' usage_metadata={'input_tokens': 38, 'output_tokens': 73, 'total_tokens': 111}


In [11]:
chain.invoke({"content" : "Could I please have some advice on how to become a better Python Programmer?"})

AIMessage(content='Ah, seeker of serpentine scripts, you seek the wisdom to master the Python! Fear not, for I shall bestow upon you the curdled clues of coding excellence:\n\n1. **Read the Ancient Scrolls**: Dive into the sacred tomes such as "Automate the Boring Stuff with Python" and "Python Crash Course". These manuscripts are like the finest Gruyère, rich and full of flavor.\n\n2. **Practice in the Cheesery of Code**: Write code daily, for only through practice can one perfect the art. Like a fine Roquefort, your skills will mature over time.\n\n3. **Join the Fellowship of the Cheese Board**: Engage with the community through forums and groups such as Stack Overflow or Reddit\'s r/learnpython. Sharing knowledge is like sharing a wheel of Brie, delightful and enriching.\n\n4. **Solve Riddles of the Sphinx**: Tackle coding challenges on platforms like LeetCode, HackerRank, or Codewars. These puzzles are the Camembert of your intellectual journey, smooth yet complex.\n\n5. **Study th

## Naive RAG - Manually adding context through the Prompt Template

In [12]:
system_template = "You are a helpful assistant."
human_template = "{content}"

chat_prompt = ChatPromptTemplate.from_messages([
    ("system", system_template),
    ("human", human_template)
])

chat_chain = chat_prompt | openai_chat_model

print(chat_chain.invoke({"content" : "Please define LangChain."}))

content='LangChain is an open-source framework designed to simplify the development of applications that use large language models (LLMs). It provides a suite of tools and components that make it easier to integrate LLMs into various applications, such as chatbots, virtual assistants, and other natural language processing (NLP) tasks.\n\nKey features of LangChain include:\n\n1. **Model Wrappers**: Simplifies the use of different LLMs by providing standardized interfaces.\n2. **Data Connectors**: Facilitates the integration of external data sources, enabling the models to access and utilize additional information.\n3. **Prompt Templates**: Helps in designing and managing prompts, which are essential for interacting with LLMs effectively.\n4. **Evaluation Modules**: Provides tools for assessing the performance and accuracy of language models in various applications.\n5. **Deployment Support**: Assists in deploying applications that leverage LLMs, making it easier to move from development

In [13]:
print(chat_chain.invoke({"content" : "What is LangChain Expression Language (LECL)?"}))

content='LangChain Expression Language (LEL) is a domain-specific language designed to facilitate complex operations within the LangChain framework. LangChain is a library used to build applications leveraging language models, and LEL provides a structured way to define and execute expressions that interact with these models.\n\nLEL allows developers to write expressions that can perform a variety of tasks such as string manipulation, mathematical operations, and logical comparisons, which are essential when working with the outputs of language models. By using LEL, developers can create more sophisticated and fine-tuned applications that better harness the potential of language models.\n\nKey features of LEL include:\n\n1. **Simplicity and Readability**: LEL is designed to be easy to read and write, making it accessible for developers who are familiar with other programming languages.\n2. **Integration with LangChain**: LEL seamlessly integrates with the LangChain framework, enabling 

In [14]:
HUMAN_TEMPLATE = """
#CONTEXT:
{context}

QUERY:
{query}

Use the provided context to answer the provided user query. Only use the provided context to answer the query. If you do not know the answer, respond with "I don't know"
"""

CONTEXT = """
LangChain Expression Language or LCEL is a declarative way to easily compose chains together. There are several benefits to writing chains in this manner (as opposed to writing normal code):

Async, Batch, and Streaming Support Any chain constructed this way will automatically have full sync, async, batch, and streaming support. This makes it easy to prototype a chain in a Jupyter notebook using the sync interface, and then expose it as an async streaming interface.

Fallbacks The non-determinism of LLMs makes it important to be able to handle errors gracefully. With LCEL you can easily attach fallbacks to any chain.

Parallelism Since LLM applications involve (sometimes long) API calls, it often becomes important to run things in parallel. With LCEL syntax, any components that can be run in parallel automatically are.

Seamless LangSmith Tracing Integration As your chains get more and more complex, it becomes increasingly important to understand what exactly is happening at every step. With LCEL, all steps are automatically logged to LangSmith for maximal observability and debuggability.
"""

chat_prompt = ChatPromptTemplate.from_messages([
    ("human", HUMAN_TEMPLATE)
])

chat_chain = chat_prompt | openai_chat_model

print(chat_chain.invoke({"query" : "What is LangChain Expression Language?", "context" : CONTEXT}))

content='LangChain Expression Language (LCEL) is a declarative way to easily compose chains together, providing several benefits over writing normal code. These benefits include:\n\n1. **Async, Batch, and Streaming Support**: Chains constructed with LCEL have automatic support for synchronous, asynchronous, batch, and streaming operations.\n2. **Fallbacks**: LCEL allows you to easily attach fallbacks to any chain to handle errors gracefully.\n3. **Parallelism**: Components that can be run in parallel will automatically do so, improving efficiency.\n4. **Seamless LangSmith Tracing Integration**: All steps in LCEL chains are automatically logged to LangSmith, enhancing observability and debuggability.\n\n' response_metadata={'token_usage': {'completion_tokens': 142, 'prompt_tokens': 274, 'total_tokens': 416}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_d576307f90', 'finish_reason': 'stop', 'logprobs': None} id='run-089686dc-33c9-4f1c-960b-c835e2474fb2-0' usage_metadata={

## Task #4: Implement Naive RAG using LCEL

## Putting the R in RAG: Retrieval 101

In [15]:
context = """
EVERY HITCHHIKER'S GUIDE BOOK
"""

In [16]:
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")

In [17]:
len(enc.encode(context))

12

### TextSplitting aka Chunking

In [18]:
import tiktoken
from langchain.text_splitter import RecursiveCharacterTextSplitter

def tiktoken_len(text):
    tokens = tiktoken.encoding_for_model("gpt-4o").encode(
        text,
    )
    return len(tokens)

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 100,
    chunk_overlap = 0,
    length_function = tiktoken_len,
)
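
The two size parameters above are easier to reason about with a toy example. The sketch below is not how `RecursiveCharacterTextSplitter` works internally (that splitter recursively tries separators such as paragraph and line breaks); it only illustrates what a chunk size and a chunk overlap mean, using whitespace-separated words as stand-in tokens. The helper name `window_chunks` is an assumption made for this illustration.

```python
def window_chunks(tokens, chunk_size, chunk_overlap=0):
    """Split a token list into windows of at most chunk_size tokens.

    Consecutive windows share chunk_overlap tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
    return windows


words = "any chain built this way supports sync async batch and streaming".split()
no_overlap = window_chunks(words, chunk_size=4)
with_overlap = window_chunks(words, chunk_size=4, chunk_overlap=1)

for window in with_overlap:
    print(" ".join(window))
```

With an overlap of 1, the last word of each window is repeated as the first word of the next, which is the behavior a non-zero `chunk_overlap` is meant to provide.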

In [19]:
chunks = text_splitter.split_text(CONTEXT)
     

In [20]:
len(chunks)

3

In [21]:
for chunk in chunks:
  print(chunk)
  print("----")

LangChain Expression Language or LCEL is a declarative way to easily compose chains together. There are several benefits to writing chains in this manner (as opposed to writing normal code):

Async, Batch, and Streaming Support Any chain constructed this way will automatically have full sync, async, batch, and streaming support. This makes it easy to prototype a chain in a Jupyter notebook using the sync interface, and then expose it as an async streaming interface.
----
Fallbacks The non-determinism of LLMs makes it important to be able to handle errors gracefully. With LCEL you can easily attach fallbacks to any chain.

Parallelism Since LLM applications involve (sometimes long) API calls, it often becomes important to run things in parallel. With LCEL syntax, any components that can be run in parallel automatically are.
----
Seamless LangSmith Tracing Integration As your chains get more and more complex, it becomes increasingly important to understand what exactly is happening at ev

### Activity #1:
While there's nothing specifically wrong with the chunking method used above, it is a naive approach that is not sensitive to specific data formats.

Brainstorm some ideas that would split large single documents into smaller documents.

--------------------------------------------------------

Splitting large single documents into smaller, more manageable pieces can be approached in various ways, depending on the type of document and the intended use. Here are some ideas:

1. Section-Based Splitting
Use a parser to identify headings and split the text accordingly.

2. Paragraph-Based Splitting
Identify paragraph breaks and split the document at these points.

3. Sentence-Based Splitting
Use natural language processing (NLP) tools to detect sentence boundaries.

4. Topic-Based Splitting
Apply algorithms like Latent Dirichlet Allocation (LDA) to identify topics and split the document.

5. Page-Based Splitting
Use PDF manipulation tools to split the document by pages.

6. Keyword-Based Splitting
Use regular expressions or NLP to detect keywords and split the text.

7. Time-Based Splitting (for transcripts)
Split audio or video transcripts into smaller sections based on time intervals. Use the timestamps in the transcript to guide the splitting process.

8. Summarization-Based Splitting
Generate summaries of smaller sections from the large document. Use text summarization algorithms to condense the content into smaller pieces.

9. Data-Driven Splitting
Use machine learning models to determine split points from the document's structure, for example a model trained on labeled datasets to identify natural breakpoints in legal documents.
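
As one concrete illustration of the ideas above, here is a minimal sketch of paragraph-based splitting with a size budget. It assumes paragraphs are separated by blank lines and measures size in whitespace-separated words rather than model tokens; `split_by_paragraph` and `max_words` are names chosen for this example only.

```python
import re


def split_by_paragraph(text, max_words=60):
    """Group consecutive paragraphs into chunks of at most max_words words.

    A single paragraph longer than max_words is kept whole rather than cut.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current, current_words = [], [], 0
    for paragraph in paragraphs:
        n_words = len(paragraph.split())
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and current_words + n_words > max_words:
            chunks.append("\n\n".join(current))
            current, current_words = [], 0
        current.append(paragraph)
        current_words += n_words
    if current:
        chunks.append("\n\n".join(current))
    return chunks


sample = """Fallbacks let a chain recover from errors.

Parallelism runs independent steps at the same time.

Tracing records every step for later inspection."""

paragraph_chunks = split_by_paragraph(sample, max_words=16)
for chunk in paragraph_chunks:
    print(chunk)
    print("----")
```

Because chunk boundaries always fall between paragraphs, no sentence is cut in half, at the cost of chunks that vary in size.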

## Embeddings and Dense Vector Search

In [23]:
from langchain_openai.embeddings import OpenAIEmbeddings

embedding_model = OpenAIEmbeddings(model="text-embedding-3-small")

## ❓ Question #2:
What is the embedding dimension, given that we're using text-embedding-3-small?

The text-embedding-3-small model from OpenAI has an embedding dimension of 1536 by default, so every embedding vector it produces for a piece of text has 1536 components.

## Finding the Embeddings for Our Chunks

In [25]:
embeddings_dict = {}

for chunk in chunks:
  embeddings_dict[chunk] = embedding_model.embed_query(chunk)

In [26]:
for k,v in embeddings_dict.items():
  print(f"Chunk - {k}")
  print("---")
  print(f"Embedding - Vector of Size: {len(v)}")
  print("\n\n")
     

Chunk - LangChain Expression Language or LCEL is a declarative way to easily compose chains together. There are several benefits to writing chains in this manner (as opposed to writing normal code):

Async, Batch, and Streaming Support Any chain constructed this way will automatically have full sync, async, batch, and streaming support. This makes it easy to prototype a chain in a Jupyter notebook using the sync interface, and then expose it as an async streaming interface.
---
Embedding - Vector of Size: 1536



Chunk - Fallbacks The non-determinism of LLMs makes it important to be able to handle errors gracefully. With LCEL you can easily attach fallbacks to any chain.

Parallelism Since LLM applications involve (sometimes long) API calls, it often becomes important to run things in parallel. With LCEL syntax, any components that can be run in parallel automatically are.
---
Embedding - Vector of Size: 1536



Chunk - Seamless LangSmith Tracing Integration As your chains get more and

In [27]:
query = "Can LCEL help take code from the notebook to production?"

query_vector = embedding_model.embed_query(query)
print(f"Vector of Size: {len(query_vector)}")

Vector of Size: 1536


In [28]:
import numpy as np
from numpy.linalg import norm

def cosine_similarity(vec_1, vec_2):
  return np.dot(vec_1, vec_2) / (norm(vec_1) * norm(vec_2))
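
To build intuition for the score this function returns, the following sketch computes the same formula with the standard library only and applies it to three hand-picked vector pairs. The name `cosine_similarity_py` and the toy vectors are assumptions for illustration; real embedding vectors have 1536 components rather than 2.

```python
import math


def cosine_similarity_py(vec_1, vec_2):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(a * b for a, b in zip(vec_1, vec_2))
    norm_1 = math.sqrt(sum(a * a for a in vec_1))
    norm_2 = math.sqrt(sum(b * b for b in vec_2))
    return dot / (norm_1 * norm_2)


# Same direction, different length: score is approximately 1.
same_direction = cosine_similarity_py([1.0, 2.0], [2.0, 4.0])
# Perpendicular vectors: score is 0.
orthogonal = cosine_similarity_py([1.0, 0.0], [0.0, 1.0])
# Opposite direction: score is approximately -1.
opposite = cosine_similarity_py([1.0, 2.0], [-1.0, -2.0])

print(same_direction, orthogonal, opposite)
```

The score depends only on direction, not on vector length, which is why it is a common choice for comparing embeddings.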

In [29]:
max_similarity = -float('inf')
closest_chunk = ""

for chunk, chunk_vector in embeddings_dict.items():
  cosine_similarity_score = cosine_similarity(chunk_vector, query_vector)

  if cosine_similarity_score > max_similarity:
    closest_chunk = chunk
    max_similarity = cosine_similarity_score

print(closest_chunk)
print(max_similarity)

LangChain Expression Language or LCEL is a declarative way to easily compose chains together. There are several benefits to writing chains in this manner (as opposed to writing normal code):

Async, Batch, and Streaming Support Any chain constructed this way will automatically have full sync, async, batch, and streaming support. This makes it easy to prototype a chain in a Jupyter notebook using the sync interface, and then expose it as an async streaming interface.
0.537298487051912


## Creating a Retriever

In [30]:
def retrieve_context(query, embeddings_dict, embedding_model):
  query_vector = embedding_model.embed_query(query)
  max_similarity = -float('inf')
  closest_chunk = ""

  for chunk, chunk_vector in embeddings_dict.items():
    cosine_similarity_score = cosine_similarity(chunk_vector, query_vector)

    if cosine_similarity_score > max_similarity:
      closest_chunk = chunk
      max_similarity = cosine_similarity_score

  return closest_chunk
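
The retriever above returns only the single best chunk. A common variation is to return the top k chunks with their scores. The sketch below shows that idea on hand-written 3-dimensional vectors so it runs without an embedding model; `retrieve_top_k`, `toy_embeddings`, and `toy_query_vector` are hypothetical names and values, not part of the notebook.

```python
import math


def cosine(vec_1, vec_2):
    dot = sum(a * b for a, b in zip(vec_1, vec_2))
    return dot / (math.sqrt(sum(a * a for a in vec_1)) * math.sqrt(sum(b * b for b in vec_2)))


def retrieve_top_k(query_vector, embeddings, k=2):
    """Return the k (score, chunk) pairs with the highest cosine similarity."""
    scored = [(cosine(vector, query_vector), chunk) for chunk, vector in embeddings.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy stand-ins for embeddings_dict and an embedded query.
toy_embeddings = {
    "chunk about streaming": [0.9, 0.1, 0.0],
    "chunk about fallbacks": [0.1, 0.9, 0.0],
    "chunk about tracing": [0.0, 0.2, 0.9],
}
toy_query_vector = [1.0, 0.2, 0.0]

top_two = retrieve_top_k(toy_query_vector, toy_embeddings, k=2)
for score, chunk in top_two:
    print(f"{score:.3f}  {chunk}")
```

Returning several chunks gives the model more context to draw on, at the cost of a longer prompt.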

In [31]:
def simple_rag(query, embeddings_dict, embedding_model, chat_chain):
  context = retrieve_context(query, embeddings_dict, embedding_model)

  response = chat_chain.invoke({"query" : query, "context" : context})

  return_package = {
      "query" : query,
      "response" : response,
      "retriever_context" : context
  }

  return return_package

In [32]:
simple_rag("Can LCEL help take code from the notebook to production?", embeddings_dict, embedding_model, chat_chain)


{'query': 'Can LCEL help take code from the notebook to production?',
 'response': AIMessage(content='Yes, LCEL can help take code from the notebook to production. It allows you to prototype a chain in a Jupyter notebook using the sync interface and then easily expose it as an async streaming interface. This ensures that any chain constructed in this manner has full support for sync, async, batch, and streaming operations, making the transition from prototyping to production smoother.', response_metadata={'token_usage': {'completion_tokens': 74, 'prompt_tokens': 152, 'total_tokens': 226}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_d576307f90', 'finish_reason': 'stop', 'logprobs': None}, id='run-ffaf24ef-3fc3-44a6-9fc4-11e447089531-0', usage_metadata={'input_tokens': 152, 'output_tokens': 74, 'total_tokens': 226}),
 'retriever_context': 'LangChain Expression Language or LCEL is a declarative way to easily compose chains together. There are several benefits to writing c

## ❓ Question #3:
What does LCEL do that makes it more reliable at scale?


In [50]:
simple_rag("What does LCEL do that makes it more reliable at scale?", embeddings_dict, embedding_model, chat_chain)

{'query': 'What does LCEL do that makes it more reliable at scale?',
 'response': AIMessage(content='LCEL makes chains more reliable at scale by automatically providing full support for synchronous (sync), asynchronous (async), batch, and streaming interfaces. This ensures that chains can handle various types of workloads and execution models efficiently, making it easier to prototype, scale, and expose them in different ways without needing to rewrite the code for each specific use case.', response_metadata={'token_usage': {'completion_tokens': 69, 'prompt_tokens': 153, 'total_tokens': 222}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_ce0793330f', 'finish_reason': 'stop', 'logprobs': None}, id='run-718d94e4-2b06-453c-a15f-4a60b612b0e7-0', usage_metadata={'input_tokens': 153, 'output_tokens': 69, 'total_tokens': 222}),
 'retriever_context': 'LangChain Expression Language or LCEL is a declarative way to easily compose chains together. There are several benefits to writi

## Task #5: Create a Simple RAG Application Using Qdrant, OpenAI, and LCEL

## LangChain Powered RAG


## Data Collection

In [33]:
from langchain_community.document_loaders import PyMuPDFLoader

docs = PyMuPDFLoader("https://singjupost.com/wp-content/uploads/2014/07/Steve-Jobs-iPhone-2007-Presentation-Full-Transcript.pdf").load()
     

## Chunking Our Documents

In [34]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 200,
    chunk_overlap = 0,
    length_function = tiktoken_len,
)

split_chunks = text_splitter.split_documents(docs)

In [35]:
len(split_chunks)


86

In [36]:
max_chunk_length = 0

for chunk in split_chunks:
  max_chunk_length = max(max_chunk_length, tiktoken_len(chunk.page_content))

print(max_chunk_length)

197


## Embeddings and Vector Storage

In [37]:
from langchain_community.vectorstores import Qdrant

qdrant_vectorstore = Qdrant.from_documents(
    split_chunks,
    embedding_model,
    location=":memory:",
    collection_name="Steve Job's Speech",
)

In [38]:
qdrant_retriever = qdrant_vectorstore.as_retriever()

## Setting up our RAG

## Activity #2:


In [39]:
RAG_PROMPT = """
CONTEXT:
{context}

QUERY:
{question}

Use the provided context to answer the provided user query. Only use the provided context to answer the query. If you do not know the answer, respond with "I don't know"
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_PROMPT)

## Our RAG Chain

In [40]:
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

retrieval_augmented_qa_chain = (
    # INVOKE CHAIN WITH: {"question" : "<>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into qdrant_retriever
    {"context": itemgetter("question") | qdrant_retriever, "question": itemgetter("question")}
    # "context"  : RunnablePassthrough.assign passes every key through and re-assigns "context"
    #              to the value it already had in the previous step, so the dictionary is unchanged
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | openai_chat_model, "context": itemgetter("context")}
)
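
The data flow through this chain can be traced with ordinary dictionaries. The sketch below mirrors the three steps in plain Python under loud assumptions: `fake_retriever` and `fake_llm` are hypothetical stubs standing in for `qdrant_retriever` and the chat model, the retrieved "documents" are plain strings, and the prompt is a simplified template.

```python
PROMPT = "CONTEXT:\n{context}\n\nQUERY:\n{question}"


def fake_retriever(question):
    # Hypothetical stand-in for qdrant_retriever: returns a fixed list of documents.
    return ["iPhone has a multi-touch screen.", "iPhone runs OSX."]


def fake_llm(prompt_text):
    # Hypothetical stand-in for the chat model: echoes the last prompt line.
    return "stub answer for: " + prompt_text.splitlines()[-1]


def run_chain(inputs):
    # Step 1: build {"context", "question"} from the incoming "question".
    step_1 = {
        "context": fake_retriever(inputs["question"]),
        "question": inputs["question"],
    }
    # Step 2: keep every key and re-set "context" to its own value,
    # so the dictionary is unchanged.
    step_2 = {**step_1, "context": step_1["context"]}
    # Step 3: format the prompt, call the model, and carry the context along.
    prompt_text = PROMPT.format(
        context="\n".join(step_2["context"]),
        question=step_2["question"],
    )
    return {"response": fake_llm(prompt_text), "context": step_2["context"]}


output = run_chain({"question": "What is new?"})
print(output["response"])
print(output["context"])
```

Returning the context alongside the response is what later lets the notebook print the retrieved documents next to the answer.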

In [41]:
!pip install -qU grandalf

In [42]:
print(retrieval_augmented_qa_chain.get_graph().draw_ascii())


                       +---------------------------------+                         
                       | Parallel<context,question>Input |                         
                       +---------------------------------+                         
                           *****                   ****                            
                        ***                            ****                        
                     ***                                   ****                    
+--------------------------------+                             **                  
| Lambda(itemgetter('question')) |                              *                  
+--------------------------------+                              *                  
                 *                                              *                  
                 *                                              *                  
                 *                                              *           

In [43]:
response = retrieval_augmented_qa_chain.invoke({"question" : "What is the most important thing about the iPhone?"})


In [44]:
response["response"].content


'The most important thing about the iPhone, based on the provided context, is its innovative design and advanced technology. The iPhone integrates a multi-touch screen, miniaturization, custom silicon, power management, OSX inside a mobile device, advanced sensors, desktop-class applications, and a widescreen video iPod. It is designed to be the ultimate digital device, essentially "like having your life in your pocket."'

In [45]:
for context in response["context"]:
  print("Context:")
  print(context)
  print("----")

Context:
page_content='of the art in every facet of this design. So let me just talk a little bit about it here. We’ve got\nthe multi-touch screen. A first. Miniaturization, more than any we’ve done before. A lot of\ncustom silicon. Tremendous power management. OSX inside a mobile device. Featherweight\nprecision enclosures. Three advanced sensors. Desktop class applications, and of course, the\nwidescreen video iPod. We’ve been innovating like crazy for the last few years on this, and\nwe filed for over 200 patents for all the inventions in iPhone, and we intend to protect them.\nSo, a lot of high technology. I think we’re advancing the state of the art in every aspect of\nthis design. So iPhone is like having your life in your pocket. It’s the ultimate digital device.' metadata={'source': 'https://singjupost.com/wp-content/uploads/2014/07/Steve-Jobs-iPhone-2007-Presentation-Full-Transcript.pdf', 'file_path': 'https://singjupost.com/wp-content/uploads/2014/07/Steve-Jobs-iPhone-2007-Pr

In [46]:
response = retrieval_augmented_qa_chain.invoke({"question" : "What is the airspeed velocity of an unladen swallow?"})


In [47]:
response["response"].content


"I don't know."

## ❓ Question #4:
What key innovations did the iPhone introduce?

In [49]:
response = retrieval_augmented_qa_chain.invoke({"question" : "What key innovations did the iPhone introduce?"})
response["response"].content


'The key innovations introduced by the iPhone, as highlighted in the provided context, include:\n\n1. **Widescreen iPod with Touch Controls**: This innovation allowed users to touch their music and interact with it in a more intuitive way.\n\n2. **Revolutionary Mobile Phone**: The iPhone aimed to reinvent the phone by making it easier to make calls and manage contacts. It allowed users to sync their contacts from their PC or Mac to their phone, ensuring they had all their numbers with them at all times.\n\n3. **Breakthrough Internet Communications Device**: The iPhone combined an iPod, a phone, and an Internet communicator into one device, changing the way users interacted with the internet on a mobile device.\n\n4. **Multi-touch Screen**: This was a first in mobile devices, allowing for more advanced and intuitive touch interactions.\n\n5. **Miniaturization and Custom Silicon**: The iPhone featured significant advancements in miniaturization and custom silicon design for better perfor