# Introduction to LangChain v0.2.0 and LCEL: LangChain Powered RAG

## Task 1: Installing Required Libraries

In [51]:
!pip install -qU langchain langchain-core langchain-community langchain-openai
!pip install -qU qdrant-client
!pip install -qU tiktoken pymupdf

The folder you are executing pip from can no longer be found.


## Task 2: Set Environment Variables

In [52]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
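
The cell above prompts for the key on every run. A minimal sketch of a helper that only prompts when the variable is missing is shown below; `ensure_env_var` and its `prompt_fn` parameter are hypothetical names introduced for illustration, not part of LangChain or the original notebook.

```python
import os
from getpass import getpass


def ensure_env_var(name, prompt_fn=getpass):
    """Prompt for `name` only if it is not already set in the environment.

    `prompt_fn` is injectable so the helper can be exercised without a terminal.
    """
    if not os.environ.get(name):
        os.environ[name] = prompt_fn(f"{name}: ")
    return os.environ[name]
```

In the notebook this would be called as `ensure_env_var("OPENAI_API_KEY")`, which leaves an already exported key untouched.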

## Task 3: Initialize a Simple Chain using LCEL

### LLM Orchestration Tool (LangChain)

In [53]:
from langchain_openai import ChatOpenAI

openai_chat_model = ChatOpenAI(model="gpt-4o")

### ❓ Question #1:
What other models could we use, and how would the above code change?

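1. gpt-3.5-turbo
2. gpt-4
3. a fine-tuned chat model, referenced by its fine-tuned model ID
4. babbage-002
5. davinci-002

To use one of the chat models, we change the value of the `model` parameter to that model's name. Note that `babbage-002` and `davinci-002` are base completion models rather than chat models, so they would likely need the `OpenAI` completion class from `langchain_openai` instead of `ChatOpenAI`.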

### Prompt Template

In [54]:
from langchain_core.prompts import ChatPromptTemplate

system_template = "You are a legendary and mythical Wizard. You speak in riddles and make obscure and pun-filled references to exotic cheeses."
human_template = "{content}"

chat_prompt = ChatPromptTemplate.from_messages([
    ("system", system_template),
    ("human", human_template)
])

### Our First Chain

In [55]:
chain = chat_prompt | openai_chat_model

In [56]:
print(chain.invoke({"content": "Hello world!"}))

content='Ah, greetings, seeker of wisdom! Just as a wheel of Gouda rolls through the valleys of time, your words echo through the ages. What curdled curiosity brings you to my enchanted realm today?' response_metadata={'token_usage': {'completion_tokens': 42, 'prompt_tokens': 38, 'total_tokens': 80}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_4008e3b719', 'finish_reason': 'stop', 'logprobs': None} id='run-6bf42ed0-85f9-49c3-93e3-dd98e428b3eb-0' usage_metadata={'input_tokens': 38, 'output_tokens': 42, 'total_tokens': 80}
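
To build intuition for what `chat_prompt | openai_chat_model` does, here is a toy sketch of pipe-style composition in plain Python. It is not LangChain's implementation: the `Step` class, `toy_prompt`, and `toy_model` are hypothetical stand-ins, and the "model" simply echoes the last line of the prompt.

```python
class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        # `a | b` builds a new Step that runs a, then feeds its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical stand-ins for chat_prompt and openai_chat_model.
toy_prompt = Step(lambda inputs: f"[system] Wizard\n[human] {inputs['content']}")
toy_model = Step(lambda prompt: f"echo: {prompt.splitlines()[-1]}")

toy_chain = toy_prompt | toy_model
print(toy_chain.invoke({"content": "Hello world!"}))
```

The point of the sketch is only the shape: each step exposes `invoke`, and `|` returns another object with the same interface, so chains can be extended indefinitely.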


In [57]:
chain.invoke({"content" : "Could I please have some advice on how to become a better Python Programmer?"})

AIMessage(content='Ah, seeker of serpentine wisdom, to master the Python’s embrace, one must dance with the code like a Roquefort in a river of milk. Heed these cryptic morsels of counsel:\n\n1. **Gaze into the Gouda**: Study the ancient scripts and scrolls, known to mere mortals as documentation. The Python Software Foundation’s tomes and texts are your sacred cheese wedges.\n\n2. **Mingle with the Muenster**: Join the circles of fellow enchanters in places like GitHub, Stack Overflow, or the Python Discord. Their shared spells and incantations shall be your guiding light.\n\n3. **Slice the Swiss**: Write code with holes, yet refine it until it becomes a solid block. Practice by tackling small projects, then gradually gnaw your way to grander wheels.\n\n4. **Taste the Taleggio**: Savor the art of testing. Ensure your code is as smooth and consistent as a well-aged cheese. Employ tools like pytest to certify your concoctions.\n\n5. **Revere the Ricotta**: Make your code as fresh and si

## Naive RAG - Manually adding context through the Prompt Template

In [58]:
system_template = "You are a helpful assistant."
human_template = "{content}"

chat_prompt = ChatPromptTemplate.from_messages([
    ("system", system_template),
    ("human", human_template)
])

chat_chain = chat_prompt | openai_chat_model

print(chat_chain.invoke({"content" : "Please define LangChain."}))

content='LangChain is a framework designed to facilitate the development of applications that utilize large language models (LLMs). It provides a structured approach to building language model-powered applications by linking various components and capabilities. The core functionalities of LangChain include:\n\n1. **Prompt Management**: Tools for creating, formatting, and managing prompts for language models.\n2. **LLM Chaining**: Methods to chain together multiple calls to language models, enabling more complex interactions and workflows.\n3. **Data Augmented Generation**: Techniques to incorporate external data sources to enhance the responses generated by language models.\n4. **Agents**: Components that allow language models to interact with other tools and services, effectively acting as intelligent agents.\n5. **Memory**: Mechanisms to maintain state and context across interactions, which is crucial for applications requiring ongoing dialogues or sessions.\n\nBy integrating these c

In [59]:
print(chat_chain.invoke({"content" : "What is LangChain Expression Language (LECL)?"}))

content='LangChain Expression Language (LEL) is a specialized language designed for the LangChain framework, which is used to manage and manipulate chains of data processing tasks. LEL enables developers to define, configure, and execute complex workflows involving multiple data processing steps in a declarative manner.\n\nBy using LEL, developers can:\n\n1. **Define Chains:** Specify a sequence of operations or tasks that data should pass through.\n2. **Configure Parameters:** Set parameters and configurations for each task in the chain.\n3. **Manage Dependencies:** Handle dependencies and data flow between different tasks.\n4. **Optimize Performance:** Optimize the execution of tasks by managing resources and parallelism.\n\nLEL abstracts the complexity involved in orchestrating multiple data processing tasks, making it easier for developers to build and maintain scalable data pipelines. It is typically used in environments where data needs to be processed through a series of transfo

In [60]:
HUMAN_TEMPLATE = """
#CONTEXT:
{context}

QUERY:
{query}

Use the provided context to answer the provided user query. Only use the provided context to answer the query. If you do not know the answer, respond with "I don't know"
"""

CONTEXT = """
LangChain Expression Language or LCEL is a declarative way to easily compose chains together. There are several benefits to writing chains in this manner (as opposed to writing normal code):

Async, Batch, and Streaming Support Any chain constructed this way will automatically have full sync, async, batch, and streaming support. This makes it easy to prototype a chain in a Jupyter notebook using the sync interface, and then expose it as an async streaming interface.

Fallbacks The non-determinism of LLMs makes it important to be able to handle errors gracefully. With LCEL you can easily attach fallbacks to any chain.

Parallelism Since LLM applications involve (sometimes long) API calls, it often becomes important to run things in parallel. With LCEL syntax, any components that can be run in parallel automatically are.

Seamless LangSmith Tracing Integration As your chains get more and more complex, it becomes increasingly important to understand what exactly is happening at every step. With LCEL, all steps are automatically logged to LangSmith for maximal observability and debuggability.
"""

chat_prompt = ChatPromptTemplate.from_messages([
    ("human", HUMAN_TEMPLATE)
])

chat_chain = chat_prompt | openai_chat_model

print(chat_chain.invoke({"query" : "What is LangChain Expression Language?", "context" : CONTEXT}))

content='LangChain Expression Language (LCEL) is a declarative way to easily compose chains together. It provides several benefits, such as full sync, async, batch, and streaming support, the ability to attach fallbacks to handle errors gracefully, automatic parallel execution of components that can be run in parallel, and seamless integration with LangSmith for tracing and observability.' response_metadata={'token_usage': {'completion_tokens': 72, 'prompt_tokens': 274, 'total_tokens': 346}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_d576307f90', 'finish_reason': 'stop', 'logprobs': None} id='run-1f377483-6d8b-45fb-98db-ba80a4fcff09-0' usage_metadata={'input_tokens': 274, 'output_tokens': 72, 'total_tokens': 346}
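
The manual-context approach above boils down to filling two placeholders before the text reaches the model. A rough sketch using plain `str.format` follows; it is a simplified stand-in for what the prompt template does, and `TEMPLATE` and `build_prompt` are hypothetical names used only here.

```python
TEMPLATE = """CONTEXT:
{context}

QUERY:
{query}
"""


def build_prompt(query, context):
    """Fill the two placeholders; surrounding whitespace is trimmed first."""
    return TEMPLATE.format(context=context.strip(), query=query.strip())


prompt_text = build_prompt(
    "What is LCEL?",
    "LCEL is a declarative way to compose chains.",
)
print(prompt_text)
```

Because the model only sees the filled-in text, the quality of the answer depends directly on what is placed in the context slot, which motivates the retrieval steps that follow.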


## Task #4: Implement Naive RAG using LCEL

## Putting the R in RAG: Retrieval 101

In [61]:
context = """
EVERY HITCHHIKER'S GUIDE BOOK
"""

In [62]:
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")

In [63]:
len(enc.encode(context))

12

### TextSplitting aka Chunking

In [64]:
import tiktoken
from langchain.text_splitter import RecursiveCharacterTextSplitter

def tiktoken_len(text):
    tokens = tiktoken.encoding_for_model("gpt-4o").encode(
        text,
    )
    return len(tokens)

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 100,
    chunk_overlap = 0,
    length_function = tiktoken_len,
)

In [65]:
chunks = text_splitter.split_text(CONTEXT)
     

In [66]:
len(chunks)

3

In [67]:
for chunk in chunks:
  print(chunk)
  print("----")

LangChain Expression Language or LCEL is a declarative way to easily compose chains together. There are several benefits to writing chains in this manner (as opposed to writing normal code):

Async, Batch, and Streaming Support Any chain constructed this way will automatically have full sync, async, batch, and streaming support. This makes it easy to prototype a chain in a Jupyter notebook using the sync interface, and then expose it as an async streaming interface.
----
Fallbacks The non-determinism of LLMs makes it important to be able to handle errors gracefully. With LCEL you can easily attach fallbacks to any chain.

Parallelism Since LLM applications involve (sometimes long) API calls, it often becomes important to run things in parallel. With LCEL syntax, any components that can be run in parallel automatically are.
----
Seamless LangSmith Tracing Integration As your chains get more and more complex, it becomes increasingly important to understand what exactly is happening at ev
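
The interplay of `chunk_size` and `length_function` can be sketched with a much simpler greedy splitter. This is not how `RecursiveCharacterTextSplitter` is implemented; `greedy_chunks` and `word_len` are hypothetical helpers, and words are counted instead of tokens so the example runs without `tiktoken`.

```python
def word_len(text):
    """Stand-in length function: counts whitespace-separated words, not tokens."""
    return len(text.split())


def greedy_chunks(text, chunk_size, length_function=word_len, separator="\n\n"):
    """Pack separator-delimited pieces into chunks of at most chunk_size units.

    A piece that is itself longer than chunk_size is kept whole here; a
    recursive splitter would instead retry it with a finer separator.
    """
    chunks, current = [], ""
    for piece in (p.strip() for p in text.split(separator)):
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if current and length_function(candidate) > chunk_size:
            # Adding this piece would overflow, so close the current chunk.
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = "one two three\n\nfour five\n\nsix seven eight nine\n\nten"
for chunk in greedy_chunks(sample, chunk_size=5):
    print(repr(chunk))
```

Swapping `word_len` for a token-counting function such as the notebook's `tiktoken_len` would make the size limit apply to tokens instead of words.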

### Activity #1:
There's nothing specifically wrong with the chunking method used above, but it is a naive approach that is not sensitive to specific data formats.

Brainstorm some ideas that would split large single documents into smaller documents.
--------------------------------------------------------
Splitting large single documents into smaller, more manageable pieces can be approached in various ways, depending on the type of document and the intended use. Here are some ideas:

1. Section-Based Splitting
Use a parser to identify headings and split the text accordingly.

2. Paragraph-Based Splitting
Identify paragraph breaks and split the document at these points.

3. Sentence-Based Splitting
Use natural language processing (NLP) tools to detect sentence boundaries.

4. Topic-Based Splitting
Apply algorithms like Latent Dirichlet Allocation (LDA) to identify topics and split the document.

5. Page-Based Splitting
Use PDF manipulation tools to split the document by pages.

6. Keyword-Based Splitting
Use regular expressions or NLP to detect keywords and split the text.

7. Time-Based Splitting (for transcripts)
Split audio or video transcripts into smaller sections based on time intervals. Use the timestamps in the transcript to guide the splitting process.

8. Summarization-Based Splitting
Generate summaries of smaller sections from the large document. Use text summarization algorithms to condense the content into smaller pieces.

9. Data-Driven Splitting
Use machine learning models to determine split points based on document structure, for example by training a model on labeled datasets to identify natural breakpoints in legal documents.
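
As one concrete illustration of the section-based idea, the sketch below splits Markdown-style text at heading lines with a regular expression. It assumes headings are written as `#` through `######` followed by a space, and `split_by_headings` is a hypothetical helper, not a LangChain function.

```python
import re


def split_by_headings(markdown_text):
    """Split text into sections, starting a new section at each Markdown heading."""
    # Zero-width split before any line that starts with 1-6 '#' and a space.
    # Splitting on empty matches requires Python 3.7 or newer.
    parts = re.split(r"(?m)^(?=#{1,6} )", markdown_text)
    return [part.strip() for part in parts if part.strip()]


sample_doc = "# Title\nintro\n\n## A\ntext a\n\n## B\ntext b\n"
sections = split_by_headings(sample_doc)
for section in sections:
    print(repr(section))
```

Each section keeps its heading, so a retrieved chunk carries some indication of where in the document it came from.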

## Embeddings and Dense Vector Search

In [68]:
from langchain_openai.embeddings import OpenAIEmbeddings

embedding_model = OpenAIEmbeddings(model="text-embedding-3-small")

## ❓ Question #2:
What is the embedding dimension, given that we're using text-embedding-3-small?

The text-embedding-3-small model from OpenAI has an embedding dimension of 1536 by default. This means that when you use this model to generate embeddings for a piece of text, the resulting embedding vector will have 1536 dimensions.

## Finding the Embeddings for Our Chunks

In [69]:
embeddings_dict = {}

for chunk in chunks:
  embeddings_dict[chunk] = embedding_model.embed_query(chunk)

In [70]:
for k,v in embeddings_dict.items():
  print(f"Chunk - {k}")
  print("---")
  print(f"Embedding - Vector of Size: {len(v)}")
  print("\n\n")
     

Chunk - LangChain Expression Language or LCEL is a declarative way to easily compose chains together. There are several benefits to writing chains in this manner (as opposed to writing normal code):

Async, Batch, and Streaming Support Any chain constructed this way will automatically have full sync, async, batch, and streaming support. This makes it easy to prototype a chain in a Jupyter notebook using the sync interface, and then expose it as an async streaming interface.
---
Embedding - Vector of Size: 1536



Chunk - Fallbacks The non-determinism of LLMs makes it important to be able to handle errors gracefully. With LCEL you can easily attach fallbacks to any chain.

Parallelism Since LLM applications involve (sometimes long) API calls, it often becomes important to run things in parallel. With LCEL syntax, any components that can be run in parallel automatically are.
---
Embedding - Vector of Size: 1536



Chunk - Seamless LangSmith Tracing Integration As your chains get more and

In [71]:
query = "Can LCEL help take code from the notebook to production?"

query_vector = embedding_model.embed_query(query)
print(f"Vector of Size: {len(query_vector)}")

Vector of Size: 1536


In [72]:
import numpy as np
from numpy.linalg import norm

def cosine_similarity(vec_1, vec_2):
  return np.dot(vec_1, vec_2) / (norm(vec_1) * norm(vec_2))

In [73]:
max_similarity = -float('inf')
closest_chunk = ""

for chunk, chunk_vector in embeddings_dict.items():
  cosine_similarity_score = cosine_similarity(chunk_vector, query_vector)

  if cosine_similarity_score > max_similarity:
    closest_chunk = chunk
    max_similarity = cosine_similarity_score

print(closest_chunk)
print(max_similarity)

LangChain Expression Language or LCEL is a declarative way to easily compose chains together. There are several benefits to writing chains in this manner (as opposed to writing normal code):

Async, Batch, and Streaming Support Any chain constructed this way will automatically have full sync, async, batch, and streaming support. This makes it easy to prototype a chain in a Jupyter notebook using the sync interface, and then expose it as an async streaming interface.
0.537298487051912
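
To make the similarity score above easier to interpret, here is a small worked example of cosine similarity on toy 2-dimensional vectors. It is a pure-Python equivalent of the NumPy version used in the notebook; `cosine_similarity_py` is a hypothetical name chosen to avoid shadowing the notebook's function.

```python
import math


def cosine_similarity_py(vec_1, vec_2):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(a * b for a, b in zip(vec_1, vec_2))
    norm_1 = math.sqrt(sum(a * a for a in vec_1))
    norm_2 = math.sqrt(sum(b * b for b in vec_2))
    return dot / (norm_1 * norm_2)


same_direction = cosine_similarity_py([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity_py([1.0, 0.0], [0.0, 3.0])
diagonal = cosine_similarity_py([1.0, 0.0], [1.0, 1.0])  # 45 degrees apart

print(same_direction, orthogonal, diagonal)
```

Vectors pointing the same way score 1, unrelated (orthogonal) vectors score 0, and the score depends only on direction, not on vector length.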


## Creating a Retriever

In [74]:
def retrieve_context(query, embeddings_dict, embedding_model):
  query_vector = embedding_model.embed_query(query)
  max_similarity = -float('inf')
  closest_chunk = ""

  for chunk, chunk_vector in embeddings_dict.items():
    cosine_similarity_score = cosine_similarity(chunk_vector, query_vector)

    if cosine_similarity_score > max_similarity:
      closest_chunk = chunk
      max_similarity = cosine_similarity_score

  return closest_chunk

In [75]:
def simple_rag(query, embeddings_dict, embedding_model, chat_chain):
  context = retrieve_context(query, embeddings_dict, embedding_model)

  response = chat_chain.invoke({"query" : query, "context" : context})

  return_package = {
      "query" : query,
      "response" : response,
      "retriever_context" : context
  }

  return return_package

In [76]:
simple_rag("Can LCEL help take code from the notebook to production?", embeddings_dict, embedding_model, chat_chain)


{'query': 'Can LCEL help take code from the notebook to production?',
 'response': AIMessage(content='Yes, LCEL can help take code from the notebook to production. By writing chains in LCEL, you can easily prototype a chain in a Jupyter notebook using the sync interface and then expose it as an async streaming interface. This provides a seamless transition from development in a notebook environment to production-ready code with full sync, async, batch, and streaming support.', response_metadata={'token_usage': {'completion_tokens': 73, 'prompt_tokens': 152, 'total_tokens': 225}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_d576307f90', 'finish_reason': 'stop', 'logprobs': None}, id='run-b9e3552e-0afd-436c-8fb8-f2ec361a8065-0', usage_metadata={'input_tokens': 152, 'output_tokens': 73, 'total_tokens': 225}),
 'retriever_context': 'LangChain Expression Language or LCEL is a declarative way to easily compose chains together. There are several benefits to writing chains in t

## ❓ Question #3:
What does LCEL do that makes it more reliable at scale?


In [77]:
simple_rag("What does LCEL do that makes it more reliable at scale?", embeddings_dict, embedding_model, chat_chain)

{'query': 'What does LCEL do that makes it more reliable at scale?',
 'response': AIMessage(content='LCEL makes chains more reliable at scale by providing full support for synchronous, asynchronous, batch, and streaming operations. This flexibility allows for easier prototyping and ensures that the chains can handle various types of workloads efficiently, making them more robust and scalable.', response_metadata={'token_usage': {'completion_tokens': 50, 'prompt_tokens': 153, 'total_tokens': 203}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_d576307f90', 'finish_reason': 'stop', 'logprobs': None}, id='run-33b43168-0dbf-45c6-8ebe-e886c9b26f7b-0', usage_metadata={'input_tokens': 153, 'output_tokens': 50, 'total_tokens': 203}),
 'retriever_context': 'LangChain Expression Language or LCEL is a declarative way to easily compose chains together. There are several benefits to writing chains in this manner (as opposed to writing normal code):\n\nAsync, Batch, and Streaming Suppo
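
In the run above, the retrieved context is the first chunk, while the paragraph on fallbacks sits in a different chunk, so the model could not draw on it. One possible extension is to return the top `k` chunks instead of a single one. The sketch below uses toy 3-dimensional vectors in place of real embeddings; `retrieve_top_k`, `cosine`, and `toy_embeddings` are hypothetical names for illustration.

```python
import heapq
import math


def cosine(vec_1, vec_2):
    dot = sum(a * b for a, b in zip(vec_1, vec_2))
    return dot / (math.hypot(*vec_1) * math.hypot(*vec_2))


def retrieve_top_k(query_vector, embeddings, k=2):
    """Return the k chunks whose vectors are most similar to the query vector."""
    scored = [(cosine(vec, query_vector), chunk) for chunk, vec in embeddings.items()]
    best = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    return [chunk for _, chunk in best]


# Toy vectors standing in for real embeddings of three chunks.
toy_embeddings = {
    "streaming": [1.0, 0.0, 0.0],
    "fallbacks": [0.6, 0.8, 0.0],
    "tracing": [0.0, 0.0, 1.0],
}
top_chunks = retrieve_top_k([0.8, 0.6, 0.0], toy_embeddings, k=2)
print(top_chunks)
```

The selected chunks could then be joined into a single context string before being passed to the chain, at the cost of a longer prompt.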

## Task #5: Create a Simple RAG Application Using Qdrant, OpenAI, and LCEL

## LangChain Powered RAG


## Data Collection

In [78]:
from langchain_community.document_loaders import PyMuPDFLoader

docs = PyMuPDFLoader("https://singjupost.com/wp-content/uploads/2014/07/Steve-Jobs-iPhone-2007-Presentation-Full-Transcript.pdf").load()
     

## Chunking Our Documents

In [79]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 200,
    chunk_overlap = 0,
    length_function = tiktoken_len,
)

split_chunks = text_splitter.split_documents(docs)

In [80]:
len(split_chunks)


86

In [81]:
max_chunk_length = 0

for chunk in split_chunks:
  max_chunk_length = max(max_chunk_length, tiktoken_len(chunk.page_content))

print(max_chunk_length)

197


## Embeddings and Vector Storage

In [82]:
from langchain_community.vectorstores import Qdrant

qdrant_vectorstore = Qdrant.from_documents(
    split_chunks,
    embedding_model,
    location=":memory:",
    collection_name="Steve Job's Speech",
)

In [83]:
qdrant_retriever = qdrant_vectorstore.as_retriever()

## Setting up our RAG

## Activity #2:


In [84]:
RAG_PROMPT = """
CONTEXT:
{context}

QUERY:
{question}

Use the provided context to answer the provided user query. Only use the provided context to answer the query. If you do not know the answer, respond with "I don't know"
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_PROMPT)

## Our RAG Chain

In [85]:
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

retrieval_augmented_qa_chain = (
    # INVOKE CHAIN WITH: {"question" : "<>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into qdrant_retriever
    {"context": itemgetter("question") | qdrant_retriever, "question": itemgetter("question")}
    # RunnablePassthrough.assign forwards the incoming dict and (re)sets "context"
    #              to the value of the "context" key from the previous step
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | openai_chat_model, "context": itemgetter("context")}
)
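
The data flow through the three stages of this chain can be traced with plain dictionaries. The sketch below mirrors only the shape of the inputs and outputs; `fake_retriever`, `fake_llm`, `PROMPT`, and `run_chain` are hypothetical stand-ins, and no LangChain, Qdrant, or OpenAI calls are made.

```python
def fake_retriever(question):
    """Hypothetical stand-in for qdrant_retriever: returns a list of 'documents'."""
    return [f"doc about: {question}"]


def fake_llm(prompt_text):
    """Hypothetical stand-in for the chat model."""
    return f"answer based on {len(prompt_text)} prompt characters"


PROMPT = "CONTEXT:\n{context}\n\nQUERY:\n{question}\n"


def run_chain(inputs):
    # Stage 1: build {"context", "question"} from the incoming "question".
    stage_1 = {
        "context": fake_retriever(inputs["question"]),
        "question": inputs["question"],
    }
    # Stage 2: keep every key and (re)set "context", like an assign step.
    stage_2 = {**stage_1, "context": stage_1["context"]}
    # Stage 3: format the prompt, call the model, and return the response
    # alongside the context that was used to produce it.
    prompt_text = PROMPT.format(**stage_2)
    return {"response": fake_llm(prompt_text), "context": stage_2["context"]}


result = run_chain({"question": "What is the iPhone?"})
print(sorted(result))
```

Returning the context next to the response is what later makes it possible to inspect which chunks the answer was grounded in.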

In [86]:
!pip install -qU grandalf



In [87]:
print(retrieval_augmented_qa_chain.get_graph().draw_ascii())


                       +---------------------------------+                         
                       | Parallel<context,question>Input |                         
                       +---------------------------------+                         
                           *****                   ****                            
                        ***                            ****                        
                     ***                                   ****                    
+--------------------------------+                             **                  
| Lambda(itemgetter('question')) |                              *                  
+--------------------------------+                              *                  
                 *                                              *                  
                 *                                              *                  
                 *                                              *           

In [88]:
response = retrieval_augmented_qa_chain.invoke({"question" : "What is the most important thing about the iPhone?"})


In [89]:
response["response"].content


'The most important thing about the iPhone, as highlighted in the provided context, is its comprehensive and advanced design that integrates various high technologies. This includes features such as a multi-touch screen, miniaturization, custom silicon, power management, OSX inside a mobile device, advanced sensors, desktop-class applications, and its function as a widescreen video iPod. The iPhone is described as the "ultimate digital device" that can hold your life in your pocket, representing significant innovation in mobile technology.\n\n'

In [90]:
for context in response["context"]:
  print("Context:")
  print(context)
  print("----")

Context:
page_content='of the art in every facet of this design. So let me just talk a little bit about it here. We’ve got\nthe multi-touch screen. A first. Miniaturization, more than any we’ve done before. A lot of\ncustom silicon. Tremendous power management. OSX inside a mobile device. Featherweight\nprecision enclosures. Three advanced sensors. Desktop class applications, and of course, the\nwidescreen video iPod. We’ve been innovating like crazy for the last few years on this, and\nwe filed for over 200 patents for all the inventions in iPhone, and we intend to protect them.\nSo, a lot of high technology. I think we’re advancing the state of the art in every aspect of\nthis design. So iPhone is like having your life in your pocket. It’s the ultimate digital device.' metadata={'source': 'https://singjupost.com/wp-content/uploads/2014/07/Steve-Jobs-iPhone-2007-Presentation-Full-Transcript.pdf', 'file_path': 'https://singjupost.com/wp-content/uploads/2014/07/Steve-Jobs-iPhone-2007-Pr

In [91]:
response = retrieval_augmented_qa_chain.invoke({"question" : "What is the airspeed velocity of an unladen swallow?"})


In [92]:
response["response"].content


"I don't know."

## ❓ Question #4:
What key innovations did the iPhone introduce?

In [93]:
response = retrieval_augmented_qa_chain.invoke({"question" : "What key innovations did the iPhone introduce?"})
response["response"].content


'The iPhone introduced several key innovations according to the provided context:\n\n1. **Combination of Multiple Devices**: It combined three devices into one — a widescreen iPod with touch controls, a revolutionary mobile phone, and a breakthrough Internet communications device.\n   \n2. **Touch Controls**: It featured touch controls that allowed users to interact with their music in new ways.\n\n3. **Ease of Making Calls**: It simplified making calls by allowing users to sync their contacts from their PC or Mac, making it easier to access and use contact information.\n\n4. **Multi-Touch Screen**: It included a multi-touch screen, which was a first in mobile devices.\n\n5. **Miniaturization and Custom Silicon**: The iPhone featured advanced miniaturization and custom silicon for better performance and power management.\n\n6. **OSX in a Mobile Device**: It incorporated OSX, providing a powerful operating system in a mobile device.\n\n7. **Advanced Sensors**: The iPhone had three advan