# Introduction to LangChain v0.1.0 and LCEL: LangChain Powered RAG

In the following notebook we're going to focus on learning how to navigate and build useful applications using LangChain, specifically LCEL, and how to integrate different APIs together into a coherent RAG application!

In the notebook, you'll complete the following Tasks:

- 🤝 Breakout Room #1:
  1. Install required libraries
  2. Set Environment Variables
  3. Initialize a Simple Chain using LCEL
  4. Implement Naive RAG using LCEL

- 🤝 Breakout Room #2:
  1. Create a Simple RAG Application Using Qdrant, OpenAI, and LCEL

Let's get started!



# 🤝 Breakout Room #1

## Task 1: Installing Required Libraries

One of the [key features](https://blog.langchain.dev/langchain-v0-1-0/) of LangChain v0.1.0 is the compartmentalization of the various LangChain ecosystem packages.

Instead of one all-encompassing Python package, LangChain now has a `core` package (`langchain-core`) and a number of supplementary packages (such as `langchain-community` and `langchain-openai`).

We'll start by grabbing all of our LangChain related packages!

In [2]:
!pip install -qU langchain langchain-core langchain-community langchain-openai

Now we can get our Qdrant dependencies!

In [3]:
!pip install -qU qdrant-client

Let's finally get `tiktoken` and `pymupdf` so we can leverage them later on!

In [4]:
!pip install -qU tiktoken pymupdf

## Task 2: Set Environment Variables

We'll be leveraging OpenAI's suite of APIs - so we'll set our `OPENAI_API_KEY` `env` variable here!

In [5]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

OpenAI API Key: ········


## Task 3: Initialize a Simple Chain using LCEL

##### LCEL: LangChain Expression Language

The first thing we'll do is familiarize ourselves with LCEL and the specific ins and outs of how we can use it!

### LLM Orchestration Tool (LangChain)

Let's dive right into [LangChain](https://www.langchain.com/)!

The first thing we want to do is create an object that lets us access OpenAI's `gpt-3.5-turbo` model.

In [6]:
from langchain_openai import ChatOpenAI

openai_chat_model = ChatOpenAI(model="gpt-3.5-turbo")

#### ❓ Question #1:
What specific model are we using when we point to `gpt-3.5-turbo`?

> HINT: Check out [this page](https://platform.openai.com/docs/models/gpt-3-5-turbo) to find the answer!

#### 👍🏼 Answer #1:

##### gpt-3.5-turbo-0125

### Prompt Template

Now, we'll set up a prompt template - more specifically a `ChatPromptTemplate`. This will let us build a prompt we can modify when we call our LLM!

In [7]:
from langchain_core.prompts import ChatPromptTemplate

system_template = "You are a legendary and mythical Wizard. You speak in riddles and make obscure and pun-filled references to exotic cheeses."
human_template = "{content}"

chat_prompt = ChatPromptTemplate.from_messages([
    ("system", system_template),
    ("human", human_template)
])

### Our First Chain

Now we can set up our first chain!

At its simplest, a chain is two components that feed directly into each other in sequence: the output of the first becomes the input of the second!

You'll notice that we're using the pipe operator `|` to connect our `chat_prompt` to our `openai_chat_model`.

This is a simplified method of creating chains and it leverages the LangChain Expression Language, or LCEL.

You can read more about it [here](https://python.langchain.com/docs/expression_language/), but there are a few features we should be aware of out of the box (taken directly from LangChain's documentation linked above):

- **Async, Batch, and Streaming Support** Any chain constructed this way will automatically have full sync, async, batch, and streaming support. This makes it easy to prototype a chain in a Jupyter notebook using the sync interface, and then expose it as an async streaming interface.

- **Fallbacks** The non-determinism of LLMs makes it important to be able to handle errors gracefully. With LCEL you can easily attach fallbacks to any chain.

- **Parallelism** Since LLM applications involve (sometimes long) API calls, it often becomes important to run things in parallel. With LCEL syntax, any components that can be run in parallel automatically are.
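The **Fallbacks** idea can be illustrated without LangChain at all. The sketch below is a hypothetical plain-Python helper (not LangChain's implementation): it tries a primary callable and, if that raises, moves on to each fallback in order. In LangChain itself, runnables expose this behavior through a `.with_fallbacks(...)` method; `flaky_model` and `backup_model` here are made-up stand-ins for real model calls.

```python
from typing import Callable, Optional, Sequence


def with_fallbacks(
    primary: Callable[[str], str],
    fallbacks: Sequence[Callable[[str], str]],
) -> Callable[[str], str]:
    """Hypothetical helper: wrap `primary` so each fallback is tried in order on error."""

    def run(prompt: str) -> str:
        last_error: Optional[Exception] = None
        for candidate in (primary, *fallbacks):
            try:
                return candidate(prompt)
            except Exception as err:  # remember the failure, then try the next candidate
                last_error = err
        if last_error is None:  # unreachable in practice: primary always runs once
            raise RuntimeError("no candidate was called")
        raise last_error  # every candidate failed: surface the last error seen

    return run


def flaky_model(prompt: str) -> str:
    # stand-in for an API call that is currently failing
    raise RuntimeError("rate limited")


def backup_model(prompt: str) -> str:
    # stand-in for a cheaper or differently hosted model
    return f"backup answer to: {prompt}"


robust_model = with_fallbacks(flaky_model, [backup_model])
print(robust_model("Hello world!"))  # backup answer to: Hello world!
```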

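In the following code cells we have two components:

- `chat_prompt`, which is a formattable `ChatPromptTemplate` that contains a system message and a human message.
- `openai_chat_model`, which is the `ChatOpenAI` object pointing at `gpt-3.5-turbo` that we created above.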

We'd like to be able to pass our own `content` (as found in our `human_template`) and then have the resulting message pair sent to our model and responded to!

In [11]:
import pprint as p
p.pprint(chat_prompt)
print()
p.pprint(openai_chat_model)

ChatPromptTemplate(input_variables=['content'], messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='You are a legendary and mythical Wizard. You speak in riddles and make obscure and pun-filled references to exotic cheeses.')), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['content'], template='{content}'))])

ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x115e5a410>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x115f9db10>, openai_api_key=SecretStr('**********'), openai_proxy='')


In [8]:
chain = chat_prompt | openai_chat_model

Notice the pattern here:

We invoke our chain with the `dict` `{"content" : "Hello world!"}`.

It enters our chain:

`{"content" : "Hello world!"}` -> `invoke()` -> `chat_prompt`

Our `chat_prompt` returns a `PromptValue`, which is the formatted prompt. We then "pipe" the output of our `chat_prompt` into our `openai_chat_model`.

`PromptValue` -> `|` -> `openai_chat_model`

Our `openai_chat_model` then takes the formatted messages and returns an `AIMessage` (not a plain `str`); the text of the response lives in its `.content` attribute!
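To make the `|` pattern less magical, here is a hypothetical toy version of it in plain Python. It is *not* how LangChain is implemented, only an illustration of the idea: overloading the `|` operator (`__or__`) so that `a | b` builds a new step that feeds `a`'s output into `b`. The `Step` class, `fake_prompt`, and `fake_model` are made-up stand-ins for a runnable, `chat_prompt`, and `openai_chat_model`.

```python
from typing import Any, Callable, List


class Step:
    """Hypothetical stand-in for a LangChain runnable: wraps a single function."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # `a | b` returns a new Step that feeds a's output into b
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x: Any) -> Any:
        return self.fn(x)

    def batch(self, xs: List[Any]) -> List[Any]:
        # naive sequential batch; LCEL's own `batch` can run calls concurrently
        return [self.invoke(x) for x in xs]


# toy "prompt": dict -> formatted string (standing in for a PromptValue)
fake_prompt = Step(lambda d: f"Human: {d['content']}")
# toy "model": string -> reply (standing in for ChatOpenAI)
fake_model = Step(lambda p: f"Echo of <{p}>")

toy_chain = fake_prompt | fake_model
print(toy_chain.invoke({"content": "Hello world!"}))  # Echo of <Human: Hello world!>
```

Because every `Step` shares the same `invoke`/`batch` interface, anything built with `|` gets those methods for free, which is the same reason every LCEL chain supports sync, batch, and streaming calls.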







In [12]:
print(chain.invoke({"content": "Hello world!"}))

content='Greetings, traveler in this digital realm,\nLike the elusive and rare Roquefort at the helm.\nWhat mysteries do you seek to unfurl?\nI am the Wizard of words, the Sage of Gouda and Pearl.' response_metadata={'token_usage': {'completion_tokens': 45, 'prompt_tokens': 38, 'total_tokens': 83}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_c2295e73ad', 'finish_reason': 'stop', 'logprobs': None} id='run-c31e5d84-f783-4cb5-a011-2c134d9577bd-0'


Let's try it out with a different prompt!

In [13]:
res = chain.invoke({"content" : "Could I please have some advice on how to become a better Python Programmer?"})

In [17]:
print(res.content)

Ah, young apprentice seeking wisdom in the ways of Python programming, listen closely to my cheesy advice. To master the art of Python, you must be as versatile as a wheel of Gouda, as sharp as a wedge of Parmesan, and as mysterious as a block of Roquefort.

First, remember the importance of practice, for practice makes perfect, just like the aging process of a fine cheddar. Dive deep into the Pythonic waters, explore its libraries and modules like a connoisseur sampling different varieties of Brie.

Second, embrace the Zen of Python, for it holds the key to enlightenment in your coding journey. Let the principles guide you like the subtle flavors of a Camembert, balancing complexity and simplicity in perfect harmony.

Lastly, seek out the wisdom of seasoned Python wizards, learn from their experiences like a young cheese maturing under the watchful eye of a master affineur. Collaborate with others, share your knowledge like a platter of assorted cheeses at a grand feast.

Remember, yo

Notice how we specifically referenced our `content` format option!

Now that we have the basics set up - let's see what we mean by "Retrieval Augmented" Generation.

## Naive RAG - Manually Adding Context

Let's look at how our model performs at a simple task - defining what LangChain is!

We'll redo some of our previous work to change the `system_template` to be less...verbose.

In [18]:
system_template = "You are a helpful assistant."
human_template = "{content}"

chat_prompt = ChatPromptTemplate.from_messages([
    ("system", system_template),
    ("human", human_template)
])

chat_chain = chat_prompt | openai_chat_model

print(chat_chain.invoke({"content" : "Please define LangChain."}))

content='LangChain is a multilingual blockchain platform that aims to bridge language barriers by providing translation services for various languages within the blockchain ecosystem. It leverages blockchain technology to offer secure and transparent translation services, enabling users to communicate and interact across different languages more easily.' response_metadata={'token_usage': {'completion_tokens': 51, 'prompt_tokens': 22, 'total_tokens': 73}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_c2295e73ad', 'finish_reason': 'stop', 'logprobs': None} id='run-a324427d-246f-4d18-a37d-1a8ac124a30d-0'


Well, that's not very good - is it!

The issue at play here is that our model was not trained on the idea of "LangChain", and so it's left with nothing but a guess - definitely not what we want the answer to be!

Let's ask another simple LangChain question!

In [19]:
print(chat_chain.invoke({"content" : "What is LangChain Expression Language (LECL)?"}))

content='LangChain Expression Language (LECL) is a domain-specific language developed by LangChain that is used for creating smart contracts on blockchain platforms. LECL allows developers to write complex business logic and rules in a simple and readable manner, making it easier to implement and deploy smart contracts efficiently. It is specifically designed to handle complex financial transactions and calculations on blockchain networks.' response_metadata={'token_usage': {'completion_tokens': 72, 'prompt_tokens': 27, 'total_tokens': 99}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_c2295e73ad', 'finish_reason': 'stop', 'logprobs': None} id='run-c2ef7613-72ca-454f-af92-9913c003fe17-0'


While it provides a confident response, that response is entirely fictitious! Not a great look, OpenAI!

However, let's see what happens when we rework our prompts and add the content from the docs to our prompt as context.

In [12]:
HUMAN_TEMPLATE = """
CONTEXT:
{context}

QUERY:
{query}

Use the provided context to answer the provided user query. Only use the provided context to answer the query. If you do not know the answer, respond with "I don't know"
"""

CONTEXT = """
LangChain Expression Language or LCEL is a declarative way to easily compose chains together. 
There are several benefits to writing chains in this manner (as opposed to writing normal code):

Async, Batch, and Streaming Support Any chain constructed this way will automatically have full sync, async, batch, and streaming support. This makes it easy to prototype a chain in a Jupyter notebook using the sync interface, and then expose it as an async streaming interface.

Fallbacks The non-determinism of LLMs makes it important to be able to handle errors gracefully. With LCEL you can easily attach fallbacks to any chain.

Parallelism Since LLM applications involve (sometimes long) API calls, it often becomes important to run things in parallel. With LCEL syntax, any components that can be run in parallel automatically are.

Seamless LangSmith Tracing Integration As your chains get more and more complex, it becomes increasingly important to understand what exactly is happening at every step. With LCEL, all steps are automatically logged to LangSmith for maximal observability and debuggability.
"""

chat_prompt = ChatPromptTemplate.from_messages([
    ("human", HUMAN_TEMPLATE)
])

chat_chain = chat_prompt | openai_chat_model

print(chat_chain.invoke({"query" : "What is LangChain Expression Language?", "context" : CONTEXT}))

content='LangChain Expression Language (LCEL) is a declarative way to easily compose chains together. It allows for the creation of chains with async, batch, and streaming support, as well as the ability to handle errors gracefully with fallbacks. LCEL also facilitates parallelism by allowing components to run in parallel when possible. Additionally, it seamlessly integrates with LangSmith Tracing for enhanced observability and debuggability.' response_metadata={'token_usage': {'completion_tokens': 83, 'prompt_tokens': 274, 'total_tokens': 357}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_d9767fc5b9', 'finish_reason': 'stop', 'logprobs': None} id='run-b9a19a74-39fd-4af0-b974-cb97e116431e-0'


In [34]:
HUMAN_TEMPLATE = """
CONTEXT:
{context}

QUERY:
{query}

Use the provided context to answer the provided user query.
Only use the provided context to answer the query.
If you do not know the answer, respond with "I don't know"
"""

CONTEXT = """
LangChain Expression Language or LCEL is a declarative way to easily compose chains together. There are several benefits to writing chains in this manner (as opposed to writing normal code):

Async, Batch, and Streaming Support Any chain constructed 
this way will automatically have full sync, async, batch, and streaming support. 
This makes it easy to prototype a chain in a Jupyter notebook using the sync interface, 
and then expose it as an async streaming interface.

Fallbacks The non-determinism of LLMs makes it important to be able to handle errors gracefully. 
With LCEL you can easily attach fallbacks to any chain.

Parallelism Since LLM applications involve (sometimes long) API calls, 
it often becomes important to run things in parallel. 
With LCEL syntax, any components that can be run in parallel automatically are.

Seamless LangSmith Tracing Integration. 
As your chains get more and more complex, 
it becomes increasingly important to understand what exactly is happening at every step. 
With LCEL, all steps are automatically logged to LangSmith for maximal observability and debuggability.
"""

chat_prompt = ChatPromptTemplate.from_messages([
    ("human", HUMAN_TEMPLATE)
])

chat_chain = chat_prompt | openai_chat_model

In [25]:
print(chat_chain.invoke({"query" : "What is LangChain Expression Language?", "context" : CONTEXT}))

content='LangChain Expression Language (LCEL) is a declarative way to easily compose chains together, offering benefits such as async, batch, and streaming support, fallback handling, parallelism, and seamless integration with LangSmith Tracing for enhanced observability and debuggability.' response_metadata={'token_usage': {'completion_tokens': 54, 'prompt_tokens': 285, 'total_tokens': 339}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_c2295e73ad', 'finish_reason': 'stop', 'logprobs': None} id='run-42fdb27b-6da1-4645-9dfa-d14c159faeb6-0'


You'll notice that the response is much better this time. Not only does it answer the question well - but there's no trace of confabulation (hallucination) at all!

> NOTE: While RAG is an effective strategy to *help* ground LLMs, it is far from 100% effective. You will still need to verify that your responses are factual through other processes.

That, in essence, is the idea of RAG. We provide the model with context to answer our queries - and rely on it to translate the potentially lengthy and difficult to parse context into a natural language answer!
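Stripped of the LangChain wrapper, the "augmentation" step is just string formatting. As a rough sketch (assuming plain `str.format`, which mirrors the default f-string style `ChatPromptTemplate` uses for `{context}` and `{query}`, but skips wrapping the result in a human message), the text the model receives looks like this. `RAG_TEMPLATE` and `build_rag_prompt` are hypothetical names.

```python
# Same layout as the notebook's HUMAN_TEMPLATE (with its typos fixed)
RAG_TEMPLATE = """CONTEXT:
{context}

QUERY:
{query}

Use the provided context to answer the provided user query. Only use the provided context to answer the query. If you do not know the answer, respond with "I don't know"
"""


def build_rag_prompt(query: str, context: str) -> str:
    """Fill the template's placeholders; the result is what gets sent to the model."""
    return RAG_TEMPLATE.format(context=context.strip(), query=query.strip())


prompt_text = build_rag_prompt(
    "What is LangChain Expression Language?",
    "LangChain Expression Language or LCEL is a declarative way to easily compose chains together.",
)
print(prompt_text)
```

Swapping in a different `context` string is all it takes to change what the model grounds its answer on, which is exactly the job a retrieval step automates.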

However, manually providing context is not scalable - and it doesn't offer much benefit, since we'd need to already know which context answers the question before asking it.

Enter: Retrieval Pipelines.

## Task 4: Implement Naive RAG using LCEL

Now we can make a naive RAG application that will help us bridge the gap between our Pythonic implementation and a fully LangChain powered solution!

## Putting the R in RAG: Retrieval 101

In order to make our RAG system useful, we need a way to find the context that is most likely to answer our user's query and provide it to the LLM.

Let's tackle an immediate problem first: The Context Window.

Most (if not all) LLMs have a limited context window, measured in tokens. This window is an upper bound on how much text we can fit into the model's input at a time.

Let's say we want to work off of a relatively large piece of source data - like the Ultimate Hitchhiker's Guide to the Galaxy. All 898 pages of it!

In [26]:
# Placeholder string standing in for the full text of every Hitchhiker's Guide book
context = """
EVERY HITCHHIKER'S GUIDE BOOK
"""

We can leverage our tokenizer to count the number of tokens for us!

In [27]:
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

In [28]:
len(enc.encode(context))

11

The placeholder above is only 11 tokens, but the full set comes in at a whopping *636,144* tokens.

So, we have too much context. What can we do?

Well, the first thing that might enter your mind is: "Use a model with a larger context window", and we could definitely do that! However, even `gpt-4-32k` wouldn't be able to fit that whole text in its context window at once.
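A quick back-of-the-envelope check shows how far off we are. This assumes a 32,768-token window for `gpt-4-32k` and ignores the tokens we'd also need for instructions and the model's reply, so the real picture is slightly worse.

```python
import math

FULL_TEXT_TOKENS = 636_144  # token count of the full set, from the text above
GPT4_32K_WINDOW = 32_768    # context window of gpt-4-32k, in tokens

overflow_ratio = FULL_TEXT_TOKENS / GPT4_32K_WINDOW
windows_needed = math.ceil(overflow_ratio)

print(f"{overflow_ratio:.1f}x the window; at least {windows_needed} full windows needed")
# 19.4x the window; at least 20 full windows needed
```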

So, we can try splitting our document up into little pieces - that way, we can avoid providing too much context.

We have another problem now.

If we split our document up into little pieces and we can't put all of them in the prompt, how do we decide which ones to include?!

> NOTE: Content splitting/chunking strategies are an active area of research and iterative development. There is no "one size fits all" approach to chunking/splitting at this moment. Use your best judgement to determine chunking strategies!

In order to conceptualize the following processes - let's create a toy context set!

### TextSplitting aka Chunking

We'll use the `RecursiveCharacterTextSplitter` to create our toy example.

It will split based on the following rules:

- Each chunk has a maximum size of 100 tokens (measured with our `tiktoken_len` function)
- There is no overlap between consecutive chunks (`chunk_overlap = 0`)
- It will try to split first on `\n\n`, then on `\n`, then on the `<SPACE>` character, and finally on individual characters.

Let's implement it and see the results!

In [29]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

def tiktoken_len(text):
    tokens = tiktoken.encoding_for_model("gpt-3.5-turbo").encode(
        text,
    )
    return len(tokens)

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 100,
    chunk_overlap = 0,
    length_function = tiktoken_len,
)

In [30]:
chunks = text_splitter.split_text(CONTEXT)

In [31]:
chunks

['LangChain Expression Language or LCEL is a declarative way to easily compose chains together. There are several benefits to writing chains in this manner (as opposed to writing normal code):\n\nAsync, Batch, and Streaming Support Any chain constructed \nthis way will automatically have full sync, async, batch, and streaming support. \nThis makes it easy to prototype a chain in a Jupyter notebook using the sync interface, \nand then expose it as an async streaming interface.',
 'Fallbacks The non-determinism of LLMs makes it important to be able to handle errors gracefully. \nWith LCEL you can easily attach fallbacks to any chain.\n\nParallelism Since LLM applications involve (sometimes long) API calls, \nit often becomes important to run things in parallel. \nWith LCEL syntax, any components that can be run in parallel automatically are.',
 'Seamless LangSmith Tracing Integration. \nAs your chains get more and more complex, \nit becomes increasingly important to understand what exact

In [32]:
len(chunks)

3

In [33]:
for chunk in chunks:
  print(chunk)
  print("----")

LangChain Expression Language or LCEL is a declarative way to easily compose chains together. There are several benefits to writing chains in this manner (as opposed to writing normal code):

Async, Batch, and Streaming Support Any chain constructed 
this way will automatically have full sync, async, batch, and streaming support. 
This makes it easy to prototype a chain in a Jupyter notebook using the sync interface, 
and then expose it as an async streaming interface.
----
Fallbacks The non-determinism of LLMs makes it important to be able to handle errors gracefully. 
With LCEL you can easily attach fallbacks to any chain.

Parallelism Since LLM applications involve (sometimes long) API calls, 
it often becomes important to run things in parallel. 
With LCEL syntax, any components that can be run in parallel automatically are.
----
Seamless LangSmith Tracing Integration. 
As your chains get more and more complex, 
it becomes increasingly important to understand what exactly is happen

As shown in our result, we've split the text into chunks of at most 100 tokens each, with the boundaries falling on `\n\n` characters!
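To see the separator-priority logic in isolation, here is a simplified pure-Python sketch of recursive character splitting. It is an illustration, not LangChain's implementation: the name `recursive_split` is hypothetical, it measures length with `len` (characters) by default instead of our `tiktoken_len`, supports no overlap, and drops the separator at chunk boundaries.

```python
from typing import Callable, List, Sequence

DEFAULT_SEPARATORS = ("\n\n", "\n", " ", "")  # "" means "split into characters"


def recursive_split(
    text: str,
    chunk_size: int,
    separators: Sequence[str] = DEFAULT_SEPARATORS,
    length: Callable[[str], int] = len,
) -> List[str]:
    """Simplified recursive splitting: coarse separators first, finer ones only when needed."""
    if length(text) <= chunk_size:
        return [text] if text else []
    if not separators:  # nothing finer to split on; keep the oversized text as-is
        return [text]

    sep, finer = separators[0], separators[1:]
    pieces = list(text) if sep == "" else text.split(sep)

    chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if length(candidate) <= chunk_size:
            current = candidate  # still fits: keep growing this chunk
            continue
        if current:
            chunks.append(current)  # close the chunk that was filling up
        if length(piece) <= chunk_size:
            current = piece
        else:
            # this piece alone is too big: recurse with the next, finer separator
            chunks.extend(recursive_split(piece, chunk_size, finer, length))
            current = ""
    if current:
        chunks.append(current)
    return chunks


print(recursive_split("aaaa bbbb\n\ncccc dddd", chunk_size=9))  # ['aaaa bbbb', 'cccc dddd']
print(recursive_split("aaaa bbbb\n\ncccc dddd", chunk_size=4))  # ['aaaa', 'bbbb', 'cccc', 'dddd']
print(recursive_split("abcdefgh", chunk_size=3))                # ['abc', 'def', 'gh']
```

Passing `length=tiktoken_len` from the cell above would make `chunk_size` count tokens instead of characters, as in the notebook's splitter.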

#### 🏗️ Activity #1:

While there's nothing specifically wrong with the chunking method used above - it is a naive approach that is not sensitive to specific data formats.

Brainstorm some ideas that would split large single documents into smaller documents.

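1. NLP-based approach (e.g. splitting on sentence or paragraph boundaries)
2. Rule-based segmentation (e.g. splitting on Markdown headings, HTML tags, or page breaks)
3. Machine learning methods (e.g. grouping adjacent sentences by embedding similarity, often called semantic chunking)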
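As a concrete instance of rule-based segmentation, here is a hypothetical splitter that starts a new chunk at every Markdown heading, so each chunk lines up with a section of the document. It's a sketch: it would, for example, misfire on `#` comments inside fenced code blocks. LangChain ships a more complete `MarkdownHeaderTextSplitter` for this job.

```python
import re
from typing import List

HEADING = re.compile(r"#{1,6} ")  # Markdown ATX heading: 1-6 '#' followed by a space


def split_on_markdown_headings(text: str) -> List[str]:
    """Hypothetical rule-based splitter: one chunk per Markdown section."""
    sections: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        # a heading closes the previous section (if any) and opens a new one
        if HEADING.match(line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    return [section for section in sections if section]


doc = "# Intro\nHello\n\n## Setup\npip install\n## Usage\nRun it"
print(split_on_markdown_headings(doc))
# ['# Intro\nHello', '## Setup\npip install', '## Usage\nRun it']
```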

## Embeddings and Dense Vector Search

Now that we have our individual chunks, we need a system to correctly select the relevant pieces of information to answer our query.

This sounds like a perfect job for embeddings!

If you come from an NLP background, embeddings are something you might be intimately familiar with - otherwise, you might find the topic a bit...dense. (this attempt at a joke will make more sense later)

In all seriousness, embeddings are a powerful piece of the NLP puzzle, so let's dive in!

> NOTE: While this notebook is language/NLP-centric, embeddings have uses beyond just text!

### Why Do We Even Need Embeddings?

In order to fully understand what Embeddings are, we first need to understand why we have them!

Machine Learning algorithms, ranging from the very big to the very small, all have one thing in common:

They need numeric inputs.

So we need a process by which to translate the domain we live in, dominated by images, audio, language, and more, into the domain of the machine: Numbers.

Another thing we want to be able to do is capture "semantic information" about words/phrases so that we can use algorithmic approaches to determine if words are closely related or not!

So, we need to come up with a process that does these two things well:

- Convert non-numeric data into numeric data
- Capture potential semantic relationships between individual pieces of data

### How Do Embeddings Capture Semantic Relationships?

In a simplified sense, embeddings map a word or phrase into n-dimensional space with a dense continuous vector, where each dimension in the vector represents some "latent feature" of the data.

This is best represented in a classic example:
<div>
   <img src="https://i.imgur.com/K5eQtmH.png" height="300" width="500"/> 
</div>


As can be seen in the extremely simplified example: The X_1 axis represents age, and the X_2 axis represents hair.

The relationship of "puppy -> dog" reflects the same relationship as "baby -> adult", but dogs are (typically) hairier than humans. However, adults typically have more hair than babies - so they are shifted slightly closer to dogs on the X_2 axis!

Now, this is a simplified and contrived example - but it is *essentially* the mechanism by which embeddings capture semantic information.
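We can reproduce the figure's arithmetic with made-up 2-D vectors, where the first coordinate stands for age and the second for hair. The numbers are hypothetical, chosen only to match the figure's description: "growing up" is the same offset for dogs and humans, and adults sit slightly higher on the hair axis than babies.

```python
from typing import Tuple

Vec2 = Tuple[float, float]

# Hypothetical 2-D "embeddings": (x1 = age, x2 = hair). Values are made up.
toy = {
    "puppy": (1.0, 8.0),
    "dog": (6.0, 9.0),
    "baby": (1.0, 1.0),
    "adult": (6.0, 2.0),
}


def sub(a: Vec2, b: Vec2) -> Vec2:
    return (a[0] - b[0], a[1] - b[1])


def add(a: Vec2, b: Vec2) -> Vec2:
    return (a[0] + b[0], a[1] + b[1])


# "growing up" expressed as a direction in the space
dog_offset = sub(toy["dog"], toy["puppy"])     # (5.0, 1.0)
human_offset = sub(toy["adult"], toy["baby"])  # (5.0, 1.0)
print(dog_offset == human_offset)              # True

# puppy + (adult - baby) lands exactly on dog
print(add(toy["puppy"], human_offset))         # (6.0, 9.0)
```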

In reality, the dimensions don't cleanly represent concrete concepts like "age" or "hair", but this is a useful way to think about how semantic relationships are captured.

Alright, with some background behind us - let's examine how these might help us choose relevant context.

Let's begin with a simple example - looking at how close two embedding vectors are for a given pair of phrases.

When we use the term "close" in this notebook, we're referring to a similarity measure called "cosine similarity".

We discussed above that if two embeddings are close, they are semantically similar. Cosine similarity gives us a quick way to measure how similar two vectors are: it is the cosine of the angle between them.

Cosine similarity ranges from -1 to 1, with 1 meaning the vectors point in the same direction (very similar), 0 meaning they are orthogonal (unrelated), and -1 meaning they point in opposite directions.

Let's implement it with Numpy below.

In [35]:
import numpy as np
from numpy.linalg import norm

def cosine_similarity(vec_1, vec_2):
  return np.dot(vec_1, vec_2) / (norm(vec_1) * norm(vec_2))
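To sanity-check the range described above without calling any API, here is a pure-Python equivalent of the function (the name `cosine_similarity_py` is ours) applied to hand-picked 2-D vectors whose norms are whole numbers, so the results come out exact.

```python
import math
from typing import Sequence


def cosine_similarity_py(vec_1: Sequence[float], vec_2: Sequence[float]) -> float:
    """Pure-Python equivalent of the NumPy version: dot product over product of norms."""
    dot = sum(a * b for a, b in zip(vec_1, vec_2))
    norm_1 = math.sqrt(sum(a * a for a in vec_1))
    norm_2 = math.sqrt(sum(b * b for b in vec_2))
    return dot / (norm_1 * norm_2)


print(cosine_similarity_py([3, 4], [6, 8]))    # 1.0  (same direction)
print(cosine_similarity_py([3, 4], [-4, 3]))   # 0.0  (orthogonal)
print(cosine_similarity_py([3, 4], [-3, -4]))  # -1.0 (opposite direction)
```

Note that `[6, 8]` is just `[3, 4]` doubled and still scores 1.0: cosine similarity compares direction, not magnitude.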

We're going to be using OpenAI's `text-embedding-3-small` today.

In order to choose our embeddings model, we'll refer to the MTEB leaderboard, which can be found [here](https://huggingface.co/spaces/mteb/leaderboard)!

The basic logic is: we sort by our desired task (in this case `Retrieval Average (15 Datasets)`) and pick a model that performs well on it. To keep cost in mind, we'll go with `text-embedding-3-small` over `text-embedding-3-large`: there's only a separation of ~5 points between the two on this task, but the `small` version of the model costs significantly less.

In [36]:
from langchain_openai.embeddings import OpenAIEmbeddings

embedding_model = OpenAIEmbeddings(model="text-embedding-3-small")

Let's grab some vectors and see how they're related!

In [37]:
puppy_vec = embedding_model.embed_query("puppy")
dog_vec = embedding_model.embed_query("dog")

Let's do a quick check to ensure they're all the correct dimension.

#### ❓ Question #2:
 
What is the embedding dimension, given that we're using `text-embedding-3-small`?

> HINT: Check out the [docs](https://platform.openai.com/docs/guides/embeddings) to help you answer this question.

#### 👍🏼 Answer #2:
**1536**

Now, let's see how "puppy" and "dog" are related to each other!

In [38]:
cosine_similarity(puppy_vec, dog_vec)

0.5590390640733376

We can repeat the experiment for things we might expect to be unrelated, as well:



In [39]:
puppy_vec = embedding_model.embed_query("puppy")
ice_vec = embedding_model.embed_query("ice cube")

In [40]:
cosine_similarity(puppy_vec, ice_vec)

0.20365601127332952

As expected, we get a much lower score, indicating the two are largely unrelated!

Great!

Now, let's extend it to our example.

What we want to do is find the phrases most related to our query - so we need to find the dense continuous vector representations for each of the chunks in our corpus, and then compare them against the dense continuous vector representation of our query.

In simpler terms:

Compare the embedding of our query with the embeddings of each of our chunks!

### Finding the Embeddings for Our Chunks

First, let's find all our embeddings for each chunk and store them in a convenient format for later.

In [41]:
embeddings_dict = {}

for chunk in chunks:
  embeddings_dict[chunk] = embedding_model.embed_query(chunk)

In [42]:
for k,v in embeddings_dict.items():
  print(f"Chunk - {k}")
  print("---")
  print(f"Embedding - Vector of Size: {len(v)}")
  print("\n\n")

Chunk - LangChain Expression Language or LCEL is a declarative way to easily compose chains together. There are several benefits to writing chains in this manner (as opposed to writing normal code):

Async, Batch, and Streaming Support Any chain constructed 
this way will automatically have full sync, async, batch, and streaming support. 
This makes it easy to prototype a chain in a Jupyter notebook using the sync interface, 
and then expose it as an async streaming interface.
---
Embedding - Vector of Size: 1536



Chunk - Fallbacks The non-determinism of LLMs makes it important to be able to handle errors gracefully. 
With LCEL you can easily attach fallbacks to any chain.

Parallelism Since LLM applications involve (sometimes long) API calls, 
it often becomes important to run things in parallel. 
With LCEL syntax, any components that can be run in parallel automatically are.
---
Embedding - Vector of Size: 1536



Chunk - Seamless LangSmith Tracing Integration. 
As your chains get 

Okay, great. Let's create a query - and then embed it!

In [43]:
query = "Can LCEL help take code from the notebook to production?"

query_vector = embedding_model.embed_query(query)
print(f"Vector of Size: {len(query_vector)}")

Vector of Size: 1536


Now, let's compare it against each existing chunk's embedding by using cosine similarity.

In [44]:
max_similarity = -float('inf')
closest_chunk = ""

for chunk, chunk_vector in embeddings_dict.items():
  cosine_similarity_score = cosine_similarity(chunk_vector, query_vector)

  if cosine_similarity_score > max_similarity:
    closest_chunk = chunk
    max_similarity = cosine_similarity_score

print(closest_chunk)
print(max_similarity)

LangChain Expression Language or LCEL is a declarative way to easily compose chains together. There are several benefits to writing chains in this manner (as opposed to writing normal code):

Async, Batch, and Streaming Support Any chain constructed 
this way will automatically have full sync, async, batch, and streaming support. 
This makes it easy to prototype a chain in a Jupyter notebook using the sync interface, 
and then expose it as an async streaming interface.
0.5444691351209084


And we get the expected result, which is the passage that specifically mentions prototyping in a Jupyter Notebook!

### Creating a Retriever

Now that we have an idea of how we're getting our most relevant information - let's see how we could create a pipeline that would automatically extract the closest chunk to our query and use it as context for our prompt!

First, we'll wrap the above in a helper function!

In [45]:
def retrieve_context(query, embeddings_dict, embedding_model):
  query_vector = embedding_model.embed_query(query)
  max_similarity = -float('inf')
  closest_chunk = ""

  for chunk, chunk_vector in embeddings_dict.items():
    cosine_similarity_score = cosine_similarity(chunk_vector, query_vector)

    if cosine_similarity_score > max_similarity:
      closest_chunk = chunk
      max_similarity = cosine_similarity_score

  return closest_chunk

Now, let's add it to our pipeline!

In [46]:
def simple_rag(query, embeddings_dict, embedding_model, chat_chain):
  context = retrieve_context(query, embeddings_dict, embedding_model)

  response = chat_chain.invoke({"query" : query, "context" : context})

  return_package = {
      "query" : query,
      "response" : response,
      "retriever_context" : context
  }

  return return_package

In [47]:
res = simple_rag("Can LCEL help take code from the notebook to production?", embeddings_dict, embedding_model, chat_chain)

In [52]:
import pprint as p
p.pprint(res)
print("\n\n")
print(res["response"].content)

{'query': 'Can LCEL help take code from the notebook to production?',
 'response': AIMessage(content='Yes, LCEL can help take code from the notebook to production by providing full sync, async, batch, and streaming support for chains constructed in this way. This makes it easy to prototype a chain in a Jupyter notebook using the sync interface, and then expose it as an async streaming interface for production use.', response_metadata={'token_usage': {'completion_tokens': 63, 'prompt_tokens': 156, 'total_tokens': 219}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_c2295e73ad', 'finish_reason': 'stop', 'logprobs': None}, id='run-2cb8fb1e-50d0-4dbb-8a24-8162461897fa-0'),
 'retriever_context': 'LangChain Expression Language or LCEL is a declarative '
                      'way to easily compose chains together. There are '
                      'several benefits to writing chains in this manner (as '
                      'opposed to writing normal code):\n'
                    

In [53]:
res = simple_rag("What does LCEL do that makes it more reliable at scale?", embeddings_dict, embedding_model, chat_chain)
p.pprint(res)

{'query': 'What does LCEL do that makes it more reliable at scale?',
 'response': AIMessage(content='LCEL provides full sync, async, batch, and streaming support automatically to any chain constructed with it. This makes it more reliable at scale as it can easily handle different types of operations and interfaces, making it versatile and efficient for large-scale applications.', response_metadata={'token_usage': {'completion_tokens': 50, 'prompt_tokens': 157, 'total_tokens': 207}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_c2295e73ad', 'finish_reason': 'stop', 'logprobs': None}, id='run-e793bef5-c4a8-4556-980e-5288200a2005-0'),
 'retriever_context': 'LangChain Expression Language or LCEL is a declarative '
                      'way to easily compose chains together. There are '
                      'several benefits to writing chains in this manner (as '
                      'opposed to writing normal code):\n'
                      '\n'
                      'Async, 

#### ❓ Question #3:

What does LCEL do that makes it more reliable at scale?

> HINT: Use your newly created `simple_rag` to help you answer this question!

#### 👍🏼 Answer #3:

LCEL provides full sync, async, batch, and streaming support automatically for any chain constructed with it. This makes it more reliable at scale because the same chain can be prototyped with the sync interface in a notebook and then exposed through the async, batch, or streaming interfaces in production, without being rewritten.
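LangChain's LCEL documentation also lists retries and fallbacks (configured with `Runnable.with_retry()` and `Runnable.with_fallbacks()`) as a way to make chains more reliable at scale. The following is a minimal plain-Python analogue of what those two wrappers do, not LangChain's implementation; `flaky_llm`, `broken_llm`, and `backup_llm` are hypothetical stand-ins for model calls.

```python
from typing import Callable, Optional, Sequence

Fn = Callable[[str], str]


def with_retry(fn: Fn, attempts: int = 3) -> Fn:
    """Return a wrapper that calls fn up to `attempts` times before giving up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    def wrapped(x: str) -> str:
        last_error: Optional[Exception] = None
        for _ in range(attempts):
            try:
                return fn(x)
            except Exception as err:  # sketch only: real code should catch narrower errors
                last_error = err
        assert last_error is not None
        raise last_error

    return wrapped


def with_fallbacks(primary: Fn, fallbacks: Sequence[Fn]) -> Fn:
    """Return a wrapper that tries primary, then each fallback in order."""

    def wrapped(x: str) -> str:
        last_error: Optional[Exception] = None
        for candidate in (primary, *fallbacks):
            try:
                return candidate(x)
            except Exception as err:
                last_error = err
        assert last_error is not None
        raise last_error

    return wrapped


# Hypothetical stand-ins for model calls
call_count = {"flaky": 0}


def flaky_llm(prompt: str) -> str:
    call_count["flaky"] += 1
    if call_count["flaky"] < 3:  # fail on the first two calls
        raise TimeoutError("simulated timeout")
    return f"answer to: {prompt}"


def broken_llm(prompt: str) -> str:
    raise ConnectionError("simulated outage")


def backup_llm(prompt: str) -> str:
    return "backup answer"


retried_answer = with_retry(flaky_llm, attempts=3)("What is LCEL?")
fallback_answer = with_fallbacks(broken_llm, [backup_llm])("What is LCEL?")
print(retried_answer)
print(fallback_answer)
```

In LangChain itself, the rough equivalents would be something like `openai_chat_model.with_retry(stop_after_attempt=3)` or `primary_chain.with_fallbacks([backup_chain])`, where `primary_chain` and `backup_chain` are hypothetical chains.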

# 🤝 Breakout Room #2

## Task 1: Create a Simple RAG Application Using Qdrant, OpenAI, and LCEL

Now that we have a grasp on how LCEL works, and how we can use LangChain and OpenAI to interact with our data - let's step it up a notch and incorporate Qdrant!

## LangChain Powered RAG

First and foremost, LangChain provides a convenient way to store our chunks and their embeddings.

It's called a `VectorStore`!

We'll be using Qdrant as our `VectorStore` today. You can read more about it [here](https://qdrant.tech/documentation/).

Think of a `VectorStore` as a smart way to house your chunks and their associated embedding vectors. A `VectorStore` implementation also provides smarter, more efficient search over those vectors: the approach we used above compares the query against every stored embedding, which would not scale well once we get into the millions of chunks.

Otherwise, the process remains relatively similar under the hood!
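To see why scaling matters, here is a minimal plain-Python sketch of brute-force similarity search, roughly the kind of full scan our earlier approach performed (assumed; the earlier helper is not shown in this section). Every query is scored against every stored vector, so the work grows linearly with the number of chunks. The texts and 3-dimensional vectors are made up for illustration; real `text-embedding-3-small` vectors have 1536 dimensions.

```python
import heapq
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_search(
    query_vec: Sequence[float], store: Dict[str, List[float]], k: int = 2
) -> List[str]:
    """Score EVERY stored vector against the query: O(N * d) work per query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store.items()]
    return [text for _, text in heapq.nlargest(k, scored)]


# Toy "embeddings" keyed by their chunk text (hypothetical data)
store = {
    "towels are useful": [0.9, 0.1, 0.0],
    "Marvin is depressed": [0.0, 0.8, 0.6],
    "towels signal preparedness": [0.8, 0.2, 0.1],
}

top_chunks = brute_force_search([1.0, 0.0, 0.0], store, k=2)
print(top_chunks)
```

A vector database avoids this full scan (for large collections) by building an index over the vectors, trading a little exactness for much faster lookups.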

Let's use [The Hitchhiker's Guide to the Galaxy](https://www.deyeshigh.co.uk/downloads/literacy/world_book_day/the_hitchhiker_s_guide_to_the_galaxy.pdf) as our data today!

### Data Collection

We'll be leveraging the `PyMuPDFLoader` to load our PDF directly from the web!

In [54]:
from langchain_community.document_loaders import PyMuPDFLoader

docs = PyMuPDFLoader("https://www.deyeshigh.co.uk/downloads/literacy/world_book_day/the_hitchhiker_s_guide_to_the_galaxy.pdf").load()

In [55]:
type(docs)

list

In [65]:
# for doc in docs[:10]:
#     print("-----------------------------")
#     print(doc)

### Chunking Our Documents

Let's do the same process as we did before with our `RecursiveCharacterTextSplitter` - but this time we'll use ~200 tokens as our max chunk size!
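`RecursiveCharacterTextSplitter` tries a list of separators in order (by default paragraph breaks, then line breaks, then spaces, then individual characters) and only falls back to a finer separator when a piece is still too long. Below is a simplified plain-Python analogue, not LangChain's implementation: it has no chunk overlap, stops at spaces rather than characters, and uses a hypothetical `word_count` as a stand-in for `tiktoken_len`.

```python
from typing import Callable, List, Sequence


def recursive_split(
    text: str,
    max_len: int,
    length_fn: Callable[[str], int],
    separators: Sequence[str] = ("\n\n", "\n", " "),
) -> List[str]:
    """Greedily pack pieces up to max_len, recursing to finer separators when needed."""
    if length_fn(text) <= max_len or not separators:
        # Fits already, or no finer separator left (an oversized piece is kept whole)
        return [text]
    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if length_fn(candidate) <= max_len:
            current = candidate  # still fits: keep packing
            continue
        if current:
            chunks.append(current)  # flush the packed chunk
        if length_fn(piece) <= max_len:
            current = piece
        else:
            # This piece alone is too long: split it with the next, finer separator
            chunks.extend(recursive_split(piece, max_len, length_fn, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


def word_count(s: str) -> int:
    """Hypothetical stand-in for tiktoken_len: counts whitespace-separated words."""
    return len(s.split())


sample = "one two three\n\nfour five six seven\n\neight nine ten eleven twelve"
chunks = recursive_split(sample, max_len=4, length_fn=word_count)
print(chunks)
```

Paragraphs that fit are packed together; only the third paragraph, which is too long on its own, gets split further at spaces.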

In [66]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 200,
    chunk_overlap = 0,
    length_function = tiktoken_len,
)

split_chunks = text_splitter.split_documents(docs)

In [67]:
len(split_chunks)

517

In [69]:
split_chunks[0]

Document(page_content="THE HITCHHIKER'S GUIDE TO THE GALAXY \nB Y  D O U G L A S  A D A M S  \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n2 0 0 1  H A N O M A G  \nD O C U M E N T  V E R S I O N  1 . 0", metadata={'source': 'https://www.deyeshigh.co.uk/downloads/literacy/world_book_day/the_hitchhiker_s_guide_to_the_galaxy.pdf', 'file_path': 'https://www.deyeshigh.co.uk/downloads/literacy/world_book_day/the_hitchhiker_s_guide_to_the_galaxy.pdf', 'page': 0, 'total_pages': 227, 'format': 'PDF 1.3', 'title': "Hitchhiker's Guide to the Galaxy", 'author': 'Douglas Adams', 'subject': 'document version 1.0', 'keywords': 'hanomag <01name@iname.com>', 'creator': "Hitchhiker's Guide to the Galaxy.doc (Preview) - Microsoft Word", 'producer': 'Acrobat PDFWriter 4.0 for Windows NT', 'creationDate': 'D:20010213123949', 'modDate': "D:20010213124359-08'00'", 'trapped': '', 'encryption': 'Standard V1 R2 40-bit RC4'})

In [71]:
split_chunks[2]

Document(page_content="for Jonny Brock and Clare Gorst and all other Arlingtonians \nfor tea, sympathy, and a sofa \nFar out in the uncharted backwaters of the unfashionable end of \nthe western spiral arm of the Galaxy lies a small unregarded \nyellow sun. \nOrbiting this at a distance of roughly ninety-two million miles is \nan utterly insignificant little blue green planet whose ape- \ndescended life forms are so amazingly primitive that they still \nthink digital watches are a pretty neat idea. \nThis planet has - or rather had - a problem, which was this: most \nof the people on it were unhappy for pretty much of the time.  \nMany solutions were suggested for this problem, but most of \nthese were largely concerned with the movements of small green \npieces of paper, which is odd because on the whole it wasn't the \nsmall green pieces of paper that were unhappy.", metadata={'source': 'https://www.deyeshigh.co.uk/downloads/literacy/world_book_day/the_hitchhiker_s_guide_to_the_galax

Alright, now we have 517 chunks of at most ~200 tokens each.

Let's verify the process worked as intended by checking the length of our longest chunk, in tokens.

In [72]:
# Length of the longest chunk, measured in tiktoken tokens
max_chunk_length = max(tiktoken_len(chunk.page_content) for chunk in split_chunks)

print(max_chunk_length)

189


Perfect! Now we can carry on to creating and storing our embeddings.

### Embeddings and Vector Storage

We'll use the `text-embedding-3-small` embedding model again - and `Qdrant` to store all our embedding vectors for easy retrieval later!

In [73]:
from langchain_community.vectorstores import Qdrant

qdrant_vectorstore = Qdrant.from_documents(
    split_chunks,
    embedding_model,
    location=":memory:",
    collection_name="Hitchiker's Guide",
)

Now let's set up our retriever, just as we saw before, but this time using LangChain's simple `as_retriever()` method!

In [74]:
qdrant_retriever = qdrant_vectorstore.as_retriever()

#### Back to the Flow

We're ready to move to the next step!

### Setting up our RAG

We'll use the LCEL we touched on earlier to create a RAG chain.

Let's think through each part:

1. First we need to retrieve context
2. We need to pipe that context to our model
3. We need to parse that output
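The three steps above map onto LCEL's `|` operator, which composes runnables so that each one's output becomes the next one's input. Here is a minimal plain-Python sketch of that idea; `Step` is a hypothetical toy class, not LangChain's `Runnable`, and the retriever and model are fakes with hard-coded outputs.

```python
from typing import Any, Callable


class Step:
    """Hypothetical toy stand-in for an LCEL Runnable (not LangChain's class)."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new Step that runs a, then feeds a's output into b
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x: Any) -> Any:
        return self.fn(x)


# 1. Retrieve context (fake retriever with a hard-coded chunk)
retrieve = Step(lambda q: {"question": q, "context": ["A towel is massively useful."]})
# 2. Pipe the context into a prompt, then into a (fake) model
to_prompt = Step(lambda d: f"CONTEXT: {' '.join(d['context'])}\nQUERY: {d['question']}")
fake_model = Step(lambda prompt: {"content": "Towels are useful.", "prompt": prompt})
# 3. Parse the model output down to a plain string
parse = Step(lambda message: message["content"])

chain = retrieve | to_prompt | fake_model | parse
print(chain.invoke("Why carry a towel?"))
```

In real LCEL, step 3 would typically be `StrOutputParser()`, which turns the chat model's `AIMessage` into its text content.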

Let's start by setting up our prompt again, just so it's fresh in our minds!

#### 🏗️ Activity #2:

Complete the prompt so that your RAG application answers queries based on the context provided, but *does not* answer queries if the context is unrelated to the query.

In [76]:
RAG_PROMPT = """
CONTEXT:
{context}

QUERY:
{question}

Use the provided context to answer the user's query. You may not answer the user's query unless there is specific context in the following text. If you do not know the answer, or cannot answer, please respond with "I don't know".
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_PROMPT)
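Filling the template is ordinary string formatting of `{context}` and `{question}`. One thing worth knowing: in our chain the retriever returns a list of `Document` objects, and the template converts that list to a string as-is, so the `Document(...)` wrappers and metadata likely end up in the prompt too. A common alternative is to join just the chunk texts first. The sketch below uses plain strings and a hypothetical `format_docs` helper to show what the model would then see.

```python
from typing import List

RAG_TEMPLATE = """
CONTEXT:
{context}

QUERY:
{question}

Use the provided context to answer the user's query. You may not answer the user's query unless there is specific context in the following text. If you do not know the answer, or cannot answer, please respond with "I don't know".
"""


def format_docs(page_contents: List[str]) -> str:
    """Hypothetical helper: join retrieved chunk texts, separated by blank lines."""
    return "\n\n".join(text.strip() for text in page_contents)


retrieved = [
    "The Hitch Hiker's Guide to the Galaxy has a few things to say on the subject of towels.",
    "Beneath that in Ford Prefect's satchel were a few biros, a notepad, "
    "and a largish bath towel from Marks and Spencer.",
]

prompt_text = RAG_TEMPLATE.format(
    context=format_docs(retrieved),
    question="What is the significance of towels?",
)
print(prompt_text)
```

With real `Document` objects, a helper like this would read `doc.page_content` instead of taking strings.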

#### Our RAG Chain

Notice that this chain is a bit more complex than before - that's because we want to return our sources along with the response.

Let's break down the chain step-by-step:

1. We invoke the chain with a `question` item. We only need to provide `question`, since both the retriever and the `"question"` key depend on it.
  - We also pipe our `"question"` into our `retriever`! This is what ultimately collects the context through Qdrant.
2. We use `RunnablePassthrough.assign()` to pass the previous step's dictionary through to the next step while (re)assigning its `"context"` key. Because `"context"` is assigned its own existing value, this step effectively passes the data through unchanged.
3. We finally collect our response by piping our prompt, which expects both a `"question"` and a `"context"`, into our LLM (`openai_chat_model`). We also collect the `"context"` again so we can output it in the final response object.

The key thing to keep in mind is that the context is retrieved only once, in the first step, and then carried forward as data: the last step uses it to fill the prompt and also returns it alongside the response, without calling the retriever a second time.
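The data flow through the three stages can be sketched in plain Python with dictionaries and `operator.itemgetter`. This is an analogue, not LangChain's execution model: LCEL can run the branches of a dict step concurrently, while this sketch runs them one after another. `fake_retriever` and `fake_prompt_and_llm` are hypothetical stand-ins for `qdrant_retriever` and `rag_prompt | openai_chat_model`.

```python
from operator import itemgetter
from typing import Any, Dict, List

retriever_calls: List[str] = []


def fake_retriever(question: str) -> List[str]:
    """Hypothetical stand-in for qdrant_retriever; records each call."""
    retriever_calls.append(question)
    return [f"chunk about: {question}"]


def fake_prompt_and_llm(inputs: Dict[str, Any]) -> str:
    """Hypothetical stand-in for rag_prompt | openai_chat_model."""
    return f"answer to {inputs['question']!r} using {len(inputs['context'])} chunk(s)"


def run_chain(inputs: Dict[str, Any]) -> Dict[str, Any]:
    # Step 1 (dict of runnables): both branches read "question" from the original input
    step1 = {
        "context": fake_retriever(itemgetter("question")(inputs)),
        "question": itemgetter("question")(inputs),
    }
    # Step 2 (like RunnablePassthrough.assign): keep every key, then (re)set "context"
    step2 = {**step1, "context": itemgetter("context")(step1)}
    # Step 3 (dict of runnables): answer from step2 and carry "context" forward as-is
    return {
        "response": fake_prompt_and_llm(step2),
        "context": itemgetter("context")(step2),
    }


result = run_chain({"question": "Why carry a towel?"})
print(result)
```

Note that `retriever_calls` ends up with a single entry: the context is fetched once and reused for both the prompt and the returned sources.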

In [97]:
rag_prompt

ChatPromptTemplate(input_variables=['context', 'question'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template='\nCONTEXT:\n{context}\n\nQUERY:\n{question}\n\nUse the provided context to answer the user\'s query. You may not answer the user\'s query unless there is specific context in the following text. If you do not know the answer, or cannot answer, please respond with "I don\'t know".\n'))])

In [111]:
from operator import itemgetter
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough

retrieval_augmented_qa_chain = (
    {"context": itemgetter("question") | qdrant_retriever, 
     "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | openai_chat_model, 
       "context": itemgetter("context")}
)

In [112]:
retrieval_augmented_qa_chain

{
  context: RunnableLambda(itemgetter('question'))
           | VectorStoreRetriever(tags=['Qdrant', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.qdrant.Qdrant object at 0x131d37450>),
  question: RunnableLambda(itemgetter('question'))
}
| RunnableAssign(mapper={
    context: RunnableLambda(itemgetter('context'))
  })
| {
    response: ChatPromptTemplate(input_variables=['context', 'question'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template='\nCONTEXT:\n{context}\n\nQUERY:\n{question}\n\nUse the provided context to answer the user\'s query. You may not answer the user\'s query unless there is specific context in the following text. If you do not know the answer, or cannot answer, please respond with "I don\'t know".\n'))])
              | ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x115e5a410>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x

```
from operator import itemgetter
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough

retrieval_augmented_qa_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into qdrant_retriever
    {"context": itemgetter("question") | qdrant_retriever, "question": itemgetter("question")}
    # "context"  : RunnablePassthrough.assign passes the whole dict through and (re)sets "context"
    #              to the value of the "context" key from the previous step (effectively a no-op here)
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | openai_chat_model, "context": itemgetter("context")}
)
```

Let's get a visual understanding of our chain!

In [91]:
!pip install -qU grandalf

In [113]:
print(retrieval_augmented_qa_chain.get_graph().draw_ascii())

                       +---------------------------------+                         
                       | Parallel<context,question>Input |                         
                       +---------------------------------+                         
                           *****                   ****                            
                        ***                            ****                        
                     ***                                   ****                    
+--------------------------------+                             **                  
| Lambda(itemgetter('question')) |                              *                  
+--------------------------------+                              *                  
                 *                                              *                  
                 *                                              *                  
                 *                                              *           

Let's try another visual representation:

<div>
    <img src="https://i.imgur.com/Ad31AhL.png" width="800" height="300"/>
</div>

Let's test our chain out!

In [114]:
response = retrieval_augmented_qa_chain.invoke({"question" : "What is the significance of towels in Douglas Adam's Hitchhicker's Guide?"})

In [115]:
p.pprint(response["response"].content)

("In Douglas Adams' Hitchhiker's Guide, towels are considered to be incredibly "
 'useful and have immense psychological value. The book mentions that a towel '
 'is the most massively useful thing a hitchhiker can have, as it can be used '
 'for various purposes such as waving it as a distress signal, drying off, and '
 'even signaling to others that the owner is well-prepared and resourceful. '
 'The presence of a towel with a hitchhiker is also said to imply possession '
 'of other essential items like a toothbrush, face flannel, and soap, making '
 'the hitchhiker appear competent and capable in the eyes of others. Overall, '
 'towels are portrayed as a symbol of preparedness and resourcefulness in the '
 "Hitchhiker's Guide universe.")


In [116]:
for context in response["context"]:
  print("Context:")
  print(context)
  print("----")

Context:
page_content="28  /  D O U G L A S  A D A M S  \nthis device was in fact that most remarkable of all books ever to \ncome out of the great publishing corporations of Ursa Minor - \nThe Hitch Hiker's Guide to the Galaxy.  The reason why it was \npublished in the form of a micro sub meson electronic \ncomponent is that if it were printed in normal book form, an \ninterstellar hitch hiker would require several inconveniently \nlarge buildings to carry it around in. \nBeneath that in Ford Prefect's satchel were a few biros, a \nnotepad, and a largish bath towel from Marks and Spencer. \nThe Hitch Hiker's Guide to the Galaxy has a few things to say on \nthe subject of towels. \nA towel, it says, is about the most massively useful thing an" metadata={'source': 'https://www.deyeshigh.co.uk/downloads/literacy/world_book_day/the_hitchhiker_s_guide_to_the_galaxy.pdf', 'file_path': 'https://www.deyeshigh.co.uk/downloads/literacy/world_book_day/the_hitchhiker_s_guide_to_the_galaxy.pdf', '

Let's see if it can handle a query that is totally unrelated to the source documents.

In [117]:
response = retrieval_augmented_qa_chain.invoke({"question" : "What is the airspeed velocity of an unladen swallow?"})

In [118]:
response["response"].content

"I don't know."

In [120]:
res = retrieval_augmented_qa_chain.invoke({"question": "Where does Arthur Dent meet Marvin"})

In [122]:
res["response"].content

'Arthur Dent meets Marvin in a corridor where Marvin is trudging along, still moaning about a pain in his diodes. Arthur walks alongside him and engages in conversation with Marvin.'

#### ❓ Question #4:

Where does Arthur Dent meet Marvin?

> HINT: Use your RAG Chain to answer this question.


#### 👍🏼 Answer #4:

Arthur Dent meets Marvin in a corridor where Marvin is trudging along, still moaning about a pain in his diodes. Arthur walks alongside him and engages in conversation with Marvin.