# LangChain Expression Language

In [1]:
%pip install langchain langchain_openai --upgrade


In [2]:
import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()



In [3]:
from langchain_core.runnables import (
    RunnablePassthrough,
    RunnableParallel,
    RunnableLambda,
)

---

## Accessing Previous Values Using `RunnablePassthrough`

A runnable that passes its input through unchanged, or with additional keys.

This runnable behaves almost like the identity function, except that, when the input is a dict, it can be configured (with `RunnablePassthrough.assign()`) to add extra keys to the output.

The examples below demonstrate how this runnable works using a few simple chains. The chains rely on simple lambdas to make the examples easy to execute and experiment with.

In [4]:
runnable = RunnableParallel(
    origin=RunnablePassthrough(),
    modified=lambda x: x + 1
)

print(runnable.invoke(1)) # {'origin': 1, 'modified': 2}


def fake_llm(prompt: str) -> str: # Fake LLM for the example
    return prompt + " world"

chain = RunnableLambda(fake_llm) | {
    'original': RunnablePassthrough(), # Original LLM output
    'parsed': lambda text: text[::-1] # Parsing logic
}

chain.invoke('hello')

{'origin': 1, 'modified': 2}


{'original': 'hello world', 'parsed': 'dlrow olleh'}

---

## Prompt + Model

In [5]:
from langchain_openai.chat_models import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

chat = ChatOpenAI()
prompt = ChatPromptTemplate.from_template('Tell me a joke about {topic}')

chain = prompt | chat
print(chain)

first=ChatPromptTemplate(input_variables=['topic'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['topic'], input_types={}, partial_variables={}, template='Tell me a joke about {topic}'), additional_kwargs={})]) middle=[] last=ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x78fe141434c0>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x78fe141892a0>, root_client=<openai.OpenAI object at 0x78fe1577b190>, root_async_client=<openai.AsyncOpenAI object at 0x78fe141434f0>, model_kwargs={}, openai_api_key=SecretStr('**********'))


In [6]:
print("first", chain.first)
print("last", chain.last)

first input_variables=['topic'] input_types={} partial_variables={} messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['topic'], input_types={}, partial_variables={}, template='Tell me a joke about {topic}'), additional_kwargs={})]
last client=<openai.resources.chat.completions.Completions object at 0x78fe141434c0> async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x78fe141892a0> root_client=<openai.OpenAI object at 0x78fe1577b190> root_async_client=<openai.AsyncOpenAI object at 0x78fe141434f0> model_kwargs={} openai_api_key=SecretStr('**********')


In [7]:
# Stream:
print('\n\nStream:\n')
for s in chain.stream({"topic": "bears"}):
    print(s.content, end="", flush=True)

# Invoke:
print('\n\nInvoke:\n')
print(chain.invoke({"topic": "bears"}).content)

# Batch:
print('\n\nBatch:\n')
print(chain.batch([{"topic": "bears"}, {"topic": "bears"}, {"topic": "bears"}]))



Stream:

Why did the bear dissolve in water?

Because it was polar!

Invoke:

Why did the bear break up with his girlfriend? 

Because he couldn't bear the relationship any longer!


Batch:

[AIMessage(content="Why did the bear break up with his girlfriend? \n\nBecause he couldn't bear the relationship any longer!", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 21, 'prompt_tokens': 13, 'total_tokens': 34, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-c4a387af-ac64-498c-84dd-e41431ae3fb6-0', usage_metadata={'input_tokens': 13, 'output_tokens': 21, 'total_tokens': 34}), AIMessage(content="Why did the bear break up with his girlfriend? \n\nBecause he couldn't bear the relationship any longer!", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 21, 'prompt_tokens': 13, 'total_

---

## Retrieval Augmented Generation (RAG) in LCEL

**Retrieval Augmented Generation (RAG)** is a powerful technique that combines information retrieval and generative models to improve the quality of generated text, especially in cases where the model doesn't have enough specific knowledge. RAG enables language models to answer queries or generate text based on external data sources, such as documents or knowledge bases, by retrieving relevant information before generating the final response.

In the context of **LangChain Expression Language (LCEL)**, RAG can be implemented by chaining together components that handle information retrieval and text generation using LCEL's **runnables** and **pipe operators** (`|`).

Here’s an outline of how RAG can be built using LCEL:

### Key Components for RAG in LCEL

1. **Retriever**:
   - A module that retrieves relevant documents or information from an external knowledge source (e.g., databases, search engines, or vector stores).
   
2. **Generative Model**:
   - A large language model (LLM) such as GPT-3 or similar, which takes the retrieved information as context to generate the final answer or output.
   
3. **Pipeline (RunnableSequence)**:
   - A chain of operations where the query is passed through a retriever to get relevant documents, which are then combined with the generative model for output generation.

### Example of RAG in LangChain using LCEL

Let’s build a simple RAG pipeline using LangChain’s runnable components for retrieval and generation. Here’s how you can do that step by step.

#### Step 1: Import Necessary Libraries
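You will need to import the runnable, prompt, output-parser, model, and vector-store components from the LangChain packages (`langchain_core`, `langchain_openai`, and `langchain_community`).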

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
```

#### Step 2: Set up the Retriever
Assume you have a **vector store** (such as FAISS) that stores document embeddings. You need a retriever to query the vector store and fetch relevant documents based on the user input.

```python
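# Load embeddings and a FAISS index previously saved with `vectorstore.save_local(...)`
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.load_local(
    "path_to_your_vectorstore",
    embeddings,
    # Needed in recent langchain_community releases because part of the index is pickled;
    # only enable it for index files you created yourself or otherwise trust.
    allow_dangerous_deserialization=True,
)

# Create a retriever that returns the 5 most similar documents (k goes in search_kwargs)
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 5})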
```

#### Step 3: Set up the Language Model
We will use a pre-trained chat model (OpenAI's `gpt-3.5-turbo`) for the generative part of the pipeline.

```python
from langchain_openai import ChatOpenAI

# Load a chat model from OpenAI (gpt-3.5-turbo is chat-only, so ChatOpenAI is required)
llm = ChatOpenAI(model="gpt-3.5-turbo")
```

#### Step 4: Create the RAG Pipeline with LCEL
Now we combine the retrieval and generation steps using **LCEL** by defining a pipeline that first retrieves relevant documents and then generates a response using the language model.

```python
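from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
# The prompt receives both the retrieved context and the original question
prompt = ChatPromptTemplate.from_template(
    "Answer the question using this context:\n{context}\n\nQuestion: {question}"
)

def format_docs(docs):
    # Combine the retrieved documents into a single context string
    return "\n\n".join(doc.page_content for doc in docs)

# Define the RAG pipeline using LCEL
rag_pipeline = (
    {
        "context": retriever | format_docs,  # Retrieval step, then combine docs
        "question": RunnablePassthrough(),  # Forward the original query unchanged
    }
    | prompt  # Fill in the template
    | llm  # Generate the answer
    | StrOutputParser()  # Extract the text from the model's message
)

# Use the RAG pipeline to generate a response
response = rag_pipeline.invoke("What is the capital of France?")
print(response)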
```

#### Step 5: Handling Multiple Queries with Batch Processing
To process multiple queries, you can use the `batch` method, which runs the inputs concurrently (using a thread pool by default) and returns the outputs in the same order as the inputs.

```python
# Process multiple queries at once
queries = ["What is the capital of France?", "Explain the theory of relativity."]
responses = rag_pipeline.batch(queries)

print(responses)
```

### Explanation of the RAG Pipeline

1. **Retriever**: The retriever gets the most relevant documents from the vector store for the given query. This is done by comparing the query's embedding with the document embeddings stored in the vector store.
   
2. **Document Combination**: The retrieved documents are combined into a single string, which serves as the context for the language model. This string is created by joining the text of the top `k` relevant documents.
   
3. **Generative Model**: The combined text is passed to the language model (e.g., GPT-3), which generates a response based on both the original query and the additional context from the retrieved documents.

### Advantages of RAG in LCEL

- **Knowledge Extension**: RAG allows models to access external information, effectively extending the knowledge base beyond what the model was trained on.
  
- **Customizable Pipelines**: With LCEL, you can easily modify the pipeline to fit different tasks by swapping out components (e.g., changing the retriever or the language model).
  
- **Scalability**: LCEL enables scalable batch processing of multiple queries, making it easier to handle large datasets or a high volume of requests.
  
- **Modular and Composable**: Each step in the RAG pipeline is modular, so you can compose different workflows for retrieval and generation depending on the use case.

### Further Enhancements

- **Contextual Preprocessing**: You can add additional preprocessing steps (like reformatting the query) before retrieval using LCEL.
  
- **Post-Processing**: You can introduce post-processing steps (like summarization or answer extraction) after generation.
  
- **Advanced Retrieval**: Incorporate more sophisticated retrievers, such as hybrid retrieval (combining keyword search with embeddings) or re-ranking techniques.

### Full Example with Pre/Post Processing

```python
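from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

# A more complex RAG pipeline with pre/post-processing
prompt = ChatPromptTemplate.from_template(
    "Answer the question using this context:\n{context}\n\nQuestion: {question}"
)
rag_pipeline = (
    RunnableLambda(lambda query: " ".join(query.split()))  # Pre-processing: normalize whitespace
    | {
        # Retrieval step, then combine the retrieved docs into one string
        "context": retriever | (lambda docs: "\n\n".join(d.page_content for d in docs)),
        "question": RunnablePassthrough(),  # Keep the cleaned query for the prompt
    }
    | prompt
    | llm  # Generative model call
    | StrOutputParser()  # Convert the model's message to a string
    | RunnableLambda(lambda response: response.strip())  # Post-processing step
)

# Generate response for a query
response = rag_pipeline.invoke("What is quantum computing?")
print(response)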
```

In this pipeline:
1. We add a **pre-processing step** to format the query.
2. The **retriever** fetches relevant documents.
3. The documents are passed to the language model, which generates an answer.
4. A **post-processing step** strips leading and trailing whitespace from the final response.

### Conclusion

RAG in LCEL is a flexible and powerful way to combine information retrieval with language generation. By leveraging the **modularity** and **composability** of LCEL, you can build sophisticated pipelines for various use cases like question answering, summarization, or content generation augmented with external knowledge sources.

In [8]:
%pip install langchain openai faiss-cpu tiktoken langchain-community --upgrade --quiet


In [9]:
from operator import itemgetter
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai.chat_models import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_community.vectorstores.faiss import FAISS

In [10]:
vectorstore = FAISS.from_texts(
    [
        "James Phoenix works as a data engineering and LLM consultant at JustUnderstandingData",
        "James has an age of 31 years old.",
    ],
    embedding=OpenAIEmbeddings(),
)
retriever = vectorstore.as_retriever()

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

model = ChatOpenAI()

In [11]:
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

# The surrounding parentheses only let the expression span several lines; it is equivalent to:
# {"context": retriever, "question": RunnablePassthrough()} | prompt | model | StrOutputParser()

In [12]:
chain

{
  context: VectorStoreRetriever(tags=['FAISS', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x78fe07534e80>, search_kwargs={}),
  question: RunnablePassthrough()
}
| ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template='Answer the question based only on the following context:\n{context}\n\nQuestion: {question}\n'), additional_kwargs={})])
| ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x78fe07536fb0>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x78fe075310f0>, root_client=<openai.OpenAI object at 0x78fe07534ee0>, root_async_client=<openai.AsyncOpenAI object at 0x78fe07537010>, model_kwargs={}, openai_api_key=SecretStr('**********'))
| StrOutputParser()

In [13]:
chain.invoke("What company does James phoenix work at?")

'James Phoenix works at JustUnderstandingData.'

In [14]:
chain.invoke("What is James Phoenix's age?")

'James Phoenix is 31 years old.'

---

## Understanding How `itemgetter` Works with Piping

In [15]:
test = {
    "data": ['This is a test', 'Another entry...']
}

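print(itemgetter(test))  # Uses the whole dict as the *key*: this only builds a getter, nothing is extracted
print(itemgetter('data')(test))  # Builds a getter for 'data', then applies it to test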

operator.itemgetter({'data': ['This is a test', 'Another entry...']})
['This is a test', 'Another entry...']


### How does it work within the context of LCEL?

In [18]:
prompt = ChatPromptTemplate.from_template('''What is the profession of James Phoenix? His profession is {profession}.''')

first_chain = RunnableParallel(
    name=lambda x: "James Phoenix",
    age=lambda x: 31
)

second_chain = {
    # itemgetter pulls a value out of the dict produced by the previous step
    # (only the immediately preceding step's output, not the whole chain's history)
    'name': itemgetter('name'),
    'age': itemgetter('age'),
    # Plain string values are not allowed here (they raise a TypeError);
    # use itemgetter, a lambda, or a runnable such as RunnablePassthrough
    'profession': lambda x: "Data Engineer"
}

chain = first_chain | second_chain | prompt | ChatOpenAI() | StrOutputParser()
chain.invoke({})

''