# LangChain Expression Language

LangChain is one of the most popular open source libraries for AI engineers. Its goal is to abstract away the complexity of building AI software, provide easy-to-use building blocks, and make it easier to switch between AI service providers.

In this example, we introduce the LangChain Expression Language (LCEL), build a full chain with it, and look at how it works under the hood. The examples use Meta's `llama3.2` served via Ollama.

## Traditional Chains vs LCEL

In this section we walk through a basic example using the traditional method for building chains before jumping into LCEL. We will build a pipeline in which the user provides a topic and the LLM returns a short _research report_ on that topic.

### Traditional LLMChain

The `LLMChain` is the simplest chain originally introduced in LangChain. This chain takes a prompt, feeds it into an LLM, and _optionally_ adds an output parsing step before returning the result.

Let's see how we construct this using the traditional method. For this we need:

* `prompt` — a `PromptTemplate` that will be used to generate the prompt for the LLM.
* `llm` — the LLM we will be using to generate the output.
* `output_parser` — an optional output parser that will be used to parse the structured output of the LLM.
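
Conceptually, these three components form a simple pipeline. The following is a minimal plain-Python sketch of that flow, not LangChain code: `format_prompt`, `fake_llm`, `parse_output`, and `run_chain` are hypothetical stand-ins, and the template is filled with `str.format`, which uses the same `{topic}` placeholder style as the template in this example.

```python
# Plain-Python sketch of prompt -> LLM -> output parser.
# All function names here are hypothetical stand-ins, not LangChain APIs.

def format_prompt(template: str, **variables: str) -> str:
    """Fill the {placeholders} in the template with the given values."""
    return template.format(**variables)

def fake_llm(prompt_text: str) -> str:
    """Stand-in for a model call; it only echoes the prompt it received."""
    return f"  [model response to: {prompt_text}]  "

def parse_output(raw: str) -> str:
    """Stand-in for an output parser; here it only trims whitespace."""
    return raw.strip()

def run_chain(topic: str) -> str:
    """Run the three steps in order, passing each result to the next step."""
    prompt_text = format_prompt("Give me a small report on {topic}", topic=topic)
    return parse_output(fake_llm(prompt_text))

report = run_chain("retrieval augmented generation")
print(report)
```

Each step only needs the output of the previous one, which is what makes a linear chain a natural fit.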

In [1]:
from langchain_core.prompts import PromptTemplate

prompt_template = "Give me a small report on {topic}"

prompt = PromptTemplate(
    input_variables=["topic"],
    template=prompt_template
)

For the LLM, we'll start by initializing our connection to a local Ollama server running Meta's `llama3.2`.

In [3]:
# Import the Ollama LLM wrapper and the notebook display helpers
from langchain_community.llms import Ollama
from IPython.display import display, Markdown

# Initialize the Ollama LLM
llm = Ollama(
    model="llama3.2",
    temperature=0.0,  # Keep the output as repeatable as possible
    base_url="http://localhost:11434"  # Default Ollama server URL
)



In [4]:
llm_out = llm.invoke("Hello there")
llm_out

"Hello! It's nice to meet you. Is there something I can help you with or would you like to chat?"

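Then we define our output parser, which will be used to parse the output of the LLM. In this case, we use the `StrOutputParser`, which converts the LLM output into a single string. The `Ollama` class used here already returns a plain string (a chat model would return an `AIMessage`), so the parser passes the text through unchanged.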

In [5]:
from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()

In [6]:
out = output_parser.invoke(llm_out)
out

"Hello! It's nice to meet you. Is there something I can help you with or would you like to chat?"

Through the `LLMChain` class we can place each of our components into a linear `chain`.

In [7]:
from langchain.chains import LLMChain

chain = LLMChain(prompt=prompt, llm=llm, output_parser=output_parser)

Note that `LLMChain` was deprecated in LangChain `0.1.17`. The expected way of constructing these chains today is through LCEL, which we'll cover in a moment.

We can `invoke` our `chain`, providing a `topic` that we'd like to be researched.

In [8]:
result = chain.invoke("retrieval augmented generation")
result

{'topic': 'retrieval augmented generation',
 'text': '**Retrieval-Augmented Generation (RAG)**\n\nRetrieval-Augmented Generation (RAG) is a novel approach to natural language processing (NLP) that combines the strengths of both retrieval and generation models. The goal of RAG is to improve the efficiency, accuracy, and coherence of text generation tasks by leveraging pre-trained language models for retrieval.\n\n**Key Components:**\n\n1. **Retrieval Model:** A pre-trained language model is used as a retrieval component to search for relevant information in a large corpus.\n2. **Generation Model:** A separate pre-trained language model is used as a generation component to generate text based on the retrieved information.\n3. **Hybrid Model:** The retrieval and generation models are combined using a hybrid architecture, which allows them to interact with each other during the generation process.\n\n**How it Works:**\n\n1. **Input Text:** A query or prompt is input into the system.\n2. **

We can view a formatted version of this output using the `Markdown` display:

In [9]:
from IPython.display import display, Markdown

display(Markdown(result["text"]))

**Retrieval-Augmented Generation (RAG)**

Retrieval-Augmented Generation (RAG) is a novel approach to natural language processing (NLP) that combines the strengths of both retrieval and generation models. The goal of RAG is to improve the efficiency, accuracy, and coherence of text generation tasks by leveraging pre-trained language models for retrieval.

**Key Components:**

1. **Retrieval Model:** A pre-trained language model is used as a retrieval component to search for relevant information in a large corpus.
2. **Generation Model:** A separate pre-trained language model is used as a generation component to generate text based on the retrieved information.
3. **Hybrid Model:** The retrieval and generation models are combined using a hybrid architecture, which allows them to interact with each other during the generation process.

**How it Works:**

1. **Input Text:** A query or prompt is input into the system.
2. **Retrieval:** The retrieval model searches for relevant information in the corpus based on the query.
3. **Ranking:** The retrieved documents are ranked based on their relevance to the query.
4. **Generation:** The generation model generates text based on the top-ranked document(s) from the retrieval step.

**Advantages:**

1. **Improved Efficiency:** RAG reduces the computational cost of generating text by leveraging pre-trained models for retrieval.
2. **Increased Accuracy:** By using a retrieval component, RAG can improve the accuracy of generated text by selecting relevant information from the corpus.
3. **Enhanced Coherence:** The hybrid model allows for more coherent and context-specific generation of text.

**Applications:**

1. **Text Generation:** RAG can be used for various text generation tasks such as summarization, question answering, and text classification.
2. **Conversational AI:** RAG can improve the efficiency and accuracy of conversational AI systems by leveraging pre-trained models for retrieval.
3. **Content Generation:** RAG can be used to generate high-quality content such as articles, blog posts, and social media updates.

**Conclusion:**

Retrieval-Augmented Generation (RAG) is a promising approach to natural language processing that combines the strengths of both retrieval and generation models. By leveraging pre-trained language models for retrieval, RAG can improve the efficiency, accuracy, and coherence of text generation tasks. As the field continues to evolve, we can expect to see more applications of RAG in various domains.

That is a simple `LLMChain` using the traditional LangChain method. Now let's move on to LCEL.

## LangChain Expression Language (LCEL)

**L**ang**C**hain **E**xpression **L**anguage (LCEL) is the recommended approach to building chains in LangChain. Having superseded the traditional methods such as `LLMChain`, LCEL gives us a more flexible system for building chains. LCEL uses the pipe operator `|` to _chain_ components together. Let's see how we'd construct the equivalent of an `LLMChain` using LCEL.

In [10]:
lcel_chain = prompt | llm | output_parser

In [11]:
result = lcel_chain.invoke("retrieval augmented generation")
result

'**Retrieval-Augmented Generation (RAG)**\n\nRetrieval-Augmented Generation (RAG) is a novel approach to natural language processing (NLP) that combines the strengths of both retrieval and generation models. The goal of RAG is to leverage the capabilities of retrieval models, which are designed to efficiently search and retrieve relevant information from large databases or knowledge graphs, with those of generation models, which can generate coherent and context-specific text.\n\n**Key Components:**\n\n1. **Retrieval Model:** A retrieval model is used to search for relevant documents or knowledge graph entities that match the input query.\n2. **Generation Model:** A generation model is used to generate a response based on the retrieved information.\n3. **Augmentation Mechanism:** The generated response is then augmented with additional information from the retrieved documents or knowledge graph entities.\n\n**How RAG Works:**\n\n1. Input Query: A user submits an input query, which can 

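The output format is slightly different: `LLMChain` returned a dictionary holding the input `topic` and the generated `text`, whereas the LCEL chain returns the parsed string directly. The underlying functionality is the same. As before, we can view a formatted version of this output using the `Markdown` display: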

In [12]:
display(Markdown(result))

**Retrieval-Augmented Generation (RAG)**

Retrieval-Augmented Generation (RAG) is a novel approach to natural language processing (NLP) that combines the strengths of both retrieval and generation models. The goal of RAG is to leverage the capabilities of retrieval models, which are designed to efficiently search and retrieve relevant information from large databases or knowledge graphs, with those of generation models, which can generate coherent and context-specific text.

**Key Components:**

1. **Retrieval Model:** A retrieval model is used to search for relevant documents or knowledge graph entities that match the input query.
2. **Generation Model:** A generation model is used to generate a response based on the retrieved information.
3. **Augmentation Mechanism:** The generated response is then augmented with additional information from the retrieved documents or knowledge graph entities.

**How RAG Works:**

1. Input Query: A user submits an input query, which can be a natural language sentence or phrase.
2. Retrieval Model: The retrieval model searches for relevant documents or knowledge graph entities that match the input query.
3. Generation Model: The generation model generates a response based on the retrieved information.
4. Augmentation Mechanism: The generated response is augmented with additional information from the retrieved documents or knowledge graph entities.

**Advantages of RAG:**

1. **Improved Accuracy:** RAG can generate more accurate and coherent responses by leveraging the strengths of both retrieval and generation models.
2. **Increased Contextual Understanding:** RAG can provide a better understanding of the context in which the input query is being asked, leading to more relevant and informative responses.
3. **Efficient Information Retrieval:** RAG can efficiently retrieve relevant information from large databases or knowledge graphs, reducing the need for manual searching.

**Applications of RAG:**

1. **Virtual Assistants:** RAG can be used in virtual assistants to provide more accurate and context-specific responses to user queries.
2. **Chatbots:** RAG can be used in chatbots to generate more informative and engaging conversations with users.
3. **Information Retrieval Systems:** RAG can be used in information retrieval systems to improve the accuracy and relevance of search results.

**Conclusion:**

Retrieval-Augmented Generation (RAG) is a promising approach to natural language processing that combines the strengths of both retrieval and generation models. By leveraging the capabilities of retrieval models, RAG can generate more accurate and coherent responses while providing a better understanding of the context in which the input query is being asked. The applications of RAG are vast and varied, making it an exciting area of research with significant potential for impact.
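
The two invocations above return different shapes: the `LLMChain` call returned a dictionary with `topic` and `text` keys, while the LCEL chain returned a string. If code needs to handle both during a migration, a small helper can normalise them. This is a hypothetical sketch: `get_text` is not a LangChain function, and the sample values are shortened placeholders rather than real model output.

```python
from typing import Union

def get_text(result: Union[str, dict]) -> str:
    """Return the generated text from an LLMChain-style dict or an LCEL string."""
    if isinstance(result, dict):
        # The traditional chain stored the generation under the "text" key.
        return result["text"]
    return result

# Shortened placeholder values standing in for the two result shapes.
legacy_style = {"topic": "retrieval augmented generation", "text": "**RAG** ..."}
lcel_style = "**RAG** ..."

print(get_text(legacy_style) == get_text(lcel_style))  # prints True
```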

### How Does the Pipe Operator Work?

Before moving on to other LCEL features, let's take a moment to understand what the pipe operator `|` is doing and _how_ it works.

Functionally, the pipe is a way of chaining components together: whatever the _left_ side outputs is fed as input into the _right_ side. In the example of `prompt | llm | output_parser`, `prompt` feeds into `llm`, which in turn feeds into `output_parser`.
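
Put differently, piping is left-to-right function composition. As a rough illustration that does not involve LangChain at all, the hypothetical `pipe` helper below composes ordinary functions in the same order the `|` operator would.

```python
from functools import reduce
from typing import Any, Callable

def pipe(*steps: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Compose steps left to right: pipe(f, g, h)(x) == h(g(f(x)))."""
    def run(value: Any) -> Any:
        # Feed the running value through each step in turn.
        return reduce(lambda acc, step: step(acc), steps, value)
    return run

shout = pipe(str.strip, str.upper, lambda s: s + "!")
print(shout("  hello "))  # prints "HELLO!"
```

The operator form only changes the syntax: `a | b | c` reads in the same order in which the data flows.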

Let's make a basic class named `Runnable` that will turn a provided function into a _runnable_ object that we can then use with the pipe `|` operator.

In [13]:
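class Runnable:
    def __init__(self, func):
        # Store the wrapped function
        self.func = func
    def __or__(self, other):
        # Called for `self | other`: run self first, then pass its output to other
        def chained_func(*args, **kwargs):
            return other.invoke(self.func(*args, **kwargs))
        # Return a new Runnable so the result can be piped again
        return Runnable(chained_func)
    def invoke(self, *args, **kwargs):
        # Execute the wrapped function
        return self.func(*args, **kwargs)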

With the `Runnable` class, we can wrap a function and then chain several of these _runnable_ functions together using the `__or__` method.

First, let's create a few functions that we'll chain together:

In [14]:
def add_five(x):
    return x+5

def sub_five(x):
    return x-5

def mul_five(x):
    return x*5

Now we wrap our functions with the `Runnable`:

In [15]:
add_five_runnable = Runnable(add_five)
sub_five_runnable = Runnable(sub_five)
mul_five_runnable = Runnable(mul_five)

Finally, we can chain these together using the `__or__` method from the `Runnable` class:

In [16]:
chain = (add_five_runnable).__or__(sub_five_runnable).__or__(mul_five_runnable)

chain.invoke(3)

15

So we can see that we're able to chain together our functions using `__or__`. The pipe `|` operator is simply a shortcut for the `__or__` method, so we can create the exact same chain like so:

In [17]:
chain = add_five_runnable | sub_five_runnable | mul_five_runnable

chain.invoke(3)

15
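
The toy class above only works when both sides of `|` are already wrapped. LangChain's own runnables are more forgiving and accept plain functions next to a runnable. The sketch below shows one way such coercion could be added to the toy class using Python's reflected operator `__ror__`. `CoercingRunnable` and `_coerce` are hypothetical names, and this is an illustration of the idea rather than LangChain's implementation.

```python
class CoercingRunnable:
    """Toy runnable that also accepts plain callables on either side of `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    @staticmethod
    def _coerce(thing):
        # Wrap plain callables so they gain an `invoke` method.
        if isinstance(thing, CoercingRunnable):
            return thing
        return CoercingRunnable(thing)

    def __or__(self, other):
        # `self | other`: run self, then hand the result to other.
        right = self._coerce(other)
        return CoercingRunnable(lambda *a, **kw: right.invoke(self.invoke(*a, **kw)))

    def __ror__(self, other):
        # Python falls back to this for `plain_function | runnable`.
        return self._coerce(other) | self

def add_five(x):
    return x + 5

def sub_five(x):
    return x - 5

def mul_five(x):
    return x * 5

chain_a = CoercingRunnable(add_five) | sub_five | mul_five
chain_b = add_five | CoercingRunnable(sub_five) | mul_five

print(chain_a.invoke(3), chain_b.invoke(3))  # prints "15 15"
```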

## LCEL `RunnableLambda`

The `RunnableLambda` class is LangChain's built-in method for constructing a _runnable_ object from a function. That is, it does the same thing as the custom `Runnable` class we created earlier. Let's try it out with the same functions as before.

In [18]:
from langchain_core.runnables import RunnableLambda

add_five_runnable = RunnableLambda(add_five)
sub_five_runnable = RunnableLambda(sub_five)
mul_five_runnable = RunnableLambda(mul_five)

We chain these together again with the pipe `|` operator:

In [19]:
chain = add_five_runnable | sub_five_runnable | mul_five_runnable

And call them using the `invoke` method:

In [20]:
chain.invoke(3)

15
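
Besides `invoke`, LangChain runnables also expose a `batch` method for processing several inputs in one call. The toy class below sketches one way such a method could behave, using a thread pool from the standard library. `BatchRunnable` and its `max_workers` parameter are hypothetical and only meant to illustrate the idea, not to mirror LangChain's internals.

```python
from concurrent.futures import ThreadPoolExecutor

class BatchRunnable:
    """Toy runnable with an `invoke` for one input and a `batch` for many."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def batch(self, values, max_workers=4):
        # Run `invoke` on every input using a small thread pool.
        # `pool.map` returns results in the same order as the inputs.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(self.invoke, values))

# Same arithmetic as the chain above: add five, subtract five, multiply by five.
toy_chain = BatchRunnable(lambda x: (x + 5 - 5) * 5)

print(toy_chain.invoke(3))          # prints 15
print(toy_chain.batch([1, 2, 3]))   # prints [5, 10, 15]
```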

Now we want to try something a little more challenging, so this time we will generate a report and then edit that report using this functionality.

In [21]:
prompt_str = "give me a small report about {topic}"
prompt = PromptTemplate(
    input_variables=["topic"],
    template=prompt_str
)

In [22]:
chain = prompt | llm | output_parser

In [23]:
result = chain.invoke("AI")
display(Markdown(result))

**Artificial Intelligence (AI) Report**

**Introduction:**
Artificial intelligence (AI) refers to the development of computer systems that can perform tasks that typically require human intelligence, such as learning, problem-solving, and decision-making. The field of AI has made significant progress in recent years, with applications in various industries, including healthcare, finance, transportation, and education.

**Key Developments:**

1. **Deep Learning:** A subset of machine learning, deep learning uses neural networks to analyze data and make predictions. This technology has led to breakthroughs in image recognition, natural language processing, and speech recognition.
2. **Natural Language Processing (NLP):** NLP enables computers to understand and generate human-like language. This technology is used in virtual assistants, chatbots, and language translation software.
3. **Robotics:** AI-powered robots are being used in manufacturing, logistics, and healthcare to perform tasks that require precision and dexterity.

**Applications:**

1. **Healthcare:** AI is being used to analyze medical images, diagnose diseases, and develop personalized treatment plans.
2. **Finance:** AI-powered systems are being used to detect fraud, predict stock prices, and optimize investment portfolios.
3. **Transportation:** Self-driving cars and trucks are being developed using AI algorithms that enable vehicles to navigate complex roads and make decisions in real-time.

**Challenges:**

1. **Bias and Fairness:** AI systems can perpetuate biases present in the data used to train them, leading to unfair outcomes.
2. **Job Displacement:** The increasing use of AI in industries may lead to job displacement for certain workers.
3. **Security:** AI-powered systems can be vulnerable to cyber attacks and data breaches.

**Conclusion:**
Artificial intelligence has made significant progress in recent years, with applications in various industries. However, there are also challenges associated with the development and deployment of AI systems, including bias, fairness, job displacement, and security concerns. As AI continues to evolve, it is essential to address these challenges and ensure that the benefits of AI are shared by all.

**Future Outlook:**
The future of AI holds much promise, with potential applications in areas such as:

1. **Autonomous Systems:** Self-driving cars, drones, and robots will become increasingly common.
2. **Personalized Medicine:** AI-powered systems will enable personalized treatment plans for patients.
3. **Smart Cities:** AI-powered systems will optimize energy consumption, traffic flow, and public services.

Overall, the future of AI is exciting and holds much potential for transforming various industries and aspects of our lives.

Here we define two functions: `extract_fact`, which drops everything before the first blank line (typically the title) and keeps the main content of the text, and `replace_word`, which replaces every occurrence of "AI" with "skynet".

In [24]:
def extract_fact(x):
    if "\n\n" in x:
        return "\n".join(x.split("\n\n")[1:])
    else:
        return x

old_word = "AI"
new_word = "skynet"

def replace_word(x):
    return x.replace(old_word, new_word)
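
To make the behaviour of these two functions concrete, here is a small worked example on a made-up string. The functions are lightly adapted so the words are passed as default arguments instead of globals. `replace_whole_word` is a hypothetical variant, not part of the original notebook, that uses a regular expression with word boundaries so that words which merely contain "AI" are left alone.

```python
import re

def extract_fact(x: str) -> str:
    # Drop everything before the first blank line (typically the title).
    if "\n\n" in x:
        return "\n".join(x.split("\n\n")[1:])
    return x

def replace_word(x: str, old_word: str = "AI", new_word: str = "skynet") -> str:
    # Plain substring replacement, as in the cell above.
    return x.replace(old_word, new_word)

def replace_whole_word(x: str, old_word: str = "AI", new_word: str = "skynet") -> str:
    # Hypothetical variant: \b word boundaries avoid touching e.g. "RAIN".
    return re.sub(rf"\b{re.escape(old_word)}\b", new_word, x)

sample = "**AI Report**\n\nAI is everywhere.\n\nRAIN is not AI."
body = extract_fact(sample)

print(replace_word(body))
# skynet is everywhere.
# RskynetN is not skynet.

print(replace_whole_word(body))
# skynet is everywhere.
# RAIN is not skynet.
```

Whether the stricter variant is worth it depends on the text being processed. For the short reports generated here, the simple `str.replace` is usually enough.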

Let's wrap these functions and see what the output is!

In [25]:
extract_fact_runnable = RunnableLambda(extract_fact)
replace_word_runnable = RunnableLambda(replace_word)

In [26]:
chain = prompt | llm | output_parser | extract_fact_runnable | replace_word_runnable

In [27]:
result = chain.invoke("retrieval augmented generation")
display(Markdown(result))

Retrieval-Augmented Generation (RAG) is a novel approach to natural language processing (NLP) that combines the strengths of both retrieval and generation models. The goal of RAG is to leverage the capabilities of retrieval models, which excel at finding relevant information in large databases, with the power of generation models, which can create coherent and context-specific text.
**Key Components:**
1. **Retrieval Model:** A pre-trained language model (e.g., BERT or RoBERTa) that is fine-tuned on a specific task, such as question answering or text classification.
2. **Generation Model:** A sequence-to-sequence model (e.g., transformer-based) that generates text based on the retrieved information.
**How RAG Works:**
1. The retrieval model is used to retrieve relevant documents or passages from a large database.
2. The generated text is then fed into the generation model, which uses the retrieved information as input to generate coherent and context-specific text.
3. The output of the generation model is refined through multiple iterations, with the retrieval model providing feedback on the relevance and accuracy of the generated text.
**Advantages:**
1. **Improved Accuracy:** RAG can achieve state-of-the-art performance on various NLP tasks, such as question answering and text classification.
2. **Increased Efficiency:** By leveraging the strengths of both retrieval and generation models, RAG can reduce the computational resources required for training and inference.
3. **Contextual Understanding:** RAG can provide a deeper understanding of the context in which the generated text is being used.
**Applications:**
1. **Question Answering Systems:** RAG can be used to build more accurate question answering systems that can retrieve relevant information from large databases.
2. **Text Generation:** RAG can be applied to generate coherent and context-specific text for various applications, such as chatbots, content generation, and language translation.
**Future Directions:**
1. **Multitask Learning:** Exploring the use of multitask learning to fine-tune both retrieval and generation models simultaneously.
2. **Explainability:** Developing techniques to provide insights into how RAG generates text and what information is being retrieved from the database.
3. **Scalability:** Investigating ways to scale up RAG to accommodate large databases and complex NLP tasks.
Overall, Retrieval-Augmented Generation (RAG) has shown great promise in improving the accuracy and efficiency of various NLP tasks. As research continues to advance, we can expect to see even more innovative applications of this technology.

Those are our `RunnableLambda` functions. It's worth noting that each of these functions is expected to take a single argument. If you have a function that needs multiple values, you can pass a dictionary with keys and unpack it inside the function.
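
As a minimal sketch of that dictionary pattern, the hypothetical function below takes one `inputs` argument and unpacks the values it needs. The function name and keys are made up for illustration. A function written this way could then be wrapped with `RunnableLambda` and invoked with the same dictionary.

```python
def scale_and_shift(inputs: dict) -> float:
    """Single-argument function that unpacks several values from a dict."""
    x = inputs["x"]
    factor = inputs["factor"]
    offset = inputs.get("offset", 0)  # optional key with a default
    return x * factor + offset

print(scale_and_shift({"x": 3, "factor": 5, "offset": 2}))  # prints 17
print(scale_and_shift({"x": 3, "factor": 5}))               # prints 15
```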

## LCEL `RunnableParallel` and `RunnablePassthrough`

LCEL provides us with various `Runnable` classes that allow us to control the flow of data and execution order through our chains. Two of these are `RunnableParallel` and `RunnablePassthrough`.

* `RunnableParallel` — allows us to run multiple `Runnable` instances in parallel, acting almost as a Y-fork in the chain.

* `RunnablePassthrough` — allows us to pass through a variable to the next `Runnable` without modification.

To see these runnables in action, we will create two data sources. Each source provides specific information, but to answer the question we will need both to be fed to the LLM.

In [None]:
%pip install -U sentence-transformers docarray

In [30]:
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import DocArrayInMemorySearch

# Use a local sentence-transformers model (no OPENAI_API_KEY needed)
embedding = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

vecstore_a = DocArrayInMemorySearch.from_texts(
    [
        "half the info is here",
        "DeepSeek-V3 was released in December 2024"
    ],
    embedding=embedding
)
vecstore_b = DocArrayInMemorySearch.from_texts(
    [
        "the other half of the info is here",
        "the DeepSeek-V3 LLM is a mixture of experts model with 671B parameters"
    ],
    embedding=embedding
)

The prompt takes three inputs: two for the context and one for the question itself.

In [31]:
prompt_str = """Using the context provided, answer the user's question.
Context:
{context_a}
{context_b}
"""

In [32]:
from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate

prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(prompt_str),
    HumanMessagePromptTemplate.from_template("{question}")
])

Here we wrap our vector stores as retrievers and combine them, together with the question, into a single `retrieval` runnable whose output feeds the prompt.

In [33]:
from langchain_core.runnables import RunnablePassthrough, RunnableParallel

retriever_a = vecstore_a.as_retriever()
retriever_b = vecstore_b.as_retriever()

retrieval = RunnableParallel(
    {
        "context_a": retriever_a, "context_b": retriever_b, "question": RunnablePassthrough()
    }
)

The chain we'll be constructing will look something like this:

![](../assets/lcel-flow.png)

In [34]:
chain = retrieval | prompt | llm | output_parser

We `invoke` it as usual.

In [35]:
result = chain.invoke(
    "what architecture does the model DeepSeek released in december use?"
)
result

'The model DeepSeek-V3 uses a Mixture of Experts (MoE) architecture.'

With that we've seen how we can use `RunnableParallel` and `RunnablePassthrough` to control the flow of data and execution order through our chains.
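
The data flow of the `retrieval` step can be mimicked in plain Python. The sketch below is only an illustration of the idea: `run_parallel`, `passthrough`, and the two fake retrievers are hypothetical stand-ins, and the branches run one after another here for clarity. Each branch receives the same input, and the results are collected into a dictionary whose keys match the prompt's variables.

```python
from typing import Any, Callable, Dict

def passthrough(value: Any) -> Any:
    """Return the input unchanged, like a pass-through step."""
    return value

def run_parallel(branches: Dict[str, Callable[[Any], Any]], value: Any) -> Dict[str, Any]:
    """Give every branch the same input and collect the outputs by key."""
    # Sequential for clarity; the branches are independent of each other.
    return {key: branch(value) for key, branch in branches.items()}

# Fake retrievers that ignore the query and return a fixed document each.
def fake_retriever_a(question: str) -> list:
    return ["DeepSeek-V3 was released in December 2024"]

def fake_retriever_b(question: str) -> list:
    return ["the DeepSeek-V3 LLM is a mixture of experts model with 671B parameters"]

question = "what architecture does the model DeepSeek released in december use?"
prompt_inputs = run_parallel(
    {
        "context_a": fake_retriever_a,
        "context_b": fake_retriever_b,
        "question": passthrough,
    },
    question,
)

print(sorted(prompt_inputs))  # prints ['context_a', 'context_b', 'question']
```

The resulting dictionary has exactly the three keys the prompt expects, which is why the `retrieval` step can be piped straight into `prompt`.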

---