[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/aurelio-labs/langchain-course/blob/main/chapters/07-lcel.ipynb)

#### LangChain Essentials Course

# LangChain's Expression Language

LangChain is one of the most popular open source libraries for AI Engineers. Its goal is to abstract away the complexity of building AI software, provide easy-to-use building blocks, and make it easier to switch between AI service providers.

In this chapter, we introduce LangChain's Expression Language (LCEL), build a full chain with it, and look at how it works under the hood.

In [1]:
!pip uninstall -y langchain google-generativeai
!pip install -qU \
  "pydantic>=2.11.4,<3" \
  "langchain-core>=0.3.70,<0.4" \
  "langchain-community>=0.3.27,<0.4" \
  "langchain-google-genai>=2.1.10" \
  "google-genai>=0.7.0" \
  "langsmith>=0.3.4" \
  "docarray==0.40.0"

Found existing installation: langchain 0.3.27
Uninstalling langchain-0.3.27:
  Successfully uninstalled langchain-0.3.27
Found existing installation: google-generativeai 0.8.5
Uninstalling google-generativeai-0.8.5:
  Successfully uninstalled google-generativeai-0.8.5

---

> ⚠️ If using LangSmith, add your API key below:

In [3]:
import os
from getpass import getpass
from google.colab import userdata

# must enter API key
os.environ["LANGCHAIN_API_KEY"] = userdata.get("LANGCHAIN_API_KEY") or \
    getpass("Enter LangSmith API Key: ")

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_PROJECT"] = "aurelioai-langchain-course-lcel-openai"

---

## Traditional Chains vs LCEL

In this section we'll work through a basic example using the traditional method for building chains before jumping into LCEL. We will build a pipeline where the user inputs a topic and the LLM returns a short _research report_ on that topic.

### Traditional LLMChain

The `LLMChain` is the simplest chain originally introduced in LangChain. This chain takes a prompt, feeds it into an LLM, and _optionally_ adds an output parsing step before returning the result.

Let's see how we construct this using the traditional method. For this we need:

* `prompt` — a `PromptTemplate` that will be used to generate the prompt for the LLM.
* `llm` — the LLM we will be using to generate the output.
* `output_parser` — an optional output parser that will be used to parse the structured output of the LLM.
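
To make the data flow concrete, the three components can be pictured as three plain Python functions applied one after another. The sketch below is a simplified stand-in, not LangChain's implementation: `FakeMessage` and `fake_llm` are hypothetical placeholders so the example runs without an API key.

```python
from dataclasses import dataclass


@dataclass
class FakeMessage:
    """Hypothetical stand-in for the message object a chat model returns."""
    content: str


def format_prompt(topic):
    # Step 1: fill the template with the user's input.
    return "Give me a small report on {topic}".format(topic=topic)


def fake_llm(prompt_text):
    # Step 2: a real LLM call would go here; this placeholder just echoes.
    return FakeMessage(content=f"[report for prompt: {prompt_text}]")


def parse_output(message):
    # Step 3: reduce the message object to a plain string.
    return message.content


def run_chain(topic):
    # prompt -> llm -> output parser, applied in order
    return parse_output(fake_llm(format_prompt(topic)))


report = run_chain("retrieval augmented generation")
print(report)
```

Each step consumes exactly what the previous step produced, which is the property the chains in this chapter rely on.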

In [4]:
from langchain_core.prompts import PromptTemplate

prompt_template = "Give me a small report on {topic}"

prompt = PromptTemplate(
    input_variables=["topic"],
    template=prompt_template
)

For the LLM, we'll start by initializing our connection to the Gemini API. We do need a Google API key, which you can get from [Google AI Studio](https://aistudio.google.com/app/apikey).

We will use the `gemini-1.5-flash` model with a `temperature` of `0`:

In [5]:
from langchain_google_genai import ChatGoogleGenerativeAI

os.environ["GOOGLE_API_KEY"] = userdata.get("GOOGLE_API_KEY") or getpass(
    "Enter Gemini API Key: "
)

model_name = "gemini-1.5-flash"

llm = ChatGoogleGenerativeAI(
    model=model_name,
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # other params...
)


In [6]:
llm_out = llm.invoke("Hello there")
llm_out

AIMessage(content='Hello there! How can I help you today?', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-1.5-flash', 'safety_ratings': []}, id='run--cff589e1-7a11-4539-9114-66742b2bc650-0', usage_metadata={'input_tokens': 2, 'output_tokens': 11, 'total_tokens': 13, 'input_token_details': {'cache_read': 0}})

Then we define our output parser, which will be used to parse the output of the LLM. In this case, we will use the `StrOutputParser`, which parses the `AIMessage` output from our LLM into a single string.

In [7]:
from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()

In [8]:
out = output_parser.invoke(llm_out)
out

'Hello there! How can I help you today?'

Through the `LLMChain` class we can place each of our components into a linear `chain`.

In [9]:
from langchain.chains import LLMChain

chain = LLMChain(prompt=prompt, llm=llm, output_parser=output_parser)


Note that `LLMChain` was deprecated in LangChain `0.1.17`. The expected way of constructing these chains today is through LCEL, which we'll cover in a moment.

We can `invoke` our `chain`, providing a `topic` that we'd like to be researched.

In [10]:
result = chain.invoke("retrieval augmented generation")
result

{'topic': 'retrieval augmented generation',
 'text': "## Retrieval Augmented Generation (RAG): A Small Report\n\nRetrieval Augmented Generation (RAG) is a powerful technique that enhances large language models (LLMs) by combining their generative capabilities with external knowledge retrieval.  Instead of relying solely on the knowledge embedded within the model's parameters, RAG systems access and incorporate information from external knowledge bases, databases, or documents relevant to the user's prompt. This significantly improves the accuracy, factual consistency, and overall quality of the generated text.\n\n**How it works:**\n\nRAG typically involves three main steps:\n\n1. **Retrieval:**  Given a user prompt, a retrieval module identifies relevant information from an external knowledge source. This often involves techniques like keyword matching, semantic search (using embeddings), or vector databases.  The quality of the retrieval is crucial for the success of the entire system

We can view a formatted version of this output using the `Markdown` display:

In [11]:
from IPython.display import display, Markdown

display(Markdown(result["text"]))

## Retrieval Augmented Generation (RAG): A Small Report

Retrieval Augmented Generation (RAG) is a powerful technique that enhances large language models (LLMs) by combining their generative capabilities with external knowledge retrieval.  Instead of relying solely on the knowledge embedded within the model's parameters, RAG systems access and incorporate information from external knowledge bases, databases, or documents relevant to the user's prompt. This significantly improves the accuracy, factual consistency, and overall quality of the generated text.

**How it works:**

RAG typically involves three main steps:

1. **Retrieval:**  Given a user prompt, a retrieval module identifies relevant information from an external knowledge source. This often involves techniques like keyword matching, semantic search (using embeddings), or vector databases.  The quality of the retrieval is crucial for the success of the entire system.

2. **Augmentation:** The retrieved information is then integrated with the user's prompt. This can be done by simply concatenating the retrieved text with the prompt, or through more sophisticated methods that highlight the relevance of specific passages.

3. **Generation:** The augmented prompt is fed into the LLM, which generates the final output. The LLM now has access to both its internal knowledge and the relevant external information, leading to a more informed and accurate response.

**Advantages of RAG:**

* **Improved Accuracy and Factuality:**  By grounding the generation process in external data, RAG reduces hallucinations and ensures the output aligns with known facts.
* **Access to Up-to-Date Information:** LLMs are trained on static datasets. RAG allows access to the latest information, making the generated content more current and relevant.
* **Handling of Specialized Knowledge:** RAG can be used to incorporate domain-specific knowledge that may not be present in the LLM's training data.
* **Explainability and Transparency:**  The retrieved sources can provide context and justification for the generated output, increasing transparency and trust.

**Challenges of RAG:**

* **Retrieval Effectiveness:** The accuracy of the retrieved information is paramount.  Poor retrieval can lead to inaccurate or irrelevant outputs.
* **Computational Cost:**  The retrieval and augmentation steps add computational overhead compared to using LLMs alone.
* **Data Bias and Quality:**  The quality and potential biases in the external knowledge source directly impact the quality of the generated text.
* **Hallucination Mitigation:** While RAG reduces hallucinations, it doesn't eliminate them entirely.  The LLM can still generate incorrect information based on misinterpretations of the retrieved data.


**Conclusion:**

RAG represents a significant advancement in LLM applications. By bridging the gap between the generative capabilities of LLMs and the vastness of external knowledge, RAG systems offer a more robust, accurate, and reliable approach to natural language processing tasks.  Ongoing research focuses on improving retrieval techniques, managing computational costs, and mitigating the remaining challenges to unlock the full potential of this powerful technology.

That is a simple `LLMChain` using the traditional LangChain method. Now let's move on to LCEL.

## LangChain Expression Language (LCEL)

**L**ang**C**hain **E**xpression **L**anguage (LCEL) is the recommended approach to building chains in LangChain. Having superseded traditional classes such as `LLMChain`, LCEL gives us a more flexible system for building chains. LCEL uses the pipe operator `|` to _chain_ components together. Let's see how we'd construct the equivalent of an `LLMChain` using LCEL.

In [12]:
lcel_chain = prompt | llm | output_parser

We can `invoke` this chain in the same way as we did before:

In [13]:
result = lcel_chain.invoke("retrieval augmented generation")
result

"## Retrieval Augmented Generation (RAG): A Small Report\n\nRetrieval Augmented Generation (RAG) is a paradigm shift in large language model (LLM) applications, addressing some key limitations of LLMs operating solely on their internal knowledge.  Instead of relying solely on pre-trained knowledge, RAG systems augment LLMs with external knowledge sources, typically a vector database containing relevant documents.  This allows the model to access and process information beyond its training data, leading to several advantages:\n\n**How it works:**\n\n1. **Retrieval:**  Given a user prompt, a retrieval module searches the external knowledge base for relevant documents. This often involves embedding both the prompt and the documents into a vector space and using similarity search techniques (e.g., cosine similarity) to identify the most pertinent documents.\n\n2. **Augmentation:** The retrieved documents are then provided as context to the LLM alongside the original prompt.  This allows th

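The output format is slightly different: the `LLMChain` returned a dictionary containing the `topic` and `text` keys, whereas the LCEL chain returns the parsed string directly. The underlying functionality is the same. As before, we can view a formatted version of this output using the `Markdown` display: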

In [14]:
display(Markdown(result))

## Retrieval Augmented Generation (RAG): A Small Report

Retrieval Augmented Generation (RAG) is a paradigm shift in large language model (LLM) applications, addressing some key limitations of LLMs operating solely on their internal knowledge.  Instead of relying solely on pre-trained knowledge, RAG systems augment LLMs with external knowledge sources, typically a vector database containing relevant documents.  This allows the model to access and process information beyond its training data, leading to several advantages:

**How it works:**

1. **Retrieval:**  Given a user prompt, a retrieval module searches the external knowledge base for relevant documents. This often involves embedding both the prompt and the documents into a vector space and using similarity search techniques (e.g., cosine similarity) to identify the most pertinent documents.

2. **Augmentation:** The retrieved documents are then provided as context to the LLM alongside the original prompt.  This allows the LLM to generate a response informed by the specific information found in the retrieved documents.

3. **Generation:** The LLM generates a response based on both its internal knowledge and the newly retrieved information.  This results in more accurate, up-to-date, and contextually relevant outputs.


**Advantages of RAG:**

* **Improved Accuracy and Factuality:** Access to external data sources reduces hallucinations and improves the accuracy of the generated text, especially for factual queries.
* **Up-to-date Information:** LLMs are trained on static datasets. RAG allows access to the latest information, overcoming the limitations of outdated training data.
* **Handling Specialized Knowledge:** RAG enables LLMs to handle niche topics or domains by accessing relevant specialized documents.
* **Explainability and Traceability:**  The retrieved documents provide a degree of explainability, allowing users to understand the basis of the LLM's response.


**Challenges of RAG:**

* **Retrieval Effectiveness:** The quality of the retrieved documents significantly impacts the quality of the generated response.  Poor retrieval can lead to inaccurate or irrelevant outputs.
* **Computational Cost:**  The retrieval and augmentation steps add computational overhead compared to using an LLM alone.
* **Data Management:**  Maintaining and updating the external knowledge base requires significant effort and resources.
* **Hallucination Mitigation:** While RAG reduces hallucinations, it doesn't eliminate them entirely.  The LLM can still misinterpret or misrepresent the retrieved information.


**Conclusion:**

RAG represents a significant advancement in LLM applications, offering a pathway to more accurate, up-to-date, and contextually relevant responses.  While challenges remain, ongoing research and development are addressing these limitations, paving the way for wider adoption of RAG in various applications, including question answering, chatbots, and content generation.

### How Does the Pipe Operator Work?

Before moving on to other LCEL features, let's take a moment to understand what the pipe operator `|` is doing and _how_ it works.

Functionally, the pipe means that whatever the _left_ side outputs is fed as input into the _right_ side. In the example of `prompt | llm | output_parser`, `prompt` feeds into `llm`, which feeds into `output_parser`.

Let's make a basic class named `Runnable` that turns a provided function into a _runnable_ object that we can then use with the pipe `|` operator.

In [15]:
class Runnable:
    def __init__(self, func):
        self.func = func
    def __or__(self, other):
        def chained_func(*args, **kwargs):
            return other.invoke(self.func(*args, **kwargs))
        return Runnable(chained_func)
    def invoke(self, *args, **kwargs):
        return self.func(*args, **kwargs)

With the `Runnable` class, we can wrap a function in the class, allowing us to chain several of these _runnable_ functions together using the `__or__` method.

First, let's create a few functions that we'll chain together:

In [16]:
def add_five(x):
    return x+5

def sub_five(x):
    return x-5

def mul_five(x):
    return x*5

Now we wrap our functions with the `Runnable`:

In [17]:
add_five_runnable = Runnable(add_five)
sub_five_runnable = Runnable(sub_five)
mul_five_runnable = Runnable(mul_five)

Finally, we can chain these together using the `__or__` method from the `Runnable` class:

In [18]:
chain = (add_five_runnable).__or__(sub_five_runnable).__or__(mul_five_runnable)

chain.invoke(3)

15

So we can see that we're able to chain together our functions using `__or__`. The pipe `|` operator is simply a shortcut for the `__or__` method, so we can create the exact same chain like so:

In [19]:
chain = add_five_runnable | sub_five_runnable | mul_five_runnable

chain.invoke(3)

15
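
As a further illustration of the same mechanism, the sketch below extends the idea so that a plain function can appear on either side of `|` and is wrapped automatically. LangChain's own runnables perform a similar coercion, but the `Step` class here is a hypothetical toy written for this example, not the library's code.

```python
class Step:
    """Toy runnable: wraps a function and supports `|` chaining."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` is evaluated by Python as `a.__or__(b)`.
        # Wrap plain callables so they can be chained like Steps.
        if not isinstance(other, Step):
            other = Step(other)
        return Step(lambda value: other.invoke(self.invoke(value)))

    def __ror__(self, other):
        # `f | step` where `f` is a plain function: Python falls back to
        # `step.__ror__(f)` because functions do not define `__or__`.
        return Step(other) | self


def add_five(x):
    return x + 5


def mul_five(x):
    return x * 5


# Only the first element needs to be a Step; the rest are coerced.
chain_a = Step(add_five) | mul_five
# A plain function on the left works through __ror__.
chain_b = add_five | Step(mul_five)

print(chain_a.invoke(3))  # (3 + 5) * 5 = 40
print(chain_b.invoke(3))  # (3 + 5) * 5 = 40
```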

## LCEL `RunnableLambda`

The `RunnableLambda` class is LangChain's built-in method for constructing a _runnable_ object from a function. That is, it does the same thing as the custom `Runnable` class we created earlier. Let's try it out with the same functions as before.

In [20]:
from langchain_core.runnables import RunnableLambda

add_five_runnable = RunnableLambda(add_five)
sub_five_runnable = RunnableLambda(sub_five)
mul_five_runnable = RunnableLambda(mul_five)

We chain these together again with the pipe `|` operator:

In [21]:
chain = add_five_runnable | sub_five_runnable | mul_five_runnable

And call them using the `invoke` method:

In [22]:
chain.invoke(3)

15

Now let's try something a little more demanding: this time we will generate a report and then edit that report using `RunnableLambda` functions.

In [23]:
prompt_str = "give me a small report about {topic}"
prompt = PromptTemplate(
    input_variables=["topic"],
    template=prompt_str
)

In [24]:
chain = prompt | llm | output_parser

In [25]:
result = chain.invoke("AI")
display(Markdown(result))

## A Brief Report on Artificial Intelligence

Artificial intelligence (AI) is rapidly transforming various sectors, impacting everything from healthcare and finance to transportation and entertainment.  While the term encompasses a broad range of technologies, current advancements center around machine learning (ML) and deep learning (DL).  ML algorithms allow computers to learn from data without explicit programming, while DL utilizes artificial neural networks with multiple layers to analyze complex data patterns.

**Key Developments:**

* **Generative AI:**  Models like GPT-3 and DALL-E 2 demonstrate impressive capabilities in generating human-quality text, images, and other media, opening new avenues for creativity and content creation.  However, concerns regarding misinformation and ethical implications are prominent.
* **Improved Natural Language Processing (NLP):**  AI's ability to understand and process human language continues to improve, leading to more sophisticated chatbots, language translation tools, and sentiment analysis applications.
* **Computer Vision Advancements:**  AI-powered image recognition and object detection are becoming increasingly accurate and efficient, with applications in autonomous vehicles, medical diagnosis, and security systems.

**Challenges and Concerns:**

* **Bias and Fairness:** AI systems trained on biased data can perpetuate and amplify existing societal inequalities.  Addressing bias in algorithms and datasets is crucial for responsible AI development.
* **Job Displacement:** Automation driven by AI raises concerns about potential job losses across various industries.  Reskilling and upskilling initiatives are necessary to mitigate this impact.
* **Ethical Considerations:**  The use of AI in surveillance, autonomous weapons, and decision-making processes raises significant ethical questions that require careful consideration and regulation.

**Conclusion:**

AI is a powerful technology with the potential to solve complex problems and improve lives.  However, its development and deployment must be guided by ethical principles and a focus on mitigating potential risks.  Ongoing research, responsible development, and robust regulatory frameworks are essential to harness the benefits of AI while addressing its challenges.

Here we define two functions: `extract_fact`, which drops the first block of the text (the title) and keeps the remaining content, and `replace_word`, which replaces `AI` with `skynet`.

In [26]:
def extract_fact(x):
    if "\n\n" in x:
        return "\n".join(x.split("\n\n")[1:])
    else:
        return x

old_word = "AI"
new_word = "skynet"

def replace_word(x):
    return x.replace(old_word, new_word)
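
To see what these two helpers do without calling the LLM, they can be run on a short hand-written string. The sample text is made up for illustration. Note that `str.replace` matches substrings, so `AI` inside a longer word is also replaced. The `replace_whole_word` variant is an optional alternative (an assumption, not part of the original notebook) that uses a word-boundary regex instead.

```python
import re

old_word = "AI"
new_word = "skynet"


def extract_fact(x):
    # Drop the first block (usually the title) and join the rest.
    if "\n\n" in x:
        return "\n".join(x.split("\n\n")[1:])
    return x


def replace_word(x):
    # Plain substring replacement, as in the notebook.
    return x.replace(old_word, new_word)


def replace_whole_word(x):
    # Hypothetical variant: only replace standalone occurrences of the word.
    return re.sub(rf"\b{re.escape(old_word)}\b", new_word, x)


sample = "## A Report\n\nAI is changing the world.\n\nMAIN risks remain."

body = extract_fact(sample)
print(replace_word(body))
print(replace_whole_word(body))
```

The first print shows `MAIN` turned into `MskynetN`, while the regex variant leaves it untouched.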

Let's wrap these functions and see what the output is!

In [27]:
extract_fact_runnable = RunnableLambda(extract_fact)
replace_word_runnable = RunnableLambda(replace_word)

In [28]:
chain = prompt | llm | output_parser | extract_fact_runnable | replace_word_runnable

In [29]:
result = chain.invoke("retrieval augmented generation")
display(Markdown(result))

Retrieval Augmented Generation (RAG) is a paradigm shift in large language model (LLM) applications, addressing limitations of traditional LLMs by augmenting their capabilities with external knowledge sources.  Instead of relying solely on the knowledge embedded during training, RAG systems retrieve relevant information from a knowledge base before generating a response. This allows for:
* **Access to up-to-date information:** LLMs are trained on static datasets, making them unaware of recent events or updates. RAG overcomes this by connecting to dynamic knowledge bases like databases, web pages, or internal documents.
* **Improved accuracy and factual consistency:** By grounding responses in retrieved evidence, RAG reduces hallucinations (fabricating information) and improves the accuracy and reliability of the generated text.
* **Handling complex or specialized queries:**  LLMs may struggle with niche topics or require extensive context. RAG allows them to focus on reasoning and generation while leveraging external sources for the necessary factual information.
* **Enhanced explainability:**  The retrieved sources provide context and justification for the generated response, increasing transparency and trust.

**How it works:**  A RAG system typically involves three main components:
1. **Retrieval:** A retrieval module identifies relevant information from the knowledge base based on the user's query.  This often involves techniques like keyword matching, semantic search, or vector databases.
2. **Contextualization:** The retrieved information is then processed and contextualized to be suitable for the LLM. This might involve formatting, summarization, or highlighting key aspects.
3. **Generation:** The LLM receives both the user's query and the retrieved context as input and generates a response.

**Challenges:**  Despite its advantages, RAG faces challenges including:
* **Retrieval effectiveness:**  The quality of the generated response heavily depends on the relevance and accuracy of the retrieved information.  Poor retrieval can lead to inaccurate or irrelevant outputs.
* **Computational cost:**  The retrieval and contextualization steps add computational overhead, potentially impacting performance and scalability.
* **Knowledge base management:**  Maintaining and updating a large and accurate knowledge base can be challenging and resource-intensive.

**Conclusion:** RAG represents a significant advancement in LLM applications, offering improved accuracy, up-to-dateness, and explainability.  While challenges remain, ongoing research and development are addressing these issues, paving the way for more robust and reliable skynet systems.  The future of RAG likely involves more sophisticated retrieval methods, efficient knowledge base management techniques, and seamless integration with various LLM architectures.

Those are our `RunnableLambda` functions. It's worth noting that each of these functions is expected to take a SINGLE argument. If you have a function that needs multiple values, you can pass a dictionary with keys and unpack them inside the function.
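
The dictionary approach can be sketched as follows. To keep the example runnable without LangChain installed, it uses plain function calls; the function names and the keys `a`, `b`, and `factor` are hypothetical choices for this example. The same single-argument functions could be wrapped in `RunnableLambda` and called with `.invoke({...})`.

```python
def add_and_scale(inputs):
    # The function still takes a SINGLE argument: a dict.
    # The individual values are unpacked inside the function body.
    a = inputs["a"]
    b = inputs["b"]
    factor = inputs.get("factor", 1)  # optional key with a default
    return (a + b) * factor


def add_and_scale_kwargs(a, b, factor=1):
    # An existing multi-argument function ...
    return (a + b) * factor


def as_single_arg(func):
    # ... can be adapted by a small wrapper that unpacks the dict with **.
    return lambda inputs: func(**inputs)


print(add_and_scale({"a": 2, "b": 3, "factor": 4}))           # 20
print(as_single_arg(add_and_scale_kwargs)({"a": 2, "b": 3}))  # 5
```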

## LCEL `RunnableParallel` and `RunnablePassthrough`

LCEL provides us with various `Runnable` classes that allow us to control the flow of data and execution order through our chains. Two of these are `RunnableParallel` and `RunnablePassthrough`.

* `RunnableParallel` — allows us to run multiple `Runnable` instances in parallel, acting almost as a Y-fork in the chain.

* `RunnablePassthrough` — allows us to pass through a variable to the next `Runnable` without modification.
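
A rough mental model of these two runnables, written in plain Python: every branch receives the same input, and the results are collected into a dictionary under the branch names. This is a simplified sketch under that assumption, not LangChain's implementation; `run_parallel`, `passthrough`, and the two fake lookup functions are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def passthrough(value):
    # Identity function: hands the input on unchanged.
    return value


def run_parallel(branches, value):
    # Submit every branch with the SAME input, then gather results by key.
    with ThreadPoolExecutor() as pool:
        futures = {key: pool.submit(fn, value) for key, fn in branches.items()}
        return {key: future.result() for key, future in futures.items()}


def fake_lookup_a(question):
    # Placeholder for a retriever; returns a fixed snippet.
    return ["DeepSeek-V3 was released in December 2024"]


def fake_lookup_b(question):
    # Placeholder for a second retriever with different content.
    return ["the DeepSeek-V3 LLM is a mixture of experts model with 671B parameters"]


result = run_parallel(
    {
        "context_a": fake_lookup_a,
        "context_b": fake_lookup_b,
        "question": passthrough,
    },
    "what architecture does the model use?",
)
print(result)
```

The resulting dictionary has one entry per branch, which is the shape a downstream prompt template with matching variable names can consume.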

To see these runnables in action, we will create two data sources. Each source provides part of the information, and to answer the question both need to be fed to the LLM.

In [32]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import DocArrayInMemorySearch
from google.colab import userdata
import os

os.environ["GOOGLE_API_KEY"] = userdata.get("GOOGLE_API_KEY")

embedding = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

vecstore_a = DocArrayInMemorySearch.from_texts(
    [
        "half the info is here",
        "DeepSeek-V3 was released in December 2024"
    ],
    embedding=embedding
)
vecstore_b = DocArrayInMemorySearch.from_texts(
    [
        "the other half of the info is here",
        "the DeepSeek-V3 LLM is a mixture of experts model with 671B parameters"
    ],
    embedding=embedding
)

The prompt takes three inputs: two context variables in the system message and the question itself in the human message.

In [33]:
prompt_str = """Using the context provided, answer the user's question.
Context:
{context_a}
{context_b}
"""

In [34]:
from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate

prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(prompt_str),
    HumanMessagePromptTemplate.from_template("{question}")
])

Next we wrap our vector stores as retrievers and combine them with the question in a single `RunnableParallel`, whose output dictionary supplies every variable the prompt needs.

In [35]:
from langchain_core.runnables import RunnablePassthrough, RunnableParallel

retriever_a = vecstore_a.as_retriever()
retriever_b = vecstore_b.as_retriever()

retrieval = RunnableParallel(
    {
        "context_a": retriever_a,
        "context_b": retriever_b,
        "question": RunnablePassthrough()
    }
)

The chain we'll be constructing will look something like this:

![](https://github.com/aurelio-labs/langchain-course/blob/main/assets/lcel-flow.png?raw=1)

In [36]:
chain = retrieval | prompt | llm | output_parser

We `invoke` it as usual.

In [37]:
result = chain.invoke(
    "what architecture does the model DeepSeek released in december use?"
)
result

'DeepSeek-V3, released in December 2024, uses a Mixture of Experts architecture.'

With that we've seen how we can use `RunnableParallel` and `RunnablePassthrough` to control the flow of data and execution order through our chains.

---