# Retrieval Augmented Generation using Runnables and Chains w/ LangChain

Enhance generation with specialized knowledge.

**Purpose**:
This notebook teaches you how to build custom `Runnable`s with the `LangChain` ecosystem and combine them into your own RAG app.

## Definitions: `Runnables` and `Chains`

### *Runnables*:

• A Runnable represents a unit of work that can be executed.

• It can perform a specific task or action, such as making an API call, processing data, or running a machine learning model.

• Runnables can have input and output types specified, and they can be composed together to form more complex workflows.

• They are designed to be flexible and reusable components that can be easily combined and configured.

• Every Runnable must provide an `invoke` method, which is used to execute it.

• Examples of Runnables include API calls, data processing functions, and machine learning models.

### *Chains*:

• A Chain is a sequence of Runnables that are executed in a specific order.

• Chains provide a way to string together multiple Runnables to create a workflow or pipeline.

• Each Runnable in the Chain takes the output of the previous Runnable as its input.

• Chains can be used to build complex applications by combining and orchestrating the execution of multiple Runnables.

• They provide a higher-level abstraction for organizing and structuring the flow of data and operations.

• Examples of Chains include data processing pipelines, machine learning workflows, and API request/response sequences.
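The two definitions above can be illustrated without any library. The following is a minimal sketch, not LangChain's implementation: `Step` is a hypothetical stand-in for a Runnable that exposes `invoke`, and overloading `|` is assumed here only to mimic how steps are strung together into a chain.

```python
from typing import Any, Callable


class Step:
    """Hypothetical stand-in for a runnable: wraps a function behind `invoke`."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        # Execute this unit of work on a single input.
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))


strip = Step(str.strip)
shout = Step(str.upper)
exclaim = Step(lambda text: text + "!")

# A chain is just steps executed in order, each consuming the previous output.
demo_chain = strip | shout | exclaim
demo_result = demo_chain.invoke("  hello runnables ")
```

Because the composed object is itself a `Step`, a chain can be used anywhere a single step is expected, which is the property that makes this style of composition convenient.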

## **Deeper explanation**:

In the process of building an AI chatbot, we often need to connect different components together to create a functional system.

One way to achieve this is by *chaining* these components, ensuring that the output of one component is properly passed to the next component for further processing. To accomplish this, we can directly call the functions or methods of each component and pass the output as arguments to the next component. 

- This straightforward approach works well when we only need to pass the output from one component to another ***without any*** additional processing or transformations in between.

However, in more complex scenarios where we require intermediate processing or transformations on the output, we can use a concept called "runnables." Runnables provide a flexible and modular way to encapsulate and compose these processing steps within a chain.

By using runnables, we can easily add additional functionality, such as filtering or modifying the output, before passing it to the next component. 

- This allows us to *customize the behavior* of the chatbot and *ensure* that the output is properly prepared for the subsequent steps.
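As a rough illustration of the difference, the sketch below first passes one component's output straight to the next, then slots an intermediate filtering step between them. All three functions and the sample snippets are hypothetical and exist only for this example.

```python
def fetch_snippets(query: str) -> list[str]:
    # Hypothetical component: returns raw candidate snippets for a query.
    return ["LCEL composes chains", "", "Runnables expose invoke", "  "]


def drop_blank(snippets: list[str]) -> list[str]:
    # Intermediate processing step: discard empty or whitespace-only snippets.
    return [s for s in snippets if s.strip()]


def build_prompt(snippets: list[str]) -> str:
    # Hypothetical component: formats snippets into a prompt body.
    return "Context:\n" + "\n".join(snippets)


# Direct chaining: the raw output is passed along unchanged.
direct = build_prompt(fetch_snippets("lcel"))

# Chaining with an intermediate transformation in between.
cleaned = build_prompt(drop_blank(fetch_snippets("lcel")))
```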

### "How are `Runnable`s different than normal classes?"

*Similarities*:

• Runnables can have methods and attributes, just like normal classes.

• They can define and implement their own logic and functionality.

• Runnables can have constructor arguments and can be instantiated with different configurations.

*Differences*:

• Runnables are designed to be executed as part of a larger system or workflow, often in a distributed or parallelized manner.

• They are typically used for data processing, transformation, or analysis tasks.

• Runnables have specific interfaces and methods that define how they interact with other runnables and the overall system.

• They can be composed and combined with other runnables to create complex workflows.

• Runnables often have additional features and capabilities specific to the LangChain library, such as input and output type validation, configuration management, and error handling.

• They can be executed asynchronously or over batches of inputs (for example via `ainvoke` and `batch`), and independent steps can run in parallel.

• Runnables can be versioned and deployed as part of a larger system, allowing for easy updates and maintenance.

### "How do I decide to use either a `Runnable` or a `Chain`?"
Ultimately, the decision to use runnables or a more straightforward sequential approach depends on the specific requirements and complexity of the chatbot system. You might find yourself using one, both, or neither based on your needs.

In summary, 
1. Chains, which are sequences of interconnected tasks, can operate effectively on their own, without the need for Runnables. They are designed to link various components of a system in a specific order, allowing for the smooth execution of a workflow or pipeline. This makes them particularly useful in scenarios where a straightforward, sequential process is sufficient and where the complexity of Runnables is not required.

2. Runnables resemble traditional classes but offer enhanced functionality, particularly in complex AI chatbot systems. They facilitate the integration and processing of outputs between different components, allowing for customization and increased flexibility in system design. This makes Runnables ideal for scenarios requiring more than just sequential processing, such as when intermediate steps or specific transformations of data are necessary.

## Set up

In [1]:
%pip install -Uq openai tiktoken chromadb langchain langchain-openai faiss-cpu beautifulsoup4

Note: you may need to restart the kernel to use updated packages.


In [2]:
# Set API Key Directly
import os

os.environ["OPENAI_API_KEY"] = ""

# Load from an .env file
# %pip install -Uq python-dotenv
# import os
# import dotenv
# dotenv.load_dotenv()
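Whichever way the key is supplied, it can help to fail early when it is missing or empty. The helper below is a small sketch; `require_env` is a hypothetical name, not part of any library used in this notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it or load it from a .env file")
    return value


# Example usage (left commented out so the sketch runs without a real key):
# api_key = require_env("OPENAI_API_KEY")
```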

# Creating and Executing a Custom Runnable

In this section, we will put into practice the concepts of `Runnables` that we've discussed earlier. We will create a custom `Runnable` called `AddNumbersRunnable` and demonstrate how to execute it within our application.

#### Step 1: Define the Runnable

First, we define our `AddNumbersRunnable` class and create an instance of it. This is a class that we've designed to perform a specific task, in this case adding two numbers. The design of this class follows the principles of `Runnables` in the LangChain ecosystem, making it a modular and reusable component.

#### Step 2: Prepare the Input

Next, we prepare the input for our runnable using the `InputType` class. This class is a Pydantic model that ensures our input data is structured and typed correctly. By creating an instance of `InputType`, we are packaging our data (the two numbers we want to add) in a way that our `Runnable` can easily understand and process.

#### Step 3: Execute the Runnable

With our input ready, we call the `run` method of our `AddNumbersRunnable` instance. This method encapsulates the logic of our task and is responsible for executing the work defined by the `Runnable`. It takes our structured input, performs the addition, and returns an output in the form of an `OutputType` instance. The `invoke` method, which is the standard entry point for a `Runnable`, simply delegates to `run`.

#### Step 4: Display the Result

Finally, we display the result of our runnable's execution. The `OutputType` class defines the expected structure of the output from our `Runnable`. By accessing the `result` attribute, we can retrieve the sum of the two numbers and present it to the user.

This simple example illustrates the power of `Runnables` and how they can be used to build clean, organized, and reusable components within your applications. By following this pattern, you can create more complex workflows and applications using the LangChain library.

In [3]:
from langchain_core.runnables.base import Runnable
from pydantic import BaseModel

class AddNumbersRunnable(Runnable):
    """
    A class representing a runnable that adds two numbers together.

    Attributes:
        InputType (BaseModel): A Pydantic model representing the input to the runnable. It has two attributes: num1 and num2, both integers.
        OutputType (BaseModel): A Pydantic model representing the output of the runnable. It has one attribute: result, an integer.

    Methods:
        run(input: InputType) -> OutputType: This method takes an instance of InputType as input, adds the two numbers together, and returns an instance of OutputType containing the sum.
        invoke(input: InputType) -> OutputType: This method is a wrapper around the run method. It takes an instance of InputType as input and returns the result of calling the run method with the same input.
    """
    class InputType(BaseModel):
        num1: int
        num2: int

    class OutputType(BaseModel):
        result: int

    def run(self, input: InputType) -> OutputType:
        result = input.num1 + input.num2
        output = self.OutputType(result=result)
        return output
    
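    def invoke(self, input: InputType, config=None, **kwargs) -> OutputType:
        # Accept `config` (unused here) so the Runnable also works when composed in a chain.
        return self.run(input)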

In [4]:
# Instantiate the AddNumbersRunnable class.
# This class is an example of a Runnable, which is a core concept in LangChain for creating modular and reusable units of work.
# As we've learned, Runnables like this one encapsulate specific tasks—in this case, adding two numbers.
add_numbers = AddNumbersRunnable()

# Prepare the input data for the runnable.
# The InputType class is a Pydantic model that defines the expected structure of the input, ensuring type safety and validation.
# Here, we're creating an instance of InputType with two numbers, demonstrating how inputs are structured for Runnables.
input_data = AddNumbersRunnable.InputType(num1=5, num2=3)

# Execute the runnable with the provided input data.
# The run method is where the logic of the Runnable is executed. This method is a clear example of how a Runnable performs its task.
# By calling this method, we're following the pattern of Runnables where they take an input, process it, and produce an output.
output_data = add_numbers.run(input_data)

# Display the result of the runnable's execution.
# The OutputType class defines the structure of the output, which is another aspect of Runnables that promotes consistency and predictability.
# This print statement not only shows the result but also reinforces the concept of Runnables having defined inputs and outputs.
print(f"The result of adding {input_data.num1} and {input_data.num2} is: {output_data.result}")  # Output: 8

The result of adding 5 and 3 is: 8


---

# LCEL: Creating and Executing a Chain

## Conversation Retrieval Chain
Below we'll back our retriever with a FAISS vector store, so that embedded context can be retrieved and used to augment the LLM's prompt.

### Step 1: Acquire Documents for Retrieval Augmented Generation
We start by importing all necessary classes and loading the documents we want to use for our retrieval chain. For this notebook, we'll use the `WebBaseLoader` to load a page about LangChain Expression Language (LCEL), a root concept for runnables and chains. We'll split the document into small chunks before embedding: smaller chunks can sharpen the focus of the retrieved context, but they may also require multiple rounds of retrieval, which increases processing time.
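To make the roles of chunk size and overlap concrete, here is a simplified sliding-window sketch. It is an assumption-laden illustration only: `chunk_text` is a hypothetical helper, and `CharacterTextSplitter` itself works differently (it splits on a separator and then merges the pieces up to the chunk size).

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of `chunk_size` characters that share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in the range [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
        start += step
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
# chunks == ["abcd", "defg", "ghij"]: each chunk repeats the last character of the previous one
```

The overlap means that a sentence cut at a chunk boundary still appears, at least partly, in the neighbouring chunk, at the cost of storing some text twice.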

### Step 2: Create a set of prompt templates for a Chain
This part is pretty straightforward. We use a couple of classes that help us format the prompts the chat model will have to answer. The first template takes a question and "chat_history" as parameters and condenses them into a standalone question, the second answers that question using only the retrieved context, and finally a document template passes each document's `page_content` through to the answer prompt.



In [5]:
from operator import itemgetter

from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.memory import ConversationBufferWindowMemory

from langchain.schema import format_document
from langchain_core.messages import AIMessage, HumanMessage, get_buffer_string
from langchain_core.runnables import RunnableParallel

#### Load/Split Documents

In [6]:
# Initialize a loader and load documents.
# We'll use the WebBaseLoader to load documents from the web.
loader = WebBaseLoader("https://python.langchain.com/docs/expression_language/")

documents = loader.load()

# Check documents contents
documents

[Document(page_content="\n\n\n\n\nLangChain Expression Language (LCEL) | 🦜️🔗 Langchain\n\n\n\n\n\n\n\nSkip to main content🦜️🔗 LangChainDocsUse casesIntegrationsGuidesAPIMoreVersioningChangelogDeveloper's guideTemplatesCookbooksTutorialsYouTube videos🦜️🔗LangSmithLangSmith DocsLangServe GitHubTemplates GitHubTemplates HubLangChain HubJS/TS DocsChatSearchGet startedIntroductionInstallationQuickstartSecurityLangChain Expression LanguageGet startedWhy use LCELInterfaceStreamingHow toCookbookLangChain Expression Language (LCEL)ModulesModel I/ORetrievalAgentsChainsMoreLangServeLangSmithLangGraphLangChain Expression LanguageLangChain Expression Language (LCEL)LangChain Expression Language, or LCEL, is a declarative way to easily compose chains together.\nLCEL was designed from day 1 to support putting prototypes in production, with no code changes, from the simplest “prompt + LLM” chain to the most complex chains (we’ve seen folks successfully run LCEL chains with 100s of steps in production).

In [7]:
# Initialize a CharacterTextSplitter
text_splitter = CharacterTextSplitter(chunk_size=256, chunk_overlap=25)

# Split our documents into chunks using the CharacterTextSplitter
docs = text_splitter.split_documents(documents)

# Check docs contents
docs

[Document(page_content='LangChain Expression Language (LCEL) | 🦜️🔗 Langchain', metadata={'source': 'https://python.langchain.com/docs/expression_language/', 'title': 'LangChain Expression Language (LCEL) | 🦜️🔗 Langchain', 'description': 'LangChain Expression Language, or LCEL, is a declarative way to easily compose chains together.', 'language': 'en'}),
 Document(page_content="Skip to main content🦜️🔗 LangChainDocsUse casesIntegrationsGuidesAPIMoreVersioningChangelogDeveloper's guideTemplatesCookbooksTutorialsYouTube videos🦜️🔗LangSmithLangSmith DocsLangServe GitHubTemplates GitHubTemplates HubLangChain HubJS/TS DocsChatSearchGet startedIntroductionInstallationQuickstartSecurityLangChain Expression LanguageGet startedWhy use LCELInterfaceStreamingHow toCookbookLangChain Expression Language (LCEL)ModulesModel I/ORetrievalAgentsChainsMoreLangServeLangSmithLangGraphLangChain Expression LanguageLangChain Expression Language (LCEL)LangChain Expression Language, or LCEL, is a declarative way t

#### Initialize Embeddings and Vectorstore

In [8]:
# Initialize the embeddings model
embeddings = OpenAIEmbeddings()

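# Initialize a FAISS index from a single toy sentence.
# Note: the `docs` chunks created above are not indexed here;
# FAISS.from_documents(docs, embeddings) would index them instead.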
vectorstore = FAISS.from_texts(["harrison worked at kensho"], embeddings)

#### Create Retriever using Vectorstore

In [9]:
# Set our vectorstore as our retriever
retriever = vectorstore.as_retriever()

# Check retriever contents
retriever

VectorStoreRetriever(tags=['FAISS', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x000001D9C67D5480>)
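Conceptually, a retriever embeds the query and returns the stored texts whose vectors are closest to it. The sketch below imitates that idea with word counts and cosine similarity instead of learned embeddings and a FAISS index; `embed`, `cosine`, and `retrieve` are hypothetical helpers, and the second sample text is made up for contrast.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy "embedding": a bag of lowercase words (real embeddings are dense vectors).
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, texts: list[str], k: int = 1) -> list[str]:
    # Rank stored texts by similarity to the query and keep the top k.
    query_vector = embed(query)
    ranked = sorted(texts, key=lambda text: cosine(query_vector, embed(text)), reverse=True)
    return ranked[:k]


texts = ["harrison worked at kensho", "bears like to eat honey"]
top = retrieve("where did harrison work", texts)
```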

### Prompt Templates & Chaining Functionality
Let's start by building a simple prompt template that will take in a question and "chat_history" as parameters, which is then used to create a condensed question.

In [10]:
from langchain.schema import format_document
from langchain_core.messages import AIMessage, HumanMessage, get_buffer_string
from langchain_core.runnables import RunnableParallel
from langchain.prompts.prompt import PromptTemplate

_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)
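To see what the model actually receives, the sketch below fills a shortened version of the template by hand. It assumes the chat history is flattened into `Role: content` lines, which is how `get_buffer_string` is expected to behave with its default prefixes; `buffer_string` and `short_template` are hypothetical names used only here.

```python
def buffer_string(turns: list[tuple[str, str]]) -> str:
    # Flatten (role, content) pairs into one "Role: content" line per turn.
    return "\n".join(f"{role}: {content}" for role, content in turns)


# Shortened stand-in for the condense-question template, with the same placeholders.
short_template = """Rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""

history = [("Human", "Who wrote this notebook?"), ("AI", "Harrison")]
filled = short_template.format(
    chat_history=buffer_string(history),
    question="where did he work?",
)
```

The filled prompt ends with `Standalone question:`, so the model's completion is the rewritten question itself, which is what the rest of the chain consumes.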

Next, we create a separate simple retrieval augmented generation prompt to generate an answer solely on the provided context and question.

In [11]:
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
ANSWER_PROMPT = ChatPromptTemplate.from_template(template)

Now we'll create another prompt template that will pass `page_content` through to whatever step follows it. We'll learn more about this later, once we start building and invoking our chain.

In [12]:
DEFAULT_DOCUMENT_PROMPT = PromptTemplate.from_template(template="{page_content}")

In [13]:
# Combine the document with question prompts
def _combine_documents(
    docs, document_prompt=DEFAULT_DOCUMENT_PROMPT, document_separator="\n\n"
):
    doc_strings = [format_document(doc, document_prompt) for doc in docs]
    return document_separator.join(doc_strings)

Now that we've got our combined documents and prompt templates, we can create a chain to ask a chat model about our context. The model will be responsible for processing all prompts at each step of the way.

Please review our prompts if necessary for your learning, including `CONDENSE_QUESTION_PROMPT`, `ANSWER_PROMPT`, and `DEFAULT_DOCUMENT_PROMPT`.

We'll use a `RunnableParallel` to feed the question and the flattened chat history into the condense-question prompt, which produces the standalone question. Next, we create `_context`, which sends the standalone question to the retriever to build the "context" and also passes it along as the "question". Finally, we create our LangChain Expression Language "Chain."
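The two building blocks used here can be pictured in plain Python. This is a sketch of the data flow only: `run_parallel` and `assign` are hypothetical helpers, and unlike a real `RunnableParallel` the branches below run one after another rather than concurrently.

```python
from typing import Any, Callable, Dict


def run_parallel(branches: Dict[str, Callable[[Any], Any]], value: Any) -> Dict[str, Any]:
    # Every branch receives the same input; results are keyed by branch name.
    return {name: branch(value) for name, branch in branches.items()}


def assign(value: Dict[str, Any], **updates: Callable[[Dict[str, Any]], Any]) -> Dict[str, Any]:
    # Keep the original keys and add (or overwrite) keys computed from the input.
    return {**value, **{key: func(value) for key, func in updates.items()}}


request = {
    "question": "where did he work?",
    "chat_history": ["Human: Who wrote this notebook?", "AI: Harrison"],
}

# Step 1: pass the input through while replacing chat_history with a flat string.
flattened = assign(request, chat_history=lambda x: "\n".join(x["chat_history"]))

# Step 2: fan the same dict out to several named branches.
fanned = run_parallel(
    {
        "question": lambda x: x["question"],
        "history_lines": lambda x: x["chat_history"].count("\n") + 1,
    },
    flattened,
)
```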

In [14]:
# Use a RunnableParallel to produce the standalone question
# from the question and the flattened chat history
_inputs = RunnableParallel(
    standalone_question=RunnablePassthrough.assign(
        chat_history=lambda x: get_buffer_string(x["chat_history"])
    )
    | CONDENSE_QUESTION_PROMPT
    | ChatOpenAI(model_name="gpt-3.5-turbo-1106", temperature=0)
    | StrOutputParser(),
)

_context = {
    "context": itemgetter("standalone_question") | retriever | _combine_documents,
    # Pass the standalone question along as "question" for the answer prompt
    "question": lambda x: x["standalone_question"],
}

# Create a chain to enable interfacing with a chat model using our prompts and context
conversational_qa_chain = _inputs | _context | ANSWER_PROMPT | ChatOpenAI()

In [15]:
# Test the chain
conversational_qa_chain.invoke(
    {
        "question": "where did he work?",
        "chat_history": [
            HumanMessage(content="Who wrote this notebook?"),
            AIMessage(content="Harrison"),
        ],
    }
)

AIMessage(content='Harrison worked at Kensho.')

##### Memory

In [16]:
from operator import itemgetter
from langchain.memory import ConversationBufferMemory

# Initialize memory
memory = ConversationBufferMemory(
    return_messages=True, output_key="answer", input_key="question"
)

In [17]:
# First we add a step to load memory
# This adds a "chat_history" key to the input object
loaded_memory = RunnablePassthrough.assign(
    chat_history=RunnableLambda(memory.load_memory_variables) | itemgetter("history"),
)
# Now we calculate the standalone question
standalone_question = {
    "standalone_question": {
        "question": lambda x: x["question"],
        "chat_history": lambda x: get_buffer_string(x["chat_history"]),
    }
    | CONDENSE_QUESTION_PROMPT
    | ChatOpenAI(temperature=0)
    | StrOutputParser(),
}
# Now we retrieve the documents
retrieved_documents = {
    "docs": itemgetter("standalone_question") | retriever,
    "question": lambda x: x["standalone_question"],
}
# Now we construct the inputs for the final prompt
final_inputs = {
    "context": lambda x: _combine_documents(x["docs"]),
    "question": itemgetter("question"),
}
# And finally, we do the part that returns the answers
answer = {
    "answer": final_inputs | ANSWER_PROMPT | ChatOpenAI(),
    "docs": itemgetter("docs"),
}
# And now we put it all together!
final_chain = loaded_memory | standalone_question | retrieved_documents | answer

In [18]:
inputs = {"question": "where did harrison work?"}
result = final_chain.invoke(inputs)
result

{'answer': AIMessage(content='Harrison was employed at Kensho.'),
 'docs': [Document(page_content='harrison worked at kensho')]}

In [19]:

# Manually save context
memory.save_context(inputs, {"answer": result["answer"].content})

# Load context
memory.load_memory_variables({})

{'history': [HumanMessage(content='where did harrison work?'),
  AIMessage(content='Harrison was employed at Kensho.')]}
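The save and load cycle above boils down to appending each exchange to a list and handing that list back on the next call. The class below is a minimal sketch of that idea; `BufferMemory` is a hypothetical stand-in and not the `ConversationBufferMemory` implementation.

```python
class BufferMemory:
    """Hypothetical in-memory conversation buffer."""

    def __init__(self) -> None:
        self._turns: list[tuple[str, str]] = []

    def save_context(self, question: str, answer: str) -> None:
        # Record one exchange as a human turn followed by an AI turn.
        self._turns.append(("Human", question))
        self._turns.append(("AI", answer))

    def load_memory_variables(self) -> dict:
        # Return a copy so callers cannot mutate the stored history.
        return {"history": list(self._turns)}


demo_memory = BufferMemory()
demo_memory.save_context("where did harrison work?", "Harrison was employed at Kensho.")
loaded = demo_memory.load_memory_variables()
```

Because nothing is saved automatically in this pattern, forgetting the `save_context` call after an invocation leaves the next turn without any history.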

---

## Multiple Conversational Chains

### Lesson:
Runnables make it easy to string multiple chains together, which allows for more complex workflows. We will also learn about branching and merging with `RunnableParallel`, so that we can process multiple chains at the same time and then combine their outputs to synthesize a final response.
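As a preview of the branch-and-merge pattern, the sketch below runs two independent branches on the same input with a thread pool and then merges their results. The `pros` and `cons` functions are hypothetical placeholders for real sub-chains, and the thread pool is an assumption made for illustration rather than a description of how `RunnableParallel` schedules work.

```python
from concurrent.futures import ThreadPoolExecutor


def pros(topic: str) -> str:
    # Hypothetical branch: would normally be a chain that calls a model.
    return f"pros of {topic}"


def cons(topic: str) -> str:
    # Hypothetical branch: a second, independent chain.
    return f"cons of {topic}"


def branch_and_merge(topic: str) -> str:
    # Branch: submit both chains with the same input so they can run concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = {"pros": pool.submit(pros, topic), "cons": pool.submit(cons, topic)}
        results = {name: future.result() for name, future in futures.items()}
    # Merge: combine the branch outputs into one response.
    return f"{results['pros']}; {results['cons']}"


merged = branch_and_merge("LCEL")
```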