<h1>
🦜🔗LangChain
</h1>


# **Brief Recap**

**LangChain** is an open source framework for building applications based on large language models (LLMs). LangChain provides tools and abstractions to improve the customization, accuracy, and relevancy of the information the models generate. For example, developers can use LangChain components to build new prompt chains or customize existing templates. LangChain also includes components that allow LLMs to access new data sets without retraining.

With LangChain, organizations can repurpose LLMs for domain-specific applications without retraining or fine-tuning. Development teams can build complex applications referencing proprietary information to augment model responses. For example, you can use LangChain to build applications that read data from stored internal documents and summarize them into conversational responses.

# **Architecture**

<img src='assets/arch.png' width=450>

The diagram above shows the architecture and workflow of a LangChain-powered document processing and question-answering system. Its key components are:

* **Input Processing**
  * The system starts with PDF documents on the left side
  * These documents are split into multiple chunks of text/documents
  * Each chunk goes through an embedding process (represented by binary code icons)
* **Data Processing Flow**
  * The text chunks are converted into embeddings (vector representations)
  * These embeddings are stored in a Vector Store (shown as a database icon)
  * The system uses specific technologies like:
    * Amazon Aurora
    * PostgreSQL with pgvector
    * Knowledge Base functionality
* **Query Processing**
  * A user inputs a question ("What is a neural network?")
  * The question goes through Question Embedding
  * A Semantic Search is performed against the vector store
  * The search produces Ranked Results
* **Response Generation**
  * The ranked results are processed by an LLM (large language model)
  * The LLM generates the final Answer back to the user
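
The flow above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not LangChain's implementation: `chunk_text`, `embed`, `cosine_similarity`, and `semantic_search` are hypothetical helpers, the bag-of-words `embed` stands in for a real embedding model, and an in-memory list stands in for a vector store such as PostgreSQL with pgvector.

```python
import math
from collections import Counter


def chunk_text(text: str, chunk_size: int = 8) -> list[str]:
    """Split text into chunks of at most `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a real system would use a learned embedding model)."""
    return Counter(word.strip(".,?!").lower() for word in text.split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_search(question: str, store: list, top_k: int = 2) -> list[str]:
    """Embed the question and return the `top_k` most similar chunks, best first."""
    question_vector = embed(question)
    ranked = sorted(store, key=lambda item: cosine_similarity(question_vector, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


document = (
    "A neural network is a model made of layers of connected units. "
    "Gradient descent adjusts the weights to reduce the loss. "
    "PostgreSQL is a relational database that supports the pgvector extension."
)

# Input processing: split the document and store each chunk with its embedding
vector_store = [(chunk, embed(chunk)) for chunk in chunk_text(document)]

# Query processing: embed the question and rank the stored chunks
results = semantic_search("What is a neural network?", vector_store)

# Response generation: the ranked chunks become context in the prompt sent to an LLM
prompt = "Answer using this context:\n" + "\n".join(results) + "\nQuestion: What is a neural network?"
print(prompt)
```

In a real application, the embedding model, vector store, and LLM call would each be replaced by the corresponding LangChain integration.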

LangChain combines document processing, vector embeddings, and language models to create a comprehensive question-answering system.




# **Use Cases**

* **Document Processing and Analysis**
  * Parsing complex documents to extract structured information into JSON or tables
  * Automated extraction of dates, quantities, and transaction details from financial documents
  * Document classification using few-shot prompting
  * Processing and analyzing large volumes of text documents
* **Question Answering Systems**
  * Building chatbots and virtual assistants for customer support
  * Creating retrieval-based QA systems with document context
  * Implementing conversational agents that can access and reference internal documents
  * Developing SQL-based QA systems that work with various database dialects
* **Content Generation and Summarization**
  * Generating executive summaries of documents and meeting notes
  * Creating summaries of large documents using map-reduce techniques
  * Producing concise summaries of financial reports and earnings documents
  * Translating content across multiple languages
* **Conversational AI**
  * Context-aware chatbots for customer support
  * Virtual agents that can access and reference company documentation
  * Automated appointment scheduling and customer service systems

* **Code-Related Tasks**
  * Code analysis for detecting bugs and security vulnerabilities
  * Development of coding assistants to improve programmer productivity
  * Custom development environments with integrated LLM capabilities



---



# **Implementation**


LangChain has three main building blocks:

1. **Components**
  * **LLM Wrappers**: Let us connect to LLMs such as GPT-4, Gemini, or models served locally through Ollama.
  * **Prompt Templates**: Let us avoid hard-coding the text sent as input to the LLMs.
  * **Indexes**: Let us retrieve relevant information for the LLMs, for example from a Pinecone vector store.

2. **Chains**
  * They let us combine multiple components to solve a specific task and build an entire LLM application.

3. **Agents**
  * They let LLMs interact with external APIs.

We will unpack these concepts by discussing and implementing each of them.





### **Initial Setup**

Download Ollama from [here](https://ollama.com/).

Then pull the `llama3.2` model:

```
ollama pull llama3.2
```

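Then install the following libraries with pip (`langchain_experimental` is only needed for the Python agent in the Agents section):
```
pip install langchain langchain_ollama langchain_experimental
```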



### **LLM Wrapper**

In this section, we're going to see how LangChain allows us to interact with LLMs.

In [None]:
# Run a basic query with the Ollama chat wrapper

from langchain_ollama import ChatOllama
from langchain_core.messages import HumanMessage

llm = ChatOllama(
    model="llama3.2",
    temperature=0,
    # other params...
)
# invoke() sends the messages to the model and returns an AIMessage
response = llm.invoke([HumanMessage(content="Explain large language models in one sentence")])
print(response)

* It uses the `llama3.2` model served locally by Ollama.
* The response is an `AIMessage`; its `content` attribute holds the generated text.


In [None]:
# Import the chat message types and ChatOllama to query chat models such as llama3.2

from langchain_core.messages import (
    AIMessage,
    HumanMessage,
    SystemMessage
)
from langchain_ollama import ChatOllama

**Breakdown**

1. **Initializing the Ollama instance:**

  ```python
  chat = ChatOllama(model="llama3.2",temperature=0.3)

  ```

* This creates an instance of the `ChatOllama` class, which is a wrapper for interacting with chat models that have been set up locally through Ollama.
* `model="llama3.2"` specifies that we're using the "llama3.2" model.
* `temperature=0.3` sets the temperature parameter to 0.3, which controls the randomness of the model's output. A lower temperature like 0.3 makes the output more focused and deterministic, while a higher temperature makes it more creative and unpredictable.

2. **Defining the messages:**

  ```python
  messages = [
      SystemMessage(content="You are an expert data scientist"),
      HumanMessage(content="Write a Python script that trains a neural network on simulated data ")
  ]
  ```

* This creates a list of messages that will be sent to the chat model.
* **`SystemMessage`** provides overall instructions or context to the model. Here, it's telling the model to act as an "expert data scientist."
* **`HumanMessage`** represents the user's input or query. In this case, it's asking the model to "Write a Python script that trains a neural network on simulated data."

**In essence,**

This sets up a conversation with the "llama3.2" chat model, provides it with a system prompt and a user query, and then prints the model's response.
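
To make the effect of the `temperature` parameter described above more concrete, here is a minimal sketch of the usual definition of temperature scaling: the model's scores are divided by the temperature before being turned into probabilities. The function `softmax_with_temperature` and the example `logits` are hypothetical, and this is not how Ollama or LangChain implement sampling internally. The sketch assumes a temperature greater than zero; a temperature of 0 is typically treated as always picking the most likely token.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert scores to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens

focused = softmax_with_temperature(logits, temperature=0.3)
creative = softmax_with_temperature(logits, temperature=1.5)

print("temperature=0.3:", [round(p, 3) for p in focused])
print("temperature=1.5:", [round(p, 3) for p in creative])
```

At the lower temperature almost all of the probability goes to the highest-scoring token, while the higher temperature spreads it more evenly across the candidates.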



In [None]:
# Initializing the LLM
chat = ChatOllama(model="llama3.2",temperature=0.3)

# List of messages to send to the chat model
messages = [
    SystemMessage(content="You are an expert data scientist"),
    HumanMessage(content="Write a Python script that trains a neural network on simulated data ")
]

# Send the messages to the model; the response is an AIMessage
response = chat.invoke(messages)

print(response.content)

### **Prompt Templates**

Prompt templates are pre-defined recipes for generating prompts to feed to large language models (LLMs). Instead of hardcoding the entire prompt string every time, you can use templates with placeholders (variables) that are filled in later. This makes your prompts more flexible and reusable.

LangChain provides the `PromptTemplate` class for creating and managing these templates.

In [None]:
# Import PromptTemplate and define the template

from langchain_core.prompts import PromptTemplate

template = """
You are an expert data scientist with an expertise in building deep learning models.
Explain the concept of {concept} in a couple of lines
"""

prompt = PromptTemplate(
    input_variables=["concept"],
    template=template,
)

# Run the LLM with the formatted prompt

llm.invoke([HumanMessage(content=prompt.format(concept="autoencoder"))])


**Breakdown**

1. **Prompt Creation**
* `input_variables=["concept"]`: This specifies the name of the variable(s) that will be used in the template.
* `template=template`: This assigns the template string you defined earlier.

2. **Using the Prompt**
* `prompt.format(concept="autoencoder")`: This fills in the `{concept}` placeholder with the value "autoencoder", generating the complete prompt string.
* `llm.invoke(...)`: This sends the formatted prompt to the LLM (which you initialized earlier) and returns its response.

**In essence,**

* The template is a reusable way to ask the LLM to explain a data science concept.
* It lets us change the concept being asked about without rewriting the entire prompt.
* It keeps the code more organized and easier to understand.
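
To illustrate the idea behind prompt templates without any LangChain dependency, here is a minimal sketch built on Python's standard string formatting. `SimplePromptTemplate` is a hypothetical stand-in written for this example; it is not LangChain's `PromptTemplate` and only mimics the placeholder-filling behavior described above.

```python
from string import Formatter


class SimplePromptTemplate:
    """Hypothetical stand-in for a prompt template (not LangChain's class)."""

    def __init__(self, template: str):
        self.template = template
        # Collect the placeholder names that appear in the template string
        self.input_variables = sorted(
            {name for _, name, _, _ in Formatter().parse(template) if name}
        )

    def format(self, **values: str) -> str:
        """Fill in the placeholders, failing loudly if one is missing."""
        missing = [name for name in self.input_variables if name not in values]
        if missing:
            raise KeyError(f"Missing template variables: {missing}")
        return self.template.format(**values)


simple_prompt = SimplePromptTemplate(
    "You are an expert data scientist.\n"
    "Explain the concept of {concept} in a couple of lines"
)

# The same template is reused for different concepts
for concept in ["autoencoder", "dropout"]:
    print(simple_prompt.format(concept=concept))
```

Only the value passed for `concept` changes between calls; the surrounding instructions stay in one place.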


### **Chains**


In this section we're going to see how to link multiple LangChain components together to create more complex workflows. This is useful for building applications where you want to process information in stages or combine the outputs of different components.

LangChain provides the `LLMChain` class for creating chains.


**Initializing Chain**

* We will first define a chain using the language model (`llm`) and prompt (`prompt`) as arguments.
* This chain combines the Ollama-served language model (`llm`) with the previously defined prompt (`prompt`) that asks for an explanation of a data science concept.

In [None]:
# Import LLMChain and define chain with language model and prompt as arguments.

from langchain.chains import LLMChain
chain = LLMChain(llm=llm, prompt=prompt)

# Run the chain only specifying the input variable.
print(chain.run("autoencoder"))

**Defining a Sequential Chain**

* Then we define a second prompt (`second_prompt`) that asks for a simplified explanation of the concept, as if explaining it to a five-year-old.

In [None]:
# Define a second prompt

second_prompt = PromptTemplate(
    input_variables=["ml_concept"],
    template="Turn the concept description of {ml_concept} and explain it to me like I'm five in 500 words",
)
chain_two = LLMChain(llm=llm, prompt=second_prompt)

* Now we import `SimpleSequentialChain` and define a new chain (`overall_chain`) that combines the first two chains in sequence.

* This means that the output of the first chain (the initial explanation) will be used as input to the second chain (to generate the simplified explanation).

* Finally, we run the `overall_chain` by specifying only the input variable for the first chain (e.g., "autoencoder").

In [None]:
# Define a sequential chain using the two chains above: the second chain takes the output of the first chain as input

from langchain.chains import SimpleSequentialChain
overall_chain = SimpleSequentialChain(chains=[chain, chain_two], verbose=True)

# Run the chain specifying only the input variable for the first chain.
explanation = overall_chain.run("autoencoder")
print(explanation)



**In essence,**

* We saw how to create and run individual chains using `LLMChain`.
* We also saw how to combine chains sequentially using `SimpleSequentialChain` to create more complex workflows.
* The sequential chain enables the output of one chain to be used as input to another, allowing for multi-step information processing.
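
The sequential pattern summarized above can be sketched in plain Python. This is a minimal illustration of the idea, not how `SimpleSequentialChain` is implemented: `fake_llm`, `make_chain`, and `sequential` are hypothetical helpers, and `fake_llm` only echoes its prompt so that the example runs without a model.

```python
from typing import Callable


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; it only echoes the prompt."""
    return f"<response to: {prompt}>"


def make_chain(template: str, llm: Callable[[str], str]) -> Callable[[str], str]:
    """Combine a prompt template and a model into a single callable step."""
    def run(value: str) -> str:
        return llm(template.format(input=value))
    return run


def sequential(*chains: Callable[[str], str]) -> Callable[[str], str]:
    """Run the chains in order, feeding each output into the next chain."""
    def run(value: str) -> str:
        for chain in chains:
            value = chain(value)
        return value
    return run


explain = make_chain("Explain the concept of {input} in a couple of lines", fake_llm)
simplify = make_chain("Explain this to me like I'm five: {input}", fake_llm)

overall = sequential(explain, simplify)
result = overall("autoencoder")
print(result)
```

The printed result shows the first step's response nested inside the second step's prompt, which is the hand-off that a sequential chain performs between its stages.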

### **Agents**

We will learn how to use LangChain to create an agent that can execute Python code. This allows you to combine the power of LLMs with the ability to run code and interact with the outside world.

In [None]:
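from langchain_experimental.agents.agent_toolkits import create_python_agent
from langchain_experimental.tools import PythonREPLTool
from langchain_ollama import OllamaLLM

agent_executor = create_python_agent(
    # num_predict is Ollama's cap on the number of generated tokens
    llm=OllamaLLM(model="llama3.2", temperature=0, num_predict=1000),
    tool=PythonREPLTool(),
    verbose=True
)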


In [None]:
# Execute the Python agent

agent_executor.run("Find the roots (zeros) of the quadratic function 3 * x**2 + 2*x -1")

**Initialize the LLM**

This creates an instance of the ChatOllama class, specifying the model name, temperature, and verbosity.

In [None]:
from langchain_ollama import ChatOllama
from langchain.agents import tool
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.agents import AgentExecutor
from langchain.agents.output_parsers.openai_tools import OpenAIToolsAgentOutputParser
from langchain.agents.format_scratchpad.openai_tools import format_to_openai_tool_messages

# Initialize the LLM
llm = ChatOllama(
    model="llama3.2",
    temperature=0,
    verbose=True
)


**Define and set up a simple tool**

* This defines a custom tool called `get_word_length` that takes a word as input and returns its length. The `@tool` decorator registers this function as a tool that the agent can use.
* Then create a list of tools that the agent will have access to. In this case, it only includes the `get_word_length` tool.


In [None]:
# Define a simple tool
@tool
def get_word_length(word: str) -> int:
    """Returns the length of a word."""
    return len(word)

# Set up tools
tools = [get_word_length]


**Create a prompt template**

This defines a prompt template for the agent. The prompt includes system instructions, user input, and a placeholder for the agent's scratchpad, where it can store intermediate steps.


In [None]:
# Create prompt template
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are very powerful assistant"),
    ("user", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])


**Bind tools to LLM**

This binds the defined tools to the LLM instance, allowing the agent to access and use the tools.


In [None]:
# Bind tools to LLM
llm_with_tools = llm.bind_tools(tools)


**Create agent**

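This creates the agent as a pipeline: a mapping step extracts the user input and formats the intermediate steps into the scratchpad, the prompt turns them into messages, the LLM with bound tools produces a response, and the output parser converts that response into tool calls or a final answer.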


In [None]:
# Create agent
agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_tool_messages(
            x["intermediate_steps"]
        ),
    }
    | prompt
    | llm_with_tools
    | OpenAIToolsAgentOutputParser()
)


**Create agent executor**

This creates an agent executor that manages the execution of the agent's actions.


In [None]:
# Create agent executor
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)


**Test the agent**

This invokes the agent with a test input and prints the output.


In [None]:

# Test the agent
result = agent_executor.invoke({"input": "How many letters in the word education?"})
print(f"[Output] --> {result['output']}")


**In essence**,

This demonstrates how to give a large language model (LLM) the ability to interact with the external world and execute actions beyond simple text generation. It achieves this by creating an agent that can use external tools, in this case a custom Python function.
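
The loop that an agent executor runs can be sketched in plain Python. This is a minimal sketch under stated assumptions, not LangChain's `AgentExecutor`: `scripted_model`, `run_agent`, and `TOOLS` are hypothetical names, and `scripted_model` is a hard-coded stand-in for the LLM that requests the tool once and then answers.

```python
def get_word_length(word: str) -> int:
    """Returns the length of a word."""
    return len(word)


# Registry of tools the agent may call, keyed by name
TOOLS = {"get_word_length": get_word_length}


def scripted_model(question: str, scratchpad: list) -> dict:
    """Hypothetical stand-in for an LLM: request a tool first, then answer."""
    if not scratchpad:
        word = question.rstrip("?").split()[-1]
        return {"type": "tool_call", "tool": "get_word_length", "args": {"word": word}}
    _, observation = scratchpad[-1]
    return {"type": "final", "output": f"The word has {observation} letters."}


def run_agent(question: str, max_steps: int = 5) -> str:
    """Ask the model what to do, run requested tools, and repeat until it answers."""
    scratchpad = []  # intermediate steps: (decision, observation) pairs
    for _ in range(max_steps):
        decision = scripted_model(question, scratchpad)
        if decision["type"] == "final":
            return decision["output"]
        observation = TOOLS[decision["tool"]](**decision["args"])
        scratchpad.append((decision, observation))
    raise RuntimeError("Agent did not finish within max_steps")


answer = run_agent("How many letters in the word education?")
print(answer)
```

With a real LLM, the decision at each step comes from the model's tool-calling output rather than from fixed rules, but the surrounding loop has the same shape.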