# **Introduction to LangChain**

**Build Applications That Can Reason**

LangChain is a framework for developing applications powered by language models.  
LangChain enables building applications that connect external sources of data and computation to LLMs.

It enables applications that:

- **Are context-aware:** connect a language model to sources of context (prompt instructions, few-shot examples, content to ground its response in, etc.)
- **Reason:** rely on a language model to reason (about how to answer based on provided context, what actions to take, etc.)

This framework consists of several parts.

- **LangChain Libraries:** The Python and JavaScript libraries. They contain interfaces and integrations for a myriad of components, a basic runtime for combining these components into chains and agents, and off-the-shelf implementations of chains and agents.
- **LangChain Templates:** A collection of easily deployable reference architectures for a wide variety of tasks.
- **LangServe:** A library for deploying LangChain chains as a REST API.
- **LangSmith:** A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework, and that integrates seamlessly with LangChain.

Together, these products simplify the entire application lifecycle:

- **Develop:** Write your applications in LangChain/LangChain.js. Hit the ground running using Templates for reference.
- **Productionize:** Use LangSmith to inspect, test and monitor your chains, so that you can constantly improve and deploy with confidence.
- **Deploy:** Turn any chain into an API with LangServe.

### **LangChain vs LangChain-Community vs LangChain-Core**
The old `langchain` package was split into three separate packages to improve the developer experience:

`langchain-core` contains simple, core abstractions that have emerged as a standard, as well as LangChain Expression Language as a way to compose these components together. This package is now at version 0.1 and all breaking changes will be accompanied by a minor version bump.

`langchain-community` contains all third party integrations. We will work with partners on splitting key integrations out into standalone packages over the next month.

`langchain` contains higher-level and use-case specific chains, agents, and retrieval algorithms that are at the core of your application's cognitive architecture. We are targeting a launch of a stable 0.1 release for langchain in early January. The latest version (v2) came out in March 2024.

<img src="images/langchain_stack.JPG">

## **Prompt Template + Model + Chains + Output Parser**

**Model**  
- LLMs handle various language operations such as translation, summarization, question answering, and content creation.

**Prompt Template**  
- Prompt templates convert raw user input into a better-structured input for the LLM. Templates let us easily configure and modify the input prompts for LLM calls.

**Output Parser**  
- It's often much more convenient to work with strings. We can add a simple output parser to convert the chat message to a string.

**Chains**  
- The `|` symbol chains the different components together, feeding the output of one component as input into the next.


<img src="images/langchain_LCEL.JPG">
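As a rough intuition for how `|` composes components, the sketch below overloads Python's `__or__` on a toy class. `Step`, `fake_model`, and the reply text are hypothetical stand-ins for illustration. This is not LangChain's implementation, and it makes no API calls.

```python
class Step:
    """Toy stand-in for a chain component (hypothetical, not a LangChain class)."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # `a | b` builds a new step that runs a, then feeds its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Three toy components mirroring prompt template -> model -> output parser.
prompt = Step(lambda inputs: f"What is {inputs['topic']}?")
fake_model = Step(lambda text: {"content": f"(model reply to: {text})"})
parser = Step(lambda message: message["content"])

chain = prompt | fake_model | parser
result = chain.invoke({"topic": "NLP"})
print(result)  # (model reply to: What is NLP?)
```

Each stage only needs to accept whatever the previous stage returns, which is why components can be swapped or appended without touching the rest of the chain.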

# Step 1 - Models (LLM and ChatLLM)

In [5]:
# Setup API key (the context manager closes the file; strip() drops a trailing newline)
with open('keys/.openai_api_key.txt') as f:
    OPEN_API_KEY = f.read().strip()
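A common alternative to a key file is an environment variable. The sketch below is one possible approach, not part of the original notebook: `load_api_key` is a hypothetical helper that prefers the environment variable and falls back to a file.

```python
import os
from pathlib import Path


def load_api_key(env_var: str, fallback_path: str) -> str:
    """Return the key from an environment variable, else from a file (hypothetical helper)."""
    key = os.environ.get(env_var)
    if key:
        return key.strip()
    # Fall back to the key file; strip the trailing newline most editors add.
    return Path(fallback_path).read_text().strip()
```

With this helper, the cell above could become `OPEN_API_KEY = load_api_key("OPENAI_API_KEY", "keys/.openai_api_key.txt")`, which keeps the key out of the notebook when the variable is set.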

In [11]:
# Import OpenAI ChatModel
from langchain_openai import ChatOpenAI

# Set the ChatOpenAI KEY and initialise a model
chat_model = ChatOpenAI(api_key=OPEN_API_KEY)

# Create a prompt
prompt = "What is Feature Engineering in Data Science?"

# Printing the output of model
print(chat_model.invoke(prompt))

content='Feature engineering is the process of selecting, creating, and transforming variables (features) in a dataset to improve the performance of machine learning algorithms. It involves identifying the most relevant features, handling missing data, encoding categorical variables, scaling numerical variables, and creating new features through techniques like polynomial features, interaction terms, and feature selection. Feature engineering is a crucial step in the data preprocessing pipeline as it can significantly impact the accuracy and efficiency of machine learning models.' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 90, 'prompt_tokens': 15, 'total_tokens': 105, 'completion_tokens_details': {'audio_tokens': None, 'reasoning_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': None, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None} id='run-7007b117-a1ce
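The printed object carries more than the answer text: `response_metadata` records token usage, the model name, and the finish reason. The sketch below works on a plain dict whose values are copied from the output above, so it runs without an API call. With a live response, the same dict would be read from the `response_metadata` attribute.

```python
# Values copied from the `response_metadata` printed above.
response_metadata = {
    "token_usage": {"completion_tokens": 90, "prompt_tokens": 15, "total_tokens": 105},
    "model_name": "gpt-3.5-turbo-0125",
    "finish_reason": "stop",
}

usage = response_metadata["token_usage"]
# Prompt tokens plus completion tokens should add up to the reported total.
computed_total = usage["prompt_tokens"] + usage["completion_tokens"]
print(f"{response_metadata['model_name']}: {computed_total} tokens")  # gpt-3.5-turbo-0125: 105 tokens
```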

In [13]:
# Set the ChatOpenAI KEY and initialise a model
chat_model = ChatOpenAI(api_key=OPEN_API_KEY)

# Create a prompt
prompt = "What is covariance?"

# Printing the output of model
print(chat_model.invoke(prompt))

content='Covariance is a statistical measure that describes the relationship between two random variables. It indicates whether there is a positive or negative relationship between the variables, as well as the strength of that relationship. A positive covariance indicates that as one variable increases, the other variable also tends to increase, while a negative covariance indicates that as one variable increases, the other tends to decrease. A covariance of zero indicates that there is no relationship between the variables.' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 88, 'prompt_tokens': 13, 'total_tokens': 101, 'completion_tokens_details': {'audio_tokens': None, 'reasoning_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': None, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None} id='run-9efb7763-fb79-4540-975e-045146a04de2-0' usage_metadata={'input_token

# Prompts - can be a "string" or "chat messages"
# Template - Logic1

In [19]:
from langchain_core.prompts import ChatPromptTemplate
prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful AI Tutor with expertise in Data Science and Artificial Intelligence."),
    ("human", "What is {topic}?"),
])
prompt_template.input_variables

['topic']
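To see where `['topic']` comes from, the placeholders can be discovered with the standard library alone. This is a minimal sketch of the idea, assuming plain `{name}` placeholders. `find_input_variables` is a hypothetical helper, not how LangChain necessarily does it internally.

```python
from string import Formatter


def find_input_variables(template: str) -> list[str]:
    """List the distinct `{name}` placeholders in a format string, sorted."""
    names = {field for _, field, _, _ in Formatter().parse(template) if field}
    return sorted(names)


messages = [
    ("system", "You are a helpful AI Tutor with expertise in Data Science and Artificial Intelligence."),
    ("human", "What is {topic}?"),
]

# Collect placeholders across every message in the template.
variables = sorted({name for _, text in messages for name in find_input_variables(text)})
print(variables)  # ['topic']

# Filling the placeholder is then ordinary string formatting.
rendered = [(role, text.format(topic="NLP")) for role, text in messages]
print(rendered[1])  # ('human', 'What is NLP?')
```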

# Chains

### Chains allow us to link components so that the output of one call becomes the input of the next.

In [23]:
chain = prompt_template | chat_model

input = {'topic':"NLP"}

chain.invoke(input)

AIMessage(content='NLP stands for Natural Language Processing. It is a field of artificial intelligence that focuses on the interaction between computers and humans using natural language. NLP allows computers to understand, interpret, and generate human language, enabling machines to communicate with humans in a more natural way. NLP techniques are used in various applications such as chatbots, sentiment analysis, language translation, and text summarization.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 78, 'prompt_tokens': 31, 'total_tokens': 109, 'completion_tokens_details': {'audio_tokens': None, 'reasoning_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': None, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-a6e05e86-3ae5-4f7f-ada5-3d4ba8d6b5a6-0', usage_metadata={'input_tokens': 31, 'output_tokens': 78, 'total_tokens': 109})

# Output Parsing

In [28]:
from langchain_core.output_parsers import StrOutputParser
output_parser = StrOutputParser()

In [30]:
# Modifying the chain

chain = prompt_template | chat_model | output_parser

input = {'topic':"NLP"}

chain.invoke(input)

'NLP stands for Natural Language Processing. It is a branch of artificial intelligence that focuses on the interaction between computers and humans using natural language. NLP enables computers to understand, interpret, and generate human language in a way that is valuable. This field involves tasks such as text/speech recognition, sentiment analysis, language translation, and more. NLP techniques are widely used in various applications such as chatbots, virtual assistants, language translation services, and text analysis tools.'

In [32]:
# Same chain as above, invoked with a different topic

chain = prompt_template | chat_model | output_parser

input = {'topic':"Deep Learning"}

chain.invoke(input)

'Deep learning is a subset of machine learning that utilizes artificial neural networks to model and solve complex problems. These neural networks consist of multiple layers of interconnected nodes that process and learn from data to make predictions or decisions. Deep learning algorithms are capable of automatically learning hierarchical representations of data, extracting relevant features, and making accurate predictions or classifications. Deep learning has been successfully applied to various tasks such as image and speech recognition, natural language processing, and more.'

# Example: Create an AI Tutor app that uses prompts and a chat model internally to give Python implementation tutorials for Data Science topics

In [37]:
# Import OpenAI ChatModel
from langchain_openai import ChatOpenAI
# Set the ChatOpenAI KEY and initialise a model
chat_model = ChatOpenAI(api_key=OPEN_API_KEY)

In [61]:
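# Read the Gemini key; the context manager closes the file and strip() drops a trailing newline
with open('keys/.gemini.txt') as f:
    GOOGLE_API_KEY = f.read().strip()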

In [63]:
from langchain_google_genai import ChatGoogleGenerativeAI

In [65]:
llm = ChatGoogleGenerativeAI(api_key=GOOGLE_API_KEY, model="gemini-1.5-flash", temperature=0.9)
prompt = "How many states are there in India?"
print(llm.invoke(prompt))

content='There are **28 states** and **8 union territories** in India. \n' additional_kwargs={} response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'safety_ratings': [{'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}]} id='run-a25306a1-bafc-4882-b637-3f8b89b9db6b-0' usage_metadata={'input_tokens': 9, 'output_tokens': 16, 'total_tokens': 25}


In [55]:
from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
# construct system prompt
system_prompt = SystemMessagePromptTemplate.from_template("""You are a friendly AI Tutor with expertise in Data Science and AI who tells step by step Python implementation for topics asked by user.""")

human_prompt = HumanMessagePromptTemplate.from_template("Tell me a python implementation for {topic_name}.")

chat_template = ChatPromptTemplate.from_messages([system_prompt, human_prompt])

chat_template

ChatPromptTemplate(input_variables=['topic_name'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], input_types={}, partial_variables={}, template='You are a friendly AI Tutor with expertise in Data Science and AI who tells step by step Python implementation for topics asked by user.'), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['topic_name'], input_types={}, partial_variables={}, template='Tell me a python implementation for {topic_name}.'), additional_kwargs={})])

In [57]:
from langchain_core.output_parsers import StrOutputParser
output_parser = StrOutputParser()

In [59]:
chain = chat_template | chat_model | output_parser

input = {'topic_name':"Logistic Regression"}

response = chain.invoke(input)

print(response)

Sure! Here is a step-by-step Python implementation for Logistic Regression using the popular machine learning library `scikit-learn`:

1. Import the necessary libraries:
```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
```

2. Load your dataset (for this example, I'll use a sample dataset from scikit-learn):
```python
from sklearn.datasets import load_iris
data = load_iris()
X = data.data
y = data.target
```

3. Split the dataset into training and testing sets:
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

4. Initialize the Logistic Regression model and fit it to the training data:
```python
model = LogisticRegression()
model.fit(X_train, y_train)
```

5. Make predictions on the test set:
```python
predictions = model.predict(X_test)
```

6. Evaluate the model by c

# CSV Parser

In [72]:
from langchain_core.output_parsers import CommaSeparatedListOutputParser
csv_output_parser = CommaSeparatedListOutputParser()
csv_output_parser.get_format_instructions()

'Your response should be a list of comma separated values, eg: `foo, bar, baz` or `foo,bar,baz`'

In [74]:
example_input = "Python, DA, SQL, ML, DL, AI"
type(example_input)

str

In [76]:
csv_output_parser.parse(example_input)

['Python', 'DA', 'SQL', 'ML', 'DL', 'AI']
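The parsing step itself is simple to approximate. The function below is a simplified stand-in written for illustration (`parse_comma_separated` is a hypothetical name). It ignores edge cases such as quoted items that contain commas, which a real parser may treat differently.

```python
def parse_comma_separated(text: str) -> list[str]:
    """Split on commas and trim surrounding whitespace from each item."""
    return [item.strip() for item in text.strip().split(",")]


items = parse_comma_separated("Python, DA, SQL, ML, DL, AI")
print(items)  # ['Python', 'DA', 'SQL', 'ML', 'DL', 'AI']
```

This is also why the format instructions matter: the parser can only produce a clean list if the model actually replies with comma-separated values.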

In [80]:
from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
# construct system prompt
system_prompt = SystemMessagePromptTemplate.from_template(""" You are a helpful AI chef Assistant. Given a dish name by user, you can provide the ingredients to prepare the dish. Output Format Instructions : {output_format_instructions} """)
human_prompt = HumanMessagePromptTemplate.from_template("Give me the ingredient for cooking {dish_name}.")

chat_template = ChatPromptTemplate.from_messages([system_prompt, human_prompt])

chat_template

ChatPromptTemplate(input_variables=['dish_name', 'output_format_instructions'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['output_format_instructions'], input_types={}, partial_variables={}, template=' You are a helpful AI chef Assistant. Given a dish name by user, you can provide the ingredients to prepare the dish. Output Format Instructions : {output_format_instructions} '), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['dish_name'], input_types={}, partial_variables={}, template='Give me the ingredient for cooking {dish_name}.'), additional_kwargs={})])

In [82]:
chain = chat_template | chat_model | csv_output_parser

user_input = {'dish_name':"paneer biryani", "output_format_instructions" : csv_output_parser.get_format_instructions()}

response = chain.invoke(user_input)

print(response)

['paneer', 'basmati rice', 'onions', 'tomatoes', 'yogurt', 'ginger', 'garlic', 'green chilies', 'biryani masala', 'turmeric powder', 'red chili powder', 'garam masala', 'mint leaves', 'coriander leaves', 'lemon juice', 'ghee', 'oil', 'salt', 'water']


# Datetime Parser

In [87]:
!pip install langchain



In [91]:
from langchain.output_parsers import DatetimeOutputParser
output_parser = DatetimeOutputParser()

In [93]:
output_parser.get_format_instructions()

"Write a datetime string that matches the following pattern: '%Y-%m-%dT%H:%M:%S.%fZ'.\n\nExamples: 0029-11-03T02:45:34.777941Z, 0081-02-17T18:35:55.348793Z, 0381-09-14T02:16:18.353714Z\n\nReturn ONLY this string, no other words!"

In [97]:
from langchain_core.prompts import PromptTemplate

template = """Answer the users question:
Question : {question}
Output Format Instructions : {output_format_instructions} """
prompt_template = PromptTemplate.from_template(template)

In [99]:
chain = prompt_template | chat_model | output_parser

input = {"question": "What is Indian Independence Day?", 
         "output_format_instructions" : output_parser.get_format_instructions()}

response = chain.invoke(input)

print(response)
         

1947-08-15 00:00:00
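The pattern in the format instructions is a standard `strptime` format, so the conversion from text to a `datetime` can be reproduced with the standard library. In this sketch `raw_reply` is an assumed model reply that follows the instructions, not captured output.

```python
from datetime import datetime

PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"  # pattern quoted in the format instructions

raw_reply = "1947-08-15T00:00:00.000000Z"  # assumed reply, for illustration
parsed = datetime.strptime(raw_reply, PATTERN)
print(parsed)  # 1947-08-15 00:00:00

# A reply that ignores the pattern cannot be parsed.
try:
    datetime.strptime("15 August 1947", PATTERN)
    rejected = False
except ValueError as error:
    rejected = True
    print(f"Rejected: {error}")
```

The second case shows why the prompt ends with "Return ONLY this string": any extra words around the date would make parsing fail.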


# Pydantic Parser