## Introduction

Large language models (LLMs) are emerging as a transformative technology, enabling developers to build applications that were previously out of reach. But using an LLM in isolation is often not enough to create a truly powerful or useful business application. The real power comes when you combine it with other sources of computation, services or knowledge. [LangChain](https://python.langchain.com/en/latest/index.html) is an intuitive open-source Python framework created to simplify the development of useful applications built on LLMs from providers such as OpenAI or Hugging Face.

In this article, we will give an overview of the LangChain framework and then look in more detail at 3 key components: Models, Prompts and Parsers.

## LangChain Overview & Key Components

### Principles

The LangChain development team believes that the strongest and most distinctive LLM applications won't just reference a language model, they'll also be:

- **Data-aware:** connect a language model to other sources of data

- **Agentic:** allow a language model to interact with its environment

These concepts serve as the foundation for the LangChain framework.

### Modules

The fundamental abstractions that serve as the foundation for any LLM-powered programme are known as LangChain modules.
LangChain offers standardised, expandable interfaces for each module. Additionally, LangChain offers third-party integrations and complete implementations for commercial use.

The modules are (from least to most complex):

- **Models:** Supported model types and integrations.

- **Prompts:** Prompt management, optimization, and serialization.

- **Memory:** Memory refers to state that is persisted between calls of a chain/agent.

- **Indexes:** Language models become much more powerful when combined with application-specific data - this module contains interfaces and integrations for loading, querying and updating external data.

- **Chains:** Chains are structured sequences of calls (to an LLM or to a different utility).

- **Agents:** An agent is a Chain in which an LLM, given a high-level directive and a set of tools, repeatedly decides an action, executes the action and observes the outcome until the high-level directive is complete.

- **Callbacks:** Callbacks let you log and stream the intermediate steps of any chain, making it easy to observe, debug, and evaluate the internals of an application.

<img src="https://github.com/pranath/blog/raw/master/images/langchain-overview.png" width="800"/>
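To make the agent loop described in the Modules list more concrete, here is a minimal sketch in plain Python. It does not use LangChain at all: `TOOLS`, `fake_llm_decide` and `run_agent` are hypothetical names, and a hard-coded rule stands in for the decision an LLM would make.

```python
# A toy agent loop: decide an action, execute it, observe the result, repeat.
# Everything here is illustrative; a rule-based function stands in for the LLM.

def add(a, b):
    return a + b

def multiply(a, b):
    return a * b

# Hypothetical tool registry mapping tool names to callables.
TOOLS = {"add": add, "multiply": multiply}

def fake_llm_decide(goal, observations):
    """Stand-in for an LLM: choose the next action from what has been observed."""
    if not observations:
        return ("add", (2, 3))                      # step 1: compute 2 + 3
    if len(observations) == 1:
        return ("multiply", (observations[0], 4))   # step 2: multiply by 4
    return ("finish", observations[-1])             # directive is complete

def run_agent(goal, max_steps=5):
    observations = []
    for _ in range(max_steps):
        action, payload = fake_llm_decide(goal, observations)
        if action == "finish":
            return payload
        observations.append(TOOLS[action](*payload))  # execute, then observe
    raise RuntimeError("agent did not finish within max_steps")

result = run_agent("compute (2 + 3) * 4")
print(result)  # 20
```

The `max_steps` guard is a design choice worth keeping in any real agent, since a model that never decides to finish would otherwise loop forever.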

### Use Cases

LangChain provides ready-to-use, built-in implementations for the following common LLM use cases:

- **Autonomous Agents:** Autonomous agents are long-running agents that take many steps in an attempt to accomplish an objective. Examples include AutoGPT and BabyAGI.

- **Agent Simulations:** Putting agents in a sandbox and observing how they interact with each other and react to events can be an effective way to evaluate their long-range reasoning and planning abilities.

- **Personal Assistants:** One of the primary LangChain use cases. Personal assistants need to take actions, remember interactions, and have knowledge about your data.

- **Question Answering:** Another common LangChain use case. Answering questions over specific documents, only utilizing the information in those documents to construct an answer.

- **Chatbots:** Language models love to chat, making this a very natural use of them.

- **Querying Tabular Data:** Using language models to query structured data (CSVs, SQL, dataframes, etc.).

- **Code Understanding:** Using language models to analyze code.

- **Interacting with APIs:** Enabling language models to interact with APIs is extremely powerful. It gives them access to up-to-date information and allows them to take actions.

- **Extraction:** Extract structured information from text.

- **Summarization:** Compressing longer documents. A type of Data-Augmented Generation.

- **Evaluation:** Generative models are hard to evaluate with traditional metrics. One promising approach is to use language models themselves to do the evaluation.



## OpenAI Setup

For our examples we will be using OpenAI's ChatGPT models, so let's load the required libraries and configuration.

First we need to load certain Python libraries and connect to the OpenAI API.

The OpenAI API library needs to be configured with an account's secret key, which is available on the [website](https://platform.openai.com/account/api-keys).

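You can either set it as the `OPENAI_API_KEY` environment variable in your shell before starting Python or Jupyter (running `!export` inside a notebook cell only affects a temporary subshell, so the variable would not persist):
```
export OPENAI_API_KEY='sk-...'
```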

Or, set `openai.api_key` to its value:

```
import openai
openai.api_key = "sk-..."
```

In [1]:
#| include: false
!pip install python-dotenv
!pip install openai


In [2]:
import os
import openai

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file
openai.api_key = os.environ['OPENAI_API_KEY']
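Note that `os.environ['OPENAI_API_KEY']` raises a bare `KeyError` if the variable is missing. As a small sketch, a helper like the one below gives a clearer message. The name `require_env` and the `DEMO_API_KEY` variable are made up for this illustration and are not part of the OpenAI or dotenv libraries.

```python
import os

def require_env(name):
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in your shell"
        )
    return value

# Placeholder value for demonstration only (this is not a real key).
os.environ.setdefault("DEMO_API_KEY", "sk-placeholder")
print(require_env("DEMO_API_KEY"))
```

In the setup cell above, the same idea would read `openai.api_key = require_env("OPENAI_API_KEY")`.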

## Using OpenAI without LangChain

In [earlier articles](https://livingdatalab.com/#category=openai) we looked at how to use the OpenAI API directly to call the ChatGPT model, so let's recap how that's done without a framework like LangChain.

We'll define this helper function to make it easier to use prompts and examine the generated outputs. `get_completion` simply accepts a prompt and returns the completion for that prompt.

We will use OpenAI's `gpt-3.5-turbo` model.

In [3]:
def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0, 
    )
    return response.choices[0].message["content"]

In [4]:
get_completion("What is 1+1?")

'As an AI language model, I can tell you that the answer to 1+1 is 2.'

## Use Case Example - Translating Customer Emails
Let's imagine we have a use case where we receive emails from customers in different languages. If our primary language is English, it would be useful to convert all customer emails into English.

Let's have a bit of fun along the way and create a customer email about a product written in 'English Pirate'.

### Email Transformation using ChatGPT API

First we will use the ChatGPT API to do the task without LangChain.

In [5]:
customer_email = """
Arrr, I be fuming that me blender lid \
flew off and splattered me kitchen walls \
with smoothie! And to make matters worse,\
the warranty don't cover the cost of \
cleaning up me kitchen. I need yer help \
right now, matey!
"""

Let's say we want to transform this into American English, in a calm and respectful tone. We can define a style for our transformation thus:

In [6]:
style = """American English \
in a calm and respectful tone
"""

Now, as we have done in previous articles, we manually construct a prompt for our LLM from these two parts:

In [7]:
prompt = f"""Translate the text \
that is delimited by triple backticks 
into a style that is {style}.
text: ```{customer_email}```
"""

print(prompt)

Translate the text that is delimited by triple backticks 
into a style that is American English in a calm and respectful tone
.
text: ```
Arrr, I be fuming that me blender lid flew off and splattered me kitchen walls with smoothie! And to make matters worse,the warranty don't cover the cost of cleaning up me kitchen. I need yer help right now, matey!
```



Now let's get the transformation from ChatGPT:

In [8]:
response = get_completion(prompt)

In [9]:
response

'I am quite upset that my blender lid came off and caused my smoothie to splatter all over my kitchen walls. Additionally, the warranty does not cover the cost of cleaning up the mess. Would you be able to assist me, please? Thank you kindly.'
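You may have noticed "worse,the" with no space in the printed prompt above, and a full stop stranded on its own line after "tone". Both come from how the triple-quoted strings were written rather than from the model. The short sketch below reproduces the behaviour with illustrative variable names (`joined`, `spaced`, `style_demo`) that are not part of the original cells.

```python
# Inside a triple-quoted string, a backslash at the end of a line removes the
# newline, joining the two lines with no extra space added.
joined = """matters worse,\
the warranty"""
print(repr(joined))   # 'matters worse,the warranty'

# Keeping a space before the backslash preserves the word boundary.
spaced = """matters worse, \
the warranty"""
print(repr(spaced))   # 'matters worse, the warranty'

# Closing the quotes on their own line leaves a trailing newline in the string.
style_demo = """American English \
in a calm and respectful tone
"""
print(repr(style_demo))          # 'American English in a calm and respectful tone\n'
print(repr(style_demo.strip()))  # 'American English in a calm and respectful tone'
```

Calling `.strip()` on such values before formatting them into a prompt is one simple way to avoid stray newlines.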

### Email Transformation using LangChain

Let's see how we can do the same using LangChain.

In [10]:
#| include: false
!pip install --upgrade langchain


First we need to import LangChain's chat model class for OpenAI, which is basically a wrapper around the OpenAI API.

In [11]:
from langchain.chat_models import ChatOpenAI

In [12]:
# Temperature controls the randomness and creativity of the text generated
# by an LLM; temperature=0.0 makes the output as deterministic as possible
chat = ChatOpenAI(temperature=0.0)
chat

ChatOpenAI(verbose=False, callbacks=None, callback_manager=None, client=<class 'openai.api_resources.chat_completion.ChatCompletion'>, model_name='gpt-3.5-turbo', temperature=0.0, model_kwargs={}, openai_api_key=None, openai_api_base=None, openai_organization=None, openai_proxy=None, request_timeout=None, max_retries=6, streaming=False, n=1, max_tokens=None)

#### Email Transformation using LangChain: Creating a Prompt Template

LangChain allows us to create a template object for the prompt, which gives us something we can more easily re-use.

In [13]:
template_string = """Translate the text \
that is delimited by triple backticks \
into a style that is {style}. \
text: ```{text}```
"""

In [14]:
from langchain.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate.from_template(template_string)


In [15]:
prompt_template.messages[0].prompt

PromptTemplate(input_variables=['style', 'text'], output_parser=None, partial_variables={}, template='Translate the text that is delimited by triple backticks into a style that is {style}. text: ```{text}```\n', template_format='f-string', validate_template=True)

In [16]:
prompt_template.messages[0].prompt.input_variables

['style', 'text']

Because the template uses this `{placeholder}` syntax, the object knows there are two input variables.
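To give a feel for how input variables can be discovered from a template string, here is a small standard-library sketch. It is only an illustration of the idea and makes no claim about how LangChain implements it internally. `find_input_variables` and `demo_template` are hypothetical names.

```python
from string import Formatter

def find_input_variables(template):
    """List the named {placeholders} in an f-string-style template, sorted."""
    names = {field for _, field, _, _ in Formatter().parse(template) if field}
    return sorted(names)

demo_template = "Translate the text into a style that is {style}. text: {text}"

print(find_input_variables(demo_template))  # ['style', 'text']

# Filling the template is then an ordinary str.format call.
print(demo_template.format(style="polite English", text="Ahoy!"))
```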

We can now define the style and combine this with the template to create the prompt in a more structured way than before.

In [17]:
customer_style = """American English \
in a calm and respectful tone
"""

In [18]:
customer_email = """
Arrr, I be fuming that me blender lid \
flew off and splattered me kitchen walls \
with smoothie! And to make matters worse, \
the warranty don't cover the cost of \
cleaning up me kitchen. I need yer help \
right now, matey!
"""

In [19]:
customer_messages = prompt_template.format_messages(
                    style=customer_style,
                    text=customer_email)

In [20]:
print(type(customer_messages))
print(type(customer_messages[0]))

<class 'list'>
<class 'langchain.schema.HumanMessage'>


In [21]:
print(customer_messages[0])

content="Translate the text that is delimited by triple backticks into a style that is American English in a calm and respectful tone\n. text: ```\nArrr, I be fuming that me blender lid flew off and splattered me kitchen walls with smoothie! And to make matters worse, the warranty don't cover the cost of cleaning up me kitchen. I need yer help right now, matey!\n```\n" additional_kwargs={} example=False


Let's now get the model response.

In [22]:
# Call the LLM to translate to the style of the customer message
customer_response = chat(customer_messages)

In [23]:
print(customer_response.content)

I'm really frustrated that my blender lid flew off and made a mess of my kitchen walls with smoothie. To add to my frustration, the warranty doesn't cover the cost of cleaning up my kitchen. Can you please help me out, friend?


The advantage of using LangChain this way is that we can reuse this approach with just a few changes.

Let's imagine a different message we want to transform, this time a reply from customer service.

In [24]:
service_reply = """Hey there customer, \
the warranty does not cover \
cleaning expenses for your kitchen \
because it's your fault that \
you misused your blender \
by forgetting to put the lid on before \
starting the blender. \
Tough luck! See ya!
"""

In [25]:
service_style_pirate = """\
a polite tone \
that speaks in English Pirate\
"""

In [26]:
service_messages = prompt_template.format_messages(
    style=service_style_pirate,
    text=service_reply)

print(service_messages[0].content)

Translate the text that is delimited by triple backticks into a style that is a polite tone that speaks in English Pirate. text: ```Hey there customer, the warranty does not cover cleaning expenses for your kitchen because it's your fault that you misused your blender by forgetting to put the lid on before starting the blender. Tough luck! See ya!
```



In [27]:
service_response = chat(service_messages)
print(service_response.content)

Ahoy there, me hearty customer! I be sorry to inform ye that the warranty be not coverin' the expenses o' cleaning yer galley, as 'tis yer own fault fer misusin' yer blender by forgettin' to put the lid on afore startin' it. Aye, tough luck! Farewell and may the winds be in yer favor!


As you build more sophisticated applications using prompts and LLMs, prompts can become longer and more detailed. Prompt templates make it easier to re-use good prompts. LangChain also conveniently provides pre-defined prompts for common operations, such as text summarisation, question answering and connecting to databases, to speed up development.

#### Output Parsers

LangChain also supports output parsing. When you build a complex application using an LLM, you often instruct the LLM to generate its output in a certain format, for example using specific keywords to separate different parts of the response. One such format is [ReAct](https://ai.googleblog.com/2022/11/react-synergizing-reasoning-and-acting.html) (Reasoning and Acting), which builds on chain-of-thought reasoning and uses keywords such as **Thought, Action & Observation**. This encourages the model to take more time thinking through a problem, request or prompt, which tends to lead to better outputs and solutions, as we learned in a [previous article](https://livingdatalab.com/posts/2023-05-01-best-practice-for-prompting-large-language-models.html#principle-2-give-the-model-time-to-think). Using LangChain can help us ensure we are using some of the best and most up-to-date methods for LLM prompting, much like the [PyCaret](https://livingdatalab.com/posts/2021-12-04-python-power-tools-pycaret.html) library does for conventional machine learning.
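As a rough illustration of why keyword-structured output is convenient to work with, the sketch below splits a made-up response into its Thought, Action and Observation parts using a regular expression. The sample text and the `parse_keyword_sections` helper are invented for this example and are not LangChain code.

```python
import re

# A made-up model response using ReAct-style keywords (illustrative text only).
sample_output = """Thought: I need the current warranty terms.
Action: search_docs[blender warranty]
Observation: The warranty excludes cleaning costs."""

def parse_keyword_sections(text, keywords=("Thought", "Action", "Observation")):
    """Split 'Keyword: value' lines into a dict keyed by keyword."""
    pattern = re.compile(rf"^({'|'.join(keywords)}):\s*(.*)$", re.MULTILINE)
    return {key: value.strip() for key, value in pattern.findall(text)}

sections = parse_keyword_sections(sample_output)
print(sections["Action"])  # search_docs[blender warranty]
```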

Let's look at an example, starting by defining what we would like the LLM output to look like. Let's say we have a JSON output from the LLM and we would like to be able to parse that output.

For example, let's say we want to extract information from a product review and output it in a particular JSON format:

In [28]:
{
  "gift": False,
  "delivery_days": 5,
  "price_value": "pretty affordable!"
}

{'gift': False, 'delivery_days': 5, 'price_value': 'pretty affordable!'}

Let's also define a customer review text and a prompt template we want to use that will help generate that JSON output.

In [29]:
customer_review = """\
This leaf blower is pretty amazing.  It has four settings:\
candle blower, gentle breeze, windy city, and tornado. \
It arrived in two days, just in time for my wife's \
anniversary present. \
I think my wife liked it so much she was speechless. \
So far I've been the only one using it, and I've been \
using it every other morning to clear the leaves on our lawn. \
It's slightly more expensive than the other leaf blowers \
out there, but I think it's worth it for the extra features.
"""

review_template = """\
For the following text, extract the following information:

gift: Was the item purchased as a gift for someone else? \
Answer True if yes, False if not or unknown.

delivery_days: How many days did it take for the product \
to arrive? If this information is not found, output -1.

price_value: Extract any sentences about the value or price,\
and output them as a comma separated Python list.

Format the output as JSON with the following keys:
gift
delivery_days
price_value

text: {text}
"""

In [30]:
from langchain.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate.from_template(review_template)
print(prompt_template)

input_variables=['text'] output_parser=None partial_variables={} messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['text'], output_parser=None, partial_variables={}, template='For the following text, extract the following information:\n\ngift: Was the item purchased as a gift for someone else? Answer True if yes, False if not or unknown.\n\ndelivery_days: How many days did it take for the product to arrive? If this information is not found, output -1.\n\nprice_value: Extract any sentences about the value or price,and output them as a comma separated Python list.\n\nFormat the output as JSON with the following keys:\ngift\ndelivery_days\nprice_value\n\ntext: {text}\n', template_format='f-string', validate_template=True), additional_kwargs={})]


Let's now generate the JSON response

In [31]:
messages = prompt_template.format_messages(text=customer_review)
chat = ChatOpenAI(temperature=0.0)
response = chat(messages)
print(response.content)

{
    "gift": true,
    "delivery_days": 2,
    "price_value": ["It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features."]
}


So this looks like JSON, but is it? Let's check the type:

In [32]:
type(response.content)

str

Because it's a string and not a Python dictionary, we can't index into it to get the values.

In [33]:
# We will get an error by running this line of code
# because response.content is not a dictionary
# response.content is a string
response.content.get('gift')

AttributeError: 'str' object has no attribute 'get'
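When a response string contains nothing but valid JSON, as in this case, Python's built-in `json` module can already turn it into a dictionary. The sketch below uses a hard-coded copy of the response (`raw_response` is an illustrative name) so it runs without calling the API. It only works when the model returns pure JSON, which is not guaranteed.

```python
import json

# The same shape of text the model returned above, held in a plain string.
raw_response = """{
    "gift": true,
    "delivery_days": 2,
    "price_value": ["It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features."]
}"""

parsed = json.loads(raw_response)   # str -> dict
print(type(parsed))                 # <class 'dict'>
print(parsed.get("gift"))           # True
print(parsed["delivery_days"] + 1)  # 3
```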

#### Parse the LLM output string into a Python dictionary

So we can use LangChain's parser to help with this.

In [34]:
from langchain.output_parsers import ResponseSchema
from langchain.output_parsers import StructuredOutputParser

For each part of the JSON we want, we can define a response schema. These tell the library what we want to parse and how.

In [35]:
gift_schema = ResponseSchema(name="gift",
                             description="Was the item purchased\
                             as a gift for someone else? \
                             Answer True if yes,\
                             False if not or unknown.")
delivery_days_schema = ResponseSchema(name="delivery_days",
                                      description="How many days\
                                      did it take for the product\
                                      to arrive? If this \
                                      information is not found,\
                                      output -1.")
price_value_schema = ResponseSchema(name="price_value",
                                    description="Extract any\
                                    sentences about the value or \
                                    price, and output them as a \
                                    comma separated Python list.")

response_schemas = [gift_schema, 
                    delivery_days_schema,
                    price_value_schema]

Now that we have defined the schemas for each of the parts we want, LangChain can help generate the formatting instructions our prompt needs to produce the desired output. The output parser basically tells you what kind of instructions you need to send to the LLM.

In [36]:
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

In [37]:
format_instructions = output_parser.get_format_instructions()

Let's have a look at the format instructions for the prompt our parser has generated to use for our LLM.

In [38]:
print(format_instructions)

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "\`\`\`json" and "\`\`\`":

```json
{
	"gift": string  // Was the item purchased                             as a gift for someone else?                              Answer True if yes,                             False if not or unknown.
	"delivery_days": string  // How many days                                      did it take for the product                                      to arrive? If this                                       information is not found,                                      output -1.
	"price_value": string  // Extract any                                    sentences about the value or                                     price, and output them as a                                     comma separated Python list.
}
```
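To show the general idea behind instructions like these, here is a sketch that assembles a similar block of text from a list of field names and descriptions. It is a simplified imitation written for this article, not LangChain's actual implementation. `fields` and `build_format_instructions` are hypothetical names.

```python
# Hypothetical (name, description) pairs, loosely mirroring the schemas above.
fields = [
    ("gift", "Was the item purchased as a gift? Answer True or False."),
    ("delivery_days", "Days until arrival, or -1 if unknown."),
    ("price_value", "Sentences about value or price, as a list."),
]

def build_format_instructions(fields):
    """Assemble a JSON-schema-like instruction block from field descriptions."""
    lines = [f'\t"{name}": string  // {description}' for name, description in fields]
    return (
        "The output should be a JSON object with the following keys:\n"
        "{\n" + "\n".join(lines) + "\n}"
    )

instructions = build_format_instructions(fields)
print(instructions)
```

Generating the instructions from the same definitions that drive the parser keeps the prompt and the parsing code from drifting apart.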


Let's now put these format instructions together with the prompt template and submit it to the LLM.

In [39]:
review_template_2 = """\
For the following text, extract the following information:

gift: Was the item purchased as a gift for someone else? \
Answer True if yes, False if not or unknown.

delivery_days: How many days did it take for the product\
to arrive? If this information is not found, output -1.

price_value: Extract any sentences about the value or price,\
and output them as a comma separated Python list.

text: {text}

{format_instructions}
"""

prompt = ChatPromptTemplate.from_template(template=review_template_2)

messages = prompt.format_messages(text=customer_review, 
                                format_instructions=format_instructions)

In [40]:
print(messages[0].content)

For the following text, extract the following information:

gift: Was the item purchased as a gift for someone else? Answer True if yes, False if not or unknown.

delivery_days: How many days did it take for the productto arrive? If this information is not found, output -1.

price_value: Extract any sentences about the value or price,and output them as a comma separated Python list.

text: This leaf blower is pretty amazing.  It has four settings:candle blower, gentle breeze, windy city, and tornado. It arrived in two days, just in time for my wife's anniversary present. I think my wife liked it so much she was speechless. So far I've been the only one using it, and I've been using it every other morning to clear the leaves on our lawn. It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features.


The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "\`\`\`json" and "

In [41]:
response = chat(messages)

Let's see what response we got for our prompt:

In [42]:
print(response.content)

```json
{
	"gift": true,
	"delivery_days": "2",
	"price_value": ["It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features."]
}
```


Now we can use the output parser we created earlier to parse the response into a dict. Notice it's of type dict, not string, so we can extract the individual values.

In [43]:
output_dict = output_parser.parse(response.content)

In [44]:
output_dict

{'gift': True,
 'delivery_days': '2',
 'price_value': ["It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features."]}

In [45]:
type(output_dict)

dict

In [46]:
output_dict.get('delivery_days')

'2'
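Notice that `delivery_days` comes back as the string `'2'`, in line with the format instructions, which declared every field as `string`. As a rough sketch of what decoding such a fenced response by hand might look like, including a conversion to `int`, consider the following. `parse_fenced_json` and `fenced_response` are invented for this example and do not reproduce LangChain's parser.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

# A made-up response in the same shape as the one printed above.
fenced_response = (
    f"{FENCE}json\n"
    '{\n\t"gift": true,\n\t"delivery_days": "2",\n\t"price_value": ["worth it"]\n}\n'
    f"{FENCE}"
)

def parse_fenced_json(text):
    """Extract the body of a json code fence and decode it."""
    match = re.search(rf"{FENCE}json\s*(.*?)\s*{FENCE}", text, re.DOTALL)
    if match is None:
        raise ValueError("no json code fence found in text")
    return json.loads(match.group(1))

result = parse_fenced_json(fenced_response)
result["delivery_days"] = int(result["delivery_days"])  # "2" -> 2
print(result)
```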

## Acknowledgements

I'd like to express my thanks to the wonderful [LangChain for LLM Application Development Course](https://www.deeplearning.ai/short-courses/langchain-for-llm-application-development/) by DeepLearning.ai, which I completed, and acknowledge the use of some images and other materials from the course in this article.