### Understanding LangChain: A Modular Framework for LLMs

* LangChain is a framework for building applications on top of Large Language Models (LLMs).

* It supports applications such as chatbots, Generative Question-Answering (GQA), and content summarization.

* The core idea is to "chain" components together, so that more sophisticated LLM functionality can be assembled from reusable parts.
  * Chains are composed of elements from different modules, including:
    * Prompt templates: pre-designed templates for specific interactions, ranging from chatbot dialogues to Explain Like I'm Five (ELI5) question answering.
    * LLMs: Large Language Models such as ChatGPT, Bard, and Claude.
    * Agents: agents use LLMs to decide which actions to take. They can call tools such as web search or calculators inside an operational loop.
    * Memory: both short-term and long-term memory.

* Our primary aim here is to explore the functionality that turns unstructured text into structured data.
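
The chaining idea can be illustrated without LangChain at all. The following is a minimal sketch, not LangChain's implementation: the `Step` class and the `fake_model` function are hypothetical names introduced here, and the "model" is a stub that always returns a canned answer.

```python
import json
from typing import Any, Callable


class Step:
    """Wraps a function so that steps can be composed with the | operator."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # Build a new step that feeds this step's output into the next one.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value: Any) -> Any:
        return self.func(value)


def fake_model(prompt_text: str) -> str:
    # Stand-in for an LLM call; always returns the same canned answer.
    return '{"City": "Honolulu"}'


prompt_step = Step(lambda variables: "Most populated city in {state}?".format(**variables))
model_step = Step(fake_model)
parse_step = Step(json.loads)

# Prompt formatting, model call, and output parsing composed into one pipeline.
demo_chain = prompt_step | model_step | parse_step
demo_result = demo_chain.invoke({"state": "Hawaii"})
print(demo_result)
```

Each step only needs to accept the previous step's output, which is what makes the parts interchangeable.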

### Core Components of LangChain

* Chains are composed of various modules that can be combined to enhance the capabilities of LLMs.

Key Modules Include:

  * Prompt Templates: Customizable templates suited to different interaction styles, including chatbot conversations.
  * LLMs: Support for various Large Language Models such as ChatGPT, Bard, and Claude.
  * Agents: Agents use LLMs to decide which actions to take, calling tools such as web search or calculators within an operational loop.
  * Memory Modules: Both short-term and long-term memory.



In [1]:
prompt = """ What is the most populated city in the state of Hawaii. 
Provide city name and no additional information."""

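import openai

# Note: `openai.ChatCompletion` is the legacy interface and requires openai<1.0.
# The library reads the OPENAI_API_KEY environment variable by default;
# alternatively, set the key explicitly:
# openai.api_key = "ADD API KEY HERE"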

response = openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[
    {
      "role": "user",
      "content": prompt
    }
  ],
  temperature=0,
  max_tokens=128,
)
print(response)


{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Honolulu",
        "role": "assistant"
      }
    }
  ],
  "created": 1698781836,
  "id": "chatcmpl-8Fp2y37BB4XnVABL7L0yWB4Vr6tzn",
  "model": "gpt-3.5-turbo-0613",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 2,
    "prompt_tokens": 28,
    "total_tokens": 30
  }
}


In [2]:
response["choices"][0]["message"]["content"]

'Honolulu'

In [3]:
from langchain.prompts import PromptTemplate

from langchain.chat_models import ChatOpenAI


In [4]:
model = ChatOpenAI(model="gpt-3.5-turbo")

In [5]:
prompt_str = """What is the most populated city in the state of Hawaii. 
Provide city name and no additional information."""

prompt = PromptTemplate.from_template(prompt_str)


In [6]:
chain = prompt | model


In [7]:
chain.invoke({})

AIMessage(content='Honolulu.', additional_kwargs={}, example=False)

### Prompts Are First-Class Objects in LangChain

* Prompts can be easily tailored to incorporate runtime variables.
* They can also be customized with examples for more precise and context-relevant responses.
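
Runtime variables in a template behave much like placeholders in Python's own string formatting. The sketch below uses only the standard library and assumes an f-string-style template; the variable names are illustrative and this is not how `PromptTemplate` is implemented internally.

```python
from string import Formatter

template = "What is the most populated city in the state of {state}?"

# Collect the placeholder names that appear in the template.
variables = [name for _, name, _, _ in Formatter().parse(template) if name]

# Fill the placeholder at run time.
filled = template.format(state="Hawaii")

print(variables)
print(filled)
```

Discovering the placeholder names up front is what lets a template report which inputs it still needs before it is invoked.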

In [8]:
prompt_str = """What is the most populated city in the state of {state}.

Provide city name and no additional information."""

prompt = PromptTemplate.from_template(prompt_str)

In [9]:
chain = prompt | model

In [10]:
response = chain.invoke({"state": "Hawaii"})
response.content

'Honolulu'

In [11]:
response = chain.invoke({"state": "California"})
response.content

'Los Angeles'

In [12]:
response = chain.invoke({"state": "Georgia"})
response.content

'Atlanta'

In [14]:
prompt_str = """What is the most populated city in the state provided below.

Provide city name and no additional information. 

Examples:

State: Hawaii
City: Honolulu

State: California
City: Los Angeles

State: {state}
"""

prompt = PromptTemplate.from_template(prompt_str)

chain = prompt | model


In [15]:
response = chain.invoke({"state": "Georgia"})

response

AIMessage(content='City: Atlanta', additional_kwargs={}, example=False)

In [16]:
response.content

'City: Atlanta'

In [18]:
prompt_str = """What is the most populated city in the state provided below.

Provide city name and no additional information. 

Examples:

State: Hawaii
{{"City": "Honolulu"}}

State: California
{{"City": "Los Angeles"}}

State: {state}
"""

prompt = PromptTemplate.from_template(prompt_str)

chain = prompt | model


In [19]:
response = chain.invoke({"state": "Georgia"})

response

AIMessage(content='{"City": "Atlanta"}', additional_kwargs={}, example=False)

In [20]:
response.content

'{"City": "Atlanta"}'

In [21]:
import json
data = json.loads(response.content)
data

{'City': 'Atlanta'}

In [22]:
data["City"]

'Atlanta'

In [23]:
prompt_prefix = """What is the most populated city in the state provided below. 
Provide city name and no additional information. """


In [24]:
prompt_examples = [
    {"ExampleState": "Hawaii", "ExampleCity": "Honolu"},
    {"ExampleState": "California", "ExampleCity": "Los Angeles"}   
]
prompt_examples

[{'ExampleState': 'Hawaii', 'ExampleCity': 'Honolu'},
 {'ExampleState': 'California', 'ExampleCity': 'Los Angeles'}]

In [25]:
example_prompt_str ="State: {ExampleState}\nCity: {ExampleCity}"
print(example_prompt_str)

State: {ExampleState}
City: {ExampleCity}


In [26]:
example_prompt = PromptTemplate(input_variables=["ExampleState", "ExampleCity"], template = example_prompt_str)

example_prompt


PromptTemplate(input_variables=['ExampleState', 'ExampleCity'], output_parser=None, partial_variables={}, template='State: {ExampleState}\nCity: {ExampleCity}', template_format='f-string', validate_template=True)

In [27]:
print(example_prompt.format(**prompt_examples[0]))

State: Hawaii
City: Honolu


In [28]:
print(example_prompt.format(**prompt_examples[1]))

State: California
City: Los Angeles


In [29]:
from langchain.prompts.few_shot import FewShotPromptTemplate

execute_fewshot_prompt = FewShotPromptTemplate(
    prefix = prompt_prefix,
    input_variables=["state"],
    examples= prompt_examples,
    example_prompt = example_prompt,
    example_separator="\n\n",
    suffix = "State: {state}"
)

In [31]:
data = {"state": "Georgia"}
print(execute_fewshot_prompt.format(**data))

What is the most populated city in the state provided below. 
Provide city name and no additional information. 

State: Hawaii
City: Honolu

State: California
City: Los Angeles

State: Georgia


In [32]:
chain = execute_fewshot_prompt | model
chain.invoke(data)

AIMessage(content='City: Atlanta', additional_kwargs={}, example=False)

In [33]:
example_prompt_str_json = """ State: {ExampleState}\n  {open_curly} "City": "{ExampleCity}" {close_curly} """
print(example_prompt_str_json)

 State: {ExampleState}
  {open_curly} "City": "{ExampleCity}" {close_curly} 


In [34]:
example_prompt = PromptTemplate(
    input_variables=["ExampleState", "ExampleCity"],  
    partial_variables={"open_curly": "{{", "close_curly": "}}"},
    template = example_prompt_str_json)
example_prompt


PromptTemplate(input_variables=['ExampleState', 'ExampleCity'], output_parser=None, partial_variables={'open_curly': '{{', 'close_curly': '}}'}, template=' State: {ExampleState}\n  {open_curly} "City": "{ExampleCity}" {close_curly} ', template_format='f-string', validate_template=True)

In [35]:
prompt_examples[0]

{'ExampleState': 'Hawaii', 'ExampleCity': 'Honolu'}

In [36]:
print(example_prompt.format(**prompt_examples[1]))

 State: California
  {{ "City": "Los Angeles" }} 


In [37]:
example_prompt

PromptTemplate(input_variables=['ExampleState', 'ExampleCity'], output_parser=None, partial_variables={'open_curly': '{{', 'close_curly': '}}'}, template=' State: {ExampleState}\n  {open_curly} "City": "{ExampleCity}" {close_curly} ', template_format='f-string', validate_template=True)

In [38]:
from langchain.prompts.few_shot import FewShotPromptTemplate

execute_fewshot_prompt = FewShotPromptTemplate(
    prefix = prompt_prefix,
    input_variables=["state"], 

    examples= prompt_examples,
    example_prompt = example_prompt,
    example_separator="\n\n",
    suffix = "State: {state}"
)



In [39]:
data = {"state": "Georgia"}
print(execute_fewshot_prompt.format(**data))

What is the most populated city in the state provided below. 
Provide city name and no additional information. 

 State: Hawaii
  { "City": "Honolu" } 

 State: California
  { "City": "Los Angeles" } 

State: Georgia
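
Note that the single example printed earlier shows doubled braces (`{{ ... }}`), while the assembled few-shot prompt shows single braces. This is consistent with the text being formatted twice: once per example, and once more for the combined prompt. The following standard-library sketch reproduces that effect under this assumption; it is an illustration, not the library's code.

```python
example_template = ' State: {ExampleState}\n  {open_curly} "City": "{ExampleCity}" {close_curly} '

# Pass 1: format one example; the partial variables insert literal "{{" and "}}".
formatted_example = example_template.format(
    ExampleState="California",
    ExampleCity="Los Angeles",
    open_curly="{{",
    close_curly="}}",
)

# Pass 2: the formatted example becomes part of a larger template,
# which is formatted again; "{{" and "}}" now collapse to "{" and "}".
combined_template = "\n\n".join([formatted_example, "State: {state}"])
final_prompt = combined_template.format(state="Georgia")

print(formatted_example)
print(final_prompt)
```

Without the doubled braces, the second pass would treat `{ "City": ... }` as a placeholder and fail.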


In [40]:
chain = execute_fewshot_prompt | model
response = chain.invoke(data)
response

AIMessage(content='{ "City": "Atlanta" }', additional_kwargs={}, example=False)

In [41]:
response.content

'{ "City": "Atlanta" }'

In [42]:
data = json.loads(response.content)
data

{'City': 'Atlanta'}

In [43]:
data['City']

'Atlanta'

In [44]:
from pydantic import BaseModel, Field


In [45]:
class CityParser(BaseModel):
    City: str = Field(..., description="The name of the most populous city") 

In [46]:
from langchain.output_parsers import PydanticOutputParser
cityParser = PydanticOutputParser(pydantic_object=CityParser)


In [47]:
cityParser.parse("""{"City": "Atlanta"}""")



CityParser(City='Atlanta')

In [50]:
output = cityParser.parse("""{"City": "Atlanta"}""")
output.City


'Atlanta'
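
Conceptually, the parser decodes the JSON text and checks it against the declared schema. The sketch below mimics that behavior with the standard library only; `CityRecord` and `parse_city_record` are hypothetical stand-ins, not part of LangChain or Pydantic.

```python
import json
from dataclasses import dataclass


@dataclass
class CityRecord:
    City: str


def parse_city_record(text: str) -> CityRecord:
    """Parse JSON text and check that it has a string "City" field."""
    payload = json.loads(text)
    if not isinstance(payload, dict) or not isinstance(payload.get("City"), str):
        raise ValueError(f"Expected an object with a string 'City' field, got: {text!r}")
    return CityRecord(City=payload["City"])


record = parse_city_record('{"City": "Atlanta"}')
print(record.City)

# A reply with the wrong field name is rejected instead of silently accepted.
try:
    parse_city_record('{"Town": "Atlanta"}')
except ValueError as error:
    print("rejected:", error)
```

The benefit over a bare `json.loads` is that malformed or unexpected replies fail loudly at the parsing step.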

In [56]:
data = {"state": "Georgia"}  # `data` was overwritten by json.loads above
print(execute_fewshot_prompt.format(**data))

What is the most populated city in the state provided below. 
Provide city name and no additional information. 

 State: Hawaii
  { "City": "Honolu" } 

 State: California
  { "City": "Los Angeles" } 

State: Georgia


In [57]:
data = {"state": "Georgia"}
chain = execute_fewshot_prompt | model | cityParser
reponse = chain.invoke(data)
reponse

CityParser(City='Atlanta')

In [58]:
reponse.City


'Atlanta'

In [59]:
data = {"state": "Georgia"}
print(execute_fewshot_prompt.format(**data))

What is the most populated city in the state provided below. 
Provide city name and no additional information. 

 State: Hawaii
  { "City": "Honolu" } 

 State: California
  { "City": "Los Angeles" } 

State: Georgia


In [60]:
print(prompt_prefix)

What is the most populated city in the state provided below. 
Provide city name and no additional information. 


In [61]:
prompt_prefix = """What is the most populated city in the state provided below. 
Provide city name and no additional information. 

{format_instructions}

"""

In [63]:
print(prompt_prefix)

What is the most populated city in the state provided below. 
Provide city name and no additional information. 

{format_instructions}




In [65]:
print(cityParser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"City": {"description": "The name of the most populous city", "title": "City", "type": "string"}}, "required": ["City"]}
```


In [66]:
execute_fewshot_prompt = FewShotPromptTemplate(
    prefix = prompt_prefix,
    input_variables=["state"], 
    partial_variables={"format_instructions": cityParser.get_format_instructions()},
    examples= prompt_examples,
    example_prompt = example_prompt,
    example_separator="\n\n",
    suffix = "State: {state}\n"
)
data = {"state": "Georgia"}
print(execute_fewshot_prompt.format(**data))

What is the most populated city in the state provided below. 
Provide city name and no additional information. 

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"City": {"description": "The name of the most populous city", "title": "City", "type": "string"}}, "required": ["City"]}
```



 State: Hawaii
  { "City": "Honolu" } 

 State: California
  { "City": "Los Angeles" } 

State: Georgia



In [67]:
data = {"state": "Georgia"}
chain = execute_fewshot_prompt | model | cityParser
reponse = chain.invoke(data)
reponse

CityParser(City='Atlanta')

In [68]:
%pip install huggingface_hub

DEPRECATION: pytorch-lightning 1.6.5 has a non-standard dependency specifier torch>=1.8.*. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pytorch-lightning or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063
Note: you may need to restart the kernel to use updated packages.


In [69]:
from getpass import getpass

HUGGINGFACEHUB_API_TOKEN = getpass()

········


In [70]:
from langchain.llms import HuggingFaceHub
repo_id_flan = "google/flan-t5-xxl" 


llm_google_flan = HuggingFaceHub(
    repo_id= repo_id_flan, model_kwargs={"temperature": 1, "max_length": 64},
    huggingfacehub_api_token = HUGGINGFACEHUB_API_TOKEN
)

In [71]:
data

{'state': 'Georgia'}

In [72]:
print(execute_fewshot_prompt.format(**data))

What is the most populated city in the state provided below. 
Provide city name and no additional information. 

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"City": {"description": "The name of the most populous city", "title": "City", "type": "string"}}, "required": ["City"]}
```



 State: Hawaii
  { "City": "Honolu" } 

 State: California
  { "City": "Los Angeles" } 

State: Georgia



In [73]:
chain = execute_fewshot_prompt | llm_google_flan 
reponse = chain.invoke(data)


In [74]:
reponse

'"City": "Atlanta"'

In [76]:
from langchain.llms import HuggingFaceHub
# repo_id_llama_2 = "meta-llama/Llama-2-13b-chat-hf"
repo_id_mistral = "mistralai/Mistral-7B-Instruct-v0.1" 


llm_mistral = HuggingFaceHub(
    repo_id= repo_id_mistral, model_kwargs={"temperature": 1, "max_length": 64},
    huggingfacehub_api_token = HUGGINGFACEHUB_API_TOKEN
)

chain = execute_fewshot_prompt | llm_mistral

reponse = chain.invoke(data)

reponse

'{ "City": "Atlanta" } \n\nState: Texas\n{ "City'

In [78]:
# Stop generation at the first newline so only the first JSON line is returned.
chain = execute_fewshot_prompt | llm_mistral.bind(stop=["\n"])

reponse = chain.invoke(data)

reponse

'{ "City": "Atlanta" } '
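
The effect of a stop sequence can be sketched as a simple truncation of the generated text. The helper below is a hypothetical, standard-library illustration; the actual truncation may be applied by the model server or by the client library.

```python
from typing import List


def truncate_at_stop(text: str, stop: List[str]) -> str:
    """Cut the text at the earliest occurrence of any stop string."""
    cut = len(text)
    for token in stop:
        index = text.find(token)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


# The untrimmed Mistral output shown earlier, which runs on into a new example.
raw_output = '{ "City": "Atlanta" } \n\nState: Texas\n{ "City'
trimmed = truncate_at_stop(raw_output, ["\n"])
print(repr(trimmed))
```

Because the few-shot examples put each answer on its own line, a newline is a natural place to stop.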

In [80]:
chain = execute_fewshot_prompt | llm_mistral.bind(stop=["\n"]) | cityParser

reponse = chain.invoke(data)
reponse

CityParser(City='Atlanta')

In [87]:
from langchain.llms import Ollama
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Assumes a local Ollama server with the "llama2" model available.
# Note: the ValidationError recorded for this cell
# ("model_kwargs: extra fields not permitted") indicates that `Ollama`
# does not accept a `model_kwargs` argument, so none is passed here.
ollama_llama_llm = Ollama(
    model="llama2", callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
)

In [85]:
data

{'state': 'Georgia'}

In [86]:
chain = execute_fewshot_prompt | ollama_llama_llm

reponse = chain.invoke(data)
reponse

Sure, I can help you with that! The most populated city in the state provided is:

{ "City": "Los Angeles" }

'Sure, I can help you with that! The most populated city in the state provided is:\n\n{ "City": "Los Angeles" }'
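
When a model wraps its answer in conversational text like this, one possible workaround is to pull out the JSON object before parsing. The helper below is a hypothetical sketch using only the standard library, and it assumes the reply contains a single JSON object.

```python
import json
import re
from typing import Optional


def extract_json_object(text: str) -> Optional[dict]:
    """Parse the span from the first '{' to the last '}' as JSON, or return None."""
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        return None
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


chatty = (
    "Sure, I can help you with that! The most populated city in the state provided is:"
    '\n\n{ "City": "Los Angeles" }'
)
extracted = extract_json_object(chatty)
print(extracted)
```

Extraction only recovers the structure. It cannot correct the content, and here the model still returned Los Angeles for Georgia.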