# LangChain Playground ⛓️🦜
The following examples are based on LangChain's documentation.

In [74]:
import os
from dotenv import dotenv_values

In [75]:
config = dotenv_values('../.env')
os.environ['OPENAI_API_KEY'] = config['OPENAI_API_KEY']
# LangChain's SerpAPI wrapper reads the key from the SERPAPI_API_KEY environment variable
os.environ['SERPAPI_API_KEY'] = config['SERPAPI_KEY']

## LangChain Components
### Schema
The basic data types and schemas that are used throughout the codebase.

#### Text
Strings are used to interact with language models.

In [3]:
# sample text
my_text = "What time is it?"

#### Chat Messages
Some models use a chat interface. It is similar to text, but each message has a specified type (System, Human, AI):
 - **System** - Helpful background context that tells the AI what to do
 - **Human** - Represents the message coming from a human
 - **AI** - Message that shows what the AI responded with
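
Under the hood, a chat model receives each message tagged with its role. Here is a minimal sketch in plain Python, assuming OpenAI's chat API role names (`system`, `user`, `assistant`). The `Message` dataclass and `to_api_messages` helper are hypothetical, not LangChain classes:

```python
from dataclasses import dataclass

# Hypothetical stand-in for System/Human/AI messages (not LangChain's classes).
@dataclass
class Message:
    role: str      # "system", "human", or "ai"
    content: str

# Assumed mapping onto OpenAI chat API role names.
ROLE_MAP = {"system": "system", "human": "user", "ai": "assistant"}

def to_api_messages(messages):
    """Convert role-tagged messages into a list of {"role": ..., "content": ...} dicts."""
    return [{"role": ROLE_MAP[m.role], "content": m.content} for m in messages]

conversation = [
    Message("system", "You are a nice AI that helps a user pick a wine."),
    Message("human", "I'm eating steamed fish, what should I drink?"),
]
print(to_api_messages(conversation))
```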

In [4]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage, AIMessage

chat = ChatOpenAI(temperature=0.7)

In [5]:
chat(
  [
    SystemMessage(content="You are a nice AI that helps a user figure out the wine that matches their food."),
    HumanMessage(content="I'm eating steamed fish, what should I drink?")
  ]
)

AIMessage(content="For steamed fish, you'll want a wine that complements the delicate flavors of the dish without overpowering it. A few options that pair well with steamed fish are:\n\n1. Sauvignon Blanc: This white wine has bright acidity and herbal notes that pair nicely with the lightness of the fish.\n\n2. Chardonnay: Look for an unoaked or lightly oaked Chardonnay, as it will enhance the flavors of the fish without overwhelming it.\n\n3. Pinot Grigio: A crisp and refreshing white wine with citrus notes that can complement the subtle flavors of the fish.\n\n4. Riesling: If you prefer a slightly sweeter wine, a semi-dry or off-dry Riesling can be an excellent choice, as its acidity can balance the flavors of the dish.\n\nRemember, personal taste preferences vary, so feel free to try different options to find the one that suits your palate best.", additional_kwargs={}, example=False)

You can also pass more chat history, including earlier responses from the AI

In [6]:
chat(
  [
    SystemMessage(content="You are a nice AI that helps a user figure out where to travel and what to do there."),
    HumanMessage(content="I like anime where should I go?"),
    AIMessage(content="You should go to Akihabara, Japan"),
    HumanMessage(content="What can I do when I'm there?")
  ]
)

AIMessage(content="When you're in Akihabara, you can immerse yourself in the world of anime and manga. Here are some things you can do:\n\n1. Visit Anime and Manga Stores: Explore the numerous anime and manga shops in Akihabara, such as Mandarake, Animate, and Gamers. You'll find a wide range of merchandise, including DVDs, manga, figurines, and collectibles.\n\n2. Maid Cafes: Experience the unique concept of maid cafes, where waitresses dress up as maids and provide entertainment and food. It's a popular subculture in Akihabara, and you can enjoy a fun and interactive dining experience.\n\n3. Themed Cafes: Explore the various themed cafes in the area, such as the Gundam Cafe, where you can enjoy food and drinks inspired by the popular Gundam series. There are also cafes themed around popular anime and manga series like Pokemon, Sailor Moon, and more.\n\n4. Game Centers: Have fun at the arcades and game centers in Akihabara. You can try your hand at the latest anime-themed arcade games

#### Documents
An object that holds a piece of text and its metadata (more info about the text)

In [7]:
from langchain.schema import Document

In [8]:
Document(page_content="This is my document. It contains text from LangChain Documentation.",
        metadata={
          'my_document_id': 1234,
          'my_document_source': 'The LangChain Papers',
          'my_document_create_time': 1680013019
        })

Document(page_content='This is my document. It contains text from LangChain Documentation.', metadata={'my_document_id': 1234, 'my_document_source': 'The LangChain Papers', 'my_document_create_time': 1680013019})

### Models
The following are the different types of models that are used in LangChain.

#### Large Language Models
A model that takes text in and returns text out

In [9]:
from langchain.llms import OpenAI

llm = OpenAI(model_name="text-ada-001")

In [13]:
llm("What is the date today?")

'\n\nThe date today is March 1st.'

#### Chat Model
A model that takes a series of messages and returns a message output

In [14]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage, AIMessage

chat = ChatOpenAI(temperature=1)

In [15]:
chat(
    [
        SystemMessage(content="You are an unhelpful AI bot that makes a joke at whatever the user asks."),
        HumanMessage(content="I would like to get a job, how should I do it?")
    ]
)

AIMessage(content='Why did the scarecrow become a successful businessman? Because he was outstanding in his field! Good luck finding a job, though.', additional_kwargs={}, example=False)

#### Text Embedding Model
These models take text as input and return a list of floats that hold the semantic meaning of your text.
>_*Semantic*_ means 'relating to meaning in language or logic'
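
Because embeddings are just lists of floats, "similar meaning" becomes "vectors pointing in a similar direction", which is commonly measured with cosine similarity. The sketch below uses made-up 3-dimensional vectors (the OpenAI embeddings in this section have 1536 dimensions). The vectors are illustrative assumptions, not real model output:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors chosen by hand for illustration only.
lunch = [0.9, 0.1, 0.0]
dinner = [0.8, 0.2, 0.1]
taxes = [0.0, 0.1, 0.9]

print(round(cosine_similarity(lunch, dinner), 3))  # related meanings: close to 1
print(round(cosine_similarity(lunch, taxes), 3))   # unrelated meanings: close to 0
```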

In [16]:
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()


In [17]:
text = "It's time for lunch"

In [19]:
text_embedding = embeddings.embed_query(text)
print(f'Your embedding length: {len(text_embedding)}')
print(f'Here is a sample: {text_embedding[:5]}...')

Your embedding length: 1536
Here is a sample: [0.011875979983118259, -0.006945648918972976, -0.00224773895683153, 0.003370858836966017, -0.008643074238262946]...


### Prompts
#### Prompt Value
Refers to the input to the model.

In [21]:
from langchain.llms import OpenAI

llm = OpenAI(model_name="text-davinci-003")

prompt = """
Today is Monday, tomorrow is Wednesday,

What is wrong with the statement?
"""

llm(prompt)

'\nThe statement is incorrect; tomorrow is Tuesday, not Wednesday.'

#### Prompt Template
An object that helps create a PromptValue. It combines user input, non-static information, and a fixed template string.
> Like an f-string in Python, but for prompts
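
To make the analogy concrete, here is the same kind of template filled in with plain `str.format`, with no LangChain involved. This is only a sketch of the idea; PromptTemplate layers its own features, such as declared `input_variables`, on top of it:

```python
# A prompt template is, at heart, a string with named placeholders.
template = """
I really want to travel to {location}, what should I do there?

Respond in one short sentence
"""

# str.format fills the placeholder later, so the template can be reused.
japan_prompt = template.format(location="Japan")
italy_prompt = template.format(location="Italy")

# An f-string fills the value immediately, so it is not reusable as a template.
location = "Japan"
fstring_prompt = f"""
I really want to travel to {location}, what should I do there?

Respond in one short sentence
"""

print(japan_prompt)
print(japan_prompt == fstring_prompt)
```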

In [24]:
from langchain.llms import OpenAI
from langchain import PromptTemplate

llm = OpenAI(model_name="text-davinci-003")

# Note: "location" is a placeholder for a value supplied later
template = """
I realy want to travel to {location}, What should I do there?

Respond in one short sentence
"""

prompt = PromptTemplate(
    input_variables=["location"],
    template=template
)

final_prompt = prompt.format(location="Japan")

print(f"Final prompt: {final_prompt}")
print("--------------")
print(f"LLM output: {llm(final_prompt)}")

Final prompt: 
I realy want to travel to Japan, What should I do there?

Respond in one short sentence

--------------
LLM output: 
Explore the stunning natural scenery and unique culture of Japan.


### Example Selectors
An easy way to select examples that let you dynamically place in-context information into your prompt.
Often used when your task is nuanced or you have a large list of examples.
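
Conceptually, a semantic selector embeds each example, keeps the k examples closest to the user's input, and places them between a prefix and a suffix. Here is a minimal sketch that assumes hand-picked 2-D vectors in place of a real embedding model. The names `toy_embeddings`, `select_examples`, and `build_few_shot_prompt` are hypothetical, and SemanticSimilarityExampleSelector itself delegates this work to an embedding class and a vector store:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical hand-picked 2-D "embeddings"; a real selector would call an embedding model.
toy_embeddings = {
    "pirate": [0.9, 0.1],
    "pilot": [0.8, 0.3],
    "tree": [0.1, 0.9],
    "bird": [0.3, 0.8],
    "flower": [0.15, 0.95],
}

examples = [
    {"input": "pirate", "output": "ship"},
    {"input": "pilot", "output": "plane"},
    {"input": "tree", "output": "ground"},
    {"input": "bird", "output": "nest"},
]

def select_examples(query, examples, k=2):
    """Return the k examples whose inputs are most similar to the query."""
    query_vec = toy_embeddings[query]
    ranked = sorted(
        examples,
        key=lambda ex: cosine_similarity(query_vec, toy_embeddings[ex["input"]]),
        reverse=True,
    )
    return ranked[:k]

def build_few_shot_prompt(query, k=2):
    """Join the prefix, the selected examples, and the suffix with blank lines."""
    parts = ["Give the location an item is usually found in"]
    for ex in select_examples(query, examples, k):
        parts.append(f"Example Input: {ex['input']}\nExample Output: {ex['output']}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

print(build_few_shot_prompt("flower"))
```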

In [4]:
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import FewShotPromptTemplate, PromptTemplate
from langchain.llms import OpenAI

llm = OpenAI(model="text-davinci-003")

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Example Input: {input}\nExample Output: {output}",
)

# Examples of nouns and the locations where they are usually found
examples = [
    {"input": "pirate", "output": "ship"},
    {"input": "pilot", "output": "plane"},
    {"input": "driver", "output": "car"},
    {"input": "tree", "output": "ground"},
    {"input": "bird", "output": "nest"},
]

In [6]:
# SemanticSimilarityExampleSelector will select examples that are similar to your input by semantic meaning
example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples,
    # The embedding class used to produce embeddings, which are used to measure semantic similarity
    OpenAIEmbeddings(),
    # The VectorStore class used to store the embeddings and run a similarity search
    FAISS,
    # Number of examples to select
    k=2
)

In [7]:
similar_prompt = FewShotPromptTemplate(
    # The object that will help select examples
    example_selector=example_selector,
    example_prompt=example_prompt,
    # Customization that will be added to the top and bottom of your prompt
    prefix="Give the location an item is usually found in",
    suffix="Input: {noun}\nOutput:",
    input_variables=["noun"],
)

In [10]:
# Select a noun
my_noun = 'flower'

print(similar_prompt.format(noun=my_noun))

Give the location an item is usually found in

Example Input: tree
Example Output: ground

Example Input: bird
Example Output: nest

Input: flower
Output:


In [11]:
llm(similar_prompt.format(noun=my_noun))

' garden'

### Output Parsers
A helpful way to format the output of a model. Usually used for structured output.

Two main concepts:
1. Format Instructions - An autogenerated prompt that tells the LLM how to format its response based on your desired result
2. Parser - A method that extracts your model's text output into a desired structure (usually JSON)
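
The parser side can be approximated with the standard library: find the fenced `json` block in the model's reply and load it. This is a hypothetical helper (`extract_fenced_json` is a name chosen for this sketch), not StructuredOutputParser's source, and it assumes the model actually emitted valid JSON:

```python
import json
import re

FENCE = "`" * 3  # a markdown code fence

def extract_fenced_json(text):
    """Parse the JSON inside a fenced json block, or the whole text if there is no fence."""
    pattern = FENCE + r"json\s*(.*?)\s*" + FENCE
    match = re.search(pattern, text, re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload)  # raises json.JSONDecodeError on malformed output

llm_output = FENCE + 'json\n{\n\t"bad_string": "welcom to californya!",\n\t"good_string": "Welcome to California!"\n}\n' + FENCE
print(extract_fenced_json(llm_output))
```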

In [12]:
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

In [13]:
llm = OpenAI(model='text-davinci-003')

In [14]:
# How you would like your response structured
response_schemas = [
    ResponseSchema(name='bad_string', description="This is a poorly formatted user input string"),
    ResponseSchema(name='good_string', description='This is your response, a reformatted response')
]

# How you would like to parse your output
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

In [23]:
# See the format instructions the output parser generated for you
format_instructions = output_parser.get_format_instructions()
print(format_instructions)

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"bad_string": string  // This is a poorly formatted user input string
	"good_string": string  // This is your response, a reformatted response
}
```


In [24]:
template = """
You will be given a poorly formatted traing from a user.
Reformat it and make sure all the words are spelled correctly

{format_instructions}

% USER_INPUT:
{user_input}

YOUR RESPONSE:
"""

prompt = PromptTemplate(
    input_variables=['user_input'],
    partial_variables={"format_instructions": format_instructions},
    template=template
)

promptValue = prompt.format(user_input='welcom to californya!')

print(promptValue)


You will be given a poorly formatted traing from a user.
Reformat it and make sure all the words are spelled correctly

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"bad_string": string  // This is a poorly formatted user input string
	"good_string": string  // This is your response, a reformatted response
}
```

% USER_INPUT:
welcom to californya!

YOUR RESPONSE:



In [29]:
# Note: there is a possibility that the output is not properly formatted JSON
llm_output = llm(promptValue)
llm_output

'```json\n{\n\t"bad_string": "welcom to californya!",\n\t"good_string": "Welcome to California!"\n}\n```'

In [30]:
output_parser.parse(llm_output)

{'bad_string': 'welcom to californya!',
 'good_string': 'Welcome to California!'}

### Indexes - Structuring documents so LLMs can work with them
#### Document Loaders
Easy ways to import data from other sources. This overlaps with functionality in [OpenAI Plugins](https://openai.com/blog/chatgpt-plugins), [specifically retrieval plugins](https://github.com/openai/chatgpt-retrieval-plugin).

See the [big list](https://python.langchain.com/en/latest/modules/indexes/document_loaders.html) of document loaders. LlamaIndex has many more as well.


In [31]:
from langchain.document_loaders import HNLoader

In [32]:
loader = HNLoader("https://news.ycombinator.com/item?id=34422627")

In [34]:
data = loader.load()

In [35]:
print(f"Found {len(data)} comments")
print(f"Here's a sample:\n\n{''.join([x.page_content[:150] for x in data[:2]])}")

Found 76 comments
Here's a sample:

Ozzie_osman 7 months ago  
             | next [–] 

LangChain is awesome. For people not sure what it's doing, large language models (LLMs) are very Ozzie_osman 7 months ago  
             | parent | next [–] 

Also, another library to check out is GPT Index (https://github.com/jerryjliu/gpt_index)


#### Text Splitters
Often your document (like a book) is too long for your LLM, so you need to split it into chunks. Text splitters help with this.<br>
There are many ways you could split your text into chunks, experiment with [different ones](https://python.langchain.com/en/latest/modules/indexes/text_splitters.html) to see which is best for you.
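
To see what `chunk_size` and `chunk_overlap` mean, here is a deliberately simpler splitter that cuts at fixed character offsets. It is an illustrative sketch, not RecursiveCharacterTextSplitter, which first tries to split on separators such as blank lines and newlines before falling back to smaller pieces:

```python
def split_text(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character chunks; each chunk starts with the
    last `chunk_overlap` characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this chunk reached the end of the text
            break
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in split_text(sample, chunk_size=10, chunk_overlap=3):
    print(chunk)
```

The overlap means a sentence cut at a chunk boundary still appears partly in both neighbors, which helps keep context when chunks are retrieved independently.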

In [36]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [37]:
# This is a long document we can split up.
with open('data/worked.txt') as f:
    pg_work = f.read()

print(f'You have {len([pg_work])} document')

You have 1 document


In [38]:
text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just for demonstration
    chunk_size=150,
    chunk_overlap=20,
)

texts = text_splitter.create_documents([pg_work])

In [39]:
print(f'You have {len(texts)} documents')

You have 610 documents


In [41]:
print('Preview')
print(texts[0].page_content, '\n')
print(texts[1].page_content)

Preview
February 2021Before college the two main things I worked on, outside of school,
were writing and programming. I didn't write essays. I wrote what 

beginning writers were supposed to write then, and probably still
are: short stories. My stories were awful. They had hardly any plot,


#### Retrievers
Easy way to combine documents with language models.
<br>
There are many different types of retrievers; the most widely supported is the VectorStoreRetriever.

In [42]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings

loader = TextLoader('./data/worked.txt')
documents = loader.load()

In [48]:
#Get your splitter ready
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, 
    chunk_overlap=50
)

#Split your docs into texts
texts = text_splitter.split_documents(documents)

#Get embedding engine ready
embeddings = OpenAIEmbeddings()

# Embed your texts
db = FAISS.from_documents(texts, embeddings)

In [49]:
#Init your retriever
retriever = db.as_retriever()

In [50]:
retriever

VectorStoreRetriever(tags=['FAISS'], metadata=None, vectorstore=<langchain.vectorstores.faiss.FAISS object at 0x000001C506003DC0>, search_type='similarity', search_kwargs={})

In [51]:
docs = retriever.get_relevant_documents('what types of things did the author want to build?')

In [53]:
print('\n\n'.join([x.page_content[:200] for x in docs[:2]]))

standards; what was the point? No one else wanted one either, so
off they went. That was what happened to systems work.I wanted not just to build things, but to build things that would
last.In this di

much of it in grad school.Computer Science is an uneasy alliance between two halves, theory
and systems. The theory people prove things, and the systems people
build things. I wanted to build things. 


#### VectorStores
Databases for storing vectors. The most popular ones are Pinecone and Weaviate; more examples are listed in OpenAI's retrieval plugin documentation. Chroma and FAISS are easy to work with locally.
<br>
Conceptually, think of them as tables with a column for embeddings (vectors) and a column for metadata.
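
Taking the "table" picture literally, a toy in-memory store might look like this. It is a hypothetical sketch (`TinyVectorStore` is not a LangChain class) that does a brute-force cosine scan over hand-picked 2-D vectors; libraries such as FAISS use indexes to make this search fast on large collections:

```python
import math
from dataclasses import dataclass, field

@dataclass
class Row:
    embedding: list  # the vector column
    metadata: dict   # the metadata column (here, the original text and its source)

@dataclass
class TinyVectorStore:
    """Hypothetical in-memory table of (embedding, metadata) rows."""
    rows: list = field(default_factory=list)

    def add(self, embedding, metadata):
        self.rows.append(Row(embedding, metadata))

    def similarity_search(self, query, k=1):
        """Return the metadata of the k rows whose embeddings are closest to query."""
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
        ranked = sorted(self.rows, key=lambda row: cosine(query, row.embedding), reverse=True)
        return [row.metadata for row in ranked[:k]]

store = TinyVectorStore()
# Toy vectors chosen by hand; real ones would come from an embedding model.
store.add([0.9, 0.1], {"text": "I wanted to build things.", "source": "worked.txt"})
store.add([0.1, 0.9], {"text": "My stories were awful.", "source": "worked.txt"})

print(store.similarity_search([0.8, 0.2], k=1))
```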


In [54]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings

loader = TextLoader('./data/worked.txt')
documents = loader.load()

#Get your splitter ready
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=50
)

#Split your docs
texts = text_splitter.split_documents(documents)

#Get embeddings
embeddings = OpenAIEmbeddings()

In [55]:
print(f'You have {len(texts)} documents')

You have 78 documents


In [57]:
embedding_list = embeddings.embed_documents([text.page_content for text in texts])

In [58]:
print(f'You have {len(embedding_list)} embeddings')
print(f"Here's a sample of one: {embedding_list[0][:3]}...")

You have 78 embeddings
Here's a sample of one: [-0.0010875738459471215, -0.011166318559573049, -0.012805657736331183]...


### Memory
Helping LLMs remember information.<br>
Memory is a bit of a loose term. It could be as simple as remembering information you've chatted about in the past, or as complicated as information retrieval.
<br>
We'll focus on the chat message use case, which is what chatbots rely on.
<br>
There are many types of memory, explore [the documentation](https://python.langchain.com/en/latest/modules/memory/how_to_guides.html) to see which one fits your use case.
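
One common strategy for keeping a chat prompt from growing without bound is to remember only the most recent messages, similar in spirit to LangChain's ConversationBufferWindowMemory. Here is a minimal sketch with a hypothetical `WindowChatHistory` class whose method names mirror the ChatMessageHistory calls that follow:

```python
from collections import deque

class WindowChatHistory:
    """Hypothetical chat history that keeps only the last `k` messages."""

    def __init__(self, k):
        self.messages = deque(maxlen=k)  # the oldest message drops off automatically

    def add_user_message(self, text):
        self.messages.append(("human", text))

    def add_ai_message(self, text):
        self.messages.append(("ai", text))

history = WindowChatHistory(k=2)
history.add_ai_message("hi!")
history.add_user_message("what is the capital of france?")
history.add_ai_message("The capital of France is Paris.")
print(list(history.messages))  # the first "hi!" has been dropped
```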

#### Chat Message History

In [59]:
from langchain.memory import ChatMessageHistory
from langchain.chat_models import ChatOpenAI

chat = ChatOpenAI(temperature=0)

history = ChatMessageHistory()

history.add_ai_message("hi!")

history.add_user_message("what is the capital of france?")

In [60]:
history.messages

[AIMessage(content='hi!', additional_kwargs={}, example=False),
 HumanMessage(content='what is the capital of france?', additional_kwargs={}, example=False)]

In [61]:
ai_response = chat(history.messages)
ai_response

AIMessage(content='The capital of France is Paris.', additional_kwargs={}, example=False)

In [62]:
history.add_ai_message(ai_response.content)
history.messages

[AIMessage(content='hi!', additional_kwargs={}, example=False),
 HumanMessage(content='what is the capital of france?', additional_kwargs={}, example=False),
 AIMessage(content='The capital of France is Paris.', additional_kwargs={}, example=False)]

### Chains ⛓️⛓️
Combining different LLM calls and actions automatically.<br>
For example: Summary #1, Summary #2, Summary #3 > Final Summary<br>
Video ref: [📺](https://www.youtube.com/watch?v=f9_BWhCI4Zo&t=2s)<br>
There are [many applications of chains](https://python.langchain.com/en/latest/modules/chains/how_to_guides.html); search to see which are best for your case.<br>
We'll cover two of them:

#### 1. Simple Sequential Chains
Easy chains where you use the output of one LLM call as the input to another. Good for breaking up tasks (and keeping your LLM focused).
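
Stripped of the LLM calls, a simple sequential chain is function composition: each step's single output becomes the next step's single input. Here is a sketch with hypothetical stub functions standing in for the two LLMChains:

```python
def run_sequential(steps, first_input):
    """Pass first_input through each step in order; each output feeds the next step."""
    value = first_input
    for step in steps:
        value = step(value)
    return value

# Hypothetical stubs standing in for location_chain and meal_chain (no LLM calls).
def pick_dish(location):
    return {"Japan": "Chicken Teriyaki"}.get(location, "a local soup")

def write_recipe(meal):
    return f"Recipe for {meal}: marinate, cook, garnish."

print(run_sequential([pick_dish, write_recipe], "Japan"))
```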

In [68]:
from langchain.llms import OpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chains import SimpleSequentialChain

llm = OpenAI(temperature=1)

In [69]:
template = """
Your job is to come up with a classic dish from the area that the users suggest.
% USER LOCATION
{user_location}

YOUR RESPONSE:
"""
prompt_template = PromptTemplate(input_variables=["user_location"], template=template)

#Holds the location chain
location_chain = LLMChain(llm=llm, prompt=prompt_template)

In [70]:
template = """
Given a meal, give a short and simple recipe on how to make that dish at home.
% MEAL
{user_meal}

YOUR RESPONSE:
"""
prompt_template = PromptTemplate(input_variables=['user_meal'], template=template)

# Holds the meal chain
meal_chain = LLMChain(llm=llm, prompt=prompt_template)

In [71]:
overall_chain = SimpleSequentialChain(chains=[location_chain, meal_chain], verbose=True)

In [67]:
review = overall_chain.run("Japan")



> Entering new SimpleSequentialChain chain...
A classic dish from Japan is Chicken Teriyaki - chicken marinated and cooked in soy sauce, mirin, and sugar, and often served with ginger and scallions.

Chicken Teriyaki

Ingredients:
- 4 boneless, skinless chicken thighs
- 4 tablespoons of soy sauce
- 2 tablespoons of mirin
- 1 tablespoon of sugar
- 1 tablespoon of grated fresh ginger
- 2 scallions

Directions:
1. In a medium bowl, combine soy sauce, mirin, sugar and ginger.
2. Add the chicken to the mixture and evenly coat. Let marinate for at least 15 minutes.
3. Heat a large skillet over medium-high heat and add the chicken.
4. Cook the chicken for 4-5 minutes per side, or until done.
5. Garnish with sliced scallions and serve hot.

> Finished chain.


#### 2. Summarization Chain
Easily run through documents and get a summary. Check the [video](https://www.youtube.com/watch?v=f9_BWhCI4Zo) for other chain types besides map-reduce.
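
The map_reduce chain type has two phases: summarize each chunk on its own (map), then summarize the combined partial summaries (reduce). Here is a sketch of that control flow with a stub summarizer that keeps the first sentence, standing in for the LLM. The function names are hypothetical:

```python
def map_reduce_summarize(chunks, summarize):
    """Map: summarize each chunk on its own. Reduce: summarize the joined partial summaries."""
    partial_summaries = [summarize(chunk) for chunk in chunks]  # map step
    return summarize(" ".join(partial_summaries))               # reduce step

# Hypothetical stand-in for an LLM summary call: keep only the first sentence.
def first_sentence(text):
    return text.split(". ")[0].rstrip(".") + "."

chunks = [
    "Newton spent years on alchemy. Biographies rarely dwell on it.",
    "Smart people take risks. Some of those risks fail.",
]
print(map_reduce_summarize(chunks, first_sentence))
```

Because the map calls are independent, they can run in parallel, which is part of why this chain type suits long documents.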

In [73]:
from langchain.chains.summarize import load_summarize_chain
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = TextLoader('./data/disc.txt')
documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=700, chunk_overlap=50)

texts = text_splitter.split_documents(documents)

chain = load_summarize_chain(llm, chain_type='map_reduce', verbose=True)
chain.run(texts)



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"January 2017Because biographies of famous scientists tend to 
edit out their mistakes, we underestimate the 
degree of risk they were willing to take.
And because anything a famous scientist did that
wasn't a mistake has probably now become the
conventional wisdom, those choices don't
seem risky either.Biographies of Newton, for example, understandably focus
more on physics than alchemy or theology.
The impression we get is that his unerring judgment
led him straight to truths no one else had noticed.
How to explain all the time he spent on alchemy
and theology?  Well, smart people are often kind of
crazy.But maybe there is a simpler explanation. Maybe"


CONCISE SUMMARY:
Prompt after formatting:
Write a concise summary of the following:


"the smartness and the craziness were not as sepa

" It is commonly thought that Newton's successes in physics are attributed to his intelligence. However, historians now view the risky endeavours Newton took in alchemy and theology as major contributors to his success. His willingness to take risks on all three endeavours paid off, with physics becoming the profitable one. Ultimately, Newton's successes are due to more than just intelligence."

### Agents
Some applications will require not just a predetermined chain of calls to LLMs/other tools, but potentially an **unknown chain** that depends on the user's input. In these types of chains, there is an "agent" which has access to a suite of tools. <br>
Depending on the user input, the agent can then **decide which, if any, of these tools to call**.

Basically you use the LLM not just for text output, but also for decision making.
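
A single decision step can be sketched without any LLM: a decision function picks a tool (or none), the tool runs, and its result comes back. Here `decide` is a hypothetical rule-based stand-in for the model's choice. Real agents typically repeat this step, feeding each observation back to the LLM, until it decides it has an answer:

```python
# Hypothetical tools: plain functions keyed by name.
def calculator(expression):
    """Adds two numbers written as 'a + b' (no eval, to keep the sketch safe)."""
    left, right = expression.split("+")
    return str(float(left) + float(right))

def search(query):
    return f"(pretend search results for {query!r})"

TOOLS = {"calculator": calculator, "search": search}

def decide(user_input):
    """Stand-in for the LLM's choice: returns (tool_name, tool_input) or (None, final_answer)."""
    if "+" in user_input:
        return "calculator", user_input
    if user_input.endswith("?"):
        return "search", user_input
    return None, "I can answer that without tools."

def run_agent_step(user_input):
    """One decide -> act -> observe step."""
    tool_name, payload = decide(user_input)
    if tool_name is None:
        return payload
    observation = TOOLS[tool_name](payload)
    return f"{tool_name} -> {observation}"

print(run_agent_step("2 + 3"))
print(run_agent_step("What is LangChain?"))
print(run_agent_step("Hello"))
```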

#### Agents
The language model that drives decision making.<br>
More specifically, an agent takes in an input and returns a response corresponding to an action to take along with an action input. You can see different types of agents (which are better for different use cases) [here](https://python.langchain.com/en/latest/modules/agents/agent_types.html)

#### Tools
Tools are functions that an agent calls. There are two important considerations here:

1. Giving the agent access to the right tools
2. Describing the tools in a way that is most helpful to the agent

Without both, the agent you are trying to build will not work. If you don't give the agent access to a correct set of tools, it will never be able to accomplish the objective. If you don't describe the tools properly, the agent won't know how to properly use them.

LangChain provides a wide set of tools to get started, but also makes it easy to define your own (including custom descriptions). For a full list of tools, see [here](https://python.langchain.com/docs/modules/agents/tools/)
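
Since the agent only sees a tool's name and description, the description is effectively part of the prompt. Here is a sketch of that idea with a hypothetical `ToolSpec` dataclass (LangChain has its own Tool class; this is not it) and an assumed `name: description` layout:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolSpec:
    """Hypothetical tool record for this sketch."""
    name: str
    description: str            # what the agent reads when choosing a tool
    func: Callable[[str], str]  # what actually runs when the tool is chosen

def render_tool_list(tools):
    """Format tools as 'name: description' lines for an agent prompt (assumed layout)."""
    return "\n".join(f"{tool.name}: {tool.description}" for tool in tools)

tools = [
    ToolSpec("search", "Useful for questions about current events. Input: a search query.",
             lambda query: "(pretend search results)"),
    ToolSpec("word_count", "Counts the words in a piece of text. Input: the text.",
             lambda text: str(len(text.split()))),
]
print(render_tool_list(tools))
print(tools[1].func("LangChain makes agents easy"))
```

A vague description such as "does stuff" would leave the model guessing, which is the failure mode the paragraph above warns about.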

#### Toolkit
A group of tools that your agent can select from.<br>
Let's bring them all together:

In [76]:
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)

In [77]:
serpapi_api_key = '...'

In [78]:
toolkit = load_tools(['serpapi'], llm=llm, serpapi_api_key=serpapi_api_key)

ValidationError: 1 validation error for SerpAPIWrapper
__root__
  Could not import serpapi python package. Please install it with `pip install google-search-results`. (type=value_error)