#### Quick run-through of Part 2 of the amazing LangChain tutorial: [langchain-tutorials](https://github.com/gkamradt/langchain-tutorials/tree/main).

A more comprehensive overview can be found in the [LangChain Technical Documentation](https://python.langchain.com/en/latest/index.html).

**NOTE**: Global (`RUN_GLOBAL`) and cell-specific (`RUN_LOCAL`) toggles gate the code that makes OpenAI API calls, so that re-running the notebook does not use up free OpenAI credits.
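
The toggle cells themselves are not shown in this export. A minimal sketch of how such a gate might work, assuming a cell makes API calls only when both flags are true (the helper `should_run` is hypothetical):

```python
# Assumed semantics: a cell makes API calls only if BOTH toggles are on.
RUN_GLOBAL = False   # notebook-wide switch; False skips every API call


def should_run(run_local: bool, run_global: bool = RUN_GLOBAL) -> bool:
    """Return True only when the global and the cell-level toggle are both on."""
    return run_global and run_local


RUN_LOCAL = True     # cell-specific switch
if should_run(RUN_LOCAL):
    status = "calling the OpenAI API"
else:
    status = "skipped (no credits spent)"
print(status)
```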

`DISCLAIMER`: The OpenAI API was misbehaving when committing to GitHub, so all output has been cleared.

In [None]:
# Read the keys from environment variables instead of hardcoding secrets in the notebook.
import os

openai_api_key = os.environ['OPENAI_API_KEY']
serpapi_api_key = os.environ['SERPAPI_API_KEY']

# LangChain Use Cases

## Summarization

### Summaries Of Short Text

In [None]:
from langchain.llms import OpenAI
from langchain import PromptTemplate

llm = OpenAI(temperature=0,
                model_name='text-davinci-003',
                openai_api_key=openai_api_key)

# Create template
template = """
%INSTRUCTIONS:
Please summarize the following piece of text.
Respond in a manner that a 5 year old would understand.

%TEXT:
{text}
"""

# Create a LangChain prompt template that we can insert values into later
prompt = PromptTemplate(
    input_variables=["text"],
    template=template,
)

In [None]:
confusing_text = """
For the next 130 years, debate raged.
Some scientists called Prototaxites a lichen, others a fungus, and still others clung to the notion that it was some kind of tree.
“The problem is that when you look up close at the anatomy, it’s evocative of a lot of different things, but it’s diagnostic of nothing,” says Boyce, an associate professor in geophysical sciences and the Committee on Evolutionary Biology.
“And it’s so damn big that when whenever someone says it’s something, everyone else’s hackles get up: ‘How could you have a lichen 20 feet tall?’”
"""

Let's take a look at the prompt that will be sent to the LLM.

In [None]:
print ("------- Prompt Begin -------")

final_prompt = prompt.format(text=confusing_text)
print(final_prompt)

print ("------- Prompt End -------")
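
For intuition, filling a prompt template with a single `{text}` placeholder behaves much like Python's built-in `str.format`. The shortened template and sample sentence below are made up for illustration:

```python
# Shortened, illustrative template with one placeholder named `text`.
template = "%INSTRUCTIONS:\nSummarize the text.\n\n%TEXT:\n{text}\n"

# Substituting the placeholder yields the final prompt string.
final_prompt = template.format(text="Debate raged for 130 years.")
print(final_prompt)
```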

Finally, let's pass it through the LLM.

In [None]:
output = llm(final_prompt)
print (output)

This method works fine, but longer text can run into token limits. This is where LangChain's [load_summarize_chain](https://python.langchain.com/en/latest/use_cases/summarization.html) becomes very useful.

### Summaries Of Longer Text

In [None]:
from langchain.llms import OpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

Let's load up a longer document

In [None]:
with open('../data/langchain-tutorials/PaulGraham-good.txt', 'r') as file:
    text = file.read()

# Printing the first 285 characters as a preview
print (text[:285])

Then let's check how many tokens are in this document. [get_num_tokens](https://python.langchain.com/en/latest/reference/modules/llms.html#langchain.llms.OpenAI.get_num_tokens) is a nice method for this.

In [None]:
num_tokens = llm.get_num_tokens(text)

print (f"There are {num_tokens} tokens in your file")

A document of ~4k tokens may still be acceptable for some LLMs, but let's split it into chunks anyway.

See [text-splitters](https://python.langchain.com/en/latest/modules/indexes/text_splitters.html) in the LangChain documentation for a comprehensive overview.
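
To illustrate what `chunk_size` and `chunk_overlap` mean, here is a simplified character-window splitter. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter also prefers to break on the given separators); `split_with_overlap` is a hypothetical helper.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

Each chunk repeats the last three characters of the previous one, so context that straddles a boundary is not lost entirely.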

In [None]:
text_splitter = RecursiveCharacterTextSplitter(separators=["\n\n", "\n"],
                                                chunk_size=5000,
                                                chunk_overlap=350)

docs = text_splitter.create_documents([text])

print (f"You now have {len(docs)} docs instead of 1 piece of text")

Next, load a chain which will make successive calls to the LLM.    
See the [load_summarize_chain](https://python.langchain.com/en/latest/use_cases/summarization.html) documentation for a comprehensive overview.

In [None]:
# Get your chain ready to use
chain = load_summarize_chain(llm=llm, chain_type='map_reduce')

In [None]:
# Run through the 4 documents + summarize the chunks + return a summary of the summary.
output = chain.run(docs)
print (output)
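
Conceptually, the `map_reduce` chain type summarizes each chunk on its own (map) and then summarizes the joined partial summaries (reduce). The sketch below shows only that control flow; `fake_summarize` is a hypothetical stand-in for an LLM call that simply keeps the first sentence.

```python
def fake_summarize(text: str) -> str:
    """Stand-in for an LLM call: keep only the first sentence."""
    first, _, _ = text.strip().partition(".")
    return first.strip() + "."


def map_reduce_summary(chunks: list[str]) -> str:
    # Map: one independent summary per chunk.
    partial = [fake_summarize(chunk) for chunk in chunks]
    # Reduce: summarize the joined partial summaries.
    return fake_summarize(" ".join(partial))


chunks = [
    "Cats sleep a lot. They also hunt.",
    "Dogs love walks. They bark at strangers.",
]
partial_summaries = [fake_summarize(c) for c in chunks]
final_summary = map_reduce_summary(chunks)
print(partial_summaries)
print(final_summary)
```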

## Question & Answering Using Documents As Context

*[LangChain Q&A Docs](https://python.langchain.com/en/latest/use_cases/question_answering.html)*

Two-step process:

1. Pass the LLM relevant context it needs to answer a question
2. Pass question(s) to be answered

### Simple Q&A Example

In [None]:
from langchain.llms import OpenAI

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

In [None]:
context = """
Rachel is 30 years old
Bob is 45 years old
Kevin is 65 years old
"""

question = "Who is under 40 years old?"

output = llm(context + question)

# I strip the text to remove the leading and trailing whitespace
print (output.strip())

The hard part is figuring out how to selectively send relevant data as context.    
See the [document retrieval](https://python.langchain.com/en/latest/modules/indexes/retrievers.html) documentation for an overview of LangChain retrievers.

### Using embeddings from a VectorStore to select relevant context

In [None]:
from langchain import OpenAI

# The vectorstore we'll be using
from langchain.vectorstores import FAISS

# The LangChain component we'll use to get the documents
from langchain.chains import RetrievalQA

# The easy document loader for text
from langchain.document_loaders import TextLoader

# The embedding engine that will convert our text to vectors
from langchain.embeddings.openai import OpenAIEmbeddings

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

Let's load up a longer document

In [None]:
loader = TextLoader('../data/langchain-tutorials/PaulGraham-worked.txt')
doc = loader.load()
print (f"You have {len(doc)} document")
print (f"You have {len(doc[0].page_content)} characters in that document")

Now let's split our long doc into smaller pieces

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=400)
docs = text_splitter.split_documents(doc)

In [None]:
# Get the total number of characters so we can see the average later
num_total_characters = sum([len(x.page_content) for x in docs])

print (f"Now you have {len(docs)} documents that have an average of "
        f"{num_total_characters / len(docs):,.0f} characters (smaller pieces)")

In [None]:
# Get your embeddings engine ready
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

# Embed your documents and combine with the raw text in a pseudo db
docsearch = FAISS.from_documents(docs, embeddings)

Create your retrieval engine

In [None]:
qa = RetrievalQA.from_chain_type(llm=llm,
                                 chain_type="stuff",
                                 retriever=docsearch.as_retriever())

Now, ask a question. The retriever will get the similar documents and combine them with your question for the LLM to reason through.

In [None]:
query = "What does the author describe as good work?"
qa.run(query)
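
The retrieve-then-stuff flow can be sketched without any API calls. Here a bag-of-words vector with cosine similarity stands in for OpenAI embeddings and FAISS (an assumption made to keep the example self-contained); `embed`, `cosine`, and `retrieve` are hypothetical helpers, and the sample documents are made up.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


docs = [
    "Rachel is 30 years old and lives in Boston.",
    "The bakery opens at seven in the morning.",
    "Kevin plays chess on weekends.",
]
query = "When does the bakery open?"
context = retrieve(query, docs, k=1)
# "Stuff" the retrieved text and the question into one prompt for the LLM.
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + query
print(prompt)
```

Only the retrieved chunk reaches the prompt, which is what keeps the request small even when the source document is long.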

## Extraction
[LangChain Documentation](https://python.langchain.com/en/latest/use_cases/extraction.html)

**Use Cases:** Extract a structured row from a sentence to insert into a database, extract multiple rows from a long document to insert into a database, or extract parameters from a user query to make an API call

In [None]:
# To help construct our Chat Messages
from langchain.schema import HumanMessage
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate

# We will be using a chat model, defaults to gpt-3.5-turbo
from langchain.chat_models import ChatOpenAI

# To parse outputs and get structured data back
from langchain.output_parsers import StructuredOutputParser, ResponseSchema

chat_model = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo', 
                        openai_api_key=openai_api_key)

### Vanilla Extraction

Let's start off with an easy example. Here I simply supply a prompt with instructions describing the type of output I want.

In [None]:
instructions = """
You will be given a sentence with fruit names, extract those fruit names and assign an emoji to them
Return the fruit name and emojis in a python dictionary
"""

fruit_names = """
Apple, Pear, this is an kiwi
"""

In [None]:
# Make your prompt which combines the instructions w/ the fruit names
prompt = (instructions + fruit_names)

# Call the LLM
output = chat_model([HumanMessage(content=prompt)])

print (output.content)
print (type(output.content))

Let's turn this into a proper python dictionary

In [None]:
# Still hacky, but ast.literal_eval only parses Python literals, whereas eval() would
# execute arbitrary code returned by the model.
import ast
output_dict = ast.literal_eval(output.content)

print (output_dict)
print (type(output_dict))

### Using LangChain's Response Schema

LangChain's response schema does two things:

1. Autogenerate the prompt with format instructions (prompt engineering taken care of by LangChain)

2. Read the output from the LLM and turn it into a Python object

Define the schema for our use case:    
Pull out the song and artist that a user wants to play from a pseudo chat message.

In [None]:
# The schema I want out
response_schemas = [
    ResponseSchema(name="artist", description="The name of the musical artist"),
    ResponseSchema(name="song", description="The name of the song that the artist plays")
]

# The parser that will look for the LLM output in my schema and return it back to me
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

In [None]:
# The format instructions that LangChain makes. Let's look at them
format_instructions = output_parser.get_format_instructions()
print(format_instructions)

In [None]:
# The prompt template that brings it all together
# Note: This is a different prompt template than before because we are using a Chat Model

prompt = ChatPromptTemplate(
    messages=[
        HumanMessagePromptTemplate.from_template("Given a command from the user, extract the artist and song names \n \
                                                    {format_instructions}\n{user_prompt}")  
    ],
    input_variables=["user_prompt"],
    partial_variables={"format_instructions": format_instructions}
)

In [None]:
f_prompt = prompt.format_prompt(user_prompt="I really like So Young by Portugal. The Man")
print (f_prompt.messages[0].content)

In [None]:
output_ = chat_model(f_prompt.to_messages())
output = output_parser.parse(output_.content)

print (output)
print (type(output))
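
For intuition, the parsing half can be approximated in a few lines: find the fenced JSON snippet in the model's reply, load it, and check that the expected keys are present. This is a simplified sketch that assumes the reply wraps its JSON in a fenced snippet; it is not LangChain's implementation, and `parse_structured` and the sample reply are made up for illustration.

```python
import json
import re

FENCE = "`" * 3  # built programmatically to avoid nesting literal fences here


def parse_structured(reply: str, expected_keys: list[str]) -> dict:
    """Extract the first fenced JSON object from reply and validate its keys."""
    pattern = FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE
    match = re.search(pattern, reply, re.DOTALL)
    if match is None:
        raise ValueError("no fenced JSON found in the reply")
    data = json.loads(match.group(1))
    missing = [key for key in expected_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


# Hypothetical model reply, shaped like a fenced JSON snippet.
reply = (
    "Here you go:\n" + FENCE + "json\n"
    '{"artist": "Portugal. The Man", "song": "So Young"}\n' + FENCE
)
parsed = parse_structured(reply, ["artist", "song"])
print(parsed)
```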

## Evaluation

*[LangChain Evaluation Docs](https://python.langchain.com/en/latest/use_cases/evaluation.html)*

LangChain provides tools to check the quality of LLM outputs.

**Use Cases:** Run quality checks on the output of your summarization or Question & Answer pipelines

In [None]:
# Embeddings, store, and retrieval
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA

# Model and doc loader
from langchain import OpenAI
from langchain.document_loaders import TextLoader

# Eval!
from langchain.evaluation.qa import QAEvalChain

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

In [None]:
# Our long essay from before
loader = TextLoader('../data/langchain-tutorials/PaulGraham-worked.txt')
doc = loader.load()

print (f"You have {len(doc)} document")
print (f"You have {len(doc[0].page_content)} characters in that document")

Split and VectorStore-ize.

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=400)
docs = text_splitter.split_documents(doc)

# Get the total number of characters so we can see the average later
num_total_characters = sum([len(x.page_content) for x in docs])

print (f"Now you have {len(docs)} documents that have an average of {num_total_characters / len(docs):,.0f} characters (smaller pieces)")

In [None]:
# Embeddings and docstore
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
docsearch = FAISS.from_documents(docs, embeddings)

Construct the retrieval chain.    
Note the additional `input_key` parameter: it tells the chain which key of a user-defined input dictionary holds the prompt/query.    
Here, `question` matches the `question` key in the dictionaries defined below.

In [None]:
chain = RetrievalQA.from_chain_type(llm=llm,
                                    chain_type="stuff",
                                    retriever=docsearch.as_retriever(),
                                    input_key="question")

First, define a list of question-and-answer pairs that are known to be correct.

In [None]:
question_answers = [
    {'question' : "Which company sold the microcomputer kit that his friend built himself?", 'answer' : 'Healthkit'},
    {'question' : "What was the small city he talked about in the city that is the financial capital of USA?", 'answer' : 'Yorkville, NY'}
]

I'll use `chain.apply` to run each of my questions separately.

One of the cool parts is that I'll get my list of question-and-answer dictionaries back, but each dictionary will have another key, `result`, which holds the output from the LLM.

In [None]:
predictions = chain.apply(question_answers)
predictions

Now, compare the ground truth answers (the `answer` key) with the result from the LLM (`result` key).

In [None]:
# Start your eval chain
eval_chain = QAEvalChain.from_llm(llm)

# Have it grade itself. The code below helps the eval_chain know where the different parts are
graded_outputs = eval_chain.evaluate(question_answers,
                                     predictions,
                                     question_key="question",
                                     prediction_key="result",
                                     answer_key='answer')

In [None]:
graded_outputs

Note:
1. The answer for Q1 was "Healthkit" and the prediction was "The microcomputer kit was sold by Heathkit." The LLM knew that the answer and result were the same and gave us a "correct" label.
2. The result for Q2 is not the same as the answer and the LLM gave an "incorrect" label.
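
Why use an LLM as the grader? A plain string comparison would mark Q1 wrong because of the one-letter difference between "Healthkit" and "Heathkit". The sketch below contrasts exact matching with a lenient fuzzy match from the standard library's `difflib`. The fuzzy grader is only a stand-in for the LLM's judgment, and the 0.8 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher


def exact_grade(answer: str, prediction: str) -> str:
    """Correct only if the answer appears verbatim in the prediction."""
    return "CORRECT" if answer.lower() in prediction.lower() else "INCORRECT"


def fuzzy_grade(answer: str, prediction: str, threshold: float = 0.8) -> str:
    """Correct if any word of the prediction is close enough to the answer."""
    words = prediction.replace(".", " ").split()
    best = max(
        (SequenceMatcher(None, answer.lower(), w.lower()).ratio() for w in words),
        default=0.0,
    )
    return "CORRECT" if best >= threshold else "INCORRECT"


answer = "Healthkit"
prediction = "The microcomputer kit was sold by Heathkit."
print(exact_grade(answer, prediction))
print(fuzzy_grade(answer, prediction))
```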

## Querying Tabular Data

See the [LangChain documentation](https://python.langchain.com/en/latest/use_cases/tabular.html) on querying tabular data for a comprehensive overview.

**Use Cases:** Use LLMs to query data about users, do data analysis, get real-time information from your DBs

Task: Query the [San Francisco Trees](https://data.sfgov.org/City-Infrastructure/Street-Tree-List/tkzw-k3nq) dataset stored in a SQLite DB with natural language.

In [None]:
from langchain import OpenAI, SQLDatabase, SQLDatabaseChain

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

sqlite_db_path = '../data/langchain-tutorials/San_Francisco_Trees.db'
db = SQLDatabase.from_uri(f"sqlite:///{sqlite_db_path}")

In [None]:
# Create a chain that takes our LLM and DB.
db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)

In [None]:
db_chain.run("How many Species of trees are there in San Francisco?")

This is awesome! There are actually a few steps going on here.

**Steps:**
1. Find which table to use
2. Find which column to use
3. Construct the correct SQL query
4. Execute that query
5. Get the result
6. Return a natural language response
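
The steps above can be sketched by hand with the standard library. The in-memory table below is made-up sample data, and the `TreeID` column is hypothetical; only the table name `SFTrees` and the column `qSpecies` are taken from the query used in this notebook. In the real chain, the LLM makes the choices that are hard-coded here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up sample rows; TreeID is a hypothetical column.
conn.execute("CREATE TABLE SFTrees (TreeID INTEGER, qSpecies TEXT)")
conn.executemany(
    "INSERT INTO SFTrees VALUES (?, ?)",
    [(1, "Oak"), (2, "Maple"), (3, "Oak"), (4, "Ginkgo")],
)

# Steps 1-2: inspect the schema to find candidate tables and columns.
tables = [row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")]
columns = [row[1] for row in conn.execute("PRAGMA table_info(SFTrees)")]

# Step 3: the query an LLM might write for "How many species of trees are there?"
sql = "SELECT COUNT(DISTINCT qSpecies) FROM SFTrees"

# Steps 4-5: execute the query and fetch the result.
(species_count,) = conn.execute(sql).fetchone()
conn.close()

# Step 6: phrase the result in natural language.
answer = f"There are {species_count} species of trees."
print(answer)
```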

Let's confirm via pandas

In [None]:
import sqlite3
import pandas as pd

# Connect to the SQLite database
connection = sqlite3.connect(sqlite_db_path)

# Define your SQL query
query = "SELECT count(distinct qSpecies) FROM SFTrees"

# Read the SQL query into a Pandas DataFrame
df = pd.read_sql_query(query, connection)

# Close the connection
connection.close()

In [None]:
# Display the result in the first column first cell
print(df.iloc[0,0])

## Code Understanding

LangChain documentation for [Code Understanding](https://python.langchain.com/en/latest/use_cases/code.html).

**Use Cases:** Copilot-esque functionality that can help answer questions from a specific library, help you generate new code

In [None]:
# Helper to read local files
import os

# Vector Support
from langchain.vectorstores import FAISS
from langchain.embeddings.openai import OpenAIEmbeddings

# Model and chain
from langchain.chat_models import ChatOpenAI

# Text splitters
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import TextLoader

llm = ChatOpenAI(model_name='gpt-3.5-turbo', openai_api_key=openai_api_key)

In [None]:
embeddings = OpenAIEmbeddings(disallowed_special=(), openai_api_key=openai_api_key)

In [None]:
# The loop goes through each file in the source code of `thefuzz`, and
# loads it up as a doc
root_dir = '../data/langchain-tutorials/thefuzz'
docs = []

# Go through each folder
for dirpath, dirnames, filenames in os.walk(root_dir):
    
    # Go through each file
    for file in filenames:
        try: 
            # Load up the file as a doc and split
            loader = TextLoader(os.path.join(dirpath, file), encoding='utf-8')
            docs.extend(loader.load_and_split())
        except Exception:
            # Skip files that cannot be loaded as UTF-8 text
            pass

Let's look at an example of a document. It's just code!

In [None]:
print (f"You have {len(docs)} documents\n")
print ("------ Start Document ------")
print (docs[0].page_content[:300])

Embed and store them in a docstore. This will make an API call to OpenAI

In [None]:
docsearch = FAISS.from_documents(docs, embeddings)

In [None]:
# Get our retriever ready
qa = RetrievalQA.from_chain_type(llm=llm,
                                 chain_type="stuff",
                                 retriever=docsearch.as_retriever())

In [None]:
query = "What function do I use if I want to find the most similar item in a list of items?"
output = qa.run(query)

In [None]:
print (output)

In [None]:
query = "Can you write the code to use the process.extractOne() function? Only respond with code. No other text or explanation"
output = qa.run(query)

In [None]:
print (output)

## Interacting with APIs

*[LangChain API Interaction Docs](https://python.langchain.com/en/latest/use_cases/apis.html)*


**Use Cases:** Understand a request from a user and carry out an action, be able to automate more real-world workflows

See [LangChain + plugins](https://python.langchain.com/en/latest/use_cases/agents/custom_agent_with_plugin_retrieval_using_plugnplai.html) documentation for a comprehensive overview of the ecosystem.

In [None]:
from langchain.chains import APIChain
from langchain.llms import OpenAI

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

LangChain's APIChain has the ability to read API documentation and understand which endpoint it needs to call.

In [None]:
# (Purposefully) sloppy API documentation to demonstrate how API interaction works.
api_docs = """

BASE URL: https://restcountries.com/

API Documentation:

The API endpoint /v3.1/name/{name} Used to find informatin about a country. All URL parameters are listed below:
    - name: Name of country - Ex: italy, france
    
The API endpoint /v3.1/currency/{currency} Uesd to find information about a region. All URL parameters are listed below:
    - currency: 3 letter currency. Example: USD, COP
    
Woo! This is my documentation
"""

chain_new = APIChain.from_llm_and_api_docs(llm, api_docs, verbose=True)

Make an API call that is meant for the country endpoint

In [None]:
chain_new.run('Can you tell me information about france?')

Let's try to make an API call that is meant for the currency endpoint

In [None]:
chain_new.run('Can you tell me about the currency COP?')

In both cases the APIChain read the instructions and understood which API call was required. Once the response returned, it was parsed and the question was answered.
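
The endpoint-selection step can be pictured as filling a URL template. In the sketch below a naive keyword rule replaces the LLM's reading of the docs (an assumption for illustration), and no request is sent. `build_url` is a hypothetical helper; the base URL and endpoint paths come from `api_docs` above.

```python
from urllib.parse import quote

BASE_URL = "https://restcountries.com"
ENDPOINTS = {
    "country": "/v3.1/name/{name}",
    "currency": "/v3.1/currency/{currency}",
}


def build_url(question: str) -> str:
    """Pick an endpoint with a naive keyword rule and fill in its URL parameter."""
    value = question.rstrip("?").split()[-1]  # assume the last word is the parameter
    if "currency" in question.lower():
        return BASE_URL + ENDPOINTS["currency"].format(currency=quote(value))
    return BASE_URL + ENDPOINTS["country"].format(name=quote(value))


country_url = build_url("Can you tell me information about france?")
currency_url = build_url("Can you tell me about the currency COP?")
print(country_url)
print(currency_url)
```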

## Chatbots

*[LangChain Chatbot Docs](https://python.langchain.com/en/latest/use_cases/chatbots.html)*

Chatbots use many of the tools discussed so far, in addition to Memory. The LangChain documentation provides a comprehensive overview of different [types of memory](https://python.langchain.com/en/latest/modules/memory/how_to_guides.html).

**Use Cases:** Have a real-time interaction with a user, provide an approachable UI for users to ask natural language questions

In [None]:
from langchain.llms import OpenAI
from langchain import LLMChain
from langchain.prompts.prompt import PromptTemplate

# Chat specific components
from langchain.memory import ConversationBufferMemory

The code below demonstrates how the context given to a chatbot can be customized.    
We can pass instructions on how the bot should respond, but also any additional relevant information it needs.

In [None]:
template = """
You are a chatbot that is unhelpful.
Your goal is to not help the user but only make jokes.
Take what the user is saying and make a joke out of it

{chat_history}
Human: {human_input}
Chatbot:"""

prompt = PromptTemplate(
    input_variables=["chat_history", "human_input"], 
    template=template
)
memory = ConversationBufferMemory(memory_key="chat_history")

In [None]:
llm_chain = LLMChain(
    llm=OpenAI(openai_api_key=openai_api_key), 
    prompt=prompt, 
    verbose=True, 
    memory=memory
)

In [None]:
llm_chain.predict(human_input="Is an pear a fruit or vegetable?")

In [None]:
llm_chain.predict(human_input="What was one of the fruits I first asked you about?")

Note: The 1st interaction was put into the prompt for the 2nd interaction. This is memory at work.
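
What `ConversationBufferMemory` contributes can be sketched as a growing transcript that is pasted into `{chat_history}` on every turn. The class below is a simplified stand-in written for illustration, not LangChain's implementation, and the bot reply is invented.

```python
class BufferMemory:
    """Keep every turn and render it as plain text for the prompt."""

    def __init__(self) -> None:
        self.turns: list[tuple[str, str]] = []

    def save(self, human: str, bot: str) -> None:
        self.turns.append((human, bot))

    def render(self) -> str:
        return "\n".join(f"Human: {h}\nChatbot: {b}" for h, b in self.turns)


template = "{chat_history}\nHuman: {human_input}\nChatbot:"
memory = BufferMemory()

# Turn 1: the history is still empty.
first_prompt = template.format(chat_history=memory.render(), human_input="Is a pear a fruit?")
memory.save("Is a pear a fruit?", "Only if it pears up with an apple.")  # invented reply

# Turn 2: the first exchange is now part of the prompt.
second_prompt = template.format(chat_history=memory.render(), human_input="What did I ask first?")
print(second_prompt)
```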

There are many ways to structure a conversation. See the official [documentation](https://python.langchain.com/en/latest/use_cases/chatbots.html) for a comprehensive overview.