# LangChain Basics

In [1]:
!pip install -r ./requirements.txt -q

In [2]:
!pip show langchain

Name: langchain
Version: 0.0.335
Summary: Building applications with LLMs through composability
Home-page: https://github.com/langchain-ai/langchain
Author: 
Author-email: 
License: MIT
Location: /Users/richard/opt/anaconda3/lib/python3.9/site-packages
Requires: aiohttp, anyio, async-timeout, dataclasses-json, jsonpatch, langsmith, numpy, pydantic, PyYAML, requests, SQLAlchemy, tenacity
Required-by: langchain-experimental


In [3]:
!pip install langchain --upgrade -q

### Python-dotenv

In [4]:
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)
os.environ.get('OPENAI_API_KEY')
os.environ.get('PINECONE_API_KEY')
os.environ.get('PINECONE_ENV')

'gcp-starter'

In [5]:
from langchain.llms import OpenAI
llm = OpenAI(model_name='text-davinci-003', temperature=0.7, max_tokens=512)
print(llm)

[1mOpenAI[0m
Params: {'model_name': 'text-davinci-003', 'temperature': 0.7, 'top_p': 1, 'frequency_penalty': 0, 'presence_penalty': 0, 'n': 1, 'logit_bias': {}, 'max_tokens': 512}


In [6]:
output = llm('explain computer engineering in one sentence')
print(output)



Computer Engineering is the study of design, development, and maintenance of computer hardware and software systems.


In [7]:
print(llm.get_num_tokens('explain computer engineering in one sentence'))

7


In [8]:
output = llm.generate(['... is the capital of England', 'What is the largest animal in the world?'])

In [9]:
print(output.generations)

[[Generation(text='\nLondon.\n', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\n\nThe largest animal in the world is the blue whale. It can grow up to 30 meters (100 feet) long and weigh up to 180 metric tons.', generation_info={'finish_reason': 'stop', 'logprobs': None})]]


In [10]:
print(output.generations[1][0].text)



The largest animal in the world is the blue whale. It can grow up to 30 meters (100 feet) long and weigh up to 180 metric tons.


In [11]:
len(output.generations)

2

In [12]:
output = llm.generate(['Write an original tagline for a sushi restaurant'] * 3)

In [13]:
print(output)

generations=[[Generation(text='\n\n"Savour the flavour of fresh sushi!"', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\n\n"Experience the Taste of Japan with Our Fresh Sushi!"', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\n\n"Sushi So Good, You\'ll Want Seconds!"', generation_info={'finish_reason': 'stop', 'logprobs': None})]] llm_output={'token_usage': {'completion_tokens': 38, 'total_tokens': 62, 'prompt_tokens': 24}, 'model_name': 'text-davinci-003'} run=[RunInfo(run_id=UUID('60db426a-428a-453e-9904-ac9017a357fe')), RunInfo(run_id=UUID('173c80af-4fda-45cf-879f-1e7b5adf55eb')), RunInfo(run_id=UUID('9a5ec24e-e79d-4cf1-aeb0-bed85a4548d8'))]


In [14]:
# Use a separate loop variable so `output` is not overwritten
for generation in output.generations:
    print(generation[0].text, end='')



"Savour the flavour of fresh sushi!"

"Experience the Taste of Japan with Our Fresh Sushi!"

"Sushi So Good, You'll Want Seconds!"
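
`generate()` returns one inner list per prompt, and each inner list holds the candidate completions for that prompt (a single one here, because `n` is 1). The sketch below mimics that nesting with a hypothetical stand-in dataclass, `FakeGeneration`, instead of LangChain's own `Generation` class, to show how to pull out the first candidate for each prompt and strip the leading newlines.

```python
from dataclasses import dataclass

# Hypothetical stand-in for LangChain's Generation objects;
# only the `text` attribute is modelled here.
@dataclass
class FakeGeneration:
    text: str

# One inner list per prompt, one candidate per inner list (n=1).
generations = [
    [FakeGeneration('\n\n"Savour the flavour of fresh sushi!"')],
    [FakeGeneration('\n\n"Experience the Taste of Japan with Our Fresh Sushi!"')],
    [FakeGeneration('\n\n"Sushi So Good, You\'ll Want Seconds!"')],
]

def first_texts(gens):
    """Return the stripped text of the first candidate for each prompt."""
    return [candidates[0].text.strip() for candidates in gens]

taglines = first_texts(generations)
for tagline in taglines:
    print(tagline)
```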

### ChatModels: GPT-3.5-Turbo and GPT-4

In [15]:
from langchain.schema import (AIMessage, HumanMessage, SystemMessage)
from langchain.chat_models import ChatOpenAI

In [16]:
chat = ChatOpenAI(model_name='gpt-4', temperature=0.5, max_tokens=1024)
messages = [
    SystemMessage(content='You are a physicist.'),
    HumanMessage(content='explain quantum mechanics in one sentence') 
]
output = chat(messages)

In [17]:
print(output.content)

Quantum mechanics is the branch of physics that deals with the smallest particles in the universe, like atoms and photons, where traditional laws of physics no longer apply and instead, phenomena such as superposition and quantum entanglement occur.
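
Each message class carries a role. As a rough illustration, assuming the OpenAI Chat Completions request format (a list of `{'role': ..., 'content': ...}` dictionaries), the conversation above corresponds to something like the list built below. The helper `to_openai_messages` is hypothetical; LangChain performs this conversion internally in its own way.

```python
# Hypothetical helper: maps (role, text) pairs to the role/content dictionaries
# used by the OpenAI Chat Completions API. LangChain does its own conversion.
VALID_ROLES = {'system', 'user', 'assistant'}

def to_openai_messages(pairs):
    messages = []
    for role, content in pairs:
        if role not in VALID_ROLES:
            raise ValueError(f'unknown role: {role}')
        messages.append({'role': role, 'content': content})
    return messages

# SystemMessage -> 'system', HumanMessage -> 'user', AIMessage -> 'assistant'
payload = to_openai_messages([
    ('system', 'You are a physicist.'),
    ('user', 'explain quantum mechanics in one sentence'),
])
print(payload)
```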


### Prompt Templates

In [18]:
from langchain.prompts import PromptTemplate

In [19]:
template = '''You are an experienced virologist.
Write a few sentences about the following {virus} in {language}.'''

prompt = PromptTemplate(
    input_variables=['virus', 'language'],
    template=template
)
print(prompt)

input_variables=['language', 'virus'] template='You are an experienced virologist.\nWrite a few sentences about the following {virus} in {language}.'
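
With the default `f-string` template format, filling `{virus}` and `{language}` behaves much like Python's `str.format()`. The plain-Python sketch below reproduces that substitution without LangChain and uses `string.Formatter().parse()` to collect the placeholder names, which mirrors the sorted `input_variables` printed above; it is an illustration, not LangChain's implementation.

```python
from string import Formatter

template = '''You are an experienced virologist.
Write a few sentences about the following {virus} in {language}.'''

# Collect the named placeholders in the template, sorted alphabetically.
variables = sorted({name for _, name, _, _ in Formatter().parse(template) if name})
print(variables)

# Fill the placeholders the way an f-string-style template would.
filled = template.format(virus='HIV', language='hindi')
print(filled)
```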


In [20]:
from langchain.llms import OpenAI
llm = OpenAI(model_name='text-davinci-003', temperature=0.7)
output = llm(prompt.format(virus='HIV', language='hindi'))
print(output)



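एचआईवी एक केवल मानवों के लिए खतरनाक वायरस है। यह एक अनुभूतिवादी वायरस है जो एक बार शरीर में फैलता है तो वह बिना आधारित उपचार के सदैव शरीर में रहता है। इसके प्रभाव अनेक

(English translation: "HIV is a virus that is dangerous only to humans. It is a 'perceptual' virus that, once it spreads through the body, remains in the body permanently without proper treatment. Its effects are many")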


### Simple Chains

In [21]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

llm = ChatOpenAI(model_name='gpt-4', temperature=0.5)

template = '''You are an experienced virologist.
Write a few sentences about the following {virus} in {language}.'''

prompt = PromptTemplate(
    input_variables=['virus', 'language'],
    template=template
)

chain = LLMChain(llm=llm, prompt=prompt)

output = chain.run({'virus': 'HSV', 'language': 'french'})

In [22]:
print(output)

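L'HSV, ou Herpes Simplex Virus, est un virus qui provoque des infections chez l'homme. Il existe deux types principaux : HSV-1, généralement associé aux infections orales comme les boutons de fièvre, et HSV-2, généralement associé à l'herpès génital. Ces virus sont très courants et peuvent causer des symptômes allant de légères éruptions cutanées à des complications plus graves, notamment des méningites ou des encéphalites.

(English translation: "HSV, or Herpes Simplex Virus, is a virus that causes infections in humans. There are two main types: HSV-1, generally associated with oral infections such as cold sores, and HSV-2, generally associated with genital herpes. These viruses are very common and can cause symptoms ranging from mild skin rashes to more serious complications, including meningitis or encephalitis.")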


### Sequential Chains

In [23]:
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SimpleSequentialChain

llm1 = OpenAI(model_name='text-davinci-003', temperature=0.7, max_tokens=1024)
prompt1 = PromptTemplate(
    input_variables=['concept'],
    template='''You are an experienced scientist and Python programmer.
    Write a function that implements the concept of {concept}.'''
)

chain1 = LLMChain(llm=llm1, prompt=prompt1)

llm2 = ChatOpenAI(model_name='gpt-4', temperature=0.7, max_tokens=1024)
prompt2 = PromptTemplate(
    input_variables=['function'],
    template='''Given the Python function {function}, describe it in as much detail as possible.'''
)

chain2 = LLMChain(llm=llm2, prompt=prompt2)

overall_chain = SimpleSequentialChain(chains=[chain1, chain2], verbose=True)
output = overall_chain.run('decision tree')



> Entering new SimpleSequentialChain chain...

def decision_tree(data, features, target):
    '''
    Function to build a decision tree for a given dataset.

    Parameters:
    data: The dataset (Pandas DataFrame) to be used
    features: The list of features (as column names) to be used for the decision tree
    target: The target variable (as column name) to be used for the decision tree

    Returns:
    tree: The decision tree (as a sklearn DecisionTreeClassifier object)
    '''

    # Store the features and target variables
    X = data[features]
    y = data[target]
    
    # Create the decision tree
    tree = DecisionTreeClassifier()
    tree.fit(X, y)
    
    return tree[0m
[33;1m[1;3mThe provided Python function, `decision_tree()`, is intended to build a decision tree classifier based on the provided dataset, features, and target variable.

This function has three parameters:

1. `data`: This parameter is expected to be a pandas DataFrame that co
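
`SimpleSequentialChain` passes the single output of each chain as the single input of the next. The sketch below models that data flow with plain functions in place of the two LLM calls; `write_function` and `describe_function` are hypothetical stand-ins that return canned strings, so only the chaining logic is shown.

```python
from functools import reduce

# Hypothetical stand-ins for chain1 and chain2; real chains would call an LLM.
def write_function(concept: str) -> str:
    return f'def {concept.replace(" ", "_")}(data): ...'

def describe_function(function: str) -> str:
    return f'Description of: {function}'

def run_sequential(steps, first_input):
    """Feed each step's output into the next step, like SimpleSequentialChain."""
    return reduce(lambda value, step: step(value), steps, first_input)

result = run_sequential([write_function, describe_function], 'decision tree')
print(result)
```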

### LangChain Agents

In [24]:
!pip install langchain_experimental -q

In [25]:
from langchain_experimental.agents.agent_toolkits import create_python_agent
from langchain_experimental.tools.python.tool import PythonREPLTool
from langchain.llms import OpenAI

In [26]:
llm = OpenAI(temperature=0)
agent_executor = create_python_agent(
    llm=llm,
    tool=PythonREPLTool(),
    verbose=True
)
agent_executor.run('Calculate the square root of the factorial of 20 and display it with 4 decimal places')



> Entering new AgentExecutor chain...


Python REPL can execute arbitrary code. Use with caution.


[32;1m[1;3m I need to calculate the factorial of 20 and then take the square root of that
Action: Python_REPL
Action Input: from math import factorial; print(round(factorial(20)**0.5, 4))[0m
Observation: [36;1m[1;3m1559776268.6285
[0m
Thought:[32;1m[1;3m I now know the final answer
Final Answer: 1559776268.6285[0m

[1m> Finished chain.[0m


'1559776268.6285'

In [27]:
agent_executor.run('what is the answer to 5.1 ** 7.3?')



> Entering new AgentExecutor chain...
I need to use the Python REPL to calculate this
Action: Python_REPL
Action Input: print(5.1 ** 7.3)
Observation: 146306.05007233328

Thought: I now know the final answer
Final Answer: 146306.05007233328

> Finished chain.


'146306.05007233328'
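
Because the agent hands the arithmetic to the Python REPL, its answers can be double-checked with the standard library alone. The snippet below recomputes both results; `math.sqrt` and `math.isqrt` serve as a cross-check on the agent's `factorial(20)**0.5`.

```python
import math

# sqrt(20!) rounded to 4 decimal places, as requested in the first prompt.
sqrt_fact = round(math.sqrt(math.factorial(20)), 4)
# The integer square root gives the exact integer part as a sanity check.
integer_part = math.isqrt(math.factorial(20))
# 5.1 raised to the power 7.3, as in the second prompt.
power = 5.1 ** 7.3

print(sqrt_fact, integer_part, power)
```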

### Splitting and Embedding Text Using LangChain

In [28]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
with open('./speech.txt') as f:
    churchill_speech = f.read()
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=20,
    length_function=len
)

In [29]:
chunks = text_splitter.create_documents([churchill_speech])
print(chunks[0].page_content)

Winston Churchill Speech - We Shall Fight on the Beaches
We Shall Fight on the Beaches June 4, 1940


In [30]:
print(f'Now you have {len(chunks)} chunks')

Now you have 271 chunks
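
`chunk_size` caps the length of each chunk (measured by `length_function`, here `len`, i.e. in characters), and `chunk_overlap` makes consecutive chunks share text so that material near a boundary appears in both neighbouring chunks. `RecursiveCharacterTextSplitter` first tries to split on separators such as paragraph breaks, newlines and spaces; the simplified sketch below ignores separators entirely and uses a fixed character window, so it only illustrates how size and overlap interact. The function name `sliding_window_chunks` is hypothetical.

```python
from typing import List

def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap.

    Simplified illustration only: unlike RecursiveCharacterTextSplitter, it does
    not look for paragraph, line or word boundaries.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError('chunk_overlap must be >= 0 and smaller than chunk_size')
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

sample = 'abcdefghijklmnopqrstuvwxyz'
pieces = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
```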


### Embedding Cost

In [31]:
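import tiktoken

def print_embedding_cost(texts, price_per_1k_tokens=0.0004):
    # Count tokens with the tokenizer used by text-embedding-ada-002
    enc = tiktoken.encoding_for_model('text-embedding-ada-002')
    total_tokens = sum(len(enc.encode(page.page_content)) for page in texts)
    print(f'Total Tokens: {total_tokens}')
    # Price per 1K tokens assumed by this notebook; check OpenAI's current pricing
    print(f'Embedding Cost in USD: {total_tokens / 1000 * price_per_1k_tokens:.6f}')

print_embedding_cost(chunks)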

Total Tokens: 5411
Embedding Cost in USD: 0.002164
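
The cost line is simply the token count divided by 1,000 and multiplied by the per-1K-token price of $0.0004 used in the function. With the 5,411 tokens counted above, 5411 / 1000 × 0.0004 = 0.0021644 USD, which prints to six decimals as 0.002164. The helper below isolates that arithmetic so the price can be passed in explicitly; `estimate_embedding_cost` is a hypothetical name, and the price should be checked against OpenAI's current pricing rather than taken from this notebook.

```python
def estimate_embedding_cost(total_tokens: int, price_per_1k_tokens: float) -> float:
    """Return the estimated embedding cost in USD for a given token count."""
    return total_tokens / 1000 * price_per_1k_tokens

cost = estimate_embedding_cost(5411, price_per_1k_tokens=0.0004)
print(f'{cost:.6f}')
```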


In [32]:
from langchain.embeddings import OpenAIEmbeddings
embedding = OpenAIEmbeddings()

In [33]:
vector = embedding.embed_query(chunks[0].page_content)
print(vector)

[-0.04163887412437536, -0.03931994048443049, -0.0029783007191567586, -0.009364924530082527, 0.017162656661664204, 0.022335663763372475, -0.027878169376654358, -0.009428631651727594, 0.0021564809454112355, 0.004338444739480498, 0.0069440594955055635, 0.0339685571674064, 0.007683060243943148, -0.009868209394094662, 0.0047366130856089225, -0.003908422366868246, 0.015608207550137546, -0.0035357366365672003, 0.011690228416530599, -0.00953056211503711, -0.006217800171396992, -0.027037237233584668, 0.026553064971727346, -0.006357955373354842, -0.01337846294917319, -0.01637269021591057, 0.010798331507467444, -0.020692022818897544, 0.001434999422595397, -0.013047186847941438, 0.006179575898409953, -0.005558433092185093, -0.01893370998678409, 0.004959587545705358, -0.017162656661664204, -0.022628715591617186, -0.02112523124608399, -0.01097671051675104, 0.023571577266673804, -0.013696998557398518, 0.014155687970597814, 0.005526579531362559, 0.008874381090399404, 0.0037905646574861685, -0.03001872

### Inserting the Embeddings into a Pinecone Index

In [35]:
import os
import pinecone
from langchain.vectorstores import Pinecone
pinecone.init(api_key=os.environ.get('PINECONE_API_KEY'), environment=os.environ.get('PINECONE_ENV'))

In [36]:
# deleting all indexes
indexes = pinecone.list_indexes()
for i in indexes:
    print('Deleting all indexes ... ', end='')
    pinecone.delete_index(i)
    print('Done')

Deleting all indexes ... Done


In [37]:
index_name = 'churchill-speech'
if index_name not in pinecone.list_indexes():
    print(f'Creating index {index_name} ...')
    pinecone.create_index(index_name, dimension=1536, metric='cosine')
    print('Done')

Creating index churchill-speech ...
Done


In [38]:
vector_store = Pinecone.from_documents(chunks, embedding, index_name=index_name)

### Asking Questions (Similarity Search)

In [40]:
query = 'Where should we fight?'
result = vector_store.similarity_search(query)
print(result)

[Document(page_content='on the beaches, we shall fight on the landing grounds, we shall fight in the fields and in the'), Document(page_content='Winston Churchill Speech - We Shall Fight on the Beaches\nWe Shall Fight on the Beaches June 4, 1940'), Document(page_content='fields and in the streets, we shall fight in the hills; we shall never surrender, and even if,'), Document(page_content='When we consider how much greater would be our advantage in defending the air above this Island')]


In [41]:
for r in result:
    print(r.page_content)
    print('-' * 50)

on the beaches, we shall fight on the landing grounds, we shall fight in the fields and in the
--------------------------------------------------
Winston Churchill Speech - We Shall Fight on the Beaches
We Shall Fight on the Beaches June 4, 1940
--------------------------------------------------
fields and in the streets, we shall fight in the hills; we shall never surrender, and even if,
--------------------------------------------------
When we consider how much greater would be our advantage in defending the air above this Island
--------------------------------------------------
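
The index was created with `metric='cosine'`, so `similarity_search` returns the stored chunks whose embedding vectors point in the most similar direction to the query's embedding (four by default, as the output shows). A brute-force version of that ranking, using made-up three-dimensional vectors instead of real 1,536-dimensional `text-embedding-ada-002` embeddings, is sketched below; it is not meant to describe how Pinecone searches its index internally.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, docs, k=2):
    """Rank (text, vector) pairs by cosine similarity to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in docs]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]

# Hypothetical toy vectors; real embeddings would come from OpenAIEmbeddings.
docs = [
    ('we shall fight on the beaches', [0.9, 0.1, 0.0]),
    ('defending the air above this Island', [0.2, 0.9, 0.1]),
    ('June 4, 1940', [0.0, 0.1, 0.9]),
]
best = top_k([1.0, 0.2, 0.0], docs, k=2)
print(best)
```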


In [42]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model='gpt-4', temperature=1)

retriever = vector_store.as_retriever(search_type='similarity', search_kwargs={'k': 3})

chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)

In [43]:
query = 'Where should we fight?'
answer = chain.run(query)
print(answer)

According to Winston Churchill's speech, "We Shall Fight on the Beaches", we should fight on the beaches, on the landing grounds, in the fields, in the streets, and in the hills.


In [45]:
query2 = 'What about the French Armies?'
answer = chain.run(query2)
print(answer)

The French Army was expected to advance across the Somme in great strength. Additionally, there were efforts from the Armies of the north to reopen their communication channels with the main French Armies. The British and French Armies also engaged in combat together.
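
With `chain_type="stuff"`, the retrieved chunks (here `k=3`) are concatenated ("stuffed") into a single prompt together with the question, and the LLM answers from that context in one call. The sketch below builds such a prompt by hand; the wording of `STUFF_TEMPLATE` and the helper `build_stuff_prompt` are assumptions for illustration, not LangChain's exact built-in prompt.

```python
# Hypothetical prompt wording; LangChain's built-in "stuff" prompt differs in detail.
STUFF_TEMPLATE = (
    'Use the following pieces of context to answer the question at the end.\n\n'
    '{context}\n\n'
    'Question: {question}\n'
    'Helpful Answer:'
)

def build_stuff_prompt(chunks, question, separator='\n\n'):
    """Join all retrieved chunks into one context block and insert the question."""
    context = separator.join(chunks)
    return STUFF_TEMPLATE.format(context=context, question=question)

retrieved = [
    'on the beaches, we shall fight on the landing grounds, we shall fight in the fields and in the',
    'fields and in the streets, we shall fight in the hills; we shall never surrender, and even if,',
]
prompt = build_stuff_prompt(retrieved, 'Where should we fight?')
print(prompt)
```

Stuffing keeps the whole context in one LLM call, which is simple but only works while the combined chunks fit in the model's context window.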
