In [1]:
!pip install -r ./requirements.txt -q
# In a notebook, "!pip" runs pip in a shell subprocess, which may not be the kernel's environment.
# "%pip" installs the packages into the environment of the running kernel.

In [2]:
!pip show langchain

Name: langchain
Version: 0.0.325
Summary: Building applications with LLMs through composability
Home-page: https://github.com/langchain-ai/langchain
Author: 
Author-email: 
License: MIT
Location: C:\Users\germa\anaconda3\Lib\site-packages
Requires: aiohttp, anyio, dataclasses-json, jsonpatch, langsmith, numpy, pydantic, PyYAML, requests, SQLAlchemy, tenacity
Required-by: 


In [3]:
! pip install langchain --upgrade -q

### Python-dotenv

In [1]:
import os
from dotenv import load_dotenv, find_dotenv
# load_dotenv("./.env")  # alternative: pass the path to the .env file explicitly
load_dotenv(find_dotenv(), override=True)

# Check that the key was loaded without printing the secret itself
os.environ.get("PINECONE_API_KEY") is not None

True

### LLMs (Wrappers): GPT-3

In [4]:
from langchain.llms import OpenAI
llm = OpenAI(model_name="text-davinci-003", temperature=0.7, max_tokens=512)
print(llm)

OpenAI
Params: {'model_name': 'text-davinci-003', 'temperature': 0.7, 'max_tokens': 512, 'top_p': 1, 'frequency_penalty': 0, 'presence_penalty': 0, 'n': 1, 'request_timeout': None, 'logit_bias': {}}


In [3]:
output = llm("explain quantum mechanics in one sentence")
print(output)



Quantum mechanics is a theory that describes the behavior of matter and energy at the subatomic level.


In [4]:
print(llm.get_num_tokens("explain quantum mechanics in one sentence"))

7


In [5]:
output = llm.generate(["... is the capital of France.", "What is the formula of the area of a circle?"])
print(output.generations)

[[Generation(text='\n\nParis.', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\n\nThe formula for the area of a circle is A = πr^2, where A is the area, π is pi (3.14159…) and r is the radius of the circle.', generation_info={'finish_reason': 'stop', 'logprobs': None})]]


In [7]:
print(output.generations[1][0].text)



The formula for the area of a circle is A = πr^2, where A is the area, π is pi (3.14159…) and r is the radius of the circle.


In [8]:
len(output.generations)

2

In [9]:
output = llm.generate(["Write an original tagline for a burger restaurant"]*3)
print(output)

generations=[[Generation(text='\n\n"Taste Our Burgers - They\'ll Make Your Mouth Water!"', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\n\n"Taste the burger that\'s out of this world!"', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\n\n"Burger Bliss: Where Every Bite is a Bite of Heaven!"', generation_info={'finish_reason': 'stop', 'logprobs': None})]] llm_output={'token_usage': {'total_tokens': 73, 'completion_tokens': 46, 'prompt_tokens': 27}, 'model_name': 'text-davinci-003'} run=[RunInfo(run_id=UUID('912c8ce7-451c-44ee-8880-36ff96864ee9')), RunInfo(run_id=UUID('b189dc4f-9ac0-4b01-9b3a-f63ea5bc188a')), RunInfo(run_id=UUID('6497d6f1-e756-4d76-99f1-3b3729af3501'))]


In [11]:
for o in output.generations:
    print(o[0].text, end="")



"Taste Our Burgers - They'll Make Your Mouth Water!"

"Taste the burger that's out of this world!"

"Burger Bliss: Where Every Bite is a Bite of Heaven!"

### ChatModels: GPT-3.5 Turbo and GPT-4

In [5]:
from langchain.schema import(
    AIMessage,
    HumanMessage,
    SystemMessage
)
from langchain.chat_models import ChatOpenAI

In [7]:
chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.5, max_tokens=1024)
messages = [
    SystemMessage(content="You are a physicist and respond only in German."),
    HumanMessage(content="explain quantum mechanics in one sentence")
]
output = chat(messages)

In [9]:
print(output.content)

Quantenmechanik beschreibt das Verhalten von Teilchen auf subatomarer Ebene, wobei ihre Eigenschaften wie Position und Impuls nur mit Wahrscheinlichkeiten angegeben werden können.


### Prompt Template

In [10]:
from langchain import PromptTemplate

In [12]:
template = '''You are an experienced virologist.
Write a few sentences about the following {virus} in {language}.'''

prompt = PromptTemplate(
    input_variables=["virus","language"],
    template=template
)
print(prompt)

input_variables=['language', 'virus'] template='You are an experienced virologist.\nWrite a few sentences about the following {virus} in {language}.'
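
As a rough mental model, formatting this template amounts to ordinary Python string substitution. The sketch below uses plain `str.format` instead of LangChain, so it runs without an API key; `fill_template` is a hypothetical helper introduced for illustration, not a library function.

```python
# Plain-Python sketch of what formatting the prompt produces.
# fill_template is a hypothetical helper, not part of LangChain.
template = '''You are an experienced virologist.
Write a few sentences about the following {virus} in {language}.'''


def fill_template(template: str, **variables: str) -> str:
    """Substitute each {placeholder} in the template with its value."""
    return template.format(**variables)


filled = fill_template(template, virus="ebola", language="romanian")
print(filled)
```

The resulting string is what gets sent to the model; `PromptTemplate` adds validation of the declared `input_variables` on top of this.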


In [13]:
from langchain.llms import OpenAI
llm = OpenAI(model_name="text-davinci-003", temperature=0.7)
output = llm(prompt.format(virus="ebola", language="romanian"))
print(output)



Ebola este o boală virală gravă care poate provoca sângerări interne și externe și poate fi fatală. Se răspândește prin contact direct cu sânge, secreții sau alte fluide corporale sau prin contact cu obiecte infestate. Prevenirea bolii se face prin igiena personală, evitarea contactului cu persoane bolnave și prin vaccinare. Tratamentul bolii constă în tratament simptomatic și de susținere.


### Simple Chains

In [14]:
from langchain.chat_models import ChatOpenAI
from langchain import PromptTemplate
from langchain.chains import LLMChain

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.5)
template = '''You are an experienced virologist.
Write a few sentences about the following {virus} in {language}.'''

prompt = PromptTemplate(
    input_variables=["virus","language"],
    template=template
)

chain = LLMChain(llm=llm, prompt=prompt)
output = chain.run({"virus" : "HSV", "language" : "french"})
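# chain.run("HSV") would raise an error here: a single positional argument
# only works when the prompt has exactly one input variable.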

In [15]:
print(output)

L'HSV, ou herpès simplex virus, est un virus couramment rencontré chez les humains. Il existe deux types d'HSV : le HSV-1, qui provoque généralement des lésions buccales telles que les boutons de fièvre, et le HSV-2, qui est principalement associé aux infections génitales. Ces virus se propagent par contact direct avec les lésions ou les sécrétions infectées. Bien que l'herpès soit une infection virale chronique, il peut être géré avec des médicaments antiviraux et des mesures préventives telles que l'utilisation de préservatifs.


### Sequential Chains

In [5]:
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
from langchain import PromptTemplate
from langchain.chains import LLMChain, SimpleSequentialChain

llm1 = OpenAI(model_name="text-davinci-003", temperature=0.7, max_tokens=102)
prompt1 = PromptTemplate(
    input_variables=["concept"],
    template='''You are an experienced scientist and Python programmer.
    Write a function that implements the concept of {concept}.'''
)
chain1 = LLMChain(llm=llm1, prompt=prompt1)

llm2 = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=1.2)
prompt2 = PromptTemplate(
    input_variables=["function"],
    template="Given the Python {function}, describe it as detailed as possible."
)
chain2 = LLMChain(llm=llm2, prompt=prompt2)

overall_chain = SimpleSequentialChain(chains=[chain1, chain2], verbose=True)
output = overall_chain.run("linear regression")



> Entering new SimpleSequentialChain chain...

def linear_regression(x_data, y_data):
    """
    Implements the concept of linear regression.
    Takes in x and y data points as arguments and returns the slope and 
    intercept of the linear regression line.
    
    Parameters:
    x_data (list): x-coordinates
    y_data (list): y-coordinates
    
    Returns:
    slope (float): slope of the linear
The Python function `linear_regression` takes in two lists `x_data` and `y_data` as arguments which represent the x and y coordinates respectively. It implements the concept of linear regression to calculate the slope and intercept of the linear regression line that best fits the given data points.

The linear regression line represents the best fitting straight line that minimizes the average distance between the line and the data points. The slope of the linear regression line represents the rate of change of the dependent variable (y) with respe
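
The verbose log above shows the first chain's output being handed to the second chain as its only input. A minimal sketch of that hand-off, assuming each step is just a function from string to string: `run_sequentially`, `write_function`, and `describe_function` are hypothetical stand-ins and no model is called.

```python
from typing import Callable, List


def run_sequentially(steps: List[Callable[[str], str]], first_input: str) -> str:
    """Feed the input to the first step, then pass each output to the next step."""
    value = first_input
    for step in steps:
        value = step(value)
    return value


# Hypothetical stand-ins for the two LLM calls; no API is contacted.
def write_function(concept: str) -> str:
    return f"def {concept.replace(' ', '_')}(x_data, y_data): ..."


def describe_function(function: str) -> str:
    return f"Description of: {function}"


result = run_sequentially([write_function, describe_function], "linear regression")
print(result)
```

Because the second step only sees what the first step returned, anything cut off in the first output (here the function is truncated by `max_tokens=102`) is also missing from the second step's input.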

### LangChain Agents

In [6]:
!pip install langchain_experimental
# Install this only if the langchain imports of the Python agent fail and you need the experimental package instead.



In [8]:
# Imported from langchain_experimental because the equivalent langchain imports (commented out below) raised an error.
from langchain_experimental.agents.agent_toolkits import create_python_agent
# from langchain.agents.agent_toolkits import create_python_agent
from langchain_experimental.tools.python.tool import PythonREPLTool
# from langchain.tools.python.tool import PythonREPLTool
from langchain.llms import OpenAI

In [9]:
llm = OpenAI(temperature=0)
agent_executor = create_python_agent(
    llm = llm,
    tool = PythonREPLTool(),
    verbose = True
)
agent_executor.run("Calculate the square root of the factorial of 20 \
and display it with 4 decimal points")



> Entering new AgentExecutor chain...


Python REPL can execute arbitrary code. Use with caution.


 I need to calculate the factorial of 20 and then take the square root of that
Action: Python_REPL
Action Input: from math import factorial; print(round(factorial(20)**0.5, 4))
Observation: 1559776268.6285

Thought: I now know the final answer
Final Answer: 1559776268.6285

> Finished chain.


'1559776268.6285'
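
Since the agent writes and runs its own code, it can be worth checking the result independently. The following sketch repeats the same calculation with the standard library only; it is a sanity check, not part of the agent.

```python
from math import factorial

# Same expression the agent generated: square root of 20!
value = factorial(20) ** 0.5

# Format with 4 digits after the decimal point, as the prompt requested.
formatted = f"{value:.4f}"
print(formatted)
```

The printed value should agree with the agent's final answer above.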

### Splitting and Embedding Text using LangChain

In [21]:
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)

True

In [22]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
with open("files/churchill_speech.txt") as f:
    churchill_speech = f.read()
    
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 100,
    chunk_overlap = 20,
    length_function = len
)

In [23]:
chunks = text_splitter.create_documents([churchill_speech])
print(chunks[0])
print(chunks[0].page_content)
print(f"Now you have {len(chunks)} chunks")

page_content='Winston Churchill Speech - We Shall Fight on the Beaches\nWe Shall Fight on the Beaches\nJune 4, 1940'
Winston Churchill Speech - We Shall Fight on the Beaches
We Shall Fight on the Beaches
June 4, 1940
Now you have 300 chunks
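
To see what `chunk_size` and `chunk_overlap` mean, here is a simplified fixed-width sliding window in plain Python. This is deliberately not the algorithm `RecursiveCharacterTextSplitter` uses; `sliding_window_chunks` is a hypothetical helper that only illustrates how consecutive chunks share characters.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int = 100, chunk_overlap: int = 20) -> List[str]:
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


# Synthetic 250-character sample so the example needs no input file.
sample = "".join(str(i % 10) for i in range(250))
pieces = sliding_window_chunks(sample)
print([len(p) for p in pieces])
print(pieces[0][-20:] == pieces[1][:20])  # the last 20 characters are repeated
```

`RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraph breaks, line breaks, and spaces, which is why the first chunk in the output above ends at a line boundary rather than at exactly 100 characters.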


#### Embedding Cost

In [24]:
def print_embedding_cost(texts):
    import tiktoken
    enc = tiktoken.encoding_for_model("text-embedding-ada-002")
    total_tokens = sum([len(enc.encode(page.page_content)) for page in texts])
    print(f"Total tokens: {total_tokens}")
    print(f"Embedding Cost in USD: {total_tokens / 1000 * 0.0004:.6f}")
    
print_embedding_cost(chunks)

Total tokens: 4820
Embedding Cost in USD: 0.001928
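
The cost line is simple arithmetic: tokens divided by 1,000, multiplied by the price per 1,000 tokens. The sketch below reproduces it without `tiktoken`, using the token count printed above; the rate of 0.0004 USD is the one hard-coded in the cell and may not match current pricing. `embedding_cost_usd` is a hypothetical helper name.

```python
def embedding_cost_usd(total_tokens: int, price_per_1k_tokens: float = 0.0004) -> float:
    """Return the embedding cost in USD for a given number of tokens."""
    return total_tokens / 1000 * price_per_1k_tokens


# 4820 is the token count reported for the 300 chunks above.
cost = embedding_cost_usd(4820)
print(f"Embedding Cost in USD: {cost:.6f}")
```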


In [32]:
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

In [33]:
vector = embeddings.embed_query(chunks[0].page_content)
print(vector)

[-0.04451710436383128, -0.037782137013891874, -0.0029513307899337103, -0.007985016471202087, 0.015740432121409912, 0.022590198019201264, -0.02847053719354729, -0.009732536211038392, 0.0010858219074641557, 0.007200545757708229, 0.007908482243322272, 0.03278193680355078, 0.007379124536218109, -0.011779812371323297, 0.00632997473634152, -0.005379681569866144, 0.013240330153594179, -0.0025399623418781224, 0.013533709808262486, -0.010957075475212122, -0.008093439184596334, -0.026761284567650895, 0.02959303237275225, -0.0037979849815033004, -0.014401091515416457, -0.01849564383598627, 0.010784873928143831, -0.01869973386856899, 0.0030884538387828874, -0.014350069007270776, 0.007111256601283936, -0.008603663334730543, -0.01650576881227252, 0.005162836143077651, -0.018368088031283366, -0.023929536762549535, -0.022424374169235862, -0.008737597767858925, 0.022577442624995493, -0.0127619944889744, 0.01359748771061394, 0.004725955975287927, 0.008750353162064697, 0.002921036263033706, -0.0280113355

### Inserting the Embeddings into a Pinecone Index

In [27]:
import os
import pinecone
from langchain.vectorstores import Pinecone

pinecone.init(api_key=os.environ.get("PINECONE_API_KEY"), environment=os.environ.get("PINECONE_ENV"))

In [28]:
# Delete all existing indexes (destructive: this removes every index returned by list_indexes)
indexes = pinecone.list_indexes()
for i in indexes:
    print("Deleting all indexes ...", end="")
    pinecone.delete_index(i)
    print("Done")

Deleting all indexes ...Done


In [29]:
index_name = "churchill-speech"
if index_name not in pinecone.list_indexes():
    print(f"Creating index {index_name} ...")
    pinecone.create_index(index_name, dimension=1536, metric="cosine")
    print("Done!")

Creating index churchill-speech ...
Done!


In [34]:
vector_store = Pinecone.from_documents(chunks, embeddings, index_name=index_name)

### Asking Questions (Similarity Search)

In [35]:
query = "Where should we fight?"
result = vector_store.similarity_search(query)
print(result)

[Document(page_content='shall fight on the beaches, we shall fight on the landing grounds, we shall fight in the fields and'), Document(page_content='front, now on that, fighting'), Document(page_content='end, we shall fight in France, we shall fight on the seas and oceans, we shall fight with growing'), Document(page_content='Winston Churchill Speech - We Shall Fight on the Beaches\nWe Shall Fight on the Beaches\nJune 4, 1940')]


In [36]:
for r in result:
    print(r.page_content)
    print("-" * 50)

shall fight on the beaches, we shall fight on the landing grounds, we shall fight in the fields and
--------------------------------------------------
front, now on that, fighting
--------------------------------------------------
end, we shall fight in France, we shall fight on the seas and oceans, we shall fight with growing
--------------------------------------------------
Winston Churchill Speech - We Shall Fight on the Beaches
We Shall Fight on the Beaches
June 4, 1940
--------------------------------------------------
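
Conceptually, the search embeds the query and returns the stored chunks whose vectors score highest under the index metric (cosine, as set when the index was created); four results come back because that is the default `k`. A minimal pure-Python sketch of that ranking on toy 2-dimensional vectors follows. `cosine_similarity` and `top_k` are hypothetical helpers; Pinecone performs this server-side and at scale.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], vectors: List[Sequence[float]], k: int = 4) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k vectors most similar to the query."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data standing in for 1536-dimensional embeddings.
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
ranked = top_k([1.0, 0.1], toy_vectors, k=2)
print([index for index, _ in ranked])
```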


In [39]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=1)

retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k" : 3})

chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)

In [42]:
#query = "Where should we fight?"
#query = "Who was the king of Belgium at that time?"
query = "What about the French Armies?"
answer = chain.run(query)
print(answer)

The French Armies were involved in the fighting against the British troops. They were holding the area that the British were trying to capture. The French Army had plans to launch a strong advance across the Somme to secure the territory.
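
The `"stuff"` chain type places all retrieved chunks into a single prompt together with the question. The sketch below shows the idea in plain Python using chunks from the similarity search above; the prompt wording and the `build_stuff_prompt` helper are assumptions for illustration and do not reproduce LangChain's actual prompt text.

```python
from typing import List


def build_stuff_prompt(question: str, chunks: List[str]) -> str:
    """Join the retrieved chunks into one context block and append the question."""
    context = "\n\n".join(chunks)
    # Illustrative wording only; LangChain ships its own prompt.
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


retrieved = [
    "shall fight on the beaches, we shall fight on the landing grounds, we shall fight in the fields and",
    "front, now on that, fighting",
    "end, we shall fight in France, we shall fight on the seas and oceans, we shall fight with growing",
]
stuffed_prompt = build_stuff_prompt("Where should we fight?", retrieved)
print(stuffed_prompt)
```

With 100-character chunks and `k=3`, the model sees only a few fragments of the speech, so answers can be incomplete or wrong; larger chunks or a higher `k` give it more context to work from.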
