* [1. The Power and Limitations of Large Language Models](#power_and_lim)
* [2. The LLMs](#llms)
* [3. The Chains](#chains)
* [4. The Memory](#memory)
* [5. Deep Lake VectorStore](#deeplake)
    * [5.1. Using Deep Lake as a vector database](#db)
    * [5.2. Creating an agent with a RetrievalQA chain as a tool to answer questions based on the given document](#RetrievalQA)
    * [5.3. Reloading an existing vector store and adding more data](#adding_data)
* [6. Agents in LangChain](#agents)
* [7. Tools in LangChain](#tools)

<hr>
<a class="anchor" id="power_and_lim">
    
## 1. The Power and Limitations of Large Language Models
    
</a>

Large Language Models (LLMs) are trained to model the distribution of words in a language, which lets them **generate meaningful text** without memorizing specific data. However, their knowledge is limited to their training data, so when asked about information outside it (for example, events after the training period) they may fabricate answers, a failure known as **"hallucination."**

To address this issue, **retrievers** can be used alongside LLMs. Retrievers fetch information from trusted sources, and the LLM is prompted to rearrange the retrieved information without adding new details. Although LLMs like GPT-4 and Claude can handle large context windows, filling them is costly, so an efficient retriever is still needed to select only the most relevant documents.
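As a rough illustration of this prompting idea, the sketch below assembles retrieved passages into a single prompt that asks the model to answer only from that context. The function name `build_grounded_prompt` and the prompt wording are assumptions made for this example; they are not LangChain's actual template.

```python
def build_grounded_prompt(question, passages):
    """Assemble a prompt that restricts the model to the retrieved passages."""
    # Number the passages so the context is easy to read back.
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say \"I don't know\".\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical retrieved passage, used only for illustration.
passages = ["Napoleon Bonaparte was born on 15 August 1769"]
prompt = build_grounded_prompt("When was Napoleon born?", passages)
print(prompt)
```

The explicit "I don't know" instruction is one simple way to discourage the model from inventing details that the retriever did not supply.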

Efficient retrievers are built using embedding models that map texts to vectors. These vectors are then stored in specialized databases called vector stores. This is where **Deep Lake** comes in: it provides a seamless way to store embeddings and their corresponding metadata. Deep Lake also enables hybrid searches on these embeddings and their attributes for efficient data retrieval. 
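To make the idea of embedding-based retrieval concrete, here is a minimal sketch that ranks stored texts by cosine similarity to a query vector. The three-dimensional vectors and the `store` dictionary are made up for illustration; a real embedding model produces much longer vectors, and a vector store such as Deep Lake handles storage and search for you.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings" keyed by the text they represent.
store = {
    "doc about history": [0.9, 0.1, 0.0],
    "doc about music": [0.1, 0.9, 0.1],
    "doc about sports": [0.0, 0.2, 0.9],
}


def retrieve(query_vector, k=1):
    """Return the k stored texts whose vectors are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda text: cosine_similarity(query_vector, store[text]),
        reverse=True,
    )
    return ranked[:k]


top = retrieve([0.8, 0.2, 0.1])
print(top)
```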

In [None]:
#!pip install langchain
#!pip install deeplake
#!pip install openai
#!pip install tiktoken

In [1]:
import os
from keys import ACTIVELOOP_TOKEN, OPENAI_API_KEY

os.environ["ACTIVELOOP_TOKEN"] = ACTIVELOOP_TOKEN
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

<hr>
<a class="anchor" id="llms">
    
## 2. The LLMs
    
</a>

In [2]:
# Import the LLM wrapper
from langchain.llms import OpenAI

The `temperature` parameter in OpenAI models controls the randomness of the output:
- 0: the output is mostly deterministic and the result is stable;
- 1: the output can be inconsistent and interesting, but this setting isn't generally advised for most tasks;
- between 0.70 and 0.90: a balance of reliability and creativity, suited to creative tasks.
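One common way to picture the effect of temperature is as a divisor applied to the model's raw token scores before they are turned into probabilities. The sketch below follows that textbook formulation with made-up scores; it is an assumption for illustration, since this notebook does not describe how OpenAI implements sampling internally.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [score / temperature for score in logits]
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
low = softmax_with_temperature(logits, 0.1)
high = softmax_with_temperature(logits, 1.0)
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At a low temperature almost all probability mass lands on the top-scoring token, which is why outputs become stable; at a higher temperature the alternatives keep a meaningful share.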

In [3]:
# Initialize the GPT-3 model’s Davinci variant
llm = OpenAI(model="text-davinci-003", temperature=0.9)

In [4]:
# Example of calling the initialized LLM
text = "Suggest a personalized workout routine for someone looking to improve cardiovascular endurance and prefers outdoor activities."

print(llm(text))



1. Jogging: Aim to jog for at least 30 minutes 3-4 times per week. Start at a slow pace and gradually increase your running speed and intensity as your endurance improves.

2. Trail Running: Trail running improves coordination and balance. Aim to run 2-3 times per week on off-road trails. Start at a slow pace and gradually increase your speed and distance.

3. Hill Sprinting: Hill sprinting is a great way to improve cardiovascular endurance and build leg strength. Sprint up a steep hill for 30 seconds and then walk or jog back down. Aim to do this 3-4 times per week.

4. Swimming: Swimming is an excellent way to improve cardiovascular endurance and build upper body strength. Aim to swim for 30 minutes 3-4 times per week.

5. Cycling: Cycling is a great way to build endurance. Aim to bike for 30 minutes 3-4 times a week. Start at an easy pace and gradually increase your speed and intensity as your endurance improves.

6. Rowing: Rowing is an excellent cardiovascular exercise. Aim to r

<hr>
<a class="anchor" id="chains">
    
## 3. The Chains
    
</a>

A chain is a combination of multiple individual components in a specific sequence. The most commonly used type of chain is the **LLMChain**, which consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser.

The LLMChain works as follows:

1. Takes input variable(s);
2. Formats the input variables into a prompt by using the PromptTemplate;
3. Passes the formatted prompt to the model (LLM or ChatModel);
4. Parses the output of the LLM into a final format, if an OutputParser is provided.
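The four steps above can be sketched in plain Python. This is not LangChain's implementation: `fake_llm`, `strip_parser`, and `run_chain` are hypothetical stand-ins, and the model call is replaced by a canned string so the example runs without an API key.

```python
def fake_llm(prompt):
    """Stand-in for a real model call; returns a canned completion."""
    return "\n\n  EcoBottles.  \n"


def strip_parser(text):
    """Example output parser: trim surrounding whitespace."""
    return text.strip()


def run_chain(template, model, inputs, output_parser=None):
    """Mimic the chain flow: take inputs, format, call the model, parse."""
    prompt = template.format(**inputs)   # step 2: fill the template
    raw = model(prompt)                  # step 3: call the model
    if output_parser is not None:        # step 4: optional parsing
        return output_parser(raw)
    return raw


template = "What is a good name for a company that makes {product}?"
result = run_chain(
    template, fake_llm, {"product": "eco-friendly water bottles"}, strip_parser
)
print(result)
```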

In [6]:
# Creating a chain that generates a possible name for a company that produces a given product

from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.chains import LLMChain

llm = OpenAI(model="text-davinci-003", temperature=0.9)

prompt = PromptTemplate(
    input_variables=["product"],
    template="What is a good name for a company that makes {product}?",
)

chain = LLMChain(llm=llm, prompt=prompt)

In [7]:
# Run the chain only specifying the input variable.
print(chain.run("eco-friendly water bottles"))



EcoBottles.


In [8]:
print(chain.run("personalized tea cups"))



Brewed Moments.


<hr>
<a class="anchor" id="memory">
    
## 4. The Memory
    
</a>

In LangChain, Memory refers to the mechanism that stores and manages the conversation history between a user and the AI. It helps maintain context and coherency throughout the interaction, enabling the AI to generate more relevant and accurate responses.
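Conceptually, a buffer-style memory stores each exchange and replays it in the next prompt. The sketch below illustrates that idea only; the `BufferMemory` class and `build_prompt` helper are hypothetical names and do not reproduce LangChain's `ConversationBufferMemory`.

```python
class BufferMemory:
    """Keeps every turn and renders it as text for the next prompt."""

    def __init__(self):
        self.turns = []

    def save(self, human, ai):
        """Record one completed exchange."""
        self.turns.append((human, ai))

    def render(self):
        """Return the whole history as alternating Human/AI lines."""
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


def build_prompt(memory, user_input):
    """Prepend the stored history to the new user message."""
    history = memory.render()
    return f"Current conversation:\n{history}\nHuman: {user_input}\nAI:"


memory = BufferMemory()
memory.save("Tell me about yourself.", "I'm an AI assistant.")
prompt = build_prompt(memory, "What can you do?")
print(prompt)
```

Because the full history is resent on every turn, the prompt grows with the conversation, which is the main trade-off of this simple approach.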

In [9]:
from langchain.llms import OpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory # a wrapper around ChatMessageHistory

llm = OpenAI(model="text-davinci-003", temperature=0)
conversation = ConversationChain(
    llm=llm,
    verbose=True,
    memory=ConversationBufferMemory()
)

# Start the conversation
conversation.predict(input="Tell me about yourself.")

# Continue the conversation
conversation.predict(input="What can you do?")
conversation.predict(input="How can you help me with data analysis?")



> Entering new chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: Tell me about yourself.
AI:

> Finished chain.


> Entering new chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Tell me about yourself.
AI:  Hi there! I'm an AI created to help people with their daily tasks. I'm programmed to understand natural language and respond to questions and commands. I'm also able to learn from my interactions with people, so I'm constantly growing and impr

" I'm not familiar with data analysis, but I'm sure I can help you find the information you need. I can search the web for articles and resources related to data analysis, and I can also provide you with links to helpful websites."

In [12]:
# To display the conversation:
# print(conversation)            ### Careful: it displays the openai_api_key!

<hr>
<a class="anchor" id="deeplake">
    
## 5. Deep Lake VectorStore
    
</a>

Deep Lake provides storage for embeddings and their corresponding metadata in the context of LLM apps. It enables hybrid searches on these embeddings and their attributes for efficient data retrieval. It also integrates with LangChain.
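A hybrid search combines an attribute filter with a similarity ranking. The following sketch shows the general idea on an in-memory list; the records, the `century` attribute, and the `hybrid_search` function are all hypothetical and are not Deep Lake's API.

```python
def dot(a, b):
    """Dot product used here as a simple similarity score."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical records: a tiny vector plus metadata attributes.
records = [
    {"text": "Napoleon note", "vector": [1.0, 0.0], "metadata": {"century": 18}},
    {"text": "Louis XIV note", "vector": [0.9, 0.1], "metadata": {"century": 17}},
    {"text": "Monet note", "vector": [0.2, 0.8], "metadata": {"century": 19}},
]


def hybrid_search(query_vector, metadata_filter, k=1):
    """Keep records whose metadata matches, then rank them by similarity."""
    candidates = [
        r for r in records
        if all(r["metadata"].get(key) == value
               for key, value in metadata_filter.items())
    ]
    candidates.sort(key=lambda r: dot(query_vector, r["vector"]), reverse=True)
    return [r["text"] for r in candidates[:k]]


hits = hybrid_search([1.0, 0.0], {"century": 17})
print(hits)
```

Without the filter, the same query would rank the Napoleon record first; the attribute constraint narrows the candidates before similarity is considered.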

<hr>
<a class="anchor" id="db">
    
### 5.1. Using Deep Lake as a vector database
    
</a>

In [4]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA

# Instantiate the LLM and embeddings models
llm = OpenAI(model="text-davinci-003", temperature=0)
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

# Create documents
texts = [
    "Napoleon Bonaparte was born on 15 August 1769",
    "Louis XIV was born on 5 September 1638"
]

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.create_documents(texts)

In [5]:
# Create Deep Lake dataset
my_activeloop_org_id = "iryna" 
my_activeloop_dataset_name = "langchain_course_from_zero_to_hero"
dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}"
db = DeepLake(dataset_path=dataset_path, embedding_function=embeddings)

Deep Lake Dataset in hub://iryna/langchain_course_from_zero_to_hero already exists, loading from the storage


In [17]:
# Add documents to the Deep Lake dataset
db.add_documents(docs)

Dataset(path='hub://iryna/langchain_course_from_zero_to_hero', tensors=['embedding', 'id', 'metadata', 'text'])

  tensor      htype      shape     dtype  compression
  -------    -------    -------   -------  ------- 
 embedding  embedding  (2, 1536)  float32   None   
    id        text      (2, 1)      str     None   
 metadata     json      (2, 1)      str     None   
   text       text      (2, 1)      str     None   


 

['1a4a8ff6-1614-11ee-bc41-12ee7aa5dbdc',
 '1a4a9190-1614-11ee-bc41-12ee7aa5dbdc']

<hr>
<a class="anchor" id="RetrievalQA">
    
### 5.2. Creating an agent with a RetrievalQA chain as a tool to answer questions based on the given document
    
</a>

In [6]:
# Creating a RetrievalQA chain
retrieval_qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever()
)

In [7]:
# Creating an agent that uses the RetrievalQA chain as a tool
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType

tools = [
    Tool(
        name="Retrieval QA System",
        func=retrieval_qa.run,
        description="Useful for answering questions."
    ),
]

agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)

In [8]:
# Using the agent to ask a question
response = agent.run("When was Napoleone born?")
print(response)



> Entering new chain...
I need to find out when Napoleone was born.
Action: Retrieval QA System
Action Input: When was Napoleone born?
Observation: Napoleon Bonaparte was born on 15 August 1769.
Thought: I now know the final answer.
Final Answer: Napoleon Bonaparte was born on 15 August 1769.

> Finished chain.
Napoleon Bonaparte was born on 15 August 1769.


<hr>
<a class="anchor" id="adding_data">
    
### 5.3. Reloading an existing vector store and adding more data
    
</a>

In [21]:
# Load the existing Deep Lake dataset and specify the embedding function
db = DeepLake(dataset_path=dataset_path, embedding_function=embeddings)

# Create new documents
texts = [
    "Lady Gaga was born on 28 March 1986",
    "Michael Jeffrey Jordan was born on 17 February 1963",
    "Claude Monet was born on November 14, 1840."
]
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.create_documents(texts)

# Add documents to our Deep Lake dataset
db.add_documents(docs)

Deep Lake Dataset in hub://iryna/langchain_course_from_zero_to_hero already exists, loading from the storage


Dataset(path='hub://iryna/langchain_course_from_zero_to_hero', tensors=['embedding', 'id', 'metadata', 'text'])

  tensor      htype      shape     dtype  compression
  -------    -------    -------   -------  ------- 
 embedding  embedding  (5, 1536)  float32   None   
    id        text      (5, 1)      str     None   
 metadata     json      (5, 1)      str     None   
   text       text      (5, 1)      str     None   


 

['9252369a-1617-11ee-bc41-12ee7aa5dbdc',
 '92523898-1617-11ee-bc41-12ee7aa5dbdc',
 '92523906-1617-11ee-bc41-12ee7aa5dbdc']

In [9]:
#######################################################
## Recreate the previous agent and ask new questions ##
#######################################################

# Instantiate the wrapper class for GPT3
llm = OpenAI(model="text-davinci-003", temperature=0)

# Create a RetrievalQA chain that uses the db as its retriever
retrieval_qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=db.as_retriever()
)

# Instantiate a tool that uses the RetrievalQA chain
tools = [
    Tool(
        name="Retrieval QA System",
        func=retrieval_qa.run,
        description="Useful for answering questions."
    ),
]

# Create an agent that uses the tool
agent = initialize_agent(tools,
                         llm,
                         agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
                         verbose=True)

In [10]:
response = agent.run("When was Michael Jordan born?")
print(response)



> Entering new chain...
I need to find out when Michael Jordan was born.
Action: Retrieval QA System
Action Input: When was Michael Jordan born?
Observation: Michael Jordan was born on 17 February 1963.
Thought: I now know the final answer.
Final Answer: Michael Jordan was born on 17 February 1963.

> Finished chain.
Michael Jordan was born on 17 February 1963.


In [11]:
response = agent.run("When is birthday of Claude Monet?")
print(response)



> Entering new chain...
I need to find out when Claude Monet's birthday is.
Action: Retrieval QA System
Action Input: When is Claude Monet's birthday?
Observation: Claude Monet's birthday is November 14, 1840.
Thought: I now know the final answer.
Final Answer: Claude Monet's birthday is November 14, 1840.

> Finished chain.
Claude Monet's birthday is November 14, 1840.


In [13]:
# Trying to ask something outside of the database knowledge
response = agent.run("When is birthday of Iryna Savchuk?")
print(response)



> Entering new chain...
I need to find out when Iryna Savchuk's birthday is.
Action: Retrieval QA System
Action Input: When is Iryna Savchuk's birthday?
Observation: I don't know.
Thought: I need to find a source that has the answer.
Action: Retrieval QA System
Action Input: What is the birthday of Iryna Savchuk?
Observation: I don't know.
Thought: I need to find a reliable source with the answer.
Action: Retrieval QA System
Action Input: Where can I find the birthday of Iryna Savchuk?
Observation: I don't know.
Thought: I need to search for the answer online.
Action: Retrieval QA System
Action Input: What is the date of Iryna Savchuk's birthday?
Observation: I don't know.
Thought: I need to search for a reliable source with the answer.
Action: Retrieval QA System
Action Input: What is the birthdate of Iryna Savchuk?
Ob

<hr>
<a class="anchor" id="agents">
    
## 6. Agents in LangChain
    
</a>

**Agents** are high-level components that use large language models (LLMs) to determine which actions to take and in what order. An **action** is either using a tool and observing its output, or returning a response to the user. **Tools** are functions that perform specific tasks, such as a Google search, a database lookup, or running code in a Python REPL.

Several types of agents are available in LangChain:

- The `zero-shot-react-description` agent uses the ReAct framework to decide which tool to use based purely on each tool's description, so every tool must have a description.
- The `react-docstore` agent engages with a docstore through the ReAct framework. It needs two tools: a Search tool and a Lookup tool. The Search tool finds a document, and the Lookup tool searches for a term in the most recently discovered document.
- The `self-ask-with-search` agent uses a single tool named Intermediate Answer, which is capable of looking up factual answers to queries. It corresponds to the original self-ask with search paper, where a Google search API was provided as the tool.
- The `conversational-react-description` agent is designed for conversational situations. It uses the ReAct framework to select a tool and uses memory to remember past conversation interactions.
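The decide, act, and observe cycle that these agents share can be sketched without any LLM. In the example below, `scripted_policy` is a hypothetical stand-in for the model's reasoning and `lookup` is a made-up tool; the iteration cap plays the same role as the `max_iterations` argument of `initialize_agent`.

```python
def scripted_policy(question, observations):
    """Stand-in for the LLM: decide the next step from what has been observed."""
    if not observations or observations[-1] == "I don't know.":
        # Nothing useful yet, so call the tool (again).
        return ("act", "lookup", question)
    return ("finish", observations[-1])


def lookup(query):
    """Hypothetical tool backed by a tiny in-memory fact table."""
    facts = {"When was Napoleon born?": "15 August 1769"}
    return facts.get(query, "I don't know.")


def run_agent(question, tools, max_iterations=3):
    """Loop until the policy finishes or the iteration limit is reached."""
    observations = []
    for _ in range(max_iterations):
        decision = scripted_policy(question, observations)
        if decision[0] == "finish":
            return decision[1]
        _, tool_name, tool_input = decision
        observations.append(tools[tool_name](tool_input))
    return "Agent stopped: iteration limit reached."


tools = {"lookup": lookup}
known = run_agent("When was Napoleon born?", tools)
unknown = run_agent("When is the birthday of an unknown person?", tools)
print(known)
print(unknown)
```

Without the cap, a policy that keeps retrying a tool that cannot answer would loop indefinitely, so a limit on iterations is a sensible safeguard.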

In [14]:
#!pip install google-api-python-client

In [15]:
from keys import GOOGLE_API_KEY, GOOGLE_CSE_ID

os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY
os.environ["GOOGLE_CSE_ID"] = GOOGLE_CSE_ID

In [16]:
from langchain.llms import OpenAI 

from langchain.agents import AgentType 
from langchain.agents import load_tools
from langchain.agents import initialize_agent

from langchain.agents import Tool
from langchain.utilities import GoogleSearchAPIWrapper

In [17]:
# Initialize the LLM
llm = OpenAI(model="text-davinci-003", temperature=0)  # temperature 0 for stable, repeatable answers

In [18]:
# Defining the Google search wrapper
search = GoogleSearchAPIWrapper()

In [19]:
tools = [
    # a tool for performing Google searches
    Tool(
        name="google-search",
        func=search.run,
        description="useful for when you need to search google to answer questions about current events"
    )
]

In [20]:
agent = initialize_agent(tools, 
                         llm, 
                         agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, 
                         verbose=True,
                         max_iterations=6)

In [21]:
response = agent("What's the latest update about Ukraine counteroffensive?")
print(response['output'])



> Entering new chain...
I need to find out the latest news about Ukraine counteroffensive
Action: google-search
Action Input: "Ukraine counteroffensive"
Observation: 4 days ago ... The Ukrainian Army is encountering an array of challenges that has complicated the early stages of its counteroffensive, especially the ... Jun 15, 2023 ... The counteroffensive is complicated by Ukraine's lack of air power. Kyiv has lobbied the West for months to supply F-16 fighter jets but they ... 2 days ago ... Ukrainian officials have said the counteroffensive is going as planned, even though it's clear, through open source accounts, that Ukrainian ... 7 days ago ... In its early phases, Ukraine's counteroffensive is having less success and Russian forces are showing more competence than western ... Jun 12, 2023 ... Ukrainian President Volodymyr Zelenskyy said Saturday that counteroffensive and defensive actions are underway against Russian forces, asserting ..

In [22]:
response

{'input': "What's the latest update about Ukraine counteroffensive?",
 'output': "The latest update about Ukraine counteroffensive is that Ukrainian forces are blunted and the counteroffensive has stalled, taking little or no territory. Command and control issues have complicated the early stages of the counteroffensive, and Ukraine has been lobbying the West for months to supply F-16 fighter jets. Senior U.S. officials are convinced that President Joe Biden's global reputation hinges on the success of Ukraine's counteroffensive."}

<hr>
<a class="anchor" id="tools">
    
## 7. Tools in LangChain
    
</a>

LangChain provides a variety of tools for agents to interact with the outside world. These tools can be used to create custom agents that perform various tasks: web searching, answering questions, or running Python code.

In [23]:
from langchain.llms import OpenAI
from langchain.agents import Tool
from langchain.utilities import GoogleSearchAPIWrapper
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.agents import initialize_agent, AgentType

In [24]:
# Instantiate an LLMChain specifically for text summarization
llm = OpenAI(model="text-davinci-003", temperature=0)

prompt = PromptTemplate(
    input_variables=["query"],
    template="Write a summary of the following text: {query}"
)

summarize_chain = LLMChain(llm=llm, prompt=prompt)

In [25]:
search = GoogleSearchAPIWrapper()

tools = [
    Tool(
        name="Search",
        func=search.run,
        description="useful for finding information about recent events"
    ),
    Tool(
       name='Summarizer',
       func=summarize_chain.run,
       description='useful for summarizing texts'
    )
]

In [26]:
# Creating an agent that leverages two tools
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True  
)

In [27]:
# Run the agent
response = agent("What's the latest news about the Mars rover? Then please summarize the results.")
print(response['output'])



> Entering new chain...
I should search for recent news about the Mars rover and then summarize the results.
Action: Search
Action Input: Latest news about the Mars rover
Observation: Mars 2020 Perseverance Rover · The largest and most capable rover ever sent to Mars. ... Curiosity Rover · Measures Mars' atmosphere to understand its climate ... Dec 15, 2021 ... Mars Sample Return is going to have great stuff to choose from!” Get the Latest JPL News. SUBSCRIBE TO THE NEWSLETTER. The mission has concluded that the solar-powered lander has run out of energy after more than four years on the Red Planet. Oct 19, 2022 ... NASA's Curiosity Mars rover used its Mast Camera, or Mastcam, to capture this panorama of a hill nicknamed ... Get the Latest JPL News. NASA's Mars 2020 Perseverance rover will look for signs of past microbial life, cache rock and soil samples, and prepare for future human exploration. Curiosity rover finds water-carved 'book' rock 

LangChain also provides tools that extend what conversational agents can do, for instance:
- `SerpAPI`: an interface to the SerpAPI search service, allowing the agent to run online searches and pull in relevant data for a conversation or task.
- `PythonREPLTool`: a tool that lets an agent write and execute Python code, which opens up advanced computations and interactions within the conversation.
- Custom tools: user-defined tools that add more specialized capabilities to a LangChain conversational agent.
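At its core, a custom tool is an ordinary function that takes a string and returns a string. The sketch below defines one such function; `word_count` is a hypothetical example chosen for illustration and is not part of LangChain.

```python
def word_count(text: str) -> str:
    """Return the number of whitespace-separated words as a string."""
    return str(len(text.split()))


count = word_count("LangChain agents call tools by name")
print(count)
```

Such a function could then be wrapped in the same way as the tools earlier in this notebook, for example `Tool(name="Word Counter", func=word_count, description="useful for counting the words in a text")`, where the name and description are illustrative. Since the agent picks a tool based on its description, a precise description matters more than the function's name.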