# Module 01: From Zero to Hero

In [3]:
# `constants` is a local module (not shown) that holds the OpenAI and Activeloop keys.
from constants import openai_key
from constants import tokens_key

The `temperature` parameter in OpenAI models manages the randomness of the output. 

* When set to 0, the output is close to deterministic, which suits tasks that need stable output and the most probable result.
* At a setting of 1.0, the output is more varied and can be inconsistent. This can be interesting, but it is generally not advised for most tasks.
* For creative tasks, a temperature between 0.70 and 0.90 offers a balance of reliability and creativity. 
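To make the effect of `temperature` concrete, here is a minimal sketch of temperature-scaled softmax, the standard textbook way a sampling temperature reshapes next-token probabilities: the logits are divided by the temperature before the softmax. The logits are made up and `softmax_with_temperature` is an illustrative name; this is assumed to reflect the general mechanism, not OpenAI's internal implementation. Treating a temperature of 0 as greedy selection is likewise an assumption consistent with the near-deterministic behavior described above.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn raw next-token scores (logits) into probabilities.

    Dividing the logits by the temperature sharpens the distribution when
    temperature < 1 and flattens it when temperature > 1. A temperature of 0
    is treated here (by assumption) as greedy selection of the top token.
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for three candidate next tokens.
logits = [2.0, 1.0, 0.5]
for t in (0, 0.7, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

Lower temperatures concentrate probability on the top-scoring token (more repeatable output); higher temperatures spread it across alternatives (more varied output).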

The best setting should be determined by experimenting with different values for each specific use case. The following code initializes the Davinci variant of GPT-3 (`text-davinci-003`).

In [7]:
from langchain.llms import OpenAI


In [8]:
# Before executing the following code, make sure to have
# your OpenAI key saved in the “OPENAI_API_KEY” environment variable.
llm = OpenAI(model="text-davinci-003", temperature=0.9)

Now we can call it on some input!

In [9]:
text = "Suggest a personalized workout routine for someone looking to improve cardiovascular endurance and prefers outdoor activities."
print(llm(text))



1. Monday - 45 minutes of running or jogging outdoors.

2. Tuesday - 30 minutes of biking outdoors.

3. Wednesday - 30 minutes of swimming outdoors.

4. Thursday - 30 minutes of brisk walking outdoors.

5. Friday - 20 minutes of outdoor hill sprints.

6. Saturday - 45 minutes of outdoor HIIT circuit.

7. Sunday - 40 minutes of outdoor circuit training.


## The Chains

In LangChain, a chain is an end-to-end wrapper around multiple individual components, providing a way to accomplish a common use case by combining these components in a specific sequence. The most commonly used type of chain is the `LLMChain`, which consists of a `PromptTemplate`, a model (either an LLM or a ChatModel), and an optional output parser.

The `LLMChain` works as follows:

1. Takes one or more input variables.
2. Uses the `PromptTemplate` to format the input variables into a prompt.
3. Passes the formatted prompt to the model (LLM or ChatModel).
4. If an output parser is provided, uses it to parse the model's raw output into a final format.
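The four steps above can be mimicked in a few lines of plain Python. This is a hypothetical, offline sketch, not LangChain's implementation: `MiniPromptTemplate`, `MiniLLMChain`, and `fake_llm` are illustrative names, and the stub model returns a canned string so the example runs without an API key.

```python
from typing import Callable, List, Optional


class MiniPromptTemplate:
    """Hypothetical stand-in for a prompt template: a format string plus
    the names of the input variables it expects."""

    def __init__(self, input_variables: List[str], template: str):
        self.input_variables = input_variables
        self.template = template

    def format(self, **kwargs: str) -> str:
        missing = [name for name in self.input_variables if name not in kwargs]
        if missing:
            raise KeyError(f"missing input variables: {missing}")
        return self.template.format(**kwargs)


class MiniLLMChain:
    """Hypothetical chain: format the prompt, call the model, optionally parse."""

    def __init__(self, llm: Callable[[str], str], prompt: MiniPromptTemplate,
                 output_parser: Optional[Callable[[str], object]] = None):
        self.llm = llm
        self.prompt = prompt
        self.output_parser = output_parser

    def run(self, **inputs: str) -> object:
        prompt_text = self.prompt.format(**inputs)  # steps 1-2: inputs -> prompt
        raw_output = self.llm(prompt_text)          # step 3: call the model
        if self.output_parser is not None:          # step 4: optional parsing
            return self.output_parser(raw_output)
        return raw_output


prompts_seen: List[str] = []


def fake_llm(prompt: str) -> str:
    """Hypothetical model stub: records the prompt and returns a canned reply."""
    prompts_seen.append(prompt)
    return "\n\nEcoPure Water Bottles.\n"


chain = MiniLLMChain(
    llm=fake_llm,
    prompt=MiniPromptTemplate(
        input_variables=["product"],
        template="What is a good name for a company that makes {product}?",
    ),
    output_parser=str.strip,  # simplest possible "parser": trim whitespace
)
print(chain.run(product="eco-friendly water bottles"))  # prints "EcoPure Water Bottles."
```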

The next example demonstrates how to create a chain that generates a possible name for a company that produces eco-friendly water bottles. Using LangChain's `LLMChain`, `PromptTemplate`, and `OpenAI` classes, we can easily define our prompt, set the input variables, and generate creative outputs.

In [4]:
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.chains import LLMChain

llm = OpenAI(model="text-davinci-003", temperature=0.9)

# Define a prompt template with an input variable.
prompt = PromptTemplate(
    input_variables=["product"],
    template="What is a good name for a company that makes {product}?",
)

# Create an instance of the LLMChain.
chain = LLMChain(llm=llm, prompt=prompt)

# Run the chain by providing the value for the input variable.
print(chain.run("eco-friendly water bottles"))




EcoPure Water Bottles.


This example showcases the flexibility and ease of using LangChain to create custom chains for various language generation tasks.

## The Memory

In LangChain, Memory refers to the mechanism that stores and manages the conversation history between a user and the AI. It maintains context and coherence throughout the interaction, so the AI can respond with awareness of what was said earlier. A memory class such as `ConversationBufferMemory` acts as a wrapper around `ChatMessageHistory`: it extracts the stored messages and provides them to the chain, which inserts them into the prompt for context-aware generation.
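To illustrate the idea of a conversation buffer, here is a hypothetical minimal version written from scratch. It is not LangChain's `ConversationBufferMemory`; the class `MiniBufferMemory` and its methods are illustrative names. It stores each turn and renders the history in the same `Human:`/`AI:` layout that appears under "Current conversation" in the verbose output below.

```python
from typing import List, Tuple


class MiniBufferMemory:
    """Hypothetical conversation buffer: keeps every (human, ai) turn verbatim."""

    def __init__(self, human_prefix: str = "Human", ai_prefix: str = "AI"):
        self.human_prefix = human_prefix
        self.ai_prefix = ai_prefix
        self.turns: List[Tuple[str, str]] = []

    def save_turn(self, human_input: str, ai_output: str) -> None:
        """Append one completed exchange to the buffer."""
        self.turns.append((human_input, ai_output))

    def render(self) -> str:
        """Return the whole history as text, ready to be placed in a prompt."""
        lines = []
        for human, ai in self.turns:
            lines.append(f"{self.human_prefix}: {human}")
            lines.append(f"{self.ai_prefix}: {ai}")
        return "\n".join(lines)


memory = MiniBufferMemory()
memory.save_turn("Tell me about yourself.", "I'm an AI assistant.")
memory.save_turn("What can you do?", "I can answer questions.")
print(memory.render())
```

Because a buffer keeps every turn, the prompt grows with the conversation; that is the trade-off of this simplest kind of memory.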

In [5]:
from langchain.llms import OpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

llm = OpenAI(model="text-davinci-003", temperature=0)

# Instantiate a ConversationChain with an OpenAI model and a ConversationBufferMemory.
conversation = ConversationChain(
    llm=llm,
    verbose=True,
    memory=ConversationBufferMemory()
)

# Start the conversation by providing an initial input.
conversation.predict(input="Tell me about yourself.")

# Continue the conversation by providing additional inputs.
conversation.predict(input="What can you do?")
conversation.predict(input="How can you help me with data analysis?")

# Print the stored conversation history to display the complete exchange.
print(conversation.memory.buffer)




> Entering new  chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: Tell me about yourself.
AI:

> Finished chain.


> Entering new  chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Tell me about yourself.
AI:  Hi there! I'm an AI created to help people with their daily tasks. I'm programmed to understand natural language and respond to questions and commands. I'm also able to learn from my interactions with people, so I'm constantly growing and impr

In this output, the memory at work is visible in the "Current conversation" section. After each user input, the history is updated with both the user's input and the AI's response, so the memory keeps a record of the entire conversation. When the AI generates its next response, it uses this history as context, which makes its responses more coherent and relevant.


```python
conversation = ConversationChain(
    llm=llm,
    verbose=True,
    memory=ConversationBufferMemory()
)
```


1. `ConversationChain` is a class from the `langchain.chains` module, designed to facilitate conversational interactions with a language model.
2. The `ConversationChain` constructor takes several parameters:
    * `llm`: the language model used to generate responses in the conversation; here, the `OpenAI` object created earlier.
    * `verbose`: set to `True` to enable verbose mode, in which the chain prints the formatted prompt each time it runs (as in the output above), making it easier to follow the conversation flow.
    * `memory`: an instance of a memory class that implements LangChain's conversation memory interface. Here it is `ConversationBufferMemory`, a built-in class that stores the full conversation history so the chain can keep context and generate responses consistent with previous interactions.
3. The `ConversationChain` object is instantiated with these parameters and assigned to the variable `conversation`, creating a new conversation chain with the specified language model, verbosity, and memory.
4. The `conversation` object can then be used to interact with the language model through its `predict` method. The chain manages the conversation history automatically, allowing back-and-forth exchanges.
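The flow described in point 4 can be sketched offline. Everything here is hypothetical: `MiniConversationChain` and `echo_llm` are illustrative names, the preamble is adapted from the verbose output above, and the stub model stands in for the OpenAI call. Each `predict` call renders the stored history into the prompt, calls the model, and saves the new exchange so the next call sees it.

```python
from typing import Callable, List, Tuple

# Preamble adapted from the verbose output above.
PREAMBLE = (
    "The following is a friendly conversation between a human and an AI. "
    "The AI is talkative and provides lots of specific details from its context. "
    "If the AI does not know the answer to a question, it truthfully says it does not know.\n\n"
    "Current conversation:\n"
)


class MiniConversationChain:
    """Hypothetical chain that carries its own history between predict() calls."""

    def __init__(self, llm: Callable[[str], str]):
        self.llm = llm
        self.history: List[Tuple[str, str]] = []

    def build_prompt(self, user_input: str) -> str:
        past = "".join(f"Human: {h}\nAI: {a}\n" for h, a in self.history)
        return f"{PREAMBLE}{past}Human: {user_input}\nAI:"

    def predict(self, user_input: str) -> str:
        prompt = self.build_prompt(user_input)    # 1. history + new input -> prompt
        reply = self.llm(prompt).strip()          # 2. call the (stub) model
        self.history.append((user_input, reply))  # 3. remember this turn
        return reply


def echo_llm(prompt: str) -> str:
    """Hypothetical stub model: reports how many human turns the prompt contains."""
    return f" I can see {prompt.count('Human:')} human turn(s)."


demo_chain = MiniConversationChain(echo_llm)
print(demo_chain.predict("Tell me about yourself."))  # prints "I can see 1 human turn(s)."
print(demo_chain.predict("What can you do?"))         # prints "I can see 2 human turn(s)."
```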

By instantiating a `ConversationChain` and configuring it with an OpenAI model and a conversation memory, you can engage in dynamic and interactive conversations with the language model, maintaining context and generating more coherent responses based on the ongoing conversation.

## Deep Lake VectorStore

Deep Lake provides storage for embeddings and their corresponding metadata in the context of LLM apps. It enables hybrid searches on these embeddings and their attributes for efficient data retrieval. It also integrates with LangChain, facilitating the development and deployment of applications.
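As a rough illustration of a hybrid search, the sketch below filters records on a metadata attribute and then ranks the remaining ones by cosine similarity to a query embedding. It is a toy in plain Python, not Deep Lake's query engine: the three-dimensional vectors, the `year` field, and the function names are all made up for the example.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def hybrid_search(records: List[Dict], query_vec: List[float],
                  min_year: int, k: int = 2) -> List[str]:
    """Hypothetical hybrid search: metadata filter first, then vector ranking."""
    candidates = [r for r in records if r["metadata"]["year"] >= min_year]
    ranked = sorted(candidates,
                    key=lambda r: cosine_similarity(r["embedding"], query_vec),
                    reverse=True)
    return [r["text"] for r in ranked[:k]]


# Toy 3-dimensional vectors; real text-embedding-ada-002 embeddings have 1536 dimensions.
records = [
    {"text": "Napoleon Bonaparte was born on 15 August 1769",
     "embedding": [0.9, 0.1, 0.0], "metadata": {"year": 1769}},
    {"text": "Louis XIV was born on 5 September 1638",
     "embedding": [0.8, 0.2, 0.1], "metadata": {"year": 1638}},
    {"text": "Michael Jordan was born on 17 February 1963",
     "embedding": [0.1, 0.9, 0.2], "metadata": {"year": 1963}},
]
print(hybrid_search(records, query_vec=[1.0, 0.0, 0.0], min_year=1700))
```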

Deep Lake provides several advantages over the typical vector store:

* It’s **multimodal**, which means that it can be used to store items of diverse modalities, such as texts, images, audio, and video, along with their vector representations. 
* It’s **serverless**: we can create and manage cloud datasets without creating and managing a database instance, which makes it much faster to get new projects started.
* Finally, data stored in a Deep Lake dataset can easily be turned into a **data loader**, which is convenient for fine-tuning machine learning models with common frameworks like PyTorch and TensorFlow.

In [8]:
import os

# Expose the keys imported earlier as the environment variables read by Deep Lake and OpenAI.
os.environ["ACTIVELOOP_TOKEN"] = tokens_key
os.environ["OPENAI_API_KEY"] = openai_key

Let’s install the `deeplake` library.

In [None]:
# !pip install deeplake

In [7]:
from langchain.embeddings.openai import OpenAIEmbeddings # used for obtaining text embeddings from the OpenAI model.
from langchain.vectorstores import DeepLake
from langchain.text_splitter import RecursiveCharacterTextSplitter # used to split long texts into smaller chunks for processing.
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA # used to create a retrieval-based question-answering chain.

# Before executing the following code, make sure to have your
# Activeloop key saved in the “ACTIVELOOP_TOKEN” environment variable.

# instantiate the LLM and embeddings models
llm = OpenAI(model="text-davinci-003", temperature=0)
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

# create our documents
texts = [
    "Napoleon Bonaparte was born in 15 August 1769",
    "Louis XIV was born in 5 September 1638"
]
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.create_documents(texts)

# create Deep Lake dataset
# TODO: use your organization id here. (by default, org id is your username)
my_activeloop_org_id = "komaldiwe" 
my_activeloop_dataset_name = "langchain_course_from_zero_to_hero"
dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}"
db = DeepLake(dataset_path=dataset_path, embedding_function=embeddings)

# add documents to our Deep Lake dataset
db.add_documents(docs)

Deep Lake Dataset in hub://komaldiwe/langchain_course_from_zero_to_hero already exists, loading from the storage



Dataset(path='hub://komaldiwe/langchain_course_from_zero_to_hero', tensors=['embedding', 'id', 'metadata', 'text'])

  tensor      htype      shape     dtype  compression
  -------    -------    -------   -------  ------- 
 embedding  embedding  (2, 1536)  float32   None   
    id        text      (2, 1)      str     None   
 metadata     json      (2, 1)      str     None   
   text       text      (2, 1)      str     None   


 

['e7269bf8-141a-11ee-b082-dc215c0499d1',
 'e7269bf9-141a-11ee-b8f9-dc215c0499d1']

Now, let's create a `RetrievalQA` chain:

In [9]:
retrieval_qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever()
)
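With `chain_type="stuff"`, all retrieved documents are "stuffed" into a single prompt together with the question, and the model is called once. A minimal sketch of that prompt-building step follows; `build_stuff_prompt` is a hypothetical name and the template wording is an approximation for illustration, not necessarily LangChain's exact prompt.

```python
from typing import List


def build_stuff_prompt(question: str, documents: List[str]) -> str:
    """Concatenate every retrieved document into one context block (hypothetical template)."""
    context = "\n\n".join(documents)
    return (
        "Use the following pieces of context to answer the question at the end.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


retrieved = [
    "Napoleon Bonaparte was born in 15 August 1769",
    "Louis XIV was born in 5 September 1638",
]
print(build_stuff_prompt("When was Napoleone born?", retrieved))
```

Because everything goes into one call, this approach works only while the documents fit in the model's context window; LangChain offers other chain types, such as `map_reduce` and `refine`, for larger inputs.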

Next, let's create an agent that uses the `RetrievalQA` chain as a tool:

In [10]:
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType

tools = [
    Tool(
        name="Retrieval QA System",
        func=retrieval_qa.run,
        description="Useful for answering questions."
    ),
]

agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)

Finally, we can use the agent to ask a question:

In [11]:
response = agent.run("When was Napoleone born?")
print(response)



> Entering new  chain...
I need to find out when Napoleone was born.
Action: Retrieval QA System
Action Input: When was Napoleone born?
Observation: Napoleon Bonaparte was born on 15 August 1769.
Thought: I now know the final answer.
Final Answer: Napoleon Bonaparte was born on 15 August 1769.

> Finished chain.
Napoleon Bonaparte was born on 15 August 1769.


Here, the agent used the “Retrieval QA System” tool with the query “When was Napoleone born?”, which was run against our new Deep Lake dataset and returned the most similar document (i.e., the one containing Napoleon's date of birth). This document was then used to generate the final output.
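The trace above follows the ReAct-style format used by the zero-shot agent: the model writes a thought, then either an `Action` with an `Action Input`, or a `Final Answer`. The sketch below shows how such text could be parsed to decide the next step. It is a simplified, hypothetical parser (`parse_agent_output` is an illustrative name), not LangChain's actual output parser.

```python
import re
from typing import Tuple


def parse_agent_output(text: str) -> Tuple[str, str]:
    """Return ("final", answer) or (tool_name, tool_input) from one model turn."""
    final = re.search(r"Final Answer:\s*(.*)", text, re.DOTALL)
    if final:
        return "final", final.group(1).strip()
    action = re.search(r"Action:\s*(.*?)\s*\nAction Input:\s*(.*)", text, re.DOTALL)
    if action:
        return action.group(1).strip(), action.group(2).strip()
    raise ValueError(f"could not parse agent output: {text!r}")


# Model turns copied from the trace above.
step_1 = (" I need to find out when Napoleone was born.\n"
          "Action: Retrieval QA System\n"
          "Action Input: When was Napoleone born?")
step_2 = (" I now know the final answer.\n"
          "Final Answer: Napoleon Bonaparte was born on 15 August 1769.")
print(parse_agent_output(step_1))
print(parse_agent_output(step_2))
```

In the agent loop, a tool name triggers a call to the matching tool (here `retrieval_qa.run`), whose result is fed back to the model as an `Observation`; a `final` result ends the loop.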


This example demonstrates how to use Deep Lake as a vector database and create an agent with a `RetrievalQA` chain as a tool to answer questions based on the given document.

Let’s add an example of reloading an existing vector store and adding more data.

We first reload an existing vector store from Deep Lake located at a specified dataset path. Then, we load new textual data and split it into manageable chunks. Finally, we add these chunks to the existing dataset, creating and storing the corresponding embeddings for each added text segment.
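The splitting step can be pictured with a much simpler splitter. The sketch below cuts text into fixed-size character windows with an optional overlap; it is a hypothetical simplification (`split_fixed` is an illustrative name). LangChain's `RecursiveCharacterTextSplitter` is smarter: it tries separators such as paragraph breaks, newlines, and spaces before falling back to individual characters. With `chunk_size=1000`, each short sentence in this example fits in a single chunk, so `create_documents` yields one document per text.

```python
from typing import List


def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters so that context
    spanning a boundary is not lost.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


print(split_fixed("Lady Gaga was born on 28 March 1986", chunk_size=1000))
print(split_fixed("abcdefghij", chunk_size=4, chunk_overlap=1))  # ['abcd', 'defg', 'ghij']
```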

In [12]:
# load the existing Deep Lake dataset and specify the embedding function
db = DeepLake(dataset_path=dataset_path, embedding_function=embeddings)

# create new documents
texts = [
    "Lady Gaga was born in 28 March 1986",
    "Michael Jeffrey Jordan was born in 17 February 1963"
]
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.create_documents(texts)

# add documents to our Deep Lake dataset
db.add_documents(docs)

Deep Lake Dataset in hub://komaldiwe/langchain_course_from_zero_to_hero already exists, loading from the storage


 

Dataset(path='hub://komaldiwe/langchain_course_from_zero_to_hero', tensors=['embedding', 'id', 'metadata', 'text'])

  tensor      htype      shape     dtype  compression
  -------    -------    -------   -------  ------- 
 embedding  embedding  (4, 1536)  float32   None   
    id        text      (4, 1)      str     None   
 metadata     json      (4, 1)      str     None   
   text       text      (4, 1)      str     None   


['108d65b0-141e-11ee-8297-dc215c0499d1',
 '108d65b1-141e-11ee-9092-dc215c0499d1']

We then recreate our previous agent and ask a question that can be answered only by the newly added documents.

In [13]:
# instantiate the wrapper class for GPT-3
llm = OpenAI(model="text-davinci-003", temperature=0)

# create a RetrievalQA chain that uses the db as its retriever
retrieval_qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=db.as_retriever()
)

# instantiate a tool that uses the retriever
tools = [
    Tool(
        name="Retrieval QA System",
        func=retrieval_qa.run,
        description="Useful for answering questions."
    ),
]

# create an agent that uses the tool
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)

Let’s now test our agent with a new question.

In [14]:
response = agent.run("When was Michael Jordan born?")
print(response)



> Entering new  chain...


AuthenticationError: Incorrect API key provided: sk-FZoOu***************************************yuRF. You can find your API key at https://platform.openai.com/account/api-keys.