In [1]:
from langchain.prompts import PromptTemplate

In [2]:
template = """Question: {question}

Answer: """

prompt = PromptTemplate(
    template = template,
    input_variables= ['question']
)

# user question
question = "Which NFL team won the Super Bowl in the 2010 season?"

In [3]:
from dotenv import load_dotenv
import os

load_dotenv()

os.environ['HUGGINGFACEHUB_API_TOKEN'] = os.getenv("HF_API_KEY")

tiiuae/falcon-7b-instruct: An instruction-tuned model optimized for instruction-following tasks.
google/flan-t5-large: This is a smaller version of flan-t5-xl and can handle a wide range of NLP tasks effectively.
EleutherAI/gpt-neo-2.7B: A well-performing model that is smaller than GPT-3 and can still provide good results for many tasks.
bigscience/bloom-1b7: A part of the BLOOM family with a more manageable size, suitable for both English and multilingual tasks.
facebook/opt-1.3b: A strong option for text generation and general NLP tasks, smaller than some of the large-scale models.

In [4]:
from langchain import HuggingFaceHub, LLMChain

hub_llm = HuggingFaceHub(
    # repo_id = "bigscience/bloom-1b7",
    repo_id='tiiuae/falcon-7b-instruct',
    model_kwargs = {'temperature':1e-10}
)

llm_chain = LLMChain(
    prompt = prompt,
    llm = hub_llm
)

# Run the chain and get the answer
print(llm_chain.run({"question": question}))



Question: Which NFL team won te Super Bowl in the 2010 season?

Answer: 


In [5]:
qs = [
    {'question': "Which NFL team won the Super Bowl in the 2010 season?"},
    {'question': "If I am 6 ft 4 inches, how tall am I in centimeters?"},
    {'question': "Who was the 12th person on the moon?"},
    {'question': "How many eyes does a blade of grass have?"}
]
res = llm_chain.generate(qs)
res

LLMResult(generations=[[Generation(text='Question: Which NFL team won the Super Bowl in the 2010 season?\n\nAnswer: ')], [Generation(text='Question: If I am 6 ft 4 inches, how tall am I in centimeters?\n\nAnswer: ')], [Generation(text='Question: Who was the 12th person on the moon?\n\nAnswer: ')], [Generation(text='Question: How many eyes does a blade of grass have?\n\nAnswer: ')]], llm_output=None, run=[RunInfo(run_id=UUID('51bec776-e08c-4023-8181-35c25d2878e1')), RunInfo(run_id=UUID('aa384099-0b40-4a99-8d34-4c3ac4e6c29f')), RunInfo(run_id=UUID('540e9d12-1ec3-4475-94bc-e27585867130')), RunInfo(run_id=UUID('15c41c59-c809-4972-aa7a-fc969d45c435'))], type='LLMResult')

Using OpenAI

In [6]:
import os
from dotenv import load_dotenv

load_dotenv()

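# The OpenAI client reads OPENAI_API_KEY, which load_dotenv() loads from .env;
# fail early if it is missing.
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set")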


In [9]:
# from langchain.llms import OpenAI
from langchain_community.chat_models import ChatOpenAI

model = ChatOpenAI(model_name = "gpt-3.5-turbo")


In [10]:
llm_chain = LLMChain(
    prompt = prompt,
    llm = model
)

print(llm_chain.run(question))


RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}

Prompt Templates
The prompt template classes in LangChain are built to make constructing prompts with dynamic inputs easier. Of these classes, the simplest is PromptTemplate. We’ll test this by adding a single dynamic input, the user query, to our previous prompt.

In [None]:
from langchain.prompts import PromptTemplate


template = """Answer the question based on the context below. If the
question cannot be answered using the information provided answer
with "I don't know".

Context: Large Language Models (LLMs) are the latest models used in NLP.
Their superior performance over smaller models has made them incredibly
useful for developers building NLP enabled applications. These models
can be accessed via Hugging Face's `transformers` library, via OpenAI
using the `openai` library, and via Cohere using the `cohere` library.

Question: {query}

Answer: """

prompt_template = PromptTemplate(
    input_variables = ['query'],
    template = template
)

With this, we can use the format method on our prompt_template to see the effect of passing a query to the template.

In [None]:
print(
    prompt_template.format(
        query="Which libraries and model providers offer LLMs?"
    )
)
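
As a rough mental model, formatting a prompt template amounts to checked string substitution. The sketch below uses only the standard library; `SimpleTemplate` is a hypothetical stand-in written for illustration and is an assumption about the general idea, not LangChain's implementation.

```python
from string import Formatter


class SimpleTemplate:
    """Hypothetical stand-in for a prompt template (not LangChain's code)."""

    def __init__(self, template: str, input_variables: list[str]):
        # Collect the placeholder names that actually appear in the template.
        found = {name for _, name, _, _ in Formatter().parse(template) if name}
        if found != set(input_variables):
            raise ValueError(
                f"template uses {sorted(found)}, declared {sorted(input_variables)}"
            )
        self.template = template
        self.input_variables = input_variables

    def format(self, **kwargs: str) -> str:
        # Refuse to format unless every declared variable is supplied.
        missing = set(self.input_variables) - set(kwargs)
        if missing:
            raise KeyError(f"missing variables: {sorted(missing)}")
        return self.template.format(**kwargs)


demo = SimpleTemplate("Question: {query}\n\nAnswer: ", ["query"])
formatted = demo.format(query="Which libraries and model providers offer LLMs?")
print(formatted)
```

Declaring the input variables up front lets a mismatch between the template text and the expected inputs surface when the template is built rather than when the prompt is sent.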

Naturally, we can pass the output of this directly into an LLM object like so:

In [None]:
# `model` is the ChatOpenAI instance created earlier
print(model.invoke(
    prompt_template.format(
        query="Which libraries and model providers offer LLMs?"
    )
).content)

Conversational Memory for LLMs with LangChain

In [None]:
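import os

from langchain_community.chat_models import ChatOpenAI
from langchain.chains import ConversationChain

In [None]:
llm = ChatOpenAI(
    temperature = 0.7,
    openai_api_key = os.getenv("OPENAI_API_KEY"),  # read the key from the environment
    model_name = "gpt-4"
)

conversation = ConversationChain(llm = llm)
# inspect the prompt of the chain instance (not the class)
print(conversation.prompt.template)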

In [None]:
from langchain.prompts import FewShotPromptTemplate, PromptTemplate

# create our examples
examples = [
    {
        "query": "How are you?",
        "answer": "I can't complain but sometimes I still do."
    }, {
        "query": "What time is it?",
        "answer": "It's time to get a watch."
    }
]

# create an example template
example_template = """
User: {query}
AI: {answer}
"""

# create a prompt example from above template
example_prompt = PromptTemplate(
    input_variables=["query", "answer"],
    template=example_template
)

# now break our previous prompt into a prefix and suffix
# the prefix is our instructions
prefix = """The following are excerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative and funny responses to the user's questions. Here are some
examples:
"""
# and the suffix our user input and output indicator
suffix = """
User: {query}
AI: """

# now create the few shot prompt template
few_shot_prompt_template = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["query"],
    example_separator="\n\n"
)
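
To see roughly what a few-shot template expands to, the pieces can be assembled by hand: the prefix, each rendered example, and the suffix are joined with the separator. This is a minimal standard-library sketch; the `demo_*` names and `build_few_shot_prompt` are hypothetical, the prefix is shortened, and the joining order is an assumption about the general approach rather than a statement of LangChain's internals.

```python
# Hypothetical, shortened versions of the pieces defined in the cell above.
demo_examples = [
    {"query": "How are you?", "answer": "I can't complain but sometimes I still do."},
    {"query": "What time is it?", "answer": "It's time to get a watch."},
]
demo_example_template = "User: {query}\nAI: {answer}"
demo_prefix = "The following are excerpts from conversations with an AI assistant."
demo_suffix = "User: {query}\nAI: "


def build_few_shot_prompt(query: str, separator: str = "\n\n") -> str:
    # Render every example with the per-example template.
    rendered = [demo_example_template.format(**ex) for ex in demo_examples]
    # Instructions first, then the examples, then the slot for the new query.
    parts = [demo_prefix, *rendered, demo_suffix.format(query=query)]
    return separator.join(parts)


few_shot_text = build_few_shot_prompt("What is the meaning of life?")
print(few_shot_text)
```

The model then continues the text after the final `AI: `, imitating the tone of the examples that precede it.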

Few Shot Prompt Templates
The success of LLMs comes from their large size and their ability to store “knowledge” within the model parameters, which is learned during model training. However, there are more ways to pass knowledge to an LLM. The two primary types of knowledge are:

Parametric knowledge: the knowledge mentioned above, that is, anything the model learned during training and stores within its weights (or parameters).
Source knowledge: any knowledge provided to the model at inference time via the input prompt.

Forms of Conversational Memory

ConversationBufferMemory

In [None]:
from langchain.chains.conversation.memory import ConversationBufferMemory

conver_buf = ConversationChain(
    memory = ConversationBufferMemory(),  # pass an instance, not the class
    llm = llm
)

conver_buf("Good morning.")


In [None]:
from langchain.callbacks import get_openai_callback

def count_tokens(chain, query):
    with get_openai_callback() as cb:
        result = chain.run(query)
        print(f'Spent a total of {cb.total_tokens} tokens')

    return result

In [None]:
count_tokens(
    conver_buf,
    "My interest here is to explore the potential of integrating Large Language Models with external knowledge"
)

The LLM can clearly remember the history of the conversation. Let’s take a look at how this conversation history is stored by the ConversationBufferMemory:

In [None]:
print(conver_buf.memory.buffer)
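
To make the stored history concrete, here is a minimal sketch of the idea behind a buffer memory: every exchange is appended verbatim and the whole transcript is placed in front of the next user input. `BufferMemorySketch`, `build_prompt`, and the placeholder reply are hypothetical and written for illustration; this is not LangChain's implementation.

```python
class BufferMemorySketch:
    """Hypothetical illustration: keep every (human, ai) exchange verbatim."""

    def __init__(self) -> None:
        self.turns: list[tuple[str, str]] = []

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    @property
    def buffer(self) -> str:
        # Render the full transcript, one speaker per line.
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


def build_prompt(memory: BufferMemorySketch, user_input: str) -> str:
    # The entire history is prepended to the new input on every call.
    return f"Current conversation:\n{memory.buffer}\nHuman: {user_input}\nAI:"


sketch_memory = BufferMemorySketch()
sketch_memory.save("Good morning AI!", "Good morning! (placeholder reply)")
prompt_text = build_prompt(sketch_memory, "What did I just say?")
print(prompt_text)
```

Because the transcript grows with every exchange, the token count of each request grows too, which is the cost the other memory types try to reduce.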

ConversationSummaryMemory

In [None]:
from langchain.chains.conversation.memory import ConversationSummaryMemory

conver_sum = ConversationChain(
    llm=llm,
    memory = ConversationSummaryMemory(llm=llm)
)

print(conver_sum.memory.prompt.template)

In [None]:
# without count_tokens we'd call `conver_sum("Good morning AI!")`
# but let's keep track of our tokens:
count_tokens(
    conver_sum,
    "Good morning AI!"
)

ConversationBufferWindowMemory
The ConversationBufferWindowMemory acts in the same way as our earlier “buffer memory” but adds a window to the memory, meaning that we only keep a given number of past interactions before “forgetting” them. We use it like so:

In [None]:
from langchain.chains.conversation.memory import ConversationBufferWindowMemory

conversation_bufw = ConversationChain(
    llm=llm,
    memory=ConversationBufferWindowMemory(k=1)  # keep only the last exchange
)
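
The windowing behaviour itself can be sketched with a bounded queue from the standard library. `WindowMemorySketch` and the reply strings are hypothetical placeholders; the sketch only illustrates the "keep the last k exchanges" idea and does not reproduce LangChain's class.

```python
from collections import deque


class WindowMemorySketch:
    """Hypothetical illustration: keep only the last k (human, ai) exchanges."""

    def __init__(self, k: int):
        # A deque with maxlen drops the oldest item when a new one is appended.
        self.turns = deque(maxlen=k)

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    @property
    def history(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


window = WindowMemorySketch(k=1)
window.save("Good morning AI!", "Good morning! (placeholder reply)")
window.save("What is a window memory?", "It keeps only recent exchanges.")
print(window.history)  # with k=1 only the second exchange is left
```

With `k=1` the first exchange is forgotten as soon as the second is saved, so the prompt stays short at the price of losing older context.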

In [None]:
count_tokens(
    conversation_bufw, 
    "Good morning AI!"
)

In [None]:
bufw_history = conversation_bufw.memory.load_memory_variables(
    inputs={}  # the method expects a dict of inputs
)['history']

print(bufw_history)

ConversationSummaryBufferMemory
The ConversationSummaryBufferMemory is a mix of the ConversationSummaryMemory and the ConversationBufferWindowMemory. It summarizes the earliest interactions in a conversation while keeping the most recent tokens, up to max_token_limit, verbatim. It is initialized like so:

In [None]:
from langchain.chains.conversation.memory import ConversationSummaryBufferMemory

conversation_sum_bufw = ConversationChain(
    llm=llm,
    memory=ConversationSummaryBufferMemory(
        llm=llm,
        max_token_limit=650
    )
)
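
The "summarize the old, keep the recent" behaviour can be sketched as follows. Everything here is a hypothetical illustration: `rough_token_count` counts whitespace-separated words instead of real tokens, and `stub_summarize` stands in for the LLM call that would write the summary.

```python
from typing import Callable


def rough_token_count(text: str) -> int:
    # Crude whitespace count; a real tokenizer would give different numbers.
    return len(text.split())


def stub_summarize(summary: str, pruned: list[str]) -> str:
    # Placeholder for an LLM call that would fold `pruned` into `summary`.
    note = f"[{len(pruned)} earlier exchange(s) summarized]"
    return f"{summary} {note}".strip()


class SummaryBufferSketch:
    def __init__(self, max_token_limit: int,
                 summarize: Callable[[str, list[str]], str]):
        self.max_token_limit = max_token_limit
        self.summarize = summarize
        self.summary = ""            # condensed older history
        self.recent: list[str] = []  # verbatim recent exchanges

    def save(self, human: str, ai: str) -> None:
        self.recent.append(f"Human: {human}\nAI: {ai}")
        pruned = []
        # Move the oldest exchanges out until the buffer fits the limit.
        while self.recent and sum(map(rough_token_count, self.recent)) > self.max_token_limit:
            pruned.append(self.recent.pop(0))
        if pruned:
            self.summary = self.summarize(self.summary, pruned)


sb = SummaryBufferSketch(max_token_limit=8, summarize=stub_summarize)
sb.save("hello there", "hi")
sb.save("how are you", "fine")
print(sb.summary)
print(sb.recent)
```

The limit therefore bounds the verbatim part of the prompt, while older context survives only in condensed form.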

Fixing Hallucination with Knowledge Bases

Creating the Knowledge Base
We have two primary types of knowledge for LLMs. The parametric knowledge refers to everything the LLM learned during training and acts as a frozen snapshot of the world for the LLM.

The second type of knowledge is source knowledge. This knowledge covers any information fed into the LLM via the input prompt. When we talk about retrieval augmentation, we’re talking about giving the LLM valuable source knowledge.
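
A toy version of retrieval augmentation looks like this: pick the passages most relevant to the query and place them in the prompt as context. The sketch uses simple word overlap for scoring, which is an assumption made to keep it dependency-free; the `kb_passages` list, `retrieve`, and `augment` are hypothetical names, and a real system would typically use embeddings and a vector index instead.

```python
import re

# Tiny hypothetical knowledge base built from sentences used earlier in this notebook.
kb_passages = [
    "Large Language Models (LLMs) are the latest models used in NLP.",
    "These models can be accessed via Hugging Face's transformers library, "
    "via OpenAI using the openai library, and via Cohere using the cohere library.",
    "Parametric knowledge is learned during training and stored in the model weights.",
]


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, k: int = 1) -> list[str]:
    # Score each passage by the number of words it shares with the query.
    query_words = tokenize(query)
    ranked = sorted(kb_passages,
                    key=lambda p: len(query_words & tokenize(p)),
                    reverse=True)
    return ranked[:k]


def augment(query: str, k: int = 1) -> str:
    # Retrieved passages become source knowledge in the prompt.
    context = "\n".join(retrieve(query, k))
    return f"Context: {context}\n\nQuestion: {query}\n\nAnswer: "


rag_prompt = augment("Which library gives access to OpenAI models?")
print(rag_prompt)
```

The LLM never needs to have memorized the answer; it only has to read it from the context it was handed.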

In [2]:
# Requires the `datasets` package: pip install datasets
from datasets import load_dataset

data = load_dataset("wikipedia", "20220301.simple", split='train[:10000]')
data

ModuleNotFoundError: No module named 'datasets'

In [4]:
class Animal:
    def make_sound(self):
        print("Generic animal sound")

class Dog(Animal):
    # With the override below commented out, Dog inherits Animal.make_sound
    # def make_sound(self):
    #     print("Woof!")
    pass

animals = [Dog(), Animal()]
for animal in animals:
    animal.make_sound()  # Output: Generic animal sound (for both Dog and Animal)


Generic animal sound
Generic animal sound


In [9]:
class Duck:
    def quack(self):
        return "Duck quacks"

class Person:
    def quack(self):
        return "Person imitates duck"
class Persons:
    def quack(self):
        return "Person imitates ducking"

# Polymorphic behavior using duck typing
def make_sound(obj):
    return obj.quack()

duck_obj = Duck()
person_obj = Person()

print(make_sound(duck_obj))    # Output: "Duck quacks"
print(make_sound(person_obj))  # Output: "Person imitates duck"

Duck quacks
Person imitates duck


Superpower LLMs with Conversational Agents

In [27]:
from langchain import HuggingFaceHub, LLMChain, PromptTemplate

hub_llm = HuggingFaceHub(
    # repo_id = "bigscience/bloom-1b7",
    repo_id='tiiuae/falcon-7b-instruct',
    # model_kwargs = {'temperature':1e-10},
    
)


# Define a simple prompt template
prompt_template = PromptTemplate(
    input_variables=["question"],
    template="Answer the question: {question}")

In [28]:
from langchain import LLMChain

# Create an LLMChain
llm_chain = LLMChain(prompt = prompt_template, llm = hub_llm)

Defining the agents

In [30]:
from langchain.chains import LLMMathChain
from langchain.tools import Tool

# build the math chain from the LLM so it uses its own math prompt
llm_math = LLMMathChain.from_llm(llm = hub_llm)

math_tool = Tool(
    name = "Calculator",
    func = llm_math.run,
    description = "Useful when you need to answer questions about math."
)

tools = [math_tool]

In [31]:
tools[0].name , tools[0].description

('Calculator', 'Useful when you need to answer questions about math.')

In [None]:
from langchain.agents import load_tools

tools = load_tools(
    ['llm-math'],
    llm = hub_llm
)

tools[0].name , tools[0].description

We now have the LLM and tools but no agent. To initialize a simple agent, we can do the following:

In [39]:
from langchain.agents import initialize_agent

zero_shot_agent  = initialize_agent(
    agent = "zero-shot-react-description",
    tools = tools,
    llm = hub_llm,
    verbose =  True,
    max_iterations = 15
)

The agent used here is a "zero-shot-react-description" agent. Zero-shot means the agent acts on the current input only; it has no memory of earlier interactions. It uses the ReAct framework to decide which tool to use, based solely on each tool’s description.
We won’t discuss the ReAct framework in depth in this chapter, but you can think of it as letting an LLM cycle through Reasoning and Action steps, which enables a multi-step process for identifying answers.
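
The Reasoning and Action cycle can be sketched without any LLM at all. In the sketch below, `scripted_llm` is a hypothetical stub that plays the model's role, `calculator` is a small arithmetic evaluator written for this example, and `run_agent` is an assumed, simplified loop; none of it is LangChain's code. It also shows why a model that only echoes its prompt never produces a parsable action and ends at the iteration limit.

```python
import ast
import operator
import re

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def calculator(expression: str) -> str:
    # Evaluate +, -, *, / on numbers only, without using eval().
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return str(ev(ast.parse(expression, mode="eval").body))


TOOLS = {"Calculator": calculator}


def scripted_llm(prompt: str) -> str:
    # Hypothetical stub: first ask for the tool, then report its observation.
    if "Observation:" not in prompt:
        return "Thought: I should use the calculator.\nAction: Calculator\nAction Input: 1 + 1"
    observation = prompt.rsplit("Observation:", 1)[1].strip().splitlines()[0]
    return f"Thought: I now know the answer.\nFinal Answer: {observation}"


def run_agent(question: str, llm, max_iterations: int = 3) -> str:
    scratchpad = f"Question: {question}\n"
    for _ in range(max_iterations):
        reply = llm(scratchpad)
        final = re.search(r"Final Answer:\s*(.*)", reply)
        if final:
            return final.group(1).strip()
        match = re.search(r"Action:\s*(.*?)\nAction Input:\s*(.*)", reply)
        if match and match.group(1).strip() in TOOLS:
            observation = TOOLS[match.group(1).strip()](match.group(2).strip())
        else:
            observation = "Invalid or missing action."
        # Feed the tool result back so the next step can reason over it.
        scratchpad += f"{reply}\nObservation: {observation}\n"
    return "Agent stopped due to iteration limit or time limit."


print(run_agent("what is 1 + 1", scripted_llm))
print(run_agent("what is 1 + 1", lambda prompt: prompt, max_iterations=2))
```

The second call passes a stand-in model that returns its own prompt, so no `Action:` or `Final Answer:` line ever appears and the loop runs out of iterations.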

In [41]:
zero_shot_agent("what is 1 + 1")



> Entering new AgentExecutor chain...
Answer the following questions as best you can. You have access to the following tools:

Calculator(*args: Any, callbacks: Union[list[langchain_core.callbacks.base.BaseCallbackHandler], langchain_core.callbacks.base.BaseCallbackManager, NoneType] = None, tags: Optional[List[str]] = None, metadata: Optional[Dict[str, Any]] = None, **kwargs: Any) -> Any - Useful when you need to answer questions about math.

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [Calculator]
Action Input: the input to the action
Observation: the action to take, should be one of [Calculator] is not a valid tool, try one of [Calculator].
Thought:Answer the following questions as best you can. You have access to the following tools:

Calculator(*args: Any, callbacks: Union[list[langchain_core.callbacks.base.BaseCallbackH

{'input': 'what is 1 + 1',
 'output': 'Agent stopped due to iteration limit or time limit.'}