# 5 Hello, Llama: langchain  

_LangChain is a framework for developing applications powered by large language models (LLMs)._ [https://github.com/langchain-ai/langchain]  
It helps developers build complex, multi-step workflows by connecting LLMs with external data sources, such as databases or APIs, and organizing those interactions into modular components.  
LangChain enables tasks like prompt generation, data retrieval, and decision-making through the use of chains (sequences of actions) and agents (autonomous decision-makers), making it a versatile tool for applications such as chatbots, question-answering systems, and other AI-driven workflows.


#### Install required libraries

In [3]:
# uncomment to install required libraries

# !pip install langchain_huggingface==0.1.0

#### Import required libraries

In [5]:
import transformers
from langchain_huggingface import HuggingFacePipeline
from langchain_core.prompts import PromptTemplate
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import os
import time
import json

#### Choose the model you want to use

The model can be downloaded from Hugging Face, for example from https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct. After creating a Hugging Face account and accepting Meta's license terms, you can clone the repository locally.  

_Note: you can also let the transformers library download the model automatically, without cloning the repository manually._
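
One way to act on this note, sketched under assumptions: `from_pretrained` accepts either a local directory or a Hub repository id, so a small helper can prefer a local clone and fall back to the repository id. The helper name `resolve_model_id` and its `hub_org` default are illustrative and not part of any library. Downloading the gated Llama weights this way still requires an authenticated Hugging Face account that has accepted the license.

```python
import os


def resolve_model_id(base_folder: str, model_name: str, hub_org: str = "meta-llama") -> str:
    """Return the local model directory if it exists, otherwise a Hub repo id.

    `hub_org` is an assumption for this sketch: it matches the repository
    linked above, but other models live under other organizations.
    """
    local_path = os.path.join(base_folder, model_name)
    if os.path.isdir(local_path):
        # A local clone is available: load from disk, no download needed.
        return local_path
    # No local clone: return "<org>/<model>", which from_pretrained resolves on the Hub.
    return f"{hub_org}/{model_name}"
```

The result can be passed wherever the notebook uses `model_id`, for example `AutoTokenizer.from_pretrained(resolve_model_id(base_folder, model_name))`.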

In [6]:
# change this folder to the path where you stored the model you want to use
base_folder = "FILL_WITH_BASE_FOLDER" # Example: "C:/Users/username/Documents/HuggingFace"

model_name = "Llama-3.2-3B-Instruct"

# set the model id
model_id = os.path.join(base_folder, model_name)

#### Build the Hugging Face pipeline to be used in LangChain

In [7]:
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id
)

pipe = transformers.pipeline(
    "text-generation", 
    model=model, 
    tokenizer=tokenizer, 
    max_new_tokens=128, 
    top_k=50, 
    temperature=0.1
)


hf_pipeline = HuggingFacePipeline(
    pipeline=pipe
)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

#### Define the prompt template

In LangChain, a prompt template is a predefined structure used to format input prompts for large language models (LLMs).   
It combines static text and dynamic variables, allowing users to customize prompts by filling in specific values.

In [8]:
template = """
System: You are an expert on world capitals. Respond only to the question. And respond only if the question is related to countries capitals; otherwise, respond with "I don't know the answer."

Query: {query}
Response:
"""
prompt_template = PromptTemplate.from_template(template)
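
To inspect the text the model will actually receive, it can help to render a template by hand. The sketch below uses only the standard library as an analogy for what a prompt template does with simple `{variable}` placeholders. It is not LangChain's implementation, and the helper names `find_variables` and `render` are illustrative. With LangChain itself, `prompt_template.format(query=...)` returns the filled-in string.

```python
from string import Formatter

# Shortened version of the notebook's template, for illustration only.
template = """
System: You are an expert on world capitals.

Query: {query}
Response:
"""


def find_variables(template: str) -> list[str]:
    """List the placeholder names found in a format-style template."""
    return sorted({name for _, name, _, _ in Formatter().parse(template) if name})


def render(template: str, **values: str) -> str:
    """Fill every placeholder; raises KeyError if a value is missing."""
    return template.format(**values)


print(find_variables(template))
print(render(template, query="What is the capital of France?"))
```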

#### Create the chain using Python's pipe operator

In Python, the meaning of the pipe operator (|) depends on its operands. Two uses matter here:
- __Bitwise OR operator__: between two integers, the pipe operator performs a bitwise OR operation.
- __Chaining operator__: libraries can overload the pipe operator for their own types. LangChain does this so that `a | b` builds a chain in which the output of `a` becomes the input of `b`, which reads like a functional pipeline.
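
The chaining behaviour comes from ordinary operator overloading: a class that defines `__or__` decides what `|` means for its instances. The following minimal sketch shows the idea with a hypothetical `Step` class. It is a simplified stand-in, not LangChain's own code.

```python
from typing import Any, Callable


class Step:
    """Illustrative stand-in for a chainable component (not a LangChain class)."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new Step that runs a first, then feeds its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value: Any) -> Any:
        return self.func(value)


fill_prompt = Step(lambda d: f"Query: {d['query']}")
shout = Step(str.upper)

pipeline = fill_prompt | shout
print(pipeline.invoke({"query": "capital of France?"}))  # QUERY: CAPITAL OF FRANCE?
```

For plain integers the same operator keeps its built-in meaning, so `5 | 3` still evaluates to `7`.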


In [9]:
# chaining prompt_template and pipeline
chain = prompt_template | hf_pipeline.bind(skip_prompt=True)

#### Define input dict and execute chain steps

Define `input_dict` to supply a value for the `query` variable of the prompt template defined above.

In [10]:
input_dict = {"query": "What is the capital of France?"}

result = chain.invoke(input_dict)

print(result)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)


The capital of France is Paris.


### Check if the LLM and prompt work as expected

##### Bad Question

In [11]:
input_dict = {"query": "I know Ottawa is a capital city. But what is birmingham?"}

result = chain.invoke(input_dict)

print(result)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


I don't know the answer.


In [12]:
input_dict = {"query": "What is an intel core i5?"}

result = chain.invoke(input_dict)

print(result)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


I don't know the answer.


##### Different wording of the question

In [13]:
input_dict = {"query": "Is Ottawa a capital city?"}

result = chain.invoke(input_dict)

print(result)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


Yes, Ottawa is the capital city of Canada. It has been the country's capital since 1857.


In [14]:
input_dict = {"query": "If Ottawa is the capital of Canada what is Rome?"}

result = chain.invoke(input_dict)

print(result)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


Rome is the capital of Italy.


In [15]:
# bad question but the model could correct the speaker
input_dict = {"query": "What is the capital of Ottawa?"}

result = chain.invoke(input_dict)

print(result)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


Ottawa is the capital of Canada.
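
The manual checks above can be collected into a small table-driven helper. This is a sketch under assumptions: `run_checks` and `fake_invoke` are hypothetical names, and `fake_invoke` is a canned stand-in so the example runs without loading a model. Substring matching is a crude pass criterion and is used here only to keep the example short.

```python
from typing import Callable

REFUSAL = "I don't know the answer."


def run_checks(
    invoke: Callable[[dict], str], cases: list[tuple[str, str]]
) -> list[tuple[str, bool]]:
    """Run each query and report whether the expected text appears in the reply."""
    results = []
    for query, expected in cases:
        reply = invoke({"query": query})
        results.append((query, expected.lower() in reply.lower()))
    return results


def fake_invoke(inputs: dict) -> str:
    """Canned stand-in for the real chain (an assumption of this sketch)."""
    canned = {"What is the capital of France?": "The capital of France is Paris."}
    return canned.get(inputs["query"], REFUSAL)


cases = [
    ("What is the capital of France?", "Paris"),
    ("What is an intel core i5?", REFUSAL),
]
for query, passed in run_checks(fake_invoke, cases):
    print("PASS" if passed else "FAIL", "-", query)
```

With the notebook's chain, `chain.invoke` can be passed in place of `fake_invoke`, since it also takes a dict and returns a string.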


## Unknown answers (the key concept behind RAG)

Large Language Models (LLMs) are trained on vast amounts of data, but their knowledge is limited to the information available up to their training cutoff. So, what happens if a new country is created after this period?

In such cases, the model cannot know about the new development. To address this, we can apply the key concept of **Retrieval-Augmented Generation (RAG)**: retrieve additional, relevant information (known as "context") and add it to the prompt, so that the model can generate a more accurate response.

For example, if a new state is created and the model has no prior knowledge of it, we can supply context like:
**"Edoras is the capital of Rohan."**

By incorporating this context, the model can answer questions about concepts it would not otherwise know.

In [16]:
# example of an unknown capital
input_dict = {"query": "What is the capital of Rohan?"}

result = chain.invoke(input_dict)

print(result)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


I don't know the answer.


#### Change the prompt to accept a context and update the chain

In [17]:
template = """
System: You are an expert on world capitals. Respond only to the question. And respond only if the question is related to countries capitals; otherwise, respond with "I don't know the answer.". In addition to your knowledge consider the "Extra information" below. 

Extra information: {context}\n\n

Query: {query}
Response:
"""

prompt_template = PromptTemplate.from_template(template)

# chaining prompt_template and pipeline
chain = prompt_template | hf_pipeline.bind(skip_prompt=True)

#### Execute the same query that failed before adding context

Context that will be added: The capital of Rohan is the fortified town of Edoras, on a hill in a valley of the White Mountains. "Edoras" is Old English for "enclosures".

In [21]:
# keep the context as a plain string: it is inserted as-is into the {context} slot of the prompt
context = "The capital of Rohan is the fortified town of Edoras, on a hill in a valley of the White Mountains. \"Edoras\" is Old English for \"enclosures\"."

In [19]:
# example of an unknown capital, now with context supplied
input_dict = {"query": "What is the capital of Rohan?", "context": context}

result = chain.invoke(input_dict)
print(result)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


Edoras


In [20]:
# reverse question about the same unknown capital, using the same context
input_dict = {"query": "What is Edoras?", "context": context}

result = chain.invoke(input_dict)
print(result)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


Edoras is the capital of Rohan.


### Final considerations

This is the basic idea behind building a Retrieval-Augmented Generation (RAG) system. However, the key challenge lies in how to efficiently find and retrieve the right context to include in the prompt.
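
To make the retrieval step concrete, here is a deliberately naive sketch that picks the context by counting words shared between the query and each candidate document. The `DOCUMENTS` list, `tokenize`, and `retrieve` are illustrative names invented for this example. Word overlap is only a toy scoring method: real systems typically rank documents by embedding similarity over a vector store.

```python
import re

# Tiny in-memory "knowledge base" built from facts used earlier in this notebook.
DOCUMENTS = [
    "The capital of Rohan is the fortified town of Edoras.",
    "Ottawa is the capital of Canada.",
    "Rome is the capital of Italy.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k documents sharing the most words with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_tokens & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


query = "What is the capital of Rohan?"
context = "\n".join(retrieve(query, DOCUMENTS))
print(context)
```

The resulting `context` string could then be passed to the chain together with the query, as in `chain.invoke({"query": query, "context": context})`.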