This notebook outlines the implementation of a Retrieval-Augmented Generation (RAG) pipeline that combines a fine-tuned LLaMa 2 model with travel-guideline PDF documents ingested into a Pinecone vector database. The goal is an intelligent travel assistant that answers user queries about travel policies and recommendations. The pipeline and its components are described below:

# Pipeline Workflow
1. User Query: The system receives a natural language question from the user regarding travel guidelines.

2. Question Classification: The query is analyzed to determine its category, specifically identifying the country or region the question pertains to.

3. Namespace Mapping: The classified category is mapped to the corresponding namespace in the Pinecone database. Each namespace holds a region-specific collection of travel guidelines, for example the USA, Asia, or Africa.

4. Context Retrieval: Using the mapped namespace, the system retrieves the most relevant documents or document chunks from the Pinecone vector database.

5. RAG Chain Processing: The retrieved context is combined with the user's query. The fine-tuned LLaMa 2 model is then used to process the input and reason over the context to formulate an accurate and contextually relevant answer.

6. Answer Generation: The system outputs a comprehensive and concise response to the user, addressing their query based on the retrieved context.
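The six steps above can be sketched as plain Python control flow. The classifier, retriever, and generator here are hypothetical stubs passed in as functions (not the notebook's Groq, Pinecone, or Llama 2 objects), and `NAMESPACES` is a hypothetical two-entry subset of the label-to-namespace mapping defined later.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical two-entry subset of the label -> namespace mapping
NAMESPACES: Dict[str, str] = {"usa_guides": "travel-guide-us", "asia_guides": "travel-guide-asia"}


def run_pipeline(
    question: str,
    classify: Callable[[str], str],
    retrieve: Callable[[str, str], List[str]],
    generate: Callable[[str, str], str],
) -> Tuple[str, str]:
    label = classify(question)                            # 2. question classification
    namespace = NAMESPACES.get(label, "travel-guide-us")  # 3. namespace mapping (default: USA)
    context = "\n\n".join(retrieve(namespace, question))  # 4. context retrieval
    answer = generate(question, context)                  # 5-6. RAG processing and answer
    return namespace, answer


namespace, answer = run_pipeline(
    "When should I visit Japan?",                         # 1. user query
    classify=lambda q: "asia_guides" if "Japan" in q else "usa_guides",
    retrieve=lambda ns, q: [f"[{ns}] guideline chunk"],
    generate=lambda q, ctx: f"Based on {ctx}: visit in spring.",
)
print(namespace, "|", answer)
```

Passing each stage in as a function keeps the control flow testable without any API keys; the real notebook wires the same stages together in `process_question`.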



## Installing dependencies

In [None]:
!pip install -U transformers
!pip install -U sentence-transformers
!pip install -U bitsandbytes accelerate
!pip install -U pydantic
!pip install pypdf
!pip install langchain-groq
!pip install -U peft
!pip install -U pinecone langchain-pinecone
!pip install -U langchain langchain-community langchain-cohere faiss-cpu sentence-transformers chromadb


## Import RAG components required to build pipeline

In [None]:
# Libraries for orchestrating the RAG pipeline
import os
import time

from langchain.llms import HuggingFaceHub, HuggingFacePipeline
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceInferenceAPIEmbeddings
from langchain.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser
from langchain_community.document_loaders import PyPDFLoader
from langchain_pinecone import PineconeVectorStore
from pinecone import Pinecone, ServerlessSpec

# Libraries for loading the LLM (quantization config, LoRA adapters)
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel



## Setting API keys

In [None]:
import os
from getpass import getpass

# Read secrets from the environment, or prompt for them, instead of hard-coding them
HF_TOKEN = os.environ.get("HUGGINGFACEHUB_API_TOKEN") or getpass("Hugging Face token: ")
os.environ["HUGGINGFACEHUB_API_TOKEN"] = HF_TOKEN
pinecone_api_key = os.environ.get("PINECONE_API_KEY") or getpass("Pinecone API key: ")
os.environ["PINECONE_API_KEY"] = pinecone_api_key

import nest_asyncio
nest_asyncio.apply()

## Embedding model

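Here the embedding model, `BAAI/bge-base-en-v1.5`, is instantiated through LangChain's `HuggingFaceInferenceAPIEmbeddings`, which computes embeddings via the Hugging Face Inference API rather than locally.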

In [None]:
embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key=HF_TOKEN, model_name="BAAI/bge-base-en-v1.5"
)
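To illustrate what the embedding model is for: every document chunk and every query becomes a vector, and the vector store ranks chunks by their similarity to the query vector. The sketch below uses made-up 3-dimensional vectors (real `bge-base-en-v1.5` embeddings are 768-dimensional) and cosine similarity; which metric the `travel-assistant` Pinecone index uses is not shown in this notebook, so cosine is an assumption. The chunk IDs are hypothetical.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    # Dot product divided by the product of the vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec: List[float], chunk_vecs: Dict[str, List[float]]) -> List[str]:
    # Most similar chunk first
    return sorted(chunk_vecs, key=lambda cid: cosine_similarity(query_vec, chunk_vecs[cid]), reverse=True)


# Toy vectors, made up for illustration only
chunks = {
    "visa-rules": [0.9, 0.1, 0.0],
    "cherry-blossoms": [0.1, 0.9, 0.2],
    "packing-list": [0.3, 0.3, 0.9],
}
query = [0.2, 0.8, 0.1]
ranking = rank_chunks(query, chunks)
print(ranking)  # ['cherry-blossoms', 'packing-list', 'visa-rules']
```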

## Loading classification model

The classification model is Mixtral 8x7B, served through the Groq API. It classifies each user query by country or region, and the result selects which namespace in the Pinecone database the RAG pipeline's retriever reads from.

In [None]:
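import os
from getpass import getpass

from langchain_groq import ChatGroq

# Prompt for the Groq key unless it is already set, rather than hard-coding it
if "GROQ_API_KEY" not in os.environ:
    os.environ["GROQ_API_KEY"] = getpass("Groq API key: ")
classification_llm = ChatGroq(model_name="mixtral-8x7b-32768")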


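## Loading the LLM fine-tuned with QLoRA

This is the fine-tuned LLM that answers the user's questions. The base model `NousResearch/llama-2-7b-chat-hf` is loaded in FP16 and the LoRA adapter `Maaz66/llama-2-7b-miniguanaco` is merged into it with `merge_and_unload()`. Note that this cell does not quantize the model at inference time, so the GPU must hold the unquantized FP16 weights.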



In [None]:
import warnings
warnings.filterwarnings("ignore")

model_name = "NousResearch/llama-2-7b-chat-hf"
device_map = {"": 0}

# Reload model in FP16 and merge it with LoRA weights
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)

new_model = "Maaz66/llama-2-7b-miniguanaco"
model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()

# Load the base model's tokenizer; Llama 2 has no pad token, so reuse EOS for padding
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_new_tokens=2048)
llm = HuggingFacePipeline(pipeline=pipe)
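As a rough back-of-the-envelope check on memory, the sketch below estimates weight storage only (ignoring activations, the KV cache, and framework overhead) as parameters × bits per weight / 8. Using a nominal 7 billion parameters, the FP16 merge above needs about 14 GB for weights, while a 4-bit load (which `BitsAndBytesConfig` would allow) needs about 3.5 GB, slightly more in practice because quantization constants are also stored. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1e9


NOMINAL_PARAMS = 7e9  # nominal (rounded) size of Llama 2 7B

fp16_gb = weight_memory_gb(NOMINAL_PARAMS, 16)
int4_gb = weight_memory_gb(NOMINAL_PARAMS, 4)
print(f"FP16: {fp16_gb:.1f} GB, 4-bit: {int4_gb:.1f} GB")  # FP16: 14.0 GB, 4-bit: 3.5 GB
```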


## Templates definition

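Here we define two templates: one for classifying user queries with the classification LLM, and one for generating an answer from the context retrieved from the vector database. Both templates use Llama 3-style header tokens (`<|begin_of_text|>`, `<|start_header_id|>`, `<|end_header_id|>`, `<|eot_id|>`). Neither Mixtral nor Llama 2 treats these as special tokens, so here they act as plain-text delimiters, which the output-parsing helper later relies on.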

In [None]:
# Define classify question prompt template
classify_question = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are an assistant that answers in JSON format.
You will be given a question and your task is to classify it into one of the following categories:
- usa_guides: If the query is related to travelling guidelines in the USA.
- canada_guides: If the query is related to travelling guidelines in Canada.
- australia_guides: If the query is related to travelling guidelines in Australia.
- africa_guides: If the query is related to travelling guidelines in Africa.
- europe_guides: If the query is related to travelling guidelines in Europe.
- north_america_guides: If the query is related to travelling guidelines in North America (excluding USA and Canada).
- south_america_guides: If the query is related to travelling guidelines in South America.
- asia_guides: If the query is related to travelling guidelines in Asia.

Default value is usa_guides.
Your output should only contain one of the above values as output, and nothing else.
<|eot_id|><|start_header_id|>user<|end_header_id|>
    Question: {question}
    Answer: <|eot_id|><|start_header_id|>assistant<|end_header_id|>"""


# Define answer prompt template
answer_template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a travel assistant for question-answering tasks.
    Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Answer queries related to travel guidelines only.
    Use three to four sentences maximum and keep the answer concise <|eot_id|><|start_header_id|>user<|end_header_id|>
    Question: {question}
    Context: {context}
    Answer: <|eot_id|><|start_header_id|>assistant<|end_header_id|>"""
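`ChatPromptTemplate.from_template` uses f-string formatting by default, so the `{question}` and `{context}` placeholders are filled much like `str.format` (and any literal brace in a template would need doubling as `{{ }}`). The sketch below shows that substitution with a trimmed-down template and joins chunk texts into one context string; `Doc` is a hypothetical minimal stand-in for LangChain's `Document`, and `build_prompt` is an illustrative name.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (page_content + metadata)."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


# Trimmed-down version of answer_template: same placeholders, no system text
TEMPLATE = "Question: {question}\nContext: {context}\nAnswer:"


def build_prompt(question: str, docs: List[Doc]) -> str:
    # Join chunk texts so the prompt contains readable context, not object reprs
    context = "\n\n".join(d.page_content for d in docs)
    return TEMPLATE.format(question=question, context=context)


prompt_text = build_prompt(
    "When is cherry blossom season in Japan?",
    [Doc("Cherry blossoms peak in spring."), Doc("Autumn brings colorful foliage.")],
)
print(prompt_text)
```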

## Classify user question using the classification LLM

In [None]:
# Function to classify the question
def classify_question_type(question: str, classification_llm) -> str:
    """
    Classifies the input question into one of the predefined categories
    (usa_guides, canada_guides, australia_guides, africa_guides, europe_guides,
    north_america_guides, south_america_guides, asia_guides).

    Args:
        question (str): The input question to classify.
        classification_llm: The LLM model for classification.

    Returns:
        str: The raw classification output, expected to be one of the categories above.
    """
    prompt2 = ChatPromptTemplate.from_template(classify_question)
    classify_chain = prompt2 | classification_llm | StrOutputParser()
    return classify_chain.invoke(question)


llm_output = classify_question_type("What local activities or tours would you recommend in Japan for someone interested in natural landscapes?", classification_llm)
llm_output

'asia_guides'
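The classification prompt asks for JSON yet also for a bare label, and the output above is a bare label. A stricter alternative to substring matching, sketched below under the assumption that Mixtral may return either form, accepts a bare label, a JSON string, or a one-field JSON object, and falls back to `usa_guides`, the prompt's stated default. The function name `normalize_label` and the example key `category` are hypothetical.

```python
import json

ALLOWED_LABELS = {
    "usa_guides", "canada_guides", "australia_guides", "africa_guides",
    "europe_guides", "north_america_guides", "south_america_guides", "asia_guides",
}


def normalize_label(raw: str, default: str = "usa_guides") -> str:
    text = raw.strip()
    try:
        parsed = json.loads(text)  # handles '"asia_guides"' or '{"category": "asia_guides"}'
    except json.JSONDecodeError:
        parsed = text              # bare label such as asia_guides
    if isinstance(parsed, dict):
        # Take the first value of a JSON object, whatever its key is
        parsed = next(iter(parsed.values()), "")
    label = str(parsed).strip().strip('"').lower()
    return label if label in ALLOWED_LABELS else default


print(normalize_label("asia_guides"))                    # asia_guides
print(normalize_label('{"category": "canada_guides"}'))  # canada_guides
print(normalize_label("Probably Japan?"))                # usa_guides
```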

## Select the relevant namespace from the classification output and set up the retriever

Here we define two functions. The first takes the classification LLM's output and returns the matching Pinecone namespace.

The second connects to that namespace in the Pinecone index and returns a retriever over it.

Together, these two functions are used later when constructing the pipeline.

In [None]:
def get_document_namespace(llm_output: str, default_namespace="usa_guides") -> str:
    """
    Determines the namespace based on the LLM classification output.
    Checks for specific namespaces based on the classification result.
    Defaults to 'usa_guides' if no match is found.

    Args:
        llm_output (str): The classification output.
        default_namespace (str): The default namespace if no match is found.

    Returns:
        str: The namespace corresponding to the classification.
    """
    llm_output = llm_output.lower().strip()  # Normalize to lowercase and strip whitespace

    # Map LLM outputs to namespaces
    namespace_map = {
        "usa_guides": "travel-guide-us",
        "canada_guides": "travel-guide-ca",
        "australia_guides": "travel-guide-australia",
        "africa_guides": "travel-guide-africa",
        "europe_guides": "travel-guide-europe",
        "north_america_guides": "travel-guide-north-america",
        "asia_guides": "travel-guide-asia",
        "south_america_guides": "travel-guide-south-america"
    }

    # Match namespace or return the default
    for key, namespace in namespace_map.items():
        if key in llm_output:
            return namespace

    return namespace_map.get(default_namespace, "travel-guide-us")


from langchain_core.vectorstores import VectorStoreRetriever


def create_retriever(namespace: str, index_name="travel-assistant", embeddings=None) -> VectorStoreRetriever:
    """
    Creates and returns an MMR retriever over the given Pinecone namespace.

    Args:
        namespace (str): The namespace to search in.
        index_name (str): The name of the Pinecone index.
        embeddings: The embeddings used to encode queries.

    Returns:
        VectorStoreRetriever: A retriever returning 6 chunks selected by maximal marginal relevance.
    """

    # Connect to the existing index and namespace, then wrap it as a retriever
    try:
        docsearch = PineconeVectorStore.from_existing_index(
            index_name=index_name,
            embedding=embeddings,
            namespace=namespace
        )
        return docsearch.as_retriever(search_type="mmr", search_kwargs={"k": 6})
    except Exception as e:
        raise ValueError(f"Failed to create retriever for namespace '{namespace}' with index '{index_name}': {e}") from e
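`search_type="mmr"` uses maximal marginal relevance: from a candidate set, it greedily picks chunks that are relevant to the query but not redundant with chunks already picked, which helps when guideline PDFs repeat near-identical passages. The sketch below reimplements that scoring rule in plain Python with toy 2-dimensional vectors; `mmr` and `cosine` are illustrative names, not LangChain's code. `lambda_mult` plays the same role as LangChain's parameter of that name (default 0.5); the size of the candidate pool (`fetch_k`, default 20 in LangChain's vector stores) is not modelled here.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query: Sequence[float], candidates: List[Sequence[float]], k: int = 6, lambda_mult: float = 0.5) -> List[int]:
    """Greedily pick up to k candidate indices, trading relevance against redundancy."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Similarity to the closest already-selected chunk (0 before the first pick)
            redundancy = max((cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 1.0]
chunks = [[1.0, 0.9], [1.0, 0.85], [0.8, 1.0]]  # chunk 1 is a near-duplicate of chunk 0
print(mmr(query, chunks, k=2))                   # [0, 2]: the near-duplicate is skipped
print(mmr(query, chunks, k=2, lambda_mult=1.0))  # [0, 1]: pure relevance ranking
```

With `lambda_mult=1.0` the redundancy term vanishes and MMR reduces to plain similarity ranking, which is why the near-duplicate chunk 1 comes back in the second call.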


## RAG chain

Here we define the RAG chain that answers the question. It takes the prompt template, the retriever, and the fine-tuned LLM, and generates an answer to the user's question.

In [None]:
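def format_docs(docs):
    """Joins the page content of retrieved Documents into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)


def create_rag_chain(retriever, llm, template):
    """
    Defines the Retrieval-Augmented Generation (RAG) chain.

    Args:
        retriever: The retriever object for context retrieval.
        llm: The language model to use for generating answers.
        template (str): The prompt template for the RAG chain.

    Returns:
        RAG chain object.
    """
    prompt = ChatPromptTemplate.from_template(template)
    rag_chain = (
        # The question goes to the retriever and is also passed through unchanged;
        # retrieved Documents are flattened to plain text before filling the prompt
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )
    return rag_chain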
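The LCEL chain above reads left to right: the dict step sends the same question to both the retriever and the passthrough, the results fill the prompt, and the LLM output is parsed to a string. The sketch below expresses that data flow with plain functions; `make_chain` and the stubs are hypothetical. The stub LLM echoes its prompt before the answer, mimicking a Hugging Face text-generation pipeline that returns the full text, which is why the answer later has to be extracted from the raw output.

```python
from typing import Callable, Dict, List


def make_chain(
    retrieve: Callable[[str], List[str]],
    prompt: Callable[[Dict[str, str]], str],
    llm: Callable[[str], str],
    parse: Callable[[str], str],
) -> Callable[[str], str]:
    def chain(question: str) -> str:
        # Parallel step: the same input feeds the retriever and the passthrough
        inputs = {"context": "\n\n".join(retrieve(question)), "question": question}
        return parse(llm(prompt(inputs)))
    return chain


chain = make_chain(
    retrieve=lambda q: ["Spring is mild.", "Autumn has foliage."],
    prompt=lambda d: f"Q: {d['question']}\nC: {d['context']}\nA:",
    llm=lambda p: p + " Visit in spring or autumn.",  # echoes the prompt, like return_full_text
    parse=str.strip,
)
result = chain("When to visit Japan?")
print(result)
```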


## Helper function to format LLM output

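A simple helper function to extract the assistant's answer from the raw LLM output. Because the Hugging Face text-generation pipeline returns the prompt together with the generated text, the helper splits the response on the `<|...|>` markers and returns the text after the last occurrence of the requested marker (here `end_header_id`, which precedes the assistant's answer).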

In [None]:
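def extract_key_from_rag_chain(response: str, key_to_access: str) -> str:
    """Returns the text between the last `<|key_to_access|>` marker and the next `<|` in `response`."""
    # Step 1: Split the raw output on "<|" and map each marker name to the text after it
    data = {}
    parts = response.split("<|")
    for part in parts:
        if "|>" in part:
            key, rest = part.split("|>", 1)
            key = key.strip()
            rest = rest.strip()
            # Repeated markers overwrite earlier entries, so the last occurrence wins
            data[key] = rest

    # Step 2: Access the specific key
    if key_to_access in data:
        return data[key_to_access]
    else:
        return f"Key '{key_to_access}' does not exist in the parsed data."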
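A more targeted alternative, sketched here as a hypothetical helper `extract_assistant_answer`, splits once on the full assistant header from `answer_template` with `str.rpartition` and cuts the result at the first `<|eot_id|>`, if the model emitted one. It returns the whole response, stripped, when the header is missing.

```python
ASSISTANT_MARKER = "<|start_header_id|>assistant<|end_header_id|>"


def extract_assistant_answer(response: str) -> str:
    """Return the text after the last assistant header, cut at <|eot_id|> if present."""
    _, sep, tail = response.rpartition(ASSISTANT_MARKER)
    if not sep:
        # Marker not found: fall back to the whole response
        return response.strip()
    return tail.split("<|eot_id|>", 1)[0].strip()


sample = (
    "Human: <|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a travel assistant."
    "<|eot_id|><|start_header_id|>user<|end_header_id|>\n    Question: When to visit Japan?\n"
    "    Answer: <|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nVisit in spring.<|eot_id|>"
)
print(extract_assistant_answer(sample))  # Visit in spring.
```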


## Main function

This is the main function, which ties together the functions defined above. It classifies the user's question, maps the classification to a namespace, builds a retriever for that namespace, and passes the retriever and the question to the RAG chain to generate an answer. It returns a tuple: a message naming the selected namespace, and the raw answer generated by the fine-tuned LLM (or an error message and `None` if a step fails). The example cells below then parse the raw answer with the helper function defined above.

In [None]:
def process_question(question: str, classification_llm, embeddings, llm, template, index_name="travel-assistant"):
    """
    Processes the input question: classifies it, selects the appropriate namespace,
    retrieves relevant documents, and generates an answer.

    Args:
        question (str): The input question.
        classification_llm: The LLM model for classification.
        embeddings: The embeddings for the retriever.
        llm: The language model for answer generation.
        template: The prompt template for the RAG chain.
        index_name (str): The Pinecone index name.

    Returns:
        tuple: A helpful message about the namespace and the generated answer.
    """

    # Step 1: Classify the question
    llm_output = classify_question_type(question, classification_llm)

    # Step 2: Get the appropriate namespace based on classification
    namespace = get_document_namespace(llm_output)

    # Step 3: Map namespaces to descriptions
    namespace_descriptions = {
        "travel-guide-us": "Information related to traveling guidelines and policies in the USA",
        "travel-guide-ca": "Information related to traveling guidelines and policies in Canada",
        "travel-guide-australia": "Information related to traveling guidelines and policies in Australia",
        "travel-guide-africa": "Information related to traveling guidelines and policies in Africa",
        "travel-guide-europe": "Information related to traveling guidelines and policies in Europe",
        "travel-guide-north-america": "Information related to traveling guidelines and policies in North America",
        "travel-guide-asia": "Information related to traveling guidelines and policies in Asia",
        "travel-guide-south-america": "Information related to traveling guidelines and policies in South America",
    }
    namespace_description = namespace_descriptions.get(namespace, "Unknown namespace")

    # Step 4: Create the retriever
    try:
        retriever = create_retriever(namespace, index_name, embeddings)
    except ValueError as e:
        return f"Error while creating retriever: {e}", None

    # Step 5: Create the RAG chain with the selected retriever
    try:
        rag_chain = create_rag_chain(retriever, llm, template)
    except Exception as e:
        return f"Error while creating RAG chain: {e}", None

    # Step 6: Invoke the RAG chain to get the answer
    try:
        answer = rag_chain.invoke(question)
    except Exception as e:
        return f"Error while generating an answer: {e}", None

    # Include helpful message about the namespace
    message = f"Namespace selected: {namespace} ({namespace_description})"
    return message, answer
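The example cells below pass the second element of the returned tuple straight to `extract_key_from_rag_chain`; when `process_question` returns `(error message, None)`, that call would raise `AttributeError` because `None` has no `split` method. The hypothetical helper `show_result` sketches one way to guard against this; in the notebook, `extract` could be `lambda raw: extract_key_from_rag_chain(raw, "end_header_id")`.

```python
from typing import Callable, Optional, Tuple


def show_result(result: Tuple[str, Optional[str]], extract: Callable[[str], str]) -> str:
    """Format the (message, answer) tuple, skipping extraction when a step failed."""
    message, raw_answer = result
    if raw_answer is None:
        # On failure, the first element already holds the error message
        return message
    return f"{message}\n\n--------- Assistant answer --------\n\n{extract(raw_answer)}"


# Stubbed results standing in for process_question(...) outputs
failed = show_result(("Error while creating retriever: index not found", None), str.strip)
ok = show_result(("Namespace selected: travel-guide-asia", "  Visit in spring.  "), str.strip)
print(failed)
print(ok)
```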


## Ask away!!!

In [None]:
# Example usage: Process a question, classify it, select retriever, and get the answer
question = "What are some travelling guidelines in the USA?"

namespace_message, answer = process_question(question, classification_llm, embeddings, llm, answer_template)
answer  = extract_key_from_rag_chain(answer, "end_header_id")

print(namespace_message)
print("\n--------- Assistant answer --------\n")
print(answer)

Namespace selected: travel-guide-us (Information related to traveling guidelines and policies in the USA)

--------- Assistant answer --------

The Centers for Disease Control and Prevention (CDC) provides guidelines for traveling to the United States during the COVID-19 pandemic. Before traveling, it is recommended to get up to date with COVID-19 vaccines and follow mask-wearing recommendations in public transportation settings. It is also important to follow the requirements of transportation operators, such as airlines, cruise lines, and buses, and to be aware of any testing or proof of vaccination requirements at your destination. Additionally, you should be prepared to provide contact information to airlines before boarding, as this helps to rapidly identify and contact people in the US who may have been exposed to a communicable disease. Overall, it is important to be aware of and follow all requirements and recommendations provided by the CDC and other authorities to ensure a sa

In [None]:
# Example usage: Process a question, classify it, select retriever, and get the answer
question = "What are the best times of the years to visit Japan?"

namespace_message, answer = process_question(question, classification_llm, embeddings, llm, answer_template)
answer  = extract_key_from_rag_chain(answer, "end_header_id")

print(namespace_message)
print("\n--------- Assistant answer --------\n")
print(answer)

Namespace selected: travel-guide-asia (Information related to traveling guidelines and policies in Asia)

--------- Assistant answer --------

The best times to visit Japan are from March to May and from September to November. These periods offer pleasant weather conditions, with mild temperatures and fewer crowds compared to the peak tourist season from June to August. During these times, you can enjoy outdoor activities such as hiking, visiting temples and shrines, and experiencing the country's vibrant culture without the heat and humidity of the summer months. Additionally, the spring season (March to May) is known for its beautiful cherry blossom blooms, while the autumn season (September to November) is characterized by vibrant foliage and festivals.


In [None]:
# Example usage: Process a question, classify it, select retriever, and get the answer
question = "Can you recommend accomodations in Paris that suit a mid-range budget and are near the Eiffel Tower?"

namespace_message, answer = process_question(question, classification_llm, embeddings, llm, answer_template)
answer  = extract_key_from_rag_chain(answer, "end_header_id")

print(namespace_message)
print("\n--------- Assistant answer --------\n")
print(answer)

You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


Namespace selected: travel-guide-europe (Information related to traveling guidelines and policies in Europe)

--------- Assistant answer --------

Of course! There are plenty of accommodations near the Eiffel Tower that fit a mid-range budget. Here are a few options to consider:

* Hotel Eiffel Seine: This 3-star hotel is located just a short walk from the Eiffel Tower and offers comfortable rooms at an affordable price.
* Hotel du Louvre: This 3-star hotel is situated in the heart of Paris, within walking distance of the Eiffel Tower and other popular attractions.
* Hostel Moyen Age: This budget-friendly hostel is located in the Latin Quarter, just a short metro ride from the Eiffel Tower. It offers dormitory-style accommodations as well as private rooms.

All of these options are within a reasonable budget and offer convenient access to the Eiffel Tower. I hope this helps! Let me know if you have any other questions.


In [None]:
# Example usage: Process a question, classify it, select retriever, and get the answer
question = "Are there any visa requirements or travel restrictions for US citizens traveling to VIetnam?"

namespace_message, answer = process_question(question, classification_llm, embeddings, llm, answer_template)
answer  = extract_key_from_rag_chain(answer, "end_header_id")

print(namespace_message)
print("\n--------- Assistant answer --------\n")
print(answer)

Namespace selected: travel-guide-asia (Information related to traveling guidelines and policies in Asia)

--------- Assistant answer --------

Yes, there are visa requirements and travel restrictions for US citizens traveling to Vietnam. US citizens are required to have a valid visa to enter Vietnam, and the process for obtaining a visa can vary depending on the purpose and duration of the trip.

For tourist visas, US citizens can apply for a single-entry visa online through the Vietnamese government's website or through a local embassy or consulate. The visa is valid for 30 days and can be extended for an additional 30 days.

For business visas, US citizens can apply through a local embassy or consulate. The requirements for a business visa include a letter from the employer explaining the purpose of the trip and a copy of the passport.

It's important to note that US citizens are restricted from traveling to certain areas in Vietnam, including the provinces of Binh Dinh, Phu Yen, and

In [None]:
# Example usage: Process a question, classify it, select retriever, and get the answer
question = "What local activities or tours would you recommend in Iceland for someone interested in natural landscapes?"

namespace_message, answer = process_question(question, classification_llm, embeddings, llm, answer_template)
answer  = extract_key_from_rag_chain(answer, "end_header_id")

print(namespace_message)
print("\n--------- Assistant answer --------\n")
print(answer)

Namespace selected: travel-guide-europe (Information related to traveling guidelines and policies in Europe)

--------- Assistant answer --------

I would recommend exploring Iceland's natural landscapes through a combination of guided tours and independent exploration. Here are some activities and tours that I would recommend:

1. The Golden Circle Tour: This popular tour takes you to some of Iceland's most iconic natural attractions, including the Gullfoss waterfall, the Geysir geothermal area, and Thingvellir National Park.
2. South Shore Adventure: This tour takes you along Iceland's scenic south coast, where you can see stunning waterfalls, black sand beaches, and glaciers.
3. Glacier Hiking: Explore Iceland's glaciers on a guided hike, where you can see breathtaking ice formations and learn about the geology of the glaciers.
4. Whale Watching: Take a boat tour to see orcas, humpback whales, and other marine life in their natural habitat.
5. Northern Lights: Take a guided tour to 