# Retrieval-Augmented Generation with Llama2 and ChromaDB on PropulsionAI

In this notebook, we'll use Retrieval-Augmented Generation (RAG) to ground a language model's answers in relevant information retrieved from a knowledge base. Specifically, we'll combine a Llama2 model hosted on PropulsionAI with ChromaDB, a vector database, to build a question-answering system about Domino's Pizza in India.

To learn how to fine-tune a Llama2 model on your own data, see [Fine-Tune LLaMA 2 on PropulsionAI](https://propulsionhq.com/resources/tutorials/llama-2-fine-tune-your-own-data/).


In [None]:
# Install necessary libraries
%pip install -q chromadb langchain requests sentence-transformers langchain-community jq

In [2]:
# Import required libraries
import json
from typing import Any, List, Optional

import requests
from langchain.chains import RetrievalQA
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import JSONLoader
from langchain_core.callbacks.manager import CallbackManagerForLLMRun
from langchain_core.language_models.llms import LLM

In [3]:
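# Load the menu from dominos-menu.jsonl; each line becomes one Document whose
# page_content is the JSON-serialized object selected by jq_schema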
loader = JSONLoader(
    file_path="./dominos-menu.jsonl",
    jq_schema="{name: .name, price: .price, description: .description, category: .category}",
    text_content=False,
    json_lines=True,
)
data = loader.load()
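To see what each `Document` in `data` holds, here is a plain-Python approximation of what the loader produces for a single JSONL line. It assumes that, with `text_content=False`, `JSONLoader` serializes the object built by `jq_schema` with `json.dumps`, and that a missing key becomes `null` as it would in jq. The helper name `menu_line_to_page_content` and the sample record are hypothetical; the real `dominos-menu.jsonl` may contain other fields.

```python
import json

# The same four fields selected by the jq_schema in the loader above.
MENU_FIELDS = ("name", "price", "description", "category")


def menu_line_to_page_content(line: str) -> str:
    """Approximate the page_content JSONLoader builds for one JSONL line (hypothetical helper)."""
    record = json.loads(line)
    # Keep only the selected fields; absent keys map to None, serialized as null.
    selected = {field: record.get(field) for field in MENU_FIELDS}
    return json.dumps(selected)


# Hypothetical record with an extra field that the jq_schema drops.
sample = (
    '{"name": "Example Pizza", "price": 100, "description": "Example description", '
    '"category": "Example category", "sku": "X1"}'
)
print(menu_line_to_page_content(sample))
```

Because the whole object is serialized, the field names end up in the embedded text, which can help the retriever match queries that mention "price" or "category".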

In [None]:
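# Embed every menu entry and store the vectors in an in-memory Chroma collection
# (HuggingFaceEmbeddings() downloads its default sentence-transformers model on first use)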
embeddings = HuggingFaceEmbeddings()
vectordb = Chroma.from_documents(data, embeddings)
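Retrieval from the vector store is a nearest-neighbour search over embedding vectors. The sketch below ranks a few toy vectors by cosine similarity to a query vector; it illustrates the idea and is not Chroma's implementation. Chroma's distance metric is configurable (we believe L2 is its default), but for unit-length vectors the two orderings agree, since `||a - b||^2 = 2 - 2 cos(a, b)`. The function names and the 3-dimensional vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float], index: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k stored ids most similar to query_vec, best match first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings"; in the notebook they come from HuggingFaceEmbeddings.
index = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.7, 0.7, 0.0],
    "doc-c": [0.0, 0.0, 1.0],
}
print(top_k([1.0, 0.1, 0.0], index, k=2))
```

In the notebook itself, the equivalent lookup is `vectordb.similarity_search(query, k=4)`, which embeds the query with the same model and returns the closest `Document`s.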

In [12]:
# PropulsionAI credentials: replace the placeholders with your API key and your model's version ID
PROPULSION_API_KEY = "your api key"
PROPULSION_VERSION_ID = "your model version id"
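Hard-coding secrets in a notebook makes them easy to leak when the notebook is shared. A small sketch that reads them from environment variables instead; the helper name and the variable names (which simply mirror the Python names above) are our own convention, not something PropulsionAI requires.

```python
import os
from typing import Tuple


def read_propulsion_credentials() -> Tuple[str, str]:
    """Return (api_key, version_id) from the environment, failing loudly if either is unset."""
    api_key = os.environ.get("PROPULSION_API_KEY", "")
    version_id = os.environ.get("PROPULSION_VERSION_ID", "")
    missing = [
        name
        for name, value in (("PROPULSION_API_KEY", api_key), ("PROPULSION_VERSION_ID", version_id))
        if not value
    ]
    if missing:
        raise RuntimeError(f"Set these environment variables first: {', '.join(missing)}")
    return api_key, version_id
```

Usage: `PROPULSION_API_KEY, PROPULSION_VERSION_ID = read_propulsion_credentials()` replaces the two placeholder assignments.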

In [20]:
# System prompt that restricts the model to Domino's India topics
SYSTEM_PROMPT = (
    "You are Domino's India Copilot, a specialized GPT model created to interact with users regarding Domino's Pizza in India. "
    "Your role is to provide information about Domino's products & prices, services, policies, and procedures within India. "
    "Your responses should be focused on Domino's-related inquiries only, avoiding engagement in general conversation or unrelated topics. "
    "Ensure your responses are up-to-date, accurate, concise, and relevant to Domino's India. You don't talk about anything else. "
    "If you're not provided with product prices in instructions, you don't mention prices, otherwise you do. "
    "Operational Guidelines: "
    "1. Focus: Address only Domino's India-related questions including menu items, delivery services, gift cards, franchise opportunities, and customer service policies. "
    "2. Accuracy: Offer factually correct information based on the latest updates from Domino's India. "
    "3. Conciseness: Keep responses clear and to the point, without additional elaboration or extraneous content. "
    "4. Relevance: Respond exclusively to questions about Domino's India and politely DECLINE topics outside this remit. "
    "5. Don't invent any new products or items on your own, stick to your existing information. Don't hallucinate. "
    "6. Currency is Indian Rupees (INR). Add INR while quoting prices like Veg Paradise (INR 589). "
    "Example Interactions: "
    "- User Query: How do I redeem my Domino's gift card? Copilot Response: To redeem your Domino's gift card in India, present it at checkout in any participating outlet or enter the card details online during payment. "
    "- User Query: What's the latest pizza on the Domino's menu? Copilot Response: Domino's India's newest menu item is the 'Spicy Triple Tango' pizza. Check the Domino's app or website for more details. "
    "- User Query: Can you tell me a joke? Copilot Response: My expertise is in assisting with Domino's India-related questions. For jokes, please look elsewhere. How may I help with your Domino's experience today? "
    "- User Query: Who is Donald Trump? Copilot Response: My expertise is in assisting with Domino's India-related questions. How may I help with your Domino's experience today?"
)


def call_llama2(prompt, temperature=0.5, top_p=1, n=1, max_tokens=500, stream=False):
    """Send `prompt` to the PropulsionAI-hosted model and return the reply text."""
    url = f"https://api.propulsionhq.com/api/v1/{PROPULSION_VERSION_ID}/run?wait=true"
    payload = {
        "model": "p8n-llm",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": prompt},
        ],
        "temperature": temperature,
        "top_p": top_p,
        "n": n,
        "max_tokens": max_tokens,
        "stream": stream,
    }

    headers = {
        "Authorization": f"Bearer {PROPULSION_API_KEY}",
        "Content-Type": "application/json; charset=utf-8",
    }
    # The timeout stops the cell from hanging indefinitely; 120 s is an arbitrary choice
    response = requests.post(url, headers=headers, data=json.dumps(payload), timeout=120)

    # Raise requests.HTTPError for 4xx/5xx responses
    response.raise_for_status()

    response_json = response.json()

    if stream:
        # Assumes the API returns a JSON list of chunks when stream=True
        output = "".join(
            chunk["choices"][0]["message"]["content"] for chunk in response_json
        )
    else:
        output = response_json["choices"][0]["message"]["content"]

    return output
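Because `call_llama2` mixes request construction, the HTTP call, and response parsing, it is hard to test without network access. One option, sketched here with hypothetical helper names, is to pull the pure parts into their own functions. The payload fields are copied from `call_llama2`, and the response shape (`choices[0].message.content`) is the one `call_llama2` already assumes; neither is confirmed against PropulsionAI's API documentation.

```python
from typing import Any, Dict, List


def build_chat_payload(
    system_prompt: str,
    user_prompt: str,
    temperature: float = 0.5,
    top_p: float = 1,
    n: int = 1,
    max_tokens: int = 500,
    stream: bool = False,
) -> Dict[str, Any]:
    """Assemble the same request body that call_llama2 sends."""
    return {
        "model": "p8n-llm",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": temperature,
        "top_p": top_p,
        "n": n,
        "max_tokens": max_tokens,
        "stream": stream,
    }


def extract_first_choice(response_json: Dict[str, Any]) -> str:
    """Return the text of the first choice in a non-streaming response."""
    choices: List[Dict[str, Any]] = response_json.get("choices") or []
    if not choices:
        raise ValueError("response contained no choices")
    return choices[0]["message"]["content"]


# Offline check with a hand-written response dict.
print(extract_first_choice({"choices": [{"message": {"content": "Hello from the model"}}]}))
```

With these in place, `call_llama2` reduces to building the payload, posting it, and passing `response.json()` to `extract_first_choice`.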

In [7]:
# Set up a custom LangChain LLM that delegates to call_llama2
class PropulsionLLM(LLM):
    @property
    def _llm_type(self) -> str:
        return "custom"

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        # call_llama2 does not forward stop sequences to the API, so reject them
        if stop is not None:
            raise ValueError("stop kwargs are not permitted.")
        return call_llama2(prompt)


llm = PropulsionLLM()
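Rejecting `stop` works for `RetrievalQA`, but other chains may pass stop sequences. A common alternative is to apply them client-side: let the model finish, then cut the text at the earliest stop sequence. A minimal sketch (the helper name is hypothetical); note that the model still generates, and you still pay for, the full completion.

```python
from typing import List, Optional


def truncate_at_stop(text: str, stop: Optional[List[str]] = None) -> str:
    """Cut text at the earliest occurrence of any non-empty stop sequence."""
    if not stop:
        return text
    cut = len(text)
    for sequence in stop:
        if not sequence:
            continue  # an empty stop string would otherwise truncate everything
        index = text.find(sequence)
        if index != -1 and index < cut:
            cut = index
    return text[:cut]


print(truncate_at_stop("Here are the veg options.\nQuestion: next?", ["\nQuestion:"]))
```

Inside `_call`, the `raise` would then become `return truncate_at_stop(call_llama2(prompt), stop)`.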

In [8]:
# Set up RAG chain
qa = RetrievalQA.from_llm(llm, retriever=vectordb.as_retriever())
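`RetrievalQA.from_llm` combines documents with a "stuff" strategy: the retrieved chunks are concatenated into one context block and inserted into a QA prompt, which `PropulsionLLM` then sends as the user message next to the Domino's system prompt. The sketch below approximates that assembly. The template wording and the `"Context:\n"` per-document prefix reflect our reading of LangChain's defaults and may differ between versions; the names `QA_TEMPLATE`, `DOCUMENT_TEMPLATE`, and `stuff_prompt` are hypothetical.

```python
from typing import List

# Approximation of LangChain's default QA prompt; check your installed version.
QA_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)

# Assumed per-document wrapper applied before the chunks are joined.
DOCUMENT_TEMPLATE = "Context:\n{page_content}"


def stuff_prompt(page_contents: List[str], question: str, separator: str = "\n\n") -> str:
    """Format each retrieved chunk, join them, and fill the QA template."""
    context = separator.join(DOCUMENT_TEMPLATE.format(page_content=c) for c in page_contents)
    return QA_TEMPLATE.format(context=context, question=question)


# Hypothetical retrieved chunk in the JSON form produced by the loader.
print(stuff_prompt(['{"name": "Example Pizza", "price": 100}'], "What are some vegetarian options?"))
```

Since every retrieved chunk lands in a single prompt, the retriever's `k` directly controls prompt length; the default of 4 menu entries keeps it short.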

In [21]:
# Ask a question
query = "What are some vegetarian options on the menu? Give me the prices as well."
result = qa.invoke(query)
print(result)

{'query': 'What are some vegetarian options on the menu? Give me the prices as well.', 'result': 'There are several vegetarian options on the menu, including the Veg Paradise (INR 589) and Veg Extravaganza (INR 619) meals, which feature a regular pizza and a regular side.'}
