# Retrieval Augmented Generation

Prompt + Data = Big Success

### Working Environment 

[![Open In Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/build-on-aws/generative-ai-prompt-engineering/blob/main/prompt-engineering-chatbot/prompt-engineering-chatbot.ipynb)


This notebook has been designed, written and tested to run on machines with a minimum of 16GB of RAM (32GB preferred). If you don't have access to one, sign up for a free account on [Amazon SageMaker Studio Lab](https://studiolab.sagemaker.aws/). Studio Lab is a free machine learning (ML) development environment that provides compute and storage (up to 15GB) at no cost, with NO credit card required.

You can sign up for Amazon SageMaker Studio Lab here: <https://studiolab.sagemaker.aws/>
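If you are not sure whether your machine meets the 16GB guideline, a quick standard-library check like the one below can tell you before you download a multi-gigabyte model. This is a minimal sketch that assumes a POSIX system (Linux, usually macOS); `total_ram_gb` and the `MIN_RAM_GB` threshold are hypothetical names, not part of the notebook.

```python
import os

# Nominal 16 GB machines usually report slightly less than 16 GiB of usable RAM
MIN_RAM_GB = 15


def total_ram_gb():
    """Return total physical memory in GB (1 GB = 1024**3 bytes), or None if unknown.

    Relies on POSIX sysconf, so it works on Linux and usually macOS, but not Windows.
    """
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        num_pages = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf does not exist on Windows; unsupported names raise ValueError
        return None
    if page_size <= 0 or num_pages <= 0:
        return None
    return page_size * num_pages / 1024**3


ram = total_ram_gb()
if ram is None:
    print("Could not determine total RAM on this platform.")
elif ram < MIN_RAM_GB:
    print(f"About {ram:.1f} GB of RAM detected; consider Amazon SageMaker Studio Lab instead.")
else:
    print(f"About {ram:.1f} GB of RAM detected.")
```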

# Code Example

### Boring Stuff
This is just the code needed to set everything up.

```python
%pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
%pip install langchain langchain-community
%pip install chromadb
%pip install sentence-transformers
```

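```python
# `git` is a shell command, so run it with `!` (there is no `%git` magic); requires Git LFS on the system
!git lfs install
# Clone into the folder that MODEL_PATH below expects
!git clone https://huggingface.co/johnr9412/Nashville-Meta-Llama-3-8B-Instruct-GGUF ../Nashville-Meta-Llama-3-8B-Instruct-GGUF
```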

```python
# Path to the 4-bit quantized (Q4_K_M) GGUF model file from the cloned repository
MODEL_PATH = "../Nashville-Meta-Llama-3-8B-Instruct-GGUF/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf"
```

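```python
from llama_cpp import Llama

LLM = Llama(
    model_path=MODEL_PATH,
    chat_format="llama-3",  # render chat messages using the Llama 3 Instruct prompt format
    n_gpu_layers=200,  # offload layers to a GPU; remove this line unless you have a GPU to run against
    verbose=False,
    n_ctx=8000,  # context window in tokens (Llama 3 supports up to 8,192)
)
```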
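`chat_format="llama-3"` tells llama-cpp-python how to turn a list of chat messages into the single prompt string Llama 3 Instruct was trained on. The sketch below approximates that rendering with plain string formatting, based on Meta's published Llama 3 Instruct layout; `format_llama3_prompt` is a hypothetical helper, and the exact special-token and whitespace handling inside llama-cpp-python may differ.

```python
def format_llama3_prompt(messages):
    """Render chat messages into the Llama 3 Instruct prompt layout (approximation)."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn: role header, blank line, content, end-of-turn token
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    # An open assistant header cues the model to write its reply next
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


example = format_llama3_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the 2024 Super Bowl?"},
])
print(example)
```

Seeing the raw string makes it clear why the model needs a chat format: the roles are just marked-up text in one long prompt.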

```python
from langchain.prompts import PromptTemplate

# Prompt that wraps the user's question together with any extra context
context_template = PromptTemplate.from_template(
    """
Given the context below, answer the question that follows. If you do not know the answer and the context does not contain the information to answer the question, say you don't know and why.

Context: {context}
Question: {question}

"""
)

def query_model(user_prompt, additional_context=""):
    """Send a question (plus optional context) to the local model and return a formatted answer."""
    messages = [
        {"role": "system", "content": "You are an AI assistant which gives helpful, detailed, and polite answers to the user's questions."},
        {
            "role": "user",
            "content": context_template.format(question=user_prompt, context=additional_context),
        },
    ]

    results = LLM.create_chat_completion(messages=messages)

    # The response follows the OpenAI chat-completion shape
    answer = results["choices"][0]["message"]["content"]
    return f"Question: {user_prompt}\nAnswer: {str(answer).strip()}"
```
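`PromptTemplate.from_template` uses Python's f-string style placeholders by default, so filling `{context}` and `{question}` behaves much like `str.format`, and `create_chat_completion` returns an OpenAI-style dictionary. The standard-library sketch below mirrors `query_model` without LangChain or a model, so you can inspect exactly what gets sent and how the answer is pulled out; `build_messages`, `extract_answer`, and the fake response are hypothetical stand-ins.

```python
# Hypothetical, LangChain-free copy of query_model's prompt assembly
TEMPLATE = """
Given the context below, answer the question that follows. If you do not know the answer and the context does not contain the information to answer the question, say you don't know and why.

Context: {context}
Question: {question}

"""

SYSTEM_PROMPT = (
    "You are an AI assistant which gives helpful, detailed, and polite answers "
    "to the user's questions."
)


def build_messages(question, context=""):
    """Build the same two-message chat that query_model sends to the model."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": TEMPLATE.format(question=question, context=context)},
    ]


def extract_answer(results):
    """Pull the reply text out of an OpenAI-style chat-completion dict."""
    return str(results["choices"][0]["message"]["content"]).strip()


messages = build_messages("Who won the 2024 Super Bowl?")
print(messages[1]["content"])

# Stand-in for LLM.create_chat_completion(...) so the sketch runs without a model
fake_results = {"choices": [{"message": {"role": "assistant", "content": "  I don't know.  "}}]}
print(extract_answer(fake_results))
```

With no context, the `Context:` line is simply left empty, which is what lets the first query show the model answering on its own.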

### Query Model
First things first: let's ask the model something it won't know, because the game was played after the model's training cutoff.

```python
question = "Who won the 2024 Super Bowl?"

response = query_model(question)
print(response)
```

Well, that was lame.

### Model + Data
Let's give our model some more data to make it more useful.

```python
question = "Who won the 2024 Super Bowl?"
additional_context = "The Kansas City Chiefs won the 2024 Super Bowl 25 to 22 over the San Francisco 49ers."

response = query_model(question, additional_context=additional_context)
print(response)
```

# RAG Example

### Store Data
Let's take a document with some data detailing who won recent Super Bowls. The code below loads every plain-text file it finds in the `./docs/` directory.

Below is some boilerplate to store our data in a local vector database (Chroma).
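The first thing that boilerplate does is split each document into chunks with `CharacterTextSplitter`. With `chunk_overlap=0`, its behaviour is roughly: split the text on a separator (a blank line by default) and greedily pack the pieces into chunks of up to `chunk_size` characters. The sketch below is a simplified, hypothetical re-implementation for intuition only, not LangChain's actual code.

```python
def split_text(text, chunk_size=1000, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole, so such a chunk can exceed the limit.
    """
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate  # still fits: keep growing this chunk
        else:
            if current:
                chunks.append(current)  # close the full chunk
            current = piece  # start a new chunk with this piece
    if current:
        chunks.append(current)
    return chunks


sample = "Paragraph one.\n\nParagraph two is a bit longer.\n\nParagraph three."
for chunk in split_text(sample, chunk_size=40):
    print(repr(chunk))
```

Smaller chunks make retrieval more precise, while larger ones carry more surrounding context into the prompt; 1000 characters is a middle ground for short documents like this one.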

```python
# LangChain is a framework and toolkit for interacting with LLMs programmatically
from os import listdir
from os.path import isfile, join

from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import CharacterTextSplitter

# Split documents into chunks of up to ~1000 characters with no overlap
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)

# Load every file in ./docs/ with a LangChain text loader and collect its chunks
texts = []
data_dir = "./docs/"
for file_name in listdir(data_dir):
    file_path = join(data_dir, file_name)
    if isfile(file_path):
        loader = TextLoader(file_path)

        # Split the document into chunks
        for doc in text_splitter.split_documents(loader.load()):
            texts.append(doc.page_content)

# Use the sentence transformer package with the all-MiniLM-L6-v2 embedding model
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

# Embed the chunks and store them in a local Chroma collection (in memory, since no persist_directory is given)
db = Chroma.from_texts(
    texts=texts,
    embedding=embedding_function,
)

# First, do a simple retrieval using similarity search
question = "Who won the 2024 Super Bowl?"
data = db.similarity_search(question, k=1)

# Print the best-matching chunk
print(data[0].page_content)
```
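Conceptually, `similarity_search` embeds the question, compares it against every stored chunk vector, and returns the `k` closest chunks. The sketch below shows that idea end to end with a toy bag-of-words "embedding" and cosine similarity instead of all-MiniLM-L6-v2 and Chroma's own distance metric; `embed`, `cosine`, `similarity_search`, and the two sample texts are hypothetical illustrations only.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_search(query, texts, k=1):
    """Return the k texts whose toy embeddings are most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(texts, key=lambda t: cosine(query_vec, embed(t)), reverse=True)
    return ranked[:k]


docs = [
    "The Kansas City Chiefs won the 2024 Super Bowl 25 to 22 over the San Francisco 49ers.",
    "llama-cpp-python runs GGUF models on a CPU.",
]
print(similarity_search("Who won the 2024 Super Bowl?", docs, k=1)[0])
```

A real embedding model captures meaning rather than exact word overlap, so it can match a question to a chunk that uses different wording; the ranking step works the same way.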

## THROW IT ALL TOGETHER. 
### Automate the "Retrieval" and "Augment" the "Generation"

```python
question = "Who won the 2024 Super Bowl?"

# Retrieve the most relevant chunk and pass it to the model as context
data = db.similarity_search(question, k=1)
additional_context = data[0].page_content

response = query_model(question, additional_context=additional_context)
print(response)
```
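The last cell passes only the single best chunk (`k=1`). If an answer spans several chunks, you could retrieve more (for example `db.similarity_search(question, k=3)`) and join their `page_content` into one context string, keeping it well under the `n_ctx=8000` token window set when loading the model. The sketch below shows one way to do that joining under a rough character budget; `build_context`, the 4,000-character default, and the `---` separator are hypothetical choices, and characters are only a crude proxy for tokens.

```python
def build_context(chunks, max_chars=4000, separator="\n---\n"):
    """Join retrieved chunks (best match first) until a character budget is reached."""
    selected, used = [], 0
    for chunk in chunks:
        sep_len = len(separator) if selected else 0
        if used + sep_len + len(chunk) > max_chars:
            if not selected:
                # Even the best chunk is too long: keep its beginning rather than nothing
                selected.append(chunk[:max_chars])
            break
        selected.append(chunk)
        used += sep_len + len(chunk)
    return separator.join(selected)


ranked_chunks = [
    "The Kansas City Chiefs won the 2024 Super Bowl 25 to 22 over the San Francisco 49ers.",
    "Placeholder for the second-best chunk.",
]
print(build_context(ranked_chunks, max_chars=200))
```

In the notebook this could slot in as `additional_context = build_context([d.page_content for d in db.similarity_search(question, k=3)])` before calling `query_model`. Stopping at the first chunk that does not fit keeps the most relevant material and drops the least relevant.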