# Retrieval Augmented Generation

Prompt + Data = Big Success

### Working Environment 

[![Open In Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/build-on-aws/generative-ai-prompt-engineering/blob/main/prompt-engineering-chatbot/prompt-engineering-chatbot.ipynb)


This notebook has been designed, written, and tested to run on machines with at least 16GB of RAM (32GB preferred). If you don't have access to one, sign up for a free account on [Amazon SageMaker Studio Lab](https://studiolab.sagemaker.aws/). Studio Lab is a free machine learning (ML) development environment that provides compute and storage (up to 15GB) at no cost, with NO credit card required.

You can sign up for Amazon SageMaker Studio Lab here: <https://studiolab.sagemaker.aws/>

# Code Example

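### Boring Stuff
This is just the code needed to set everything up: load the model, define a prompt template, and write a small helper that sends a question (plus optional context) to the model.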

In [None]:
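import torch
import transformers

# Use the Apple Silicon ("mps") or NVIDIA ("cuda") GPU if available, otherwise the CPU
device = "mps" if torch.backends.mps.is_available() else "cuda" if torch.cuda.is_available() else "cpu"

# Local copy of Meta-Llama-3-8B-Instruct; replace with your own path
# (or the gated Hugging Face Hub ID "meta-llama/Meta-Llama-3-8B-Instruct")
pipeline = transformers.pipeline(
    "text-generation",
    model="/Users/john.robinson/Projects/models/Meta-Llama-3-8B-Instruct",
    device=device,
)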

In [None]:
from langchain.prompts import PromptTemplate

# Llama 3 Instruct chat format: each turn is "<|start_header_id|>role<|end_header_id|>\n\n",
# then its content, then "<|eot_id|>"; the final open assistant header cues the model's reply
context_template = PromptTemplate.from_template(
    """<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are an AI assistant which gives helpful, detailed, and polite answers to the user's questions.<|eot_id|><|start_header_id|>user<|end_header_id|>

Given the context below, answer the question that follows. If you do not know the answer and the context does not contain the information to answer the question, say you don't know and why.

Context: {context}
Question: {question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""
)


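def test_model(pipeline, user_prompt, additional_context=""):
    """Fill the prompt template, generate an answer, and return it alongside the question."""
    prompt = context_template.format(question=user_prompt, context=additional_context)

    # Stop generating at either the tokenizer's EOS token or Llama 3's end-of-turn token
    terminators = [
        pipeline.tokenizer.eos_token_id,
        pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
    ]
    sequences = pipeline(
        prompt,
        do_sample=True,  # sampling must be on for temperature/top_p to take effect
        top_p=0.9,
        temperature=0.8,
        eos_token_id=terminators,
        max_new_tokens=200,
        return_full_text=False,  # return only the generated answer, not the prompt
        pad_token_id=pipeline.tokenizer.eos_token_id,
    )

    answer = sequences[0]["generated_text"]

    return f"Question: {user_prompt}\nAnswer: {answer}"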
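The template above hand-writes Llama 3's chat format: each turn is wrapped in `<|start_header_id|>role<|end_header_id|>` and closed with `<|eot_id|>`, and the prompt ends with an open assistant header so the model writes the answer next. Here is a minimal, dependency-free sketch of the kind of string `context_template.format(...)` builds, assuming the same special tokens; `turn` and `build_rag_prompt` are hypothetical helpers for illustration, not part of LangChain or transformers. In practice, `pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` can build this string from a list of role/content messages instead.

```python
# Hypothetical helpers: assemble a Llama 3 style prompt with plain string formatting.
SYSTEM = "You are an AI assistant which gives helpful, detailed, and polite answers to the user's questions."
INSTRUCTIONS = (
    "Given the context below, answer the question that follows. "
    "If you do not know the answer and the context does not contain the information "
    "to answer the question, say you don't know and why."
)


def turn(role: str, content: str) -> str:
    """Wrap one message in Llama 3 header tokens and close it with <|eot_id|>."""
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"


def build_rag_prompt(question: str, context: str = "") -> str:
    """Return the full prompt; the trailing open assistant header cues the model to answer."""
    user_msg = f"{INSTRUCTIONS}\n\nContext: {context}\nQuestion: {question}"
    return (
        "<|begin_of_text|>"
        + turn("system", SYSTEM)
        + turn("user", user_msg)
        + "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


prompt = build_rag_prompt(
    "Who won the 2024 Super Bowl?",
    "The Kansas City Chiefs won the 2024 Super Bowl 25 to 22 over the San Francisco 49ers.",
)
print(prompt)
```

The context is just more text inside the user turn; "augmenting" the prompt means nothing more exotic than pasting retrieved data in front of the question.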

### Query Model
First things first: let's ask the model something it can't know, because it happened after its training data was collected.

In [None]:
question = "Who won the 2024 Super Bowl?"

response = test_model(pipeline, question)
print(response)

Well, that was lame. Without any outside data, the model can only guess or admit it doesn't know.

### Model + Data
Let's give our model some more data to make it more useful: this time we pass the answer in as context alongside the question.

In [None]:
question = "Who won the 2024 Super Bowl?"
additional_context = "The Kansas City Chiefs won the 2024 Super Bowl 25 to 22 over the San Francisco 49ers."

response = test_model(pipeline, question, additional_context=additional_context)
print(response)

# RAG Example

### Store Data
Let's take a document (`data.txt`) with some data detailing who won recent Super Bowls.

Below is some boilerplate to store our data in a local vector database (SQLite with the `sqlite-vss` extension, via LangChain's `SQLiteVSS`).
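Before the data goes into the database, `CharacterTextSplitter` breaks the document into pieces of at most roughly `chunk_size` characters so that each piece can be embedded and retrieved on its own. The sketch below is a simplified stand-in, not LangChain's implementation: it assumes pieces are separated by blank lines (`"\n\n"`, which is also `CharacterTextSplitter`'s default separator) and greedily packs them into chunks of up to `chunk_size` characters with no overlap, matching the `chunk_overlap=0` used in the notebook. `split_into_chunks` is a hypothetical name.

```python
def split_into_chunks(text: str, chunk_size: int = 1000, separator: str = "\n\n") -> list[str]:
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole as its own (oversized) chunk.
    """
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate  # the piece still fits in the current chunk
        else:
            if current:
                chunks.append(current)  # flush the full chunk
            current = piece  # start a new chunk with this piece
    if current:
        chunks.append(current)
    return chunks


sample = (
    "Super Bowl LVIII: Chiefs 25, 49ers 22.\n\n"
    "Super Bowl LVII: Chiefs 38, Eagles 35.\n\n"
    "Super Bowl LVI: Rams 23, Bengals 20."
)
for chunk in split_into_chunks(sample, chunk_size=80):
    print(repr(chunk))
```

With `chunk_size=80`, the first two results (38 characters each, plus the separator) share a chunk and the third starts a new one. Smaller chunks make retrieval more precise but give the model less surrounding context per hit.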

In [None]:
# LangChain is a framework and toolkit for interacting with LLMs programmatically

from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import SQLiteVSS
from langchain.document_loaders.text import TextLoader

# Load the document using a LangChain text loader
loader = TextLoader("data.txt")
documents = loader.load()

# Split the document into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
texts = [doc.page_content for doc in docs]

# Use the sentence transformer package with the all-MiniLM-L6-v2 embedding model
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

# Embed the chunks and store them in SQLiteVSS in a table named "documents"
db = SQLiteVSS.from_texts(
    texts=texts,
    embedding=embedding_function,
    table="documents",
    db_file="/tmp/vss.db",
)

# First, we will do a simple retrieval using similarity search
# Query
question = "Who won the 2024 Super Bowl?"
data = db.similarity_search(question)

# print results
print(data[0].page_content)
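`db.similarity_search(question)` embeds the question with the same `all-MiniLM-L6-v2` model and returns the stored chunks whose vectors are closest to it (LangChain returns four by default). The sketch below shows the idea with cosine similarity over tiny hand-made vectors; the vectors and the `top_k` helper are illustrative assumptions, and the actual distance computation and indexing inside the `sqlite-vss` extension may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int = 4) -> list[str]:
    """Return the k stored texts whose vectors are most similar to query_vec."""
    ranked = sorted(store, key=lambda text: cosine_similarity(query_vec, store[text]), reverse=True)
    return ranked[:k]


# Toy 3-dimensional "embeddings" (real all-MiniLM-L6-v2 vectors have 384 dimensions)
store = {
    "Chiefs beat the 49ers in Super Bowl LVIII": [0.9, 0.1, 0.0],
    "Chiefs beat the Eagles in Super Bowl LVII": [0.7, 0.3, 0.1],
    "How to bake sourdough bread": [0.0, 0.2, 0.95],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of "Who won the 2024 Super Bowl?"
print(top_k(query, store, k=2))
```

This brute-force scan compares the query against every stored vector, which is fine for a handful of chunks; vector databases add indexes so the search stays fast as the collection grows.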

## Throw It All Together
### Automate the "Retrieval" and "Augment" the "Generation"

In [None]:
question = "Who won the 2024 Super Bowl?"

data = db.similarity_search(question)
additional_context = data[0].page_content

response = test_model(pipeline, question, additional_context=additional_context)
print(response)
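The cell above keeps only the single best match, `data[0]`. A common variation is to retrieve several chunks and join them into one context string, which helps when the answer is spread across chunks, at the cost of a longer prompt. The sketch below wires retrieval and augmentation together with no external dependencies; the word-overlap scoring is a deliberately crude stand-in for embeddings (an assumption for illustration only), and `tokenize`, `retrieve`, and `augment` are hypothetical names.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for a real embedding model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by how many words they share with the question and keep the top k."""
    q_words = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q_words & tokenize(c)), reverse=True)
    return ranked[:k]


def augment(retrieved: list[str]) -> str:
    """Join the retrieved chunks into a single context string for the prompt."""
    return "\n\n".join(retrieved)


chunks = [
    "The Kansas City Chiefs won the 2024 Super Bowl 25 to 22 over the San Francisco 49ers.",
    "The Kansas City Chiefs won the 2023 Super Bowl 38 to 35 over the Philadelphia Eagles.",
    "Sourdough bread needs a starter, flour, water, and salt.",
]
question = "Who won the 2024 Super Bowl?"
context = augment(retrieve(question, chunks, k=2))
print(context)
# In the notebook, this context would then go to the model:
# response = test_model(pipeline, question, additional_context=context)
```

Swapping `retrieve` for `db.similarity_search(question, k=2)` and joining each result's `page_content` gives the same shape with real embeddings.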