# Basic RAG Implementation

## Basic RAG - v1
- Tokenized with `cl100k_base`
- Chunked with fixed-size chunking

In [12]:
# Load the Gemini API key from the .credential file

import os
from dotenv import load_dotenv

load_dotenv(".credential")

GEMINI_API = os.getenv("GEMINI_API")
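
If `.credential` is missing or does not define `GEMINI_API`, `os.getenv` returns `None` and the problem only surfaces later, when the Gemini client is called. A minimal sketch of a fail-fast check; `require_env` is a hypothetical helper, not part of `dotenv` or of this notebook:

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value

# Hypothetical usage, after load_dotenv(".credential") has run:
# GEMINI_API = require_env("GEMINI_API")
```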


In [None]:
import ollama

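# stream=True makes ollama.chat return an iterator of partial responses
stream = ollama.chat(
    model='hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:latest',
    messages=[{'role': 'user', 'content': 'Is AI just IF/ELSE?'}],
    stream=True,
)

# Print each piece of the reply as it arrives
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)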

In [50]:
# Indexing
# Loading data

data_list = []
with open("data/cat-facts.txt", "r", encoding="UTF-8") as f:
    data_list.append(f.read())

print(len(data_list[0]))

22476


In [None]:
# Tokenize data
import tiktoken
encoding = tiktoken.get_encoding("cl100k_base")
data_tokens = encoding.encode(data_list[0])

print(len(data_tokens))

# Length of the last chunk when the tokens are split into 256-token chunks
print(len([data_tokens[i:i+256] for i in range(0, len(data_tokens), 256)][-1]))

5247
127
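
The second number in the output, 127, is the length of the last chunk when the 5247 tokens are split into 256-token chunks. The arithmetic can be checked without the tokenizer (the variable names here are illustrative):

```python
import math

total_tokens = 5247   # token count printed above
chunk_size = 256      # chunk size used in this notebook

num_chunks = math.ceil(total_tokens / chunk_size)
# The remainder is the size of the last chunk; a zero remainder means a full chunk
last_chunk_len = total_tokens % chunk_size or chunk_size

print(num_chunks)      # 21
print(last_chunk_len)  # 127
```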


In [87]:
# Chunking
# Fixed-size chunking

def get_chunk(data, num_size: int):
    """Split data into consecutive slices of at most num_size items."""
    return [data[i:i+num_size] for i in range(0, len(data), num_size)]

# Split the tokens into chunks of 256 tokens each
chunks_list = get_chunk(data_tokens, 256)

# print(len(chunks_list))
# print(chunks_list[1])
# detokenized_chunk = encoding.decode(chunks_list[0])
# print(type(detokenized_chunk))
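
Fixed-size chunking cuts wherever the token count runs out, so a sentence can end up split across two chunks. One common mitigation, not used in this notebook, is to let consecutive chunks share a few tokens. A minimal sketch, where `get_chunk_with_overlap` and its `overlap` parameter are hypothetical names introduced for illustration:

```python
def get_chunk_with_overlap(data, num_size: int, overlap: int):
    """Split data into slices of at most num_size items; neighbours share `overlap` items."""
    if not 0 <= overlap < num_size:
        raise ValueError("overlap must satisfy 0 <= overlap < num_size")
    step = num_size - overlap
    chunks = []
    for start in range(0, len(data), step):
        chunks.append(data[start:start + num_size])
        if start + num_size >= len(data):
            break  # this slice already reaches the end of the data
    return chunks

demo = get_chunk_with_overlap(list(range(10)), num_size=4, overlap=1)
print(demo)  # [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

With `overlap=0` this sketch produces the same slices as `get_chunk`.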

In [98]:
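# Embed each chunk and save it to the vector DB
EMBEDDING_MODEL = 'hf.co/CompendiumLabs/bge-base-en-v1.5-gguf'

# Each entry is a (chunk_text, embedding) tuple
VECTOR_DB = []

def detokenize(chunk):
    # Convert a list of token ids back into text
    encoding = tiktoken.get_encoding("cl100k_base")
    return encoding.decode(chunk)

def get_embeded(chunk):
    # ollama.embed returns a list of embeddings; a single input yields one entry
    embedding = ollama.embed(model=EMBEDDING_MODEL, input=chunk)['embeddings'][0]
    return embedding

for chunk in chunks_list:
    text = detokenize(chunk)  # decode once, reuse for storage and embedding
    VECTOR_DB.append((text, get_embeded(text)))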

print(VECTOR_DB[0])

('On average, cats spend 2/3 of every day sleeping. That means a nine-year-old cat has been awake for only three years of its life.\nUnlike dogs, cats do not have a sweet tooth. Scientists believe this is due to a mutation in a key taste receptor.\nWhen a cat chases its prey, it keeps its head level. Dogs and humans bob their heads up and down.\nThe technical term for a cat’s hairball is a “bezoar.”\nA group of cats is called a “clowder.”\nFemale cats tend to be right pawed, while male cats are more often left pawed. Interestingly, while 90% of humans are right handed, the remaining 10% of lefties also tend to be male.\nA cat can’t climb head first down a tree because every claw on a cat’s paw points the same way. To get down from a tree, a cat must back down.\nCats make about 100 different sounds. Dogs make only about 10.\nA cat’s brain is biologically more similar to a human brain than it is to a dog’s. Both humans and cats have identical regions in their brains that are responsible 

In [125]:
# Retrieve
import numpy as np

# Cosine similarity between the query embedding and a chunk embedding
def cosine_similarity(v1, v2):
    vec1 = np.array(v1)
    vec2 = np.array(v2)
    multiply = np.dot(vec1, vec2)
    norm_v1 = np.linalg.norm(vec1)
    norm_v2 = np.linalg.norm(vec2)
    return multiply/(norm_v1*norm_v2)

def retrieve_top_k(query, top_k):
    """Return the top_k (chunk_text, similarity) pairs most similar to query."""
    query_embedding = get_embeded(query)
    similarity_list = [
        (text, cosine_similarity(query_embedding, embedding))
        for text, embedding in VECTOR_DB
    ]

    # Highest similarity first
    sort_list = sorted(similarity_list, key=lambda x: x[1], reverse=True)

    return sort_list[:top_k]

# text = "I love cat"
# print(retrieve_top_k(text, 2))
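
To see what the ranking does without calling the embedding model, the same idea can be run on hand-made 2-D vectors. This is a sketch in plain Python: `toy_db`, `cosine`, and `top_k` are illustrative names, the vectors are made up, and `heapq.nlargest` stands in for the full sort. Unlike `cosine_similarity` above, `cosine` returns 0.0 for a zero-length vector rather than dividing by zero.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Made-up (text, embedding) pairs standing in for VECTOR_DB
toy_db = [
    ("sleep", [1.0, 0.0]),
    ("food", [0.0, 1.0]),
    ("naps", [0.9, 0.1]),
]

def top_k(query_vec, db, k):
    """Return the k (text, similarity) pairs with the highest similarity."""
    scored = ((text, cosine(query_vec, vec)) for text, vec in db)
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

result = top_k([1.0, 0.0], toy_db, 2)
print([text for text, _ in result])  # ['sleep', 'naps']
```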



In [132]:
# Generation
from google import genai

# Get user input
user_input = input("Enter your question: ")

# Retrieve the 3 chunks most similar to the question
compare_list = retrieve_top_k(user_input, 3)

# print(compare_list[1])
# print(compare_list[0][0])

client = genai.Client(api_key=GEMINI_API)

# Only the top-ranked chunk (compare_list[0]) is used as context
prompt_text = f"""Answer strictly based on this information: 
- `{compare_list[0][0]}`  
Just give the direct answer related to this question `{user_input}`"""

response = client.models.generate_content(
    model="gemini-2.0-flash", 
    contents=prompt_text
)

print("Prompt sent to API:")
print(prompt_text)  # This prints the actual prompt sent to Gemini

print("\nResponse from API:\n")
print(response.text)  # Prints the generated response

Prompt sent to API:
Answer strictly based on this information: 
- ` owners move far from its home, the cat can’t find them.
Cats bury their feces to cover their trails from predators.
Cats sleep 16 to 18 hours per day. When cats are asleep, they are still alert to incoming stimuli. If you poke the tail of a sleeping cat, it will respond accordingly.
Besides smelling with their nose, cats can smell with an additional organ called the Jacobson’s organ, located in the upper surface of the mouth.
The chlorine in fresh tap water irritates sensitive parts of the cat’s nose. Let tap water sit for 24 hours before giving it to a cat.`  
Just give the direct answer related to this question `how many hours did cats used to sleep ?`

Response from API:

Cats sleep 16 to 18 hours per day.
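
The generation cell retrieves three chunks but places only the top-ranked one in the prompt. If the answer may sit in a lower-ranked chunk, all retrieved chunks could be passed as context. A minimal sketch, assuming the `(text, similarity)` tuples returned by `retrieve_top_k`; `build_prompt` is a hypothetical helper, the wording mirrors the original prompt, and the sample scores are made up:

```python
def build_prompt(question, retrieved):
    """Format every retrieved (text, similarity) pair as a bulleted context block."""
    context = "\n".join(f"- `{text}`" for text, _score in retrieved)
    return (
        "Answer strictly based on this information:\n"
        f"{context}\n"
        f"Just give the direct answer related to this question `{question}`"
    )

# Made-up stand-in for the output of retrieve_top_k
sample = [
    ("Cats sleep 16 to 18 hours per day.", 0.71),
    ("A group of cats is called a clowder.", 0.42),
]
prompt = build_prompt("How long do cats sleep?", sample)
print(prompt)
```

Passing more chunks gives the model more context to draw on, at the cost of a longer prompt.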

