# Chroma DB ingestion and Q&A

## Ingestion

In [1]:
import chromadb
chroma_client = chromadb.Client()

In [2]:
tourism_collection = chroma_client.get_or_create_collection(
    name="tourism_collection")  # safe to re-run: returns the existing collection if it is already there

In [3]:
tourism_collection.add(
    documents=[
        "Paestum, Greek Poseidonia, ancient city in southern Italy near the west coast, 22 miles (35 km) southeast of modern Salerno and 5 miles (8 km) south of the Sele (ancient Silarus) River. Paestum is noted for its splendidly preserved Greek temples.", 
        "Poseidonia was probably founded about 600 BC by Greek colonists from Sybaris, along the Gulf of Taranto, and it had become a flourishing town by 540, judging from its temples. After many years’ resistance the city came under the domination of the Lucanians (an indigenous Italic people) sometime before 400 BC, after which its name was changed to Paestum. Alexander, the king of Epirus, defeated the Lucanians at Paestum about 332 BC, but the city remained Lucanian until 273, when it came under Roman rule and a Latin colony was founded there. The city supported Rome during the Second Punic War. The locality was still prosperous during the early years of the Roman Empire, but the gradual silting up of the mouth of the Silarus River eventually created a malarial swamp, and Paestum was finally deserted after being sacked by Muslim raiders in AD 871. The abandoned site’s remains were rediscovered in the 18th century.",
        "The ancient Greek part of Paestum consists of two sacred areas containing three Doric temples in a remarkable state of preservation. During the ensuing Roman period a typical forum and town layout grew up between the two ancient Greek sanctuaries. Of the three temples, the Temple of Athena (the so-called Temple of Ceres) and the Temple of Hera I (the so-called Basilica) date from the 6th century BC, while the Temple of Hera II (the so-called Temple of Neptune) was probably built about 460 BC and is the best preserved of the three. The Temple of Peace in the forum is a Corinthian-Doric building begun perhaps in the 2nd century BC. Traces of a Roman amphitheatre and other buildings, as well as intersecting main streets, have also been found. The circuit of the town walls, which are built of travertine blocks and are 15–20 feet (5–6 m) thick, is about 3 miles (5 km) in circumference. In July 1969 a farmer uncovered an ancient Lucanian tomb that contained Greek frescoes painted in the early classical style. Paestum’s archaeological museum contains these and other treasures from the site."
    ],
    metadatas=[
        {"source": "https://www.britannica.com/place/Paestum"}, 
        {"source": "https://www.britannica.com/place/Paestum"},
        {"source": "https://www.britannica.com/place/Paestum"}
    ],
    ids=["paestum-br-01", "paestum-br-02", "paestum-br-03"]
)
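Writing the `metadatas` and `ids` lists by hand becomes error-prone as the number of chunks grows, because all three lists must have the same length and order. Below is a minimal sketch of a helper that derives them from the chunks. It assumes every chunk shares a single source URL. `build_add_kwargs` and `paestum_chunks` are hypothetical names, not part of Chroma.

```python
def build_add_kwargs(chunks, source, id_prefix):
    """Build the documents/metadatas/ids arguments for Collection.add()
    from text chunks that all come from the same source (hypothetical helper)."""
    documents = list(chunks)  # materialise once so generators are not exhausted
    return {
        "documents": documents,
        # One metadata dict per chunk, all pointing at the same source URL.
        "metadatas": [{"source": source} for _ in documents],
        # Zero-padded, 1-based ids such as "paestum-br-01".
        "ids": [f"{id_prefix}-{i:02d}" for i in range(1, len(documents) + 1)],
    }


kwargs = build_add_kwargs(
    ["chunk one", "chunk two", "chunk three"],
    "https://www.britannica.com/place/Paestum",
    "paestum-br",
)
print(kwargs["ids"])  # ['paestum-br-01', 'paestum-br-02', 'paestum-br-03']

# Hypothetical usage with a list holding the three Britannica chunks:
# tourism_collection.add(**build_add_kwargs(
#     paestum_chunks, "https://www.britannica.com/place/Paestum", "paestum-br"))
```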

## Q&A

In [4]:
results = tourism_collection.query(
    query_texts=["How many Doric temples are in Paestum"],
    n_results=1
)
print(results)

{'ids': [['paestum-br-03']], 'embeddings': None, 'documents': [['The ancient Greek part of Paestum consists of two sacred areas containing three Doric temples in a remarkable state of preservation. During the ensuing Roman period a typical forum and town layout grew up between the two ancient Greek sanctuaries. Of the three temples, the Temple of Athena (the so-called Temple of Ceres) and the Temple of Hera I (the so-called Basilica) date from the 6th century BC, while the Temple of Hera II (the so-called Temple of Neptune) was probably built about 460 BC and is the best preserved of the three. The Temple of Peace in the forum is a Corinthian-Doric building begun perhaps in the 2nd century BC. Traces of a Roman amphitheatre and other buildings, as well as intersecting main streets, have also been found. The circuit of the town walls, which are built of travertine blocks and are 15–20 feet (5–6 m) thick, is about 3 miles (5 km) in circumference. In July 1969 a farmer uncovered an ancien
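Under the hood, `query()` embeds each query text with the collection's embedding function and returns the stored chunks closest to it. The results come back as nested lists with one inner list per query text, which is why `results['documents'][0][0]` is used later on. The toy sketch below mimics that flow in pure Python so it runs without Chroma. It swaps in bag-of-words vectors and cosine distance for Chroma's real embedding model and distance metric. `embed`, `cosine_distance`, and `toy_query` are hypothetical helpers, not Chroma API.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': bag-of-words term counts (a stand-in for the
    neural embedding model Chroma actually uses)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def toy_query(ids, documents, query_texts, n_results=1):
    """Return the n_results closest documents per query text, laid out
    like Chroma's query() output: one inner list per query text."""
    doc_vectors = [embed(d) for d in documents]
    out = {"ids": [], "documents": [], "distances": []}
    for q in query_texts:
        qv = embed(q)
        # Sort by (distance, position) so ties keep insertion order.
        ranked = sorted(
            (cosine_distance(qv, dv), i) for i, dv in enumerate(doc_vectors)
        )[:n_results]
        out["ids"].append([ids[i] for _, i in ranked])
        out["documents"].append([documents[i] for _, i in ranked])
        out["distances"].append([d for d, _ in ranked])
    return out


chunk_ids = ["paestum-br-01", "paestum-br-02", "paestum-br-03"]
docs = [
    "Paestum is noted for its splendidly preserved Greek temples.",
    "Poseidonia was probably founded about 600 BC by Greek colonists from Sybaris.",
    "Paestum consists of two sacred areas containing three Doric temples.",
]
res = toy_query(chunk_ids, docs, ["How many Doric temples are in Paestum"], n_results=1)
print(res["ids"])  # [['paestum-br-03']]
```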

In [5]:
results = tourism_collection.query(
    query_texts=["How many Doric temples are in Paestum"],
    n_results=3
)
print(results)

{'ids': [['paestum-br-03', 'paestum-br-01', 'paestum-br-02']], 'embeddings': None, 'documents': [['The ancient Greek part of Paestum consists of two sacred areas containing three Doric temples in a remarkable state of preservation. During the ensuing Roman period a typical forum and town layout grew up between the two ancient Greek sanctuaries. Of the three temples, the Temple of Athena (the so-called Temple of Ceres) and the Temple of Hera I (the so-called Basilica) date from the 6th century BC, while the Temple of Hera II (the so-called Temple of Neptune) was probably built about 460 BC and is the best preserved of the three. The Temple of Peace in the forum is a Corinthian-Doric building begun perhaps in the 2nd century BC. Traces of a Roman amphitheatre and other buildings, as well as intersecting main streets, have also been found. The circuit of the town walls, which are built of travertine blocks and are 15–20 feet (5–6 m) thick, is about 3 miles (5 km) in circumference. In July
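Printing the raw dictionary makes it hard to see the ranking. Below is a small sketch that prints one line per hit instead. It assumes the result dict may also carry a `distances` list, as Chroma query results usually do, and falls back to `n/a` when that key is missing or `None`. `summarize_results` is a hypothetical helper.

```python
def summarize_results(results, query_index=0, width=60):
    """One line per hit for a single query text: rank, id, distance
    (when available) and the first `width` characters of the chunk."""
    ids = results["ids"][query_index]
    docs = results["documents"][query_index]
    # Distances may be absent or None if they were excluded from the query.
    all_dists = results.get("distances")
    dists = all_dists[query_index] if all_dists else [None] * len(ids)
    lines = []
    for rank, (doc_id, doc, dist) in enumerate(zip(ids, docs, dists), start=1):
        dist_str = "n/a" if dist is None else f"{dist:.3f}"
        lines.append(f"{rank}. {doc_id} (distance {dist_str}): {doc[:width]}")
    return lines


# Usage with the results dict from the cell above:
# for line in summarize_results(results):
#     print(line)
```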

# RAG from scratch

In [6]:
from openai import OpenAI
import getpass

OPENAI_API_KEY = getpass.getpass('Enter your OPENAI_API_KEY')

Enter your OPENAI_API_KEY ········
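Typing the key at every run works. The OpenAI Python client also reads `OPENAI_API_KEY` from the environment when no `api_key` is passed. Below is a small sketch that prefers the environment variable and falls back to the hidden interactive prompt only when the variable is unset. `get_api_key` is a hypothetical helper.

```python
import getpass
import os


def get_api_key(env_var="OPENAI_API_KEY"):
    """Return the key from the environment if set, otherwise prompt for
    it without echoing it into the notebook."""
    key = os.environ.get(env_var)
    if key:
        return key
    return getpass.getpass(f"Enter your {env_var}: ")


# OPENAI_API_KEY = get_api_key()
```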


In [7]:
openai_client = OpenAI(api_key=OPENAI_API_KEY)

In [8]:
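def query_vector_database(question):
    # Retrieve the single closest chunk for the question.
    results = tourism_collection.query(
        query_texts=[question],
        n_results=1)

    # Top-ranked chunk for the first (and only) query text.
    results_text = results['documents'][0][0]

    return results_text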

In [9]:
results_text = query_vector_database("How many Doric temples are in Paestum")
print(results_text)

The ancient Greek part of Paestum consists of two sacred areas containing three Doric temples in a remarkable state of preservation. During the ensuing Roman period a typical forum and town layout grew up between the two ancient Greek sanctuaries. Of the three temples, the Temple of Athena (the so-called Temple of Ceres) and the Temple of Hera I (the so-called Basilica) date from the 6th century BC, while the Temple of Hera II (the so-called Temple of Neptune) was probably built about 460 BC and is the best preserved of the three. The Temple of Peace in the forum is a Corinthian-Doric building begun perhaps in the 2nd century BC. Traces of a Roman amphitheatre and other buildings, as well as intersecting main streets, have also been found. The circuit of the town walls, which are built of travertine blocks and are 15–20 feet (5–6 m) thick, is about 3 miles (5 km) in circumference. In July 1969 a farmer uncovered an ancient Lucanian tomb that contained Greek frescoes painted in the earl
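`query_vector_database` indexes `results['documents'][0][0]` directly. The first `[0]` selects the first (and only) query text, and the second selects the best-ranked hit. If the hit list ever comes back empty, for example when querying a collection with nothing in it, that indexing raises an `IndexError`. Below is a defensive variant, sketched as a hypothetical helper:

```python
def top_document(results, query_index=0):
    """Return the best-ranked document for one query text, or None when
    there are no hits to return."""
    docs = results.get("documents") or []
    if query_index >= len(docs) or not docs[query_index]:
        return None
    return docs[query_index][0]


# Hypothetical usage inside query_vector_database:
# results_text = top_document(results)
# if results_text is None:
#     results_text = ""  # or handle the "nothing retrieved" case explicitly
```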

## Naive prompt implementation

In [15]:
def prompt_template(question, text):
    return f'Read the following text and answer this question: {question}. \nText: {text}'

In [16]:
def execute_llm_prompt(prompt_input):
    # Send the prompt to the chat model and return the full ChatCompletion object.
    prompt_response = openai_client.chat.completions.create(
        model='gpt-5-nano',
        messages=[
            {"role": "system", "content": "You are an assistant for question-answering tasks."},
            {"role": "user", "content": prompt_input}
        ])
    return prompt_response
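`execute_llm_prompt` returns the whole `ChatCompletion` object, which is why the cells below print ids, token usage and other metadata around the answer. The reply text itself is at `choices[0].message.content`. Below is a tiny accessor, shown against a stand-in object built with `SimpleNamespace` so it runs without an API call. `extract_answer` is a hypothetical helper.

```python
from types import SimpleNamespace


def extract_answer(response):
    """Return the assistant's reply text from a ChatCompletion-like object."""
    return response.choices[0].message.content


# Stand-in with the same attribute layout as a ChatCompletion.
fake = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="I do not know."))]
)
print(extract_answer(fake))  # I do not know.

# With a real response:
# print(extract_answer(tq_prompt_response))
```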

### Trick question

In [17]:
trick_question = "How many columns have the three temples got in total?"
tq_result_text = query_vector_database(trick_question)
tq_prompt = prompt_template(trick_question, tq_result_text)
tq_prompt_response = execute_llm_prompt(tq_prompt)
print(tq_prompt_response)

ChatCompletion(id='chatcmpl-CXXFdWgmhfYVN8ZrqbDl51sqFyCm3', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The text does not specify the number of columns for the three temples, so it cannot be determined from the given passage.', refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None))], created=1762108677, model='gpt-5-nano-2025-08-07', object='chat.completion', service_tier='default', system_fingerprint=None, usage=CompletionUsage(completion_tokens=2273, prompt_tokens=290, total_tokens=2563, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=2240, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0)))
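The usage block shows that `gpt-5-nano` spends most of its output on reasoning: 2,240 of the 2,273 completion tokens are reasoning tokens that never appear in the answer. Below is a small sketch that pulls those figures out of `response.usage`. It is demonstrated on a stand-in object carrying the numbers printed above. `usage_summary` is a hypothetical helper.

```python
from types import SimpleNamespace


def usage_summary(usage):
    """Split completion tokens into hidden reasoning tokens and visible
    answer tokens, using the fields shown in CompletionUsage."""
    details = getattr(usage, "completion_tokens_details", None)
    reasoning = (details.reasoning_tokens or 0) if details else 0
    return {
        "prompt": usage.prompt_tokens,
        "reasoning": reasoning,
        "visible_answer": usage.completion_tokens - reasoning,
        "total": usage.total_tokens,
    }


# Stand-in built from the numbers printed above for the naive prompt.
naive_usage = SimpleNamespace(
    prompt_tokens=290, completion_tokens=2273, total_tokens=2563,
    completion_tokens_details=SimpleNamespace(reasoning_tokens=2240),
)
print(usage_summary(naive_usage))
# {'prompt': 290, 'reasoning': 2240, 'visible_answer': 33, 'total': 2563}

# With a real response:
# print(usage_summary(tq_prompt_response.usage))
```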


## Safer prompt implementation

In [18]:
def prompt_template(question, text):
    # Same prompt text as a single long string, split with implicit concatenation for readability.
    # Note: the final example mirrors the trick question below, so it nudges the model
    # towards refusing on exactly that test.
    return (
        "Use the following pieces of retrieved context to answer the question. "
        "Only use the retrieved context to answer the question. "
        "If you don't know the answer, or the answer is not contained in the "
        "retrieved context, just say that you don't know. "
        "Use three sentences maximum and keep the answer concise. \n"
        f"Question: {question}\n"
        f"Context: {text}. "
        "Remember: if you do not know, just say: I do not know. "
        "Do not make up an answer. "
        "For example do not say the three temples have got a total of three columns. \n"
        "Answer:"
    )
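The safer prompt asks for a fixed refusal phrase ("I do not know"), so the trick-question check can be automated instead of checked by eye. Below is a minimal sketch with a hypothetical `is_refusal` helper. It matches only that exact phrase, so a paraphrased refusal such as the naive prompt's answer above would not count.

```python
import re


def _normalize(text):
    # Trim whitespace, drop trailing full stops / exclamation marks, lower-case.
    return re.sub(r"[.!\s]+$", "", text.strip()).lower()


def is_refusal(answer, refusal="I do not know"):
    """True if the answer is the agreed refusal phrase, ignoring case,
    surrounding whitespace and trailing punctuation."""
    return _normalize(answer) == _normalize(refusal)


print(is_refusal("I do not know."))  # True

# With a real response:
# is_refusal(tq_prompt_response.choices[0].message.content)
```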

### Trick question

In [19]:
trick_question = "How many columns have the three temples got in total?"
tq_result_text = query_vector_database(trick_question)
tq_prompt = prompt_template(trick_question, tq_result_text)
tq_prompt_response = execute_llm_prompt(tq_prompt)
print(tq_prompt_response)

ChatCompletion(id='chatcmpl-CXXHXdUbuQn2fjLyl8YWuRPMS9Wo1', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='I do not know.', refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None))], created=1762108795, model='gpt-5-nano-2025-08-07', object='chat.completion', service_tier='default', system_fingerprint=None, usage=CompletionUsage(completion_tokens=206, prompt_tokens=382, total_tokens=588, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=192, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0)))


## Building a chatbot

In [20]:
def my_chatbot(question):
    results_text = query_vector_database(question) #A
    prompt_input = prompt_template(question, results_text) #B
    prompt_output = execute_llm_prompt(prompt_input) #C

    return prompt_output
#A Retrieve content from vector store
#B Create LLM prompt
#C Execute LLM prompt
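`my_chatbot` hard-wires the three steps (retrieve, build prompt, call the LLM) to the notebook's globals. Below is a sketch of the same pipeline with the steps passed in as callables. This makes it possible to test the chatbot offline or swap in a different prompt template without editing the function. `make_chatbot` and `echo_bot` are hypothetical names.

```python
def make_chatbot(retrieve, build_prompt, complete):
    """Compose the three RAG steps into one chatbot function."""
    def chatbot(question):
        context = retrieve(question)              # retrieve content from the vector store
        prompt = build_prompt(question, context)  # create the LLM prompt
        return complete(prompt)                   # execute the LLM prompt
    return chatbot


# Hypothetical wiring with the functions defined in this notebook:
# my_chatbot = make_chatbot(query_vector_database, prompt_template, execute_llm_prompt)

# Offline check with stand-in steps (no Chroma or OpenAI calls):
echo_bot = make_chatbot(
    retrieve=lambda q: "Paestum has three Doric temples.",
    build_prompt=lambda q, ctx: f"Q: {q}\nContext: {ctx}",
    complete=lambda prompt: prompt.upper(),
)
print(echo_bot("How many temples?"))
```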

In [21]:
question = """Let me know how many temples there
are in Paestum, who constructed them, and what 
architectural style they are"""
result = my_chatbot(question)
print(result)

ChatCompletion(id='chatcmpl-CXXHfSGHtttuDI8CjykXR4AHJjNXY', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The ancient Greek part of Paestum contains three temples, all built in the Doric style.  \nThe retrieved context does not specify who constructed them.  \nThey are the Temple of Athena (Temple of Ceres), the Temple of Hera I (Basilica), and the Temple of Hera II (Temple of Neptune).', refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None))], created=1762108803, model='gpt-5-nano-2025-08-07', object='chat.completion', service_tier='default', system_fingerprint=None, usage=CompletionUsage(completion_tokens=1546, prompt_tokens=398, total_tokens=1944, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=1472, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0)))
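The chatbot reports that the context does not say who built the temples. With `n_results=1`, only chunk `paestum-br-03` reaches the model, while `paestum-br-02` notes that Poseidonia was founded by Greek colonists from Sybaris. One possible variant retrieves the top `n_results` chunks and joins them into a single context string. The sketch below assumes a few extra chunks fit comfortably in the prompt. `query_vector_database_k` and `StubCollection` are hypothetical names, and the collection is passed in as an argument so the function can be exercised with a stand-in object.

```python
def query_vector_database_k(collection, question, n_results=3, separator="\n\n"):
    """Return the top `n_results` chunks for `question`, joined into one
    context string in rank order."""
    results = collection.query(query_texts=[question], n_results=n_results)
    # results["documents"] holds one inner list per query text; we sent one.
    return separator.join(results["documents"][0])


class StubCollection:
    """Minimal stand-in that mimics the shape of Collection.query() output."""

    def __init__(self, documents):
        self.documents = documents

    def query(self, query_texts, n_results):
        return {"documents": [self.documents[:n_results] for _ in query_texts]}


stub = StubCollection(["chunk 03 text", "chunk 01 text", "chunk 02 text"])
print(query_vector_database_k(stub, "Who built the temples?", n_results=2))

# With the real collection (hypothetical usage):
# context = query_vector_database_k(tourism_collection, question)
# result = execute_llm_prompt(prompt_template(question, context))
```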
