In [1]:
from dotenv import load_dotenv

# load environment variables from .env file
assert load_dotenv(), 'Failed to load environment variables'

In [2]:
import os

from openai import OpenAI

# deepseek for llm client
client = OpenAI(base_url='https://api.deepseek.com', api_key=os.getenv('DEEPSEEK_API_KEY'))

# test the client
messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': 'Say this is a test.'},
]
response = client.chat.completions.create(model='deepseek-chat', messages=messages)
print(response.choices[0].message.content)

Got it! Let me know how I can assist you with this test. ðŸ˜Š


In [3]:
# ducument for embeddings
DOCUMENT = 'The Olympic Games are an international multi-sport event held every four years, bringing together athletes from around the world to compete in a wide range of sports. Originating in ancient Greece around 776 BCE, the modern Olympics were revived in 1896 by Pierre de Coubertin. The Games are divided into the Summer and Winter Olympics, with each occurring every four years in alternating cycles. They promote values of excellence, friendship, and respect, fostering unity and sportsmanship among nations. The Olympics feature iconic traditions such as the Olympic flame, the opening and closing ceremonies, and the awarding of gold, silver, and bronze medals.'

Before embedding, documents which are too large should be chunked for better retrieval. Several tools can be used for chunking.
* For LLM-based apps: `Tiktoken`
* For semantic chunking: `NLTK`
* For simple fix-size chunking: `Textwrap`
* For token-aware chunking: `Hugging Face Tokenizers`

In [4]:
import chromadb

# chromedb for vector store
vector_store = chromadb.Client()
collection = vector_store.get_or_create_collection('olympics')

In [5]:
# embed the document
collection.upsert(ids='intro', documents=DOCUMENT)

In [6]:
# peek the collection
print(collection.peek())

{'ids': ['intro'], 'embeddings': array([[ 1.79688800e-02,  1.18339844e-01,  6.72155842e-02,
        -7.08082020e-02, -4.75794729e-03,  2.44840644e-02,
        -7.89013319e-03, -6.69706240e-02, -2.07181112e-03,
         3.21410187e-02, -1.26450405e-01,  5.10243326e-02,
        -4.60675955e-02,  5.82205541e-02,  1.88843417e-03,
        -3.16090584e-02, -5.11947274e-02,  6.29511476e-02,
        -7.85850063e-02, -6.17706217e-02,  9.65088084e-02,
        -1.02336958e-01, -6.96366951e-02, -6.96109794e-03,
        -7.27820471e-02,  8.85843933e-02, -5.08588366e-02,
         4.81406413e-03,  4.74892417e-03,  1.27982005e-01,
        -5.03585525e-02, -4.03356180e-02,  3.19394246e-02,
         1.70302726e-02, -5.66602983e-02,  1.06500849e-01,
        -2.88458522e-02, -2.86371671e-02, -6.58168569e-02,
         7.74316415e-02,  1.14370380e-02, -1.39400680e-02,
         2.39230413e-02,  1.29687097e-02, -2.02092272e-03,
         1.03467621e-01,  1.33820921e-02, -5.74484700e-03,
        -2.58494318e-02

In [7]:
# retrieve the document
results = collection.query(query_texts='introduction of the olympics', n_results=1)
print(f'Query Results: {results}')

Query Results: {'ids': [['intro']], 'embeddings': None, 'documents': [['The Olympic Games are an international multi-sport event held every four years, bringing together athletes from around the world to compete in a wide range of sports. Originating in ancient Greece around 776 BCE, the modern Olympics were revived in 1896 by Pierre de Coubertin. The Games are divided into the Summer and Winter Olympics, with each occurring every four years in alternating cycles. They promote values of excellence, friendship, and respect, fostering unity and sportsmanship among nations. The Olympics feature iconic traditions such as the Olympic flame, the opening and closing ceremonies, and the awarding of gold, silver, and bronze medals.']], 'uris': None, 'data': None, 'metadatas': [[None]], 'distances': [[0.6842873096466064]], 'included': [<IncludeEnum.distances: 'distances'>, <IncludeEnum.documents: 'documents'>, <IncludeEnum.metadatas: 'metadatas'>]}


In [8]:
# ask the LLM with the context retrieved from vector store
context = results['documents'][0][0]
prompt = 'You are a hepful assistant. Answer to user accroding to the following context as your learned knowledge.\n\nContext: {context}'
messages = [
    {'role': 'system', 'content': prompt},
    {'role': 'user', 'content': 'What are the Olympic Games?'},
]
response = client.chat.completions.create(model='deepseek-reasoner', messages=messages)
print(response.choices[0].message.content)

The Olympic Games are a premier international multi-sport event featuring Summer and Winter editions, where thousands of athletes from over 200 nations compete. Regarded as the world's foremost sports competition, they are held every four years, with Summer and Winter Games alternating every two years. 

Inspired by the ancient Olympic Games in Olympia, Greece (8th century BC to 4th century AD), the modern Olympics were revived by Baron Pierre de Coubertin, who founded the International Olympic Committee (IOC) in 1894. The first modern Games took place in 1896. Governed by the IOC, the Olympics have expanded to include the Winter Olympics (for snow and ice sports), the Paralympics (for athletes with disabilities), and the Youth Olympic Games (for athletes aged 14â€“18). 

The Games also significantly impact host cities, influencing their economic, social, and environmental landscapes. Over time, the Olympics have evolved to reflect changing global contexts while maintaining their core 

In [9]:
# reasoning content of reasoning model
print(response.choices[0].message.reasoning_content)

Okay, I need to answer the question "What are the Olympic Games?" based on the provided context. Let me first read through the context carefully.

The context mentions that the Olympic Games are a major international multi-sport event with summer and winter editions, where thousands of athletes participate. They're considered the world's foremost sports competition with over 200 nations. The Games are held every four years, alternating between Summer and Winter every two years. Their creation was inspired by the ancient Olympic Games in Olympia, Greece, from the 8th century BC to the 4th century AD. Baron Pierre de Coubertin founded the International Olympic Committee (IOC) in 1894, leading to the first modern Games in 1896. The IOC governs the Olympic Movement, and the Games have evolved with changes like the Winter Olympics, Paralympics, and Youth Olympics. Also, the Olympics influence host cities' economic, social, and environmental aspects.

Alright, so the user is asking for an ex