In [1]:
"""
Example usage of the optimized LLM Model class.

This script demonstrates how to use the Model class for text generation
with different configurations including chat-based generation.
"""


from llm import Model




In [2]:

model = Model("meta-llama/Llama-3.2-3B-Instruct")


2025-05-14 22:45:18,608 - llm - INFO - Initializing model: meta-llama/Llama-3.2-3B-Instruct
2025-05-14 22:45:18,618 - llm - INFO - Using device: mps
2025-05-14 22:45:18,998 - llm - INFO - Pad token not found, using EOS token as pad token
2025-05-14 22:45:18,999 - llm - INFO - Tokenizer loaded: PreTrainedTokenizerFast



2025-05-14 22:45:30,555 - llm - INFO - Model loaded: LlamaForCausalLM
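The log lines above suggest that `Model` chooses an accelerator ("Using device: mps") and reuses the EOS token when the tokenizer has no pad token. The `llm` module's source is not shown here, so the following is only a minimal sketch of those two steps, assuming a PyTorch / Hugging Face `transformers` backend. `select_device` and `ensure_pad_token` are hypothetical helper names, and the CUDA, then MPS, then CPU preference order is an assumption.

```python
from types import SimpleNamespace


def select_device() -> str:
    """Hypothetical helper: return the name of the best available PyTorch device."""
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is no accelerator to use.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # Apple-silicon GPUs are exposed through PyTorch's MPS backend.
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"


def ensure_pad_token(tokenizer):
    """Hypothetical helper: fall back to the EOS token when no pad token is defined."""
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    return tokenizer


# Stand-in object for a tokenizer without a pad token; "<eos>" is a placeholder string.
tok = ensure_pad_token(SimpleNamespace(pad_token=None, eos_token="<eos>"))
print(select_device(), tok.pad_token)
```

Padding with EOS is a common workaround for decoder-only models that ship without a dedicated pad token; the attention mask keeps the padded positions from influencing generation.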


In [3]:
response = model.generate(
    prompt="Explain to me a brand new take on the classic trolley problem.",
    max_new_tokens=450,
    temperature=0.7,
    typical_p=0.9,
    # If these are forwarded to Hugging Face generate(), length_penalty only
    # affects beam-based decoding and has no effect when sampling.
    length_penalty=0.8,
    # top_p=0.95,
    # top_k=50,
    no_repeat_ngram_size=3,
    repetition_penalty=1.2,
    messages=[
        {"role": "system", "content": "You are an assistant. You are skilled in philosophy, and love to play devil's advocate."},
    ],
)

print(response)




The Trolley Problem - a thought-provoking conundrum that has been debated by philosophers for decades. Here's my attempt at presenting a fresh spin:

**"The Multi-Lane Highway Conundrum":**

Imagine you're driving down a highway with three lanes of traffic ahead. Suddenly, your car breaks down, leaving it stranded between two large trucks traveling at high speeds (approximately 60 mph). The only way to escape is to divert onto a narrow, winding road that splits off from one side.

However, as you assess your situation, you notice that there are five pedestrians walking along this narrow road. One pedestrian, let's call her Sarah, is carrying a child who requires oxygen therapy due to a severe respiratory condition. She needs assistance immediately to breathe; without help, she will succumb within minutes.

As you weigh your options, another person standing nearby reveals themselves to be an expert engineer who can repair your broken-down vehicle using advanced technology in under an ho
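Two of the arguments in the call above, `repetition_penalty` and `no_repeat_ngram_size`, reshape the next-token scores (logits) before sampling. Assuming `Model.generate` forwards them to Hugging Face `generate()` (the `llm` source is not shown), a pure-Python sketch of their effect at a single decoding step looks like this. The function names are illustrative rather than the library's API, and real implementations operate on batched tensors.

```python
import math


def apply_repetition_penalty(logits: list[float], seen: set[int], penalty: float) -> list[float]:
    """Lower the logits of tokens that already appear in the context."""
    out = list(logits)
    for tok in seen:
        # Multiplying a negative logit or dividing a positive one both push it down.
        out[tok] = out[tok] * penalty if out[tok] < 0 else out[tok] / penalty
    return out


def banned_ngram_tokens(tokens: list[int], n: int) -> set[int]:
    """Return tokens that would complete an n-gram already present in `tokens`."""
    if n < 1 or len(tokens) + 1 < n:
        return set()
    prefix = tuple(tokens[len(tokens) - n + 1:])  # the last n-1 tokens
    banned = set()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned


def mask_tokens(logits: list[float], banned: set[int]) -> list[float]:
    """Give banned tokens a logit of -inf so they can never be sampled."""
    return [-math.inf if i in banned else x for i, x in enumerate(logits)]


context = [5, 7, 9, 5, 7]  # toy token ids generated so far
logits = [0.5, -1.0, 2.0, 0.0, 0.0, -0.5, 0.0, 3.0, 0.0, 2.4]  # toy next-token scores
scores = apply_repetition_penalty(logits, set(context), penalty=1.2)
scores = mask_tokens(scores, banned_ngram_tokens(context, n=3))
# Token 9 is now impossible: "5 7 9" already occurred and the context ends in "5 7".
```

The repetition penalty only makes repeats less likely, while the n-gram ban makes a repeated trigram impossible, which is why `no_repeat_ngram_size=3` can also block legitimate repeats such as a name mentioned twice.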

In [9]:
model.db_conn.execute("""
SELECT * FROM messages
""")
df = model.db_conn.fetch_df()
df

   message_id                                         session_id                                         role       content                                            created_date
0  d17ec2e35b786f898552e2eb2796e7ec7a433941cfcda3...  a22401dea2116e0ac75beb9a34d827ab85ef60f0696d84...  user       Explain to me a brand new take on the classic ...  2025-05-14 22:46:14.361
1  231fcd42a089dabde51d921cfe92d11d150d286d903b2b...  a22401dea2116e0ac75beb9a34d827ab85ef60f0696d84...  assistant  The Trolley Problem - a thought-provoking conu...  2025-05-14 22:46:14.389
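The query shows one row per chat turn (user, then assistant) sharing a `session_id`; the system message is not among the stored rows. The `fetch_df()` call suggests a DuckDB connection, but the table definition is not shown. The sketch below uses the standard-library `sqlite3` as a stand-in and infers the schema from the columns above. `log_message` is a hypothetical helper, and hashing ids with SHA-256 is an assumption based only on the ids looking like long hex digests.

```python
import hashlib
import sqlite3
from datetime import datetime


def log_message(conn: sqlite3.Connection, session_id: str, role: str, content: str) -> str:
    """Hypothetical helper: append one chat turn and return its id."""
    created = datetime.now().isoformat(sep=" ", timespec="milliseconds")
    # Assumption: ids are SHA-256 hex digests; the real hashing inputs are unknown.
    message_id = hashlib.sha256(f"{session_id}|{role}|{content}|{created}".encode()).hexdigest()
    conn.execute(
        "INSERT INTO messages VALUES (?, ?, ?, ?, ?)",
        (message_id, session_id, role, content, created),
    )
    return message_id


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE messages (
        message_id   TEXT PRIMARY KEY,
        session_id   TEXT NOT NULL,
        role         TEXT NOT NULL,
        content      TEXT NOT NULL,
        created_date TEXT NOT NULL
    )"""
)
session = hashlib.sha256(b"demo-session").hexdigest()
log_message(conn, session, "user", "Explain to me a brand new take on the classic trolley problem.")
log_message(conn, session, "assistant", "The Trolley Problem - a thought-provoking conundrum ...")

# Replay one conversation in order; rowid breaks ties between identical timestamps.
rows = conn.execute(
    "SELECT role, content FROM messages WHERE session_id = ? ORDER BY created_date, rowid",
    (session,),
).fetchall()
```

Filtering on `session_id` and ordering by time is what would let a later call rebuild the `messages` list for a multi-turn conversation.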


In [5]:
from vector_db import VectorDB

# Create and clear the database
db = VectorDB()
db.clear_db()

# Add documents
db.add_document("This is a document about artificial intelligence.")
db.add_document("Vector databases are useful for similarity search.")

# Add multiple documents at once
db.add_documents([
    "Python is a popular programming language.",
    "Embedding models convert text to vectors.",
    "FAISS is a library for efficient similarity search."
])



2025-05-14 22:46:14,570 - faiss.loader - INFO - Loading faiss.
2025-05-14 22:46:14,663 - faiss.loader - INFO - Successfully loaded faiss.
2025-05-14 22:46:14,670 - faiss - INFO - Failed to load GPU Faiss: name 'GpuIndexIVFFlat' is not defined. Will not load constructor refs for GPU indexes. This is only an error if you're trying to use GPU Faiss.
2025-05-14 22:46:14,791 - datasets - INFO - PyTorch version 2.7.0 available.
2025-05-14 22:46:14,792 - datasets - INFO - Duckdb version 1.2.2 available.
2025-05-14 22:46:15,070 - sentence_transformers.SentenceTransformer - INFO - Use pytorch device_name: mps
2025-05-14 22:46:15,070 - sentence_transformers.SentenceTransformer - INFO - Load pretrained SentenceTransformer: all-MiniLM-L6-v2


Loaded existing database with 5 documents
Database cleared



[2, 3, 4]
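The `[2, 3, 4]` output appears to be the ids assigned to the three batched documents, after the two single `add_document` calls took 0 and 1. The logs show that `VectorDB` loads FAISS and the `all-MiniLM-L6-v2` sentence-transformer, but not how it indexes. The following is a minimal brute-force sketch of what a flat inner-product index over normalized vectors does (FAISS's `IndexFlatIP` is the usual choice for this). `ToyVectorStore` is hypothetical, and the 3-dimensional vectors are hand-picked toys, not real embeddings (`all-MiniLM-L6-v2` produces 384-dimensional ones).

```python
import math


class ToyVectorStore:
    """Hypothetical in-memory store: exhaustive cosine-similarity search."""

    def __init__(self) -> None:
        self.texts: list[str] = []
        self.vectors: list[list[float]] = []

    @staticmethod
    def _normalize(v: list[float]) -> list[float]:
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    def add_documents(self, items: list[tuple[str, list[float]]]) -> list[int]:
        """Store (text, vector) pairs and return their sequential ids."""
        ids = []
        for text, vec in items:
            ids.append(len(self.texts))
            self.texts.append(text)
            self.vectors.append(self._normalize(vec))
        return ids

    def search(self, query_vec: list[float], k: int = 3) -> list[tuple[str, float]]:
        q = self._normalize(query_vec)
        # On unit vectors the dot product equals cosine similarity.
        scored = [(t, sum(a * b for a, b in zip(q, v))) for t, v in zip(self.texts, self.vectors)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = ToyVectorStore()
first_ids = store.add_documents([
    ("This is a document about artificial intelligence.", [1.0, 0.0, 0.0]),
    ("Vector databases are useful for similarity search.", [0.0, 1.0, 0.0]),
])
ids = store.add_documents([
    ("Python is a popular programming language.", [0.0, 0.0, 1.0]),
    ("Embedding models convert text to vectors.", [0.5, 0.5, 0.0]),
    ("FAISS is a library for efficient similarity search.", [0.0, 1.0, 0.2]),
])
hits = store.search([1.0, 0.1, 0.0], k=2)
```

Exhaustive search is exact and simple; approximate FAISS indexes trade some recall for speed once the collection grows large.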

In [6]:

results = db.search("How do AI systems work specifically with FAISS?", k=3)
results




[('This is a document about artificial intelligence.', np.float32(0.73847854)),
 ('FAISS is a library for efficient similarity search.',
  np.float32(0.64280427)),
 ('Python is a popular programming language.', np.float32(0.60480684))]

In [7]:
results2 = db.search("pink unicorn", k=3)
results2


[('Python is a popular programming language.', np.float32(0.57015264)),
 ('This is a document about artificial intelligence.', np.float32(0.5423031)),
 ('Vector databases are useful for similarity search.', np.float32(0.5270234))]
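The "pink unicorn" query shows that nearest-neighbour search always returns `k` results, even when nothing in the store is relevant: its best score (0.570) is only a little below the weakest hit for the FAISS question (0.605). A minimal sketch of a relevance cut-off follows, assuming higher scores mean closer matches, as the rankings above suggest. `filter_hits` and the 0.62 threshold are hypothetical; a real threshold would need tuning on representative queries.

```python
def filter_hits(hits: list[tuple[str, float]], min_score: float) -> list[tuple[str, float]]:
    """Keep only (text, score) pairs whose score is at least `min_score`."""
    return [(text, score) for text, score in hits if score >= min_score]


# Scores copied from the two searches above, as plain floats.
faiss_query = [
    ("This is a document about artificial intelligence.", 0.73847854),
    ("FAISS is a library for efficient similarity search.", 0.64280427),
    ("Python is a popular programming language.", 0.60480684),
]
unicorn_query = [
    ("Python is a popular programming language.", 0.57015264),
    ("This is a document about artificial intelligence.", 0.5423031),
    ("Vector databases are useful for similarity search.", 0.5270234),
]
print(len(filter_hits(faiss_query, 0.62)), len(filter_hits(unicorn_query, 0.62)))  # 2 0
```

The same filter works directly on the `np.float32` scores returned by `db.search`, since they compare with Python floats.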