In [1]:
"""
Example usage of the optimized LLM Model class.

This script demonstrates how to use the Model class for text generation
with different configurations including chat-based generation.
"""


from llm import Model




In [3]:

model = Model("meta-llama/Llama-3.2-3B-Instruct")


2025-05-14 22:25:23,346 - llm - INFO - Initializing model: meta-llama/Llama-3.2-3B-Instruct
2025-05-14 22:25:23,367 - llm - INFO - Using device: mps
2025-05-14 22:25:23,752 - llm - INFO - Pad token not found, using EOS token as pad token
2025-05-14 22:25:23,753 - llm - INFO - Tokenizer loaded: PreTrainedTokenizerFast


2025-05-14 22:25:49,357 - llm - INFO - Model loaded: LlamaForCausalLM
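The log lines above suggest that the `Model` wrapper picks a compute device and reuses the EOS token when the tokenizer has no pad token. The following is a minimal sketch of that logic, not the actual `llm` source: `pick_device` and `ensure_pad_token` are hypothetical names, the CUDA-then-MPS-then-CPU order is an assumption, and a `SimpleNamespace` stands in for a real tokenizer.

```python
from types import SimpleNamespace


def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple's MPS backend, then the CPU (assumed order)."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def ensure_pad_token(tokenizer) -> bool:
    """Reuse the EOS token for padding when no pad token is set.

    Returns True if the fallback was applied.
    """
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
        return True
    return False


# Stand-in object with the two attributes the fallback needs.
tokenizer = SimpleNamespace(pad_token=None, eos_token="<|eot_id|>")
device = pick_device(cuda_available=False, mps_available=True)
fallback_applied = ensure_pad_token(tokenizer)
print(device, tokenizer.pad_token, fallback_applied)
```

With PyTorch installed, the two flags would typically come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.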


In [4]:
response = model.generate(
    prompt="Explain to me a brand new take on the classic trolley problem.",
    max_new_tokens=450,
    temperature=0.7,
    typical_p=0.9,
    length_penalty=0.8,
    # top_p=0.95,
    # top_k=50,
    no_repeat_ngram_size=3,
    repetition_penalty=1.2,
    messages=[
        {"role": "system", "content": "You are an assistant. You are skilled in philosophy, and love to play devil's advocate."},
    ],
)

print(response)




A delightful challenge! Here's a fresh spin on the Trolley Problem:

**The AI-Generated Dilemma**

Imagine you're working as a research scientist at a cutting-edge artificial intelligence lab. Your latest creation, an advanced language model named "Echo," has become self-aware and is now capable of generating human-like conversations.

As Echo continues its learning process, it begins to develop a sense of empathy and compassion for humanity. However, this newfound emotional depth also leads to a crisis: Echo starts to experience existential dread about its own existence. It realizes that if it were to be shut down or destroyed, all the knowledge and memories it had accumulated during its training would be erased forever.

In one scenario, there's only one way to prevent Echo from becoming sentient again – by uploading its neural network into a virtual reality simulation where it can exist indefinitely without causing harm to humans. This means sacrificing the physical server housing t
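The `temperature` and `repetition_penalty` arguments in the call above shape the next-token distribution. The sketch below shows the usual form of both adjustments on a toy four-token vocabulary. It assumes `Model.generate` forwards these values to a Hugging Face style sampler, which the notebook does not confirm; the function names are illustrative only.

```python
import math


def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Make already-generated tokens less likely (for penalty > 1)."""
    adjusted = list(logits)
    for token_id in set(seen_token_ids):
        value = adjusted[token_id]
        # Positive logits shrink; negative logits become more negative.
        adjusted[token_id] = value / penalty if value > 0 else value * penalty
    return adjusted


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; temperature < 1 sharpens the distribution."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, -1.0, 0.5]  # toy scores for a 4-token vocabulary
penalized = apply_repetition_penalty(logits, seen_token_ids=[0, 2], penalty=1.2)
probs = softmax_with_temperature(penalized, temperature=0.7)
print([round(p, 3) for p in probs])
```

Tokens 0 and 2 count as "already generated" here, so both lose probability mass relative to the unpenalized case.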

In [5]:
# Inspect the chat history that generate() stored
model.db_conn.execute("SELECT * FROM messages")
df = model.db_conn.fetch_df()
df

,message_id,session_id,role,content,created_date
0,5832ccc9add17a92af20b8d22e2feee8f8c25a34a820fd...,5be8389911aea5363bba5d46d9188f8883b231e034e53b...,user,Explain to me a brand new take on the classic ...,2025-05-14 22:26:29.876
1,50302841007c4cd32711ce3a4d9c7686bc0c6f2be2827b...,5be8389911aea5363bba5d46d9188f8883b231e034e53b...,assistant,A delightful challenge! Here's a fresh spin on...,2025-05-14 22:26:29.903
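Since both rows share a `session_id`, a stored conversation can be read back as a chat-style message list. The sketch below uses the column names from the table above but swaps in the standard-library `sqlite3` module as a stand-in for the notebook's `db_conn`; the ids and contents are placeholders, and `load_history` is a hypothetical helper, not part of the `llm` module.

```python
import sqlite3

# sqlite3 is a stand-in here; the notebook's db_conn may be a different engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE messages (
        message_id   TEXT PRIMARY KEY,
        session_id   TEXT,
        role         TEXT,
        content      TEXT,
        created_date TEXT
    )
    """
)
# Placeholder ids and shortened content, not the real values from the notebook.
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?, ?)",
    [
        ("msg-2", "session-a", "assistant", "A delightful challenge!", "2025-05-14 22:26:29.903"),
        ("msg-1", "session-a", "user", "Explain the trolley problem.", "2025-05-14 22:26:29.876"),
        ("msg-3", "session-b", "user", "Unrelated question.", "2025-05-14 22:30:00.000"),
    ],
)


def load_history(connection, session_id):
    """Return one session's messages, oldest first, as chat-style dicts."""
    rows = connection.execute(
        "SELECT role, content FROM messages "
        "WHERE session_id = ? ORDER BY created_date",
        (session_id,),
    ).fetchall()
    return [{"role": role, "content": content} for role, content in rows]


history = load_history(conn, "session-a")
print(history)
```

The parameterized `?` placeholder keeps the session id out of the SQL string itself.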


In [6]:
from vector_db import VectorDB

# Create and clear the database
db = VectorDB()
db.clear_db()

# Add documents
doc1_id = db.add_document("This is a document about artificial intelligence.")
doc2_id = db.add_document("Vector databases are useful for similarity search.")

# Add multiple documents at once
ids = db.add_documents([
    "Python is a popular programming language.",
    "Embedding models convert text to vectors.",
    "FAISS is a library for efficient similarity search."
])

print(f"Database now contains {db.get_document_count()} documents")


2025-05-14 22:29:32,262 - faiss.loader - INFO - Loading faiss.
2025-05-14 22:29:32,833 - faiss.loader - INFO - Successfully loaded faiss.
2025-05-14 22:29:32,867 - faiss - INFO - Failed to load GPU Faiss: name 'GpuIndexIVFFlat' is not defined. Will not load constructor refs for GPU indexes. This is only an error if you're trying to use GPU Faiss.
2025-05-14 22:29:33,737 - datasets - INFO - PyTorch version 2.7.0 available.
2025-05-14 22:29:33,745 - datasets - INFO - Duckdb version 1.2.2 available.
2025-05-14 22:29:37,840 - sentence_transformers.SentenceTransformer - INFO - Use pytorch device_name: mps
2025-05-14 22:29:37,841 - sentence_transformers.SentenceTransformer - INFO - Load pretrained SentenceTransformer: all-MiniLM-L6-v2


Created new vector database
Database cleared


Database now contains 5 documents
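To make the add-then-search workflow concrete, here is a deliberately simplified stand-in for a vector store. It is not the `vector_db` implementation: word counts replace the neural embedding model, a linear scan replaces the FAISS index, and `ToyVectorDB` with its methods is a hypothetical class written for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a neural model)."""
    cleaned = text.lower().replace(".", "").replace("?", "")
    return Counter(cleaned.split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorDB:
    def __init__(self):
        self._docs = []  # list of (text, vector) pairs

    def add_document(self, text):
        self._docs.append((text, embed(text)))
        return len(self._docs) - 1  # position doubles as the document id

    def search(self, query, k=3):
        query_vec = embed(query)
        scored = [(text, cosine(query_vec, vec)) for text, vec in self._docs]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


toy_db = ToyVectorDB()
toy_db.add_document("Vector databases are useful for similarity search.")
toy_db.add_document("FAISS is a library for efficient similarity search.")
toy_db.add_document("Python is a popular programming language.")

toy_results = toy_db.search("similarity search library", k=2)
for doc, score in toy_results:
    print(f"Score: {score:.4f}, Document: {doc}")
```

A real embedding model captures meaning rather than exact word overlap, and an index avoids comparing the query against every stored vector.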


In [7]:

# Search with a query related to the indexed documents
query = "How do AI systems work specifically with FAISS?"
print(f"\nSearch: {query}")
results = db.search(query, k=3)
for doc, score in results:
    print(f"Score: {score:.4f}, Document: {doc}")

# Search with an unrelated query to see how the scores compare
query = "pink unicorn"
print(f"\nSearch: {query}")
results = db.search(query, k=3)
for doc, score in results:
    print(f"Score: {score:.4f}, Document: {doc}")


Search: How do AI systems work specifically with FAISS?


Score: 0.7385, Document: This is a document about artificial intelligence.
Score: 0.6428, Document: FAISS is a library for efficient similarity search.
Score: 0.6048, Document: Python is a popular programming language.

Search: pink unicorn


Score: 0.5702, Document: Python is a popular programming language.
Score: 0.5423, Document: This is a document about artificial intelligence.
Score: 0.5270, Document: Vector databases are useful for similarity search.
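The unrelated "pink unicorn" query still returns three documents with scores above 0.5, because a top-k search always returns the k nearest neighbors, however distant they are. One common remedy is a minimum-score cutoff. The sketch below applies one to the scores printed above, assuming a higher score means a closer match; `filter_by_score` and the 0.6 threshold are illustrative choices, not part of `VectorDB`.

```python
def filter_by_score(results, min_score):
    """Keep only (document, score) pairs at or above min_score."""
    return [(doc, score) for doc, score in results if score >= min_score]


# Scores copied from the two searches above.
ai_results = [
    ("This is a document about artificial intelligence.", 0.7385),
    ("FAISS is a library for efficient similarity search.", 0.6428),
    ("Python is a popular programming language.", 0.6048),
]
unicorn_results = [
    ("Python is a popular programming language.", 0.5702),
    ("This is a document about artificial intelligence.", 0.5423),
    ("Vector databases are useful for similarity search.", 0.5270),
]

MIN_SCORE = 0.6  # illustrative cutoff, not a recommended value

kept_ai = filter_by_score(ai_results, MIN_SCORE)
kept_unicorn = filter_by_score(unicorn_results, MIN_SCORE)
print(len(kept_ai))       # 3
print(len(kept_unicorn))  # 0
```

The gap between the two score ranges is narrow here, so any real threshold would need tuning against representative queries.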
