# Setup Verification Notebook

Run all cells in this notebook to verify your environment is set up correctly.

## What This Tests:
1. Python dependencies
2. Ollama LLM connection
3. Embedding model
4. ChromaDB vector database
5. Basic LLM interaction
6. File system access

If all cells run successfully, you're ready to start the project!

---
## Test 1: Import Required Libraries

In [21]:
import sys
print(f"Python version: {sys.version}")
print()

Python version: 3.13.9 | packaged by conda-forge | (main, Oct 22 2025, 23:12:41) [MSC v.1944 64 bit (AMD64)]



In [22]:
# Test imports and record any that fail
failed = []

try:
    import chromadb
    print("✓ ChromaDB imported successfully")
except ImportError as e:
    failed.append("chromadb")
    print("✗ ChromaDB import failed:", e)
    print("  Install with: pip install chromadb")

try:
    from sentence_transformers import SentenceTransformer
    print("✓ Sentence Transformers imported successfully")
except ImportError as e:
    failed.append("sentence-transformers")
    print("✗ Sentence Transformers import failed:", e)
    print("  Install with: pip install sentence-transformers")

try:
    import requests
    print("✓ Requests imported successfully")
except ImportError as e:
    failed.append("requests")
    print("✗ Requests import failed:", e)
    print("  Install with: pip install requests")

# Report overall success only if every import worked
if failed:
    print(f"\n✗ Missing dependencies: {', '.join(failed)}")
else:
    print("\n✓ All dependencies imported successfully!")

✓ ChromaDB imported successfully
✓ Sentence Transformers imported successfully
✓ Requests imported successfully

✓ All dependencies imported successfully!


---
## Test 2: Ollama LLM Connection

This test checks if Ollama is running and accessible.

In [23]:
import requests
import json

OLLAMA_URL = "http://127.0.0.1:11434"

# Test connection
try:
    response = requests.get(f"{OLLAMA_URL}/api/tags", timeout=5)
    if response.status_code == 200:
        print("✓ Ollama is running and accessible")
        
        # List available models
        models = response.json().get('models', [])
        if models:
            print("\nAvailable models:")
            for model in models:
                print(f"  - {model['name']}")
        else:
            print("\n⚠ No models found. You may need to pull a model.")
            print("  Run: docker exec ollama-mistral-offline ollama pull mistral:7b")
    else:
        print(f"✗ Ollama returned status code: {response.status_code}")
except requests.exceptions.ConnectionError:
    print("✗ Cannot connect to Ollama at", OLLAMA_URL)
    print("\nMake sure:")
    print("  1. Docker Desktop is running")
    print("  2. Ollama container is started")
    print("  3. Check with: docker ps")
    print("\nSee COMMANDS.txt for Docker setup instructions.")
except Exception as e:
    print(f"✗ Error: {e}")

✓ Ollama is running and accessible

Available models:
  - mistral:7b
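
Test 5 assumes that `mistral:7b` appears in this list. A minimal sketch of checking for a model by name, assuming the `/api/tags` payload has the shape used in the cell above (a `models` list whose entries carry a `name`). The helper `model_is_available` and the sample payload are hypothetical and only for illustration:

```python
def model_is_available(tags_payload: dict, model_name: str) -> bool:
    """Return True if model_name is listed in an /api/tags-style payload."""
    names = [m.get("name", "") for m in tags_payload.get("models", [])]
    return model_name in names


# Hand-written payload mirroring the output above (not fetched from Ollama)
sample_payload = {"models": [{"name": "mistral:7b"}]}

print(model_is_available(sample_payload, "mistral:7b"))  # True
print(model_is_available(sample_payload, "llama3:8b"))   # False
```

In the notebook, `response.json()` from the cell above could be passed in place of `sample_payload`.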


---
## Test 3: Embedding Model

This test loads a small embedding model and creates a test embedding.

In [24]:
from sentence_transformers import SentenceTransformer

print("Loading embedding model...")
print("(This may take a minute on first run)\n")

try:
    # Load a small, fast model
    model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
    print("✓ Embedding model loaded successfully")
    
    # Test embedding
    test_text = "Hello, this is a test sentence."
    embedding = model.encode(test_text)
    
    print("\nTest embedding:")
    print(f"  Text: '{test_text}'")
    print(f"  Embedding dimension: {len(embedding)}")
    print(f"  First 5 values: {embedding[:5]}")
    print("\n✓ Embedding model is working correctly!")
    
except Exception as e:
    print(f"✗ Error loading embedding model: {e}")
    print("\nThis could mean:")
    print("  1. Not enough RAM (need at least 4GB free)")
    print("  2. No internet connection (needed for first download)")
    print("  3. Disk space issue")

Loading embedding model...
(This may take a minute on first run)

✓ Embedding model loaded successfully

Test embedding:
  Text: 'Hello, this is a test sentence.'
  Embedding dimension: 384
  First 5 values: [ 0.0542045   0.09602847  0.02270406  0.10747128 -0.01486251]

✓ Embedding model is working correctly!
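
The 384 values above become useful when compared with other embeddings: texts with similar meaning should map to vectors that point in similar directions. A minimal sketch of that comparison using cosine similarity, with short made-up vectors standing in for real embeddings (the function name and the toy values are illustrative assumptions, not output from the model):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors (hypothetical, far shorter than a real 384-dim embedding)
vec_a = [1.0, 0.0, 1.0]
vec_b = [2.0, 0.0, 2.0]  # same direction as vec_a, different length
vec_c = [0.0, 1.0, 0.0]  # orthogonal to vec_a

print(round(cosine_similarity(vec_a, vec_b), 4))  # 1.0
print(round(cosine_similarity(vec_a, vec_c), 4))  # 0.0
```

With real embeddings, the arrays returned by `model.encode(...)` could be passed to the same function.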


---
## Test 4: ChromaDB Vector Database

This test creates a temporary collection and tests basic operations.

In [25]:
import chromadb
import tempfile
import os

# Use a temporary directory for testing
temp_dir = tempfile.mkdtemp()

try:
    # Initialize ChromaDB
    client = chromadb.PersistentClient(path=temp_dir)
    print("✓ ChromaDB client initialized")
    
    # Create a test collection
    collection = client.create_collection(name="test_collection")
    print("✓ Test collection created")
    
    # Add some test documents
    collection.add(
        documents=["This is document 1", "This is document 2"],
        ids=["doc1", "doc2"]
    )
    print("✓ Test documents added")
    
    # Query the collection
    results = collection.query(
        query_texts=["document"],
        n_results=2
    )
    print("✓ Query executed successfully")
    
    # Clean up
    client.delete_collection(name="test_collection")
    print("\n✓ ChromaDB is working correctly!")
    
except Exception as e:
    print(f"✗ ChromaDB test failed: {e}")
finally:
    # Clean up temp directory; ignore failures (e.g. files still locked)
    import shutil
    shutil.rmtree(temp_dir, ignore_errors=True)

✓ ChromaDB client initialized
✓ Test collection created
✓ Test documents added
✓ Query executed successfully

✓ ChromaDB is working correctly!
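
The cell above runs a query but does not inspect `results`. Chroma returns one inner list per query text, so the hits for the first query sit at index 0 of each field. A minimal sketch of unpacking that shape follows; the dictionary is a hand-written stand-in with made-up distances, not real output, and `unpack_hits` is a hypothetical helper:

```python
def unpack_hits(results: dict, query_index: int = 0):
    """Return (id, document, distance) tuples for one query in a result dict."""
    ids = results["ids"][query_index]
    docs = results["documents"][query_index]
    dists = results["distances"][query_index]
    return list(zip(ids, docs, dists))


# Stand-in for the value returned by collection.query (distances are invented)
fake_results = {
    "ids": [["doc1", "doc2"]],
    "documents": [["This is document 1", "This is document 2"]],
    "distances": [[0.41, 0.43]],
}

for doc_id, text, dist in unpack_hits(fake_results):
    print(f"{doc_id}: {text!r} (distance={dist:.2f})")
```

Smaller distances indicate closer matches, so the first tuple is the best hit.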


---
## Test 5: Basic LLM Interaction

This test sends a simple prompt to the LLM and gets a response.

In [26]:
import requests
import json

OLLAMA_URL = "http://127.0.0.1:11434"
MODEL_NAME = "mistral:7b"  # Change if using a different model

def test_llm(prompt: str) -> str:
    """Send a prompt to Ollama and get response."""
    try:
        response = requests.post(
            f"{OLLAMA_URL}/api/generate",
            json={
                "model": MODEL_NAME,
                "prompt": prompt,
                "stream": False
            },
            timeout=60
        )
        
        if response.status_code == 200:
            return response.json()['response']
        else:
            return f"Error: Status code {response.status_code}"
    
    except requests.exceptions.ConnectionError:
        return "Error: Cannot connect to Ollama. Is the container running?"
    except Exception as e:
        return f"Error: {e}"

# Test the LLM
print("Testing LLM with a simple question...")
print("(This may take 10-30 seconds)\n")

test_prompt = "Answer in one sentence: What is 2+2?"
response = test_llm(test_prompt)

print(f"Prompt: {test_prompt}")
print(f"Response: {response}")

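# test_llm() prefixes failures with "Error:"; a plain substring check would
# also flag valid answers that happen to contain the word "Error"
if not response.startswith("Error:"):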
    print("\n✓ LLM is responding correctly!")
else:
    print("\n✗ LLM test failed")
    print("\nCheck:")
    print("  1. Ollama container is running: docker ps")
    print("  2. Correct model is installed")
    print("  3. See COMMANDS.txt for troubleshooting")

Testing LLM with a simple question...
(This may take 10-30 seconds)

Prompt: Answer in one sentence: What is 2+2?
Response:  The sum of 2 and 2 is 4.

✓ LLM is responding correctly!


---
## Test 6: File System Access

This test verifies you can read the sample documents.

In [28]:
from pathlib import Path

DOCS_FOLDER = "./docs/text"

try:
    docs_path = Path(DOCS_FOLDER)
    
    if not docs_path.exists():
        print(f"✗ Documents folder not found: {DOCS_FOLDER}")
        print("  Make sure you're running this from the project root directory")
    else:
        print(f"✓ Documents folder found: {DOCS_FOLDER}")
        
        # List text files
        text_files = list(docs_path.glob("*.txt"))
        
        if text_files:
            print(f"\n✓ Found {len(text_files)} text file(s):")
            for file in text_files:
                # Get file size
                size = file.stat().st_size
                print(f"  - {file.name} ({size:,} bytes)")
            
            # Test reading one file
            test_file = text_files[0]
            with open(test_file, 'r', encoding='utf-8') as f:
                content = f.read()
            
            print(f"\n✓ Successfully read {test_file.name}")
            print(f"  Content preview: {content[:100]}...")
            print("\n✓ File system access is working!")
        else:
            print(f"\n⚠ No .txt files found in {DOCS_FOLDER}")
            print("  You'll need documents to test your RAG system")
            
except Exception as e:
    print(f"✗ Error accessing files: {e}")

✓ Documents folder found: ./docs/text

✓ Found 1 text file(s):
  - sample1.txt (135 bytes)

✓ Successfully read sample1.txt
  Content preview: Artificial intelligence is transforming the way organizations use data.
This is a sample document fo...

✓ File system access is working!


---
## Summary

If all tests passed, your environment is ready!

### Next Steps:
1. Read `STUDENT_PROJECT_GUIDE.md` for assignment details
2. Prepare your own document collection
3. Start working on the TODO tasks

### If Any Tests Failed:
- Check the error messages above
- See `docker_starter.md` for Docker setup
- See `COMMANDS.txt` for Docker commands
- Ask for help if you're stuck

**Good luck with your project!**

In [31]:
# Mini-RAG: embed → store → retrieve → generate (Ollama)
from pathlib import Path
import chromadb
from sentence_transformers import SentenceTransformer
import requests, textwrap

# --- Config ---
DOCS_FOLDER = Path("./docs/text")
OLLAMA_URL = "http://127.0.0.1:11434"
MODEL_NAME = "mistral:7b"  # confirmed from /api/tags
COLLECTION_NAME = "rag_demo"

# --- Load/embed model ---
print("Loading embedding model...")
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# --- Collect docs ---
files = sorted(DOCS_FOLDER.glob("*.txt"))
assert files, f"No .txt files in {DOCS_FOLDER}"
docs = [f.read_text(encoding="utf-8") for f in files]
ids = [f.name for f in files]
print(f"Loaded {len(files)} file(s):", [f.name for f in files])

# --- Chroma: create/reset collection ---
client = chromadb.PersistentClient(path="./.chroma")  # persisted locally
try:
    client.delete_collection(COLLECTION_NAME)
except Exception:
    pass
col = client.create_collection(name=COLLECTION_NAME)
print("Chroma collection ready.")

# --- Add embeddings ---
print("Embedding and adding to Chroma...")
embeddings = embedder.encode(docs, convert_to_numpy=True).tolist()
col.add(documents=docs, embeddings=embeddings, ids=ids)
print("Indexed documents.")

# --- Simple retriever ---
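def retrieve(query, k=3):
    # Cap k at the collection size so small collections are not over-queried
    k = min(k, col.count())
    q_emb = embedder.encode([query], convert_to_numpy=True).tolist()
    res = col.query(query_embeddings=q_emb, n_results=k)
    # Each result field holds one inner list per query; take the first
    hits = list(zip(res["ids"][0], res["documents"][0], res["distances"][0]))
    return hits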

# --- LLM call (Ollama /api/generate) ---
def ollama_generate(prompt: str) -> str:
    r = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={"model": MODEL_NAME, "prompt": prompt, "stream": False},
        timeout=120,
    )
    r.raise_for_status()
    return r.json()["response"].strip()

# --- Ask a question ---
question = "Summarize the key ideas from the documents in two sentences."
hits = retrieve(question, k=3)
context = "\n\n---\n\n".join([f"[{i+1}] {h[1]}" for i, h in enumerate(hits)])

prompt = f"""Use the context to answer the question concisely.

Question: {question}

Context:
{context}

Answer:"""

print("\nRetrieval results (top 3):")
for i, (doc_id, _, dist) in enumerate(hits, 1):
    print(f"  {i}. {doc_id}  (distance={dist:.4f})")

print("\nQuerying Mistral via Ollama...")
answer = ollama_generate(prompt)
print("\n=== Answer ===\n", textwrap.fill(answer, width=100))


Loading embedding model...
Loaded 1 file(s): ['sample1.txt']
Chroma collection ready.
Embedding and adding to Chroma...
Indexed documents.

Retrieval results (top 3):
  1. sample1.txt  (distance=1.4809)

Querying Mistral via Ollama...

=== Answer ===
 1. The first document suggests that artificial intelligence (AI) is significantly changing how
organizations manage and utilize data. 2. This statement was made in a document meant to test local
RAG integration, which implies it may not directly relate to AI's broader impacts or applications
but rather its specific role within the organization testing the integration.
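
The retrieval step reported `distance=1.4809`. How to read that number depends on the collection's distance metric. Assuming the collection uses squared Euclidean (L2) distance and the embeddings are unit-length (both are assumptions here; verify them for your Chroma version and embedding model), the distance d and the cosine similarity are related by d = 2 - 2·cos. A small sketch of that conversion under those assumptions, with hypothetical helper names:

```python
import math


def cosine_from_squared_l2(distance: float) -> float:
    """Convert squared L2 distance between unit vectors to cosine similarity."""
    return 1.0 - distance / 2.0


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Sanity check of the identity on two unit vectors 60 degrees apart
angle = math.radians(60)
u = [1.0, 0.0]
v = [math.cos(angle), math.sin(angle)]
print(round(cosine_from_squared_l2(squared_l2(u, v)), 4))  # 0.5

# Applied to the distance reported in the output above
print(round(cosine_from_squared_l2(1.4809), 2))
```

Under these assumptions the single hit has a cosine similarity of roughly 0.26, a fairly weak match. That would be consistent with a generic summarization question that shares little wording with the sample document.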
