# ScienceSage Sanity Check Notebook

This notebook verifies that:

1. Processed text chunks exist on disk
2. Embeddings were created and stored in Qdrant
3. Retrieval returns reasonable results for a test query

Run it before starting the Streamlit app to catch missing data or an empty collection early.

In [7]:
import json
import sys
from pathlib import Path

# Assumes the notebook is run from a subdirectory of the repository, so the parent
# directory is the project root; adding it to sys.path makes `app.config` importable.
project_root = Path.cwd().parent
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue
from app.config import OPENAI_API_KEY, QDRANT_URL, QDRANT_COLLECTION

In [8]:
print(sys.executable)

/workspaces/ScienceSage/.venv/bin/python


In [9]:
client = OpenAI(api_key=OPENAI_API_KEY)
qdrant = QdrantClient(url=QDRANT_URL)

### 1️⃣ Check processed chunks

In [11]:
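chunks_file = Path('../data/chunks/chunks.jsonl')
# One JSON object per line; the context manager closes the file and blank lines are skipped.
with chunks_file.open('r', encoding='utf-8') as f:
    chunks = [json.loads(line) for line in f if line.strip()]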
print(f"Total chunks: {len(chunks)}")
print("Sample chunk:")
print(chunks[0])

Total chunks: 214
Sample chunk:
{'id': 'nasa_causes_30d637499af3', 'uuid': '17af4ab7-888c-56c2-9cc5-ec6d1ee4defc', 'topic': 'nasa', 'source': 'nasa_causes', 'chunk_index': 0, 'text': 'Takeaways - The greenhouse effect is essential to life on Earth, but human-made emissions in the atmosphere are trapping and slowing heat loss to space. - Five key greenhouse gases are carbon dioxide, nitrous oxide, methane, chlorofluorocarbons, and water vapor. - While the Sun has played a role in past climate changes, the evidence shows the current warming cannot be explained by the Sun. Increasing Greenhouses Gases Are Warming the Planet Scientists attribute the global warming trend observed since the mid-20th century to the human expansion of the "greenhouse effect"1 — warming that results when the atmosphere traps heat radiating from Earth toward space. Life on Earth depends on energy coming from the Sun. About half the light energy reaching Earth\'s atmosphere passes through the air and clouds to th
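
Beyond counting records, it can help to confirm that every chunk carries the fields seen in the sample above. The following is a minimal sketch, assuming the field names printed in the sample (`id`, `uuid`, `topic`, `source`, `chunk_index`, `text`) apply to all records. `find_invalid_chunks` and `chunks_per_topic` are hypothetical helpers, not part of the project, and the in-memory `sample` list stands in for the real `chunks` list.

```python
from collections import Counter

# Field names taken from the sample chunk printed above; assumed to apply to every record.
REQUIRED_FIELDS = ("id", "uuid", "topic", "source", "chunk_index", "text")


def find_invalid_chunks(chunks):
    """Return (index, reason) pairs for records that look malformed."""
    problems = []
    for i, chunk in enumerate(chunks):
        missing = [f for f in REQUIRED_FIELDS if f not in chunk]
        if missing:
            problems.append((i, f"missing fields: {missing}"))
        elif not str(chunk["text"]).strip():
            problems.append((i, "empty text"))
    return problems


def chunks_per_topic(chunks):
    """Count chunks by their 'topic' value (None when the field is absent)."""
    return Counter(chunk.get("topic") for chunk in chunks)


# Tiny in-memory example so the sketch runs without the data file.
sample = [
    {"id": "a", "uuid": "u1", "topic": "nasa", "source": "s", "chunk_index": 0, "text": "hello"},
    {"id": "b", "uuid": "u2", "topic": "nasa", "source": "s", "chunk_index": 1, "text": "   "},
    {"id": "c", "topic": "neuroplasticity"},
]
print(find_invalid_chunks(sample))
print(chunks_per_topic(sample))
```

In the notebook, the same helpers could be called on `chunks` directly. The per-topic counts are also a quick way to confirm that a topic used later as a filter value actually has chunks.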

### 2️⃣ Test embedding a query

In [13]:
query = "What is neuroplasticity?"
embedding_model = "text-embedding-3-small"

response = client.embeddings.create(model=embedding_model, input=query)
query_vector = response.data[0].embedding
print(f"Embedding length: {len(query_vector)}")

Embedding length: 1536
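
The printed length can be turned into an explicit check. The sketch below assumes the expected dimension is 1536, the default output size of `text-embedding-3-small` and the value printed above. `check_embedding` is a hypothetical helper, and the stand-in vectors replace the real `query_vector` so the example runs without an API call.

```python
import math

# 1536 is the default output size of text-embedding-3-small and matches the length printed above.
EXPECTED_DIM = 1536


def check_embedding(vector, expected_dim=EXPECTED_DIM):
    """Return a list of problems found in an embedding vector (empty list means OK)."""
    problems = []
    if len(vector) != expected_dim:
        problems.append(f"expected {expected_dim} dimensions, got {len(vector)}")
    if not all(math.isfinite(x) for x in vector):
        problems.append("vector contains NaN or infinite values")
    if all(x == 0 for x in vector):
        problems.append("vector is all zeros")
    return problems


# Stand-in vectors so the sketch runs without an API call.
good = [0.01] * EXPECTED_DIM
bad = [0.0] * 10
print(check_embedding(good))
print(check_embedding(bad))
```

The same dimension has to match the vector size configured for the Qdrant collection, so a mismatch here is worth resolving before running the search step.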


### 3️⃣ Test Qdrant similarity search

In [16]:
results = qdrant.query_points(
    collection_name=QDRANT_COLLECTION,
    query=query_vector,
    limit=3,
    query_filter=Filter(
        must=[FieldCondition(key="topic", match=MatchValue(value="neuroplasticity"))]
    ),
)

# query_points returns a response object whose .points list holds the scored hits.
if results.points:
    print("Top 3 retrieved chunks:")
    for p in results.points:
        # payload can be None if a point was stored without one
        text = (p.payload or {}).get('text', '')
        print(f"ID: {p.id}, text snippet: {text[:150]}...")
else:
    print("No chunks retrieved. Check your Qdrant collection and query.")

Top 3 retrieved chunks:
ID: 2b8442ef-57a5-5d2a-ac27-e7b0b50090e1, text snippet: Neuroplasticity Neuroplasticity, also known as neural plasticity or just plasticity, is the ability of neural networks in the brain to change through ...
ID: 948a065c-69a7-5be4-be4e-a78b2e7414cc, text snippet: Clinton Woosley. The experiment was based on observation of what occurred in the brain when one peripheral nerve was cut and subsequently regenerated....
ID: 57b1244a-bb34-55dd-98a4-2d0068dffe35, text snippet: Up until the 1970s, neuroscientists believed that the brain's structure and function was essentially fixed throughout adulthood.[23] While the brain w...
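
To make the search step less of a black box, here is a rough pure-Python sketch of a filtered similarity search: filter candidates by a payload field, score them against the query vector, and keep the top `k`. It assumes cosine similarity as the metric, which may not be what this collection is configured with, and it is a brute-force illustration rather than how Qdrant works internally. `cosine_similarity`, `top_k`, and the toy 2-D vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, candidates, k=3, topic=None):
    """Keep candidates matching `topic` (if given), then return the k most similar."""
    filtered = [
        c for c in candidates
        if topic is None or c["payload"].get("topic") == topic
    ]
    ranked = sorted(
        filtered,
        key=lambda c: cosine_similarity(query_vec, c["vector"]),
        reverse=True,
    )
    return ranked[:k]


# Toy 2-D vectors standing in for 1536-dimensional embeddings.
toy_points = [
    {"id": 1, "vector": [1.0, 0.0], "payload": {"topic": "neuroplasticity"}},
    {"id": 2, "vector": [0.0, 1.0], "payload": {"topic": "neuroplasticity"}},
    {"id": 3, "vector": [1.0, 0.1], "payload": {"topic": "nasa"}},
]
toy_query = [1.0, 0.0]

print([p["id"] for p in top_k(toy_query, toy_points, k=2, topic="neuroplasticity")])
print([p["id"] for p in top_k(toy_query, toy_points, k=2)])
```

With the topic filter, point 3 is excluded even though it is close to the query, which mirrors how the `query_filter` above restricts results to the `neuroplasticity` topic before ranking.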
