# ScienceSage Sanity Check Notebook

This notebook verifies that:

1. The processed text chunks exist on disk
2. Embeddings for those chunks were stored in Qdrant
3. Similarity search returns relevant results

Run it before starting the Streamlit app.

In [1]:
import sys
from pathlib import Path

project_root = Path.cwd().parent
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

import json
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchAny, MatchValue
from sciencesage.config import OPENAI_API_KEY, QDRANT_URL, QDRANT_COLLECTION
from collections import Counter

2025-09-21 15:47:31.311 | INFO     | sciencesage.config:<module>:88 - Configuration loaded.


In [2]:
print(sys.executable)

/workspaces/ScienceSage/.venv/bin/python


In [3]:
client = OpenAI(api_key=OPENAI_API_KEY)
qdrant = QdrantClient(url=QDRANT_URL)

### 1️⃣ Check processed chunks

In [4]:
chunks_file = Path('../data/chunks/chunks.jsonl')
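with chunks_file.open('r', encoding='utf-8') as f:
    # Close the file when done and skip blank lines so json.loads never sees an empty string.
    chunks = [json.loads(line) for line in f if line.strip()]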
print(f"Total chunks: {len(chunks)}")
print("Sample chunk:")
print(chunks[0])

Total chunks: 408
Sample chunk:
{'uuid': '54f51ef7-eb7b-5f86-80a1-0efec0a9d1b6', 'text': 'Discovery and exploration of the Solar System is observation, visitation, and increase in knowledge and understanding of Earth\'s "cosmic neighborhood". This includes the Sun, Earth and the Moon, the major planets Mercury, Venus, Mars, Jupiter, Saturn, Uranus, and Neptune, their satellites, as well as smaller bodies including comets, asteroids, and dust. In ancient and medieval times, only objects visible to the naked eye—the Sun, the Moon, the five classical planets, and comets, along with phenomena now known to take place in Earth\'s atmosphere, like meteors and aurorae—were known. Ancient astronomers were able to make geometric observations with various instruments. The collection of precise observations in the early modern period and the invention of the telescope helped determine the overall structure of the Solar System. Telescopic observations resulted in the discovery of moons and rings ar
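
Printing one chunk confirms the file loads, but it says little about the other records. The sketch below is one possible integrity pass. It assumes every chunk should carry a non-empty `uuid` and `text` (the two keys visible in the sample above); the helper name `summarize_chunks` and the `required` parameter are hypothetical, not part of the ScienceSage codebase.

```python
from collections import Counter


def summarize_chunks(chunks, required=("uuid", "text")):
    """Return basic integrity stats for a list of chunk dicts."""
    missing = Counter()
    for chunk in chunks:
        for key in required:
            # An absent key and an empty value are both counted as missing.
            if not chunk.get(key):
                missing[key] += 1

    # Count how often each uuid occurs; anything above 1 is a duplicate.
    uuid_counts = Counter(c.get("uuid") for c in chunks if c.get("uuid"))
    duplicates = sorted(u for u, n in uuid_counts.items() if n > 1)

    return {
        "total": len(chunks),
        "missing": dict(missing),
        "duplicate_uuids": duplicates,
    }
```

Calling `summarize_chunks(chunks)` in the next cell should report an empty `missing` dict and no duplicate UUIDs if the chunking step behaved as expected.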

### 2️⃣ Test embedding a query

In [5]:
query = "What missions have explored Mars?"
embedding_model = "text-embedding-3-small"
response = client.embeddings.create(model=embedding_model, input=query)
query_vector = response.data[0].embedding
print(f"Embedding length: {len(query_vector)}")

Embedding length: 1536
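
The embedding length only matters relative to the vector size the collection was created with; a mismatch makes every search fail. The following is a minimal sketch of that comparison, assuming the collection uses a single unnamed vector. The helper names are hypothetical, and the client is passed in as an argument so the functions work with the `qdrant` object created earlier.

```python
def collection_vector_size(qdrant_client, collection_name):
    """Read the configured vector size of a collection with one unnamed vector."""
    info = qdrant_client.get_collection(collection_name)
    vectors = info.config.params.vectors
    # Collections with named vectors expose a dict of name -> params instead.
    if isinstance(vectors, dict):
        raise ValueError("collection uses named vectors; pick one explicitly")
    return vectors.size


def check_embedding_dim(query_vector, expected_size):
    """Raise if the embedding length differs from the collection's vector size."""
    if len(query_vector) != expected_size:
        raise ValueError(
            f"embedding has {len(query_vector)} dims, "
            f"collection expects {expected_size}"
        )
    return True
```

In this notebook the check would read `check_embedding_dim(query_vector, collection_vector_size(qdrant, QDRANT_COLLECTION))`.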


In [7]:
scroll = qdrant.scroll(
    collection_name=QDRANT_COLLECTION,
    limit=5,
    with_payload=True,
    with_vectors=False,
)
for p in scroll[0]:
    print(p.payload.get("uuid"), p.payload.get("title"), p.payload.get("chunk_index"))

0045255c-236a-5c2d-af07-1bbfecb4eca7 Lunar Roving Vehicle 3
013399c9-b202-54ff-8f41-67eb00f5323b History of Solar System formation and evolution hypotheses 2
0193f2d1-c2e8-50e9-9f1a-629656c5157b Project Boreas 0
01c53460-cd60-5b64-ad85-c50ee4dc0b0c Exploration of the Moon 3
01d47e22-77af-54c7-a4e5-137500ce64b0 Lunar water 2


### 3️⃣ Test Qdrant similarity search

In [9]:
results = qdrant.query_points(
    collection_name=QDRANT_COLLECTION,
    query=query_vector,
    limit=3,
)

print("Top 3 retrieved chunks about Mars:")
if hasattr(results, "points") and results.points:
    for p in results.points:
        text = p.payload.get('text', '') if hasattr(p, 'payload') else ''
        print(f"ID: {getattr(p, 'id', 'N/A')}, text snippet: {text[:150]}...")
else:
    print("No chunks retrieved. Check your Qdrant collection and query.")

Top 3 retrieved chunks about Mars:
ID: 92958c7f-6fe6-5c17-bf17-86c976bded7f, text snippet: The planet Mars has been explored remotely by spacecraft. Probes sent from Earth, beginning in the late 20th century, have yielded a large increase in...
ID: 80327091-b1a0-57bb-9a03-bb58d672aed5, text snippet: ice at the planet's south pole, while NASA had previously confirmed their presence at the north pole of Mars. The lander's fate remained a mystery unt...
ID: aaf9c722-4cbc-5b0b-ac83-15922c28dcda, text snippet: landers and 52,000 from the orbiters. The Viking landers recorded atmospheric pressures ranging from below 7 millibars (0.0068 bars) to over 10 millib...
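
Judging whether results are "reasonable" by eye works for one query but does not scale. One rough way to automate it is a keyword hit rate: the fraction of retrieved snippets that mention at least one expected term. This is a crude heuristic rather than a proper retrieval metric, and both the helper name and the keyword list are assumptions chosen for illustration.

```python
def keyword_hit_rate(texts, keywords):
    """Fraction of texts that mention at least one keyword (case-insensitive)."""
    if not texts:
        return 0.0
    lowered = [k.lower() for k in keywords]
    # A text counts as a hit if any keyword appears as a substring.
    hits = sum(1 for t in texts if any(k in t.lower() for k in lowered))
    return hits / len(texts)
```

For the Mars query above, something like `keyword_hit_rate([p.payload.get("text", "") for p in results.points], ["mars", "viking"])` gives a quick signal; a value near zero would suggest the collection or the embedding model is misconfigured.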


In [10]:
# Use a separate name so the OpenAI `client` created earlier is not overwritten.
qdrant_local = QdrantClient("http://localhost:6333")
print(qdrant_local.get_collections())

collections=[CollectionDescription(name='scientific_concepts')]


In [11]:
qdrant_local.count("scientific_concepts", exact=True)

CountResult(count=408)
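
The point count matches the 408 chunks loaded from the JSONL file, which suggests every chunk was embedded and stored. A small sketch that makes the comparison explicit instead of relying on reading two outputs side by side; the helper name `counts_match` is hypothetical, and it assumes one Qdrant point per chunk.

```python
def counts_match(n_chunks, qdrant_client, collection_name):
    """Compare the local chunk count with the exact point count in Qdrant."""
    n_points = qdrant_client.count(collection_name, exact=True).count
    # Return both the verdict and the remote count for easier debugging.
    return n_chunks == n_points, n_points
```

Usage could look like `ok, n_points = counts_match(len(chunks), qdrant, QDRANT_COLLECTION)`, assuming `QDRANT_COLLECTION` names the same `scientific_concepts` collection listed above.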