# ScienceSage Sanity Check Notebook

This notebook checks that:

1. The Python environment and API clients are set up
2. Processed text chunks exist
3. A query can be embedded
4. Qdrant is reachable and the collection contains data
5. Retrieval returns reasonable results

You can run this before starting the Streamlit app.

In [1]:
import sys
from pathlib import Path

project_root = Path.cwd().parent
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

import json
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchAny, MatchValue
from sciencesage.config import OPENAI_API_KEY, QDRANT_URL, QDRANT_COLLECTION
from collections import Counter

2025-09-27 16:55:17.120 | INFO     | sciencesage.config:<module>:86 - Configuration loaded.


### 1️⃣ Environment Check

In [2]:
print(sys.executable)

/workspaces/ScienceSage/.venv/bin/python


In [3]:
print(sys.version)

3.12.1 (main, Jul 10 2025, 11:57:50) [GCC 13.3.0]


In [4]:
client = OpenAI(api_key=OPENAI_API_KEY)
qdrant = QdrantClient(url=QDRANT_URL)
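
Before relying on these clients, it can help to confirm that the configuration values are actually populated. The sketch below shows one way to do that. `find_missing_settings` is a hypothetical helper, not part of `sciencesage`, and it assumes an unset value appears as `None` or a blank string.

```python
from typing import Optional


def find_missing_settings(settings: dict[str, Optional[str]]) -> list[str]:
    """Return the names of settings that are None or blank (hypothetical helper)."""
    return [
        name
        for name, value in settings.items()
        if value is None or not str(value).strip()
    ]


# Placeholder values for illustration only; in the notebook, pass the
# imported config values (OPENAI_API_KEY, QDRANT_URL, QDRANT_COLLECTION).
example = {"OPENAI_API_KEY": "set", "QDRANT_URL": "", "QDRANT_COLLECTION": None}
missing = find_missing_settings(example)
print("Missing settings:", missing)
```

An empty result means every setting has a non-blank value. It does not prove that the key is valid or that the URL is reachable.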

### 2️⃣ Check processed chunks exist

In [5]:
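chunks_file = Path('../data/processed/chunks.jsonl')
# Read one JSON object per line; the context manager closes the file afterwards
with chunks_file.open('r', encoding='utf-8') as f:
    chunks = [json.loads(line) for line in f if line.strip()]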
print(f"Total chunks: {len(chunks)}")
print("Sample chunk:")
print(chunks[0])

Total chunks: 2116
Sample chunk:
{'uuid': 'ae6fe91d-1fb3-599d-bc47-a2fc226295df', 'text': 'Discovery and exploration of the Solar System is observation, visitation, and increase in knowledge and understanding of Earth\'s "cosmic neighborhood". This includes the Sun, Earth and the Moon, the major planets Mercury, Venus, Mars, Jupiter, Saturn, Uranus, and Neptune, their satellites, as well as smaller bodies including comets, asteroids, and dust.\nIn ancient and medieval times, only objects visible to the naked eye—the Sun, the Moon, the five classical planets, and comets, along with phenomena now known to take place in Earth\'s atmosphere, like meteors and aurorae—were known. Ancient astronomers were able to make geometric observations with various instruments. The collection of precise observations in the early modern period and the invention of the telescope helped determine the overall structure of the Solar System. Telescopic observations resulted in the discovery of moons and rings 
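
Printing one chunk confirms that the file loads, but it says nothing about the other records. A minimal sketch of a structural check follows, assuming each chunk is a dict with `uuid` and `text` keys as in the sample above. `validate_chunks` is a hypothetical helper, and the toy records are not taken from the real file.

```python
from collections import Counter


def validate_chunks(chunks: list[dict]) -> dict:
    """Summarize basic structural problems in a list of chunk dicts."""
    # uuids (or None) of chunks whose text is missing or only whitespace
    empty_text = [c.get("uuid") for c in chunks if not str(c.get("text") or "").strip()]
    missing_uuid = sum(1 for c in chunks if not c.get("uuid"))
    uuid_counts = Counter(c["uuid"] for c in chunks if c.get("uuid"))
    duplicates = sorted(u for u, n in uuid_counts.items() if n > 1)
    return {
        "total": len(chunks),
        "missing_uuid": missing_uuid,
        "empty_text": empty_text,
        "duplicate_uuids": duplicates,
    }


# Toy data for illustration only.
sample = [
    {"uuid": "a", "text": "Mars is the fourth planet."},
    {"uuid": "a", "text": "Duplicate id."},
    {"uuid": "b", "text": "   "},
    {"text": "No id here."},
]
report = validate_chunks(sample)
print(report)
```

In this notebook it could be called as `validate_chunks(chunks)`. A clean run would show zero missing ids, no empty texts, and no duplicate ids.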

### 3️⃣ Test embedding a query

In [6]:
query = "What missions have explored Mars?"
embedding_model = "text-embedding-3-small"
response = client.embeddings.create(model=embedding_model, input=query)
query_vector = response.data[0].embedding

print(f"Embedding length: {len(query_vector)}")
print(f"First 5 values: {query_vector[:5]}")
print(f"Type: {type(query_vector)}")

Embedding length: 1536
First 5 values: [-0.026839520782232285, -0.011170332320034504, 0.06702198833227158, -0.046744219958782196, -0.08664139360189438]
Type: <class 'list'>
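
The query vector is only useful because a vector store can compare it with stored vectors. Cosine similarity is a common metric for that comparison. Whether this collection is configured to use it is an assumption, since the collection settings are not shown here. The pure-Python sketch below illustrates the metric on toy vectors.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors; the embeddings in this notebook have 1536 dimensions.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(round(same_direction, 6), round(orthogonal, 6))
```

Vectors pointing in the same direction score close to 1 and unrelated (orthogonal) vectors score 0, which is why a higher score is read as a closer match.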


### 4️⃣ Qdrant connection and data check

In [7]:
# Reuse the `qdrant` client created in the environment check
print(qdrant.get_collections())
# Peek at two stored points to confirm that payloads were uploaded
points, _ = qdrant.scroll(collection_name=QDRANT_COLLECTION, limit=2, with_payload=True)
for p in points:
    print(p.payload)

# Count the points in the configured collection
print(f"{QDRANT_COLLECTION} count:", qdrant.count(QDRANT_COLLECTION, exact=True).count)

collections=[CollectionDescription(name='scientific_concepts')]
{'uuid': '001336fc-d972-5d80-a89a-9365095a704f', 'text': '[We] wanted to create something which felt comfortable within that canon of those science fiction films from the sort of late seventies to early eighties. For me, the Moon has this weird mythic nature to it. ... There is still a mystery to it. As a location, it bridges the gap between science fiction and science fact. We (humankind) have been there. It is something so close and so plausible and yet at the same time, we really don\'t know that much about it.\nThe director described the lack of romance in the Moon as a location, citing images from the Japanese lunar orbiter SELENE: "It\'s the desolation and emptiness of it ... it looks like some strange ball of clay in blackness. ... Look at photos and you\'ll think that they\'re monochrome. In fact, they\'re not. There simply are no primary colours." Jones made reference to the photography book Full Moon by Michael L
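
A natural follow-up is to compare the point count with the number of chunks loaded earlier. The sketch below assumes the ingestion step stores exactly one point per chunk, which this notebook does not confirm. `point_count_matches` is a hypothetical helper, and the fake client exists only so the example runs without a Qdrant server.

```python
def point_count_matches(client, collection_name: str, expected: int) -> bool:
    """Return True when the collection holds exactly `expected` points."""
    actual = client.count(collection_name=collection_name, exact=True).count
    print(f"{collection_name}: {actual} points, expected {expected}")
    return actual == expected


# Stand-in objects for illustration only; they mimic the `.count` attribute
# of the object returned by the Qdrant client's `count` method.
class _FakeCountResult:
    def __init__(self, count: int):
        self.count = count


class _FakeClient:
    def __init__(self, n: int):
        self._n = n

    def count(self, collection_name: str, exact: bool = True):
        return _FakeCountResult(self._n)


ok = point_count_matches(_FakeClient(10), "toy_collection", expected=10)
short = point_count_matches(_FakeClient(8), "toy_collection", expected=10)
```

With the real objects from this notebook, the call would be `point_count_matches(qdrant, QDRANT_COLLECTION, len(chunks))`. A mismatch suggests a partial upload or a stale collection, provided the one-point-per-chunk assumption holds.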

### 5️⃣ Test Qdrant similarity search

In [8]:
results = qdrant.query_points(
    collection_name=QDRANT_COLLECTION,
    query=query_vector,
    limit=3,
)

print("Top 3 retrieved chunks about Mars:")
if results.points:
    for p in results.points:
        # payload can be None if a point was stored without one
        text = (p.payload or {}).get('text', '')
        print(f"ID: {p.id}, text snippet: {text[:150]}...")
else:
    print("No chunks retrieved. Check your Qdrant collection and query.")

Top 3 retrieved chunks about Mars:
ID: c8d3a3bf-e790-549a-8276-7284453a7bbc, text snippet: The planet Mars has been explored remotely by spacecraft. Probes sent from Earth, beginning in the late 20th century, have yielded a large increase in...
ID: 45535663-50c8-5ea3-bb00-08cfd04fbe5a, text snippet: Current missions
On 10 March 2006, NASA's Mars Reconnaissance Orbiter (MRO) probe arrived in orbit to conduct a two-year science survey. The orbiter b...
ID: 2f171be0-31c3-56dd-a062-9ca2ca44c7f2, text snippet: Past missions
Starting in 1960, the Soviet space program launched a series of probes to Mars including the first intended (but unsuccessful) flybys an...
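
Reading the snippets by eye works for a single query. For a repeatable check, a crude keyword test can flag obviously off-topic results. The sketch below is a rough proxy and not a real relevance metric. `keyword_hit_rate` is a hypothetical helper, and the snippets are shortened from the output above.

```python
def keyword_hit_rate(texts: list[str], keyword: str) -> float:
    """Fraction of texts that contain `keyword`, ignoring case."""
    if not texts:
        return 0.0
    needle = keyword.lower()
    hits = sum(1 for t in texts if needle in t.lower())
    return hits / len(texts)


# Shortened snippets used as toy input.
snippets = [
    "The planet Mars has been explored remotely by spacecraft.",
    "NASA's Mars Reconnaissance Orbiter (MRO) probe arrived in orbit",
    "the Soviet space program launched a series of probes to Mars",
]
rate = keyword_hit_rate(snippets, "mars")
print(f"Keyword hit rate: {rate:.2f}")
```

With the real results, the input could be built as `[(p.payload or {}).get("text", "") for p in results.points]`. A rate well below 1.0 for a query this specific would be a reason to inspect the collection or the embedding model.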
