# ScienceSage Sanity Check Notebook

This notebook checks, in order, that:

1. The expected Python environment is active
2. Processed text chunks exist
3. A query can be embedded
4. Qdrant is reachable and the collection contains data
5. Retrieval returns reasonable results

You can run this before starting the Streamlit app.

In [2]:
import sys
from pathlib import Path

project_root = Path.cwd().parent
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

import json
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchAny, MatchValue
from sciencesage.config import OPENAI_API_KEY, QDRANT_URL, QDRANT_COLLECTION
from collections import Counter

2025-09-21 19:18:36.343 | INFO     | sciencesage.config:<module>:87 - Configuration loaded.


### 1️⃣ Environment Check

In [3]:
print(sys.executable)

/workspaces/ScienceSage/.venv/bin/python


In [4]:
print(sys.version)

3.12.1 (main, Jul 10 2025, 11:57:50) [GCC 13.3.0]
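
The two printouts above can also be turned into a programmatic check. The following is a minimal sketch, not part of the project: the helper name `environment_report` and the `min_version` threshold are assumptions chosen for illustration.

```python
import sys


def environment_report(min_version=(3, 10)):
    """Summarise the running interpreter. `min_version` is an assumed threshold."""
    # Inside a venv, sys.prefix points at the venv and sys.base_prefix at the base install
    in_venv = sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    return {
        "executable": sys.executable,
        "version": tuple(sys.version_info[:3]),
        "in_virtualenv": in_venv,
        "version_ok": tuple(sys.version_info[:2]) >= tuple(min_version),
    }


report = environment_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

If `in_virtualenv` is `False`, the notebook kernel is probably not using the project's `.venv`.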


In [8]:
client = OpenAI(api_key=OPENAI_API_KEY)
qdrant = QdrantClient(url=QDRANT_URL)

### 2️⃣ Check processed chunks exist

In [4]:
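chunks_file = Path('../data/chunks/chunks.jsonl')
# Read one JSON object per line, skipping blank lines; the context manager closes the file
with chunks_file.open('r', encoding='utf-8') as f:
    chunks = [json.loads(line) for line in f if line.strip()]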
print(f"Total chunks: {len(chunks)}")
print("Sample chunk:")
print(chunks[0])

Total chunks: 408
Sample chunk:
{'uuid': '54f51ef7-eb7b-5f86-80a1-0efec0a9d1b6', 'text': 'Discovery and exploration of the Solar System is observation, visitation, and increase in knowledge and understanding of Earth\'s "cosmic neighborhood". This includes the Sun, Earth and the Moon, the major planets Mercury, Venus, Mars, Jupiter, Saturn, Uranus, and Neptune, their satellites, as well as smaller bodies including comets, asteroids, and dust. In ancient and medieval times, only objects visible to the naked eye—the Sun, the Moon, the five classical planets, and comets, along with phenomena now known to take place in Earth\'s atmosphere, like meteors and aurorae—were known. Ancient astronomers were able to make geometric observations with various instruments. The collection of precise observations in the early modern period and the invention of the telescope helped determine the overall structure of the Solar System. Telescopic observations resulted in the discovery of moons and rings ar
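
Beyond counting chunks, it can help to confirm that each record has the fields seen in the sample above (`uuid` and `text`) and that no `uuid` repeats. The sketch below assumes those two keys are the required ones; the helper name `summarize_chunks` and the sample data are made up for illustration.

```python
from collections import Counter


def summarize_chunks(chunks, required_keys=("uuid", "text")):
    """Return simple quality counts for a list of chunk dicts."""
    # Indices of records that lack any required key
    missing = [i for i, c in enumerate(chunks) if any(k not in c for k in required_keys)]
    # Indices of records whose text is absent or only whitespace
    empty_text = [i for i, c in enumerate(chunks) if not str(c.get("text", "")).strip()]
    # uuids that occur more than once
    uuid_counts = Counter(c["uuid"] for c in chunks if "uuid" in c)
    duplicates = sorted(u for u, n in uuid_counts.items() if n > 1)
    return {
        "total": len(chunks),
        "missing_keys": missing,
        "empty_text": empty_text,
        "duplicate_uuids": duplicates,
    }


# Tiny made-up sample; in the notebook, pass the loaded `chunks` list instead
sample = [
    {"uuid": "a", "text": "Mars has two moons."},
    {"uuid": "a", "text": "Duplicate id."},
    {"uuid": "b", "text": "   "},
    {"text": "No uuid here."},
]
print(summarize_chunks(sample))
```

Empty lists for `missing_keys`, `empty_text`, and `duplicate_uuids` suggest the chunk file is structurally sound.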

### 3️⃣ Test embedding a query

In [9]:
query = "What missions have explored Mars?"
embedding_model = "text-embedding-3-small"
response = client.embeddings.create(model=embedding_model, input=query)
query_vector = response.data[0].embedding

print(f"Embedding length: {len(query_vector)}")
print(f"First 5 values: {query_vector[:5]}")
print(f"Type: {type(query_vector)}")

Embedding length: 1536
First 5 values: [-0.026839520782232285, -0.011170332320034504, 0.06702198833227158, -0.046744219958782196, -0.08664139360189438]
Type: <class 'list'>
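
The embedding length matters because a query vector must have the same dimension as the vectors stored in the collection. A minimal sketch of an offline check follows; the helper name `check_embedding` is hypothetical, and the default `expected_dim=1536` is taken from the output above rather than from the project's configuration.

```python
import math


def check_embedding(vector, expected_dim=1536):
    """Return basic facts about an embedding vector without calling any API."""
    # Every component should be a finite float (no NaN or infinity)
    values_ok = all(isinstance(v, float) and math.isfinite(v) for v in vector)
    # Euclidean length of the vector, reported for information only
    norm = math.sqrt(sum(v * v for v in vector))
    return {
        "dim_ok": len(vector) == expected_dim,
        "values_ok": values_ok,
        "l2_norm": norm,
    }


# Toy vector for illustration; in the notebook, pass `query_vector` instead
toy = [0.6, 0.8, 0.0]
print(check_embedding(toy, expected_dim=3))
```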


### 4️⃣ Qdrant connection and data check

In [11]:
qdrant = QdrantClient(url=QDRANT_URL)
print(qdrant.get_collections())
points, _ = qdrant.scroll(collection_name=QDRANT_COLLECTION, limit=2, with_payload=True)
for p in points:
    print(p.payload)

# Count points in the configured collection
print(f"{QDRANT_COLLECTION} count:", qdrant.count(collection_name=QDRANT_COLLECTION, exact=True).count)

collections=[CollectionDescription(name='scientific_concepts')]
{'uuid': '0045255c-236a-5c2d-af07-1bbfecb4eca7', 'text': 'to protect the hub. Dust guards were mounted above the wheels. Each wheel had its own electric drive made by Delco, a brushed DC electric motor capable of 0.25 horsepower (190 W) at 10,000 rpm, attached to the wheel via an 80:1 harmonic drive, and a mechanical brake unit. In the case of drive failure, astronauts could remove pins to disengage the drive from the wheel, allowing the wheel to spin freely. Maneuvering capability was provided through the use of front and rear steering motors. Each series-wound DC steering motor was capable of 0.1 horsepower (75 W). The front and rear wheels could pivot in opposite directions to achieve a tight turning radius of 10 feet (3 m), or could be decoupled so only front or rear would be used for steering. The wheels were linked in Ackermann steering geometry, where the inside tires have a greater turn angle than the outside tires
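
A stricter data check compares the identifiers in the chunk file with those stored in Qdrant. The sketch below pages through a collection via a caller-supplied function and diffs the two sets. It assumes that the `uuid` payload field matches the `uuid` in `chunks.jsonl`, which the source does not confirm; `collect_payload_uuids` and `compare_ids` are hypothetical helper names, and the demo uses fake pages so it runs without a server.

```python
from types import SimpleNamespace


def collect_payload_uuids(scroll_page):
    """Page through a collection and collect payload uuids.

    `scroll_page(offset)` must return `(points, next_offset)`;
    paging stops when `next_offset` is None.
    """
    uuids, offset = set(), None
    while True:
        points, offset = scroll_page(offset)
        for p in points:
            payload = getattr(p, "payload", None) or {}
            if "uuid" in payload:
                uuids.add(payload["uuid"])
        if offset is None:
            return uuids


def compare_ids(chunk_uuids, stored_uuids):
    """Report ids present on one side only."""
    chunk_uuids, stored_uuids = set(chunk_uuids), set(stored_uuids)
    return {
        "missing_in_qdrant": sorted(chunk_uuids - stored_uuids),
        "unexpected_in_qdrant": sorted(stored_uuids - chunk_uuids),
    }


# Fake two-page collection standing in for a live Qdrant instance
pages = {
    None: ([SimpleNamespace(payload={"uuid": "a"}), SimpleNamespace(payload={"uuid": "b"})], "next"),
    "next": ([SimpleNamespace(payload={"uuid": "c"})], None),
}
stored = collect_payload_uuids(lambda offset: pages[offset])
print(compare_ids(["a", "b", "d"], stored))

# Against the real collection, the page function could wrap `qdrant.scroll`:
# stored = collect_payload_uuids(
#     lambda offset: qdrant.scroll(
#         collection_name=QDRANT_COLLECTION, limit=256, offset=offset,
#         with_payload=True, with_vectors=False,
#     )
# )
# print(compare_ids((c["uuid"] for c in chunks), stored))
```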

### 5️⃣ Test Qdrant similarity search

In [12]:
results = qdrant.query_points(
    collection_name=QDRANT_COLLECTION,
    query=query_vector,
    limit=3,
)

print("Top 3 retrieved chunks about Mars:")
if results.points:
    for p in results.points:
        # payload is None when a point was stored without one
        text = (p.payload or {}).get('text', '')
        print(f"ID: {p.id}, text snippet: {text[:150]}...")
else:
    print("No chunks retrieved. Check your Qdrant collection and query.")

Top 3 retrieved chunks about Mars:
ID: 92958c7f-6fe6-5c17-bf17-86c976bded7f, text snippet: The planet Mars has been explored remotely by spacecraft. Probes sent from Earth, beginning in the late 20th century, have yielded a large increase in...
ID: 80327091-b1a0-57bb-9a03-bb58d672aed5, text snippet: ice at the planet's south pole, while NASA had previously confirmed their presence at the north pole of Mars. The lander's fate remained a mystery unt...
ID: aaf9c722-4cbc-5b0b-ac83-15922c28dcda, text snippet: landers and 52,000 from the orbiters. The Viking landers recorded atmospheric pressures ranging from below 7 millibars (0.0068 bars) to over 10 millib...
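
Judging whether results are "reasonable" by eye works for a quick check; a rough automated proxy is to count how many retrieved texts mention an expected keyword. This is a minimal sketch under that assumption, not a proper relevance metric. The helper name `keyword_hit_rate` and the keyword list are hypothetical, and the snippets are shortened or invented for illustration.

```python
def keyword_hit_rate(texts, keywords):
    """Fraction of texts that contain at least one keyword (case-insensitive)."""
    if not texts:
        return 0.0
    lowered = [k.lower() for k in keywords]
    hits = sum(1 for t in texts if any(k in t.lower() for k in lowered))
    return hits / len(texts)


# Illustrative snippets; in the notebook, build the list from `results.points`
snippets = [
    "The planet Mars has been explored remotely by spacecraft.",
    "The Viking landers recorded atmospheric pressures.",
    "Each wheel had its own electric drive.",
]
rate = keyword_hit_rate(snippets, ["mars", "viking"])
print(f"Keyword hit rate: {rate:.2f}")
```

A low hit rate for an on-topic query is a hint to inspect the collection or the embedding model, not proof that retrieval is broken.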
