# AstroRAG Demo (Toy Index)

A lightweight demo that searches a tiny toy index of astrophysics passages with a keyword-based embedding (no external models required).

Try queries like:

- `"dark matter halos"`
- `"fast radio bursts"`
- `"gravitational wave background"`
- `"exoplanet atmosphere"`
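
Each suggested query works because it contains words from the demo's fixed keyword vocabulary, which the code cell further down matches by substring against the lowercased query. The sketch below copies that vocabulary and lists which keywords a query triggers. `KEYWORDS` and `matched_keywords` are hypothetical names used for illustration only; they are not part of the notebook.

```python
# Vocabulary copied from the notebook's `keywords` list.
KEYWORDS = [
    'dark', 'matter', 'halo', 'dwarf', 'galaxy', 'fast', 'radio', 'burst', 'magnetar',
    'exoplanet', 'atmosphere', 'transmission', 'spectra', 'gravitational', 'wave', 'background',
    'pulsar', 'supermassive', 'binary', 'cosmic', 'ray', 'star', 'formation', 'alma', 'disk',
    'ring', 'gap', 'bao', 'supernova', 'dark energy', 'equation', 'state', 'agn', 'cluster',
    'feedback', 'kilonova', 'r-process', 'opacity', 'neutron', 'merger',
]


def matched_keywords(query: str) -> list[str]:
    """Hypothetical helper: keywords that occur as substrings of the lowercased query."""
    text = query.lower()
    return [kw for kw in KEYWORDS if kw in text]


for q in ['dark matter halos', 'fast radio bursts magnetars', 'dwarf galaxies dark']:
    print(q, '->', matched_keywords(q))
```

Substring matching is forgiving about plurals (`bursts` triggers `burst`), but it has two side effects worth knowing when choosing queries: `magnetars` also triggers the unrelated keyword `agn`, and `galaxies` does not trigger `galaxy`.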


In [3]:
%pip install numpy

Collecting numpy
  Downloading numpy-2.3.3-cp312-cp312-win_amd64.whl.metadata (60 kB)
Downloading numpy-2.3.3-cp312-cp312-win_amd64.whl (12.8 MB)
Installing collected packages: numpy
Successfully installed numpy-2.3.3


In [9]:
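import json
import os
from pathlib import Path

import numpy as np

print(os.getcwd())  # the index path below is relative to this directory

IDX_DIR = Path('toy_indexes')
EMB = np.load(IDX_DIR / 'embeddings.npy')  # (N, D)
# One JSON record per line; skip blank lines and close the file when done.
with open(IDX_DIR / 'meta.jsonl', encoding='utf-8') as f:
    META = [json.loads(line) for line in f if line.strip()]
MODEL = (IDX_DIR / 'model.txt').read_text(encoding='utf-8').strip()
print(f'Loaded {len(META)} passages with embedding dim={EMB.shape[1]}')
print('Model:', MODEL)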

keywords = [
    'dark','matter','halo','dwarf','galaxy','fast','radio','burst','magnetar',
    'exoplanet','atmosphere','transmission','spectra','gravitational','wave','background',
    'pulsar','supermassive','binary','cosmic','ray','star','formation','alma','disk',
    'ring','gap','bao','supernova','dark energy','equation','state','agn','cluster',
    'feedback','kilonova','r-process','opacity','neutron','merger'
]

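def embed_query(q: str) -> np.ndarray:
    """Return a unit-length binary keyword vector for q (all zeros if nothing matches)."""
    v = np.zeros(len(keywords), dtype=np.float32)
    t = q.lower()
    for i, kw in enumerate(keywords):
        # Substring match: 'burst' matches 'bursts', but 'agn' also matches 'magnetar'.
        if kw in t:
            v[i] = 1.0
    n = np.linalg.norm(v)
    return v / n if n > 0 else v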

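def search(q: str, topk: int = 5) -> None:
    """Print the topk passages ranked by dot-product score against the query."""
    qv = embed_query(q)
    if np.linalg.norm(qv) == 0:
        print('Query has no matching keywords in toy vocab; try another phrase.')
        return
    # qv is unit-length, so this equals cosine similarity if the rows of EMB are unit-length too.
    scores = EMB @ qv
    # Highest scores first; passages with score 0 are still listed to fill topk.
    idxs = np.argsort(-scores)[:topk]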
    print(f'\nQuery: {q}\n')
    for rank, i in enumerate(idxs, 1):
        rec = META[i]
        print(f'[{rank}] score={scores[i]:.3f}  {rec["title"]} ({rec["category"]})')
        txt = rec['passage']
        print((txt[:220] + '...') if len(txt) > 220 else txt)
        print('-'*80)

# Example searches
search('dwarf galaxies dark')
# search('fast radio bursts magnetars')
# search('gravitational wave background')
# search('exoplanet atmosphere spectra')
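
The scores printed by `search` can be checked by hand. A minimal sketch, assuming each row of `EMB` is a unit-length binary keyword vector built the same way as the query vector (the notebook does not show how the index was built, so this is an assumption): the score is then the number of shared keywords divided by the square root of the product of the two keyword counts. `keyword_cosine` is a hypothetical helper, not part of the notebook.

```python
import numpy as np


def keyword_cosine(n_query: int, n_passage: int, n_shared: int) -> float:
    """Cosine similarity of two binary keyword vectors (hypothetical helper)."""
    return float(n_shared / np.sqrt(n_query * n_passage))


# The same value via explicit unit vectors over a 40-keyword vocabulary.
q = np.zeros(40, dtype=np.float32)
q[[0, 3]] = 1.0            # query hits 2 keywords
p = np.zeros(40, dtype=np.float32)
p[[0, 1, 2, 3]] = 1.0      # passage hits 4 keywords, including both query keywords
q /= np.linalg.norm(q)
p /= np.linalg.norm(p)

print(round(float(p @ q), 3))              # 0.707
print(round(keyword_cosine(2, 4, 2), 3))   # 0.707
print(round(keyword_cosine(2, 5, 1), 3))   # 0.316
```

Under this assumption, a two-keyword query that shares both keywords with a four-keyword passage scores 0.707, and one that shares a single keyword with a five-keyword passage scores 0.316. A passage sharing no keyword scores 0.000.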


c:\Users\jdsto\OneDrive\Desktop\repos\AstroRAG\examples
Loaded 10 passages with embedding dim=40
Model: demo-keyword-embedding (toy, not SentenceTransformer)

Query: dwarf galaxies dark

[1] score=0.707  Dark matter halos in dwarf galaxies (astro-ph.GA)
We analyze the density profiles of dark matter halos in nearby dwarf galaxies and compare core vs cusp scenarios using stellar kinematics.
--------------------------------------------------------------------------------
[2] score=0.316  Dark energy equation of state from BAO and SNe (astro-ph.CO)
Combining baryon acoustic oscillations with Type Ia supernova distances we obtain constraints on a time-varying dark energy equation of state.
--------------------------------------------------------------------------------
[3] score=0.000  Exoplanet atmospheric retrieval with transmission spectra (astro-ph.EP)
Using near-infrared transmission spectroscopy we retrieve molecular abundances and cloud properties in warm Neptune exoplanet atmospher