# 01 — Shared Embedding Spaces

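The four models in the **Voyage 4 family** (`voyage-4-large`, `voyage-4`, `voyage-4-lite`, `voyage-4-nano`) share a single vector space, so a vector produced by one model can be compared directly with a vector produced by any of the others.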

This means you can:
- **Index** documents with `voyage-4-large` (highest accuracy)
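- **Query** with `voyage-4-lite` (lower latency and cost)
- **Switch** query models later without re-indexing. Embeddings from unrelated models are generally not comparable, so changing models elsewhere usually means re-embedding the whole corpus.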
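
To make "shared vector space" concrete, here is a minimal sketch of the comparison a cosine-based vector index performs. The helper name `cosineSimilarity` and the three-dimensional toy vectors are made up for illustration; real Voyage embeddings in this notebook have 1024 dimensions. The number is only meaningful when both vectors come from the same space, which is what lets a query vector from one model be scored against document vectors from another.

```typescript
// Cosine similarity between two equal-length vectors.
// Hypothetical helper for illustration; not part of any library.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error('Cosine similarity is undefined for a zero vector');
  }
  return dot / Math.sqrt(normA * normB);
}

// Toy vectors standing in for real embeddings (made-up values).
const toyDocVec = [0.6, 0.8, 0.0];
const toyQueryNear = [0.8, 0.6, 0.0];
const toyQueryFar = [0.0, 0.0, 1.0];

console.log(cosineSimilarity(toyDocVec, toyQueryNear).toFixed(2)); // 0.96
console.log(cosineSimilarity(toyDocVec, toyQueryFar).toFixed(2));  // 0.00
```

The function throws on a dimension mismatch rather than truncating, since comparing vectors of different lengths would silently produce a meaningless score.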

**Steps:**
1. Embed listing descriptions with `voyage-4-large` and store vectors in MongoDB
2. Create a vector search index
3. Query with `voyage-4-lite` — same index, different model
4. Prove cross-model compatibility by querying with every model in the family

In [None]:
// ── Setup ────────────────────────────────────────────────────────────────────
import { MongoClient } from 'mongodb';

// ← Paste your VoyageAI API key here (get one at https://dash.voyageai.com)
const VOYAGE_API_KEY = 'pa-...';

const DOC_MODEL   = 'voyage-4-large';  // used to build the index
const QUERY_MODEL = 'voyage-4-lite';   // used at query time
const DIMS        = 1024;
const INDEX_NAME  = 'listing_vector_index';

const client = new MongoClient(process.env.MONGODB_URI!);
await client.connect();
const db  = client.db('voyage_lab');
const col = db.collection<{ _id: string; [key: string]: unknown }>('listings');

console.log('Connected. Listings:', await col.countDocuments());

## Step 1 — Embed documents with `voyage-4-large` and store in MongoDB

In [None]:
// ── Embed helper ─────────────────────────────────────────────────────────────
async function embed(texts: string[], model: string, inputType: 'document' | 'query'): Promise<number[][]> {
  const res = await fetch('https://api.voyageai.com/v1/embeddings', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'Authorization': `Bearer ${VOYAGE_API_KEY}` },
    body: JSON.stringify({ input: texts, model, input_type: inputType }),
  });
  if (!res.ok) throw new Error(`Voyage API error ${res.status}: ${await res.text()}`);
  const json = await res.json() as { data: { embedding: number[] }[] };
  return json.data.map(d => d.embedding);
}

In [None]:
// ── Embed all listings (voyage-4-large, document side) ───────────────────────
const listings = await col.find({}, { projection: { _id: 1, description: 1 } }).toArray();
const BATCH = 64;
let done = 0;

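for (let i = 0; i < listings.length; i += BATCH) {
  const batch = listings.slice(i, i + BATCH);
  const vecs  = await embed(batch.map(l => String(l.description ?? l._id)), DOC_MODEL, 'document');
  if (vecs.length !== batch.length) {
    throw new Error(`Expected ${batch.length} embeddings, got ${vecs.length}`);
  }
  // One bulkWrite round trip per batch instead of one updateOne per listing
  await col.bulkWrite(batch.map((l, j) => ({
    updateOne: { filter: { _id: l._id }, update: { $set: { embedding: vecs[j] } } },
  })));
  done += batch.length;
  console.log(`Stored ${done}/${listings.length}`);
}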
console.log('All embeddings stored.');

## Step 2 — Create a Vector Search index on MongoDB
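
The index in this step is created with a fixed `numDimensions` (the notebook's `DIMS`, 1024), so every stored vector needs exactly that many finite numbers. As a hedged sketch, a check like the one below could be run on the stored embeddings before building the index. `findBadVectors` is a hypothetical helper, not a MongoDB or Voyage API, and the sample data uses three dimensions only to keep the example short.

```typescript
// Returns the positions of vectors that do not match the expected shape.
// Hypothetical helper for illustration; not part of any library.
function findBadVectors(vectors: unknown[], expectedDims: number): number[] {
  const bad: number[] = [];
  vectors.forEach((v, i) => {
    const ok =
      Array.isArray(v) &&
      v.length === expectedDims &&
      v.every((x) => typeof x === 'number' && Number.isFinite(x));
    if (!ok) bad.push(i);
  });
  return bad;
}

// Made-up sample: the second vector is too short, the third contains NaN.
const sampleVectors: unknown[] = [
  [0.1, 0.2, 0.3],
  [0.1, 0.2],
  [0.1, NaN, 0.3],
];
console.log(findBadVectors(sampleVectors, 3)); // positions 1 and 2
```

In the notebook itself, the equivalent call would pass the `embedding` values read back from the collection together with `DIMS`.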

In [None]:
// ── Create vector search index ────────────────────────────────────────────────
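try {
  await col.dropSearchIndex(INDEX_NAME);
  // Dropping is asynchronous: wait until the old index is actually gone
  for (let i = 0; i < 30; i++) {
    if ((await col.listSearchIndexes(INDEX_NAME).toArray()).length === 0) break;
    await new Promise(r => setTimeout(r, 2000));
  }
} catch { /* index didn't exist */ }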

await col.createSearchIndex({
  name: INDEX_NAME,
  type: 'vectorSearch',
  definition: {
    fields: [
      { type: 'vector', path: 'embedding', numDimensions: DIMS, similarity: 'cosine' },
      { type: 'filter', path: 'price' },
      { type: 'filter', path: 'property_type' },
    ],
  },
});

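console.log('Waiting for index to be READY...');
let ready = false;
for (let i = 0; i < 30 && !ready; i++) {
  await new Promise(r => setTimeout(r, 2000));
  const [idx] = await col.listSearchIndexes(INDEX_NAME).toArray();
  console.log(' status:', idx?.status);
  ready = idx?.status === 'READY';
}
// Fail loudly instead of letting later queries run against an index that isn't ready
if (!ready) throw new Error(`Index ${INDEX_NAME} was not READY after 60 seconds`);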

## Step 3 — Query with `voyage-4-lite`

The index was built with `voyage-4-large` embeddings. We now query with `voyage-4-lite`, a cheaper, faster model in the same family.  
Because both models share a vector space, the `$vectorSearch` pipeline can compare the lite query vector directly against the stored large vectors and return relevant results with no re-indexing.
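
When reading the `vectorSearchScore` values this notebook prints, it helps to know how they relate to raw cosine similarity. As far as the MongoDB documentation describes it, for `cosine` similarity the reported score is normalized as `(1 + cosine) / 2`, which maps the range [-1, 1] onto [0, 1]. The sketch below assumes that formula; the function names are hypothetical, and the formula is worth confirming against the documentation for your cluster version.

```typescript
// Assumed normalization for cosine similarity: score = (1 + cosine) / 2.
// Both helpers are hypothetical and exist only for this illustration.
function atlasCosineScore(cosine: number): number {
  return (1 + cosine) / 2;
}

// Inverse mapping: recover the raw cosine similarity from a reported score.
function cosineFromAtlasScore(score: number): number {
  return 2 * score - 1;
}

console.log(atlasCosineScore(0.96).toFixed(2)); // 0.98
console.log(cosineFromAtlasScore(0.75));        // 0.5
```

Under this assumption, a reported score of 0.5 corresponds to orthogonal vectors rather than a "50% match", so scores tend to cluster in the upper half of the range.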

In [None]:
// ── $vectorSearch with voyage-4-lite query ────────────────────────────────────
const query      = 'luxury penthouse with rooftop pool and city views';
const [queryVec] = await embed([query], QUERY_MODEL, 'query');  // ← lite model

const results = await col.aggregate([
  {
    $vectorSearch: {
      index:         INDEX_NAME,
      path:          'embedding',
      queryVector:   queryVec,
      numCandidates: 50,
      limit:         5,
    },
  },
  {
    $project: {
      name:          1,
      property_type: 1,
      price:         1,
      score:         { $meta: 'vectorSearchScore' },
    },
  },
]).toArray();

console.log(`Results for: "${query}" (indexed=voyage-4-large, queried=voyage-4-lite)\n`);
console.table(results.map(r => ({ name: r.name, price: r.price, score: (r.score as number).toFixed(4) })));

## Step 4 — Cross-model compatibility: every model queries the same index

Run the **same** `$vectorSearch` pipeline using each model in the Voyage 4 family.  
Observe: the top results should be largely consistent across all models, with small differences in order and score, because the models share the same space.
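
"Consistent" can be made measurable rather than judged by eye. One simple option, sketched here as an assumption rather than anything the notebook prescribes, is the Jaccard overlap between the sets of `_id`s that two models return: 1 means identical result sets, 0 means nothing in common. `jaccardOverlap` is a hypothetical helper and the IDs below are made up.

```typescript
// Jaccard overlap between two result lists, ignoring rank order.
// Hypothetical helper for illustration; not part of any library.
function jaccardOverlap(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  if (setA.size === 0 && setB.size === 0) return 1; // two empty lists agree trivially
  let shared = 0;
  setA.forEach((id) => {
    if (setB.has(id)) shared++;
  });
  // |A ∩ B| / |A ∪ B|
  return shared / (setA.size + setB.size - shared);
}

// Made-up top-3 result IDs from two different query models.
const topLarge = ['listing-a', 'listing-b', 'listing-c'];
const topLite = ['listing-b', 'listing-a', 'listing-d'];
console.log(jaccardOverlap(topLarge, topLite)); // 0.5
```

Because this measure ignores rank, two models that return the same listings in a different order still score 1. A rank-aware metric would be a stricter test.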

In [None]:
// ── All four Voyage 4 models hit the same index ───────────────────────────────
const queryModels = ['voyage-4-large', 'voyage-4', 'voyage-4-lite', 'voyage-4-nano'];
const testQuery   = 'cozy countryside cottage with fireplace';

for (const model of queryModels) {
  const [qVec] = await embed([testQuery], model, 'query');
  const hits = await col.aggregate([
    { $vectorSearch: { index: INDEX_NAME, path: 'embedding', queryVector: qVec, numCandidates: 50, limit: 3 } },
    { $project: { name: 1, score: { $meta: 'vectorSearchScore' } } },
  ]).toArray();

  console.log(`\n[${model}]`);
  hits.forEach((h, i) => console.log(`  ${i+1}. [${(h.score as number).toFixed(4)}] ${h.name}`));
}
// OBSERVE: top results should largely agree across all four models.
// Without a shared space, switching query models would mean re-embedding every document.

In [None]:
// ── Cleanup ───────────────────────────────────────────────────────────────────
await client.close();
console.log('Done.');