# 02 — Auto-Embedding

MongoDB Community Edition 8.2+ includes **native auto-embedding** in its vector search engine, `mongot`. The feature is currently in preview.

How it works:
1. You declare `"type": "autoEmbed"` on a text field in your index definition, choosing a VoyageAI model.
2. MongoDB generates and keeps embeddings in sync automatically — for existing docs, new inserts, and updates.
3. At query time, pass `query: { text: '...' }` to `$vectorSearch`, and MongoDB embeds your query too.
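
To make step 3 concrete, the sketch below contrasts a classic `queryVector` stage with the text form used by auto-embedding. `SearchInput`, `StageOptions`, and `buildVectorSearchStage` are hypothetical helpers, not part of the MongoDB driver. The text form uses the same fields as the query cell in this notebook, and the option set may change while the feature is in preview.

```typescript
// Hypothetical helper: builds a $vectorSearch stage from either a
// precomputed vector or raw text (auto-embedding).
type SearchInput =
  | { kind: 'vector'; queryVector: number[] }
  | { kind: 'text'; text: string; model: string };

interface StageOptions {
  index: string;
  path: string;
  numCandidates: number;
  limit: number;
}

function buildVectorSearchStage(
  input: SearchInput,
  opts: StageOptions,
): { $vectorSearch: Record<string, unknown> } {
  const base = {
    index: opts.index,
    path: opts.path,
    numCandidates: opts.numCandidates,
    limit: opts.limit,
  };
  if (input.kind === 'vector') {
    // Classic form: you compute the embedding yourself.
    return { $vectorSearch: { ...base, queryVector: input.queryVector } };
  }
  // Auto-embed form: send text and let MongoDB embed it.
  return { $vectorSearch: { ...base, query: { text: input.text }, model: input.model } };
}
```

The only difference between the two stages is which key carries the query: `queryVector` holds numbers you produced, while `query.text` hands the embedding step to the server.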

> **Before you start:** This notebook requires your personal VoyageAI API key to be set as a GitHub Codespace secret named `VOYAGE_API_KEY`.
> Go to **github.com → Settings → Codespaces → Secrets** and add your key there — then (re)open this Codespace.
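
A quick pre-flight check can catch a missing secret before any index work starts. This is a minimal sketch, assuming the Codespace secret is visible to the notebook process as an environment variable; `missingEnv` is a hypothetical helper, and it takes the environment as a parameter so it can be tried without touching real secrets.

```typescript
// Returns the names from `required` that are unset or blank in `env`.
function missingEnv(
  required: string[],
  env: Record<string, string | undefined>,
): string[] {
  return required.filter(name => (env[name] ?? '').trim() === '');
}

// In the notebook you would pass process.env:
//   const missing = missingEnv(['MONGODB_URI', 'VOYAGE_API_KEY'], process.env);
//   if (missing.length > 0) throw new Error(`Missing: ${missing.join(', ')}`);
```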

In [None]:
import { MongoClient } from 'mongodb';

const INDEX_NAME = 'auto_embed_index';

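// Fail fast with a clear message instead of passing `undefined` to the driver.
const uri = process.env.MONGODB_URI;
if (!uri) throw new Error('MONGODB_URI is not set.');
const client = new MongoClient(uri);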
await client.connect();
const db  = client.db('voyage_lab');
const col = db.collection<{ _id: string; [key: string]: unknown }>('listings');

console.log('Connected. Listings:', await col.countDocuments());

## Create an `autoEmbed` vector search index

In [None]:
try {
  await col.dropSearchIndex(INDEX_NAME);
  await new Promise(r => setTimeout(r, 2000));
} catch { /* didn't exist */ }

await col.createSearchIndex({
  name: INDEX_NAME,
  type: 'vectorSearch',
  definition: {
    fields: [
      {
        type:     'autoEmbed',   // ← tells MongoDB to generate embeddings
        modality: 'text',
        path:     'description', // ← field to embed
        model:    'voyage-4',    // ← VoyageAI model
      },
    ],
  },
});

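console.log('Index creation requested. Waiting for the index to become queryable...');
let ready = false;
// Poll every 5 s, up to 30 times (150 s total).
for (let i = 0; i < 30 && !ready; i++) {
  await new Promise(r => setTimeout(r, 5000));
  const [idx] = await col.listSearchIndexes(INDEX_NAME).toArray();
  console.log(` status: ${idx?.status}`);
  ready = Boolean(idx?.queryable);
}
console.log(ready
  ? 'Index is ready.'
  : 'Index is still not queryable after 150 s; check its status before querying.');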
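
The loop above keeps polling until the time budget runs out, even if the index build has already failed. One way to stop early is to separate the decision from the waiting. The sketch below assumes the `status` strings match those documented for Atlas search indexes (`PENDING`, `BUILDING`, `READY`, `FAILED`), which may not hold for the preview build; `IndexInfo`, `PollDecision`, and `decide` are hypothetical names.

```typescript
// Minimal shape of one entry returned by listSearchIndexes(); only the
// fields this sketch reads are declared.
interface IndexInfo {
  status?: string;
  queryable?: boolean;
}

type PollDecision = 'ready' | 'failed' | 'keep-waiting';

// Decide what the polling loop should do next.
// Assumption: a FAILED status is terminal, as with Atlas search indexes.
function decide(idx: IndexInfo | undefined): PollDecision {
  if (idx?.queryable) return 'ready';
  if (idx?.status === 'FAILED') return 'failed';
  return 'keep-waiting'; // index not listed yet, PENDING, BUILDING, ...
}
```

Inside the loop, `decide(idx)` would replace the bare `queryable` check: break on `'ready'`, throw on `'failed'`, and sleep again otherwise.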

## Insert documents with no vector field

In [None]:
// ── Load listings from JSON and insert ───────────────────────────────────────
import fs from 'fs';
import path from 'path';

function loadListings(filename: string) {
  const filePath = path.join(process.cwd(), 'data', filename);
  return JSON.parse(fs.readFileSync(filePath, 'utf-8')) as { _id: string; [key: string]: unknown }[];
}

const newListings = loadListings('auto_embed_listings.json');

// Drop previous inserts if re-running
await col.deleteMany({ _id: { $in: newListings.map(l => l._id) } });
await col.insertMany(newListings);
console.log(`Inserted ${newListings.length} listings.`);

// Verify: no embedding field on the documents — MongoDB handles it internally
const sample = await col.findOne({ _id: 'ae-001' }, { projection: { name: 1, embedding: 1 } });
console.log('Document (no embedding field):', JSON.stringify(sample));
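
The projection above only looks for a field named `embedding`. A broader check can scan the whole document for anything that looks like a stored vector. This is a heuristic sketch: `vectorLikeFields` is a hypothetical helper, it inspects top-level fields only, and short numeric arrays such as coordinates would also match.

```typescript
// Returns the top-level keys whose value looks like an embedding:
// a non-empty array made up entirely of numbers.
function vectorLikeFields(doc: Record<string, unknown>): string[] {
  return Object.entries(doc)
    .filter(([, v]) => Array.isArray(v) && v.length > 0 && v.every(x => typeof x === 'number'))
    .map(([k]) => k);
}

// In the notebook:
//   const full = await col.findOne({ _id: 'ae-001' });
//   console.log('Vector-like fields:', vectorLikeFields(full ?? {}));
```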

## Run a `$vectorSearch` query with `query: { text: '...' }` instead of `queryVector`

In [None]:
const results = await col.aggregate([
  {
    $vectorSearch: {
      index:         INDEX_NAME,
      path:          'description',
      query:         { text: 'romantic wine country getaway surrounded by nature' },
      model:         'voyage-4',
      numCandidates: 50,
      limit:         5,
    },
  },
  {
    $project: {
      name:          1,
      property_type: 1,
      price:         1,
      score:         { $meta: 'vectorSearchScore' },
    },
  },
]).toArray();

console.log('Results for: "romantic wine country getaway surrounded by nature"\n');
console.table(results.map(r => ({ name: r.name, price: r.price, score: (r.score as number).toFixed(4) })));

In [None]:
// ── Cleanup ───────────────────────────────────────────────────────────────────
await col.deleteMany({ _id: { $in: newListings.map(l => l._id) } });
await client.close();
console.log('Done.');