# Lab A — Using $graphLookup with the Graph Pattern

## Overview

The **Graph Pattern** models relationships by storing arrays of connected node IDs directly in each document.
MongoDB's `$graphLookup` aggregation stage then traverses those connections **recursively** — like a `JOIN` that keeps going.

### Data model (database: `graph_lab`)

| Collection | Key fields |
|---|---|
| `data_centers` | `_id`, `features`, **`connected_datacenters`** |
| `routers` | `_id`, `data_center`, `features`, **`connected_routers`** |
| `network_cards` | `_id`, `router`, `features` |

```
dc1 ──── dc2
 |         |
dc3 ──── dc4 ── dc5

r1 ── r2    r5 ── r6
 \   /        \   \
  r3 ── r4    r8    r9 ── r10
          \   /
           r7
```

### How `$graphLookup` works

```
$graphLookup {
  from:             collection to search
  startWith:        expression whose value(s) seed the traversal
  connectFromField: field in each matched document that holds neighbor IDs
  connectToField:   field in `from` documents matched against those IDs
  as:               output array name
  depthField:       (optional) field that records each result's depth (0 = matched by startWith)
  maxDepth:         (optional) limit on recursion depth (0 = no recursion beyond the startWith matches)
}
```
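
To make these semantics concrete, here is a minimal in-memory sketch of the traversal in TypeScript. It is not MongoDB's implementation: the `GraphDoc` type, the `graphLookup` function, and the four-node sample are hypothetical and chosen only for illustration. It assumes `connectToField` is `_id` and that each document is reported once, at the depth where it is first reached.

```typescript
// Hypothetical document shape: an _id plus an array of neighbour IDs.
interface GraphDoc {
  _id: string;
  connected: string[];
}

// Result documents carry the depth at which they were first reached.
interface Reached extends GraphDoc {
  depth: number;
}

// Breadth-first model of $graphLookup with connectToField = '_id'.
// startWith: IDs to match first (these matches get depth 0).
// maxDepth:  optional limit on recursion depth (0 = only the startWith matches).
function graphLookup(
  collection: GraphDoc[],
  startWith: string[],
  maxDepth: number = Infinity,
): Reached[] {
  const byId = new Map<string, GraphDoc>();
  for (const doc of collection) byId.set(doc._id, doc);

  const seen = new Set<string>();
  const out: Reached[] = [];
  let frontier = startWith;

  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      const doc = byId.get(id);
      if (!doc || seen.has(id)) continue; // unknown ID or already visited
      seen.add(id);
      out.push({ ...doc, depth });
      next.push(...doc.connected); // connectFromField values feed the next round
    }
    frontier = next;
  }
  return out;
}

// Illustrative data only: a triangle a-b-c plus a tail c-d.
const sample: GraphDoc[] = [
  { _id: 'a', connected: ['b', 'c'] },
  { _id: 'b', connected: ['a', 'c'] },
  { _id: 'c', connected: ['a', 'b', 'd'] },
  { _id: 'd', connected: ['c'] },
];

// Seed with a's neighbours, the way the exercises seed with '$connected_routers'.
const reachedFromA = graphLookup(sample, ['b', 'c']);
console.log(reachedFromA.map(r => `${r._id}@${r.depth}`).join(' '));
// prints: b@0 c@0 a@1 d@1
```

Note that `a` shows up in its own result at depth 1: the graph is cyclic, and the traversal only skips documents it has already visited, not the document that started the query.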

## Setup — connect to MongoDB

In [None]:
import { MongoClient, type Document } from 'mongodb'; // Document is a type-only import

const uri = process.env.MONGODB_URI ?? 'mongodb://admin:mongodb@localhost:27017/?directConnection=true';
const client = new MongoClient(uri);
await client.connect();

const db = client.db('graph_lab');
const dataCenters = db.collection('data_centers');
const routers     = db.collection('routers');
const networkCards = db.collection('network_cards');

// Quick sanity check
const counts = {
  data_centers:  await dataCenters.countDocuments(),
  routers:       await routers.countDocuments(),
  network_cards: await networkCards.countDocuments(),
};
console.log('Connected to graph_lab. Document counts:', counts);

## Exercise 1 — Explore the data

In [None]:
// Look at a data center document
const dc1 = await dataCenters.findOne({ _id: 'dc1' });
console.log('Data center document:');
console.log(JSON.stringify(dc1, null, 2));

In [None]:
// Look at a router document
const r1 = await routers.findOne({ _id: 'r1' });
console.log('Router document:');
console.log(JSON.stringify(r1, null, 2));

In [None]:
// All routers in dc1
const routersInDc1 = await routers.find({ data_center: 'dc1' }).toArray();
console.log('Routers in dc1:');
routersInDc1.forEach(r => console.log(`  ${r._id} — ${r.hostname} — features: ${r.features.join(', ')}  → connected to: ${r.connected_routers.join(', ')}`));

## Exercise 2 — Find all routers reachable from r1

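This is the core `$graphLookup` query from the slides.
`startWith: '$connected_routers'` seeds the traversal with r1's direct neighbours;
the stage then follows every `connected_routers` array recursively.
Because the seed documents sit at depth 0, `hops` is `0` for direct neighbours, and r1 itself can reappear in the result if a neighbour links back to it.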
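
When the traversal is seeded with the neighbours, the recorded depth is zero-based and, in a graph with cycles, the starting router may be listed among its own results. A small client-side helper can normalise both. This is only a sketch: `toHopList`, the `ReachedRouter` type, and the sample array are hypothetical, and the sample values are made up rather than taken from the lab data.

```typescript
// Assumed shape of one entry in the `as` array after the $project stage.
interface ReachedRouter {
  _id: string;
  hostname?: string;
  hops: number; // raw depthField value: 0 for documents matched by startWith
}

// Drop the starting router if a cycle led back to it, convert the zero-based
// depth into a link count, and order the result nearest-first.
function toHopList(startId: string, reached: ReachedRouter[]): ReachedRouter[] {
  return reached
    .filter(r => r._id !== startId)
    .map(r => ({ ...r, hops: r.hops + 1 }))
    .sort((a, b) => a.hops - b.hops || a._id.localeCompare(b._id));
}

// Made-up input in the shape the aggregation returns.
const rawReached: ReachedRouter[] = [
  { _id: 'r4', hops: 1 },
  { _id: 'r2', hops: 0 },
  { _id: 'r1', hops: 1 }, // the start node, reached again through a neighbour
  { _id: 'r3', hops: 0 },
];

for (const r of toHopList('r1', rawReached)) {
  console.log(`${r._id}: ${r.hops} hop${r.hops === 1 ? '' : 's'}`);
}
```

The `+ 1` is only appropriate when `startWith` points at the neighbour array; with `startWith: '$_id'` the starting document itself is at depth 0 and the raw value already equals the link count.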

In [None]:
const result = await routers.aggregate([
  { $match: { _id: 'r1' } },
  {
    $graphLookup: {
      from:             'routers',
      startWith:        '$connected_routers',   // start from r1's neighbours
      connectFromField: 'connected_routers',
      connectToField:   '_id',
      as:               'allConnectedRouters',
      depthField:       'hops',
      maxDepth:         10,
    }
  },
  // Keep only the fields we care about
  { $project: { hostname: 1, allConnectedRouters: { _id: 1, hostname: 1, hops: 1 } } }
]).toArray();

const r1Result = result[0];
console.log(`Starting router: ${r1Result._id} (${r1Result.hostname})`);
console.log(`\nAll reachable routers (${r1Result.allConnectedRouters.length}):`);
r1Result.allConnectedRouters
  .sort((a: Document, b: Document) => a.hops - b.hops)
  .forEach((r: Document) => console.log(`  ${r._id}  ${r.hostname}  (${r.hops} hop${r.hops === 1 ? '' : 's'})`));

## Exercise 3 — Limit traversal depth with `maxDepth`

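The loop below runs the same traversal with `maxDepth` set to `1`, `2`, and `3`; observe how the reachable set grows. `maxDepth: 0` would return only the documents matched by `startWith`, here r1's direct neighbours.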
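
The effect of `maxDepth` can also be previewed client-side from a single unbounded result, since `depthField` stores the depth of every returned router. The sketch below assumes each router appears once with the shallowest depth at which it was reached; `countByMaxDepth` and the sample depths are hypothetical and not taken from the lab data.

```typescript
// Assumed shape of one projected entry: the router ID and its recorded depth.
interface DepthEntry {
  _id: string;
  hops: number;
}

interface DepthCount {
  maxDepth: number;
  count: number;
}

// For each candidate limit, count the entries whose depth is within that limit.
function countByMaxDepth(entries: DepthEntry[], limits: number[]): DepthCount[] {
  return limits.map(maxDepth => ({
    maxDepth,
    count: entries.filter(e => e.hops <= maxDepth).length,
  }));
}

// Made-up depths for illustration.
const unbounded: DepthEntry[] = [
  { _id: 'r2', hops: 0 },
  { _id: 'r3', hops: 0 },
  { _id: 'r4', hops: 1 },
  { _id: 'r7', hops: 2 },
];

for (const { maxDepth, count } of countByMaxDepth(unbounded, [0, 1, 2])) {
  console.log(`maxDepth=${maxDepth} -> ${count} routers`);
}
```

On a large graph the server-side `maxDepth` is still preferable, because it stops the traversal early instead of discarding results afterwards.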

In [None]:
for (const maxDepth of [1, 2, 3]) {
  const res = await routers.aggregate([
    { $match: { _id: 'r1' } },
    {
      $graphLookup: {
        from: 'routers', startWith: '$connected_routers',
        connectFromField: 'connected_routers', connectToField: '_id',
        as: 'reached', depthField: 'hops', maxDepth,
      }
    },
    { $project: { reachedCount: { $size: '$reached' }, reached: { _id: 1, hops: 1 } } }
  ]).toArray();

  const ids = res[0].reached
    .sort((a: Document, b: Document) => a.hops - b.hops)
    .map((r: Document) => `${r._id}(${r.hops})`);
  console.log(`maxDepth=${maxDepth}  →  ${res[0].reachedCount} routers: ${ids.join('  ')}`);
}

## Exercise 4 — Traverse the data center backbone

The same pattern works on the `data_centers` collection.
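
Only the collection name and the neighbour field change between the router and data center traversals, so the stage can be generated from a few parameters. The helper below is a sketch that builds a plain stage object; `selfGraphLookup`, `SelfTraversalOptions`, and `GraphLookupSpec` are hypothetical names, and the helper assumes a self-referencing collection whose neighbour array holds `_id` values.

```typescript
// Fields of the $graphLookup stage used in this lab.
interface GraphLookupSpec {
  from: string;
  startWith: string;
  connectFromField: string;
  connectToField: string;
  as: string;
  depthField: string;
  maxDepth?: number;
}

// Parameters for the hypothetical helper.
interface SelfTraversalOptions {
  collection: string;     // collection that references itself
  neighbourField: string; // array field holding neighbour _ids
  as: string;             // name of the output array
  maxDepth?: number;      // omitted = unbounded traversal
}

// Build a $graphLookup stage that starts from a document's neighbours
// and keeps following the same field within the same collection.
function selfGraphLookup(opts: SelfTraversalOptions): { $graphLookup: GraphLookupSpec } {
  const spec: GraphLookupSpec = {
    from: opts.collection,
    startWith: `$${opts.neighbourField}`,
    connectFromField: opts.neighbourField,
    connectToField: '_id',
    as: opts.as,
    depthField: 'hops',
  };
  if (opts.maxDepth !== undefined) spec.maxDepth = opts.maxDepth;
  return { $graphLookup: spec };
}

const routerStage = selfGraphLookup({
  collection: 'routers',
  neighbourField: 'connected_routers',
  as: 'allConnectedRouters',
  maxDepth: 10,
});
const dcStage = selfGraphLookup({
  collection: 'data_centers',
  neighbourField: 'connected_datacenters',
  as: 'reachableDCs',
});
console.log(JSON.stringify([routerStage, dcStage], null, 2));
```

Either object can be placed after a `$match` stage in an `aggregate` pipeline, in the same position as the hand-written stages in the exercises.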

In [None]:
const dcResult = await dataCenters.aggregate([
  { $match: { _id: 'dc1' } },
  {
    $graphLookup: {
      from:             'data_centers',
      startWith:        '$connected_datacenters',
      connectFromField: 'connected_datacenters',
      connectToField:   '_id',
      as:               'reachableDCs',
      depthField:       'hops',
    }
  },
  { $project: { name: 1, reachableDCs: { _id: 1, name: 1, location: 1, hops: 1 } } }
]).toArray();

const dc = dcResult[0];
console.log(`Starting from: ${dc._id} — ${dc.name}`);
console.log(`\nReachable data centers (${dc.reachableDCs.length}):`);
dc.reachableDCs
  .sort((a: Document, b: Document) => a.hops - b.hops)
  .forEach((d: Document) => console.log(`  ${d._id}  ${d.name}  ${d.location}  (${d.hops} hop${d.hops === 1 ? '' : 's'})`));

## Exercise 5 — Cross-collection: find all network cards reachable from dc1

Combine `$graphLookup` (traverse the router graph) with `$lookup` (join network cards).
This demonstrates that you can do rich post-processing in the same aggregation pipeline.
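
The join step relies on two behaviours: `$setUnion` produces a de-duplicated array of router IDs, and `$lookup` with an array `localField` matches a foreign document when its `foreignField` equals any element of that array. The in-memory sketch below models both; `unionIds`, `lookupCards`, and the sample cards are hypothetical and not taken from the lab data.

```typescript
// Assumed minimal shape of a network card: its ID and the router it belongs to.
interface NetworkCard {
  _id: string;
  router: string;
}

// Model of $setUnion: merge the IDs and drop duplicates.
// (MongoDB does not guarantee the element order of $setUnion output.)
function unionIds(startId: string, traversedIds: string[]): string[] {
  return Array.from(new Set([startId, ...traversedIds]));
}

// Model of $lookup with an array localField: a card matches when its
// `router` value equals any element of the array.
function lookupCards(routerIds: string[], cards: NetworkCard[]): NetworkCard[] {
  const wanted = new Set(routerIds);
  return cards.filter(card => wanted.has(card.router));
}

// Made-up data for illustration.
const cards: NetworkCard[] = [
  { _id: 'nc1', router: 'r1' },
  { _id: 'nc2', router: 'r2' },
  { _id: 'nc3', router: 'r5' },
];

// With startWith: '$_id', the traversal result already contains r1 at depth 0,
// so the union simply removes the duplicate.
const allRouters = unionIds('r1', ['r1', 'r2', 'r3']);
const matched = lookupCards(allRouters, cards);
console.log(allRouters, matched.map(c => c._id));
```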

In [None]:
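// Step 1: for each router in dc1, $graphLookup the router itself (depth 0) and its direct neighbours (depth 1)
// Step 2: collect those router IDs into one array with $setUnion
// Step 3: $lookup the network cards whose `router` is in that array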
const pipeline = [
  { $match: { data_center: 'dc1' } },
  {
    $graphLookup: {
      from: 'routers', startWith: '$_id',
      connectFromField: 'connected_routers', connectToField: '_id',
      as: 'connectedRouters', depthField: 'hops', maxDepth: 1,
    }
  },
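  // Collect the IDs into one de-duplicated array
  // (startWith: '$_id' already returns the starting router at depth 0, so adding '$_id' is only a safeguard)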
  { $addFields: { allRouters: { $setUnion: [['$_id'], '$connectedRouters._id'] } } },
  {
    $lookup: {
      from: 'network_cards',
      localField: 'allRouters',
      foreignField: 'router',
      as: 'networkCards',
    }
  },
  { $project: { hostname: 1, allRouters: 1, networkCards: { _id: 1, serial_number: 1, features: 1, speed_gbps: 1 } } }
];

const crossResult = await routers.aggregate(pipeline).toArray();
crossResult.forEach((router: Document) => {
  console.log(`\nRouter ${router._id} (${router.hostname}) and its 1-hop neighbours → ${router.networkCards.length} network card(s):`);
  router.networkCards.forEach((nc: Document) => {
    console.log(`  ${nc._id}  ${nc.serial_number}  ${nc.speed_gbps}GbE  [${nc.features.join(', ')}]`);
  });
});

## Key Takeaways

| Aspect | Graph Pattern |
|---|---|
| **Best for** | True graphs with many-to-many peer connections |
| **Query** | `$graphLookup` in aggregation pipeline |
| **Index** | Index the `connectToField`; here that is `_id`, which is indexed by default |
| **Depth control** | `maxDepth` parameter |
| **Post-processing** | Add any pipeline stage after the traversal |
| **Watch out** | The traversal runs in memory and is subject to the 100 MB per-stage memory limit; each output document, including its `as` array, must fit within the 16 MB BSON document limit |

In [None]:
await client.close();
console.log('Connection closed.');