# ModSSC | Graph

Build kNN-style graphs and related artifacts used by transductive methods.

## Objective
- Show the minimal steps to run this component in a notebook setting.
- Provide the exact objects to look at (outputs, shapes, metrics) to confirm it worked.

## Prerequisites
- Python 3.11+.
- `pip install modssc`.
- Which optional dependencies you need depends on the datasets and backends you use. If an import fails, install the matching extra and rerun.

## Outline
1) Imports and configuration
2) Core run (the part that does the work)
3) Sanity checks and outputs



## Notebook notes

Goal: build a graph from tabular data (for transductive evaluation),
then produce tabular views from a graph (for inductive evaluation).

This notebook includes the **anchor** scheme (approximate kNN) and the **struct** view (structural embeddings).


## Imports and configuration

This notebook uses a synthetic example (no `data_loader` needed).

In [1]:
import numpy as np

from modssc.graph import GraphBuilderSpec, GraphFeaturizerSpec, build_graph, graph_to_views
from modssc.graph.artifacts import NodeDataset

## Synthetic data

We generate three Gaussian clusters (80 points each, 16 features) to keep the structure simple.


In [2]:
rng = np.random.default_rng(0)
n_per = 80
d = 16
X = np.vstack(
    [
        rng.normal(loc=-2.0, scale=0.8, size=(n_per, d)),
        rng.normal(loc=0.0, scale=0.8, size=(n_per, d)),
        rng.normal(loc=2.0, scale=0.8, size=(n_per, d)),
    ]
).astype(np.float32)
y = np.array([0] * n_per + [1] * n_per + [2] * n_per, dtype=np.int64)
X.shape, y.shape

((240, 16), (240,))

## Build an exact kNN graph

For transductive evaluation, we often start with a kNN graph plus weights (heat kernel).


In [3]:
spec_knn = GraphBuilderSpec(
    scheme="knn",
    metric="cosine",
    k=15,
    symmetrize="mutual",
    self_loops=True,
)
g_knn = build_graph(X, spec=spec_knn, seed=0, cache=True)
g_knn.n_nodes, g_knn.edge_index.shape, g_knn.edge_weight.shape

(240, (2, 2066), (2066,))
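
To make the options above concrete, here is a minimal NumPy sketch of a cosine kNN graph with mutual symmetrization, self loops, and heat-kernel weights. It is an illustration only, not ModSSC's implementation: `knn_graph_sketch`, the `sigma` bandwidth heuristic, and the exact form of the kernel are assumptions made for this example, so edge counts will not necessarily match `build_graph`.

```python
import numpy as np


def knn_graph_sketch(X, k=15, mutual=True, self_loops=True, sigma=None):
    """Illustrative cosine kNN graph (hypothetical helper, not modssc's code)."""
    X = np.asarray(X, dtype=np.float64)
    n = X.shape[0]
    # Cosine distance = 1 - cosine similarity of L2-normalised rows.
    Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    dist = 1.0 - Xn @ Xn.T
    np.fill_diagonal(dist, np.inf)  # exclude self from the neighbour search
    # Column indices of the k smallest distances in each row.
    nbrs = np.argpartition(dist, k - 1, axis=1)[:, :k]
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = True
    # Mutual: keep i-j only if each lists the other; otherwise take the union.
    adj = (adj & adj.T) if mutual else (adj | adj.T)
    if self_loops:
        np.fill_diagonal(adj, True)
    src, dst = np.nonzero(adj)
    off_diag = src != dst
    d = np.where(off_diag, dist[src, dst], 0.0)  # self loops have distance 0
    if sigma is None:
        # Assumed bandwidth heuristic: mean distance over the kept edges.
        sigma = float(d[off_diag].mean()) if off_diag.any() else 1.0
    weight = np.exp(-(d ** 2) / (sigma ** 2 + 1e-12))  # one common heat-kernel form
    return np.vstack([src, dst]), weight


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 8))
edge_index, edge_weight = knn_graph_sketch(X_demo, k=5)
print(edge_index.shape, edge_weight.shape)
```

Mutual symmetrization keeps fewer edges than a union would, which is why a node can end up with fewer than `k` neighbours.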

## Build an approximate anchor graph

The anchor scheme targets large n, where an exact kNN search (quadratic in n when done by brute force) becomes expensive.
With n = 240 this is only a demonstration; the benefit shows up on larger datasets.


In [4]:
spec_anchor = GraphBuilderSpec(
    scheme="anchor",
    metric="cosine",
    k=15,
    n_anchors=40,
    anchors_k=4,
    candidate_limit=400,
    symmetrize="mutual",
    self_loops=True,
    backend="numpy",
)
g_anchor = build_graph(X, spec=spec_anchor, seed=0, cache=True)
g_anchor.n_nodes, g_anchor.edge_index.shape

(240, (2, 2046))
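
The general idea behind anchor-based approximate kNN can be sketched as follows: sample a few anchor points, link each point to its closest anchors, and search for neighbours only among points that share an anchor. This is a hedged illustration. The helper names and the way `n_anchors`, `anchors_k`, and `candidate_limit` are interpreted here are assumptions for the example, not a description of what ModSSC does internally.

```python
import numpy as np


def _normalize(X):
    X = np.asarray(X, dtype=np.float64)
    return X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)


def exact_knn(X, k):
    """Brute-force cosine kNN, used only as a reference for recall."""
    Xn = _normalize(X)
    sim = Xn @ Xn.T
    np.fill_diagonal(sim, -np.inf)
    return np.argsort(-sim, axis=1)[:, :k]


def anchor_knn_sketch(X, k=15, n_anchors=40, anchors_k=4, candidate_limit=400, seed=0):
    """Approximate cosine kNN via shared anchors (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    Xn = _normalize(X)
    n = Xn.shape[0]
    # 1) Sample anchors from the data points themselves.
    anchor_ids = rng.choice(n, size=min(n_anchors, n), replace=False)
    # 2) Link every point to its anchors_k most similar anchors.
    sim_to_anchor = Xn @ Xn[anchor_ids].T
    a_k = min(anchors_k, anchor_ids.size)
    top = np.argpartition(-sim_to_anchor, a_k - 1, axis=1)[:, :a_k]
    member = np.zeros((n, anchor_ids.size), dtype=bool)
    member[np.repeat(np.arange(n), a_k), top.ravel()] = True
    # 3) Rank only the candidates that share at least one anchor with i.
    neighbours = np.full((n, k), -1, dtype=np.int64)  # -1 marks "not found"
    for i in range(n):
        shared = member[:, member[i]].any(axis=1)
        shared[i] = False
        cand = np.flatnonzero(shared)
        if cand.size > candidate_limit:
            cand = rng.choice(cand, size=candidate_limit, replace=False)
        if cand.size == 0:
            continue
        order = np.argsort(-(Xn[cand] @ Xn[i]))[:k]
        neighbours[i, : order.size] = cand[order]
    return neighbours


rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(loc=c, scale=0.8, size=(30, 8)) for c in (-2.0, 0.0, 2.0)])
exact = exact_knn(X_demo, k=5)
approx = anchor_knn_sketch(X_demo, k=5, n_anchors=15, anchors_k=3, candidate_limit=50)
# Fraction of true neighbours recovered by the approximate search.
recall = float(np.mean([len(set(a) & set(e)) / 5 for a, e in zip(approx.tolist(), exact.tolist())]))
print(f"recall@5 = {recall:.2f}")
```

Recall against the exact neighbours is a convenient way to judge whether the anchor settings are too aggressive for a given dataset.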

## Tabular views from a graph

We build 3 views:

1) attr: original X
2) diffusion: diffuse X over the graph
3) struct: structural embeddings (deepwalk/node2vec style)


In [5]:
views_spec = GraphFeaturizerSpec(
    views=("attr", "diffusion", "struct"),
    diffusion_steps=5,
    diffusion_alpha=0.15,
    struct_method="deepwalk",
    struct_dim=32,
    walk_length=30,
    num_walks_per_node=5,
    window_size=5,
)

dataset = NodeDataset(X=X, y=y, graph=g_knn, masks={})
V = graph_to_views(dataset, spec=views_spec, seed=0)
{k: v.shape for k, v in V.views.items()}

{'attr': (240, 16), 'diffusion': (240, 16), 'struct': (240, 32)}
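
One way to picture the diffusion view is as repeated neighbourhood averaging with a restart toward the original features. The sketch below assumes a personalized-PageRank-style update, H ← (1 − α) Â H + α X with a symmetrically normalized adjacency Â. Whether `diffusion_steps` and `diffusion_alpha` follow exactly this rule in ModSSC is an assumption, and `diffuse_features` is a hypothetical helper.

```python
import numpy as np


def diffuse_features(X, edge_index, edge_weight=None, steps=5, alpha=0.15):
    """Smooth node features over a graph (illustrative, dense for clarity)."""
    X = np.asarray(X, dtype=np.float64)
    n = X.shape[0]
    src, dst = np.asarray(edge_index)
    w = np.ones(src.shape[0]) if edge_weight is None else np.asarray(edge_weight, dtype=np.float64)
    # Dense weighted adjacency; duplicate edges accumulate.
    A = np.zeros((n, n))
    np.add.at(A, (src, dst), w)
    # Symmetric normalization D^{-1/2} A D^{-1/2}; isolated nodes get zero rows.
    deg = A.sum(axis=1)
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    A_hat = inv_sqrt[:, None] * A * inv_sqrt[None, :]
    H = X.copy()
    for _ in range(steps):
        # Mix propagated features with the original ones (restart weight alpha).
        H = (1.0 - alpha) * (A_hat @ H) + alpha * X
    return H


# Path graph 0-1-2-3 with one-hot features, stored in both directions.
path_edges = np.array([[0, 1, 1, 2, 2, 3], [1, 0, 2, 1, 3, 2]])
X_path = np.eye(4)
H_path = diffuse_features(X_path, path_edges, steps=5, alpha=0.15)
print(H_path.shape)
```

The output keeps the input feature dimension, which matches the `(240, 16)` shape of the `diffusion` view above.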

## CLI

Cache utilities are available under `modssc graph`.


In [None]:
import subprocess
import sys


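def run_cli(*args):
    """Run `python -m modssc <args>` and return (returncode, stdout, stderr)."""
    cmd = [sys.executable, "-m", "modssc", *args]
    # capture_output collects stdout/stderr; text=True decodes them to str.
    res = subprocess.run(cmd, text=True, capture_output=True)
    return res.returncode, res.stdout.strip(), res.stderr.strip()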


print(run_cli("graph", "cache", "ls"))
print(run_cli("graph", "views", "cache-ls"))

## Model-side usage

At this stage:

- Transductive: use `g_knn.edge_index` and `g_knn.edge_weight` directly.
- Inductive on graphs: use `V.views` as tabular inputs (attr/diffusion/struct).
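
As an example of the transductive path, the sketch below runs a simple label-spreading iteration directly on an `edge_index` / `edge_weight` pair. It assumes `edge_index` has shape `(2, E)` with one row of sources and one of targets, as the shapes printed earlier suggest. The `label_spreading` helper is hypothetical and is not part of ModSSC. It follows the classic update F ← α S F + (1 − α) Y with S = D^{-1/2} W D^{-1/2}.

```python
import numpy as np


def label_spreading(edge_index, edge_weight, y, labeled_mask, n_classes, alpha=0.9, n_iter=50):
    """Propagate the labels of a few nodes over a weighted graph (illustrative)."""
    n = y.shape[0]
    src, dst = np.asarray(edge_index)
    W = np.zeros((n, n))
    np.add.at(W, (src, dst), np.asarray(edge_weight, dtype=np.float64))
    np.fill_diagonal(W, 0.0)  # this formulation ignores self loops
    deg = W.sum(axis=1)
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    S = inv_sqrt[:, None] * W * inv_sqrt[None, :]
    # One-hot targets for labeled nodes, zeros for unlabeled ones.
    Y = np.zeros((n, n_classes))
    idx = np.flatnonzero(labeled_mask)
    Y[idx, y[idx]] = 1.0
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1.0 - alpha) * Y
    return F.argmax(axis=1)


# Two triangles {0,1,2} and {3,4,5} joined by one weak edge (2-3).
und = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0), (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]
src = [i for i, j, _ in und] + [j for i, j, _ in und]
dst = [j for i, j, _ in und] + [i for i, j, _ in und]
toy_edge_index = np.array([src, dst])
toy_edge_weight = np.array([w for _, _, w in und] * 2)
y_toy = np.array([0, 0, 0, 1, 1, 1])
mask = np.array([True, False, False, False, False, True])  # only nodes 0 and 5 are labeled
pred = label_spreading(toy_edge_index, toy_edge_weight, y_toy, mask, n_classes=2)
print(pred)
```

With the notebook's objects, the same call would take `g_knn.edge_index` and `g_knn.edge_weight` in place of the toy arrays. The dense `(n, n)` matrix is only reasonable for small graphs, and a sparse matrix would be the natural choice at scale.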


## Outputs

- The last cells should print key shapes and a minimal metric or artifact summary.
- If something fails early, the error should point to a missing optional dependency.


## Next steps
- Explore the adjacent notebooks in this folder for the other pipeline components.
- If you hit an optional dependency error, install the suggested extra and rerun.
