# Fuzz Archives: Album-Centered Graph

*Powered by **Traverse** and visualized with **Cosmograph***

Build a similarity graph where **each node is a record/album** and edges
connect records that share genre/style tags.  An edge's weight is the number
of tags the two records share, so the more tags they have in common, the
stronger the link.

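This is the mirror image of the tag co-occurrence graph (where nodes are tags
and links represent shared albums): both are projections of the same
record-to-tag bipartite graph, one onto records and the other onto tags.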
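
Both graphs can be derived from one record-to-tags mapping. The sketch below is a minimal illustration, not Traverse's `build_album_graph`: the hypothetical `project()` counts how many items each pair of keys shares, and the hypothetical `invert()` flips the mapping so the same function produces the tag co-occurrence graph. The album names and tags are made-up sample data.

```python
from collections import Counter, defaultdict
from itertools import combinations


def invert(memberships: dict[str, set[str]]) -> dict[str, set[str]]:
    """Flip a node -> items mapping into item -> nodes."""
    out: dict[str, set[str]] = defaultdict(set)
    for node, items in memberships.items():
        for item in items:
            out[item].add(node)
    return dict(out)


def project(memberships: dict[str, set[str]], min_weight: int = 1) -> Counter:
    """Weighted one-mode projection of a bipartite graph.

    Two keys of `memberships` are linked with weight = number of items
    they share; pairs below `min_weight` are dropped.
    """
    weights: Counter = Counter()
    for nodes in invert(memberships).values():
        # Every pair of nodes that share this item gains +1 weight
        for a, b in combinations(sorted(nodes), 2):
            weights[(a, b)] += 1
    return Counter({pair: w for pair, w in weights.items() if w >= min_weight})


# Hypothetical sample data: album -> genre/style tags
albums = {
    "Album A": {"stoner rock", "doom metal", "psychedelic"},
    "Album B": {"stoner rock", "doom metal"},
    "Album C": {"psychedelic", "space rock"},
}

album_edges = project(albums, min_weight=2)  # album-centered graph
tag_edges = project(invert(albums))          # tag co-occurrence graph
print(album_edges)  # Counter({('Album A', 'Album B'): 2})
```

With `min_weight=2`, albums A and B (two shared tags) are linked while A and C (one shared tag) are not, which is the role `MIN_WEIGHT` plays in the configuration.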

**Prerequisites:**
```bash
pip install -e ".[dev]"
cd src/traverse/cosmograph/app && npm install && npm run build
```

## 1. Configuration

In [None]:
from pathlib import Path

RECORDS_CSV = Path(r"C:\Users\xtrem\Documents\Datasets\records.csv")  # change to your local copy
OUT_DIR = Path("_out_album")  # cache dir for the album graph
FORCE = False  # set True to ignore the cache and rebuild

# Tuning parameters
MIN_WEIGHT = 2              # min shared tags to create an edge
MAX_NODES = 0               # 0 = unlimited (all records with edges)
MAX_EDGES = 0               # 0 = unlimited
MAX_TAG_DEGREE = 500        # sample/skip tags shared by more records than this
SAMPLE_HIGH_DEGREE = True   # True = sample down; False = skip entirely
UNWEIGHTED = False          # True = pure connectivity (no weights, faster, less memory)
MAX_EDGE_WEIGHT = 0         # cap edge weights at this value; 0 = unlimited
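
Why `MAX_TAG_DEGREE` exists: a tag carried by *n* records contributes n(n-1)/2 candidate album pairs, so a single very common tag can dominate build time and memory. The helpers below are hypothetical. They only illustrate the arithmetic and one plausible reading of the knobs (random down-sampling vs. skipping a tag, and clamping weights); the exact behavior of `build_album_graph` is not shown in this notebook.

```python
import random


def pairs_for_tag(n: int) -> int:
    """Album pairs generated by one tag shared by n records: n choose 2."""
    return n * (n - 1) // 2


def limit_tag_members(members, max_tag_degree, sample_high_degree, rng):
    """Hypothetical treatment of an over-popular tag.

    Keeps every member if the tag is within the limit; otherwise returns a
    random sample of max_tag_degree members (sampling on) or an empty list,
    meaning the tag is skipped (sampling off).
    """
    members = list(members)
    if len(members) <= max_tag_degree:
        return members
    if sample_high_degree:
        return rng.sample(members, max_tag_degree)
    return []


def cap_weight(w: int, max_edge_weight: int) -> int:
    """Clamp an edge weight; 0 means no cap, as with MAX_EDGE_WEIGHT."""
    return min(w, max_edge_weight) if max_edge_weight > 0 else w


print(pairs_for_tag(500))    # 124750
print(pairs_for_tag(5_000))  # 12497500

rng = random.Random(0)
popular = [f"record-{i}" for i in range(2_000)]
sampled = limit_tag_members(popular, 500, True, rng)   # down-sampled to 500
skipped = limit_tag_members(popular, 500, False, rng)  # tag dropped entirely
```

At the default `MAX_TAG_DEGREE = 500`, one tag is bounded at 124,750 pairs, versus roughly 12.5 million for a tag shared by 5,000 records.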

## 2. Build or Load Graph

Build the album-centered similarity graph from the records CSV, or load it
from cache if one exists.  The cache lives in its own `_out_album/` directory
(`OUT_DIR`) so it doesn't collide with the tag co-occurrence cache; set
`FORCE = True` to rebuild.
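
The cache-or-build pattern looks roughly like this. The sketch is a hypothetical stand-in, not `GraphCache`: it stores only the graph dict (the `points`/`links` structure used throughout this notebook) as `graph.json`, whereas `GraphCache.load_or_build()` also returns the records DataFrame. Because the sketch uses a fixed file name per directory, two different graphs sharing one directory would overwrite each other, which is the kind of collision a dedicated `_out_album/` avoids.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def load_or_build_json(cache_dir: Path, build_fn: Callable[[], dict], force: bool = False) -> dict:
    """Hypothetical cache-or-build helper (not Traverse's GraphCache).

    Returns cache_dir/graph.json when it exists and force is False;
    otherwise calls build_fn(), writes its result there, and returns it.
    """
    cache_file = Path(cache_dir) / "graph.json"
    if cache_file.exists() and not force:
        return json.loads(cache_file.read_text(encoding="utf-8"))
    graph = build_fn()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(graph), encoding="utf-8")
    return graph


# Usage: the expensive build runs once; force=True re-runs it.
build_calls = []


def fake_build() -> dict:
    build_calls.append(1)
    return {
        "points": [{"id": "a"}, {"id": "b"}],
        "links": [{"source": "a", "target": "b", "weight": 2}],
    }


with tempfile.TemporaryDirectory() as tmp:
    g1 = load_or_build_json(Path(tmp) / "_out_album", fake_build)
    g2 = load_or_build_json(Path(tmp) / "_out_album", fake_build)              # cache hit
    g3 = load_or_build_json(Path(tmp) / "_out_album", fake_build, force=True)  # rebuild
```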

In [None]:
from traverse.graph.album_graph import build_album_graph
from traverse.graph.cache import GraphCache

cache = GraphCache(
    cache_dir=OUT_DIR,
    build_fn=lambda: build_album_graph(
        RECORDS_CSV,
        min_weight=MIN_WEIGHT,
        max_nodes=MAX_NODES,
        max_edges=MAX_EDGES,
        max_tag_degree=MAX_TAG_DEGREE,
        sample_high_degree=SAMPLE_HIGH_DEGREE,
        unweighted=UNWEIGHTED,
        max_edge_weight=MAX_EDGE_WEIGHT,
    ),
    force=FORCE,
)
graph, records_df = cache.load_or_build()
print(f"Graph: {len(graph['points']):,} nodes, {len(graph['links']):,} edges")
print(f"Records: {len(records_df):,} rows")

## 3. Community Detection

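Run Louvain community detection and store each album's community ID in its
`community` field.  The seed is fixed (`seed=42`) so the labels are
reproducible across runs on the same graph.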
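
Louvain greedily maximizes *modularity*, which rewards communities that hold more internal edge weight than a degree-preserving random graph would give them. To compare runs (for example, different seeds), you can score a labeling yourself. The sketch below computes weighted modularity directly from the `points`/`links` dict, assuming each point carries a `community` field and missing link weights count as 1, as elsewhere in this notebook. The `modularity` helper is hypothetical, not a Traverse function.

```python
from collections import defaultdict


def modularity(points, links, key="community"):
    """Weighted modularity of the labeling stored in each point's `key` field.

    Q = sum over communities c of  L_c / m - (d_c / (2m)) ** 2
    m   = total link weight
    L_c = weight of links with both ends in c
    d_c = summed weighted degree of c's nodes
    """
    comm = {pt["id"]: pt[key] for pt in points}
    m = 0.0
    inside = defaultdict(float)  # L_c
    degree = defaultdict(float)  # d_c
    for lk in links:
        w = lk.get("weight", 1)
        cs, ct = comm[lk["source"]], comm[lk["target"]]
        m += w
        degree[cs] += w
        degree[ct] += w
        if cs == ct:
            inside[cs] += w
    if m == 0:
        return 0.0
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by one bridge edge (c-d)
pairs = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
links = [{"source": s, "target": t} for s, t in pairs]
split = [{"id": n, "community": 0 if n in "abc" else 1} for n in "abcdef"]
merged = [{"id": n, "community": 0} for n in "abcdef"]

print(round(modularity(split, links), 3))   # 0.357
print(round(modularity(merged, links), 3))  # 0.0
```

After the next cell, `modularity(graph["points"], graph["links"])` gives a single number for comparing community assignments; higher is better under this objective.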

In [11]:
from collections import Counter
from traverse.graph.community import add_communities, CommunityAlgorithm

graph = add_communities(graph, CommunityAlgorithm.LOUVAIN, seed=42)

comm_counts = Counter(pt["community"] for pt in graph["points"])
print(f"{len(comm_counts)} communities:")
for comm_id, count in comm_counts.most_common():
    print(f"  Community {comm_id}: {count} nodes")

23 communities:
  Community 0: 48 nodes
  Community 1: 35 nodes
  Community 2: 20 nodes
  Community 3: 19 nodes
  Community 4: 12 nodes
  Community 5: 10 nodes
  Community 6: 8 nodes
  Community 7: 7 nodes
  Community 8: 6 nodes
  Community 9: 5 nodes
  Community 10: 5 nodes
  Community 11: 5 nodes
  Community 13: 4 nodes
  Community 12: 4 nodes
  Community 15: 3 nodes
  Community 14: 3 nodes
  Community 16: 3 nodes
  Community 18: 2 nodes
  Community 17: 2 nodes
  Community 19: 2 nodes
  Community 20: 2 nodes
  Community 21: 2 nodes
  Community 22: 2 nodes
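
`Counter.most_common()` keeps equal counts in the order they were first encountered, which is why Community 13 is listed before 12 above: its first node appears earlier in `graph["points"]`. For a stable ordering plus a quick look at what each community contains, a small hypothetical helper can sort by size, then ID, and show a few album labels per community. It assumes points carry a `label` (as the sanity-check cell also assumes) and falls back to the node `id`.

```python
from collections import defaultdict


def community_examples(points, per_community=3, key="community"):
    """Return (community_id, size, sample_labels), sorted by size desc, then id."""
    groups = defaultdict(list)
    for pt in points:
        groups[pt[key]].append(pt.get("label", pt["id"]))
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(cid, len(labels), labels[:per_community]) for cid, labels in ordered]


# Hypothetical points; in the notebook this would be graph["points"]
sample_points = [
    {"id": "r1", "community": 1, "label": "Album X"},
    {"id": "r2", "community": 2, "label": "Album Y"},
    {"id": "r3", "community": 1, "label": "Album Z"},
    {"id": "r4", "community": 0},
]
for cid, size, labels in community_examples(sample_points, per_community=2):
    print(f"Community {cid} ({size}): {', '.join(labels)}")
```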


## 4. Sanity Check

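Pick a random album and list its neighbors, strongest links first.  If the
graph is sound, the top neighbors should be plausibly related records
(similar artists or styles).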
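
The cell below scans every link to find one node's neighbors, which is fine for a single check but costs a full pass over `graph["links"]` each time. To inspect several albums, a minimal alternative (hypothetical helpers, not part of Traverse) is to build an adjacency index once and then look up neighbors in O(degree). Like the cell, it treats a missing `weight` as 1.

```python
from collections import defaultdict


def build_adjacency(links):
    """Map node id -> list of (neighbor_id, weight); one pass over the links."""
    adj = defaultdict(list)
    for lk in links:
        w = lk.get("weight", 1)  # unweighted links count as 1
        adj[lk["source"]].append((lk["target"], w))
        adj[lk["target"]].append((lk["source"], w))
    return adj


def top_neighbors(adj, node_id, k=10):
    """The k heaviest neighbors of node_id, heaviest first ([] if none)."""
    return sorted(adj.get(node_id, []), key=lambda nw: nw[1], reverse=True)[:k]


sample_links = [
    {"source": "a", "target": "b", "weight": 3},
    {"source": "c", "target": "a", "weight": 5},
    {"source": "b", "target": "c"},
]
adj = build_adjacency(sample_links)
print(top_neighbors(adj, "a"))  # [('c', 5), ('b', 3)]
```

In the notebook this would be `adj = build_adjacency(graph["links"])` followed by `top_neighbors(adj, sample_id)`.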

In [None]:
import random

sample_pt = random.choice(graph["points"])
sample_id = sample_pt["id"]
print(f"Sample node: {sample_pt}")
print()

# Collect (neighbor_id, weight) for every link touching the sample node
id_to_pt = {pt["id"]: pt for pt in graph["points"]}
neighbors = []
for lk in graph["links"]:
    w = lk.get("weight", 1)
    if lk["source"] == sample_id:
        neighbors.append((lk["target"], w))
    elif lk["target"] == sample_id:
        neighbors.append((lk["source"], w))

neighbors.sort(key=lambda x: x[1], reverse=True)
print(f"{len(neighbors)} neighbors (top 10 by shared tags):")
for nid, w in neighbors[:10]:
    npt = id_to_pt.get(nid, {})
    print(f"  w={w}: {npt.get('label', nid)} — {npt.get('artist', '?')}")

## 5. Export & Serve

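Export the graph, including its community labels, as JSON into the built
Cosmograph app's directory (`_default_dist_dir()`) and start a local server.
`serve()` blocks, so this cell keeps running until you interrupt it.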

In [None]:
from traverse.graph.adapters_cosmograph import CosmographAdapter
from traverse.cosmograph.server import serve, _default_dist_dir

n_pts = len(graph["points"])
n_lks = len(graph["links"])
print(f"Graph: {n_pts:,} nodes, {n_lks:,} edges")

# Browser safety check: treat ~500K edges as the practical ceiling for in-browser rendering
if n_lks > 500_000:
    print(f"WARNING: {n_lks:,} edges is too many for browser visualization.")
    print("Consider increasing MIN_WEIGHT or lowering MAX_EDGES/MAX_NODES.")
    print("Skipping export. Adjust params and re-run.")
else:
    meta = {"clusterField": "community", "title": "Fuzz Archives"}
    out_path = _default_dist_dir() / "cosmo_albums_community.json"
    CosmographAdapter.write(graph, out_path, meta=meta)
    print()
    print("Starting server — open in browser:")
    print("  http://127.0.0.1:8080/?data=/cosmo_albums_community.json")
    print()
    print("Press Ctrl+C (or interrupt the kernel) to stop.")
    serve(port=8080)
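
If the 500K-edge check trips, it helps to see how many links survive at each weight before picking a new `MIN_WEIGHT`, or to trim the current graph in memory for a quick look. The sketch below does both with hypothetical helpers that are not part of Traverse. Rebuilding with a higher `MIN_WEIGHT` may not reproduce these counts exactly (for instance, if high-degree tags are sampled), so treat them as estimates.

```python
from collections import Counter


def links_at_or_above(links):
    """Map each weight w to the number of links with weight >= w."""
    hist = Counter(lk.get("weight", 1) for lk in links)
    out, running = {}, 0
    for w in sorted(hist, reverse=True):
        running += hist[w]
        out[w] = running
    return dict(sorted(out.items()))


def prune_to_budget(graph, max_links):
    """Keep the max_links heaviest links and drop points left without any link.

    Ties keep their original order because sorted() is stable.
    """
    links = sorted(graph["links"], key=lambda lk: lk.get("weight", 1), reverse=True)[:max_links]
    used = {lk["source"] for lk in links} | {lk["target"] for lk in links}
    points = [pt for pt in graph["points"] if pt["id"] in used]
    return {**graph, "points": points, "links": links}


toy = {
    "points": [{"id": n} for n in "abcd"],
    "links": [
        {"source": "a", "target": "b", "weight": 4},
        {"source": "b", "target": "c", "weight": 3},
        {"source": "c", "target": "d", "weight": 2},
        {"source": "a", "target": "d", "weight": 1},
    ],
}
print(links_at_or_above(toy["links"]))  # {1: 4, 2: 3, 3: 2, 4: 1}
small = prune_to_budget(toy, max_links=2)
print([pt["id"] for pt in small["points"]])  # ['a', 'b', 'c']
```

On the real graph, `links_at_or_above(graph["links"])` shows which threshold brings the edge count under the limit, and `prune_to_budget(graph, 500_000)` produces a browser-sized copy without rebuilding.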