# Traverse: End-to-End Example

Load Spotify listening history, enrich with genre/style metadata from a records CSV,
build a co-occurrence graph, and serve it in the Cosmograph frontend.

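**Prerequisites** (run from the repository root, the directory that contains the project's `pyproject.toml` or `setup.py`):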
```bash
pip install -e ".[dev]"
cd src/traverse/cosmograph/app && npm install && npm run build
```

In [1]:
# This notebook lives in notebooks/, which is not an installable project,
# so step up to the repository root before installing and building.
!cd .. && pip install -e ".[dev]"
!cd ../src/traverse/cosmograph/app && npm install && npm run build

## 1. Configuration

Update these paths to match your local setup.

In [6]:
from pathlib import Path

EXTENDED_DIR = Path(r"C:\Users\xtrem\Documents\Datasets\Spotify\anthony\ExtendedStreamingHistory")  # Spotify Extended Streaming History
RECORDS_CSV  = Path(r"C:\Users\xtrem\Documents\Datasets\records.csv")  # records.csv with genres/styles
CACHE_DIR    = Path("_out")  # canonical table cache (parquet)
FORCE_REBUILD = False  # set True to rebuild from scratch, False to use cache
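
Before running the heavier cells it can help to confirm that the configured paths exist. The helper below is a minimal sketch, not part of Traverse; `missing_paths` is a hypothetical name introduced here for illustration.

```python
from pathlib import Path


def missing_paths(paths: dict[str, Path]) -> list[str]:
    """Return the names of configured paths that do not exist on disk."""
    return [name for name, path in paths.items() if not path.exists()]


# Hypothetical usage with the configuration cell above:
# missing = missing_paths({"EXTENDED_DIR": EXTENDED_DIR, "RECORDS_CSV": RECORDS_CSV})
# if missing:
#     raise FileNotFoundError(f"Update these paths before continuing: {missing}")
```

`CACHE_DIR` is left out of the check on purpose, since it may not exist before the first run.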

## 2. Load and Cache Canonical Tables

On the first run, this cell ingests the Spotify Extended Streaming History, enriches it
with genres and styles from `records.csv` via `FastGenreStyleEnricher`, and caches the
result as parquet in `_out/`. Later runs skip ingestion and enrichment and read the
cached tables directly, unless `FORCE_REBUILD` is set to `True`.
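
The load-or-build pattern described above can be sketched in a few lines. This is an illustration of the general idea only, not the implementation of `CanonicalTableCache`: it stores rows as JSON rather than parquet to stay dependency-free, and `load_or_build` here is a hypothetical standalone function.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def load_or_build(cache_file: Path, build_fn: Callable[[], list], force: bool = False) -> list:
    """Return cached rows if present; otherwise build, cache, and return them."""
    if cache_file.exists() and not force:
        return json.loads(cache_file.read_text(encoding="utf-8"))
    rows = build_fn()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(rows), encoding="utf-8")
    return rows


calls = []


def build():
    """Stand-in for the expensive ingest + enrich step."""
    calls.append(1)
    return [{"track_id": "trk:1", "ms_played": 6250}]


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "_out" / "plays.json"
    first = load_or_build(cache_file, build)              # cache miss: builds and writes
    second = load_or_build(cache_file, build)             # cache hit: reads the file
    third = load_or_build(cache_file, build, force=True)  # forced rebuild
```

The `force` flag plays the same role as `FORCE_REBUILD` in the configuration cell: it bypasses the cached file even when one exists.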

In [7]:
from traverse.data.spotify_extended_minimal import load_spotify_extended_minimal
from traverse.processing.enrich_fast import FastGenreStyleEnricher
from traverse.processing.cache import CanonicalTableCache

cache = CanonicalTableCache(
    cache_dir=CACHE_DIR,
    build_fn=lambda: load_spotify_extended_minimal(EXTENDED_DIR),
    enrich_fn=lambda t: FastGenreStyleEnricher(records_csv=str(RECORDS_CSV)).run(t),
    force=FORCE_REBUILD,
)
plays_wide, tracks_wide = cache.load_or_build()

print(f"plays_wide:  {plays_wide.shape[0]:,} rows, {plays_wide.shape[1]} cols")
print(f"tracks_wide: {tracks_wide.shape[0]:,} rows, {tracks_wide.shape[1]} cols")
plays_wide.head(3)

Using cached canonical tables in _out (parquet)


plays_wide:  600,557 rows, 7 cols
tracks_wide: 45,196 rows, 5 cols


|   | played_at | track_id | ms_played | track_name | artist_name | genres | styles |
|---|---|---|---|---|---|---|---|
| 0 | 2012-02-19 16:41:10+00:00 | trk:1LV5G400jD3Ytvyv6Dlkym | 6250 | Right In | Skrillex | | |
| 1 | 2012-02-19 16:42:15+00:00 | trk:1LV5G400jD3Ytvyv6Dlkym | 44240 | Right In | Skrillex | | |
| 2 | 2012-02-19 16:42:50+00:00 | trk:6VRhkROS2SZHGlp0pxndbJ | 34626 | Bangarang (feat. Sirah) | Skrillex | electronic | dubstep \| electro \| electro house \| glitch hop |


## 3. Build the Co-occurrence Graph

Walk every play, split its genre/style tags, and feed tag pairs to `CooccurrenceBuilder`.
Each play's `played_at` timestamp becomes the timeline value (`first_seen_ts`).
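
To make the idea concrete, here is a rough sketch of pair counting with a first-seen timestamp per tag. It is not the code behind `CooccurrenceBuilder` or `split_tags`. The pipe-separated tag format is taken from the sample output above; the helper names and the choice of the earliest timestamp as "first seen" are assumptions.

```python
from collections import Counter
from itertools import combinations


def split_pipe_tags(raw):
    """Split an 'a | b | c' string into lowercase, de-duplicated tags."""
    if not isinstance(raw, str):  # missing values (None, NaN) yield no tags
        return []
    tags = []
    for part in raw.split("|"):
        tag = part.strip().lower()
        if tag and tag not in tags:
            tags.append(tag)
    return tags


def count_cooccurrences(plays):
    """plays: iterable of (timestamp_ms, tags). Returns (pair_counts, first_seen)."""
    pair_counts = Counter()
    first_seen = {}
    for ts_ms, tags in plays:
        if ts_ms is not None:
            for tag in tags:
                # Assumption: "first seen" means the earliest play carrying the tag.
                first_seen[tag] = min(ts_ms, first_seen.get(tag, ts_ms))
        # Sort so each unordered pair is always counted under the same key.
        for a, b in combinations(sorted(set(tags)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts, first_seen


plays = [
    (1000, split_pipe_tags("electronic") + split_pipe_tags("dubstep | electro")),
    (500, split_pipe_tags("electronic") + split_pipe_tags("dubstep")),
    (None, split_pipe_tags(None)),
]
pair_counts, first_seen = count_cooccurrences(plays)
```

Plays with fewer than two tags contribute no pairs, which is why rows with empty `genres` and `styles` do not add edges in this sketch.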

Genre and style tags are tracked separately via `tag_categories` so each node gets a
majority-vote `category` field ("genre" or "style") used for clustering and coloring.
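
One plausible reading of "majority vote" is shown below: each play casts one vote per tag, and the most frequent category wins. This is a sketch under that assumption; `majority_categories` is a hypothetical helper, and the builder's actual tie-breaking rules are not documented here.

```python
from collections import Counter, defaultdict


def majority_categories(rows):
    """rows: iterable of {tag: category} dicts, one per play.

    Returns {tag: category with the most votes}.
    """
    votes = defaultdict(Counter)
    for tag_categories in rows:
        for tag, category in tag_categories.items():
            votes[tag][category] += 1
    return {tag: counts.most_common(1)[0][0] for tag, counts in votes.items()}


rows = [
    {"electronic": "genre", "dubstep": "style"},
    {"electronic": "genre", "dubstep": "style"},
    {"electronic": "style"},  # a minority vote that is outvoted
]
categories = majority_categories(rows)
```

A vote is useful because the same tag can show up in the `genres` column for one record and in `styles` for another.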

In [None]:
import pandas as pd
from traverse.processing.normalize import split_tags, pretty_label
from traverse.graph.cooccurrence import CooccurrenceBuilder

builder = CooccurrenceBuilder(min_cooccurrence=2, max_nodes=500)

for played_at, genres, styles in plays_wide[
    ["played_at", "genres", "styles"]
].itertuples(index=False):
    genre_tags = split_tags(genres)
    style_tags = split_tags(styles)
    tags = genre_tags + style_tags

    # Build tag -> category mapping for this row
    tag_categories = {}
    for t in genre_tags:
        tag_categories[t] = "genre"
    for t in style_tags:
        tag_categories[t] = "style"

    ts_ms = (
        int(pd.Timestamp(played_at).value // 1_000_000)
        if pd.notna(played_at)
        else None
    )
    builder.add(tags, timestamp_ms=ts_ms, label_fn=pretty_label,
                tag_categories=tag_categories)

graph = builder.build()
print(f"Graph: {len(graph['points'])} nodes, {len(graph['links'])} edges")

# Check category coverage
cats = {p.get("category") for p in graph["points"]}
print(f"Categories on points: {cats}")

## 4. Export JSON and Serve

Write the graph to the frontend's `dist/` directory, then start the built-in
static server. Open the printed URL in your browser to see the visualization.

The exported JSON includes `meta.clusterField` so the frontend automatically
clusters and colors nodes by genre vs. style. Use the "Cluster by category"
checkbox in the header to toggle clustering on/off.
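
As a rough picture of what the export might look like, the sketch below attaches a `meta.clusterField` hint to a tiny stand-in graph. The `points` and `links` keys match the `graph` dict built earlier; everything else, including the `find_cluster_field` helper and the exact JSON layout, is an assumption and may differ from what `detect_cluster_field` and `CosmographAdapter.write` actually do.

```python
import json


def find_cluster_field(graph, field="category"):
    """Return `field` if any point has a non-empty value for it, else None."""
    if any(point.get(field) for point in graph.get("points", [])):
        return field
    return None


# Tiny stand-in graph using the same top-level keys as the real one.
graph = {
    "points": [
        {"id": "electronic", "label": "Electronic", "category": "genre"},
        {"id": "dubstep", "label": "Dubstep", "category": "style"},
    ],
    "links": [{"source": "electronic", "target": "dubstep", "weight": 2}],
}

field = find_cluster_field(graph)
payload = dict(graph)
if field:
    payload["meta"] = {"clusterField": field}  # assumed location of the hint

print(json.dumps(payload, indent=2))
```

When no point carries a category, the sketch omits `meta` entirely, mirroring the `meta = ... if cluster_field else None` line in the cell below.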

In [None]:
from traverse.graph.adapters_cosmograph import CosmographAdapter, detect_cluster_field
from traverse.cosmograph.server import serve, _default_dist_dir

# Detect if graph has category data and build meta
cluster_field = detect_cluster_field(graph)
meta = {"clusterField": cluster_field} if cluster_field else None
print(f"Cluster field: {cluster_field}")

# Write JSON into the frontend dist/ (use absolute path so it works from any cwd)
out_path = _default_dist_dir() / "cosmo_genres_spotify_timeline.json"
CosmographAdapter.write(graph, out_path, meta=meta)
print(f"Wrote {out_path}")
print()
print("Starting server. Open in browser:")
print("  http://127.0.0.1:8080/?data=/cosmo_genres_spotify_timeline.json")
print()
print("Press Ctrl+C (or interrupt the kernel) to stop.")

serve(port=8080)

---

## Appendix: PyCosmograph Inline Widget

If you have PyCosmograph installed (`pip install cosmograph`), you can render
the graph directly in the notebook without starting a server.

Nodes are colored and clustered by their `category` (genre vs. style).

In [None]:
# pip install cosmograph  # uncomment to install
import pandas as pd
from cosmograph import cosmo

points_df = pd.DataFrame(graph["points"])
links_df = pd.DataFrame(graph["links"])

w = cosmo(
    points=points_df,
    links=links_df,
    point_id_by="id",
    link_source_by="source",
    link_target_by="target",
    point_label_by="label",
    link_include_columns=["weight"],
    point_size=0.2,
    # Clustering by genre/style category
    point_color_by="category",
    point_cluster_by="category",
    show_cluster_labels=True,
    scale_cluster_labels=True,
    use_point_color_strategy_for_cluster_labels=True,
    point_include_columns=["category"],
)
w  # renders inline