# Traverse: End-to-End Example

Load Spotify listening history, enrich with genre/style metadata from a records CSV,
build a co-occurrence graph, and serve it in the Cosmograph frontend.

**Prerequisites** (run from the repository root, not from `notebooks/`):
```bash
pip install -e ".[dev]"
cd src/traverse/cosmograph/app && npm install && npm run build
```

In [1]:
# This notebook lives in notebooks/, so paths are given relative to the repository root (..).
!pip install -e "..[dev]"
!cd ../src/traverse/cosmograph/app && npm install && npm run build

## 1. Configuration

Update these paths to match your local setup.

In [1]:
from pathlib import Path

EXTENDED_DIR = Path(r"C:\Users\xtrem\Documents\Datasets\Spotify\anthony\ExtendedStreamingHistory")  # Spotify Extended Streaming History
RECORDS_CSV  = Path(r"C:\Users\xtrem\Documents\Datasets\records.csv")  # records.csv with genres/styles
CACHE_DIR    = Path("_out")  # canonical table cache (parquet)
FORCE_REBUILD = False  # set True to rebuild from scratch, False to use cache

## 2. Load and Cache Canonical Tables

On the first run, this cell ingests the Spotify Extended Streaming History, enriches it
with genres and styles from `records.csv` via `FastGenreStyleEnricher`, and caches the
result as parquet in `_out/`. Subsequent runs load from the cache instead of rebuilding.

In [2]:
from traverse.data.spotify_extended_minimal import load_spotify_extended_minimal
from traverse.processing.enrich_fast import FastGenreStyleEnricher
from traverse.processing.cache import CanonicalTableCache

cache = CanonicalTableCache(
    cache_dir=CACHE_DIR,
    build_fn=lambda: load_spotify_extended_minimal(EXTENDED_DIR),
    enrich_fn=lambda t: FastGenreStyleEnricher(records_csv=str(RECORDS_CSV)).run(t),
    force=FORCE_REBUILD,
)
plays_wide, tracks_wide = cache.load_or_build()

print(f"plays_wide:  {plays_wide.shape[0]:,} rows, {plays_wide.shape[1]} cols")
print(f"tracks_wide: {tracks_wide.shape[0]:,} rows, {tracks_wide.shape[1]} cols")
plays_wide.head(3)

Using cached canonical tables in _out (parquet)


plays_wide:  600,557 rows, 7 cols
tracks_wide: 45,196 rows, 5 cols


|   | played_at | track_id | ms_played | track_name | artist_name | genres | styles |
|---|---|---|---|---|---|---|---|
| 0 | 2012-02-19 16:41:10+00:00 | trk:1LV5G400jD3Ytvyv6Dlkym | 6250 | Right In | Skrillex |  |  |
| 1 | 2012-02-19 16:42:15+00:00 | trk:1LV5G400jD3Ytvyv6Dlkym | 44240 | Right In | Skrillex |  |  |
| 2 | 2012-02-19 16:42:50+00:00 | trk:6VRhkROS2SZHGlp0pxndbJ | 34626 | Bangarang (feat. Sirah) | Skrillex | electronic | dubstep \| electro \| electro house \| glitch hop |
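
The load-or-build behavior described in this section can be illustrated with a minimal, standard-library sketch. It is not the implementation of `CanonicalTableCache`: the helper `load_or_build`, the JSON file format (standing in for parquet), and the sample row are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def load_or_build(cache_file: Path, build_fn: Callable[[], list], force: bool = False) -> list:
    """Return cached rows if present; otherwise build them and write the cache.

    Hypothetical helper: JSON stands in for the parquet files used by the notebook.
    """
    if cache_file.exists() and not force:
        return json.loads(cache_file.read_text(encoding="utf-8"))
    rows = build_fn()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(rows), encoding="utf-8")
    return rows


calls = []  # records how many times the expensive build actually ran


def expensive_build() -> list:
    calls.append(1)
    return [{"track_name": "Right In", "ms_played": 6250}]


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "_out" / "plays.json"
    first = load_or_build(cache_file, expensive_build)               # builds and caches
    second = load_or_build(cache_file, expensive_build)              # served from cache
    third = load_or_build(cache_file, expensive_build, force=True)   # forced rebuild

print(f"build ran {len(calls)} times for 3 loads")
```

The `force` flag plays the same role as `FORCE_REBUILD` in the configuration cell: it bypasses an existing cache when the inputs have changed.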


## 3. Build the Co-occurrence Graph

Walk every play, split its genre/style tags, and feed tag pairs to `CooccurrenceBuilder`.
Each play's `played_at` timestamp becomes the timeline value (`first_seen_ts`).

TODO: extend this step so that each node records whether it is a genre or a style in a `group` field, as described in the Cosmograph documentation.
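
One possible way to derive that `group` value is sketched below. It assumes that point ids are the raw tag strings and that tags are separated by `|`, as in the `styles` column shown earlier; the helper `tag_groups` and the sample rows are hypothetical and not part of `traverse`.

```python
def tag_groups(rows):
    """Map each tag to 'genre' or 'style' based on the column it came from.

    Hypothetical helper: `rows` is an iterable of (genres, styles) pairs,
    where each value is a '|'-separated string or a missing value.
    """
    groups = {}
    for genres, styles in rows:
        for column_name, raw in (("genre", genres), ("style", styles)):
            if not isinstance(raw, str):
                continue  # skip None / NaN placeholders for missing metadata
            for tag in (part.strip() for part in raw.split("|")):
                if tag:
                    groups.setdefault(tag, column_name)  # first column seen wins
    return groups


rows = [
    (None, None),
    ("electronic", "dubstep | electro | electro house | glitch hop"),
]
groups = tag_groups(rows)

# Attach the group to point records shaped like graph["points"] (assumed to carry an "id").
points = [{"id": "electronic"}, {"id": "dubstep"}]
for point in points:
    point["group"] = groups.get(point["id"])

print(points)
```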

In [3]:
import pandas as pd
from traverse.processing.normalize import split_tags, pretty_label
from traverse.graph.cooccurrence import CooccurrenceBuilder

builder = CooccurrenceBuilder(min_cooccurrence=2, max_nodes=500)

for played_at, genres, styles in plays_wide[
    ["played_at", "genres", "styles"]
].itertuples(index=False):
    tags = split_tags(genres) + split_tags(styles)
    ts_ms = (
        int(pd.Timestamp(played_at).value // 1_000_000)
        if pd.notna(played_at)
        else None
    )
    builder.add(tags, timestamp_ms=ts_ms, label_fn=pretty_label)

graph = builder.build()
print(f"Graph: {len(graph['points'])} nodes, {len(graph['links'])} edges")

Graph: 405 nodes, 12501 edges
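
To make the counting step concrete, here is a minimal standard-library sketch of tag co-occurrence with first-seen timestamps in epoch milliseconds. It is an illustration under stated assumptions, not the logic of `CooccurrenceBuilder`: the functions `to_epoch_ms` and `cooccurrence`, the pruning rule (keep only nodes that appear in a retained edge), and the sample plays are all hypothetical, and `max_nodes` is not modeled.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from itertools import combinations

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(dt: datetime) -> int:
    """Convert an aware datetime to integer milliseconds since the Unix epoch."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def cooccurrence(plays, min_cooccurrence=2):
    """Count unordered tag pairs per play and track each tag's earliest timestamp."""
    pair_counts = Counter()
    first_seen = {}
    for played_at, tags in plays:
        unique = sorted(set(tags))  # sorted so (a, b) and (b, a) share one key
        ts = to_epoch_ms(played_at)
        for tag in unique:
            first_seen[tag] = min(ts, first_seen.get(tag, ts))
        pair_counts.update(combinations(unique, 2))
    links = [
        {"source": a, "target": b, "weight": n}
        for (a, b), n in pair_counts.items()
        if n >= min_cooccurrence
    ]
    kept = {tag for link in links for tag in (link["source"], link["target"])}
    points = [{"id": tag, "first_seen_ts": first_seen[tag]} for tag in sorted(kept)]
    return {"points": points, "links": links}


utc = timezone.utc
plays = [
    (datetime(2012, 2, 19, 16, 42, 50, tzinfo=utc), ["electronic", "dubstep", "electro"]),
    (datetime(2012, 2, 19, 16, 45, 0, tzinfo=utc), ["electronic", "dubstep"]),
    (datetime(2012, 2, 19, 16, 50, 0, tzinfo=utc), ["rock"]),
]
demo_graph = cooccurrence(plays)
print(f"Graph: {len(demo_graph['points'])} nodes, {len(demo_graph['links'])} edges")
```

Only the `dubstep`/`electronic` pair reaches the threshold of two co-occurrences here, so the sketch keeps two nodes and one edge.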


## 4. Export JSON and Serve

Write the graph to the frontend's `dist/` directory, then start the built-in
static server. Open the printed URL in your browser to see the visualization.

In [4]:
from traverse.graph.adapters_cosmograph import CosmographAdapter
from traverse.cosmograph.server import serve, _default_dist_dir

# Write JSON into the frontend dist/ (use absolute path so it works from any cwd)
out_path = _default_dist_dir() / "cosmo_genres_spotify_timeline.json"
CosmographAdapter.write(graph, out_path)
print(f"Wrote {out_path}")
print()
print("Starting server — open in browser:")
print("  http://127.0.0.1:8080/?data=/cosmo_genres_spotify_timeline.json")
print()
print("Press Ctrl+C (or interrupt the kernel) to stop.")

serve(port=8080)

Wrote C:\Users\xtrem\Documents\Projects\GitHub\traverse_vc\src\traverse\cosmograph\app\dist\cosmo_genres_spotify_timeline.json

Starting server — open in browser:
  http://127.0.0.1:8080/?data=/cosmo_genres_spotify_timeline.json

Press Ctrl+C (or interrupt the kernel) to stop.
Serving C:\Users\xtrem\Documents\Projects\GitHub\traverse_vc\src\traverse\cosmograph\app\dist at http://127.0.0.1:8080

Shutting down.
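
For readers curious about what writing the JSON and serving `dist/` amounts to, the following is a rough standard-library sketch. It does not reproduce `CosmographAdapter.write` or `traverse.cosmograph.server.serve`; the helpers `write_graph` and `make_server`, the file name `graph.json`, and the sample graph are assumptions for illustration.

```python
import functools
import json
import tempfile
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


def write_graph(graph: dict, out_path: Path) -> Path:
    """Serialize a {'points': [...], 'links': [...]} dict to JSON on disk."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(graph), encoding="utf-8")
    return out_path


def make_server(dist_dir: Path, port: int = 8080, host: str = "127.0.0.1") -> ThreadingHTTPServer:
    """Create (but do not start) a static file server rooted at dist_dir."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=str(dist_dir))
    return ThreadingHTTPServer((host, port), handler)


sample_graph = {
    "points": [{"id": "dubstep"}, {"id": "electronic"}],
    "links": [{"source": "dubstep", "target": "electronic", "weight": 2}],
}

with tempfile.TemporaryDirectory() as tmp:
    written = write_graph(sample_graph, Path(tmp) / "dist" / "graph.json")
    round_trip = json.loads(written.read_text(encoding="utf-8"))

    server = make_server(written.parent, port=0)  # port 0 lets the OS pick a free port
    bound_port = server.server_address[1]
    # server.serve_forever() would block here until interrupted, like serve() above.
    server.server_close()

print(f"Would serve at http://127.0.0.1:{bound_port}/?data=/graph.json")
```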


---

## Appendix: PyCosmograph Inline Widget

If you have the `cosmograph` Python package installed (`pip install cosmograph`), you can
render the graph directly in the notebook without starting a server.

In [5]:
# pip install cosmograph  # uncomment to install
import pandas as pd
from cosmograph import cosmo

points = pd.DataFrame(graph["points"])
links = pd.DataFrame(graph["links"])

w = cosmo(
    points=points,
    links=links,
    point_id_by="id",
    link_source_by="source",
    link_target_by="target",
    point_label_by="label",
    link_include_columns=["weight"],
    point_size=0.2,
)
w  # renders inline

<cosmograph.widget.Cosmograph object at 0x0000013974A18E10>

## Tooling

Ideas for using the Cosmograph widget's tooling to make targeted selections:

0. Use Cosmograph's JavaScript/Node tooling to cluster points by a chosen feature.
1. Use networkx adapters together with Cosmograph's selection tools to select graph elements such as communities and paths.
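
As a small illustration of the second idea, the sketch below finds a shortest path between two tags and produces a list of ids that could be handed to the widget's selection methods. It uses a plain breadth-first search from the standard library as a stand-in for a networkx call such as `networkx.shortest_path`; the function `shortest_path_ids` and the sample links are hypothetical.

```python
from collections import deque


def shortest_path_ids(links, start, goal):
    """Breadth-first search over undirected links; returns node ids from start to goal.

    Hypothetical helper: `links` is a list of dicts with 'source' and 'target' keys,
    shaped like graph["links"]. Returns [] when no path exists.
    """
    neighbors = {}
    for link in links:
        neighbors.setdefault(link["source"], set()).add(link["target"])
        neighbors.setdefault(link["target"], set()).add(link["source"])

    previous = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return []


sample_links = [
    {"source": "soft rock", "target": "rock"},
    {"source": "rock", "target": "metal"},
    {"source": "metal", "target": "death metal"},
    {"source": "soft rock", "target": "pop"},
]
path = shortest_path_ids(sample_links, "soft rock", "death metal")
print(path)

# With the widget from the appendix available, the ids could then be selected:
# w.select_points_by_ids(path)
```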

In [33]:
# w.unselect_points_by_indices(w.selected_point_indices)

w.select_points_by_ids(['soft rock', 'death metal'])
w.fit_view_by_ids(['soft rock', 'death metal'], duration=500, padding=0.1)
