# Transition Network Analysis (TNA) Tutorial

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mohsaqr/tnapy/blob/main/tutorial.ipynb)

This tutorial demonstrates how to use the Python `tna` package for Transition Network Analysis.
It covers the complete workflow from data preparation through model building, visualization,
pruning, pattern detection, centrality analysis, community detection, and bootstrapping.

This tutorial replicates the R TNA tutorial by Saqr & Lopez-Pernas (2026) using the Python implementation.

## 1. Installation & Setup

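Install the `tna` package from GitHub. The install command below is commented out; uncomment it when running in Google Colab or another fresh environment: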

In [None]:
# Install tna package (uncomment for Google Colab)
# !pip install git+https://github.com/mohsaqr/tnapy.git -q

import tna
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline
plt.rcParams['figure.dpi'] = 100

print(f"TNA version: {tna.__version__}")

## 2. Getting Started with Long-Format Data

TNA works with sequential event data. The built-in dataset contains coded collaborative
regulation behaviors from 2000 learning sessions, with columns for action, actor, and time.

In [None]:
# Load the built-in dataset of coded collaborative regulation behaviors
group_regulation_long = tna.load_group_regulation_long()
print(f"Shape: {group_regulation_long.shape}")
group_regulation_long.head(10)

Each row is a single event with columns:
- **Action**: The behavioral state (becomes a network node)
- **Actor**: Participant ID (one sequence per actor)
- **Time**: Timestamp (for ordering and session splitting)
- **Achiever**: Achievement group (High/Low, used later for group comparison)
- **Group**: Group identifier
- **Course**: Course identifier

## 3. Understanding `prepare_data()`

The `prepare_data()` function converts long-format event logs into sequences suitable for TNA.
It handles session splitting (based on time gaps), ordering, and reshaping.
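To make "session splitting" and "reshaping" concrete, here is a minimal pandas sketch of the idea. It assumes that a new session starts whenever the same actor is idle for longer than a fixed gap. The helper names `split_sessions` and `to_wide` and the 15-minute gap are hypothetical illustrations, not the `tna` package's API or defaults.

```python
import pandas as pd


def split_sessions(df, actor, time, gap_minutes=15):
    """Hypothetical helper: start a new session whenever an actor is idle
    for more than `gap_minutes` (the 15-minute default is illustrative)."""
    df = df.copy()
    df[time] = pd.to_datetime(df[time])
    df = df.sort_values([actor, time])
    # Time since the same actor's previous event (NaT for each actor's first event)
    gaps = df.groupby(actor)[time].diff()
    starts = gaps.isna() | (gaps > pd.Timedelta(minutes=gap_minutes))
    # A running count of session starts per actor numbers that actor's sessions
    session_no = starts.astype(int).groupby(df[actor]).cumsum()
    df["session_id"] = df[actor].astype(str) + "_" + session_no.astype(str)
    return df


def to_wide(df, action):
    """Reshape to one row per session and one column per position."""
    df = df.copy()
    df["position"] = df.groupby("session_id").cumcount()
    return df.pivot(index="session_id", columns="position", values=action)


events = pd.DataFrame({
    "Actor": [1, 1, 1, 2],
    "Time": ["2024-01-01 10:00", "2024-01-01 10:05",
             "2024-01-01 11:00", "2024-01-01 10:00"],
    "Action": ["A", "B", "A", "C"],
})
wide = to_wide(split_sessions(events, actor="Actor", time="Time"), action="Action")
print(wide)
```

Actor 1's 55-minute pause produces a second session, and shorter sessions are padded with missing values in the wide table.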

In [None]:
# Convert long-format event log into sequences for TNA
prepared_data = tna.prepare_data(
    group_regulation_long,
    action="Action",   # column with behavioral states (become network nodes)
    actor="Actor",     # column with participant IDs (one sequence per actor)
    time="Time"        # column with timestamps (for ordering and session splitting)
)
prepared_data

In [None]:
# View the wide-format sequence data (rows = sequences, columns = positions)
print("Sequence data shape:", prepared_data.sequence_data.shape)
prepared_data.sequence_data.head()

In [None]:
# View the preserved metadata (e.g., Achiever group) for each sequence
prepared_data.meta_data.head()

### Alternative Input Formats

You can also build TNA models directly from wide-format data or pre-computed transition matrices:

In [None]:
# Wide-format data (rows = sequences, columns = time steps)
group_regulation = tna.load_group_regulation()
print("Wide-format shape:", group_regulation.shape)
group_regulation.head()

In [None]:
# Pre-computed transition matrix
mat = np.array([
    [0.1, 0.6, 0.3],
    [0.4, 0.2, 0.4],
    [0.3, 0.3, 0.4]
])
labels = ["A", "B", "C"]
model_from_matrix = tna.tna(pd.DataFrame(mat, index=labels, columns=labels))
print(model_from_matrix)

## 4. Building the TNA Model

The `tna()` function builds a relative transition probability model from the prepared data.
The resulting model contains:
- **weights**: The transition probability matrix (rows sum to 1)
- **inits**: Initial state probabilities
- **labels**: State labels
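The relative transition model can be illustrated from first principles. Count every consecutive pair of states, divide each row by its total, and take the distribution of first states as the initial probabilities. The sketch below is a hypothetical helper (`build_transition_model`) assuming complete, non-empty sequences. It shows the idea and does not reproduce `tna.tna()` internals.

```python
import numpy as np


def build_transition_model(sequences, labels):
    """Hypothetical helper: relative transition probabilities and initial
    probabilities from complete, non-empty sequences (no missing values)."""
    index = {state: i for i, state in enumerate(labels)}
    counts = np.zeros((len(labels), len(labels)))
    first = np.zeros(len(labels))
    for seq in sequences:
        first[index[seq[0]]] += 1
        # Count each consecutive (from, to) pair
        for a, b in zip(seq[:-1], seq[1:]):
            counts[index[a], index[b]] += 1
    row_totals = counts.sum(axis=1, keepdims=True)
    # Row-normalize; a state with no outgoing transitions keeps an all-zero row
    weights = np.divide(counts, row_totals, out=np.zeros_like(counts), where=row_totals > 0)
    return weights, first / first.sum()


weights, inits = build_transition_model([["A", "B", "A"], ["B", "B", "C"]], ["A", "B", "C"])
print(weights.round(3))
print(inits)
```

Here `A` is always followed by `B`, so its row is `[0, 1, 0]`, while `B` splits evenly across three successors. `C` only ever ends a sequence, so its row stays zero rather than summing to 1.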

In [None]:
# Build the TNA model from the prepared sequence data
model = tna.tna(prepared_data)
print(model)

In [None]:
# Inspect the transition probability matrix
weights_df = model.to_dataframe()
weights_df.round(3)

In [None]:
# Inspect initial probabilities
init_df = pd.Series(model.inits, index=model.labels, name="Initial Probability")
init_df.round(3)

In [None]:
# Model summary
model.summary()

## 5. Visualizations

### 5.1 Transition Network Plot

The network plot shows states as nodes and transitions as directed edges.
Edge width and opacity reflect transition probability.

- `minimum`: hides edges below this weight (removes clutter)
- `cut`: fades edges below this weight (still visible but de-emphasized)
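The two display thresholds define three bands of edges. The following sketch classifies each matrix entry according to the description above. The helper name `classify_edges` is hypothetical, and the exact boundary handling (whether a weight equal to `cut` counts as faded) is an assumption.

```python
import numpy as np


def classify_edges(weights, minimum=0.05, cut=0.1):
    """Hypothetical helper: label each edge 'hidden' (< minimum),
    'faded' (< cut), or 'full' (>= cut)."""
    w = np.asarray(weights, dtype=float)
    labels = np.full(w.shape, "full", dtype=object)
    labels[w < cut] = "faded"
    labels[w < minimum] = "hidden"  # applied last so it overrides 'faded'
    return labels


print(classify_edges([[0.02, 0.07], [0.50, 0.10]]))
```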

In [None]:
# minimum: hide edges below 0.05; cut: fade edges below 0.1
tna.plot_network(model, minimum=0.05, cut=0.1)
plt.show()

### 5.2 Histogram of Edge Weights

Examine the distribution of transition probabilities:

In [None]:
tna.plot_histogram(model)
plt.show()

### 5.3 Frequency Distribution of States

See how often each state occurs across all sequences:

In [None]:
# Bar chart of how often each state appears across all sequences
tna.plot_frequencies(model)
plt.show()

## 6. Pruning

Pruning removes weak edges to simplify the network. The `prune()` function
sets edges with weight below a threshold to zero.
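Threshold pruning amounts to a single element-wise operation on the weight matrix. This sketch uses a hypothetical `prune_weights` helper and assumes that weights strictly below the threshold are removed. A side effect worth noting: after pruning, rows no longer sum exactly to 1.

```python
import numpy as np


def prune_weights(weights, threshold=0.05):
    """Hypothetical helper: zero out every edge whose weight is below the threshold."""
    w = np.asarray(weights, dtype=float)
    return np.where(w < threshold, 0.0, w)


weights = np.array([[0.10, 0.60, 0.30],
                    [0.40, 0.02, 0.58],
                    [0.03, 0.30, 0.67]])
pruned = prune_weights(weights, threshold=0.05)
print("edges before:", np.count_nonzero(weights))  # 9
print("edges after: ", np.count_nonzero(pruned))   # 7
print(pruned.sum(axis=1))  # rows 2 and 3 now sum to less than 1
```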

In [None]:
# Prune: remove edges with weight below 0.05
pruned = tna.prune(model, threshold=0.05)

print(f"Original edges: {model.summary()['n_edges']}")
print(f"Pruned edges:   {pruned.summary()['n_edges']}")

In [None]:
# Plot the pruned network
tna.plot_network(pruned, cut=0.1)
plt.show()

## 7. Patterns: Cliques

Cliques are groups of states with strong mutual transitions. A directed clique of size *k*
means every pair of states in the group has transitions in **both** directions above the threshold.

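We search for cliques of size 2 (dyads), 3 (triads), and 4 (quads) with decreasing thresholds. Larger cliques require every pair of members to be connected in both directions, so the threshold is lowered as the size grows.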
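The clique definition above can be checked by brute force on a small matrix. Enumerate every group of `size` states and keep the groups where each pair has weights above the threshold in both directions. The helper `directed_cliques` is a hypothetical illustration of the definition, not the search algorithm `tna.cliques()` uses.

```python
from itertools import combinations

import numpy as np


def directed_cliques(weights, labels, size, threshold):
    """Hypothetical helper: all groups of `size` states whose every pair has
    transitions above `threshold` in both directions (brute force)."""
    w = np.asarray(weights, dtype=float)
    found = []
    for group in combinations(range(len(labels)), size):
        if all(w[i, j] > threshold and w[j, i] > threshold
               for i, j in combinations(group, 2)):
            found.append([labels[i] for i in group])
    return found


w = np.array([[0.0, 0.5, 0.2],
              [0.4, 0.0, 0.01],
              [0.3, 0.2, 0.0]])
print(directed_cliques(w, ["A", "B", "C"], size=2, threshold=0.1))
print(directed_cliques(w, ["A", "B", "C"], size=3, threshold=0.1))
```

The pair `B`/`C` fails because `B -> C` is only 0.01, which also rules out the triad.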

In [None]:
# Find cliques of size 2, 3, and 4 with decreasing thresholds
cliques_of_two   = tna.cliques(model, size=2, threshold=0.1)   # dyads
cliques_of_three = tna.cliques(model, size=3, threshold=0.05)  # triads
cliques_of_four  = tna.cliques(model, size=4, threshold=0.03)  # quads

In [None]:
print(cliques_of_two)

In [None]:
print(cliques_of_three)

In [None]:
print(cliques_of_four)

## 8. Centralities

### 8.1 Node-Level Centrality Measures

Centrality measures quantify the importance of each state in the transition network.
Available measures include OutStrength, InStrength, Closeness variants, Betweenness,
BetweennessRSP, Diffusion, and Clustering.
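The two simplest measures, OutStrength and InStrength, are row and column sums of the weight matrix. The sketch below computes them with a hypothetical `strength_centralities` helper. Excluding self-loops by default is an assumption of this sketch. It is also what makes OutStrength informative, because with row-normalized weights and loops included every OutStrength would be 1.

```python
import numpy as np
import pandas as pd


def strength_centralities(weights, labels, loops=False):
    """Hypothetical helper: OutStrength = row sums, InStrength = column sums.
    Dropping self-loops when loops=False is an assumption of this sketch."""
    w = np.array(weights, dtype=float)  # copy so the caller's matrix is untouched
    if not loops:
        np.fill_diagonal(w, 0.0)
    return pd.DataFrame(
        {"OutStrength": w.sum(axis=1), "InStrength": w.sum(axis=0)},
        index=labels,
    )


print(strength_centralities([[0.2, 0.8], [0.5, 0.5]], ["A", "B"]))
```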

In [None]:
# Compute all centrality measures for each state
centrality_df = tna.centralities(model)
centrality_df.round(4)

In [None]:
# Plot centralities as faceted bar charts
tna.plot_centralities(centrality_df)
plt.show()

### 8.2 Edge-Level Measures: Edge Betweenness

Edge betweenness centrality measures how many shortest paths pass through each transition.
The `betweenness_network()` function returns a new TNA model with betweenness values as weights.
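A brute-force version shows what edge betweenness counts. For every ordered pair of states (s, t), it finds the shortest paths and credits each edge with the fraction of those paths passing through it. The sketch assumes, as an illustration, that a transition probability p becomes a path length of 1 / p, so strong transitions are "short". Whether `betweenness_network()` uses this exact transformation is not confirmed here. The helper `edge_betweenness` is hypothetical and only suitable for small networks.

```python
import math
from functools import lru_cache


def edge_betweenness(weights, labels):
    """Hypothetical helper: edge betweenness of a small directed network.
    Assumption: length = 1 / weight; zero weights and self-loops are ignored."""
    n = len(labels)
    inf = math.inf
    edge_len = [[inf if i == j or weights[i][j] <= 0 else 1.0 / weights[i][j]
                 for j in range(n)] for i in range(n)]
    # All-pairs shortest distances (Floyd-Warshall)
    dist = [[0.0 if i == j else edge_len[i][j] for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]

    @lru_cache(maxsize=None)
    def n_paths(s, v):
        # Number of distinct shortest paths from s to v
        if s == v:
            return 1
        return sum(n_paths(s, u) for u in range(n)
                   if u != v and edge_len[u][v] < inf
                   and math.isclose(dist[s][u] + edge_len[u][v], dist[s][v]))

    scores = {}
    for u in range(n):
        for v in range(n):
            if edge_len[u][v] == inf:
                continue
            total = 0.0
            for s in range(n):
                for t in range(n):
                    if s == t or dist[s][t] == inf:
                        continue
                    # (u, v) is on a shortest s -> t path iff routing through it adds nothing
                    if math.isclose(dist[s][u] + edge_len[u][v] + dist[v][t], dist[s][t]):
                        total += n_paths(s, u) * n_paths(v, t) / n_paths(s, t)
            scores[(labels[u], labels[v])] = total
    return scores


cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]  # A -> B -> C -> A
print(edge_betweenness(cycle, ["A", "B", "C"]))
```

In the three-state cycle, each edge lies on exactly three shortest paths. For larger networks, a library routine such as networkx's `edge_betweenness_centrality` (with a distance attribute passed as `weight`) is the practical choice.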

In [None]:
# Compute edge betweenness for all transitions
edge_betweenness = tna.betweenness_network(model)

# Show the betweenness values
edge_betweenness.to_dataframe().round(3)

In [None]:
# Plot edge betweenness network
tna.plot_network(edge_betweenness, cut=0.1, title="Edge Betweenness Network")
plt.show()

## 9. Community Detection

Community detection identifies groups of states that interact more densely with each other
than with the rest of the network.
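"Interacting more densely" is usually quantified by modularity. For a directed weighted network, one common form is Q = (1/m) Σ_ij [A_ij − k_i^out k_j^in / m] δ(c_i, c_j), where m is the total weight. The sketch below scores a *given* partition with a hypothetical `directed_modularity` helper. It does not reproduce the leading-eigenvector or other search methods used by `tna.communities()`.

```python
import numpy as np


def directed_modularity(weights, membership):
    """Hypothetical helper: modularity of a given partition of a directed
    weighted network, Q = (1/m) sum_ij [A_ij - k_out_i * k_in_j / m] * same(i, j)."""
    A = np.asarray(weights, dtype=float)
    m = A.sum()
    k_out = A.sum(axis=1)
    k_in = A.sum(axis=0)
    same = np.equal.outer(membership, membership)  # True where i and j share a community
    return float(((A - np.outer(k_out, k_in) / m) * same).sum() / m)


# Two disconnected reciprocal pairs: A <-> B and C <-> D
A = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
print(directed_modularity(A, [0, 0, 1, 1]))  # 0.5: the natural split
print(directed_modularity(A, [0, 0, 0, 0]))  # 0.0: everything in one community
```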

In [None]:
# Detect communities using the default algorithm (leading eigenvector)
comms = tna.communities(model)
print(comms)

In [None]:
# Plot communities: nodes colored by community assignment
tna.plot_communities(comms, cut=0.1)
plt.show()

In [None]:
# Try multiple community detection methods
comms_multi = tna.communities(model, methods=["leading_eigen", "louvain", "fast_greedy"])
print(comms_multi)

## 10. Bootstrapping

### 10.1 Why Bootstrap?

Bootstrap resampling tests the stability of the transition network. It resamples sequences
and checks whether edge weights remain within a consistency range of the original values.
Edges that are unstable (i.e., change a lot when different sequences are included) are
flagged as non-significant.
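The resampling logic can be sketched in a few lines of NumPy. Sequences are resampled with replacement and the transition matrix is rebuilt each time. For each edge, the sketch records how often its bootstrap weight falls outside a consistency range around the original weight. The helpers `transition_matrix` and `bootstrap_edges`, the integer-coded sequences, and the 0.75 to 1.25 multiplicative range are assumptions for illustration. They are not necessarily what `tna.bootstrap_tna()` does internally.

```python
import numpy as np


def transition_matrix(sequences, n_states):
    """Row-normalized transition counts for integer-coded sequences."""
    counts = np.zeros((n_states, n_states))
    for seq in sequences:
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)


def bootstrap_edges(sequences, n_states, iters=1000, level=0.05,
                    consistency=(0.75, 1.25), seed=0):
    """Hypothetical helper: p-value = share of resamples whose weight leaves
    [0.75 * w, 1.25 * w]; an edge is flagged significant if p < level."""
    rng = np.random.default_rng(seed)
    original = transition_matrix(sequences, n_states)
    lower, upper = original * consistency[0], original * consistency[1]
    outside = np.zeros_like(original)
    for _ in range(iters):
        idx = rng.integers(0, len(sequences), size=len(sequences))
        boot = transition_matrix([sequences[i] for i in idx], n_states)
        outside += (boot < lower) | (boot > upper)
    p_values = outside / iters
    # Only edges that exist in the original network can be significant
    return original, p_values, (p_values < level) & (original > 0)


seqs = [[0, 1, 0, 1]] * 10 + [[0, 0, 1, 1], [1, 0, 0, 0]]
original, p_values, sig = bootstrap_edges(seqs, n_states=2, iters=200, seed=265)
print(original.round(2))
print(p_values)
print(sig)
```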

In [None]:
# Resample sequences 1000 times and assess edge stability
np.random.seed(265)  # for reproducibility
boot = tna.bootstrap_tna(model, iter=1000, level=0.05, seed=265)

### 10.2 Results

The bootstrap summary table shows each edge's weight, p-value, significance flag,
consistency range, and confidence interval.

In [None]:
# Extract the bootstrap summary table
boot_df = boot.summary()
boot_df.head(10)

In [None]:
# Keep only edges that survived the bootstrap and sort by weight
sig_edges = boot_df[boot_df["sig"] == True].sort_values("weight", ascending=False)
print(f"{len(sig_edges)} out of {len(boot_df)} edges are significant")
sig_edges.head(15)

### 10.3 Bootstrapped Network

Plot the network showing only significant edges (those that survived the bootstrap):

In [None]:
# Plot the bootstrapped network (only significant edges)
tna.plot_network(boot.model, cut=0.1, title="Bootstrapped Network (significant edges)")
plt.show()

## 11. Sequence Plots

Visualize the raw sequence data as index plots (each row = one sequence) or
distribution plots (proportion of each state at each position).
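The numbers behind a distribution plot can be computed directly from wide-format data. For each position (column), take the share of each state among sequences that are still active at that position. The sketch below uses a hypothetical `state_distribution` helper. Excluding padded (missing) positions from the denominator is an assumption. `tna.plot_sequences()` may handle sequence ends differently.

```python
import pandas as pd


def state_distribution(wide):
    """Hypothetical helper: proportion of each state at each position.
    value_counts drops missing values, so ended sequences are not counted."""
    return wide.apply(lambda col: col.value_counts(normalize=True)).fillna(0.0)


wide = pd.DataFrame({0: ["A", "A", "B"], 1: ["B", None, "B"]})
print(state_distribution(wide))
```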

In [None]:
# Each row is one sequence; colors represent states at each position
tna.plot_sequences(prepared_data, max_sequences=200)
plt.show()

In [None]:
# Proportion of each state at each sequence position
tna.plot_sequences(prepared_data, plot_type="distribution")
plt.show()

## 12. Group Comparison with Permutation Testing

Permutation testing compares TNA models between groups. Here we compare
High vs Low achievers to see which transitions differ significantly.
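The logic of an edge-wise permutation test can be sketched as follows. Compute the observed difference between the two groups' transition matrices, then repeatedly shuffle sequences between groups. Each edge's p-value is the share of shuffles producing a difference at least as large in absolute value. The helpers and the integer-coded sequences are hypothetical. The "+1" correction in the p-value is a common convention, not necessarily the one `tna.permutation_test()` uses.

```python
import numpy as np


def transition_matrix(sequences, n_states):
    """Row-normalized transition counts for integer-coded sequences."""
    counts = np.zeros((n_states, n_states))
    for seq in sequences:
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)


def permutation_test_edges(seqs_a, seqs_b, n_states, iters=500, seed=42):
    """Hypothetical helper: edge-wise p-values for the difference between groups."""
    rng = np.random.default_rng(seed)
    observed = transition_matrix(seqs_a, n_states) - transition_matrix(seqs_b, n_states)
    pooled = list(seqs_a) + list(seqs_b)
    n_a = len(seqs_a)
    exceed = np.zeros_like(observed)
    for _ in range(iters):
        # Shuffle group membership while keeping group sizes fixed
        perm = rng.permutation(len(pooled))
        group_a = [pooled[i] for i in perm[:n_a]]
        group_b = [pooled[i] for i in perm[n_a:]]
        diff = transition_matrix(group_a, n_states) - transition_matrix(group_b, n_states)
        exceed += np.abs(diff) >= np.abs(observed)
    p_values = (exceed + 1) / (iters + 1)
    return observed, p_values


high = [[0, 1]] * 10  # group A always moves 0 -> 1
low = [[0, 0]] * 10   # group B always stays in 0
diff, p = permutation_test_edges(high, low, n_states=2, iters=200)
print(diff)
print(p.round(3))
```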

In [None]:
# Split data by Achiever group
high_data = group_regulation_long[group_regulation_long["Achiever"] == "High"]
low_data = group_regulation_long[group_regulation_long["Achiever"] == "Low"]

# Prepare each group
prepared_high = tna.prepare_data(high_data, action="Action", actor="Actor", time="Time")
prepared_low = tna.prepare_data(low_data, action="Action", actor="Actor", time="Time")

# Build TNA models for each group
model_high = tna.tna(prepared_high)
model_low = tna.tna(prepared_low)

print(f"High achievers: {prepared_high.statistics['n_sessions']} sequences")
print(f"Low achievers:  {prepared_low.statistics['n_sessions']} sequences")

In [None]:
# Compare the two networks side by side
tna.plot_comparison(
    model_high, model_low,
    plot_type="network",
    labels=("High Achievers", "Low Achievers"),
    minimum=0.05, cut=0.1
)
plt.show()

In [None]:
# Run permutation test (this takes a moment)
perm_result = tna.permutation_test(
    model_high, model_low,
    iter=500, seed=42, level=0.05
)

# Show significant edge differences
sig_perm = perm_result.edges["stats"][
    perm_result.edges["stats"]["p_value"] < 0.05
].sort_values("p_value")

print(f"{len(sig_perm)} significant edge differences found")
sig_perm

## 13. Complete Workflow at a Glance

```python
import tna
import pandas as pd
import matplotlib.pyplot as plt

# Load and prepare data
my_data = pd.read_csv("your_data.csv")
prepared = tna.prepare_data(my_data, action="event", actor="user_id", time="timestamp")

# Build model
model = tna.tna(prepared)

# Visualize
tna.plot_network(model, minimum=0.05, cut=0.1)
tna.plot_histogram(model)
tna.plot_frequencies(model)

# Prune
pruned = tna.prune(model, threshold=0.05)
tna.plot_network(pruned, cut=0.1)

# Cliques
print(tna.cliques(model, size=2, threshold=0.1))
print(tna.cliques(model, size=3, threshold=0.05))

# Centralities
tna.plot_centralities(tna.centralities(model))
tna.plot_network(tna.betweenness_network(model), cut=0.1)

# Communities
tna.plot_communities(tna.communities(model), cut=0.1)

# Bootstrap
boot = tna.bootstrap_tna(model, iter=1000, level=0.05, seed=265)
tna.plot_network(boot.model, cut=0.1)

# Sequences
tna.plot_sequences(prepared)

# Render all figures when running as a script (not needed in a notebook)
plt.show()
```