# GTN One-Stop Tutorial (Python)

This notebook is a comprehensive curriculum-style tutorial for **GTN** using the Python APIs and examples in this repository.

It moves from fundamentals to advanced topics. The numbers below are topic groups, not section numbers:

1. Graph basics (WFSA/WFST)
2. Core graph operations
3. Scoring and differentiation
4. Sequence criteria (ASG, CTC)
5. String algorithms (n-grams, edit distance)
6. Learned decompositions and hand-designed priors
7. Sequence alignment
8. Linear-chain CRF and PyTorch integration

It cross-references these example scripts in `bindings/python/examples/`:

- `simple_graph.py`
- `asg.py`
- `ctc.py`
- `count_ngrams.py`
- `edit_distance.py`
- `learned_decompositions.py`
- `priors.py`
- `tutorial.py`
- `sequence_alignment.py`
- `word_decompositions.py`
- `linear_crf.py`
- `pytorch_loss.py`


## 0. Prerequisites

You need:

- `gtn` installed from this repo (`python -m pip install ./bindings/python`)
- Graphviz (`dot`) installed for drawing

On WSL (Ubuntu/Debian), if anything is missing:

```bash
bash ./scripts/install_wsl.sh
sudo apt install -y graphviz
```
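
Before running the drawing helper, it can help to confirm that the Graphviz `dot` executable is on the `PATH`. A minimal sketch using only the standard library; the helper name `has_graphviz` is hypothetical and not part of GTN:

```python
import shutil


def has_graphviz():
    """Return True if a `dot` executable can be found on the PATH."""
    return shutil.which("dot") is not None


if not has_graphviz():
    print("Graphviz `dot` not found; install it before drawing graphs.")
```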


In [None]:
import json
import math
import subprocess
from pathlib import Path

import gtn
from IPython.display import SVG, display, Markdown

print("GTN imported. version:", getattr(gtn, "__version__", "unknown"))
print("CUDA available:", gtn.cuda.is_available() if hasattr(gtn, "cuda") else False)


In [None]:
# Notebook paths
NB_DIR = Path.cwd()
OUT_DIR = NB_DIR / "_gtn_tutorial_artifacts"
OUT_DIR.mkdir(exist_ok=True)
OUT_DIR


In [None]:
def draw_svg(graph, name, isymbols=None, osymbols=None):
    """Draw graph via GTN -> .dot, then Graphviz -> .svg, and display inline."""
    dot_path = OUT_DIR / f"{name}.dot"
    svg_path = OUT_DIR / f"{name}.svg"
    gtn.draw(graph, str(dot_path), isymbols or {}, osymbols or {})
    subprocess.check_call(["dot", "-Tsvg", str(dot_path), "-o", str(svg_path)])
    display(SVG(filename=str(svg_path)))
    return dot_path, svg_path


## 1. Theory Primer: Weighted Graphs in GTN

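A GTN graph is a weighted finite-state acceptor (WFSA) or transducer (WFST).

- **Nodes**: states
- **Arcs**: transitions, each with an input label, an output label, and a weight
- **Start / accept states**: a path is accepting if it begins at a start state and ends at an accept state

In log space, the score of a path $\pi$ is the sum of its arc weights:

$$s(\pi)=\sum_{a\in\pi} w_a$$

The forward score is the log-sum-exp of the path scores over the set $\mathcal{A}$ of accepting paths:

$$\log Z = \log\sum_{\pi\in\mathcal{A}} e^{s(\pi)}$$

The Viterbi score is the maximum over the same set:

$$\max_{\pi\in\mathcal{A}} s(\pi)$$

GTN also supports automatic differentiation through graph operations, so gradients of these scores with respect to arc weights are available.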
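
To make the two definitions concrete, here is a pure-Python sketch (no GTN calls) that computes both scores on a small acyclic graph by dynamic programming. The helper names `logadd` and `dag_scores` are illustrative, and the sketch assumes node ids are already in topological order, i.e. every arc goes from a lower id to a higher id.

```python
import math


def logadd(a, b):
    """Stable log(exp(a) + exp(b)); -inf acts as the identity."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def dag_scores(num_nodes, starts, accepts, arcs):
    """Return (forward_score, viterbi_score) for arcs given as (src, dst, weight).

    Assumes src < dst for every arc, so increasing src is a topological order.
    """
    fwd = [-math.inf] * num_nodes
    vit = [-math.inf] * num_nodes
    for s in starts:
        fwd[s] = 0.0
        vit[s] = 0.0
    for src, dst, w in sorted(arcs):
        fwd[dst] = logadd(fwd[dst], fwd[src] + w)
        vit[dst] = max(vit[dst], vit[src] + w)
    forward = -math.inf
    for a in accepts:
        forward = logadd(forward, fwd[a])
    viterbi = max(vit[a] for a in accepts)
    return forward, viterbi


# Two start states (0 and 1), one accept state (3).
arcs = [(0, 1, 1.1), (0, 2, 3.2), (1, 2, 1.4), (2, 3, 2.1)]
forward, viterbi = dag_scores(4, starts=[0, 1], accepts=[3], arcs=arcs)
print("forward score:", forward)
print("viterbi score:", viterbi)
```

The three accepting paths have scores 5.3, 4.6 and 3.5, so the Viterbi score is 5.3 and the forward score is the log-sum-exp of all three.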


## 2. Graph Basics (from `simple_graph.py` and `tutorial.py`)


In [None]:
# Build three acceptors like bindings/python/examples/simple_graph.py
symbols = {0: "a", 1: "b", 2: "c"}

g1 = gtn.Graph(False)
g1.add_node(True)
g1.add_node()
g1.add_node(False, True)
g1.add_arc(0, 1, 0)
g1.add_arc(1, 2, 1)
g1.add_arc(2, 2, 0)

g2 = gtn.Graph(False)
g2.add_node(True)
g2.add_node()
g2.add_node(False, True)
g2.add_arc(0, 1, 1)
g2.add_arc(1, 2, 0)

g3 = gtn.Graph(False)
g3.add_node(True)
g3.add_node()
g3.add_node(False, True)
g3.add_arc(0, 1, 0)
g3.add_arc(1, 2, 2)

print("g1 nodes/arcs:", g1.num_nodes(), g1.num_arcs())
draw_svg(g1, "simple_g1", symbols, symbols)


### Acceptor vs Transducer

- **Acceptor**: one label stream (input labels; output defaults to input)
- **Transducer**: separate input/output labels per arc

The epsilon label is `gtn.epsilon`. An arc carrying it consumes no symbol on the input side, or emits no symbol on the output side.


In [None]:
isymbols = {0: "a", 1: "b", 2: "c"}
osymbols = {0: "x", 1: "y", 2: "z"}

fst = gtn.Graph()
fst.add_node(True)
fst.add_node()
fst.add_node(False, True)
fst.add_arc(0, 1, 0)       # olabel omitted, so it defaults to the ilabel index 0 (a -> x)
fst.add_arc(0, 1, 1, 1)    # ilabel 1, olabel 1 (b -> y)
fst.add_arc(1, 2, 1, 2)    # ilabel 1, olabel 2: explicit transduction b -> z
draw_svg(fst, "simple_fst", isymbols, osymbols)


## 3. Core Operations (`tutorial.py`)

### 3.1 Union, Concat, Closure

- `union([g1, g2, ...])`: accepts any string accepted by any input graph
- `concat(g1, g2)`: accepts strings `xy` where `x` in `g1`, `y` in `g2`
- `closure(g)`: Kleene closure, zero or more repetitions
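
At the level of accepted strings, these three operations behave like set operations on languages. The sketch below uses plain Python sets rather than GTN graphs, and it truncates the closure to a fixed number of repetitions because the true closure is infinite. All helper names are illustrative.

```python
def union(*langs):
    """Strings accepted by any of the input languages."""
    return set().union(*langs)


def concat(lang_x, lang_y):
    """Every string xy with x from the first language and y from the second."""
    return {x + y for x in lang_x for y in lang_y}


def closure(lang, max_reps):
    """Zero up to max_reps repetitions (a finite slice of the Kleene closure)."""
    result = {""}
    current = {""}
    for _ in range(max_reps):
        current = concat(current, lang)
        result |= current
    return result


lang2 = {"ba"}  # language of a two-arc acceptor for b, a
lang3 = {"ac"}  # language of a two-arc acceptor for a, c

print(sorted(union(lang2, lang3)))
print(sorted(concat(lang2, lang3)))
print(sorted(closure(lang2, max_reps=2)))
```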


In [None]:
u = gtn.union([g1, g2, g3])
c = gtn.concat(g2, g3)
k = gtn.closure(g2)

draw_svg(u, "union_graph", symbols)
draw_svg(c, "concat_graph", symbols)
draw_svg(k, "closure_graph", symbols)


### 3.2 Compose, Intersect, Project, Remove

- `compose(g1, g2)`: WFST composition (output labels of `g1` match input labels of `g2`)
- `intersect(g1, g2)`: acceptor intersection
- `project_input` / `project_output`: turn transducer to acceptor by selecting one side
- `remove(..., gtn.epsilon)`: builds an equivalent graph with the epsilon arcs removed, which often simplifies later compositions
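
Ignoring weights, the same operations can be illustrated on finite sets: an acceptor is a set of strings and a transducer is a set of (input, output) pairs. This is a plain-Python sketch of the semantics, not of GTN's implementation, and all names in it are illustrative. In GTN the weights of matched arcs are additionally summed.

```python
import itertools
import re


def compose(rel1, rel2):
    """Pairs (x, z) such that rel1 maps x to some y and rel2 maps that y to z."""
    return {(x, z) for x, y1 in rel1 for y2, z in rel2 if y1 == y2}


def project_input(rel):
    """Keep only the input side of each pair."""
    return {x for x, _ in rel}


def project_output(rel):
    """Keep only the output side of each pair."""
    return {y for _, y in rel}


# Intersection: strings of the form a*bc* that also have length exactly 3.
pattern = re.compile(r"a*bc*")
length_three = {"".join(p) for p in itertools.product("abc", repeat=3)}
both = sorted(s for s in length_three if pattern.fullmatch(s))
print(both)

# Composition: the output of the first relation must match the input of the second.
to_upper = {("ab", "AB"), ("bc", "BC")}
to_digits = {("AB", "12"), ("CA", "31")}
chained = compose(to_upper, to_digits)
print(chained, project_input(chained), project_output(chained))
```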


In [None]:
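# Two simple acceptors for intersection
# a: accepts a* b c* (any number of a's, one b, any number of c's)
a = gtn.Graph()
a.add_node(True)
a.add_node(False, True)
a.add_arc(0, 0, 0)
a.add_arc(0, 1, 1)
a.add_arc(1, 1, 2)

# b: accepts every string of exactly three symbols over {a, b, c}
b = gtn.Graph()
for i in range(4):
    b.add_node(i == 0, i == 3)
for src in [0, 1, 2]:
    for lab in [0, 1, 2]:
        b.add_arc(src, src + 1, lab)

# The result accepts the length-3 strings of the form a* b c*
inter = gtn.intersect(a, b)
draw_svg(inter, "intersect_graph", symbols)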


## 4. Forward/Viterbi and Autograd (`tutorial.py`)

GTN supports both differentiable marginal scoring (`forward_score`) and max-path decoding (`viterbi_score`, `viterbi_path`).

For differentiable objectives:

1. Build the objective from graph operations
2. Reduce it to a scalar loss graph (for example with `forward_score`)
3. Call `gtn.backward(loss)`
4. Read the gradient of each input graph from its `.grad()`
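
For a forward score, the gradient with respect to an arc weight is the posterior probability that an accepting path uses that arc. The pure-Python sketch below checks this by brute-force path enumeration on a two-step graph with four zero-weight arcs, where the loss is the log-sum-exp over all paths minus the log-sum-exp over the paths that spell `0 1`. It mirrors the structure of the autograd toy example in this section but does not call GTN, and the function names are illustrative.

```python
import math

# Arcs as (src, dst, label); all weights start at zero.
ARCS = [(0, 1, 0), (0, 1, 1), (1, 2, 0), (1, 2, 1)]


def paths():
    """All accepting paths as pairs of arc indices (first step, second step)."""
    first = [i for i, a in enumerate(ARCS) if a[0] == 0]
    second = [i for i, a in enumerate(ARCS) if a[0] == 1]
    return [(i, j) for i in first for j in second]


def log_z(weights, keep):
    """Log-sum-exp of path scores over the paths selected by `keep`."""
    scores = [weights[i] + weights[j] for i, j in paths() if keep(i, j)]
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


def loss(weights):
    """log Z over all paths minus log Z over paths with labels (0, 1)."""
    everything = log_z(weights, lambda i, j: True)
    spells_01 = log_z(weights, lambda i, j: (ARCS[i][2], ARCS[j][2]) == (0, 1))
    return everything - spells_01


def numeric_grad(weights, eps=1e-6):
    """Central finite differences of the loss with respect to each arc weight."""
    grads = []
    for k in range(len(weights)):
        up = list(weights)
        up[k] += eps
        down = list(weights)
        down[k] -= eps
        grads.append((loss(up) - loss(down)) / (2 * eps))
    return grads


grads = numeric_grad([0.0, 0.0, 0.0, 0.0])
print([round(g, 4) for g in grads])
```

Each arc lies on half of the four paths, so the first term contributes 0.5 per arc. The two arcs on the `0 1` path each lose 1 from the second term, which gives gradients of -0.5, 0.5, 0.5 and -0.5.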


In [None]:
# Two start states (0 and 1) and one accept state (3).
g = gtn.Graph()
g.add_node(True)
g.add_node(True)
g.add_node()
g.add_node(False, True)
# add_arc(src, dst, ilabel, olabel, weight)
g.add_arc(0, 1, 0, 0, 1.1)
g.add_arc(0, 2, 1, 1, 3.2)
g.add_arc(1, 2, 2, 2, 1.4)
g.add_arc(2, 3, 0, 0, 2.1)

fs = gtn.forward_score(g)
vs = gtn.viterbi_score(g)
vp = gtn.viterbi_path(g)

print("forward score:", fs.item())
print("viterbi score:", vs.item())
draw_svg(vp, "viterbi_path", symbols)


In [None]:
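# Autograd toy example
# New names are used here so the graphs from section 2 are not overwritten.
# g_grad: its arc weights receive gradients (calc_grad defaults to True).
g_grad = gtn.Graph()
g_grad.add_node(True)
g_grad.add_node()
g_grad.add_node(False, True)
g_grad.add_arc(0, 1, 0)
g_grad.add_arc(0, 1, 1)
g_grad.add_arc(1, 2, 0)
g_grad.add_arc(1, 2, 1)

# g_const: constraint graph built with calc_grad=False, so it gets no gradient.
g_const = gtn.Graph(False)
g_const.add_node(True)
g_const.add_node(False, True)
g_const.add_arc(0, 0, 0)
g_const.add_arc(0, 1, 1)

log_z_constrained = gtn.forward_score(gtn.compose(g_grad, g_const))
log_z_all = gtn.forward_score(g_grad)
loss = gtn.subtract(log_z_all, log_z_constrained)
gtn.backward(loss)
print("gradient w.r.t. arc weights:", g_grad.grad().weights_to_list())
g_grad.zero_grad()  # clear the gradient before reusing the graph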


## 5. ASG Criterion (`asg.py`)

ASG can be written as:

$$\mathcal{L}_{ASG}=\log Z_{full} - \log Z_{target}$$

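where:

- $Z_{target}$: score of the constrained graph that only allows alignments of the target (forced alignment)
- $Z_{full}$: score of the unconstrained, fully connected graph that allows every label sequence

Both terms are forward scores of graphs obtained by composing with the emissions and the transitions.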
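
As a cross-check of the formula, the following pure-Python sketch computes both terms with explicit forward recursions over score tables instead of GTN graphs. The function names and the table layout (`emissions[t][c]`, `transitions[p][c]`, `start[c]`) are assumptions made for this illustration.

```python
import math


def logsumexp(xs):
    """Stable log of a sum of exponentials; returns -inf if all inputs are -inf."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def asg_loss(emissions, transitions, start, target):
    """log Z_full - log Z_target for one utterance."""
    num_frames, num_labels = len(emissions), len(emissions[0])

    # Unconstrained lattice: any label at every frame.
    full = [start[c] + emissions[0][c] for c in range(num_labels)]
    for t in range(1, num_frames):
        full = [
            logsumexp([full[p] + transitions[p][c] for p in range(num_labels)])
            + emissions[t][c]
            for c in range(num_labels)
        ]
    log_z_full = logsumexp(full)

    # Constrained lattice: position k in the target, each label held >= 1 frame.
    neg_inf = -math.inf
    con = [neg_inf] * len(target)
    con[0] = start[target[0]] + emissions[0][target[0]]
    for t in range(1, num_frames):
        new = [neg_inf] * len(target)
        for k, c in enumerate(target):
            stay = con[k] + transitions[c][c]
            move = con[k - 1] + transitions[target[k - 1]][c] if k > 0 else neg_inf
            new[k] = logsumexp([stay, move]) + emissions[t][c]
        con = new
    log_z_target = con[-1]
    return log_z_full - log_z_target


N, T = 27, 5
zeros = [[0.0] * N for _ in range(T)]
loss = asg_loss(zeros, [[0.0] * N for _ in range(N)], [0.0] * N, [2, 0, 19])
print("ASG loss with all-zero scores:", loss)
```

With all scores at zero, the full lattice has $27^5$ paths and the constrained lattice has 6 (the ways to split 5 frames among 3 labels), so the loss equals $5\log 27 - \log 6$.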


In [None]:
N = 27               # number of labels
T = 5                # number of frames
target = [2, 0, 19]

# One arc per label between consecutive frames (weights default to 0 here).
emissions = gtn.linear_graph(T, N)

# Bigram transition model: node 0 is the start, node i + 1 means "last label was i".
transitions = gtn.Graph()
transitions.add_node(True)
for i in range(1, N + 1):
    transitions.add_node(False, True)
    transitions.add_arc(0, i, i - 1)
for i in range(N):
    for j in range(N):
        transitions.add_arc(i + 1, j + 1, j)

# Forced-alignment graph: each target label is emitted once, then may repeat.
fal = gtn.Graph()
fal.add_node(True)
for idx, lab in enumerate(target, start=1):
    fal.add_node(False, idx == len(target))
    fal.add_arc(idx, idx, lab)
    fal.add_arc(idx - 1, idx, lab)

fal_align = gtn.compose(emissions, gtn.compose(fal, transitions))
full_align = gtn.compose(emissions, transitions)
loss_asg = gtn.subtract(gtn.forward_score(full_align), gtn.forward_score(fal_align))

print("ASG loss:", loss_asg.item())
gtn.backward(loss_asg)


## 6. CTC Criterion (`ctc.py`)

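CTC adds a blank symbol and constrains the allowed transitions: each label may repeat, a blank may appear between labels, and a blank must separate two identical consecutive labels.

With emissions that are already normalized per frame (for example by a log-softmax), the loss is the negative forward score of the alignment graph:

$$\mathcal{L}_{CTC} = -\log Z_{ctc}$$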
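
The same quantity can be computed with the textbook CTC forward recursion over the blank-extended target. The pure-Python sketch below is a cross-check rather than GTN code; `ctc_loss` and its arguments are illustrative names, and `log_probs[t][c]` is assumed to hold per-frame log-probabilities.

```python
import math


def logsumexp(xs):
    """Stable log of a sum of exponentials; returns -inf if all inputs are -inf."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_loss(log_probs, target, blank=0):
    """Negative log-likelihood of `target` summed over all CTC alignments."""
    # Blank-extended target: blank, l1, blank, l2, ..., blank.
    ext = [blank]
    for label in target:
        ext += [label, blank]
    size = len(ext)

    alpha = [-math.inf] * size
    alpha[0] = log_probs[0][blank]
    if size > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new = [-math.inf] * size
        for u in range(size):
            cands = [alpha[u]]                      # stay on the same symbol
            if u > 0:
                cands.append(alpha[u - 1])          # advance by one
            if u > 1 and ext[u] != blank and ext[u] != ext[u - 2]:
                cands.append(alpha[u - 2])          # skip a blank between different labels
            new[u] = logsumexp(cands) + log_probs[t][ext[u]]
        alpha = new

    # A valid alignment ends in the last label or in the trailing blank.
    tail = [alpha[-1]] + ([alpha[-2]] if size > 1 else [])
    return -logsumexp(tail)


# Uniform distribution over 4 classes (blank plus 3 labels), 5 frames.
uniform = [[math.log(0.25)] * 4 for _ in range(5)]
loss = ctc_loss(uniform, [1, 2, 3])
print("CTC loss:", loss)
```

For this uniform input every alignment has probability $(1/4)^5$, and there are 28 alignments of three distinct labels over five frames, so the loss equals $5\log 4 - \log 28$.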


In [None]:
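def create_ctc_target_graph(target, blank=0):
    """Acceptor for the CTC alignments of `target` (2 * len(target) + 1 nodes)."""
    L = len(target)
    U = 2 * L + 1
    ctc = gtn.Graph()
    for p in range(U):
        idx = (p - 1) // 2
        # Accept in the final blank node or in the final label node.
        ctc.add_node(p == 0, p in (U - 1, U - 2))
        # Odd nodes carry target labels, even nodes carry the blank.
        label = target[idx] if p % 2 else blank
        ctc.add_arc(p, p, label)              # repeat the current symbol
        if p > 0:
            ctc.add_arc(p - 1, p, label)      # advance from the previous node
        # Skip the blank between two different labels.
        if p % 2 and p > 1 and label != target[idx - 1]:
            ctc.add_arc(p - 2, p, label)
    return ctc

ctc_target = create_ctc_target_graph([3, 1, 20], blank=0)
# Placeholder emissions: 5 frames, 28 labels, arc weights left at their default.
emissions = gtn.linear_graph(5, 28)
ctc_align = gtn.compose(ctc_target, emissions)
ctc_loss = gtn.negate(gtn.forward_score(ctc_align))
print("CTC loss:", ctc_loss.item())
gtn.backward(ctc_loss)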


## 7. Counting n-grams (`count_ngrams.py`)

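Many symbolic counting problems can be phrased as graph composition followed by forward scoring.

When every arc weight is 0, each accepting path has score 0 and contributes $e^0 = 1$ to the sum inside the forward score, so the forward score is the log of the number of accepting paths. The count is recovered as `exp(forward_score)`.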
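
The relationship between the count and the log-domain score can be checked with a sliding window in plain Python. This sketch makes no GTN calls, and `count_ngram` is an illustrative helper name.

```python
import math


def count_ngram(tokens, ngram):
    """Count occurrences of `ngram` in `tokens` with a sliding window."""
    n = len(ngram)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == ngram)


count = count_ngram([0, 1, 0, 1], [0, 1])

# One zero-score path per match: the log-sum-exp of `count` zeros is log(count).
log_count = math.log(sum(math.exp(0.0) for _ in range(count)))
print(count, round(math.exp(log_count)))
```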


In [None]:
def make_chain(tokens):
    g = gtn.Graph(False)
    g.add_node(True)
    for i, tok in enumerate(tokens):
        g.add_node(False, i == len(tokens) - 1)
        g.add_arc(i, i + 1, tok)
    return g

def make_ngram_counter(n, num_tokens):
    g = gtn.linear_graph(n, num_tokens)
    for i in range(num_tokens):
        g.add_arc(0, 0, i, gtn.epsilon)
        g.add_arc(n, n, i, gtn.epsilon)
    return g

input_g = make_chain([0, 1, 0, 1])
ngram_g = make_chain([0, 1])
counter = make_ngram_counter(2, 2)
score = gtn.forward_score(gtn.compose(input_g, gtn.compose(counter, ngram_g)))
count = int(round(math.exp(score.item())))
print("count([0,1] in [0,1,0,1]) =", count)


## 8. Edit Distance (`edit_distance.py` and `sequence_alignment.py`)

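Levenshtein distance via a weighted edit transducer:

- substitution arcs: weight 0 for a match, -1 for a mismatch
- insertion/deletion arcs: weight -1

Every edit lowers the path score by 1, so the best path has a score equal to minus the number of edits, and the distance is `-viterbi_score(compose(x, compose(edits, y)))`.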
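
For comparison, the classic dynamic-programming recurrence computes the same distance without any graphs. This is a plain-Python sketch with unit costs; the function name `levenshtein` is illustrative.

```python
def levenshtein(x, y):
    """Edit distance with unit costs for substitution, insertion, and deletion."""
    prev = list(range(len(y) + 1))  # distance from the empty prefix of x
    for i, xi in enumerate(x, start=1):
        cur = [i]
        for j, yj in enumerate(y, start=1):
            cur.append(min(
                prev[j - 1] + (xi != yj),  # match (0) or substitution (1)
                prev[j] + 1,               # delete xi
                cur[j - 1] + 1,            # insert yj
            ))
        prev = cur
    return prev[-1]


print(levenshtein([0, 1, 0, 1], [0, 0, 0, 1, 1]))
```

For these two sequences the result is 2: one substitution turns `0 1 0 1` into `0 0 0 1`, and one insertion appends the final `1`.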


In [None]:
def make_edits_graph(num_tokens):
    edits = gtn.Graph(False)
    edits.add_node(True, True)
    for i in range(num_tokens):
        for j in range(num_tokens):
            edits.add_arc(0, 0, i, j, -int(i != j))
        edits.add_arc(0, 0, i, gtn.epsilon, -1)
        edits.add_arc(0, 0, gtn.epsilon, i, -1)
    return edits

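x = make_chain([0, 1, 0, 1])  # make_chain is defined in the n-gram cell above
y = make_chain([0, 0, 0, 1, 1])
edits = make_edits_graph(5)   # vocabulary size; only tokens 0 and 1 occur here
d = int(round(-gtn.viterbi_score(gtn.compose(x, gtn.compose(edits, y))).item()))
print("Levenshtein distance:", d)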


## 9. Learned Decompositions (`word_decompositions.py`, `learned_decompositions.py`, `priors.py`)

These examples show a major GTN advantage: you can inject structure/priors by graph construction.

### Key idea

Build token/decomposition graphs and compose with emissions/targets to define training criteria with custom alignments.

This unifies:

- ASG-like constraints
- CTC-like blank behavior
- alternative subword decompositions
- hand-designed bias constraints
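
To see which decompositions a word-piece lexicon allows for a given word, a short recursive enumeration in plain Python is enough. This sketch does not use GTN; `decompositions` is an illustrative helper, and the word-piece list matches the mini example in this section.

```python
def decompositions(word, pieces):
    """All ways to split `word` into a sequence of entries from `pieces`."""
    if not word:
        return [[]]
    results = []
    for piece in pieces:
        if word.startswith(piece):
            for rest in decompositions(word[len(piece):], pieces):
                results.append([piece] + rest)
    return results


word_pieces = ["a", "b", "c", "ab", "bc", "ac", "abc"]
for d in decompositions("abc", word_pieces):
    print(" ".join(d))
```

The word `abc` has four decompositions here: `a b c`, `a bc`, `ab c`, and `abc`. The piece `ac` never applies because it is not a contiguous substring.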


In [None]:
# Mini decomposition example from word_decompositions.py
letters = {"a": 0, "b": 1, "c": 2}
idx_to_let = {v: k for k, v in letters.items()}

word_pieces = ["a", "b", "c", "ab", "bc", "ac", "abc"]
idx_to_wp = dict(enumerate(word_pieces))

def lexicon_graph(word_pieces, letters_to_idx):
    lex = []
    for i, wp in enumerate(word_pieces):
        g = gtn.Graph()
        g.add_node(True)
        for e, l in enumerate(wp):
            g.add_node(False, e == len(wp) - 1)
            out = i if e == len(wp) - 1 else gtn.epsilon
            g.add_arc(e, e + 1, letters_to_idx[l], out)
        lex.append(g)
    return gtn.closure(gtn.union(lex))

def token_graph(tokens):
    gs = []
    for i in range(len(tokens)):
        g = gtn.Graph()
        g.add_node(True)
        g.add_node(False, True)
        g.add_arc(0, 1, i, i)
        g.add_arc(1, 1, i, gtn.epsilon)
        gs.append(g)
    return gtn.closure(gtn.union(gs))

lex = lexicon_graph(word_pieces, letters)
tokens = token_graph(word_pieces)

abc = gtn.Graph(False)
abc.add_node(True)
abc.add_node()
abc.add_node()
abc.add_node(False, True)
abc.add_arc(0, 1, letters["a"])
abc.add_arc(1, 2, letters["b"])
abc.add_arc(2, 3, letters["c"])

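# Letters -> word pieces: keep the output side and drop the epsilon arcs.
abc_decomps = gtn.remove(gtn.project_output(gtn.compose(abc, lex)))
# Per-frame tokens -> word pieces: keep the input side to see the alignments.
abc_align = gtn.project_input(gtn.remove(gtn.compose(tokens, abc_decomps)))
draw_svg(abc_align, "abc_alignments", idx_to_wp)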


## 10. Biological Sequence Alignment (`sequence_alignment.py`)

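The sequence alignment example builds a scoring transducer from a BLOSUM substitution matrix and gap penalties, then decodes the best alignment with Viterbi.

Needleman-Wunsch (global) and Smith-Waterman (local) differ only in the start/accept states of the sequence graphs: global alignment starts at the first node and accepts at the last, while local alignment lets every node be both a start and an accept state.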
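
The global/local distinction is easiest to see in the classic dynamic-programming form. The sketch below is plain Python with a linear gap penalty and a toy match/mismatch score; it does not use BLOSUM or GTN, and `align_score` and `toy_score` are illustrative names.

```python
def toy_score(p, q):
    """Toy substitution score (an assumption for this sketch, not BLOSUM)."""
    return 1 if p == q else -1


def align_score(a, b, score, gap, local=False):
    """Best alignment score with a linear gap penalty.

    local=False gives Needleman-Wunsch (global),
    local=True gives Smith-Waterman (local).
    """
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i, ai in enumerate(a, start=1):
        cur = [0 if local else i * gap]
        for j, bj in enumerate(b, start=1):
            cell = max(
                prev[j - 1] + score(ai, bj),  # align ai with bj
                prev[j] + gap,                # ai against a gap
                cur[j - 1] + gap,             # bj against a gap
            )
            if local:
                cell = max(cell, 0)           # a local alignment may start anywhere
            cur.append(cell)
        best = max(best, max(cur))            # ... and may end anywhere
        prev = cur
    return best if local else prev[-1]


nw = align_score("XXABXX", "YYABYY", toy_score, gap=-1)
sw = align_score("XXABXX", "YYABYY", toy_score, gap=-1, local=True)
print("global:", nw, "local:", sw)
```

Clamping at zero and taking the maximum over all cells plays the same role as making every node a start and accept state. For this pair the global score is -2 (two matches, four mismatches) and the local score is 2 (the shared `AB`).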


In [None]:
RESIDUE_MAP = {r: i for i, r in enumerate("ARNDCQEGHILKMFPSTWYV")}

def resolve_blosum_path():
    candidates = [
        Path("../../bindings/python/examples/blosum.json"),
        Path("bindings/python/examples/blosum.json"),
        Path("../bindings/python/examples/blosum.json"),
    ]
    for p in candidates:
        if p.exists():
            return p
    raise FileNotFoundError("Could not locate blosum.json")

BLOSUM = json.loads(resolve_blosum_path().read_text())

def make_score_graph(gap_open=-8, gap_add=-8):
    g = gtn.Graph()
    g.add_node(True, True)
    affine = (gap_open != gap_add)
    if affine:
        g.add_node(False, True)
        g.add_node(False, True)
    for k, v in BLOSUM.items():
        r1, r2 = k
        g.add_arc(0, 0, RESIDUE_MAP[r1], RESIDUE_MAP[r2], v)
        if affine:
            g.add_arc(1, 0, RESIDUE_MAP[r1], RESIDUE_MAP[r2], v)
            g.add_arc(2, 0, RESIDUE_MAP[r1], RESIDUE_MAP[r2], v)
    for r in RESIDUE_MAP.values():
        if affine:
            g.add_arc(0, 1, r, gtn.epsilon, gap_open)
            g.add_arc(1, 1, r, gtn.epsilon, gap_add)
            g.add_arc(0, 2, gtn.epsilon, r, gap_open)
            g.add_arc(2, 2, gtn.epsilon, r, gap_add)
        else:
            g.add_arc(0, 0, r, gtn.epsilon, gap_open)
            g.add_arc(0, 0, gtn.epsilon, r, gap_open)
    return g

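def make_seq_graph(seq, alg="nw"):
    """Chain acceptor for `seq`.

    alg="nw": start only at the first node, accept only at the last (global).
    alg="sw": every node is both start and accept (local).
    """
    g = gtn.Graph()
    start = (alg == "sw")
    accept = (alg == "sw")
    g.add_node(start=True, accept=accept)
    for e, s in enumerate(seq):
        g.add_node(start=start, accept=accept or (e == len(seq) - 1))
        g.add_arc(e, e + 1, RESIDUE_MAP[s])
    return g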

seq_a = "HEAGAWGHEE"
seq_b = "PAWHEAE"
score_g = make_score_graph()
ali = gtn.compose(gtn.compose(make_seq_graph(seq_a, "nw"), score_g), make_seq_graph(seq_b, "nw"))
print("NW score:", gtn.viterbi_score(ali).item())


## 11. Linear-Chain CRF (`linear_crf.py`)

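The negative log-likelihood of a linear-chain CRF in GTN is usually written as:

$$\mathcal{L}=\log Z(x)-s(x,y)$$

where:

- $s(x,y)$ is the score of the target label path, read from the target graph
- $\log Z(x)$ is the log partition function, the forward score of the graph containing all label paths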

The script `linear_crf.py` contains a full trainable example.

Suggested workflow in notebook:

1. Open `bindings/python/examples/linear_crf.py`
2. Keep data generation and training loop as-is
3. Use notebook cells to inspect intermediate graphs (`target_graph`, `norm_graph`)
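
The loss formula above can be sketched in plain Python with score tables instead of graphs. The names `path_score` and `log_partition`, the table layout (`emissions[t][c]`, `transitions[p][c]`), and the numbers are assumptions made for this illustration; they are not taken from `linear_crf.py`.

```python
import math


def logsumexp(xs):
    """Stable log of a sum of exponentials."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def path_score(emissions, transitions, labels):
    """s(x, y): emission plus transition scores along one label path."""
    total = emissions[0][labels[0]]
    for t in range(1, len(labels)):
        total += transitions[labels[t - 1]][labels[t]] + emissions[t][labels[t]]
    return total


def log_partition(emissions, transitions):
    """log Z(x) by the forward recursion over all label paths."""
    num_labels = len(emissions[0])
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[p] + transitions[p][c] for p in range(num_labels)])
            + emissions[t][c]
            for c in range(num_labels)
        ]
    return logsumexp(alpha)


emissions = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]  # 3 frames, 2 labels
transitions = [[0.2, -0.1], [0.0, 0.4]]
labels = [0, 1, 1]

loss = log_partition(emissions, transitions) - path_score(emissions, transitions, labels)
print("CRF loss:", loss)
```

Because the target path is one of the paths summed in $Z(x)$, the loss is always non-negative and reaches zero only when all probability mass sits on the target.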


## 12. PyTorch Integration (`pytorch_loss.py`)

GTN can be used inside custom autograd functions to compute sequence losses while training neural nets in PyTorch.

The `pytorch_loss.py` example demonstrates:

- converting NN emissions into GTN graph weights
- computing sequence objective with GTN
- backpropagating to PyTorch tensors


In [None]:
display(Markdown("Run this from repo root to execute the full PyTorch example: `python bindings/python/examples/pytorch_loss.py`"))


## 13. Suggested Learning Path (Curriculum)

### Stage A: Foundations

1. `simple_graph.py`
2. `tutorial.py` sections on core ops and forward/viterbi

### Stage B: Sequence Criteria

3. `asg.py`
4. `ctc.py`
5. `priors.py`

### Stage C: Symbolic Algorithms

6. `count_ngrams.py`
7. `edit_distance.py`
8. `sequence_alignment.py`

### Stage D: Rich Structure and Training

9. `word_decompositions.py`
10. `learned_decompositions.py`
11. `linear_crf.py`
12. `pytorch_loss.py`


## 14. Practical Tips

- Prefer log-space reasoning for numerical stability.
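- Keep graphs acyclic before calling `forward_score` or `viterbi_score`. Graphs with self-loops or closures typically become acyclic once they are composed with chains or emissions graphs, as in the examples above.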
- Use `remove(..., gtn.epsilon)` strategically for simpler/faster compositions.
- Inspect with Graphviz early (`draw_svg`) when debugging.
- Reset gradients with `zero_grad()` before reusing graphs in iterative loops.
- Keep criterion definitions declarative: compose constraints from reusable graph blocks.


## 15. Where to Go Next

- Reproduce each script in `bindings/python/examples` directly.
- Turn one criterion (ASG/CTC/CRF) into your own task-specific graph.
- Profile composition order; associativity allows multiple equivalent factorizations with very different speed.
- Explore CUDA backend when available.
