Modernize deps (drop torchtext, HF datasets) and address open issues by keon · Pull Request #31 · keon/seq2seq

keon · 2026-05-18T04:00:17Z

Summary

Drop deprecated torchtext (the legacy Field / Multi30k.splits API was removed in 0.9+, and the package itself was deprecated entirely in 2024). Load Multi30k via HuggingFace datasets instead.
Switch additive attention from relu to tanh to match Bahdanau et al. 2014 — the README already claims this architecture.
Add greedy inference mode: Seq2Seq.forward(src, trg=None, max_len, sos) now works without a target.
Remove deprecated torch.autograd.Variable; auto-detect CUDA / Apple MPS / CPU in train.py.
Pin modern versions in requirements.txt; update README to new spacy model names.
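
For reviewers who want the relu-to-tanh change spelled out, the following is a framework-free sketch of additive attention scoring, e_j = v · tanh(W_s s + W_h h_j) followed by a softmax. It is not the repository's module: the function name `additive_attention`, the separate projections `W_s` and `W_h`, and the plain-list tensors are all assumptions made for illustration.

```python
import math


def additive_attention(s, hs, W_s, W_h, v):
    """Return softmax-normalised additive attention weights.

    s   : decoder state (list of floats)
    hs  : encoder states (list of lists of floats)
    W_s, W_h : projection matrices as lists of rows (hypothetical names)
    v   : scoring vector
    """
    def matvec(W, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in W]

    proj_s = matvec(W_s, s)
    scores = []
    for h in hs:
        proj_h = matvec(W_h, h)
        # tanh (not relu) squashes the combined projection, as in Bahdanau et al. 2014
        energy = [math.tanh(a + b) for a, b in zip(proj_s, proj_h)]
        scores.append(sum(vi * ei for vi, ei in zip(v, energy)))

    # Numerically stable softmax over source positions
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    z = sum(exps)
    return [e / z for e in exps]
```

Because tanh is bounded and signed, each score is a bounded combination of the entries of `v`, whereas relu discards every negative pre-activation before scoring.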

Issues addressed

Closes #10 ("The Pytorch version？"), #11 ("torchtext Multi30k"), #12 ("What's the exact Pytorch and Torchtext version for your code? I am trying to downgrade to a previous version in order to avoid the Multi30k.split() problem but failed."): torchtext / Multi30k version breakage.
Closes #15 ("about the way to calculate attention weight"), #28 ("Why using relu to compute additaive attention"): relu vs tanh in additive attention.
Closes #22 ("don't have the inference mode?"): no inference mode.

Test plan

Verified end-to-end on CPU in a fresh venv:

pip install -r requirements.txt succeeds against current PyPI.
python -m spacy download {de_core_news_sm,en_core_web_sm} succeeds.
utils.load_dataset(8) builds vocabs (7853 DE / 9797 EN) and yields 3625 / 127 / 125 train/val/test batches from bentrevett/multi30k.
3 training steps run; loss drops 9.20 → 9.15 → 9.10.
Seq2Seq.forward(src, trg=None, max_len=20, sos=2) produces predictions with no target.
Eval loop runs cleanly over val batches.
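
The target-free call checked above amounts to greedy decoding: feed the previous prediction back in and take the argmax at each step. A minimal sketch of that loop follows. The `step` callable, the `state` argument, and the optional `eos` early exit are hypothetical and are not taken from `Seq2Seq.forward`.

```python
def greedy_decode(step, state, sos, max_len, eos=None):
    """Greedy decoding without a target sequence.

    step    : hypothetical callable (prev_token, state) -> (logits, new_state)
    state   : initial decoder state (opaque to this function)
    sos     : start-of-sequence token id, emitted at position 0
    max_len : total number of positions, including the start token
    eos     : optional end-of-sequence id that stops decoding early (assumption)
    """
    tokens = [sos]
    for _ in range(max_len - 1):
        logits, state = step(tokens[-1], state)
        # Greedy choice: index of the largest logit
        next_token = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_token)
        if eos is not None and next_token == eos:
            break
    return tokens
```

A toy `step` that always favours `(prev + 1) % vocab_size` is enough to exercise the loop without a trained model.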

🤖 Generated with Claude Code

- Drop deprecated torchtext; load Multi30k via HuggingFace `datasets` (closes #10, #11, #12).
- Use tanh in additive attention to match Bahdanau et al. 2014 (closes #15, #28).
- Add inference mode: `Seq2Seq.forward(src, trg=None, max_len, sos)` performs greedy decoding without a target (closes #22).
- Remove deprecated `torch.autograd.Variable`; auto-detect CUDA / Apple MPS / CPU in train.py.
- Pin modern versions in requirements.txt (torch>=2.0, datasets>=2.14, spacy>=3.7) and update README install instructions to new spacy model names.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Addresses #14: 100 epochs at fixed lr on 30k-example Multi30k drives the train loss to ~0 while val plateaus. ReduceLROnPlateau halves the LR after 2 stagnant val epochs; early stopping breaks after `-patience` epochs without val improvement (default 5).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
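
The interaction between LR halving and early stopping can be traced with a small pure-Python simulation. This is a sketch under stated assumptions, not the code in train.py: the function `train_schedule` and its counters are hypothetical. It also counts stagnant epochs more simply than `torch.optim.lr_scheduler.ReduceLROnPlateau`, which applies its own threshold and patience rules, so the exact epoch of each cut may differ from the real scheduler.

```python
def train_schedule(val_losses, lr=1e-3, lr_patience=2, stop_patience=5, factor=0.5):
    """Replay a list of per-epoch validation losses.

    Halves `lr` after `lr_patience` consecutive non-improving epochs and
    stops after `stop_patience` epochs without a new best val loss.
    Returns (best_val_loss, [(epoch, lr_at_end_of_epoch), ...]).
    """
    best = float("inf")
    since_best = 0  # epochs since the best val loss (drives early stopping)
    since_cut = 0   # stagnant epochs since the last improvement or LR cut
    history = []
    for epoch, val in enumerate(val_losses, start=1):
        if val < best:
            best = val
            since_best = 0
            since_cut = 0
        else:
            since_best += 1
            since_cut += 1
            if since_cut >= lr_patience:
                lr *= factor
                since_cut = 0
        history.append((epoch, lr))
        if since_best >= stop_patience:
            break  # early stopping
    return best, history
```

With a val curve that bottoms out at epoch 2, this sketch halves the LR twice and stops at epoch 7 instead of running all remaining epochs.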

Address #31 review: Bahdanau s_0 bridge, safe inference, single best.pt

…e inference

Addresses code review feedback on PR keon#31:

- HIGH: drop `inplace=True` from decoder embedding dropout (autograd footgun)
- HIGH: write `outputs[0]` one-hot for the start token so callers can do `outputs.argmax(-1)` without getting a UNK at position 0
- MEDIUM: replace `hidden[:n_layers]` slice (which silently kept only the encoder's *forward* direction) with a Bahdanau §A.2.2 bridge: s_0 = tanh(W_s · ←h_1) from the encoder's last backward state
- MEDIUM: save a single `best.pt` instead of per-epoch `seq2seq_{e}.pt`
- MEDIUM: pre-tokenize via `dataset.map` so spaCy doesn't re-run every epoch
- LOW: simplify attention `v` init (drop redundant `torch.rand`)

Re-verified on CPU: 500 batches, train 9.19 → 4.84, val 4.93 (vs 4.97 before the bridge; slight improvement, otherwise same dynamics). `outputs[0].argmax()` correctly returns SOS for all batch rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
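
Two of the review fixes are small enough to sketch without PyTorch: the s_0 = tanh(W_s · ←h_1) bridge and the one-hot row written at position 0. The helper names `bridge` and `start_row` and the plain-list tensors are assumptions for illustration only; the actual model code may be organised differently.

```python
import math


def bridge(h_backward, W_s):
    """Initial decoder state s_0 = tanh(W_s · h_backward).

    h_backward : encoder's last backward-direction state (list of floats)
    W_s        : projection matrix as a list of rows (hypothetical name)
    """
    return [
        math.tanh(sum(w * x for w, x in zip(row, h_backward)))
        for row in W_s
    ]


def start_row(vocab_size, sos):
    """One-hot row for position 0, so an argmax over the vocab axis yields `sos`."""
    row = [0.0] * vocab_size
    row[sos] = 1.0
    return row
```

Without the one-hot row, an all-zero position 0 would make argmax return index 0 rather than the start token, which is presumably the UNK-at-position-0 symptom the review item describes.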

keon and others added 3 commits May 17, 2026 23:59

docs: add Train section with CPU sanity-check loss curve

60bc53c

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

This was referenced May 18, 2026

About overfitting #14

Closed

A problem with loss computation. #23

Closed

keon merged commit 7c243a0 into master May 18, 2026

keon added a commit that referenced this pull request May 18, 2026

Merge pull request #32 from keon/review-fixes

3636454

Address #31 review: Bahdanau s_0 bridge, safe inference, single best.pt

keon merged 3 commits into master from modernize-deps-and-fixes