Modernize deps (drop torchtext, HF datasets) and address open issues#31
Merged
Conversation
- Drop deprecated torchtext; load Multi30k via HuggingFace `datasets` (closes #10, #11, #12). - Use tanh in additive attention to match Bahdanau et al. 2014 (closes #15, #28). - Add inference mode: `Seq2Seq.forward(src, trg=None, max_len, sos)` performs greedy decoding without a target (closes #22). - Remove deprecated `torch.autograd.Variable`; auto-detect CUDA / Apple MPS / CPU in train.py. - Pin modern versions in requirements.txt (torch>=2.0, datasets>=2.14, spacy>=3.7) and update README install instructions to new spacy model names. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses #14: 100 epochs at fixed lr on 30k-example Multi30k drives the train loss to ~0 while val plateaus. ReduceLROnPlateau halves the LR after 2 stagnant val epochs; early stopping breaks after `-patience` epochs without val improvement (default 5). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 18, 2026
Closed
This was referenced May 18, 2026
Closed
keon
added a commit
that referenced
this pull request
May 18, 2026
Address #31 review: Bahdanau s_0 bridge, safe inference, single best.pt
pull Bot
pushed a commit
to vishalbelsare/seq2seq
that referenced
this pull request
May 18, 2026
…e inference Addresses code review feedback on PR keon#31: - HIGH: drop `inplace=True` from decoder embedding dropout (autograd footgun) - HIGH: write `outputs[0]` one-hot for the start token so callers can do `outputs.argmax(-1)` without getting a UNK at position 0 - MEDIUM: replace `hidden[:n_layers]` slice (which silently kept only the encoder's *forward* direction) with a Bahdanau §A.2.2 bridge: s_0 = tanh(W_s · ←h_1) from the encoder's last backward state - MEDIUM: save a single `best.pt` instead of per-epoch `seq2seq_{e}.pt` - MEDIUM: pre-tokenize via `dataset.map` so spaCy doesn't re-run every epoch - LOW: simplify attention `v` init (drop redundant `torch.rand`) Re-verified on CPU: 500 batches, train 9.19 → 4.84, val 4.93 (vs 4.97 before the bridge — slight improvement, otherwise same dynamics). `outputs[0].argmax()` correctly returns SOS for all batch rows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
torchtext(legacyField/Multi30k.splitswas removed in 0.9+, and the package itself was deprecated entirely in 2024). Load Multi30k via HuggingFacedatasetsinstead.relutotanhto match Bahdanau et al. 2014 — the README already claims this architecture.Seq2Seq.forward(src, trg=None, max_len, sos)now works without a target.torch.autograd.Variable; auto-detect CUDA / Apple MPS / CPU intrain.py.requirements.txt; update README to new spacy model names.Issues addressed
reluvstanhin additive attention.Test plan
Verified end-to-end on CPU in a fresh venv:
pip install -r requirements.txtsucceeds against current PyPI.python -m spacy download {de_core_news_sm,en_core_web_sm}succeeds.utils.load_dataset(8)builds vocabs (7853 DE / 9797 EN) and yields 3625 / 127 / 125 train/val/test batches frombentrevett/multi30k.Seq2Seq.forward(src, trg=None, max_len=20, sos=2)produces predictions with no target.🤖 Generated with Claude Code