Saiki (採記) is a small toolkit for Anki-based language learning workflows:
listening playlists, word mining, YouTube transcript mining, TTS sentence
imports, and known/new word comparison.
The name is a coined Japanese compound from 採 as in gathering/collecting and
記 as in remembering or recording. Pronunciation: saiki, roughly
"sigh-key".
./saiki.py --help
- Python 3.12 recommended
- Anki with AnkiConnect
- ffmpeg
- Python dependencies from requirements.txt
- spaCy models for word mining:
python -m spacy download es_core_news_sm
python -m spacy download ja_core_news_lg
Setup example:
python3.12 -m venv ~/.venv/saiki
source ~/.venv/saiki/bin/activate
python3 -m pip install -U pip
pip install -r requirements.txt
sudo dnf install ffmpeg
Defaults are built in, but you can override them with YAML:
~/.config/saiki/config.yaml
Or pass a config explicitly:
./saiki.py --config ./config.yaml words jp
Example:
anki_connect_url: http://localhost:8765
media_dir: ~/.var/app/net.ankiweb.Anki/data/Anki2/User 1/collection.media
audio_output_root: ~/Languages/Anki/anki-audio
word_output_root: ~/Languages/Anki/anki-words
sentence_dir: ~/Languages/Anki
note_model: Basic
fields:
  front: Front
  back: Back
languages:
  jp:
    name: japanese
    transcript_code: ja
    tts_code: ja
    tts_tld: com
    tts_tempo: 1.35
    decks: ["日本語"]
    field: Back
    word_model: ja_core_news_lg
    sentence_file: sentences_jp.txt
  es:
    name: spanish
    transcript_code: es
    tts_code: es
    tts_tld: es
    tts_tempo: 1.25
    decks: ["Español"]
    field: Back
    word_model: es_core_news_sm
    sentence_file: sentences_es.txt
A copyable template is also available at examples/config.yaml.
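To illustrate how a user file might override built-in defaults, here is a minimal sketch of a recursive merge. It is not taken from saiki.py: the `DEFAULTS` dictionary, the `merge_config` helper, and the choice to merge nested mappings key by key (rather than replace them whole) are all assumptions made for the example.

```python
from copy import deepcopy

# Hypothetical subset of built-in defaults; the real ones live inside saiki.py.
DEFAULTS = {
    "anki_connect_url": "http://localhost:8765",
    "note_model": "Basic",
    "fields": {"front": "Front", "back": "Back"},
}


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return a copy of defaults with overrides applied recursively."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Assumption: nested mappings are merged key by key, not replaced.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# A user file that changes one nested key keeps every other default.
user_config = {"fields": {"back": "Sentence"}}
config = merge_config(DEFAULTS, user_config)
print(config["fields"])
```

Under this assumption, a config file only needs to list the keys that differ from the defaults.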
Supported language codes by default:
- jp
- es
Extract audio referenced by [sound:...] tags from configured decks and create
an .m3u playlist.
./saiki.py audio jp
./saiki.py audio es --concat
./saiki.py audio jp --media-dir ~/.local/share/Anki2/User\ 1/collection.media --copy-only-new
Outputs go to ~/Languages/Anki/anki-audio/<language>/ by default.
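The idea behind the audio command can be sketched with the standard library alone. The following is an illustration rather than the project's implementation: `sound_files` and `build_m3u` are hypothetical helpers, and skipping duplicate filenames is an assumption.

```python
import re
from pathlib import PurePosixPath

# Anki stores audio references in field text as [sound:filename].
SOUND_TAG = re.compile(r"\[sound:([^\]]+)\]")


def sound_files(field_text: str) -> list:
    """Return the filenames referenced by [sound:...] tags, in order."""
    return SOUND_TAG.findall(field_text)


def build_m3u(filenames: list, media_dir: PurePosixPath) -> str:
    """Build extended-M3U text with one path per line, skipping duplicates."""
    lines = ["#EXTM3U"]
    seen = set()
    for name in filenames:
        if name in seen:
            continue
        seen.add(name)
        lines.append(str(media_dir / name))
    return "\n".join(lines) + "\n"


# Made-up field contents standing in for notes fetched from Anki.
fields = ["[sound:a.mp3] 行く", "<div>[sound:b.mp3]</div>[sound:a.mp3]"]
names = [name for text in fields for name in sound_files(text)]
playlist = build_m3u(names, PurePosixPath("/tmp/collection.media"))
print(playlist)
```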
Extract frequent words from Anki notes using AnkiConnect and spaCy.
./saiki.py words jp
./saiki.py words es --deck "Español"
./saiki.py words es --query 'deck:"Español" tag:youtube'
./saiki.py words jp --min-freq 3 --out words_jp.txt
./saiki.py words jp --full-field
Output format:
word frequency
Examples:
comer 12
hablar 9
行く (行き) 8
見る (見た) 6
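Word mining reads notes through AnkiConnect, which accepts JSON requests over HTTP. The sketch below shows what such a request could look like using the `findNotes` and `notesInfo` actions from the AnkiConnect API (version 6). The helper names (`build_request`, `invoke`, `deck_query`, `fetch_field_values`) are hypothetical, and combining several decks with `or` is an assumption about how a multi-deck query might be built.

```python
import json
import urllib.request


def build_request(action: str, **params) -> dict:
    """Build an AnkiConnect request body (API version 6)."""
    return {"action": action, "version": 6, "params": params}


def invoke(url: str, action: str, **params):
    """POST one action to AnkiConnect and return its result, raising on error."""
    body = json.dumps(build_request(action, **params)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    if reply.get("error") is not None:
        raise RuntimeError(reply["error"])
    return reply["result"]


def deck_query(decks: list) -> str:
    """Combine deck names into one Anki search query (assumed form)."""
    return " or ".join(f'deck:"{deck}"' for deck in decks)


def fetch_field_values(url: str, decks: list, field: str) -> list:
    """Return the raw contents of one field for every note in the decks.

    Needs Anki running with the AnkiConnect add-on; not called here.
    """
    note_ids = invoke(url, "findNotes", query=deck_query(decks))
    notes = invoke(url, "notesInfo", notes=note_ids)
    return [note["fields"][field]["value"] for note in notes]


# Inspect the request body without contacting Anki.
print(json.dumps(build_request("findNotes", query=deck_query(["Español"])), ensure_ascii=False))
```

With Anki open, `fetch_field_values("http://localhost:8765", ["Español"], "Back")` would return the field text that a tokenizer such as spaCy could then process.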
Mine vocabulary or sentence rows from YouTube subtitles.
./saiki.py youtube es VIDEO_ID
./saiki.py youtube es VIDEO_ID --top 50
./saiki.py youtube jp VIDEO_ID --mode sentences
./saiki.py youtube es VIDEO_ID --raw --no-stopwords
Export Anki-ready sentence rows:
./saiki.py youtube es VIDEO_ID --mode sentences --out youtube.tsv
Export only rows that appear to contain unknown vocabulary:
./saiki.py youtube es VIDEO_ID \
--mode sentences \
--out youtube_new.tsv \
--known-words ~/Languages/Anki/anki-words/spanish/words_es.txt \
--only-new
Sentence exports contain:
sentence timestamp video_url vocab_guess
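A sentence export with these columns can be inspected with the standard `csv` module. This sketch assumes the file is tab-separated and starts with a header row naming the four columns; the sample rows and the timestamp format are invented for illustration, and `read_rows` is a hypothetical helper.

```python
import csv
import io

# Invented sample rows using the four columns listed above.
sample = (
    "sentence\ttimestamp\tvideo_url\tvocab_guess\n"
    "¿Dónde está la biblioteca?\t00:01:12\thttps://www.youtube.com/watch?v=VIDEO_ID\tbiblioteca\n"
    "Me gusta leer.\t00:01:20\thttps://www.youtube.com/watch?v=VIDEO_ID\tleer\n"
)


def read_rows(handle) -> list:
    """Read tab-separated rows, keeping only those with a non-empty sentence."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if (row.get("sentence") or "").strip()]


rows = read_rows(io.StringIO(sample))
for row in rows:
    print(row["timestamp"], row["sentence"], row["vocab_guess"])
```

For a real export, `open("youtube.tsv", encoding="utf-8", newline="")` would take the place of the `io.StringIO` object.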
Generate TTS audio and add sentence cards to Anki.
./saiki.py import es
./saiki.py import jp ~/Languages/Anki/sentences_jp.txt
./saiki.py import es youtube.tsv --tags youtube,manual
The importer accepts plain text sentence files and TSV/CSV files with a
sentence column. text-to-speech is always added as a tag. If --tags is not
provided, AI-generated is added.
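The tagging rule above can be written as a small function. This is a sketch of the described behaviour, not the importer's code: `resolve_tags` is a hypothetical name, and the order of the resulting tags is an assumption.

```python
from typing import Optional


def resolve_tags(tags_option: Optional[str]) -> list:
    """Return the tags for imported notes from a comma-separated --tags value."""
    if tags_option:
        tags = [tag.strip() for tag in tags_option.split(",") if tag.strip()]
    else:
        # No --tags given: fall back to the default tag.
        tags = ["AI-generated"]
    # text-to-speech is always present, whatever the user passed.
    if "text-to-speech" not in tags:
        tags.append("text-to-speech")
    return tags


print(resolve_tags(None))
print(resolve_tags("youtube,manual"))
```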
Compare any generated word list against an existing known list:
./saiki.py compare-words transcript_words.txt ~/Languages/Anki/anki-words/spanish/words_es.txt
This prints entries from the first file whose word key does not appear in the second file.
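The comparison can be pictured as a set lookup on word keys. The README does not define the word key, so this sketch assumes it is the first whitespace-separated token of each line (for example `行く` in `行く (行き) 8`) and that matching is exact; `word_key` and `new_entries` are hypothetical helpers.

```python
def word_key(line: str) -> str:
    """Return the assumed word key: the first whitespace-separated token."""
    parts = line.split()
    return parts[0] if parts else ""


def new_entries(candidate_lines: list, known_lines: list) -> list:
    """Return candidate lines whose word key is absent from the known list."""
    known = {word_key(line) for line in known_lines} - {""}
    result = []
    for line in candidate_lines:
        key = word_key(line)
        if key and key not in known:
            result.append(line)
    return result


# Made-up word lists in the "word frequency" format.
candidates = ["comer 12", "hablar 9", "行く (行き) 8"]
known = ["comer 30", "行く (行った) 2"]
print(new_entries(candidates, known))
```

Frequencies and surface forms are ignored here, so a word counts as known even when its counts differ between the two files.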
The default configuration assumes Basic notes with audio on Front and the
target-language sentence on Back. Word mining reads only the first visible
line by default; use --full-field to process the whole field.
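Anki fields hold HTML, so "first visible line" needs some interpretation. One possible reading, offered as an assumption rather than a description of saiki's behaviour, is to treat `<br>`, `<div>`, and `<p>` tags as line breaks, drop other tags and `[sound:...]` references, and return the first non-empty line. `first_visible_line` is a hypothetical helper.

```python
import html
import re

# Tags that start or end a visual line in a typical Anki field.
BREAKS = re.compile(r"<\s*/?\s*(br|div|p)\b[^>]*>", re.IGNORECASE)
TAGS = re.compile(r"<[^>]+>")
SOUND = re.compile(r"\[sound:[^\]]+\]")


def first_visible_line(field_html: str) -> str:
    """Return the first non-empty line of text in an HTML field."""
    text = BREAKS.sub("\n", field_html)   # block-level tags become newlines
    text = TAGS.sub("", text)             # remaining inline tags are dropped
    text = SOUND.sub("", text)            # assumption: audio tags are not words
    text = html.unescape(text)            # decode entities such as &amp;
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return ""


print(first_visible_line("彼は学校に<b>行く</b>。<br>He goes to school."))
```

Under this reading, a translation placed on a second line of the field would be left out of the word counts unless --full-field is used.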
- Add support for different Anki note/card types, including configurable field mappings per language and per import workflow.
- Support multiple import profiles, such as sentence cards, vocab cards, audio cards, and cloze cards.
- Let YouTube exports map directly into configurable note fields, not just a fixed sentence column.
- Add richer transcript filtering, such as minimum/maximum sentence length, duplicate removal, and punctuation cleanup.
- Add optional audio slicing from videos when timestamp data is available.
- Improve known/new word matching with better lemmatization for transcript vocabulary.
- Add more language profiles beyond Japanese and Spanish.
- Add a dry-run mode for imports that previews notes before sending anything to AnkiConnect.
- Build a GUI for common workflows like transcript review, sentence selection, import previews, and configuration editing.
- Add integration tests with mocked AnkiConnect responses.
- Add shell completion or a small installed command once packaging becomes useful.
Pure logic tests use the standard library test runner:
python -m unittest discover -s tests
This project is licensed under the MIT License. See LICENSE.
