I want to learn Japanese vocabulary and I've heard good things about Anki. However, manually editing the cards is a pain. I want a script that takes a word as input, generates a card with detailed definitions, and adds the card to the deck (the last step can be achieved via anki-connect). Parsing definition webpages from e.g. goo dictionary, which are highly unstructured, proved difficult. Thanks to Fabian's blog post, I found that it is actually easier to reverse-engineer Apple's dictionaries, which are also a more authoritative source.
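The "add the card to the deck" step can be sketched as a single HTTP request to anki-connect. This is only a minimal sketch, not the code used in this repository: it assumes anki-connect is listening on its default address, and the deck name `Japanese`, the note type `Basic` with its `Front`/`Back` fields, and the `jisho` tag are placeholders to adapt to your own collection.

```python
import json
import urllib.request

# Default address of the anki-connect add-on (assumption: not reconfigured).
ANKI_CONNECT_URL = "http://localhost:8765"


def build_add_note_request(word, definition_html, deck="Japanese", model="Basic"):
    """Build the JSON body for anki-connect's `addNote` action.

    The deck name, model name, field names and tag are placeholders.
    """
    return {
        "action": "addNote",
        "version": 6,
        "params": {
            "note": {
                "deckName": deck,
                "modelName": model,
                "fields": {"Front": word, "Back": definition_html},
                "options": {"allowDuplicate": False},
                "tags": ["jisho"],
            }
        },
    }


def add_card(word, definition_html, **kwargs):
    """Send the note to a running Anki instance and return the new note id."""
    body = json.dumps(build_add_note_request(word, definition_html, **kwargs))
    request = urllib.request.Request(
        ANKI_CONNECT_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    # anki-connect replies with {"result": ..., "error": ...}.
    if reply.get("error") is not None:
        raise RuntimeError(reply["error"])
    return reply["result"]


# Usage (requires Anki to be running with anki-connect installed):
#   add_card("猫", "<p>definition HTML goes here</p>")
```

Keeping the payload construction separate from the network call makes it possible to inspect or test the request without Anki running.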
- python 3.9.5 with lxml 4.6.3
- rustc (cargo) 1.52.1 (can be installed from https://rustup.rs/)
I haven't tested with lower versions, but you can try.
Prefer building on Linux. The lxml library on macOS may fail for no apparent reason.
- Extract raw dictionary data (in raw/) into json, stored in extract/.

      mkdir extract && python extract.py

- Convert json into the bincode format, which Rust programs can decode efficiently.

      cd conv
      cargo run
      cd ..

- Build the python module.

      cd pyjisho
      cargo build --release
      cd ..

- Copy or link the dynamic library to the directory where Python can import jisho.

      ln -s jisho/target/release/libjisho.so jisho.so # libjisho.dylib on macOS

- Now you can import jisho, or import pyjisho, which is a higher-level wrapper. See test.py for example usage and run python test.py to check example output.
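The linking step differs between platforms only in the name of the built library, so it can be scripted. The following is a rough sketch rather than part of the repository: the helper names `built_library_name` and `link_library` are made up for illustration, and the `jisho/target/release` path is taken from the `ln -s` command above.

```python
import sys
from pathlib import Path


def built_library_name(platform=sys.platform):
    """Name of the dynamic library that cargo produces on this platform."""
    return "libjisho.dylib" if platform == "darwin" else "libjisho.so"


def link_library(repo_root, platform=sys.platform):
    """Symlink the built library to jisho.so so that `import jisho` can find it."""
    repo_root = Path(repo_root)
    source = repo_root / "jisho" / "target" / "release" / built_library_name(platform)
    if not source.exists():
        raise FileNotFoundError(f"{source} not found; run cargo build --release first")
    # Python expects the .so suffix for extension modules on both Linux and macOS.
    target = repo_root / "jisho.so"
    if target.is_symlink() or target.exists():
        target.unlink()  # replace a stale link from an earlier build
    target.symlink_to(source.resolve())
    return target


# Usage, from the repository root after building:
#   link_library(".")
```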
See out/result.html, which should be the result of running python test.py.
The styles do not look exactly the same in all browsers, because CSS attributes that are specific to Apple systems are extensively used.
With regard to the font, on macOS the native fonts will be used. On other systems where these fonts are not available, the fallback font is Noto Sans CJK JP, a high-quality open-source font that can be installed from here.
- building on macOS
  - why does lxml randomly crash?
- rewrite in Rust
  - why does flate2's zlib decoder behave differently from Python's?
  - what crate is comparable to lxml?
