A distilled, offline-first kanji database for Node.js with zero dependencies. Provides instant access to 13,000+ kanji characters and vocabulary, optimized with lazy-loading shards for memory-constrained serverless environments.
⚡️ Production Use: This library is used to assist in compiling the comprehensive kanji data for Jepang.org.
- Repository: github.com/sepTN/kanji-data
- Documentation: jepang.org/kanji-data
- npm: npmjs.com/package/kanji-data
Typically, accessing a comprehensive Japanese dictionary offline means parsing a massive 100MB+ JSON file.
- Loading a file that large blocks the Node.js event loop, resulting in terrible app startup times.
- Once parsed, it can easily consume 300MB+ of RAM, which can exceed the memory limits of serverless environments (like AWS Lambda, Vercel, or Netlify) and crash the function.
- Relying on local databases (like SQLite) often introduces bulky C++ dependencies (`node-gyp`) that cause cross-platform installation errors.
kanji-data solves the memory problem using build-time data sharding and lazy evaluation.
Instead of shipping one massive file, the database is pre-compiled into tiny optimized chunks. Core metadata is loaded instantly, while massive vocabulary lists are split by Unicode hex-prefix and only loaded into memory (~1MB at a time) exactly when requested.
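The sharding and lazy-loading idea described above can be sketched roughly as follows. This is a minimal illustration, not the package's actual implementation: the `shardKey`, `buildShards`, and `createLoader` names, the one-file-per-prefix layout, and the sample data are all assumptions made for the example.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Assumed shard key: the code point in hex with its last two digits dropped,
// so U+732B (猫) lands in shard "73".
function shardKey(character) {
  const hex = character.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
  return hex.slice(0, -2);
}

// Build step (illustrative): group entries by shard key, one JSON file per group.
function buildShards(wordsByKanji, outDir) {
  const groups = {};
  for (const [character, words] of Object.entries(wordsByKanji)) {
    const key = shardKey(character);
    if (!groups[key]) groups[key] = {};
    groups[key][character] = words;
  }
  for (const [key, group] of Object.entries(groups)) {
    fs.writeFileSync(path.join(outDir, `${key}.json`), JSON.stringify(group));
  }
}

// Runtime: read a shard only when it is first requested, then serve it from memory.
function createLoader(shardDir) {
  const cache = new Map();
  let shardLoads = 0;

  function getWords(character) {
    const key = shardKey(character);
    if (!cache.has(key)) {
      const file = path.join(shardDir, `${key}.json`);
      const shard = fs.existsSync(file) ? JSON.parse(fs.readFileSync(file, 'utf8')) : {};
      cache.set(key, shard);
      shardLoads += 1;
    }
    return cache.get(key)[character] || [];
  }

  return { getWords, stats: () => ({ shardLoads, cachedShards: cache.size }) };
}

// Hypothetical sample data, written to a temporary directory for the demo.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'shard-demo-'));
buildShards({ '猫': ['猫'], '犬': ['子犬'], '一': ['一つ'] }, demoDir);

const loader = createLoader(demoDir);
console.log(loader.getWords('猫')); // first call reads shard "73" from disk
console.log(loader.getWords('猫')); // second call is served from the cache
console.log(loader.stats());
```

Keying the cache by shard rather than by character means that a lookup for any other kanji in the same shard costs no further disk reads.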
- 📦 Zero Dependencies: Pure JavaScript and JSON. No databases, no binaries.
- ⚡ Serverless Ready: Cold starts are nearly instantaneous with a tiny memory footprint.
- 📴 100% Offline: No API keys, no rate limits, no network latency.
- 🧠 Smart Caching: Chunks are cached in memory after the first read for lightning-fast subsequent queries.
- 🔷 TypeScript Ready: Full `.d.ts` type definitions included.
```bash
npm install kanji-data
```

```javascript
const kanji = require('kanji-data');

// 1. Get core kanji metadata (meanings, readings, stroke count, etc.)
const neko = kanji.get('猫');
console.log(neko.meanings);     // ['cat']
console.log(neko.kun_readings); // ['ねこ']
console.log(neko.jlpt);         // 3
console.log(neko.stroke_count); // 11

// 2. Fetch vocabulary containing a specific kanji
// (lazily loads the required ~1MB vocabulary shard on first call)
const nekoWords = kanji.getWords('猫');
console.log(nekoWords[0]);
/*
{
  "variants": [
    { "written": "猫", "pronounced": "ねこ", "priorities": ["spec1"] }
  ],
  "meanings": [
    { "glosses": ["cat"] }
  ]
}
*/

// 3. Get lists of kanji by JLPT level (N5 to N1)
const n5Kanji = kanji.getJlpt(5);
console.log(n5Kanji); // ['一', '二', '三', '日', '月', ...]

// 4. Get lists of kanji by school grade
const grade1 = kanji.getGrade(1);
console.log(grade1); // ['一', '右', '雨', '円', '王', ...]
```

`kanji.get()` returns core metadata for a given kanji character, or `null` if the character is not found.
```javascript
{
  kanji: "猫",
  grade: 8,                    // School grade (1–6, 8–9) or null
  stroke_count: 11,
  meanings: ["cat"],
  kun_readings: ["ねこ"],
  on_readings: ["ビョウ"],
  name_readings: [],
  jlpt: 3,                     // JLPT level (1–5) or null
  unicode: "732B",
  heisig_en: "cat",            // Heisig keyword (may be null)
  freq_mainichi_shinbun: 1702, // Newspaper frequency rank (may be null)
  notes: []
}
```

`kanji.getWords()` returns an array of vocabulary words that use the specified kanji, or `[]` if none are found.

It uses lazy loading: the first call reads a ~1MB shard from disk and caches it. Subsequent calls that hit the same shard are served from memory.
```javascript
{
  variants: [
    {
      written: "猫",
      pronounced: "ねこ",
      priorities: ["spec1", "ichi1"] // frequency lists (may be empty)
    }
  ],
  meanings: [
    { glosses: ["cat"] }
  ]
}
```

`kanji.getJlpt()` returns the kanji in the specified JLPT level (1–5), or `[]` for invalid levels.
```javascript
kanji.getJlpt(5); // ['一', '二', '三', ...]  (N5, easiest)
kanji.getJlpt(1); // ['蹴', '串', ...]  (N1, hardest)
```

`kanji.getGrade()` returns the kanji taught in the specified Japanese school grade, or `[]` for grades with no data.
| Grade | Level |
|---|---|
| 1–6 | Elementary school (教育漢字) |
| 8 | Secondary school / Jōyō kanji not in grades 1–6 |
| 9 | Jinmeiyō kanji (used in names) |
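As a small sketch of how the grade codes in the table above might be turned into display labels, the helper below maps a `grade` value (including `null`) to a description. The `describeGrade` name and the label wording are assumptions for illustration and are not part of the package.

```javascript
// Labels for the two non-elementary codes listed in the table; grade 7 is not used.
const SPECIAL_GRADES = new Map([
  [8, 'Secondary school (Jōyō kanji not in grades 1–6)'],
  [9, 'Jinmeiyō kanji (used in names)'],
]);

// Hypothetical helper: turn a grade value into a human-readable label.
function describeGrade(grade) {
  if (grade === null || grade === undefined) return 'No grade assigned';
  if (Number.isInteger(grade) && grade >= 1 && grade <= 6) {
    return `Elementary school, grade ${grade}`;
  }
  return SPECIAL_GRADES.get(grade) || 'Unknown grade';
}

console.log(describeGrade(1));
console.log(describeGrade(8));
console.log(describeGrade(null));
```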
```javascript
kanji.getGrade(1); // ['一', '右', '雨', ...]
kanji.getGrade(8); // ['亜', '哀', '握', ...]
```

The `examples/` directory contains a fully interactive console quiz that demos the package.
```bash
# Run the quiz directly (data is included!)
node examples/quiz.js

# Options
node examples/quiz.js --level=5    # N5 only (easiest, 79 kanji)
node examples/quiz.js --level=3    # N5–N3 (default, ~600 kanji)
node examples/quiz.js --rounds=20  # longer session
```

Each round presents a 4-option multiple-choice question: either "guess the meaning" or "which kanji matches this reading". After every answer it shows example vocabulary words loaded live from the data shards.
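To illustrate the question format, here is a rough sketch of how one "guess the meaning" round could be assembled. It is not the code in `examples/quiz.js`: the `shuffle` and `makeMeaningQuestion` helpers and the sample `pool` are assumptions, and the pool only mimics the `kanji` and `meanings` fields of the metadata records.

```javascript
// Hypothetical sample entries shaped like the core metadata records.
const pool = [
  { kanji: '猫', meanings: ['cat'] },
  { kanji: '一', meanings: ['one'] },
  { kanji: '日', meanings: ['day', 'sun'] },
  { kanji: '月', meanings: ['month', 'moon'] },
  { kanji: '雨', meanings: ['rain'] },
];

// Fisher-Yates shuffle on a copy; `random` is injectable so runs can be made repeatable.
function shuffle(items, random = Math.random) {
  const copy = items.slice();
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy;
}

// Build one question: pick four distinct entries, treat the first as the answer,
// then shuffle again so the answer's position among the options is not fixed.
function makeMeaningQuestion(entries, random = Math.random) {
  if (entries.length < 4) throw new Error('need at least 4 entries');
  const picked = shuffle(entries, random).slice(0, 4);
  const answer = picked[0];
  const ordered = shuffle(picked, random);
  return {
    prompt: answer.kanji,
    options: ordered.map((entry) => entry.meanings[0]),
    correctIndex: ordered.indexOf(answer),
  };
}

const question = makeMeaningQuestion(pool);
console.log(`What does ${question.prompt} mean?`);
question.options.forEach((option, i) => console.log(`${i + 1}. ${option}`));
```

Tracking the answer by entry rather than by its meaning string keeps `correctIndex` right even if two entries happen to share a first meaning.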
Found a bug or want to improve the data pipeline? PRs are welcome!
- Bug reports: GitHub Issues
- The raw data lives in `references/kanjiapi_full.json`
- Run `npm test` before submitting a PR
kanji-data is an npm package authored and maintained by Septian Ganendra S. K. at Jepang.org, Indonesia's comprehensive Japanese learning platform. This package optimizes and repackages the kanjiapi.dev dataset into lazy-loading shards for production Node.js use.
If you use this package in your project, we'd appreciate a link back to Jepang.org! It helps us continue maintaining and expanding this free resource for Japanese learners worldwide.
- kanji-png: Generate kanji PNGs and animated stroke-order GIFs.
- kotowaza: Japanese proverbs (ことわざ) dataset with bilingual meanings and JLPT levels.
This package is licensed under the MIT License; see LICENSE for details.
The underlying dictionary data originates from kanjiapi.dev (MIT), which uses the EDICT and KANJIDIC dictionary files. These files are the property of the Electronic Dictionary Research and Development Group and are used in conformance with the Group's licence. JLPT level data is sourced from Jonathan Waller's JLPT Resources.