
A distilled, offline-first Kanji database for Node.js. 13,000+ characters mapped into lazy-loading shards for zero dependencies and instant serverless cold starts.


sepTN/kanji-data


kanji-data 👹 - Offline Kanji Database for Node.js


A distilled, offline-first kanji database for Node.js with zero dependencies. Provides instant access to 13,000+ kanji characters and vocabulary, optimized with lazy-loading shards for memory-constrained serverless environments.

⚡️ Production Use: This library is used to assist in compiling the comprehensive kanji data for Jepang.org.


The Problem

Typically, accessing a comprehensive Japanese dictionary offline means parsing a massive 100MB+ JSON file.

  • Loading a file that large blocks the Node.js event loop, resulting in terrible app startup times.
  • It easily consumes 300MB+ of RAM once parsed, which can exceed the memory limits of serverless environments (like AWS Lambda, Vercel, or Netlify) and crash the function.
  • Relying on local databases (like SQLite) often introduces bulky C++ dependencies (node-gyp) that cause cross-platform installation errors.

The Solution

kanji-data solves the memory problem using build-time data sharding and lazy evaluation.

Instead of shipping one massive file, the database is pre-compiled into tiny optimized chunks. Core metadata is loaded instantly, while massive vocabulary lists are split by Unicode hex-prefix and only loaded into memory (~1MB at a time) exactly when requested.
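
The sharding and lazy loading described above can be sketched roughly as follows. This is an illustration only, not the package's actual implementation: the two-digit shard key, the `<key>.json` file naming, and the helper names `shardKey`, `writeShards`, and `createShardStore` are all assumptions made for the example.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Assumed shard key: the first two hex digits of the character's code point.
// '猫' is U+732B, so it would live in the shard named '73'.
function shardKey(character) {
  const hex = character.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
  return hex.slice(0, 2);
}

// Build step (hypothetical): group entries by shard key, one JSON file per key.
function writeShards(dir, wordsByKanji) {
  const shards = new Map();
  for (const [character, words] of Object.entries(wordsByKanji)) {
    const key = shardKey(character);
    if (!shards.has(key)) shards.set(key, {});
    shards.get(key)[character] = words;
  }
  for (const [key, contents] of shards) {
    fs.writeFileSync(path.join(dir, `${key}.json`), JSON.stringify(contents));
  }
}

// Runtime: read a shard from disk on first use, then answer from memory.
function createShardStore(dir) {
  const cache = new Map();
  let diskReads = 0;

  function loadShard(key) {
    if (!cache.has(key)) {
      const file = path.join(dir, `${key}.json`);
      let entries = [];
      if (fs.existsSync(file)) {
        entries = Object.entries(JSON.parse(fs.readFileSync(file, 'utf8')));
        diskReads += 1;
      }
      cache.set(key, new Map(entries));
    }
    return cache.get(key);
  }

  return {
    getWords: (character) => loadShard(shardKey(character)).get(character) || [],
    diskReads: () => diskReads,
  };
}

// Demo with a throwaway directory and two stand-in entries.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'kanji-shards-'));
writeShards(demoDir, {
  '猫': [{ written: '猫', pronounced: 'ねこ' }],
  '一': [{ written: '一', pronounced: 'いち' }],
});
const store = createShardStore(demoDir);
store.getWords('猫'); // reads the shard file from disk
store.getWords('猫'); // served from the in-memory cache
console.log(store.diskReads()); // 1
```

The point of the design is that memory use grows with the number of distinct shards touched, not with the size of the whole dataset.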

Features

  • 📦 Zero Dependencies: Pure JavaScript and JSON. No databases, no binaries.
  • ⚡ Serverless Ready: Cold starts are nearly instantaneous with a tiny memory footprint.
  • 📴 100% Offline: No API keys, no rate limits, no network latency.
  • 🧠 Smart Caching: Chunks are cached in memory after the first read for lightning-fast subsequent queries.
  • 🔷 TypeScript Ready: Full .d.ts type definitions included.

Installation

npm install kanji-data

Usage

const kanji = require('kanji-data');

// 1. Get core kanji metadata (meanings, readings, stroke count, etc.)
const neko = kanji.get('猫');
console.log(neko.meanings);       // ['cat']
console.log(neko.kun_readings);   // ['ねこ']
console.log(neko.jlpt);           // 3
console.log(neko.stroke_count);   // 11

// 2. Fetch vocabulary containing a specific kanji
// (lazily loads the required ~1MB vocabulary shard on first call)
const nekoWords = kanji.getWords('猫');
console.log(nekoWords[0]);
/*
{
  "variants": [
    { "written": "猫", "pronounced": "ねこ", "priorities": ["spec1"] }
  ],
  "meanings": [
    { "glosses": ["cat"] }
  ]
}
*/

// 3. Get lists of kanji by JLPT level (N5 to N1)
const n5Kanji = kanji.getJlpt(5);
console.log(n5Kanji); // ['一', '二', '三', '日', '月', ...]

// 4. Get lists of kanji by school grade
const grade1 = kanji.getGrade(1);
console.log(grade1); // ['一', '右', '雨', '円', '王', ...]

API Reference

get(character: string): KanjiMetadata | null

Returns core metadata for a given kanji character. Returns null if not found.

{
  kanji: "猫",
  grade: 8,                      // School grade (1–6, 8–9) or null
  stroke_count: 11,
  meanings: ["cat"],
  kun_readings: ["ねこ"],
  on_readings: ["ビョウ"],
  name_readings: [],
  jlpt: 3,                       // JLPT level (1–5) or null
  unicode: "732B",
  heisig_en: "cat",              // Heisig keyword (may be null)
  freq_mainichi_shinbun: 1702,   // Newspaper frequency rank (may be null)
  notes: []
}
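
Because get() returns null for unknown characters, and grade and jlpt may themselves be null, callers usually want a small guard. A minimal sketch, assuming only the fields shown above; `describeKanji` is a hypothetical helper, and `db` stands for any object with a compatible get() method, such as the module returned by require('kanji-data').

```javascript
// Hypothetical helper: one-line summary that tolerates missing entries
// and null jlpt / grade fields.
function describeKanji(db, character) {
  const entry = db.get(character);
  if (entry == null) {
    return `${character}: not in the database`;
  }
  const jlpt = entry.jlpt == null ? 'no JLPT level' : `JLPT N${entry.jlpt}`;
  const grade = entry.grade == null ? 'no grade' : `grade ${entry.grade}`;
  const meanings = entry.meanings.join(', ');
  return `${entry.kanji}: ${meanings} (${entry.stroke_count} strokes, ${jlpt}, ${grade})`;
}

module.exports = { describeKanji };
```

With the package installed, describeKanji(require('kanji-data'), '猫') would summarize the real entry.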

getWords(character: string): Word[]

Returns an array of vocabulary words that use the specified kanji. Returns [] if none found.

Uses lazy loading: the first call reads a ~1MB shard from disk and caches it. Subsequent calls that hit the same shard are served from memory.

{
  variants: [
    {
      written: "猫",
      pronounced: "ねこ",
      priorities: ["spec1", "ichi1"]   // frequency lists (may be empty)
    }
  ],
  meanings: [
    { glosses: ["cat"] }
  ]
}
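
Since priorities may be empty, one plausible use is to keep only words that appear on at least one frequency list. The sketch below assumes the Word shape shown above; treating "has any priority tag" as "common" is this example's assumption, and `hasPriority` and `summarizeWords` are hypothetical helpers. The sample data is a stand-in, not output from the package.

```javascript
// True if any written variant of the word carries a priority tag.
function hasPriority(word) {
  return word.variants.some(
    (variant) => Array.isArray(variant.priorities) && variant.priorities.length > 0
  );
}

// Format up to `limit` prioritized words as "written (reading): glosses".
function summarizeWords(words, limit = 5) {
  return words
    .filter(hasPriority)
    .slice(0, limit)
    .map((word) => {
      const { written, pronounced } = word.variants[0];
      const glosses = word.meanings[0] ? word.meanings[0].glosses.join('; ') : '';
      return `${written} (${pronounced}): ${glosses}`;
    });
}

// Stand-in data in the documented shape (illustrative only).
const sampleWords = [
  {
    variants: [{ written: '猫', pronounced: 'ねこ', priorities: ['spec1', 'ichi1'] }],
    meanings: [{ glosses: ['cat'] }],
  },
  {
    variants: [{ written: '猫舌', pronounced: 'ねこじた', priorities: [] }],
    meanings: [{ glosses: ['sensitivity to hot food'] }],
  },
];
console.log(summarizeWords(sampleWords)); // [ '猫 (ねこ): cat' ]
```

With the package installed, summarizeWords(kanji.getWords('猫')) would apply the same filter to real data.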

getJlpt(level: number): string[]

Returns kanji in the specified JLPT level (1–5). Returns [] for invalid levels.

kanji.getJlpt(5);  // ['一', '二', '三', ...]  ← N5 (easiest)
kanji.getJlpt(1);  // ['蹴', '串', '厨', ...]  ← N1 (hardest)
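
Each call returns a single level, so a study pool such as "N5 through N3" has to be assembled by the caller. A minimal sketch, assuming only the getJlpt() behavior described above; `kanjiUpToLevel` is a hypothetical helper, and `db` is any object with a compatible getJlpt() method.

```javascript
// Collect kanji from N5 (easiest) down to `hardestLevel`, inclusive.
// Levels are numbered 5 (easiest) to 1 (hardest), so the loop counts down.
function kanjiUpToLevel(db, hardestLevel) {
  const pool = [];
  for (let level = 5; level >= hardestLevel; level -= 1) {
    pool.push(...db.getJlpt(level));
  }
  return pool;
}

module.exports = { kanjiUpToLevel };
```

With the package installed, kanjiUpToLevel(require('kanji-data'), 3) would return the combined N5, N4, and N3 lists.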

getGrade(grade: number): string[]

Returns kanji taught in the specified Japanese school grade. Returns [] for grades with no data.

| Grade | Level |
| --- | --- |
| 1–6 | Elementary school (教育漢字) |
| 8 | Secondary school / Jōyō kanji not in grades 1–6 |
| 9 | Jinmeiyō kanji (used in names) |

kanji.getGrade(1); // ['一', '右', '雨', ...]
kanji.getGrade(8); // ['亜', '哀', '揑', ...]
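
The grade codes are not contiguous (there is no grade 7), so it can help to map them to labels explicitly. A small sketch based on the grade table above; `gradeLabel` and `countByGrade` are hypothetical helpers, the label wording is this example's own, and `db` is any object with a compatible getGrade() method.

```javascript
// Map a grade code (as found in KanjiMetadata.grade) to a readable label.
function gradeLabel(grade) {
  if (grade === null || grade === undefined) return 'Ungraded';
  if (grade >= 1 && grade <= 6) return `Elementary school, grade ${grade}`;
  if (grade === 8) return 'Secondary school (remaining Jōyō kanji)';
  if (grade === 9) return 'Jinmeiyō (name kanji)';
  return 'Unknown grade';
}

// Count how many kanji each grade code contains.
function countByGrade(db, grades = [1, 2, 3, 4, 5, 6, 8, 9]) {
  const counts = {};
  for (const grade of grades) {
    counts[grade] = db.getGrade(grade).length;
  }
  return counts;
}

module.exports = { gradeLabel, countByGrade };
```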

Examples

The examples/ directory contains a fully interactive console quiz that demos the package.

# Run the quiz directly (data is included!)
node examples/quiz.js

# Options
node examples/quiz.js --level=5      # N5 only (easiest, 79 kanji)
node examples/quiz.js --level=3      # N5–N3 (default, ~600 kanji)
node examples/quiz.js --rounds=20    # longer session

Each round presents a 4-option multiple-choice question, either "guess the meaning" or "which kanji matches this reading". After every answer it shows example vocabulary words loaded live from the data shards.
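
For readers who want to build something similar, a "guess the meaning" question could be assembled along these lines. This is a sketch under stated assumptions and not the logic of examples/quiz.js: `shuffle` and `makeMeaningQuestion` are hypothetical helpers, `db` is any object with a compatible get() method, and the random source is injectable so results can be made repeatable.

```javascript
// Fisher-Yates shuffle on a copy, using an injectable random source.
function shuffle(items, random) {
  const copy = items.slice();
  for (let i = copy.length - 1; i > 0; i -= 1) {
    const j = Math.floor(random() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy;
}

// Build one question: a kanji prompt plus four distinct candidate meanings.
function makeMeaningQuestion(db, pool, random = Math.random) {
  const candidates = shuffle(pool, random)
    .map((character) => db.get(character))
    .filter((entry) => entry && entry.meanings.length > 0);
  if (candidates.length === 0) {
    throw new Error('no kanji with meanings in the pool');
  }

  const [answer, ...rest] = candidates;
  const correct = answer.meanings[0];

  // Skip distractors that repeat a meaning already on the list.
  const options = [correct];
  for (const entry of rest) {
    if (options.length === 4) break;
    if (!options.includes(entry.meanings[0])) options.push(entry.meanings[0]);
  }
  if (options.length < 4) {
    throw new Error('need at least 4 kanji with distinct meanings');
  }

  const shuffled = shuffle(options, random);
  return { prompt: answer.kanji, options: shuffled, correctIndex: shuffled.indexOf(correct) };
}

module.exports = { shuffle, makeMeaningQuestion };
```

With the package installed, makeMeaningQuestion(kanji, kanji.getJlpt(5)) would draw a question from the N5 list.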

Contributing

Found a bug or want to improve the data pipeline? PRs are welcome!

  • Bug reports → GitHub Issues
  • The raw data lives in references/kanjiapi_full.json
  • Run npm test before submitting a PR

About

kanji-data is an npm package authored and maintained by Septian Ganendra S. K. at Jepang.org, Indonesia's comprehensive Japanese learning platform. This package optimizes and repackages the kanjiapi.dev dataset into lazy-loading shards for production Node.js use.

📚 If you use this package in your project, we'd appreciate a link back to Jepang.org! It helps us continue maintaining and expanding this free resource for Japanese learners worldwide.

Related Packages

  • kanji-png: Generate kanji PNGs and animated stroke-order GIFs.
  • kotowaza: Japanese proverbs (ことわざ) dataset with bilingual meanings and JLPT levels.

Attribution & License

This package is licensed under the MIT License; see LICENSE for details.

The underlying dictionary data originates from kanjiapi.dev (MIT), which uses the EDICT and KANJIDIC dictionary files. These files are the property of the Electronic Dictionary Research and Development Group and are used in conformance with the Group's licence. JLPT level data is sourced from Jonathan Waller's JLPT Resources.


MIT © Septian Ganendra S. K.
