A Python-based Japanese Tokenizer, Dictionary, Morphological Analyzer and Romanization Tool. Based on JMDict for Language Learning.

🧶 Himotoki (紐解き)

Himotoki (紐解き, "unraveling" or "untying strings") is a Python remake of ichiran, the comprehensive Japanese morphological analyzer. It provides sophisticated text segmentation, dictionary lookup, and conjugation analysis, all powered by a portable SQLite backend.


✨ Key Features

  • 🚀 Fast & Portable: Uses SQLite for rapid dictionary lookups without the need for a complex PostgreSQL setup.
  • 🧠 Smart Segmentation: Employs dynamic programming (Viterbi-style) to find the most linguistically plausible segmentation.
  • 📚 Deep Dictionary Integration: Built on JMDict, providing rich metadata, glosses, and part-of-speech information.
  • 🔄 Advanced Deconjugation: Recursively traces conjugated verbs and adjectives back to their dictionary forms.
  • 📊 Scoring Engine: Implements the "synergy" and penalty rules from ichiran to ensure high-quality results.
  • 🛠️ Developer Friendly: Clean Python API and a robust CLI for quick analysis.
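
The Viterbi-style segmentation mentioned above can be sketched as a shortest-path search over word costs. The toy dictionary and costs below are invented for illustration only and have nothing to do with Himotoki's actual lookup or scoring:

```python
# Minimal Viterbi-style segmentation sketch. TOY_DICT maps words to
# illustrative costs (lower = more plausible); Himotoki's real scoring
# uses JMdict data plus synergy/penalty rules.
TOY_DICT = {"日本": 2.0, "日本語": 1.0, "語": 3.0, "を": 0.5, "勉強": 1.0}

def segment(text, dictionary):
    """Return the minimum-cost segmentation of text, or None if impossible."""
    n = len(text)
    best = [float("inf")] * (n + 1)  # best[i] = cheapest way to segment text[:i]
    best[0] = 0.0
    back = [None] * (n + 1)          # back[i] = start index of the last word
    for i in range(1, n + 1):
        for j in range(i):
            cost = dictionary.get(text[j:i])
            if cost is not None and best[j] + cost < best[i]:
                best[i] = best[j] + cost
                back[i] = j
    if back[n] is None:
        return None
    # Walk the backpointers to recover the word sequence.
    words, i = [], n
    while i > 0:
        j = back[i]
        words.append(text[j:i])
        i = j
    return words[::-1]

print(segment("日本語を勉強", TOY_DICT))  # → ['日本語', 'を', '勉強']
```

Here 日本語 (cost 1.0) beats 日本 + 語 (cost 5.0), so the cheaper path wins, which is the essence of the dynamic-programming approach.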

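Recursive deconjugation can likewise be sketched as repeated suffix rewriting until a dictionary form is reached. The rules below are a tiny, ichidan-only subset made up for illustration; Himotoki's real conjugation tables are loaded from data:

```python
# Toy suffix-rewrite rules (ichidan verbs only, purely illustrative).
RULES = [
    ("ています", "ている"),  # polite progressive -> plain progressive
    ("ている", "る"),        # progressive -> dictionary form
    ("ました", "る"),        # polite past -> dictionary form
    ("ない", "る"),          # plain negative -> dictionary form
]
DICTIONARY = {"食べる", "見る"}  # stand-in for a real JMdict lookup

def deconjugate(word, seen=None):
    """Recursively rewrite suffixes until a dictionary form is found."""
    seen = seen or set()
    if word in DICTIONARY:
        return word
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate not in seen:
                seen.add(candidate)
                result = deconjugate(candidate, seen)
                if result:
                    return result
    return None

print(deconjugate("食べています"))  # → 食べる
```

The recursion matters because conjugations stack: 食べています must pass through 食べている before reaching 食べる.
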
🚀 Getting Started

Installation

pip install himotoki

First-Time Setup

On first use, Himotoki will prompt you to download and initialize the dictionary database:

himotoki "日本語テキスト"
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🧶 Welcome to Himotoki!
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

First-time setup required. This will:
  • Download JMdict dictionary data (~15MB compressed)
  • Generate optimized SQLite database (~3GB)
  • Store data in ~/.himotoki/

Proceed with setup? [Y/n]:

⚠️ Disk Space: The generated database requires approximately 3GB of free disk space.
Setup typically takes 10-20 minutes to complete.

You can also run setup manually:

himotoki setup            # Interactive setup
himotoki setup --yes      # Non-interactive (for scripts/CI)

Quick CLI Usage

Analyze Japanese text directly from your terminal:

# Default: Dictionary info only
himotoki "学校で勉強しています"

# Simple romanization
himotoki -r "学校で勉強しています"

# Full output (romanization + dictionary info)
himotoki -f "学校で勉強しています"

# Kana reading with spaces
himotoki -k "学校で勉強しています"

# JSON output for integration
himotoki -j "学校で勉強しています"

Python API Example

Integrate Himotoki into your own projects with ease:

import himotoki

# Optional: pre-warm caches for faster first request
himotoki.warm_up()

# Analyze Japanese text
results = himotoki.analyze("日本語を勉強しています")

for words, score in results:
    for w in words:
        print(f"{w.text}【{w.kana}】 - {w.gloss[:50]}...")

🏗️ Project Structure

Himotoki is designed with modularity in mind, keeping the database, logic, and output layers distinct.

himotoki/
├── himotoki/          # Main package
│   ├── 🧠 segment.py    # Pathfinding and segmentation logic
│   ├── 📖 lookup.py     # Dictionary retrieval and scoring
│   ├── 🔄 constants.py  # Shared constants and SEQ definitions
│   ├── 🗄️ db/           # SQLAlchemy models and connection
│   ├── 📚 loading/      # JMdict and conjugation loaders
│   └── 🖥️ cli.py        # Command line interface
├── scripts/           # Developer tools
│   ├── compare.py       # Ichiran comparison suite
│   ├── init_db.py       # Database initialization
│   └── report.py        # HTML report generator
├── tests/             # Test suite
├── data/              # Dictionary data files
├── output/            # Generated results and reports
└── docs/              # Documentation

🛠️ Development

We welcome contributions! To get started:

Install from Source

git clone https://github.com/msr2903/himotoki.git
cd himotoki
pip install -e ".[dev]"

Development Commands

  1. Tests: pytest
  2. Coverage: pytest --cov=himotoki
  3. Linting: ruff check .
  4. Formatting: black .

LLM Accuracy Evaluation (Local)

  1. Run LLM evaluation: python -m scripts.llm_eval --quick
  2. Run with mock mode: python -m scripts.llm_eval --quick --mock
  3. Run one sentence: python -m scripts.llm_eval --onesentence "猫が食べる"
  4. Start labeler UI: python -m scripts.llm_labeler --host 127.0.0.1 --port 8008

Set LLM_PROVIDER=openai with OPENAI_BASE_URL (for example, http://127.0.0.1:3030/v1) and OPENAI_API_KEY (use not-needed for local servers that ignore keys) to use a local OpenAI-compatible server. Use --mock for offline runs.

Set LLM_PROVIDER=gemini with GEMINI_API_KEY and GEMINI_MODEL (default: gemini-3-flash-preview) to use Gemini.

Use --concurrency 5 (or LLM_CONCURRENCY=5) to send multiple LLM requests in parallel. Use --rpm (or LLM_RPM) to cap request rate per minute (defaults: 2 for openai, 1 for gemini).
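
For scripted runs, the environment described above could be wired up from Python before launching the evaluator. The base URL and key below are the example values from this section, not project defaults:

```python
import os

# Point the evaluator at a local OpenAI-compatible server.
os.environ["LLM_PROVIDER"] = "openai"
os.environ["OPENAI_BASE_URL"] = "http://127.0.0.1:3030/v1"
os.environ["OPENAI_API_KEY"] = "not-needed"  # placeholder for servers that ignore keys
os.environ["LLM_CONCURRENCY"] = "5"          # same effect as --concurrency 5

# With the environment in place, run:
#   python -m scripts.llm_eval --quick
```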

Install the optional dependencies for the labeler UI:

pip install -e ".[eval]"


📜 License

Distributed under the MIT License. See LICENSE for more information.

🙏 Acknowledgments


"Unraveling the complexities of the Japanese language, one string at a time."
