namedivider-python🦒

NameDivider is a tool that divides Japanese full names into family and given names.

🚀 Try Live Demo | 📖 Documentation (日本語) | 🐳 Docker API | ⚡ Rust Version


💡 Why NameDivider?

Japanese full names like "菅義偉" are typically stored as single strings with no clear boundary between family and given names. NameDivider finds that boundary with up to 99.91% accuracy on real-world Japanese names.

Unlike cloud-based AI solutions, NameDivider processes all data locally: no external API calls, no data transmission, and full privacy control.

# Before
person_name = "菅義偉"  # How do you know where to divide?

# After  
from namedivider import BasicNameDivider
divider = BasicNameDivider()
result = divider.divide_name("菅義偉")
print(f"Family: {result.family}, Given: {result.given}")
# Family: 菅, Given: 義偉

✨ Key Features

  • 🎯 99.91% accuracy - Tested on real-world Japanese names
  • Multiple algorithms - Choose between speed (Basic) or accuracy (GBDT)
  • 🔐 Privacy-first - Local-only processing, ideal for sensitive data
  • 🔧 Production ready - CLI, Python library, and Docker support
  • 🎨 Interactive demo - Try it live with Streamlit
  • 📊 Confidence scoring - Know when to trust the results
  • 🛠️ Customizable rules - Add domain-specific patterns

🚀 Quick Start

Installation

pip install namedivider-python

Basic Usage

from namedivider import BasicNameDivider, GBDTNameDivider

# Faster, with good accuracy (99.3%)
basic_divider = BasicNameDivider()
result = basic_divider.divide_name("菅義偉")
print(result)  # 菅 義偉

# Slower, with the best accuracy (99.9%)
gbdt_divider = GBDTNameDivider()
result = gbdt_divider.divide_name("菅義偉")
print(result.to_dict())
# {
#   'algorithm': 'gbdt',
#   'family': '菅',
#   'given': '義偉',
#   'score': 0.7300634880343344,
#   'separator': ' '
# }
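
The `score` field can drive a simple review queue. The sketch below is not part of NameDivider: it works on plain dicts shaped like the `to_dict()` output above, and both the `needs_review` helper and the `0.5` cutoff are illustrative assumptions that should be tuned against your own labeled data.

```python
from typing import Iterable, List, Tuple

# Hypothetical cutoff; not a value recommended by NameDivider.
REVIEW_THRESHOLD = 0.5


def needs_review(divided: dict, threshold: float = REVIEW_THRESHOLD) -> bool:
    """Return True when a result's score falls below the chosen threshold."""
    return divided["score"] < threshold


def split_by_confidence(
    results: Iterable[dict], threshold: float = REVIEW_THRESHOLD
) -> Tuple[List[dict], List[dict]]:
    """Partition results into (accepted, flagged) lists by score."""
    accepted, flagged = [], []
    for divided in results:
        (flagged if needs_review(divided, threshold) else accepted).append(divided)
    return accepted, flagged


# Sample dicts copied from outputs shown in this README.
sample = [
    {"algorithm": "gbdt", "family": "菅", "given": "義偉",
     "score": 0.7300634880343344, "separator": " "},
    {"algorithm": "kanji_feature", "family": "竈門", "given": "炭治郎",
     "score": 0.3004587452426102, "separator": " "},
]
accepted, flagged = split_by_confidence(sample)
print(len(accepted), "accepted;", len(flagged), "flagged for review")
```

The second sample is divided correctly despite its low score, so a fixed cutoff is only a starting point; scores produced by different algorithms may not be directly comparable.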

🔧 Multiple Interfaces

🖥️ Command Line Interface

Perfect for batch processing and automation:

# Single name
$ nmdiv name 菅義偉
菅 義偉

# Process file with progress bar
$ nmdiv file customer_names.txt
100%|██████████| 1000/1000 [00:02<00:00, 431.2it/s]

# Check accuracy on labeled data
$ nmdiv accuracy test_data.txt
Accuracy: 99.1%
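
A similar accuracy check can be run from Python. This is a minimal sketch under stated assumptions: labeled data is a list of correctly divided names with family and given name separated by a single space (the file format that `nmdiv accuracy` expects is not described here), and `divide` is any callable that returns the divided string, for example `lambda n: str(divider.divide_name(n))`. The `accuracy` and `first_char_divider` names are hypothetical.

```python
from typing import Callable, Iterable


def accuracy(expected: Iterable[str], divide: Callable[[str], str]) -> float:
    """Fraction of names whose division matches the labeled answer.

    Each item in `expected` is a correctly divided name such as "菅 義偉".
    The undivided input is rebuilt by removing the space.
    """
    total = correct = 0
    for answer in expected:
        undivided = answer.replace(" ", "")
        total += 1
        if divide(undivided) == answer:
            correct += 1
    return correct / total if total else 0.0


# Stand-in divider for demonstration only: always splits after the first character.
def first_char_divider(name: str) -> str:
    return f"{name[:1]} {name[1:]}"


labeled = ["菅 義偉", "竈門 炭治郎"]
print(f"Accuracy: {accuracy(labeled, first_char_divider):.1%}")
```

The stand-in divider gets only the single-character family name right, which is the kind of failure a labeled test set is meant to expose.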

🐳 REST API (Docker)

For environments where Python cannot be used, we provide a containerized REST API:

# Run the API server
docker run -d -p 8000:8000 rskmoi/namedivider-api

# Send batch requests
curl -X POST localhost:8000/divide \
  -H "Content-Type: application/json" \
  -d '{"names": ["竈門炭治郎", "竈門禰豆子"]}'

Response:

{
  "divided_names": [
    {"family": "竈門", "given": "炭治郎", "separator": " ", "score": 0.3004587452426102, "algorithm": "kanji_feature"},
    {"family": "竈門", "given": "禰豆子", "separator": " ", "score": 0.30480429696983175, "algorithm": "kanji_feature"}
  ]
}
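
The same request can be sent from any HTTP client. As a rough sketch using only the Python standard library, the helpers below build the JSON body from the curl example and read the `divided_names` list from the response. The function names and the timeout are hypothetical, and the URL assumes the container started by the `docker run` command above.

```python
import json
import urllib.request
from typing import List, Tuple

# Assumes the container from the `docker run` command above is listening here.
API_URL = "http://localhost:8000/divide"


def build_request(names: List[str], url: str = API_URL) -> urllib.request.Request:
    """Build the POST request with the same JSON body as the curl example."""
    body = json.dumps({"names": names}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_response(raw: bytes) -> List[Tuple[str, str]]:
    """Extract (family, given) pairs from a response shaped like the one above."""
    payload = json.loads(raw.decode("utf-8"))
    return [(item["family"], item["given"]) for item in payload["divided_names"]]


def divide_names(
    names: List[str], url: str = API_URL, timeout: float = 10.0
) -> List[Tuple[str, str]]:
    """Send the names to the API and return (family, given) pairs."""
    with urllib.request.urlopen(build_request(names, url), timeout=timeout) as resp:
        return parse_response(resp.read())
```

With the server running, `divide_names(["竈門炭治郎", "竈門禰豆子"])` should yield the family and given names from the response shown above.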

🎯 Interactive Web Demo

Try NameDivider instantly in your browser: Live Demo →

Run locally:

cd examples/demo
pip install -r requirements.txt
streamlit run example_streamlit.py

📊 Performance & Benchmarks

| Algorithm | Accuracy | Speed (names/sec) | Use Case |
|---|---|---|---|
| BasicNameDivider / backend=python | 99.3% | 4152.8 | Stable & compatible |
| BasicNameDivider / backend=rust | 99.3% | 18597.7 | Max performance (if available) |
| GBDTNameDivider / backend=python | 99.9% | 1143.3 | Best accuracy, guaranteed |
| GBDTNameDivider / backend=rust | 99.9% | 6277.4 | Fast + accurate (if available) |

Run your own benchmarks:

bash scripts/benchmark_sample.sh
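
For a quick names-per-second figure on your own data, a small timing loop is often enough. The sketch below is not the project's benchmark script (its contents are not shown here); `names_per_second` is a hypothetical helper that accepts any callable, such as `divider.divide_name`.

```python
import time
from typing import Callable, Sequence


def names_per_second(
    divide: Callable[[str], object], names: Sequence[str], repeats: int = 3
) -> float:
    """Return the best observed throughput over `repeats` passes."""
    best = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        for name in names:
            divide(name)
        elapsed = time.perf_counter() - start
        if elapsed > 0:
            best = max(best, len(names) / elapsed)
    return best


# Stub divider so the sketch runs without NameDivider installed.
sample_names = ["菅義偉", "竈門炭治郎"] * 500
rate = names_per_second(lambda name: (name[:1], name[1:]), sample_names)
print(f"{rate:.1f} names/sec (stub divider)")
```

Taking the best of several passes reduces noise from one-time costs such as model loading; construct the divider before timing if you want to exclude startup entirely.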

🛠️ Advanced Features

Custom Rules

Handle domain-specific names with custom patterns:

from namedivider import BasicNameDivider, BasicNameDividerConfig
from namedivider import SpecificFamilyNameRule

config = BasicNameDividerConfig(
    custom_rules=[
        SpecificFamilyNameRule(family_names=["竜胆"]),  # Rare family names
    ]
)
divider = BasicNameDivider(config=config)
result = divider.divide_name("竜胆尊")
# DividedName(family='竜胆', given='尊', separator=' ', score=1.0, algorithm='rule_specific_family')
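
Judging by the output above (`score=1.0`, `algorithm='rule_specific_family'`), a family-name rule appears to take precedence over the statistical model when it matches. The sketch below only illustrates that idea as a longest-prefix match; it is an assumption for explanation, not NameDivider's actual implementation, and `match_family_name` is a hypothetical name.

```python
from typing import Optional, Sequence, Tuple


def match_family_name(
    full_name: str, family_names: Sequence[str]
) -> Optional[Tuple[str, str]]:
    """Return (family, given) if full_name starts with a listed family name.

    Longer family names are tried first. Empty entries are ignored, and a
    name with nothing left over for the given name is not treated as a match.
    """
    for family in sorted(family_names, key=len, reverse=True):
        if family and full_name.startswith(family) and len(full_name) > len(family):
            return family, full_name[len(family):]
    return None


print(match_family_name("竜胆尊", ["竜胆"]))  # ('竜胆', '尊')
print(match_family_name("菅義偉", ["竜胆"]))  # None
```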

Speed Up

For high-volume processing, NameDivider offers several optimization options:

from namedivider import BasicNameDivider, BasicNameDividerConfig

# Load your names
with open("names.txt", "r", encoding="utf-8") as f:
    names = [line.strip() for line in f if line.strip()]  # skip blank lines

# Option 1: Enable caching (faster repeated processing)
config = BasicNameDividerConfig(cache_mask=True)
divider = BasicNameDivider(config=config)
results = [divider.divide_name(name) for name in names]

# Option 2: (beta) Use the Rust backend (roughly 4.5x to 5.5x faster in the benchmark table above)
# First install: pip install namedivider-core
config = BasicNameDividerConfig(backend="rust")
divider = BasicNameDivider(config=config)
results = [divider.divide_name(name) for name in names]
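
When the input contains many repeated names, a caller-side memoization layer avoids dividing the same string twice. This is a generic technique, separate from `cache_mask` (whose internals are not described here); `divide_all` is a hypothetical helper that wraps any callable, such as `divider.divide_name`, with `functools.lru_cache`.

```python
from functools import lru_cache
from typing import Callable, Iterable, List


def divide_all(
    names: Iterable[str],
    divide: Callable[[str], object],
    cache_size: int = 100_000,
) -> List[object]:
    """Divide every name, calling `divide` only once per distinct string
    (as long as the number of distinct names stays within `cache_size`)."""
    cached = lru_cache(maxsize=cache_size)(divide)
    return [cached(name) for name in names]


# Stub divider that records its calls, so the sketch runs without NameDivider.
calls = []


def stub_divide(name: str):
    calls.append(name)
    return name[:1], name[1:]


results = divide_all(["菅義偉", "菅義偉", "竜胆尊"], stub_divide)
print(len(results), "results from", len(calls), "calls")
```

Duplicate inputs share one result object, so treat the returned values as read-only.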

🏢 Typical Use Cases

  • Customer Data Processing - Clean and standardize name databases
  • Form Validation - Real-time name splitting in web applications
  • Analytics & Reports - Generate family name statistics
  • Data Migration - Convert legacy systems with combined name fields
  • Government & Municipal - Process citizen registration data
  • Security-sensitive Environments - Process names without sending data to external APIs
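
As one illustration of the data-migration case, the sketch below adds family and given name columns to a CSV stream. The column names (`full_name`, `family_name`, `given_name`) and the `split_name_column` helper are hypothetical, and `divide` is any callable returning a `(family, given)` pair, for example a small function that calls `divider.divide_name(name)` and returns `(result.family, result.given)`.

```python
import csv
import io
from typing import Callable, TextIO, Tuple


def split_name_column(
    src: TextIO,
    dst: TextIO,
    divide: Callable[[str], Tuple[str, str]],
    column: str = "full_name",
) -> None:
    """Copy CSV rows from src to dst, appending family and given name columns."""
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or []) + ["family_name", "given_name"]
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        row["family_name"], row["given_name"] = divide(row[column])
        writer.writerow(row)


# In-memory demo with a stub divider that splits after the first character.
source = io.StringIO("id,full_name\n1,菅義偉\n")
target = io.StringIO()
split_name_column(source, target, lambda name: (name[:1], name[1:]))
print(target.getvalue())
```

For real files, open them with `encoding="utf-8"` and `newline=""`, as the `csv` module documentation recommends.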

📚 Examples & Tutorials

📄 License

Source code and gbdt_model_v1.txt

MIT License

bert_katakana_v0_3_0.pt

CC BY-SA 4.0

family_name_repository.pickle

English

(1) Purpose of use

family_name_repository.pickle is available for commercial and non-commercial use if you use this software to divide names and to develop algorithms for dividing names.

Any other use of family_name_repository.pickle is prohibited.

(2) Liability

The author or copyright holder assumes no responsibility for the software.

Japanese / 日本語

(1) 利用目的

このソフトウェアを用いて姓名分割、および姓名分割アルゴリズムの開発をする場合、family_name_repository.pickleは商用/非商用問わず利用可能です。

それ以外の目的でのfamily_name_repository.pickleの利用を禁じます。

(2) 責任

作者または著作権者は、family_name_repository.pickleに関して一切の責任を負いません。

The family name data used in family_name_repository.pickle is provided by Myoji-Yurai.net (名字由来net).

🔗 Related Projects



Made with ❤️ by @rskmoi • Contact @rskmoi
