Search-by-similarity for Japanese kanji



Author: Lars Yencken
Date: 21st Jan 2011


SimSearch is a dictionary search-by-similarity interface for Japanese kanji, providing a nice front-end for Kanjidic. It lets you find a kanji you don't know, using kanji that are visually similar.

If you're viewing this source code, you should be a developer, or someone at least a little comfortable with Python.


This is a quick guide to getting SimSearch up and running locally.


SimSearch uses MongoDB as its database backend. If you don't already have it, install MongoDB first. By default, it will create and use a database called simsearch in MongoDB.

Next, you need Python (2.6/2.7), pip and virtualenv. Then you can install the necessary packages in an environment for simsearch:

$ pip -E ss-env install ./simsearch

Occasionally a dependency will fail to install cleanly (e.g. NLTK). In that case, you will need to download a package for it, enter the virtual environment and install the package from there:

$ tar xfz nltk-v2.08b.tgz
$ cd nltk-v2.08b
$ source /path/to/simsearch/ss-env/bin/activate
(ss-env) $ python setup.py install

Building and running

Once installed, build the database with:

$ python -m simsearch.models
Building similarity matrix
Building neighbourhood graph
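
To give a rough idea of what these two build steps produce, here is a minimal sketch of a similarity matrix and a neighbourhood graph over a handful of kanji. It is illustrative only and is not SimSearch's actual implementation: the `COMPONENTS` table, the Jaccard-style `jaccard` score, and the function names `build_similarity_matrix` and `build_neighbourhood_graph` are all assumptions made for this example.

```python
from itertools import combinations

# Hypothetical component decompositions, used only for this illustration.
COMPONENTS = {
    "明": {"日", "月"},
    "時": {"日", "寺"},
    "持": {"扌", "寺"},
    "待": {"彳", "寺"},
}


def jaccard(a, b):
    """Toy similarity: shared components divided by all components."""
    return len(a & b) / float(len(a | b))


def build_similarity_matrix(components):
    """Return {(k1, k2): score} for every ordered pair of distinct kanji."""
    matrix = {}
    for k1, k2 in combinations(sorted(components), 2):
        score = jaccard(components[k1], components[k2])
        matrix[(k1, k2)] = matrix[(k2, k1)] = score
    return matrix


def build_neighbourhood_graph(components, matrix, n_neighbours=2):
    """Map each kanji to its most similar kanji, dropping zero-score pairs."""
    graph = {}
    for kanji in components:
        others = [k for k in components if k != kanji]
        # Highest score first; ties are broken by code point for determinism.
        others.sort(key=lambda k: (-matrix[(kanji, k)], k))
        graph[kanji] = [k for k in others[:n_neighbours]
                        if matrix[(kanji, k)] > 0]
    return graph


matrix = build_similarity_matrix(COMPONENTS)
graph = build_neighbourhood_graph(COMPONENTS, matrix)
```

With a graph like this, a search interface can start from a kanji the user recognises and repeatedly offer its neighbours until the target kanji appears.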

You can then run the debug server. Once it is running, the server will be available at http://localhost:5000/.
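
Once the debug server has been started, a quick way to confirm that something is listening on port 5000 is a plain TCP connection check. The sketch below uses only the standard library; the helper name `server_is_up` is hypothetical and is not part of SimSearch.

```python
import socket


def server_is_up(host="localhost", port=5000, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        # Connection refused, timed out, or host unreachable.
        return False
    sock.close()
    return True
```

Calling `server_is_up()` with no arguments checks the default debug address. It only confirms that the port is open, not that the application behind it is SimSearch.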


For deployment, please see the Flask documentation on deployment options. Feel free to email me as well if you have any issues.