WaDokuJT dictionary search
Picky is used to find dictionary entries in WaDokuJT, the largest Japanese-German dictionary.
Amount of Data
There are around 250,000 entries in the WaDokuJT file, each with around five main fields. The file is only 60 MB, but the field holding the German translations has a lot of internal structure that would often be modeled with database relations. These relations carry a lot of semantic information and have to be remodeled in the indexing step.
All categories are indexed with full partial search, so an incomplete query word already matches the words that start with it. Several virtual fields are also created at indexing time, such as the romaji field, which is generated directly from the Japanese characters. We plan to add more virtual fields in the future, such as headwords and place names. Picky makes this very easy, because these virtual fields are written in standard Ruby code.
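To illustrate what "standard Ruby code" for a virtual field can look like, here is a hypothetical sketch. It does not use Picky's API. It assumes each entry has a kana reading under a made-up `:kana` key, derives a `:romaji` field from it before indexing, and uses a deliberately tiny conversion table (`KANA_TO_ROMAJI` and `to_romaji` are illustrative names). The sample entries are invented.

```ruby
# Hypothetical sketch, not Picky's API: derive a virtual "romaji" field
# from an entry's kana reading while preparing data for indexing.

# Deliberately tiny kana-to-romaji table (illustrative, far from complete).
# Two-character combinations such as "しょ" are checked before single kana.
KANA_TO_ROMAJI = {
  "しょ" => "sho", "きょ" => "kyo",
  "に" => "ni", "ほ" => "ho", "ん" => "n", "ご" => "go",
  "じ" => "ji", "か" => "ka", "な" => "na"
}.freeze

def to_romaji(kana)
  parts = []
  i = 0
  while i < kana.length
    pair = kana[i, 2]
    if pair.length == 2 && KANA_TO_ROMAJI.key?(pair)
      parts << KANA_TO_ROMAJI[pair] # longest match first
      i += 2
    else
      char = kana[i]
      parts << KANA_TO_ROMAJI.fetch(char, char) # unknown characters pass through
      i += 1
    end
  end
  parts.join
end

# Made-up sample entries; the virtual field is added before indexing.
entries = [
  { kana: "にほんご", german: "japanische Sprache" },
  { kana: "じしょ", german: "Wörterbuch" }
]
entries.each { |entry| entry[:romaji] = to_romaji(entry[:kana]) }

entries.map { |entry| entry[:romaji] } # => ["nihongo", "jisho"]
```

A real converter would also need katakana, small tsu, long vowels and the "n'" rule before vowels. The point is only that the derived field is ordinary Ruby that runs once per entry at indexing time, so searches in romaji cost nothing extra at query time.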
Queries are usually single-word searches and not very complex. Picky can usually serve a request in under a millisecond, whereas a SQLite search using LIKE with % wildcards took around 2 to 3 seconds on the same data.
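Part of the difference comes from doing the work at indexing time instead of at query time. The sketch below assumes the SQLite query used a pattern with a leading %, which generally forces a scan of every row, and reads "full partial" as indexing every prefix of every word (an assumption about the exact configuration). It is plain Ruby, not Picky's implementation; `like_scan` and `build_prefix_index` are hypothetical names, and the entries are invented.

```ruby
# Hypothetical sketch, not Picky's implementation.

# LIKE '%term%'-style search: every query checks every entry.
def like_scan(entries, term)
  entries.select { |_id, text| text.include?(term) }.keys
end

# Prefix index built once, ahead of time: every prefix of every word
# maps to the ids of the entries containing that word.
def build_prefix_index(entries)
  index = Hash.new { |hash, key| hash[key] = [] }
  entries.each do |id, text|
    text.split.each do |word|
      (1..word.length).each { |n| index[word[0, n]] << id }
    end
  end
  index.each_value(&:uniq!) # drop duplicate ids from repeated words
  index
end

entries = {
  1 => "japanische sprache",
  2 => "wörterbuch",
  3 => "sprachwissenschaft"
}
index = build_prefix_index(entries)

# A query is now a single hash lookup; fetch avoids inserting missing keys.
index.fetch("sprach", [])    # => [1, 3]
like_scan(entries, "sprach") # => [1, 3]
```

The two are not identical: `like_scan(entries, "wissen")` finds entry 3, while the prefix index returns nothing for "wissen" because it only matches the start of a word. The trade-off is a larger index in exchange for a constant-time lookup per query word.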
Indexing is very fast, too: the full data set is indexed in under nine minutes of wall-clock time. The server is an Xserve3,1 with a quad-core Xeon at 2.26 GHz.
    edv@rokuhara:~/Sites/picky_speed_test$ time bundle exec rake index
    Loaded picky with environment 'development' in /Users/edv/Sites/picky_speed_test on Ruby 1.9.2.
    Application loaded.
    [...]
    real    8m39.234s
    user    11m46.977s
    sys     1m11.744s
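The `time` figures can be read a little further with plain arithmetic. When user plus sys time exceeds the real (wall-clock) time, more than one CPU core was busy on average. The transcript does not say why, so this sketch draws no conclusion about how Picky parallelizes indexing. The throughput figure is rough because the entry count is only approximate. `to_seconds` is a hypothetical helper.

```ruby
# Convert `time` output such as "8m39.234s" into seconds.
def to_seconds(str)
  m = str.match(/\A(\d+)m([\d.]+)s\z/) or raise ArgumentError, "unexpected format: #{str}"
  m[1].to_i * 60 + m[2].to_f
end

real = to_seconds("8m39.234s")
user = to_seconds("11m46.977s")
sys  = to_seconds("1m11.744s")

# CPU time divided by wall-clock time: above 1.0 means more than one
# core was busy on average while indexing.
cpu_per_wall = (user + sys) / real

# Rough throughput, based on the approximate entry count of 250,000.
entries_per_second = 250_000 / real

puts format("real %.1f s, CPU/wall %.2f, ~%d entries/s",
            real, cpu_per_wall, entries_per_second.round)
# prints "real 519.2 s, CPU/wall 1.50, ~481 entries/s"
```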
Picky is one of the things that make wadoku.eu good and easy to use. Having just one search field instead of the usual "advanced search" form is great, and we expect it to be a significant advantage. Our users still have to confirm this, though.
Having the search completely separated from the database design is a huge relief: it makes it easy to change and optimize search and database functions independently. Picky can also serve as a lightweight search API for third-party services that want to use our data, without any additional work on our part.