libpinyin TODOs

Peng Wu edited this page Jul 3, 2012 · 9 revisions

TODO items:

  1. Accept a keystroke sequence as input and output a list of candidate words. (Possibly in the input method engine.)

  2. Berkeley database replacement. (optional)

  3. Tri-gram support.

Note: the task order is unimportant.

In-progress items:

  1. Training on a large raw web corpus. (in progress)

  2. Support fuzzy pinyin segmentation. (initial support done)

Tasks for sunpinyin backward compatibility:

  1. Support the back-off weight (BOW) value of the back-off model, either by:

    a. inheriting a class from the n-gram class, or
    b. writing a new n-gram class for the back-off model.
  2. Compute the BOW value and store it in a sub-class of the n-gram class.

  3. Back-off pinyin lookup algorithms.

  4. Entropy-based pruning. (maybe optional)

Note: the above tasks are for sunpinyin backward compatibility. Some of these items are already partially implemented in sunpinyin, and we will try to port them to libpinyin.