- Deserialized words are strings instead of unicode
- Correctly serialize hyperparameters
- Serialize and deserialize vocabulary
- Prediction no longer requires a random number generator
- Add `active_dishes` method to state
- Bug where too many dishes are created when deserializing
- Bug fixed where dishes_ is constructed incorrectly when dish ids aren't contiguous
- Issue where deserialization sometimes failed due to deleted tables not being pruned from all vectors.
- Docstrings for most relevant/public methods
- Add serialization of state objects. They are also picklable.
- Make model_definition picklable.
- Random number generator is no longer required for state object constructor
- Bug where perplexity calculation is assumed to be strictly monotone
- Bugs in explicit initialization. Now fully functional.
- Bug in sort order for term_relevance_by_topic
- Add utility function for getting pyLDAvis data
- Added utility functions for translating data formats
- Wrote tests based on those in ariddell's LDA
- Added `term_relevance_by_topic` to get terms and relevance values as described by Sievert and Shirley
- New README
- Rename `word_distribution` method to `word_distribution_by_topic`
- Rename `document_distribution` method to `topic_distribution_by_document`.
- Multidish initialization actually works (from Python)
- Word distributions are normalized
- Word distributions map from word (hashable objects) to probabilities instead of index to probability.
- Fixed bug where `initialize` incorrectly converted words (hashable objects) to integers (causing bad sampling issues).
- Documents can now be any list-of-list of hashable objects
- Expose nwords method of state object to Python layer
- Can initialize state with more than one dish (topic)
- Removed biology abstract test script and data
- State object rolls up m_k instead of tracking m
- Reuse create_table and create_dish in state constructor
- Rename state.t_ji to state.table_doc_word
- Initial (alpha) implementation of HDP-LDA: posterior sampling in the Chinese restaurant franchise (Section 5.1 in Teh et al.), based on derivations by Nakatani Shuyo.
- Documents are currently represented as a vector of vectors of integers (instead of a variadic dataview).
- Several changes to the initialization API, including hyperparameter setting.
- Initial attempt at HDP-LDA. The sampler does not work, but much of the code infrastructure for the model is in place.
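The relevance measure that `term_relevance_by_topic` reports follows Sievert and Shirley. For a topic k and word w it is λ·log p(w|k) + (1 - λ)·log(p(w|k) / p(w)). Setting λ = 1 ranks words by their within-topic probability alone. Setting λ = 0 ranks them by lift alone.

What follows is a minimal standalone sketch of that ranking, not the package's implementation. The function name `relevance_by_topic`, its dict-based inputs, and the 0.6 default for λ are assumptions made for illustration.

```python
import math

def relevance_by_topic(topic_word, marginal, lam=0.6):
    """Hypothetical helper: rank one topic's words by Sievert & Shirley relevance.

    topic_word: dict mapping word -> p(word | topic)
    marginal:   dict mapping word -> p(word) over the whole corpus
    lam:        weight in [0, 1]; 1 = rank by p(w|k), 0 = rank by lift p(w|k)/p(w)
    Returns a list of (word, relevance) pairs, most relevant first.
    """
    scores = []
    for word, phi in topic_word.items():
        if phi <= 0.0:
            continue  # log(0) is undefined, so zero-probability words are skipped
        lift = phi / marginal[word]
        score = lam * math.log(phi) + (1.0 - lam) * math.log(lift)
        scores.append((word, score))
    # Sort descending so the most relevant words come first
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores

topic = {"gene": 0.5, "the": 0.3, "cell": 0.2}
marginal = {"gene": 0.1, "the": 0.6, "cell": 0.3}
print(relevance_by_topic(topic, marginal, lam=0.0))  # lift-only ranking
```

A frequent but uninformative word such as "the" ranks second when λ = 1. It drops to last when λ = 0, because its topic probability is below its corpus-wide probability.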
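Several entries in this list concern how documents and word distributions are represented:

- documents may be any list of lists of hashable objects;
- the vocabulary is serialized;
- word distributions map words, not indices, to normalized probabilities.

The sketch below shows one way to do that bookkeeping. It assumes a first-seen vocabulary ordering and an optional symmetric smoothing pseudo-count `beta`. Both helpers, `encode_documents` and `word_distribution`, are hypothetical and are not the package's API.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

def encode_documents(
    docs: Sequence[Sequence[Hashable]],
) -> Tuple[List[List[int]], List[Hashable]]:
    """Hypothetical helper: map hashable tokens to contiguous integer ids.

    Ids are assigned in first-seen order; vocab[i] recovers the original token.
    """
    index: Dict[Hashable, int] = {}
    vocab: List[Hashable] = []
    encoded: List[List[int]] = []
    for doc in docs:
        ids = []
        for word in doc:
            if word not in index:
                index[word] = len(vocab)
                vocab.append(word)
            ids.append(index[word])
        encoded.append(ids)
    return encoded, vocab

def word_distribution(
    counts: Sequence[float], vocab: Sequence[Hashable], beta: float = 0.0
) -> Dict[Hashable, float]:
    """Hypothetical helper: turn one topic's per-id word counts into a
    normalized word -> probability map, with optional pseudo-count beta."""
    if len(counts) != len(vocab):
        raise ValueError("counts and vocab must have the same length")
    smoothed = [c + beta for c in counts]
    total = sum(smoothed)
    if total <= 0:
        raise ValueError("cannot normalize a distribution with zero total mass")
    # Key by the original hashable word rather than by its integer id
    return {vocab[i]: c / total for i, c in enumerate(smoothed)}

encoded, vocab = encode_documents([["a", "b", "a"], [("x", 1), "b"]])
print(encoded, vocab)
print(word_distribution([2, 1, 1], vocab))
```

Keeping `vocab` as a plain list makes the id-to-word mapping trivial to serialize alongside the sampler state.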