topica 0.1.0
First public release of topica.
topica is a fast topic-modeling library for Python with more than a dozen models, built for social scientists who want to move from text data to publishable results in a single workflow. It brings together models and tools usually split across JVM software like MALLET and R packages like stm, and runs them on a parallel Rust core, with every fit reproducible from a fixed seed.
Install
pip install topica
Prebuilt wheels ship for CPython 3.9+ on Linux (manylinux x86_64 / aarch64), macOS (x86_64 / Apple Silicon), and Windows. No Rust toolchain or JVM is required.
What's included
Models. LDA (with optional multithreaded and LightLDA alias samplers), DMR, Labeled LDA, CTM, STM (prevalence and content covariates), SAGE, HDP, dynamic topics (DTM), Supervised LDA, the short-text models PT and GSDMM, the guided models SeededLDA and KeyATM, and the hierarchies PA and HLDA.
Tools for publishable work. Covariate effects by the method of composition with cluster-robust standard errors and GLM links, searchK, FREX labeling, coherence (computed in the Rust core), exclusivity, word and document intrusion tests, bootstrap stability, and Fighting Words.
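For readers new to FREX labeling, the plain-Python sketch below shows the idea behind the score. It is not topica's implementation or API. It follows the common definition popularized by R stm's labelTopics. Each word in a topic is scored by the weighted harmonic mean of two within-topic percentiles: how frequent the word is in the topic, and how exclusive it is to the topic (its probability there divided by its total probability across topics). The weight w = 0.5, the toy vocabulary, and the topic-word matrix are illustrative assumptions, and the helper names are hypothetical.

```python
def ecdf_scores(values):
    """Empirical CDF of each value within its own list: share of values <= it."""
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]


def frex(beta, w=0.5):
    """FREX scores for a K x V topic-word probability matrix (list of rows).

    w is the weight on exclusivity (0.5 is an assumption, not a topica default).
    Assumes every word has positive probability in at least one topic.
    """
    K, V = len(beta), len(beta[0])
    # Total probability of each word across all topics.
    col_sums = [sum(beta[k][v] for k in range(K)) for v in range(V)]
    scores = []
    for k in range(K):
        # Exclusivity: share of the word's total probability that falls in topic k.
        excl = [beta[k][v] / col_sums[v] for v in range(V)]
        excl_cdf = ecdf_scores(excl)
        freq_cdf = ecdf_scores(beta[k])
        # Weighted harmonic mean of the two percentiles (both are >= 1/V, never 0).
        scores.append([1.0 / (w / e + (1 - w) / f) for e, f in zip(excl_cdf, freq_cdf)])
    return scores


def top_frex_words(beta, vocab, n=3, w=0.5):
    """Hypothetical helper: the n highest-FREX words for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in frex(beta, w)
    ]


# Toy two-topic model (made-up numbers for illustration only).
vocab = ["the", "tax", "vote", "court"]
beta = [
    [0.40, 0.30, 0.20, 0.10],  # topic 0
    [0.40, 0.05, 0.15, 0.40],  # topic 1
]
print(top_frex_words(beta, vocab, n=2))  # [['tax', 'the'], ['court', 'the']]
```

In topic 0, "tax" outranks "the" even though "the" is more probable, because "the" is equally likely under both topics and so scores low on exclusivity.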
Validated, not approximated. The LDA core reproduces MALLET's train output bit-for-bit; the other models are checked against their reference implementations (R stm, the Blei-lab samplers, gensim, keyATM, seededlda).
Documentation
Guides, a full API reference, worked examples, and a "Publishing in a social science journal" methodology track: https://nealcaren.github.io/topica/