topica 0.1.0

Released by github-actions on 03 Jun 19:25

First public release of topica.

topica is a fast topic-modeling library for Python with more than a dozen models, built for social scientists who want to move from text data to publishable results in a single workflow. It brings together models and tools usually split across JVM software like MALLET and R packages like stm, and runs them on a parallel Rust core, with every fit reproducible from a fixed seed.

Install

pip install topica

Prebuilt wheels ship for CPython 3.9+ on Linux (manylinux x86_64 / aarch64), macOS (x86_64 / Apple Silicon), and Windows. No Rust toolchain or JVM required.
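To confirm that a wheel was actually installed into the active environment, a quick check with the standard library is enough. The sketch below uses only `importlib.metadata` and the distribution name from the `pip install` line above; the helper name `installed_version` is made up for illustration and is not part of topica.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if the wheel is installed, otherwise None.
print(installed_version("topica"))
```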

What's included

Models. LDA (with optional multithreaded and LightLDA alias samplers), DMR, Labeled LDA, CTM, STM (prevalence and content covariates), SAGE, HDP, dynamic topics (DTM), Supervised LDA, the short-text models PT and GSDMM, the guided models SeededLDA and KeyATM, and the hierarchies PA and HLDA.

Tools for publishable work. Covariate effects by the method of composition with cluster-robust standard errors and GLM links, searchK, FREX labeling, coherence (computed in the Rust core), exclusivity, word and document intrusion tests, bootstrap stability, and Fighting Words.
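For readers unfamiliar with Fighting Words, the idea can be sketched in plain Python. This is an illustration of the general technique (a log-odds ratio between two corpora, smoothed with a uniform Dirichlet prior and scaled to a z-score), not topica's API: the function name `fighting_words`, its parameters, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter


def fighting_words(tokens_a, tokens_b, prior=0.01):
    """Return {word: z-score}; positive scores mark words more typical of corpus A.

    Hypothetical helper for illustration only. Assumes the combined
    vocabulary has at least two distinct words.
    """
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    vocab = set(counts_a) | set(counts_b)
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    prior_total = prior * len(vocab)  # total mass of the uniform prior

    scores = {}
    for word in vocab:
        # Smoothed counts: observed count plus the per-word prior.
        y_a = counts_a[word] + prior
        y_b = counts_b[word] + prior
        # Log-odds of the word within each corpus.
        log_odds_a = math.log(y_a / (n_a + prior_total - y_a))
        log_odds_b = math.log(y_b / (n_b + prior_total - y_b))
        # Approximate variance of the log-odds difference.
        variance = 1.0 / y_a + 1.0 / y_b
        scores[word] = (log_odds_a - log_odds_b) / math.sqrt(variance)
    return scores


# Toy corpora, invented for this example.
corpus_a = "tax tax tax cut budget the the".split()
corpus_b = "care care care health budget the the".split()
scores = fighting_words(corpus_a, corpus_b)

# List words from most A-typical to most B-typical.
for word in sorted(scores, key=scores.get, reverse=True):
    print(word, round(scores[word], 3))
```

Dividing by the standard deviation is what keeps rare words from dominating the ranking: a word seen once in one corpus has a large raw log-odds difference but also a large variance, so its z-score stays modest.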

Validated, not approximated. The LDA core reproduces MALLET's train output bit-for-bit; the other models are checked against their reference implementations (R stm, the Blei-lab samplers, gensim, keyATM, seededlda).

Documentation

Guides, a full API reference, worked examples, and a methodology track, "Publishing in a social science journal": https://nealcaren.github.io/topica/