Tool for training LDA (Latent Dirichlet Allocation)-based topic models, implemented in Python using scikit-learn.

steveneale/topic_modeller

topic_modeller

topic_modeller is a program for training LDA (Latent Dirichlet Allocation)-based topic models, implemented using Python and scikit-learn.

Dependencies

topic_modeller is written in Python, so a recent version of Python 3 must be installed before using it. Downloads for Python can be found at https://www.python.org/downloads/.

topic_modeller depends on a number of external libraries, which are listed in the requirements.txt file in the root directory. To install them from the command line, run:

pip install -r requirements.txt

Usage

topic_modeller's entry point is the TopicModeller class, which can be imported and used either from the Python interpreter or as part of your own Python project.

Training a topic model

To train a topic model, create an instance of the TopicModeller class and call its build_topic_model function, passing the following arguments:

  • input file path - .csv file containing training data to be processed.
  • dataset (keyword arg) - the type of dataset being loaded ("abcnews" [default]).

For example:

from topic_modeller import TopicModeller

modeller = TopicModeller()
modeller.build_topic_model("relative/path/to/input.csv", dataset="abcnews")

Once training completes successfully, a new TopicModel object containing the trained LDA model and the count vectoriser created during training is assigned to the TopicModeller instance's topic_model attribute.

Saving a trained topic model

To save a TopicModeller instance's trained topic_model (a TopicModel object), call TopicModeller's save_topic_model function, passing the following argument:

  • output model name - name with which to save the trained topic model.

For example:

modeller.save_topic_model("new_model")

A new directory output/models/new_model/ will be created, containing topic_model.pkl and vectoriser.pkl.
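
The saved files can presumably be read back with Python's standard pickle module. The sketch below assumes that both .pkl files are ordinary pickle files, which the extension suggests but which is not confirmed here; load_saved_model is a hypothetical helper, not part of topic_modeller.

```python
import pickle
from pathlib import Path


def load_saved_model(model_dir):
    """Load topic_model.pkl and vectoriser.pkl from a saved model directory.

    Hypothetical helper; assumes both files were written with pickle.
    """
    model_dir = Path(model_dir)
    with open(model_dir / "topic_model.pkl", "rb") as model_file:
        topic_model = pickle.load(model_file)
    with open(model_dir / "vectoriser.pkl", "rb") as vectoriser_file:
        vectoriser = pickle.load(vectoriser_file)
    return topic_model, vectoriser
```

Under these assumptions, load_saved_model("output/models/new_model") would return the two saved objects. Unpickling can execute arbitrary code, so only load files from sources you trust, and make sure the libraries the objects were created with are installed first.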
