TOPIC MODELING USING LATENT DIRICHLET ALLOCATION (LDA)

SCIKIT IMPLEMENTATION

We implement LDA using scikit-learn on two different datasets.

Datasets

Text documents from the Associated Press found here and here in our project.
Speech-to-text transcripts of IFT6269 lectures at the MILA (Université de Montréal), found here.

Pre-processing

We pre-process the data and store it as corpus.txt.
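
The pre-processing step could look roughly like the sketch below. It is only an illustration: the stop-word list, the token filter, the helper names `preprocess` and `write_corpus`, and the one-document-per-line layout of corpus.txt are all assumptions, not the project's actual code.

```python
import re
from pathlib import Path

# Hypothetical stop-word list; the project may use a different one.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "it", "that"})


def preprocess(text: str) -> str:
    """Lowercase, keep alphabetic tokens of 3+ letters, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(t for t in tokens if len(t) >= 3 and t not in STOP_WORDS)


def write_corpus(documents, path="corpus.txt"):
    """Write one cleaned document per line (assumed layout); return the line count."""
    cleaned = [preprocess(doc) for doc in documents]
    cleaned = [line for line in cleaned if line]  # skip documents that became empty
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return len(cleaned)
```

Keeping one document per line makes the file easy to load later with a plain `read().splitlines()`.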

Training

Run train to fit the model and save it as a pickle file.
Run save to write out the topics extracted during training.

PYTHON IMPLEMENTATION

Datasets

Text documents from the scribe notes of IFT6269, found here.
Text documents from the Associated Press found here and here in our project.

Pre-processing and training

Both steps are implemented in lda. The code was run in Colab and still needs to be fixed and cleaned up.
