siddharth271101/Gensim-Topic-Modeling

Topic Modeling on news articles

Large amounts of data are collected every day. As more information becomes available, it becomes harder to find what we are looking for, so we need tools and techniques to organize, search, and understand vast quantities of information. Topic modeling provides methods to organize, understand, and summarize large collections of text.

Problem Description

The NPR dataset contains around 12,000 unlabeled news articles. We need to extract groups of important words from these articles. These groups of words are the topics, which help in ascertaining what each article is about.

Objectives

Our goal is to identify the number of topics and determine the theme of each topic.

Approaches

This is an unsupervised learning problem in which we find the topics present in the news articles. I will be using Latent Dirichlet Allocation (LDA) from the Gensim package, along with Mallet's implementation.

Libraries used

gensim, nltk, and spacy are used to process text; pyLDAvis, matplotlib, and seaborn for visualization; and numpy and pandas for manipulating and viewing data in tabular format.

About

An implementation of Topic Modeling with Gensim and NLTK
