TopicModellingUsingLDA

LDA (Latent Dirichlet Allocation) assumes that each document is produced from a mixture of topics, and that each topic generates words according to its own probability distribution. Given a collection of documents, LDA works backwards to infer which topics would most plausibly have generated them. Topic modelling therefore gives insight into what a given document is about, that is, its theme.
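To make the generative assumption concrete, here is a minimal sketch of how LDA imagines a document being written. It is only an illustration and is not taken from analyzer.py: the two toy topics, their word probabilities, and the helper names `sample_dirichlet` and `generate_document` are all invented for this example.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalised Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, length, rng):
    """Generate one document: pick a topic for each word, then a word from that topic."""
    # Per-document topic mixture (usually written theta in LDA notation).
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(length):
        # Choose a topic index according to the document's mixture.
        k = rng.choices(range(len(topics)), weights=theta)[0]
        # Choose a word according to that topic's word distribution.
        vocab, probs = zip(*topics[k].items())
        words.append(rng.choices(vocab, weights=probs)[0])
    return theta, words


# Two toy topics; the words and probabilities are invented for illustration.
TOPICS = [
    {"packet": 0.4, "switching": 0.3, "network": 0.2, "router": 0.1},
    {"topic": 0.4, "document": 0.3, "word": 0.2, "model": 0.1},
]

if __name__ == "__main__":
    rng = random.Random(0)
    theta, words = generate_document(TOPICS, alpha=[0.5, 0.5], length=10, rng=rng)
    print("topic mixture:", [round(t, 3) for t in theta])
    print("document:", " ".join(words))
```

Fitting LDA runs this story in reverse: only the words are observed, and the topic mixtures and the topics' word distributions are what the algorithm estimates.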

Implementation of the algorithm: The user enters the path of a PDF file and deduces the topic by studying the probability distribution of the terms in that topic. The program prints the probability distribution of the top four words of the topic; this number can be changed through the variable num_words.

Packages Needed (Commands for a Linux System):

sudo apt-get install python-tk
sudo apt-get install python-matplotlib
sudo pip install --user numpy scipy matplotlib ipython jupyter pandas sympy nose
sudo apt-get install python-dev libxml2-dev libxslt1-dev antiword unrtf poppler-utils pstotext tesseract-ocr flac ffmpeg lame libmad0 libsox-fmt-mp3 sox libjpeg-dev
sudo pip install textract
sudo pip install --upgrade gensim

Python provides many good libraries for text mining. gensim, which is used here, is a clean library for handling text data; it is scalable, robust and efficient.

Open a Python console and run the following:

import nltk
nltk.download()

Working:

From the top words, we can then deduce that the given PDF is about ‘Switching techniques’ or ‘Packet switching’.

Input: Path of the PDF file.

Output: Top 4 related words for that topic.
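The workflow described under "Implementation of the algorithm" could be sketched roughly as follows. This is not the code in analyzer.py, only a minimal sketch under stated assumptions: a single topic is fitted (`num_topics=1`), each non-empty line of the extracted text is treated as one small document, and the stop-word list is a short hand-picked one. The helper names `tokenize`, `extract_text` and `top_words` are hypothetical; `textract.process`, `corpora.Dictionary`, `doc2bow`, `LdaModel` and `show_topic` are the library calls used.

```python
import re
import sys

# Small hand-picked stop-word list (an assumption; a real run would more likely
# use a fuller list such as the one shipped with nltk).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
              "for", "on", "that", "this", "with", "as", "by", "it", "be"}


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]


def extract_text(pdf_path):
    """Read the text of a PDF with textract (it returns bytes, so decode them)."""
    import textract  # imported here so tokenize() can be used without textract
    return textract.process(pdf_path).decode("utf-8", errors="ignore")


def top_words(text, num_words=4, passes=20):
    """Fit a single-topic LDA model and return its top (word, probability) pairs."""
    from gensim import corpora
    from gensim.models import LdaModel

    # Assumption: every non-empty line of the text counts as one document.
    docs = [tokenize(line) for line in text.splitlines()]
    docs = [d for d in docs if d]
    dictionary = corpora.Dictionary(docs)               # token <-> id mapping
    corpus = [dictionary.doc2bow(d) for d in docs]      # bag-of-words vectors
    lda = LdaModel(corpus, num_topics=1, id2word=dictionary,
                   passes=passes, random_state=0)
    return lda.show_topic(0, topn=num_words)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python", sys.argv[0], "<path-to-pdf>")
    else:
        for word, prob in top_words(extract_text(sys.argv[1]), num_words=4):
            print(f"{word}: {prob:.3f}")
```

With a single topic the result stays close to smoothed word frequencies; raising `num_topics` and printing each topic in turn would be the natural next step for documents that mix several themes.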


gopal10sep/TopicModellingUsingLDA
