Exploring the use of Page Rank to extract key terms from a collection of documents.

ianleaman/PageRankKeywordClassifier


This project explores the use of Google's Page Rank as a means of generating keywords from a set of similar documents.

It is a work in progress and was built to learn about Page Rank, not to create a better keyword classifier.

Currently the algorithm performs poorly compared to conventional keyword generation methods such as TF-IDF: it generally ranks stop words such as "the" or "it" highest.
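To make the idea concrete, here is a minimal sketch of Page Rank applied to a word co-occurrence graph. It is not the code in run.py: the function names (tokenize, build_graph, pagerank, top_keywords), the co-occurrence window, and the damping factor of 0.85 are all assumptions chosen for illustration. Words are nodes, words that appear near each other are linked, and the rank is computed by power iteration with NumPy.

```python
import re
from collections import defaultdict

import numpy as np


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_graph(documents, window=2):
    """Link each word to the words that follow it within `window` tokens.

    Returns a word -> index mapping and a weighted adjacency matrix.
    The window size is an illustrative assumption, not taken from run.py.
    """
    vocab = {}
    edges = defaultdict(float)
    for doc in documents:
        tokens = tokenize(doc)
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
        for i, src in enumerate(tokens):
            for dst in tokens[i + 1 : i + 1 + window]:
                if src != dst:
                    # Undirected co-occurrence: add weight in both directions.
                    edges[(vocab[src], vocab[dst])] += 1.0
                    edges[(vocab[dst], vocab[src])] += 1.0
    n = len(vocab)
    adjacency = np.zeros((n, n))
    for (i, j), weight in edges.items():
        adjacency[i, j] = weight
    return vocab, adjacency


def pagerank(adjacency, damping=0.85, tol=1e-8, max_iter=100):
    """Compute Page Rank scores by power iteration on a weighted graph."""
    n = adjacency.shape[0]
    if n == 0:
        return np.array([])
    out_weight = adjacency.sum(axis=1, keepdims=True)
    has_links = out_weight > 0
    # Normalize each row; nodes without outgoing links jump uniformly.
    transition = np.where(
        has_links, adjacency / np.where(has_links, out_weight, 1.0), 1.0 / n
    )
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = (1 - damping) / n + damping * (transition.T @ rank)
        converged = np.abs(new_rank - rank).sum() < tol
        rank = new_rank
        if converged:
            break
    return rank


def top_keywords(documents, k=5):
    """Return the k highest-ranked words across all documents."""
    vocab, adjacency = build_graph(documents)
    rank = pagerank(adjacency)
    return sorted(vocab, key=lambda w: rank[vocab[w]], reverse=True)[:k]


if __name__ == "__main__":
    print(top_keywords(["the cat", "the dog", "the bird"], k=1))
```

In the tiny example at the bottom, "the" is linked to every other word, so it receives the highest rank. Under the assumptions of this sketch, that is the same stop-word effect described above: words that co-occur with almost everything collect the most rank.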

Possible improvements:
    - Expand to n-grams or part-of-speech tags.
    - Add the ability to accept command-line input.
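As a rough sketch of what two of these improvements might look like: the ngrams helper below turns a token list into n-grams, and parse_args reads document paths from the command line. Both functions and the --ngram and --top flags are hypothetical; none of them exist in run.py, which currently takes no arguments.

```python
import argparse


def ngrams(tokens, n=2):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


def parse_args(argv=None):
    """Parse hypothetical command-line options for a keyword extraction run."""
    parser = argparse.ArgumentParser(
        description="Rank key terms in a collection of documents."
    )
    parser.add_argument("files", nargs="+", help="paths to plain-text documents")
    parser.add_argument(
        "--ngram", type=int, default=1, help="size of the n-grams used as graph nodes"
    )
    parser.add_argument(
        "--top", type=int, default=10, help="number of key terms to report"
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args.files, args.ngram, args.top)
```

With n-grams as graph nodes instead of single words, a phrase such as "page rank" would be ranked as one unit rather than as two separate words.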

To run on sample data:
    - Install NumPy.
    - In a terminal, run "python run.py".
