Categorize Pocket articles

Topic modeling of my own articles saved to the Pocket web service.

img/robot.png

Summary:

This project grew out of my over-zealous saving of web articles. I save articles to learn about a topic, so I hoped that, by doing this project, I could cluster my articles into bite-size groups that would make learning easier.

Results:

Using Latent Dirichlet Allocation (LDA) and Latent Semantic Analysis (LSA), I performed topic modeling on 1,000 articles that I saved to my Pocket account over the past three years.

With LDA modeling, I found that the central topics were:

img/lda.png
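LDA treats each article as a mixture of topics and each topic as a probability distribution over words. The README does not show how apply_lda.py fits its model. The larry.dict / larry.lda files hint at gensim, but that is an assumption. To give some intuition, the following is a minimal sketch of one standard fitting method, collapsed Gibbs sampling, in plain Python. The function name `lda_gibbs`, the toy documents, and the hyperparameters (alpha=0.1, beta=0.01, 200 iterations) are hypothetical and are not the project's settings.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (topic_word, doc_topic): topic_word[k] maps each word to
    p(word | topic k); doc_topic[d][k] is p(topic k | document d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: topic-word, document-topic, and tokens per topic.
    n_kw = [[0] * V for _ in range(K)]
    n_dk = [[0] * K for _ in docs]
    n_k = [0] * K

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_kw[k][word_id[w]] += 1
            n_dk[d][k] += 1
            n_k[k] += 1

    topics = list(range(K))
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take this token out of the counts before resampling it.
                n_kw[k][v] -= 1
                n_dk[d][k] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other tokens).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_kw[k][v] += 1
                n_dk[d][k] += 1
                n_k[k] += 1

    # Smoothed point estimates from the final sample.
    topic_word = [
        {vocab[v]: (n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)}
        for k in topics
    ]
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in topics]
        for d, doc in enumerate(docs)
    ]
    return topic_word, doc_topic


# Toy corpus: two programming "articles" and two baking "articles".
docs = [
    ["python", "code", "bug", "python"],
    ["code", "compiler", "bug"],
    ["bread", "oven", "flour"],
    ["oven", "flour", "yeast", "bread"],
]
topic_word, doc_topic = lda_gibbs(docs, num_topics=2)
for k, dist in enumerate(topic_word):
    print(k, sorted(dist, key=dist.get, reverse=True)[:3])
```

For a real corpus of 1,000 articles you would pass in the cleaned token lists and run far more iterations. A library implementation is the practical choice at that scale. For example, gensim's LdaModel uses online variational Bayes rather than Gibbs sampling.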

With LSA modeling, I found that the central topics were:

img/lsa.png
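LSA works differently. It factors a term-document matrix with a truncated singular value decomposition (SVD), and each retained singular vector acts as a "concept" whose largest term weights describe it. The minimal NumPy sketch below illustrates the idea. It assumes raw term counts, although LSA is often run on TF-IDF weights, and the README does not say which weighting apply_lsa.py uses. The name `lsa_topics` and the toy corpus are hypothetical.

```python
import numpy as np


def lsa_topics(docs, num_topics, top_n=3):
    """Latent Semantic Analysis via truncated SVD of a term-document matrix.

    Returns (terms_per_topic, doc_vectors).
    """
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}

    # Raw-count term-document matrix: rows are terms, columns are documents.
    X = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc:
            X[index[w], j] += 1

    # X = U @ diag(S) @ Vt, with singular values sorted in descending order.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    # Keeping the first k components gives the best rank-k approximation of X.
    U_k, S_k, Vt_k = U[:, :num_topics], S[:num_topics], Vt[:num_topics, :]

    # Label each concept by its largest-magnitude term weights (SVD signs are arbitrary).
    terms_per_topic = [
        [vocab[i] for i in np.argsort(-np.abs(U_k[:, k]))[:top_n]]
        for k in range(num_topics)
    ]
    # Each document's coordinates in the k-dimensional concept space.
    doc_vectors = (S_k[:, None] * Vt_k).T
    return terms_per_topic, doc_vectors


docs = [
    ["python", "code", "bug", "python"],
    ["code", "compiler", "bug"],
    ["bread", "oven", "flour"],
    ["oven", "flour", "yeast", "bread"],
]
terms, vectors = lsa_topics(docs, num_topics=2)
print(terms)
print(vectors.round(2))
```

Unlike LDA's topics, LSA concepts are not probability distributions. Term weights can be negative, which is why the sketch ranks terms by absolute value.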

Run your own analysis:

  1. You need a Pocket API account.

  2. Gather training and test data:

    $ python get_data.py

  3. Clean the data:

    $ python clean_data.py

  4. Apply LDA or LSA modeling:

    $ python apply_lda.py
    $ python apply_lsa.py

  5. Visualize the results: lda.ipynb or lsa.ipynb
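The internals of get_data.py and clean_data.py are not shown here. The following is a minimal sketch of what steps 2 and 3 typically involve, assuming Pocket's v3 Retrieve endpoint (https://getpocket.com/v3/get). That endpoint takes a JSON POST containing a consumer key and an access token, and it returns item metadata such as URLs and titles. As far as I know it does not return full article text, so the text would have to be downloaded separately. The helper names `build_retrieve_request`, `extract_articles`, and `clean_text`, the stop-word list, and the cleaning rules are all hypothetical, and the project's scripts may work differently.

```python
import json
import re
import urllib.request

# Assumption: Pocket's v3 Retrieve endpoint; the keys below are placeholders you supply.
POCKET_GET_URL = "https://getpocket.com/v3/get"

# A tiny illustrative stop-word list; a real run would use a much fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}


def build_retrieve_request(consumer_key, access_token, count=1000):
    """Build (but do not send) a POST request for saved items."""
    body = json.dumps({
        "consumer_key": consumer_key,
        "access_token": access_token,
        "state": "all",          # unread and archived items
        "detailType": "simple",
        "count": count,
    }).encode("utf-8")
    return urllib.request.Request(
        POCKET_GET_URL,
        data=body,
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )


def extract_articles(response):
    """Pull (title, url) pairs out of a decoded Retrieve response."""
    # "or {}" also covers an empty result, which may come back as a list.
    items = response.get("list") or {}
    articles = []
    for item in items.values():
        url = item.get("resolved_url") or item.get("given_url")
        title = item.get("resolved_title") or item.get("given_title") or ""
        if url:
            articles.append((title, url))
    return articles


def clean_text(text):
    """Lowercase, keep alphabetic tokens, drop stop words and 1-letter tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]


# Offline demo with a hand-written response (no network call is made).
sample = {
    "status": 1,
    "list": {
        "123": {"given_url": "https://example.com/lda",
                "resolved_url": "https://example.com/lda",
                "resolved_title": "An Intro to Topic Models"},
    },
}
articles = extract_articles(sample)
tokens = clean_text("An Intro to Topic Models, and the LDA algorithm!")
req = build_retrieve_request("YOUR_CONSUMER_KEY", "YOUR_ACCESS_TOKEN")
print(articles)
print(tokens)  # ['intro', 'topic', 'models', 'lda', 'algorithm']
```

To actually fetch items, you would pass `req` to `urllib.request.urlopen` and decode the body with `json.loads`. Splitting the cleaned documents into training and test sets would come after that.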