LDA Analysis of the Twitter feed of @josephmisiti
.DS_Store
.gitignore
10_topics
10_topics100it
15_topics
README.md
calculate_distributions.py
corpus.txt
get_authors.py
get_html_with_diffbot.py
lda.py
ldaphi_K10.p
parse_twitter_csv.py
results.json
tweets.csv
tweets_content.dat
tweets_topic_dist_10_topics.CSV
twitter_urls.dat
vocabulary.py

README.md

Download the tweets from the Twitter archive
Parse the content

awk -F"," '{print $6}' tweets.csv > tweets_content.dat
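
Splitting on commas with awk miscounts fields when a tweet itself contains a comma inside a quoted CSV field. As a hedged alternative, the sketch below pulls the same sixth column with Python's csv module. The function name `extract_column` and the sample row are hypothetical, and the assumption that the tweet text sits in column 6 comes only from the `$6` in the awk command.

```python
import csv
import io

def extract_column(csv_file, column=5):
    """Yield the value at a zero-based column index from each CSV row.

    column=5 corresponds to awk's $6 (awk fields are one-based).
    """
    for row in csv.reader(csv_file):
        if len(row) > column:
            yield row[column]

# Tiny in-memory sample; the header names and layout are made up for illustration.
sample = io.StringIO(
    'id,a,b,c,d,text\n'
    '1,x,x,x,x,"Reading about LDA, topic models http://t.co/abc"\n'
)
texts = list(extract_column(sample))
```

To run it against the real file, open it with `open("tweets.csv", newline="", encoding="utf-8")` and write each yielded value to tweets_content.dat.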

Extract the URLs

python parse_twitter_csv.py tweets_content.dat > twitter_urls.dat
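
The contents of parse_twitter_csv.py are not shown here. As a rough sketch of what a step like this might do, the snippet below pulls unique URLs out of lines of tweet text with a regular expression. The pattern, the function name `extract_urls`, and the sample tweets are all assumptions for illustration, not the script's actual logic.

```python
import re

# Hypothetical pattern: http(s) URLs up to the next whitespace, quote, or angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")

def extract_urls(lines):
    """Return the unique URLs found in the given lines, in first-seen order."""
    urls = []
    seen = set()
    for line in lines:
        for url in URL_PATTERN.findall(line):
            # Drop trailing punctuation that usually belongs to the sentence.
            url = url.rstrip(".,;:!?)")
            if url and url not in seen:
                seen.add(url)
                urls.append(url)
    return urls

tweets = [
    "Great read: http://t.co/abc123.",
    "No link here",
    "Two links http://t.co/abc123 and https://example.com/post",
]
found = extract_urls(tweets)
```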

Get the real HTML with Diffbot

python get_html_with_diffbot.py twitter_urls.dat

Download the results from the Diffbot API

curl http://api.diffbot.com/v3/bulk/download/asdfasdfasdfasdf-MISITI_data.json -o results.json

Make the corpus

Run LDA

python lda.py -f corpus.txt -k 10 --alpha=0.5 --beta=0.5 -i 25
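
The flags above read like the usual LDA settings: `-k` for the number of topics, `--alpha` and `--beta` for the Dirichlet priors on the document-topic and topic-word distributions, and `-i` for the number of iterations. That reading is an assumption, since lda.py itself is not shown. Under that assumption, here is a minimal collapsed Gibbs sampling sketch in plain Python. It is not the repository's implementation, and the function name `lda_gibbs` and the toy documents are hypothetical.

```python
import random

def lda_gibbs(docs, num_topics, alpha, beta, iterations, seed=0):
    """Collapsed Gibbs sampling for LDA over a list of tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]        # words in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(num_topics)]   # times word v is assigned to topic k
    topic_total = [0] * num_topics                      # total words assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][n]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed topic-word distributions (one row per topic).
    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + V * beta) for v in range(V)]
        for k in range(num_topics)
    ]
    return vocab, phi, doc_topic

# Toy corpus, made up for illustration.
docs = [
    ["topic", "model", "lda", "topic"],
    ["python", "code", "model"],
    ["lda", "gibbs", "sampling", "topic"],
]
vocab, phi, doc_topic = lda_gibbs(docs, num_topics=2, alpha=0.5, beta=0.5, iterations=25)
```

Each row of `phi` is a probability distribution over the vocabulary, and each row of `doc_topic` counts how many tokens of that document ended up in each topic. Larger `alpha` and `beta` values smooth these distributions more heavily.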