Twittersa

BC CSCI339 (Natural Language Processing) Final Project

A Twitter sentiment analysis web app written with Flask.

Setup and Deployment

Note: deployment depends on scikit-learn and the numpy/scipy stack, which are tricky to run on Heroku. We depend on @thenovices' custom Heroku scipy buildpack, which can be set with heroku config:set BUILDPACK_URL=https://github.com/thenovices/heroku-buildpack-scipy

This is a Heroku app with gunicorn as the web server, but the app can also be run standalone on localhost with python twittersa.py or foreman start. It requires Flask, Tweepy, and scikit-learn (plus their dependencies), which can be installed with

pip install -r requirements.txt
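Both Heroku and foreman start read process types from a Procfile. A minimal sketch, assuming the Flask application object in twittersa.py is named app:

web: gunicorn twittersa:app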

Twittersa uses application-level authentication, and thus requires a valid Consumer Key and Consumer Secret from a Twitter application registered at http://dev.twitter.com/apps.

These keys must be set as environment variables (export CONSUMER_KEY and CONSUMER_SECRET), or placed in .env when running with foreman or Heroku.
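As a rough illustration (not Twittersa's actual code), the keys could be read from the environment and used for Tweepy's application-only authentication, assuming Tweepy 3.x:

import os
import tweepy

# Read the keys set via export or .env
auth = tweepy.AppAuthHandler(os.environ['CONSUMER_KEY'],
                             os.environ['CONSUMER_SECRET'])
api = tweepy.API(auth)
print(api.search(q='python')[0].text)  # sanity check: fetch one tweet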

Classifiers

sentiment/classifiers.py is a command-line script that facilitates testing various Naive Bayes classifiers with different data sets and feature extraction techniques.

Usage should be self-explanatory from the help text:

python sentiment/classifiers.py --help

Examples

# MultinomialNB evaluated on 5 random samples, with unigrams + bigrams
# and a TF-IDF weighted transformation. Accuracy is printed as the
# average over the 5 samples.
python sentiment/classifiers.py -N 5 -n 2 25000 --tfidf -c multinomial

# BernoulliNB with unigrams and bigrams, zero-variance feature removal,
# serialization of the trained classifier, and an interactive
# classification REPL after training on the 25,000-tweet data set
python sentiment/classifiers.py -n 2 25000 -vpr
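For reference, here is a minimal sketch of the kind of scikit-learn pipeline the first example implies (unigram + bigram counts, TF-IDF weighting, MultinomialNB); the actual feature extraction in classifiers.py may differ:

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy data for illustration only
train_texts = ['great day', 'love this', 'awful service', 'hate it']
train_labels = ['pos', 'pos', 'neg', 'neg']

pipeline = Pipeline([
    ('vect', CountVectorizer(ngram_range=(1, 2))),  # unigrams + bigrams
    ('tfidf', TfidfTransformer()),                  # TF-IDF weighting
    ('clf', MultinomialNB()),
])
pipeline.fit(train_texts, train_labels)
print(pipeline.predict(['what a great service']))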

Currently, the global variables in the script prefixed with PROD_ are automatically selected by Twittersa to configure the classifier backing the web application.
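For instance, the convention might look something like this; the names and values below are hypothetical, not the script's actual settings:

# Hypothetical PROD_ globals in sentiment/classifiers.py
PROD_CLASSIFIER = 'multinomial'  # classifier type served by the web app
PROD_NGRAM = 2                   # unigrams + bigrams
PROD_CORPUS = 25000              # training set size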

Testing

python tests.py

Util

Contains helper scripts for related tasks. NB: these are intended to be run from the repository home directory, e.g. python util/noslang_parser.py. I should probably make this a module?

  • noslang_parser.py
    • Parses and serializes abbreviations from the NoSlang dictionary
  • semeval/
    • Contains Tweet corpora from the SemEval 2013 sentiment classification task
    • To download, use tweet_download.py
  • tweet_download.py
    • Downloads the Tweets listed in the SemEval .tsv files by scraping their URLs.
  • pickle_corpus.py
    • Grabs the training .csv files specified in corpora/, parses them, removes everything but sentiment and text, and serializes the result in lib/ (see the sketch after this list).