Haterz Gonna Hate. But now you know who the haterz are.
Using scikit-learn's CountVectorizer, some additional hand-rolled features, and logistic regression, you can start building a way to analyze every comment a user has ever posted on Hacker News or Reddit and rank how big of a troll they are.
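The approach above can be sketched with a scikit-learn pipeline. This is a minimal illustration, not the code from the notebooks. The toy training comments and labels are made up. `hand_rolled_features` (share of uppercase letters and number of exclamation marks) is a hypothetical stand-in for the real hand-rolled features. A user's "troll score" is assumed here to be the mean predicted troll probability across their comments.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer


def hand_rolled_features(comments):
    """Hypothetical extra features: uppercase-letter ratio and '!' count."""
    rows = []
    for text in comments:
        letters = [c for c in text if c.isalpha()]
        caps_ratio = sum(c.isupper() for c in letters) / max(len(letters), 1)
        rows.append([caps_ratio, text.count("!")])
    return np.array(rows, dtype=float)


# Bag-of-words counts and the hand-rolled features are stacked side by side,
# then fed to a logistic regression that predicts P(troll | comment).
model = Pipeline([
    ("features", FeatureUnion([
        ("words", CountVectorizer(ngram_range=(1, 2))),
        ("extra", FunctionTransformer(hand_rolled_features, validate=False)),
    ])),
    ("clf", LogisticRegression()),
])

# Made-up toy training data: 1 = insulting/trolling, 0 = not.
train_comments = [
    "You are an idiot and this is garbage!!!",
    "WORST ARTICLE EVER, nobody cares!",
    "What a stupid take, learn to code!!",
    "Thanks, this was a really helpful write-up.",
    "Interesting approach, I learned something new.",
    "Nice work, the explanation is clear.",
]
train_labels = [1, 1, 1, 0, 0, 0]
model.fit(train_comments, train_labels)


def rank_users(model, comments_by_user):
    """Rank users by the mean predicted troll probability of their comments."""
    scores = {
        # Column 1 of predict_proba is the probability of label 1 (troll).
        user: float(np.mean(model.predict_proba(comments)[:, 1]))
        for user, comments in comments_by_user.items()
        if comments  # skip users with no comments
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


ranking = rank_users(model, {
    "user_a": ["This is garbage, you idiot!!!"],
    "user_b": ["Thanks, really helpful explanation."],
})
print(ranking)
```

Averaging per-comment probabilities is only one way to turn comment scores into a user ranking; taking the maximum, or counting comments above a probability threshold, would also work.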
Feel free to download either the HTML or IPython Notebook files from this repo. I tried to document my explorations and assumptions about the model so you can follow my logic and learn why and how I built everything.
Side Note: I'm going to make it a Chrome App soon. I'm also going to build versions of this for Twitter, Reddit, Instagram, Facebook, and maybe even dating apps. If you want to help, reach out! :)
Execute the following commands to clone this repo:
$ git clone git@github.com:kevinmcalear/hater_news.git
$ cd hater_news
You now have a functioning git repository that contains the app as well as a requirements.txt and a Procfile, which are required to run the app on Heroku or with foreman.
You can run the app locally one of two ways:
By using foreman:
$ foreman start
Or by running the app.py file with python:
$ python app.py
foreman gives you a decent preview of how the app will run on Heroku. I print several things to the console for debugging and exploration; they show up if you run the app with the python app.py command.
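A minimal sketch of how an app.py can work both under foreman and when run directly with python. It assumes the Procfile declares something like `web: python app.py` and that the port comes from the PORT environment variable, which Heroku sets and foreman also sets (5000 by default). The route body and the `get_port` helper are hypothetical; the real app renders Jinja2 templates and its code may differ.

```python
import os

from flask import Flask

app = Flask(__name__)


@app.route("/")
def index():
    # Hypothetical placeholder; the real app renders a Jinja2 template here.
    return "Hater News is running"


def get_port(default=5000):
    """Hypothetical helper: use $PORT if set (Heroku/foreman), else a default."""
    return int(os.environ.get("PORT", default))


if __name__ == "__main__":
    # `python app.py` and a `web: python app.py` Procfile line both land here.
    app.run(host="0.0.0.0", port=get_port())
```

Binding to 0.0.0.0 and reading PORT is what lets the same file run locally and on a Heroku dyno without changes.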
Once the Heroku Toolbelt is installed, you can use the heroku command from your command shell.
Log in using the email address and password you used when creating your Heroku account:
$ heroku login
Enter your Heroku credentials.
Email: python@example.com
Password:
Could not find an existing public key.
Would you like to generate one? [Yn]
Generating new SSH public key.
Uploading ssh public key /Users/username/.ssh/id_rsa.pub
Press enter at the prompt to upload your existing ssh key or create a new one, used for pushing code later on.
To check that your key was added, type heroku keys. If your key isn’t there, you can add it manually by typing heroku keys:add. For more information about SSH keys, see Managing Your SSH Keys.
Create an app on Heroku, which prepares Heroku to receive your source code. * Note: this custom buildpack is required to get scikit-learn and SciPy working on Heroku.
For a new app:
$ heroku create --buildpack https://github.com/thenovices/heroku-buildpack-scipy
For an existing app:
$ heroku config:set BUILDPACK_URL=https://github.com/thenovices/heroku-buildpack-scipy
This also creates a git remote (called heroku) in your local repo. Heroku generates a random name for your app. * Note: you can pass a parameter to specify your own name, or rename it later with heroku apps:rename.
Now deploy your code:
$ git push heroku master
Fetching repository, done.
Counting objects: 7, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 457 bytes | 0 bytes/s, done.
Total 4 (delta 2), reused 0 (delta 0)
-----> Fetching custom git buildpack... done
-----> Python app detected
-----> No runtime.txt provided; assuming python-2.7.4.
-----> Using Python runtime (python-2.7.4)
-----> Detected numpy/scipy in requirements.txt. Downloading prebuilt binaries.
-----> Using cached binaries.
-----> Existing NumPy (1.8.1) package detected.
-----> Existing SciPy (0.14.0) package detected.
-----> Installing dependencies using Pip (1.3.1)
Cleaning up...
-----> Discovering process types
Procfile declares types -> web
-----> Compressing... done, 95.5MB
-----> Launching... done, v39
https://haternews.herokuapp.com/ deployed to Heroku
To git@heroku.com:haternews.git
f335692..bfb7017 master -> master
The application is now deployed. Ensure that at least one instance of the app is running:
$ heroku ps:scale web=1
Now visit the app at the URL generated by its app name. As a handy shortcut, you can open the website as follows:
$ heroku open
Shout Outs to @gavinmh, @jamesbev, @ShawnOakley, and General Assembly's Data Science Program. Without their help, I would have no idea how to do anything data-science related.
- My Training Data.
- The Hacker News API.
- PRAW ("Python Reddit API Wrapper") for Reddit's API.
- Twitter API tweepy python wrapper.
- Flask web framework.
- Jinja2 templating.
- jQuery is life.
- Snap.svg SVG javascript library.
- SweetAlert.js plugin for alerts.
- Guts of Loading Screen Tutorial.
- Custom Heroku buildpack for Python with Numpy 1.8.1 and SciPy 0.14.0.
- Heroku-config for adding env keys.
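The credits above include the Hacker News API. As a rough sketch of how every comment a user has posted can be collected from it, the function below walks the user's `submitted` item IDs using the public Firebase endpoints (`/v0/user/<id>.json` and `/v0/item/<id>.json`) and keeps the text of non-deleted comments. It targets Python 3. The function names and the `limit` parameter are hypothetical, and the repo's own fetching code may differ. `fetch` is injectable so the logic can be exercised offline.

```python
import json
from urllib.request import urlopen

HN_API = "https://hacker-news.firebaseio.com/v0"


def fetch_json(path):
    """Fetch one JSON document from the Hacker News Firebase API."""
    with urlopen(f"{HN_API}/{path}.json", timeout=10) as resp:
        return json.load(resp)


def user_comments(username, fetch=fetch_json, limit=None):
    """Return the text of a user's comments (stories and deleted items skipped)."""
    user = fetch(f"user/{username}") or {}  # unknown users come back as null
    item_ids = user.get("submitted", [])    # stories, comments, polls, ...
    if limit is not None:
        item_ids = item_ids[:limit]         # one request per item, so cap it
    comments = []
    for item_id in item_ids:
        item = fetch(f"item/{item_id}") or {}
        if item.get("type") == "comment" and not item.get("deleted") and item.get("text"):
            comments.append(item["text"])  # note: comment text is HTML
    return comments
```

Because each item is a separate request, prolific users can take a while to score; the returned strings still contain HTML markup, so they may need unescaping before they go into a vectorizer.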