# Word2Vec on Reddit's Politics Subreddit (Jan-Apr 2016)

Create word embeddings using all comments from the /r/politics subreddit for the period Jan-Apr 2016. Each word vector is a 300-dimensional vector trained with the Word2Vec CBOW algorithm. (The comments are phrase-collocated before training.)

You can find the Reddit Comments Dataset here: https://www.reddit.com/r/bigquery/wiki/datasets

Modeled in Python (see `w2v.py`).

## Some examples

#### Words the model thinks are most similar to 'bernie':

*(Image: Similar to 'bernie')*

#### Similar to 'hillary':

*(Image: Similar to 'hillary')*

Reddit's preference for Bernie is already apparent!

#### Let's make sure the model understands what 'corrupt politician' means:

*(Image: Corrupt)*

Seems like it does.

#### Find similarities of the term 'corrupt politician' with Bernie and Hillary:

*(Image: Corrupt Politician)*

Haha! The actual cosine-similarity values may be somewhat arbitrary, but which of the two has the larger value definitely reflects Reddit's (and nearly everyone else's) sentiment.

#### Similarly, to find the 'idealistic' candidate:

*(Image: Idealistic)*

*(Image: Idealistic Candidate)*

Not surprising.

#### I noticed pro-Bernie posts on the subreddit almost never linked to the mainstream media. Certain sites always made it to the front page: either the more independent news outlets, or right-wing websites that were pro-Bernie by virtue of being anti-Hillary:

*(Image: pro-Bernie News)*

#### Interestingly, searching for 'Vermont Senator' didn't bring up Bernie, but it did catch Patrick Leahy:

*(Image: Vermont Senator)*

#### Using cosine distances, it's also possible to find exceptions in groups:

*(Image: Parties)*

*(Image: Anchors)*

*(Image: Networks)*