Switch branches/tags
Nothing to show
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
57 lines (41 sloc) 2.4 KB

Sentiment analysis for English and Spanish (Implicit & Explicit)

Sentiment analysis is a “hot topic” in the analysis of social media, blogosphere and news sources. Marketers, journalists and government officials routinely ask questions along the lines of “What does the Blogosphere think” about a certain subject. In many ways analysis of online sentiment is replacing more labor-intensive methods such as focus groups and polling.

However, Internet sentiment analysis suffers from a num- ber of inherent problems. As a replacement for polling, it has a significant selection bias – results are heavily biased toward large urban centers as the concentration of social media users is significantly higher in urban areas. As a replacement for focus groups, sentiment analysis lacks in-depth inspection of con- tents of discourse – thus missing the reasons why sentiment tends to take a specific direction.

Sentiment analysis itself is rather inaccurate. Even human coders perform badly in randomized trials, as sentiment is a subjective metric. For example, a phrase “I bought a Honda yesterday” was perceived by 45 percent as sentiment-neutral and 50 percent as positive (i).

Finally, it is rather distasteful that the entire rich, varied, poetic spectrum of human emotions gets reduced to a number between -1 and 1. It is an inelegant approach to a complex problem, throwing out too much good information too early in the process.

SnowWhite attempts to remedy that with a new approach to sentiment analysis -- Implicit Sentiment.

Impicit Sentiment

Rather then compute how many positive or negative words one used, we look at how much two speakers have in common. This rate of -mirroring- is directly related to expressed sentiment and seems to be a better predictor of what the speaker has actually said then pure word-counting.

How to run SnowWhite

from snowwhite import discourse_mapper


###for all tweets containing ['user'] and ['text']:

get a sentiment graph:


The sentiment graph is a network of pairs of users where value of the edges corresponds to amount of linguistic mirroring obseved in text

get pair-wise sentiment for any pair of users