
Commit

Fix for lambda syntax
From this issue:
cjhutto#11
michael-erasmus committed Nov 12, 2016
1 parent 15372a9 commit 0018cdb
Showing 2 changed files with 85 additions and 82 deletions.
85 changes: 44 additions & 41 deletions README.md
@@ -3,6 +3,9 @@

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is _specifically attuned to sentiments expressed in social media_. It is fully open-sourced under the [MIT License](http://choosealicense.com/) (we sincerely appreciate all attributions and readily accept most contributions, but please don't hold us liable).


**This fork contains fixes to make VADER work in Python3**

=======

###Introduction
@@ -12,13 +15,13 @@ This README file describes the dataset of the paper:
**VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text** <br />
(by C.J. Hutto and Eric Gilbert) <br />
Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014. <br />

For questions, please contact: <br />

C.J. Hutto <br />
Georgia Institute of Technology, Atlanta, GA 30032 <br />
cjhutto [at] gatech [dot] edu <br />

=======

###Citation Information
@@ -32,7 +35,7 @@ If you use either the dataset or any of the VADER sentiment analysis tools (VADE
###Installation

There are a couple of ways to install and use VADER sentiment: <br />
- The simplest is to use the command line to do an installation from PyPI using pip, e.g.,
```
> pip install vaderSentiment
```
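Since this fork targets Python 3, on systems where `pip` is tied to a Python 2 interpreter it may be necessary to invoke pip through the Python 3 interpreter instead, e.g.,
```
> python3 -m pip install vaderSentiment
```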
@@ -50,53 +53,53 @@ The compressed .tar.gz package includes **PRIMARY RESOURCES** (items 1-3) as wel

2. vader_sentiment_lexicon.txt <br />
Empirically validated by multiple independent human judges, VADER incorporates a "gold-standard" sentiment lexicon that is especially attuned to microblog-like contexts. <br />
The VADER sentiment lexicon is sensitive to both the **polarity** and the **intensity** of sentiments
expressed in social media contexts, and is also generally applicable to sentiment analysis
in other domains. <br />
Manually creating (much less validating) a comprehensive sentiment lexicon is
a labor-intensive and sometimes error-prone process, so it is no wonder that many
opinion mining researchers and practitioners rely so heavily on existing lexicons
as primary resources. We are pleased to offer ours as a new resource. <br />
We begin by constructing a list inspired by examining existing well-established
sentiment word-banks (LIWC, ANEW, and GI). To this, we next incorporate numerous
lexical features common to sentiment expression in microblogs, including
- a full list of Western-style emoticons (for example, :-) denotes a smiley face
and generally indicates positive sentiment)
- sentiment-related acronyms and initialisms (e.g., LOL and WTF are both examples of
sentiment-laden initialisms)
- commonly used slang with sentiment value (e.g., nah, meh, and giggly).

This process provided us with over 9,000 lexical feature candidates. Next, we assessed
the general applicability of each feature candidate to sentiment expressions. We
used a wisdom-of-the-crowd (WotC) approach (Surowiecki, 2004) to acquire a valid
point estimate for the sentiment valence (intensity) of each context-free candidate
feature. We collected intensity ratings on each of our candidate lexical features
from ten independent human raters (for a total of 90,000+ ratings). Features were
rated on a scale from "[–4] Extremely Negative" to "[4] Extremely Positive", with
allowance for "[0] Neutral (or Neither, N/A)". <br />
We kept every lexical feature that had a non-zero mean rating and whose standard
deviation was less than 2.5, as determined by the aggregate of the ten independent raters
(a brief sketch of this screening step follows the resource list below).
This left us with just over 7,500 lexical features with validated valence scores that
indicated both the sentiment polarity (positive/negative) and the sentiment intensity
on a scale from –4 to +4. For example, the word "okay" has a positive valence of 0.9,
"good" is 1.9, and "great" is 3.1, whereas "horrible" is –2.5, the frowning emoticon :(
is –2.2, and "sucks" and its slang derivative "sux" are both –1.5.

3. vaderSentiment.py <br />
The Python code for the rule-based sentiment analysis engine. Implements the
grammatical and syntactical rules described in the paper, incorporating empirically
derived quantifications for the impact of each rule on the perceived intensity of
sentiment in sentence-level text. Importantly, these heuristics go beyond what would
normally be captured in a typical bag-of-words model. They incorporate **word-order
sensitive relationships** between terms. For example, degree modifiers (also called
intensifiers, booster words, or degree adverbs) impact sentiment intensity by either
increasing or decreasing the intensity. Consider these examples: <br />
(a) "The service here is extremely good" <br />
(b) "The service here is good" <br />
(c) "The service here is marginally good" <br />
From Table 3 in the paper, we see that for 95% of the data, using a degree modifier
increases the positive sentiment intensity of example (a) by 0.227 to 0.36, with a
mean difference of 0.293 on a rating scale from 1 to 4. Likewise, example (c) reduces
the perceived sentiment intensity by 0.293, on average.

4. tweets_GroundTruth.txt <br />
@@ -141,7 +144,7 @@ The compressed .tar.gz package includes **PRIMARY RESOURCES** (items 1-3) as wel
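As a rough illustration of the lexicon screening described in item 2 above, the sketch below keeps only candidate features whose mean rating is non-zero and whose standard deviation across raters is below 2.5. It is a hypothetical example for exposition only; the helper name and the made-up ratings are not part of the VADER distribution.
```
import statistics

def screen_candidates(ratings_by_feature, max_sd=2.5):
    """Keep features with a non-zero mean rating and rater SD below max_sd."""
    validated = {}
    for feature, ratings in ratings_by_feature.items():
        mean = statistics.mean(ratings)
        sd = statistics.stdev(ratings)
        if mean != 0 and sd < max_sd:
            validated[feature] = mean  # retained valence, e.g. "okay" -> 0.9
    return validated

# Made-up ratings from ten raters on the -4..+4 scale described above:
candidates = {
    "okay":  [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],       # kept: mean 0.9, low disagreement
    "the":   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],       # dropped: zero mean
    "weird": [4, -4, 4, -4, 4, -4, 4, -4, 4, -3],  # dropped: raters disagree (SD > 2.5)
}
print(screen_candidates(candidates))  # -> {'okay': 0.9}
```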
=======
##Python Code EXAMPLE:
```
from vaderSentiment import sentiment as vaderSentiment
#note: depending on how you installed (e.g., using source code download versus pip install), you may need to import like this:
#from vaderSentiment.vaderSentiment import sentiment as vaderSentiment
@@ -169,7 +172,7 @@ The compressed .tar.gz package includes **PRIMARY RESOURCES** (items 1-3) as wel
print sentence,
vs = vaderSentiment(sentence)
print "\n\t" + str(vs)
# --- output for the above example code ---
VADER is smart, handsome, and funny.
{'neg': 0.0, 'neu': 0.254, 'pos': 0.746, 'compound': 0.8316}
@@ -195,7 +198,7 @@ At least it isn't a horrible book.
{'neg': 0.0, 'neu': 0.637, 'pos': 0.363, 'compound': 0.431}
:) and :D
{'neg': 0.0, 'neu': 0.124, 'pos': 0.876, 'compound': 0.7925}
{'neg': 0.0, 'neu': 0.0, 'pos': 0.0, 'compound': 0.0}
Today sux
{'neg': 0.714, 'neu': 0.286, 'pos': 0.0, 'compound': -0.3612}
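# The example above uses Python 2 print statements; in Python 3 syntax (this
# fork's target), the same loop can be sketched as follows. The import form
# still depends on how you installed, as noted in the example above, and the
# sentences below reuse the degree-modifier examples from item 3 to show how
# "extremely"/"marginally" shift the scores relative to the plain sentence.
from vaderSentiment import sentiment as vaderSentiment
#from vaderSentiment.vaderSentiment import sentiment as vaderSentiment

sentences = ["The service here is extremely good",
             "The service here is good",
             "The service here is marginally good"]
for sentence in sentences:
    vs = vaderSentiment(sentence)
    print(sentence)
    print("\t" + str(vs))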
82 changes: 41 additions & 41 deletions vaderSentiment/vaderSentiment.py
@@ -1,27 +1,27 @@
#!/usr/bin/python
# coding: utf-8
'''
Created on July 04, 2013
@author: C.J. Hutto
Citation Information
If you use any of the VADER sentiment analysis tools
(VADER sentiment lexicon or Python code for rule-based sentiment
analysis engine) in your work or research, please cite the paper.
For example:
Hutto, C.J. & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for
Sentiment Analysis of Social Media Text. Eighth International Conference on
Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.
'''

import os, math, re, sys, fnmatch, string
reload(sys)

def make_lex_dict(f):
return dict(map(lambda (w, m): (w, float(m)), [wmsr.strip().split('\t')[0:2] for wmsr in open(f) ]))
return dict(map(lambda w, m: (w, float(m)), [wmsr.strip().split('\t')[0:2] for wmsr in open(f) ]))
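# Of the two return statements above, the first is the original Python 2 form
# (tuple unpacking in a lambda parameter list, which Python 3 no longer allows);
# the second is the form introduced by this commit. Note, though, that map()
# hands each ['word', 'measure'] pair to the lambda as a single argument, so a
# two-argument lambda raises a TypeError when called; a dict comprehension is
# one Python 3-safe alternative. The sketch below is an illustration only
# (the name make_lex_dict_py3 is hypothetical, not part of this commit) and
# assumes the tab-separated word<TAB>measure layout of vader_sentiment_lexicon.txt:
def make_lex_dict_py3(f):
    with open(f) as lex_file:
        return {word: float(measure)
                for word, measure in
                (line.strip().split('\t')[0:2] for line in lex_file)}
# reload(sys), a few lines up, is likewise Python 2-only; under Python 3 it
# would need to be removed or replaced with importlib.reload.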

f = 'vader_sentiment_lexicon.txt' # empirically derived valence ratings for words, emoticons, slang, swear words, acronyms/initialisms
try:
WORD_VALENCE_DICT = make_lex_dict(f)
@@ -138,7 +138,7 @@ def sentiment(text):
# get rid of empty items or single letter "words" like 'a' and 'I' from wordsOnly
for word in wordsOnly:
if len(word) <= 1:
wordsOnly.remove(word)
# now remove adjacent & redundant punctuation from [wordsAndEmoticons] while keeping emoticons and contractions

for word in wordsOnly:
@@ -150,7 +150,7 @@ def sentiment(text):
wordsAndEmoticons.remove(pword)
wordsAndEmoticons.insert(i, word)
x1 = wordsAndEmoticons.count(pword)

wordp = word + p
x2 = wordsAndEmoticons.count(wordp)
while x2 > 0:
@@ -163,13 +163,13 @@ def sentiment(text):
for word in wordsAndEmoticons:
if len(word) <= 1:
wordsAndEmoticons.remove(word)

# remove stopwords from [wordsAndEmoticons]
#stopwords = [str(word).strip() for word in open('stopwords.txt')]
#for word in wordsAndEmoticons:
# if word in stopwords:
# wordsAndEmoticons.remove(word)

# check for negation

isCap_diff = isALLCAP_differential(wordsAndEmoticons)
@@ -186,9 +186,9 @@ def sentiment(text):
if item_lowercase in WORD_VALENCE_DICT:
#get the sentiment valence
v = float(WORD_VALENCE_DICT[item_lowercase])

#check if sentiment laden word is in ALLCAPS (while others aren't)

if item.isupper() and isCap_diff:
if v > 0: v += c_INCR
else: v -= c_INCR
@@ -204,8 +204,8 @@ def sentiment(text):
if s2 != 0: s2 = s2*0.95
v = v+s2
# check for special use of 'never' as valence modifier instead of negation
if wordsAndEmoticons[i-2] == "never" and (wordsAndEmoticons[i-1] == "so" or wordsAndEmoticons[i-1] == "this"):
v = v*1.5
if wordsAndEmoticons[i-2] == "never" and (wordsAndEmoticons[i-1] == "so" or wordsAndEmoticons[i-1] == "this"):
v = v*1.5
# otherwise, check for negation/nullification
elif negated([wordsAndEmoticons[i-2]]): v = v*n_scalar
if i > 2 and wordsAndEmoticons[i-3].lower() not in WORD_VALENCE_DICT:
@@ -219,12 +219,12 @@ def sentiment(text):
v = v*1.25
# otherwise, check for negation/nullification
elif negated([wordsAndEmoticons[i-3]]): v = v*n_scalar


# future work: consider other sentiment-laden idioms
#other_idioms = {"back handed": -2, "blow smoke": -2, "blowing smoke": -2, "upper hand": 1, "break a leg": 2,
#other_idioms = {"back handed": -2, "blow smoke": -2, "blowing smoke": -2, "upper hand": 1, "break a leg": 2,
# "cooking with gas": 2, "in the black": 2, "in the red": -2, "on the ball": 2,"under the weather": -2}

onezero = u"{} {}".format(wordsAndEmoticons[i-1], wordsAndEmoticons[i])
twoonezero = u"{} {} {}".format(wordsAndEmoticons[i-2], wordsAndEmoticons[i-1], wordsAndEmoticons[i])
twoone = u"{} {}".format(wordsAndEmoticons[i-2], wordsAndEmoticons[i-1])
@@ -248,11 +248,11 @@ def sentiment(text):
zeroonetwo = u"{} {}".format(wordsAndEmoticons[i], wordsAndEmoticons[i+1], wordsAndEmoticons[i+2])
if zeroonetwo in SPECIAL_CASE_IDIOMS:
v = SPECIAL_CASE_IDIOMS[zeroonetwo]

# check for booster/dampener bi-grams such as 'sort of' or 'kind of'
if threetwo in BOOSTER_DICT or twoone in BOOSTER_DICT:
v = v+B_DECR

# check for negation case using "least"
if i > 1 and wordsAndEmoticons[i-1].lower() not in WORD_VALENCE_DICT \
and wordsAndEmoticons[i-1].lower() == "least":
@@ -261,32 +261,32 @@ def sentiment(text):
elif i > 0 and wordsAndEmoticons[i-1].lower() not in WORD_VALENCE_DICT \
and wordsAndEmoticons[i-1].lower() == "least":
v = v*n_scalar
sentiments.append(v)

# check for modification in sentiment due to contrastive conjunction 'but'
if 'but' in wordsAndEmoticons or 'BUT' in wordsAndEmoticons:
try: bi = wordsAndEmoticons.index('but')
except: bi = wordsAndEmoticons.index('BUT')
for s in sentiments:
si = sentiments.index(s)
if si < bi:
sentiments.pop(si)
sentiments.insert(si, s*0.5)
elif si > bi:
sentiments.pop(si)
sentiments.insert(si, s*1.5)

if sentiments:
sum_s = float(sum(sentiments))
#print sentiments, sum_s

# check for added emphasis resulting from exclamation points (up to 4 of them)
ep_count = text.count("!")
if ep_count > 4: ep_count = 4
ep_amplifier = ep_count*0.292 #(empirically derived mean sentiment intensity rating increase for exclamation points)
if sum_s > 0: sum_s += ep_amplifier
elif sum_s < 0: sum_s -= ep_amplifier

# check for added emphasis resulting from question marks (2 or 3+)
qm_count = text.count("?")
qm_amplifier = 0
@@ -297,7 +297,7 @@ def sentiment(text):
elif sum_s < 0: sum_s -= qm_amplifier

compound = normalize(sum_s)

# want separate positive versus negative sentiment scores
pos_sum = 0.0
neg_sum = 0.0
@@ -309,19 +309,19 @@ def sentiment(text):
neg_sum += (float(sentiment_score) -1) # when used with math.fabs(), compensates for neutrals
if sentiment_score == 0:
neu_count += 1

if pos_sum > math.fabs(neg_sum): pos_sum += (ep_amplifier+qm_amplifier)
elif pos_sum < math.fabs(neg_sum): neg_sum -= (ep_amplifier+qm_amplifier)

total = pos_sum + math.fabs(neg_sum) + neu_count
pos = math.fabs(pos_sum / total)
neg = math.fabs(neg_sum / total)
neu = math.fabs(neu_count / total)

else:
compound = 0.0; pos = 0.0; neg = 0.0; neu = 0.0
s = {"neg" : round(neg, 3),

s = {"neg" : round(neg, 3),
"neu" : round(neu, 3),
"pos" : round(pos, 3),
"compound" : round(compound, 4)}
@@ -352,11 +352,11 @@ def sentiment(text):
paragraph = "It was one of the worst movies I've seen, despite good reviews. \
Unbelievably bad acting!! Poor direction. VERY poor production. \
The movie was bad. Very bad movie. VERY bad movie. VERY BAD movie. VERY BAD movie!"

from nltk import tokenize
lines_list = tokenize.sent_tokenize(paragraph)
sentences.extend(lines_list)

tricky_sentences = [
"Most automated sentiment analysis tools are shit.",
"VADER sentiment analysis is the shit.",
@@ -371,7 +371,7 @@ def sentiment(text):
"This movie doesn't care about cleverness, wit or any other kind of intelligent humor.",
"Those who find ugly meanings in beautiful things are corrupt without being charming.",
"There are slow and repetitive parts, BUT it has just enough spice to keep it interesting.",
"The script is not fantastic, but the acting is decent and the cinematography is EXCELLENT!",
"The script is not fantastic, but the acting is decent and the cinematography is EXCELLENT!",
"Roger Dodger is one of the most compelling variations on this theme.",
"Roger Dodger is one of the least compelling variations on this theme.",
"Roger Dodger is at least compelling as a variation on the theme.",
@@ -386,5 +386,5 @@ def sentiment(text):
print sentence
ss = sentiment(sentence)
print "\t" + str(ss)

print "\n\n Done!"
