Text Analysis with Python Redux
Please note: This repo is a response to "An introduction to text analysis with Python" by Neal Caren. I found a small error in his code that skews the results.
First, generate the results using the original code:
- Download sentiment_before.py
- Run the file. This will create a CSV with the results of the analysis.
Next, take a look at the modified code, highlighting the error:
Run the file. You will see these results:
[('Obama has called the GOP budget social Darwinism. Nice try, but they believe in social creationism.', 0.05263157894736842, 0.0)] [''] 
The first list is the normal output: the tweet text, the fraction of positive words (0.05263157894736842), and the fraction of negative words (0.0). The positive fraction means that exactly one word in the tweet was counted as positive. The second and third lists show the positive and negative words found in that specific tweet, respectively. In other words, the empty string, '', in the second list is being counted as a positive word.
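As a quick sanity check on the arithmetic, the reported score can be converted back into a ratio. This is only a sketch; it assumes the score is computed as the number of matches divided by the number of tokens in the tweet, which the output above suggests but does not prove.

```python
from fractions import Fraction

# Positive score copied from the output above.
reported = 0.05263157894736842

# Recover the simplest ratio that is close to the float.
ratio = Fraction(reported).limit_denominator(1000)
print(ratio)  # 1/19
```

The numerator of 1 agrees with the single entry, '', in the list of positive words.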
Where does this word even come from?
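Simple. This for loop in the original script replaces every punctuation mark with '':

    for p in list(punctuation):
        tweet_processed = tweet_processed.replace(p, '')

A token made up entirely of punctuation (a standalone dash, for example) is therefore reduced to nothing, and splitting the tweet into words then yields an empty string in its place.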
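A minimal sketch of how an empty word can appear and then match. It assumes that the script splits the tweet on single spaces and builds its word list by splitting a file on newlines; the helper `tokenize`, the sample tweet, and the word-list contents are made up for illustration.

```python
from string import punctuation

def tokenize(tweet):
    """Lowercase, strip punctuation, then split on single spaces (assumed)."""
    processed = tweet.lower()
    for p in punctuation:
        processed = processed.replace(p, '')
    return processed.split(' ')

# Hypothetical tweet containing a standalone dash.
words = tokenize("Nice try - really")
print(words)  # ['nice', 'try', '', 'really']

# Hypothetical word-list file contents that end with a newline.
positive_words = "good\nnice\n".split('\n')
print(positive_words)        # ['good', 'nice', '']
print('' in positive_words)  # True
```

Under these assumptions, both the tweet and the word list contain '', so the membership test succeeds and the empty string is scored as a positive word.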
Thus, if the next for loop is updated to the code below, this problem will be eliminated:
    for word in words:
        if word in positive_words and word != '':
            pos_words.append(word)
            positive_counter = positive_counter + 1
        elif word in negative_words and word != '':
            neg_words.append(word)
            negative_counter = negative_counter + 1
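For comparison, here is a self-contained sketch of the same idea. The function name `score_tweet` and its arguments are hypothetical and are not taken from sentiment_after.py. Instead of testing `word != ''`, it calls `split()` with no argument, which never yields empty strings, so they are also left out of the word count.

```python
from string import punctuation

def score_tweet(tweet, positive_words, negative_words):
    """Return (positive fraction, negative fraction, pos words, neg words)."""
    processed = tweet.lower()
    for p in punctuation:
        processed = processed.replace(p, '')
    # split() with no argument splits on runs of whitespace and drops empties.
    words = processed.split()
    pos_words = [w for w in words if w in positive_words]
    # Mirror the if/elif above: a word already counted as positive is skipped.
    neg_words = [w for w in words
                 if w in negative_words and w not in positive_words]
    total = len(words) or 1  # guard against an empty tweet
    return len(pos_words) / total, len(neg_words) / total, pos_words, neg_words

# Hypothetical word lists that still contain the stray empty string.
result = score_tweet("Nice try - really", {"nice", ""}, {"bad", ""})
print(result)  # (0.3333333333333333, 0.0, ['nice'], [])
```

Even though '' is still present in both word lists, it can no longer match because it never appears among the tokens.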
Let's look at the correct results:
- Download sentiment_after.py
- Run the file. This will create a CSV with the correct results of the analysis.
- You can also look at the analysis to see which tweets changed (highlighted).
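To find the changed tweets programmatically rather than by eye, the two result sets can be compared row by row. This is a sketch under assumptions: the helper `changed_rows` is hypothetical, the rows are assumed to be (tweet, positive, negative) tuples in the same order in both runs, and the sample values are illustrative rather than taken from the real CSVs.

```python
def changed_rows(before, after):
    """Return (tweet, old scores, new scores) for rows whose scores differ."""
    changes = []
    for (tweet, *old), (_, *new) in zip(before, after):
        if old != new:
            changes.append((tweet, tuple(old), tuple(new)))
    return changes

# Illustrative rows only, not real results.
before = [("tweet a", 0.05, 0.0), ("tweet b", 0.1, 0.0)]
after = [("tweet a", 0.0, 0.0), ("tweet b", 0.1, 0.0)]
print(changed_rows(before, after))  # [('tweet a', (0.05, 0.0), (0.0, 0.0))]
```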