# Append VADER ratings to JSON of Tweets

### Notebook Input: 
* Two JSON files of President Trump's tweets:
  * The first contains 200 coronavirus-related tweets, each with its tweet data and IBM Watson Tone Analyzer metadata.
  * The second contains 200 non-coronavirus-related tweets, each with the same tweet data and IBM Watson Tone Analyzer metadata.

### Notebook Output: 
* Two JSON files containing the same data as the input files, plus a VADER compound polarity score for each tweet.
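As a quick sanity check on the output files, the helper below tests whether one record carries the attribute names listed in the Summary table at the end of this notebook and a `vader_polarity` value inside VADER's [-1, 1] compound range. This is a minimal sketch, not part of the original notebook: `EXPECTED_KEYS` and `is_valid_output_record` are hypothetical names, and the example record uses made-up placeholder values.

```python
# Hypothetical helper (not in the original notebook): checks one output record.
EXPECTED_KEYS = {
    "id_str", "created_at", "text",
    "analytical", "anger", "confident", "fear", "joy", "sadness", "tentative",
    "vader_polarity",
}


def is_valid_output_record(record: dict) -> bool:
    """Return True if the record has every expected key and a compound score in [-1, 1]."""
    if not EXPECTED_KEYS.issubset(record):  # iterating a dict yields its keys
        return False
    score = record["vader_polarity"]
    return isinstance(score, (int, float)) and -1.0 <= score <= 1.0


# Made-up placeholder record, for illustration only
example = {
    "id_str": "0",
    "created_at": "Fri Jan 24 21:18:15 +0000 2020",
    "text": "placeholder tweet text",
    "analytical": 0.0, "anger": 0.0, "confident": 0.0, "fear": 0.0,
    "joy": 0.0, "sadness": 0.0, "tentative": 0.0,
    "vader_polarity": 0.5,
}
print(is_valid_output_record(example))
```

Running it over every record after the notebook finishes, e.g. `all(is_valid_output_record(t) for t in coronavirus_tweets)`, would flag any tweet that was skipped or malformed.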

```python
import json
import nltk

# Run once to download the VADER lexicon if it is not already installed
# nltk.download('vader_lexicon')

from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Initialize VADER
sid = SentimentIntensityAnalyzer()
```
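The `compound` value stored by this notebook comes from VADER summing the valences of the words in a tweet (after its rule-based adjustments) and squashing that sum into [-1, 1]. The sketch below mirrors the `normalize` step in the VADER source, which uses a default `alpha = 15`; `normalize_compound` is a hypothetical name, and the raw sums are illustrative inputs rather than values from these tweets.

```python
import math


def normalize_compound(raw_sum: float, alpha: float = 15.0) -> float:
    """Squash a raw valence sum into [-1, 1] the way VADER's compound score does."""
    norm = raw_sum / math.sqrt(raw_sum * raw_sum + alpha)
    # With alpha > 0 the magnitude never reaches 1; the clip is only a guard
    return max(-1.0, min(1.0, norm))


for raw in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(raw, round(normalize_compound(raw), 4))
```

A raw sum of 1.0 maps to 1 / sqrt(16) = 0.25, and larger sums approach but never reach ±1, which is why `vader_polarity` always falls inside that range.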

```python
# Load the JSON files of tweets that already have the IBM Watson Tone Analyzer results appended
with open('coronavirus_tweets_with_tone.json', encoding="utf8") as f:
    coronavirus_tweets = json.load(f)

with open('non_coronavirus_tweets_with_tone.json', encoding="utf8") as f:
    non_coronavirus_tweets = json.load(f)

coronavirus_tweets[0]
```

```
{'text': 'China has been working very hard to contain the Coronavirus. The United States greatly appreciates their efforts and transparency. It will all work out well. In particular, on behalf of the American People, I want to thank President Xi!',
 'created_at': 'Fri Jan 24 21:18:15 +0000 2020',
 'id_str': '1220818115354923009',
 'analytical': 0.971713,
 'anger': 0.0,
 'confident': 0.973794,
 'fear': 0.0,
 'joy': 0.680207,
 'sadness': 0.0,
 'tentative': 0.0}
```
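Both input files are loaded with the same two lines, so a small loader that also checks the file's shape can catch a wrong or truncated input before scoring starts. This is a sketch under the assumption that each file is a JSON array of objects with a `text` field, as the sample record above suggests; `load_tweets` is a hypothetical helper, not part of the original notebook.

```python
import json


def load_tweets(path):
    """Load a JSON array of tweet objects and check that every record has a 'text' field."""
    with open(path, encoding="utf8") as f:
        tweets = json.load(f)
    if not isinstance(tweets, list):
        raise ValueError(f"{path}: expected a JSON array, got {type(tweets).__name__}")
    for i, tweet in enumerate(tweets):
        if not isinstance(tweet, dict) or "text" not in tweet:
            raise ValueError(f"{path}: record {i} has no 'text' field")
    return tweets
```

With it, each load becomes one call, e.g. `coronavirus_tweets = load_tweets('coronavirus_tweets_with_tone.json')`.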

```python
# Append the VADER compound polarity score (-1 to 1) to each tweet
for coronavirus_tweet in coronavirus_tweets:
    coronavirus_tweet['vader_polarity'] = sid.polarity_scores(coronavirus_tweet['text'])['compound']

for non_coronavirus_tweet in non_coronavirus_tweets:
    non_coronavirus_tweet['vader_polarity'] = sid.polarity_scores(non_coronavirus_tweet['text'])['compound']
```
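The two loops differ only in which list they walk. `polarity_scores` returns a dict with `neg`, `neu`, `pos`, and `compound` keys, and only `compound` is kept. A minimal sketch of the same step as one function, assuming the scorer is passed in as a parameter so the logic can be exercised with a stub instead of the downloaded lexicon; `add_vader_polarity` and `fake_scores` are hypothetical names.

```python
from typing import Callable, Dict, List


def add_vader_polarity(
    tweets: List[dict],
    polarity_scores: Callable[[str], Dict[str, float]],
    key: str = "vader_polarity",
) -> List[dict]:
    """Attach the compound score to each tweet in place and return the same list."""
    for tweet in tweets:
        tweet[key] = polarity_scores(tweet["text"])["compound"]
    return tweets


# Stub scorer standing in for sid.polarity_scores; its numbers are made up
def fake_scores(text: str) -> Dict[str, float]:
    compound = 0.5 if "good" in text else 0.0
    return {"neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": compound}


tweets = [{"text": "good news"}, {"text": "plain text"}]
add_vader_polarity(tweets, fake_scores)
print([t["vader_polarity"] for t in tweets])  # [0.5, 0.0]
```

In the notebook this would be called with the real analyzer, e.g. `for tweets in (coronavirus_tweets, non_coronavirus_tweets): add_vader_polarity(tweets, sid.polarity_scores)`.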

```python
# Save the resulting tweets
with open("coronavirus_tweets_with_tone_and_sentiment.json", "w", encoding="utf8") as outfile:
    json.dump(coronavirus_tweets, outfile)

with open("non_coronavirus_tweets_with_tone_and_sentiment.json", "w", encoding="utf8") as outfile:
    json.dump(non_coronavirus_tweets, outfile)
```
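By default `json.dump` uses `ensure_ascii=True`, so characters such as curly quotes or emoji in tweet text are written as `\uXXXX` escapes. That output is still valid JSON and loads back identically. If human-readable files are preferred, one option is the sketch below, which writes with `ensure_ascii=False` and `indent=2` and then reads the file back to confirm nothing changed. These settings are a choice made here, not what the notebook does, and `save_tweets` is a hypothetical helper.

```python
import json


def save_tweets(tweets, path):
    """Write tweets as readable UTF-8 JSON, then reload the file to confirm a clean round trip."""
    with open(path, "w", encoding="utf8") as outfile:
        json.dump(tweets, outfile, ensure_ascii=False, indent=2)
    with open(path, encoding="utf8") as infile:
        if json.load(infile) != tweets:
            raise RuntimeError(f"round-trip mismatch for {path}")
```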

## Summary
After running this notebook, each JSON object has the following attributes:

| Attribute Name     | Description |
| ------------------ | ----------- |
| **id_str**         | The tweet's ID on Twitter, as a string |
| **created_at**     | The tweet's timestamp on Twitter |
| **text**           | The tweet text |
| **anger**          | IBM Watson Tone Analyzer score for anger (placeholder value 0.0 if the score was below 0.5) |
| **analytical**     | IBM Watson Tone Analyzer score for analytical (placeholder value 0.0 if the score was below 0.5) |
| **confident**      | IBM Watson Tone Analyzer score for confident (placeholder value 0.0 if the score was below 0.5) |
| **fear**           | IBM Watson Tone Analyzer score for fear (placeholder value 0.0 if the score was below 0.5) |
| **joy**            | IBM Watson Tone Analyzer score for joy (placeholder value 0.0 if the score was below 0.5) |
| **sadness**        | IBM Watson Tone Analyzer score for sadness (placeholder value 0.0 if the score was below 0.5) |
| **tentative**      | IBM Watson Tone Analyzer score for tentative (placeholder value 0.0 if the score was below 0.5) |
| **vader_polarity** | VADER compound score, ranging from -1 (most negative sentiment) to 1 (most positive sentiment) |
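For comparing the two sets of tweets, `vader_polarity` is often bucketed into labels. The VADER README suggests treating a compound score of at least 0.05 as positive, at most -0.05 as negative, and anything in between as neutral. The sketch below applies that convention; it is not part of this notebook, `vader_label` and `label_counts` are hypothetical names, and the sample scores are made up.

```python
from collections import Counter


def vader_label(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to a coarse label using the README's suggested cut-offs."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def label_counts(tweets) -> Counter:
    """Count positive, neutral, and negative tweets by their 'vader_polarity' value."""
    return Counter(vader_label(t["vader_polarity"]) for t in tweets)


# Made-up scores, for illustration only
sample = [
    {"vader_polarity": 0.8},
    {"vader_polarity": 0.0},
    {"vader_polarity": -0.4},
    {"vader_polarity": 0.03},
]
print(label_counts(sample))
```

Calling `label_counts(coronavirus_tweets)` and `label_counts(non_coronavirus_tweets)` on the notebook's output would give a side-by-side breakdown of the two sets.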