# Benchmark for the Tone and Sentiment of Twitter Users' Coronavirus Tweets

Doctor Hall provided me with a collection of coronavirus-related tweets. The goal is to prepare a set of tweets with tone and sentiment scores that we can use as a benchmark for comparison against Trump's tweets.

**This code does the following:**
* Grabs a random sample of 200 tweets from the collection
  * 200 because that is the number of tweets in each of the two sets of Trump tweets
* Removes all columns except `NewID`, `Text`, and `Date`
* Calls the IBM Watson Tone Analyzer via `curl` for each tweet and records the tone scores in the dataframe
* Computes the VADER compound polarity score for each tweet and records it in the dataframe
* Pickles the dataframe

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import json
import pickle
import nltk

from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Initialize VADER
sid = SentimentIntensityAnalyzer()

In [None]:
gen_tweets_df = pd.read_csv('all-data-combined.csv', encoding = "ISO-8859-1")

# limit the columns to the id, text, and date
limit_cols_gen_tweets_df = gen_tweets_df[['NewID','Text', 'Date']]

# only grab 200 tweets
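# take an explicit copy so that adding columns later does not trigger SettingWithCopyWarning
sample_of_tweets = limit_cols_gen_tweets_df.sample(200).copy()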
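Because `sample(200)` draws a different subset on every run, the benchmark set changes each time the notebook is re-executed. The following is a minimal sketch of a repeatable draw, assuming a fixed seed is acceptable for this analysis. The toy dataframe and the `SEED` value are illustrative and do not come from the original notebook.

```python
import pandas as pd

# Toy stand-in for the tweet collection; the real data comes from all-data-combined.csv
toy_tweets_df = pd.DataFrame({
    "NewID": range(1000),
    "Text": [f"tweet {n}" for n in range(1000)],
    "Date": ["2020-03-01"] * 1000,
})

SEED = 42  # arbitrary value chosen for illustration, not from the original notebook

first_draw = toy_tweets_df.sample(200, random_state=SEED)
second_draw = toy_tweets_df.sample(200, random_state=SEED)

# With the same seed, both draws select the same rows in the same order
print(first_draw.index.equals(second_draw.index))
```

Passing the same `random_state` to the real `sample(200)` call would make the 200-tweet benchmark set the same on every run.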

In [None]:
# Create the IBM Tone Analyzer Columns and initialize to 0.0
sample_of_tweets['analytical'] = 0.0
sample_of_tweets['anger']      = 0.0
sample_of_tweets['confident']  = 0.0
sample_of_tweets['fear']       = 0.0
sample_of_tweets['joy']        = 0.0
sample_of_tweets['sadness']    = 0.0
sample_of_tweets['tentative']  = 0.0

In [None]:
# Iterate through the rows, call IBM Watson Tone Analyzer and pass in the text
# Record the results in the corresponding row's columns

for i, row in sample_of_tweets.iterrows():
    with open("temp_text.txt", "w", encoding="utf8") as outfile:
        outfile.write(row["Text"])

    # Post the file written above; the relative path assumes the notebook runs in the directory that holds temp_text.txt
    response = !curl -X POST -u "apikey:MY-KEY" --header "Content-Type: text/plain" --data-binary @temp_text.txt "MY-URL"
    
    # convert to JSON
    json_response = json.loads(response[-1])
    
    # Each tone_id matches the name of one of the tone columns,
    # so the score can be written straight into that column
    for tone in json_response['document_tone']['tones']:
        if tone['tone_id'] in ('analytical', 'anger', 'confident', 'fear',
                               'joy', 'sadness', 'tentative'):
            sample_of_tweets.at[i, tone['tone_id']] = tone['score']
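The parsing step in the loop above can be separated from the network call and checked on its own. This is a rough sketch, not the notebook's method: `extract_tone_scores`, `TONE_IDS`, and the example payload are hypothetical, and the payload only mirrors the fields the loop reads (`document_tone`, `tones`, `tone_id`, `score`). It also assumes that a response lacking `document_tone` should leave every tone at 0.0.

```python
import json

TONE_IDS = ("analytical", "anger", "confident", "fear", "joy", "sadness", "tentative")

def extract_tone_scores(raw_json):
    """Return a {tone_id: score} dict, with 0.0 for every tone the response omits."""
    scores = {tone_id: 0.0 for tone_id in TONE_IDS}
    parsed = json.loads(raw_json)
    # Assumption: a payload without 'document_tone' carries no scores, so use an empty list
    tones = parsed.get("document_tone", {}).get("tones", [])
    for tone in tones:
        if tone.get("tone_id") in scores:
            scores[tone["tone_id"]] = tone["score"]
    return scores

# Hypothetical payload, shaped like the fields the loop above reads
example_response = json.dumps({
    "document_tone": {
        "tones": [
            {"tone_id": "fear", "score": 0.71},
            {"tone_id": "tentative", "score": 0.55},
        ]
    }
})

print(extract_tone_scores(example_response))
```

Inside the loop, the returned dictionary could be written into the row with one `sample_of_tweets.at[i, tone_id] = score` assignment per key.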
    

In [None]:
# pickle the tweets which now have the IBM Watson Tone Analyzer scores
sample_of_tweets.to_pickle("general_tweets_with_tone.pkl")

In [None]:
# create and initialize the VADER polarity column within the dataframe
sample_of_tweets['vader_polarity'] = 0.0

In [None]:
# Set the VADER polarity scores per row
for i, row in sample_of_tweets.iterrows():
    sample_of_tweets.at[i,'vader_polarity'] = sid.polarity_scores(row["Text"])['compound']
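The row-by-row loop above can also be written as a single column operation. The sketch below assumes the scorer is any callable that returns a dictionary with a `compound` key, which is the part of `sid.polarity_scores` the notebook uses. `add_compound_scores` and `stand_in_scorer` are hypothetical names; the stand-in exists only so the example runs without the NLTK lexicon.

```python
import pandas as pd

def add_compound_scores(df, scorer, text_col="Text", out_col="vader_polarity"):
    """Return a copy of df with scorer(text)['compound'] stored in out_col."""
    result = df.copy()
    result[out_col] = result[text_col].apply(lambda text: scorer(text)["compound"])
    return result

def stand_in_scorer(text):
    # Hypothetical scorer used only so the sketch runs without NLTK data;
    # in the notebook one would pass sid.polarity_scores instead
    return {"compound": 1.0 if "good" in text else 0.0}

toy_df = pd.DataFrame({"Text": ["good news today", "cases are rising"]})
scored_df = add_compound_scores(toy_df, stand_in_scorer)
print(scored_df["vader_polarity"].tolist())
```

With the real analyzer, the call would be `add_compound_scores(sample_of_tweets, sid.polarity_scores)`.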


In [None]:
# pickle the tweets which now have both the IBM Watson Tone Analyzer scores, as well as VADER polarity scores
sample_of_tweets.to_pickle("general_tweets_with_tone_and_sentiment.pkl")
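Before the pickled benchmark is compared against the Trump tweet sets, it can help to confirm that the file reloads unchanged. This is a small sketch of that round trip on a toy dataframe; the file name and the values are illustrative and not taken from the real data.

```python
import os
import tempfile

import pandas as pd

toy_scored_df = pd.DataFrame({
    "NewID": [1, 2],
    "Text": ["first tweet", "second tweet"],
    "vader_polarity": [0.5, -0.25],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = os.path.join(tmp_dir, "toy_scores.pkl")  # hypothetical file name
    toy_scored_df.to_pickle(pkl_path)
    restored_df = pd.read_pickle(pkl_path)

# The restored frame holds the same values and columns as the original
print(restored_df.equals(toy_scored_df))
```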