## Sentiment Analysis with Google Cloud Natural Language API

* This notebook adds two columns, `sentiment` (the score) and `magnitude`, to the CSV file of daily US COVID tweets
* First, upload the JSON file containing your Google Cloud API credentials
* Then, upload the CSV of tweets for the day you want to analyze
* Change the `day` variable so that it matches that day, and set `path` to the location of your CSV of tweets
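Because `path` and `day` must agree, one option is to derive `day` from the file name instead of typing it twice. The sketch below assumes every input file follows the `daily_coronavirus_tweets_us_MM_DD.csv` pattern used in this notebook; `day_from_path` is a hypothetical helper, not part of any library.

```python
import re
from pathlib import Path


def day_from_path(path):
    """Return the two-digit day from a file name ending in _MM_DD.csv.

    Hypothetical helper: assumes the daily_coronavirus_tweets_us_MM_DD.csv
    naming pattern; raises ValueError for anything else.
    """
    match = re.search(r'_(\d{2})_(\d{2})\.csv$', Path(path).name)
    if match is None:
        raise ValueError('cannot find _MM_DD.csv in {!r}'.format(path))
    return match.group(2)  # group(1) is the month, group(2) the day


path = './../daily_tweets/daily_coronavirus_tweets_us_04_02.csv'
day = day_from_path(path)
print(day)  # 02
```

This keeps a single source of truth: changing `path` is enough, and a file name that does not fit the pattern fails loudly instead of producing a mislabeled output file.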

In [11]:
path = './../daily_tweets/daily_coronavirus_tweets_us_04_02.csv'
day = '02'

In [3]:
# Install tqdm (progress bars) and the Google Cloud Natural Language client library
! python -m pip install tqdm
! python -m pip install --upgrade google-cloud-language

Requirement already up-to-date: google-cloud-language in c:\users\joyji\anaconda3\lib\site-packages (2.0.0)


In [8]:
import os
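# Tell the Google client library where your credentials JSON lives (relative to the working directory)
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "QM2-project-441aae4d7e45.json"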
print('Credentials from environ: {}'.format(os.environ.get('GOOGLE_APPLICATION_CREDENTIALS')))

Credentials from environ: QM2-project-441aae4d7e45.json
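A missing or mistyped key file usually only surfaces later, when the client is created or the first request fails. A hypothetical `check_credentials` helper like the one below can fail fast instead. It is a sketch that assumes the file is a service-account key (those JSON files contain `"type": "service_account"`); it does not contact Google, so it cannot tell whether the key is valid or the API is enabled.

```python
import json
import os


def check_credentials(env_var='GOOGLE_APPLICATION_CREDENTIALS'):
    """Return the key-file path named by env_var after basic local checks.

    Hypothetical helper: checks only that the variable is set, the file
    exists, and the JSON looks like a service-account key.
    """
    key_path = os.environ.get(env_var)
    if not key_path:
        raise RuntimeError('{} is not set'.format(env_var))
    if not os.path.isfile(key_path):
        raise FileNotFoundError('credentials file not found: {}'.format(key_path))
    with open(key_path) as f:
        key = json.load(f)
    if key.get('type') != 'service_account':
        raise ValueError('{} does not look like a service-account key'.format(key_path))
    return key_path
```

Calling `check_credentials()` right after setting the environment variable would raise immediately if the file is missing or malformed, rather than several cells later.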


The code cell below is a test taken from the Google Natural Language API how-to guide: https://cloud.google.com/natural-language/docs/analyzing-sentiment


In [9]:
# Imports the Google Cloud client library
from google.cloud import language_v1

# Instantiates a client
client = language_v1.LanguageServiceClient()

# The text to analyze
text = u"wassup"
document = language_v1.Document(content=text, type_=language_v1.Document.Type.PLAIN_TEXT)

# Detects the sentiment of the text
sentiment = client.analyze_sentiment(request={'document': document}).document_sentiment

print("Text: {}".format(text))
print("Sentiment: {}, {}".format(sentiment.score, sentiment.magnitude))

Text: wassup
Sentiment: 0.4000000059604645, 0.4000000059604645
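The two numbers above are the document-level `score` and `magnitude`. Per Google's documentation, `score` ranges from -1.0 (negative) to 1.0 (positive), while `magnitude` is non-negative and unbounded and reflects how much emotional content the text carries overall; a near-zero score with a high magnitude points to mixed sentiment rather than neutral text. The sketch below turns a pair into a coarse label. The `label_sentiment` name and the cutoffs 0.25 and 1.0 are illustrative assumptions, not values published by Google.

```python
def label_sentiment(score, magnitude, score_cutoff=0.25, magnitude_cutoff=1.0):
    """Map a (score, magnitude) pair to a coarse label.

    The cutoffs are assumptions chosen for illustration; tune them on your data.
    """
    if score != score:  # NaN check: tweets that could not be scored stay unlabelled
        return None
    if score >= score_cutoff:
        return 'positive'
    if score <= -score_cutoff:
        return 'negative'
    # Near-zero score: high magnitude suggests positive and negative parts cancelling out
    return 'mixed' if magnitude >= magnitude_cutoff else 'neutral'


print(label_sentiment(0.4000000059604645, 0.4000000059604645))  # positive
```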




---



Now we perform sentiment analysis on the actual tweets.

**If you have not already done so, set `path` in the first code cell to the location of your CSV of tweets.**

In [12]:
import pandas as pd
import numpy as np

df = pd.read_csv(path)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 59342 entries, 0 to 59341
Data columns (total 37 columns):
Unnamed: 0                    59342 non-null int64
coordinates                   41 non-null object
created_at                    59341 non-null object
hashtags                      9944 non-null object
media                         3334 non-null object
urls                          15215 non-null object
favorite_count                59341 non-null object
id                            59341 non-null object
in_reply_to_screen_name       3843 non-null object
in_reply_to_status_id         3294 non-null float64
in_reply_to_user_id           3839 non-null float64
lang                          59341 non-null object
place                         667 non-null object
possibly_sensitive            17374 non-null object
retweet_count                 59341 non-null object
retweet_id                    43921 non-null float64
retweet_screen_name           43921 non-null object
source         
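The `info()` output shows that 43,921 of the 59,341 tweets carry a `retweet_id`. If many of those retweets repeat the same text verbatim (an assumption worth checking on your data), scoring each distinct text only once would cut the number of billable API calls. `score_unique` below is a hypothetical caching wrapper around any function that returns a `(score, magnitude)` tuple.

```python
def score_unique(texts, scorer):
    """Score each distinct string once and return results in input order.

    `scorer` maps a string to a (score, magnitude) tuple. Non-string entries
    (such as NaN for missing tweets) are passed to the scorer uncached.
    """
    cache = {}
    results = []
    for text in texts:
        if not isinstance(text, str):
            results.append(scorer(text))
            continue
        if text not in cache:
            cache[text] = scorer(text)  # only the first occurrence hits the scorer
        results.append(cache[text])
    return results
```

Once `google_sentiment` is defined, this could be used as `pd.DataFrame(score_unique(df['text'], google_sentiment), index=df.index, columns=['sentiment', 'magnitude'])`.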

In [13]:
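def google_sentiment(text):
    """Return (score, magnitude) for one tweet, or (NaN, NaN) if it cannot be scored."""
    # Missing tweets are read by pandas as NaN (a float), so skip anything that is not a string
    if not isinstance(text, str):
        return (np.nan, np.nan)
    # The text to analyze
    document = language_v1.Document(content=text, type_=language_v1.Document.Type.PLAIN_TEXT)
    # Detects the sentiment of the text; any API error (e.g. an unsupported language) yields NaN
    try:
        sentiment = client.analyze_sentiment(request={'document': document}).document_sentiment
        return (sentiment.score, sentiment.magnitude)
    except Exception:
        return (np.nan, np.nan)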


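# Interpreting the output: score runs from -1.0 (negative) to 1.0 (positive);
# magnitude is >= 0 and unbounded, measuring overall emotional strength (it tends to grow with text length).
# See Google's Natural Language documentation on interpreting sentiment analysis values.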
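`google_sentiment` maps every exception to `(NaN, NaN)`, so a brief network hiccup or quota error during a long run silently drops tweets. One option is to retry transient failures with exponential backoff before giving up. `with_retries` below is a hypothetical, library-agnostic sketch; which exceptions count as transient is an assumption you should narrow for your setup (for example `google.api_core.exceptions.ResourceExhausted` or `ServiceUnavailable`). The Google client methods also accept their own `retry` argument from google-api-core, which is worth checking as an alternative.

```python
import random
import time


def with_retries(fn, attempts=3, base_delay=1.0, retry_on=(Exception,)):
    """Call fn(), retrying with exponential backoff and jitter on failure.

    Hypothetical helper: narrow `retry_on` to transient errors in practice,
    so permanent failures (bad input, unsupported language) fail fast.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay * 2**attempt seconds, plus up to 10% random jitter
            delay = base_delay * (2 ** attempt)
            time.sleep(delay + random.uniform(0, delay * 0.1))
```

Inside `google_sentiment`, the API call could then be wrapped as `with_retries(lambda: client.analyze_sentiment(request={'document': document}))`, keeping the outer `try`/`except` as the final fallback to NaN.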

In [14]:
# https://stackoverflow.com/questions/18603270/progress-indicator-during-pandas-operations
from tqdm.auto import tqdm  # picks the notebook widget or the console bar automatically

# Create and register a new `tqdm` instance with `pandas`
# (can use tqdm_gui, optional kwargs, etc.)
tqdm.pandas()

# Now you can use `progress_apply` instead of `apply`
sentiments = df.progress_apply(lambda row: google_sentiment(row['text']), axis=1, result_type='expand')

100%|██████████| 59342/59342 [7:05:14<00:00,  2.33it/s]
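The progress bar shows the sequential run took just over seven hours at about 2.33 requests per second, because each `analyze_sentiment` call waits for the previous one to finish. Since the time is dominated by network round trips, a thread pool can keep several requests in flight at once. The sketch below is one way to do that. It assumes the shared `client` can be called from multiple threads (gRPC-based Google clients are generally designed for this, but verify for your version) and that `max_workers=8` stays within your project's quota; both are assumptions, and `score_concurrently` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor


def score_concurrently(texts, scorer, max_workers=8):
    """Apply scorer to every text on a thread pool, preserving input order.

    Sketch only: max_workers is an arbitrary default; lower it if the API
    starts returning quota errors.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs
        return list(pool.map(scorer, texts))
```

For example, `score_concurrently(df['text'], google_sentiment)` would return a list of `(score, magnitude)` tuples aligned with `df`'s rows.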


In [15]:
#df[['sentiment', 'magnitude']] = df.apply(lambda row: pd.Series(google_sentiment(row['text'])[0], google_sentiment(row['text'])[1]), axis = 1)
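The commented-out line above tries to fill both columns in one step, but it calls `google_sentiment` twice per row (doubling API usage) and passes the magnitude as the `index` argument of `pd.Series`, which does not produce the intended two values. A single-call alternative is sketched below; `add_sentiment_columns` is a hypothetical helper, and `scorer` stands for any function returning a `(score, magnitude)` tuple, such as `google_sentiment`.

```python
import pandas as pd


def add_sentiment_columns(df, scorer, text_col='text'):
    """Return a copy of df with 'sentiment' and 'magnitude' columns added.

    Calls scorer exactly once per row and splits each returned tuple
    into the two new columns.
    """
    pairs = df[text_col].map(scorer).tolist()
    scores = pd.DataFrame(pairs, index=df.index, columns=['sentiment', 'magnitude'])
    return df.assign(sentiment=scores['sentiment'], magnitude=scores['magnitude'])
```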

In [16]:
sentiments

| | 0 | 1 |
|---|---|---|
| 0 | 0.0 | 0.0 |
| 1 | -0.4 | 1.7 |
| 2 | 0.3 | 1.7 |
| 3 | 0.0 | 0.0 |
| 4 | 0.0 | 0.8 |
| ... | ... | ... |
| 59337 | 0.3 | 0.7 |
| 59338 | 0.1 | 0.3 |
| 59339 | -0.2 | 0.5 |
| 59340 | -0.2 | 0.5 |


In [17]:
sentiments.rename(columns = {0:'sentiment', 1:'magnitude'}, inplace=True)

In [18]:
sentiments

| | sentiment | magnitude |
|---|---|---|
| 0 | 0.0 | 0.0 |
| 1 | -0.4 | 1.7 |
| 2 | 0.3 | 1.7 |
| 3 | 0.0 | 0.0 |
| 4 | 0.0 | 0.8 |
| ... | ... | ... |
| 59337 | 0.3 | 0.7 |
| 59338 | 0.1 | 0.3 |
| 59339 | -0.2 | 0.5 |
| 59340 | -0.2 | 0.5 |


In [19]:
tweet_sentiments = df.merge(sentiments, left_index=True, right_index=True)
tweet_sentiments.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 59342 entries, 0 to 59341
Data columns (total 39 columns):
Unnamed: 0                    59342 non-null int64
coordinates                   41 non-null object
created_at                    59341 non-null object
hashtags                      9944 non-null object
media                         3334 non-null object
urls                          15215 non-null object
favorite_count                59341 non-null object
id                            59341 non-null object
in_reply_to_screen_name       3843 non-null object
in_reply_to_status_id         3294 non-null float64
in_reply_to_user_id           3839 non-null float64
lang                          59341 non-null object
place                         667 non-null object
possibly_sensitive            17374 non-null object
retweet_count                 59341 non-null object
retweet_id                    43921 non-null float64
retweet_screen_name           43921 non-null object
source         
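Merging on the index is only correct if `sentiments` was built row by row from `df`, so the two frames share the same index. It is also useful to know how many tweets came back as `NaN`, since `google_sentiment` hides API failures that way. `attach_sentiments` below is a hypothetical helper that checks the alignment, joins the scores, and reports unscored rows; it assumes `sentiments` already has the `sentiment` and `magnitude` column names.

```python
import pandas as pd


def attach_sentiments(df, sentiments):
    """Join sentiment scores onto df by row index and report unscored rows.

    Raises ValueError if the two indexes differ, instead of silently
    matching scores to the wrong tweets.
    """
    if not df.index.equals(sentiments.index):
        raise ValueError('index mismatch: scores do not line up with tweets')
    merged = df.join(sentiments)
    missing = int(merged['sentiment'].isna().sum())
    print('{} of {} tweets could not be scored'.format(missing, len(merged)))
    return merged
```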

In [20]:
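# 'Unnamed: 0' is pandas' name for the input CSV's header-less first column (typically a saved index)
tweet_sentiments.drop(columns=['Unnamed: 0'], inplace=True)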
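Assuming the `Unnamed: 0` column is an index saved by an earlier `to_csv` call without `index=False`, an alternative to dropping it afterwards is to tell `read_csv` to use it as the index when loading. The small in-memory CSV below is made up purely to show the difference.

```python
import io

import pandas as pd

# Made-up CSV whose first column has an empty header, as a saved index would
csv_text = ',text\n0,hello\n1,world\n'

# Default read: the empty header becomes a column named 'Unnamed: 0'
default_cols = pd.read_csv(io.StringIO(csv_text)).columns.tolist()
print(default_cols)  # ['Unnamed: 0', 'text']

# index_col=0 turns that first column back into the index instead
indexed_cols = pd.read_csv(io.StringIO(csv_text), index_col=0).columns.tolist()
print(indexed_cols)  # ['text']
```

In this notebook that would mean `pd.read_csv(path, index_col=0)`, which keeps the saved values as the row index rather than discarding them.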

In [21]:
tweet_sentiments

| | coordinates | created_at | hashtags | media | urls | favorite_count | id | in_reply_to_screen_name | in_reply_to_status_id | in_reply_to_user_id | ... | user_name | user_screen_name.1 | user_statuses_count | user_time_zone | user_urls | user_verified | is_state | state | sentiment | magnitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | Thu Apr 02 00:02:45 +0000 2020 | | | https://hann.it/3dHZBgK | 0 | 1245501890529562624 | | | | ... | ROR | RORoldan1965 | 9507.0 | | | False | True | GA | 0.0 | 0.0 |
| 1 | | Thu Apr 02 00:02:45 +0000 2020 | | | https://nyti.ms/3bAaGyu | 0 | 1245501889447366657 | | | | ... | Cecileb | treetop224u | 7943.0 | | | False | True | TX | -0.4 | 1.7 |
| 2 | | Thu Apr 02 00:02:45 +0000 2020 | | | | 0 | 1245501889514536963 | ktdeapp | 1.245500e+18 | 2.530797e+08 | ... | AC | LADYLUVSDABEARS | 249068.0 | | | False | True | MS | 0.3 | 1.7 |
| 3 | | Thu Apr 02 00:02:45 +0000 2020 | | | | 0 | 1245501889736781825 | | | | ... | Barbara Riddle | barbara_riddle | 10750.0 | | | False | True | IN | 0.0 | 0.0 |
| 4 | | Thu Apr 02 00:02:45 +0000 2020 | | | https://pluralist.com/jim-banks-chinese-journa... | 0 | 1245501890030231554 | | | | ... | Jamie pearson | momuar42 | 64243.0 | | | False | True | MO | 0.0 | 0.8 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 59337 | | Thu Apr 02 23:43:57 +0000 2020 | PEPFAR SouthAfrica HIV ART COVID19 | | | 1 | 1245859547131985925 | DrZweliMkhize | 1.245704e+18 | 8.444867e+17 | ... | PEPFAR | PEPFAR | 16250.0 | | http://www.state.gov/pepfar/ | True | True | WA | 0.3 | 0.7 |
| 59338 | | Thu Apr 02 23:43:56 +0000 2020 | | | | 0 | 1245859543575220226 | | | | ... | MejiasJPR | JaimeMejias9 | 1446.0 | | | False | True | FL | 0.1 | 0.3 |
| 59339 | | Thu Apr 02 23:29:02 +0000 2020 | coronavirus police confinement | | | 0 | 1245855793280212993 | | | | ... | Globule Rouge | Globule68 | 3764.0 | | | False | True | NY | -0.2 | 0.5 |
| 59340 | | Thu Apr 02 23:27:55 +0000 2020 | coronavirus police confinement | | | 0 | 1245855512823877633 | | | | ... | orpheum | UnePeteSec | 60149.0 | | http://dundermifflin.com | False | True | NY | -0.2 | 0.5 |


In [22]:
# The month is hard-coded as 04; only the day comes from the `day` variable
tweet_sentiments.to_csv('covid_sentiment_daily_04_{}.csv'.format(day), index=False)
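With the `state`, `sentiment` and `magnitude` columns saved together, a natural next step is a per-state summary for the day. `summarize_by_state` below is a hypothetical helper sketching that with pandas named aggregation; it assumes the column names written by this notebook and relies on `mean()` skipping the `NaN` rows left by tweets that could not be scored.

```python
import pandas as pd


def summarize_by_state(tweets):
    """Per-state count of scored tweets plus mean sentiment and magnitude.

    'count' ignores NaN, so `scored` excludes tweets the API could not score.
    """
    return (tweets.groupby('state')
                  .agg(scored=('sentiment', 'count'),
                       mean_sentiment=('sentiment', 'mean'),
                       mean_magnitude=('magnitude', 'mean'))
                  .sort_values('scored', ascending=False))
```

For example, `summarize_by_state(pd.read_csv('covid_sentiment_daily_04_{}.csv'.format(day)))` would summarize the file written above.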