# TUD IS sentiment analysis tool
This notebook performs sentiment analysis of tweets that match given keywords, using the Google Natural Language API.
## Preparations

Install the Python requirements.

In [23]:
!pip install searchtweets-v2 google-cloud-language==2.2.2





In [24]:
import os
import time
import pandas as pd
import matplotlib.pyplot as plt
from google.cloud import language
from google.auth import load_credentials_from_file
from searchtweets import ResultStream, gen_request_parameters, load_credentials


### Preparations Google NLP-API
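To authenticate against the Google NLP API, a service-account credentials file is required. The cell below loads it explicitly from `./credentials/tud-is-sentiment.json`; alternatively, the environment variable `GOOGLE_APPLICATION_CREDENTIALS` can point to the credentials file in the executing environment.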
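The credentials file can be located either through the environment variable or through an explicit path. As a minimal sketch, the hypothetical helper `resolve_credentials_path` (not part of this notebook) prefers `GOOGLE_APPLICATION_CREDENTIALS` when it is set and otherwise falls back to the path used in the cell below. It only resolves a path and does not contact the API.

```python
import os

# Path used by the client cell in this notebook
DEFAULT_CREDENTIALS = './credentials/tud-is-sentiment.json'


def resolve_credentials_path(environ=os.environ, default=DEFAULT_CREDENTIALS):
    """Hypothetical helper: prefer the environment variable, else the default path."""
    return environ.get('GOOGLE_APPLICATION_CREDENTIALS') or default


# With an empty environment the notebook's own path is returned
print(resolve_credentials_path(environ={}))
```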

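The `analyze_text` function sends a text to the API and returns either document-level or entity-level sentiment, depending on `scope`. Note that the version below does not cache results, so every call costs an API request and network time.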

In [25]:
# Instantiates a client
nlp_client = language.LanguageServiceClient(credentials=load_credentials_from_file('./credentials/tud-is-sentiment.json')[0])


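def analyze_text(text, scope='document'):
    """Return the sentiment analysis of `text`.

    scope='entity' requests entity-level sentiment; any other value
    requests document-level sentiment.
    """
    document = language.Document(
        content=text, type_=language.Document.Type.PLAIN_TEXT)

    # Pick the API method that matches the requested scope
    f = nlp_client.analyze_entity_sentiment if scope == 'entity' else nlp_client.analyze_sentiment

    analysis = f(request={'document': document})

    return analysis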
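As written, the function sends a request on every call. A minimal sketch of how an in-memory cache could be layered on top, assuming `functools.lru_cache` from the standard library is acceptable for this purpose. The names `with_cache` and `fake_analyze` are hypothetical, and `fake_analyze` merely stands in for `analyze_text` so that the sketch runs without API credentials.

```python
from functools import lru_cache


def with_cache(analyze, maxsize=1024):
    """Wrap `analyze(text, scope)` so identical requests are answered from memory."""
    @lru_cache(maxsize=maxsize)
    def cached(text, scope='document'):
        return analyze(text, scope)
    return cached


# Stand-in for analyze_text; records every call it receives
calls = []


def fake_analyze(text, scope='document'):
    calls.append((text, scope))
    return {'text': text, 'scope': scope}


cached_analyze = with_cache(fake_analyze)
cached_analyze('good morning')
cached_analyze('good morning')  # served from the cache, no second call
print(len(calls))  # prints 1
```

In the notebook, `with_cache(analyze_text)` would play the same role. Such a cache lives only as long as the kernel session and is not written to disk.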


### Preparations for Twitter-API
Tweets are fetched from Twitter using the search-tweets library. The `fetch_tweets` function defined below returns the tweets for a query from a disk cache if one exists; otherwise it queries the API and writes the result to the cache.

[https://github.com/twitterdev/search-tweets-python/tree/v2](https://github.com/twitterdev/search-tweets-python/tree/v2)

In [26]:
twitter_credentials_filename = "./credentials/twitter-academic.yml"

search_args_all = load_credentials(
    filename=twitter_credentials_filename,
    yaml_key="search_all_tweets_v2"
)

search_args_recent = load_credentials(
    filename=twitter_credentials_filename,
    yaml_key="search_recent_tweets_v2"
)

cache_directory = 'cache'


def fetch_tweets(search_term, credentials=search_args_all, max_results=100):
    query = "{} -is:retweet".format(search_term)

    cache_path = os.path.join('.', cache_directory, '{}.csv'.format(
        ''.join(l for l in query if l not in [' ', ':'])
    ))

    os.makedirs(os.path.dirname(cache_path), exist_ok=True)

    try:
        df = pd.read_csv(cache_path)

    except FileNotFoundError:
        tweet_fields = [
            'id',
            'created_at',
            'text',
            'lang',
            # 'entities',
            'geo',
            # 'public_metrics',
            'source'
        ]

        rs = ResultStream(
            request_parameters=gen_request_parameters(
                query,
                None,
                results_per_call=100,
                tweet_fields=','.join(tweet_fields)
            ),
            max_results=max_results,
            **credentials
        )

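        # Flatten the pages into one list of tweets; a page without
        # results has no 'data' key, so default to an empty list
        df = pd.DataFrame(
            data=[tweet for page in rs.stream() for tweet in page.get('data', [])]
        )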

        df.to_csv(cache_path, index=False)

    df = df.convert_dtypes()
    df['created_at'] = pd.to_datetime(df['created_at'])

    return df
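The cache filename above is derived from the query by dropping spaces and colons. Queries that contain other characters, such as `/` or `"`, could still produce awkward or invalid paths. A minimal sketch of a stricter variant, assuming that keeping only letters, digits, `-` and `_` is acceptable; the helper name `cache_path_for` is hypothetical.

```python
import os
import re


def cache_path_for(query, cache_directory='cache'):
    """Hypothetical variant: keep letters, digits, '-' and '_' and drop the rest."""
    safe_name = re.sub(r'[^A-Za-z0-9_-]', '', query)
    return os.path.join('.', cache_directory, '{}.csv'.format(safe_name))


print(cache_path_for('artificial intelligence -is:retweet'))
```

For the query used in this notebook the filename is the same as the one produced by `fetch_tweets`. In both versions, distinct queries can map to the same filename (for example `a b` and `ab`), so this is a sketch and not a collision-free scheme.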


In [27]:
df = fetch_tweets('artificial intelligence')
df.head()

|   | text | lang | id | created_at | source | geo |
|---|---|---|---|---|---|---|
| 0 | Scientists Discover the Key to Artistic Succes... | en | 1437807953231368200 | 2021-09-14 15:58:23+00:00 | IFTTT | |
| 1 | Scientists identify key conditions to set up a... | en | 1437807867889868801 | 2021-09-14 15:58:03+00:00 | Twitter for iPhone | |
| 2 | Join Pega's Head of Voice AI, Sabrina Atienza,... | en | 1437807828220059649 | 2021-09-14 15:57:54+00:00 | Dynamic Signal | |
| 3 | #AI caramba, those #neuralnetworks are power-h... | en | 1437807774398746627 | 2021-09-14 15:57:41+00:00 | Twitter Web App | |
| 4 | 50% of recruiting programs reject anyone with ... | en | 1437807530793635843 | 2021-09-14 15:56:43+00:00 | Hootsuite Inc. | |


In [28]:
sentiment_df = df.groupby('lang').resample('H', on='created_at').agg({'text': '. '.join})
ds = [analyze_text(row).document_sentiment for row in sentiment_df.text]
scores, magnitudes = zip(*[(s.score, s.magnitude) for s in ds])

sentiment_df['score'] = scores
sentiment_df['magnitude'] = magnitudes

sentiment_df.head()

| lang | created_at | text | score | magnitude |
|---|---|---|---|---|
| ar | 2021-09-14 14:00:00+00:00 | @Latefoic كل ما له علاقة بال tech وال artifici... | 0.0 | 0.0 |
| da | 2021-09-14 14:00:00+00:00 | #RaviVisvesvarayaSharadaPrasad https://t.co/T... | 0.1 | 0.3 |
| de | 2021-09-14 14:00:00+00:00 | Wie viele Artikel und Erwägungsgründe umfassen... | 0.2 | 0.7 |
| de | 2021-09-14 15:00:00+00:00 | Wer in #Datenpolitik mitreden will, sollte #DS... | 0.4 | 1.6 |
| en | 2021-09-14 13:00:00+00:00 | https://t.co/rIP892urnF Play #FPL? Check this... | 0.2 | 1.7 |
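The aggregation above joins all tweets of one language that fall into the same hour into a single text before it is sent to the API. The plain-Python sketch below mirrors that grouping on made-up sample tweets; the data and the helper name `bucket_by_lang_and_hour` are illustrative only. Unlike `resample`, which may also emit rows for empty hours between the first and last tweet of a group, this sketch only returns hours that contain tweets.

```python
from collections import defaultdict
from datetime import datetime


def bucket_by_lang_and_hour(tweets):
    """Join tweet texts per (language, hour) bucket with '. ' as separator."""
    buckets = defaultdict(list)
    for tweet in tweets:
        # Truncate the timestamp to the start of its hour
        hour = tweet['created_at'].replace(minute=0, second=0, microsecond=0)
        buckets[(tweet['lang'], hour)].append(tweet['text'])
    return {key: '. '.join(texts) for key, texts in sorted(buckets.items())}


# Made-up sample data, not taken from the API
sample = [
    {'lang': 'en', 'created_at': datetime(2021, 9, 14, 15, 58), 'text': 'first'},
    {'lang': 'en', 'created_at': datetime(2021, 9, 14, 15, 56), 'text': 'second'},
    {'lang': 'de', 'created_at': datetime(2021, 9, 14, 14, 5), 'text': 'dritte'},
]

for (lang, hour), text in bucket_by_lang_and_hour(sample).items():
    print(lang, hour.isoformat(), text)
```

Each joined text corresponds to one API request, so the number of buckets is also the number of calls made for a query.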
