### Tweets processing and sentiment analysis
---

In this notebook we load the tweets we previously collected using the ```Twitter streamer.py```. Along the way, we will flatten the Twitter JSON, select the text objects among the several options (main tweet, re-tweet, quote, etc.), clean them (remove non-alphabetic characters), translate non-English tweets, compute the sentiment of the text, and assign each tweet a location taken from either the user-defined location or the automatic geotag.

__Note: An accompanying ```Tweets processing and sentiment.py``` file contains all the code in this notebook and is meant to be run from the terminal.__

---

We start by loading the tweet object from the ```.json``` files in the ```Twitter/Tweets/``` directory.

In [1]:
import glob
import json

# list all files containing tweets
files = list(glob.iglob('Twitter/Tweets/*.json'))

tweets_data = []
for file in files:

    # The context manager closes the file even if a line fails to parse
    with open(file, "r", encoding='utf-8') as tweets_file:

        # Read in tweets and store in list: tweets_data
        for line in tweets_file:
            tweet = json.loads(line)
            tweets_data.append(tweet)

In [2]:
print('There are', len(tweets_data), 'tweets in the dataset.') 

There are 52830 tweets in the dataset.
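
If the collected files happen to contain blank or truncated lines (for instance after an interrupted stream), ```json.loads``` raises an error and the loop stops. The following is a minimal sketch of a more tolerant reader; ```read_json_lines``` is a hypothetical helper that is not part of the original notebook, and the sample lines are made up for illustration.

```python
import json

def read_json_lines(lines):
    """Parse an iterable of JSON-lines strings, skipping unusable lines.

    Returns (records, n_skipped). Hypothetical helper for illustration.
    """
    records, n_skipped = [], 0
    for line in lines:
        line = line.strip()
        if not line:
            # Blank lines carry no tweet
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            # Truncated or corrupted line: count it and move on
            n_skipped += 1
    return records, n_skipped


# Made-up sample: one valid line, one blank line, one truncated line
sample = ['{"id": 1, "text": "hello"}', '', '{"id": 2, "text": "tru']
records, n_skipped = read_json_lines(sample)
```

Since an open file is itself an iterable of lines, the same function could be called as ```read_json_lines(tweets_file)``` inside the loop above.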


## Processing JSON
---

There are multiple fields in the [Twitter JSON](https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object) that contain textual data. A typical tweet has the tweet text, the user description, and the user location. A tweet longer than 140 characters also has the extended tweet child JSON. And a quoted tweet has both the original tweet text and the commentary added by the quoting user. The next image shows a portion of the Twitter JSON contents:

![](Images/TwitterMap.jpg)

To analyze tweets at scale, we will want to __flatten__ the tweet JSON into a single level. This will allow us to store the tweets in a DataFrame format. To do this, we will define the function ```flatten_tweets()``` which will take several fields regarding text and location (stored in ```place```).

In [3]:
def flatten_tweets(tweets):
    """ Flattens out tweet dictionaries so relevant JSON is 
        in a top-level dictionary. """
    
    tweets_list = []
    
    # Iterate through each tweet
    for tweet_obj in tweets:
    
        ''' User info'''
        # Store the user screen name in 'user-screen_name'
        tweet_obj['user-screen_name'] = tweet_obj['user']['screen_name']
        
        # Store the user location
        tweet_obj['user-location'] = tweet_obj['user']['location']
    
        ''' Text info'''
        # Check if this is a 140+ character tweet
        if 'extended_tweet' in tweet_obj:
            # Store the extended tweet text in 'extended_tweet-full_text'
            tweet_obj['extended_tweet-full_text'] = \
                                    tweet_obj['extended_tweet']['full_text']
    
        if 'retweeted_status' in tweet_obj:
            # Store the retweet user screen name in 
            # 'retweeted_status-user-screen_name'
            tweet_obj['retweeted_status-user-screen_name'] = \
                        tweet_obj['retweeted_status']['user']['screen_name']

            # Store the retweet text in 'retweeted_status-text'
            tweet_obj['retweeted_status-text'] = \
                                        tweet_obj['retweeted_status']['text']
    
            if 'extended_tweet' in tweet_obj['retweeted_status']:
                # Store the extended retweet text in 
                #'retweeted_status-extended_tweet-full_text'
                tweet_obj['retweeted_status-extended_tweet-full_text'] = \
                tweet_obj['retweeted_status']['extended_tweet']['full_text']
                
        if 'quoted_status' in tweet_obj:
            # Store the quoted tweet user screen name in 
            # 'quoted_status-user-screen_name'
            tweet_obj['quoted_status-user-screen_name'] = \
                            tweet_obj['quoted_status']['user']['screen_name']

            # Store the quoted tweet text in 'quoted_status-text'
            tweet_obj['quoted_status-text'] = \
                                            tweet_obj['quoted_status']['text']
    
            if 'extended_tweet' in tweet_obj['quoted_status']:
                # Store the extended quoted tweet text in 
                # 'quoted_status-extended_tweet-full_text'
                tweet_obj['quoted_status-extended_tweet-full_text'] = \
                    tweet_obj['quoted_status']['extended_tweet']['full_text']
        
        ''' Place info'''
        if 'place' in tweet_obj:
            # Store the country name, the country code and the bounding-box
            # coordinates; 'place' is None when the tweet is not geo-tagged
            try:
                tweet_obj['place-country'] = \
                                            tweet_obj['place']['country']
                
                tweet_obj['place-country_code'] = \
                                            tweet_obj['place']['country_code']
                
                tweet_obj['location-coordinates'] = \
                            tweet_obj['place']['bounding_box']['coordinates']
            except (TypeError, KeyError):
                # 'place' (or its bounding box) is None, or a key is missing
                pass
        
        tweets_list.append(tweet_obj)
        
    return tweets_list
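
The flattening idea can also be illustrated with a small helper that walks a path of keys and returns ```None``` when any step is missing. This is only a sketch: ```get_nested``` and ```toy_tweet``` are hypothetical names, and unlike ```flatten_tweets()``` it always creates the flattened key (with ```None``` when absent) instead of creating it only when the field exists.

```python
def get_nested(obj, *keys):
    """Follow `keys` through nested dicts; return None if any step is missing."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


# Toy tweet with only the fields used here; real tweets carry many more
toy_tweet = {
    'text': 'short text',
    'user': {'screen_name': 'someone', 'location': None},
    'place': None,
    'retweeted_status': {'text': 'original', 'user': {'screen_name': 'author'}},
}

flat = {
    'user-screen_name': get_nested(toy_tweet, 'user', 'screen_name'),
    'retweeted_status-text': get_nested(toy_tweet, 'retweeted_status', 'text'),
    # 'place' is None here, so the lookup yields None instead of raising
    'place-country_code': get_nested(toy_tweet, 'place', 'country_code'),
}
```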

In the context of this project though, we are interested in just one text field. Therefore, we now define a function ```select_text(tweets)``` that selects the main text whether the tweet is a principal tweet or a re-tweet, and we decide to drop the quoted text as it usually is repetitive and may not be informative.

In [4]:
def select_text(tweets):
    ''' Assigns the main text to only one column depending
        on whether the tweet is a retweet, an extended tweet or neither'''
    
    tweets_list = []
    
    # Iterate through each tweet
    for tweet_obj in tweets:
        
        if 'retweeted_status-extended_tweet-full_text' in tweet_obj:
            tweet_obj['text'] = \
                        tweet_obj['retweeted_status-extended_tweet-full_text']
        
        elif 'retweeted_status-text' in tweet_obj:
            tweet_obj['text'] = tweet_obj['retweeted_status-text']
            
        elif 'extended_tweet-full_text' in tweet_obj:
            tweet_obj['text'] = tweet_obj['extended_tweet-full_text']
                
        tweets_list.append(tweet_obj)
        
    return tweets_list
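
The order of the checks is what matters here: the full retweeted text wins over the truncated retweeted text, which wins over the tweet's own extended text, and the plain ```text``` is the fallback. A compact way to express the same priority is sketched below; ```TEXT_PRIORITY``` and ```pick_text``` are hypothetical names and the toy dictionaries are made up.

```python
# Flattened keys in the order select_text() checks them
TEXT_PRIORITY = [
    'retweeted_status-extended_tweet-full_text',
    'retweeted_status-text',
    'extended_tweet-full_text',
    'text',
]

def pick_text(flat_tweet):
    """Return the first available text field following TEXT_PRIORITY."""
    for key in TEXT_PRIORITY:
        if key in flat_tweet:
            return flat_tweet[key]
    return None


# Made-up flattened tweets covering three of the cases
retweet = {'text': 'RT @author: original twe',
           'retweeted_status-text': 'original tweet'}
long_tweet = {'text': 'a long twe',
              'extended_tweet-full_text': 'a long tweet in full'}
plain = {'text': 'a plain tweet'}

selected = [pick_text(t) for t in (retweet, long_tweet, plain)]
```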

We now build the data frame. Notice that we choose the columns (fields) relevant for our analysis. This includes the language of the tweet, ```lang```.

We also keep ```user-location```, which is set manually by the user, and the ```country```, ```country_code``` and ```coordinates``` fields from ```place```. These fields are only present when the tweet is geo-tagged, which is usually the case for fewer than 10% of tweets.

In [5]:
import pandas as pd

# flatten tweets
tweets = flatten_tweets(tweets_data)
columns_all_text = ['text', 'extended_tweet-full_text', 'retweeted_status-text', 
           'retweeted_status-extended_tweet-full_text', 'quoted_status-text', 
           'quoted_status-extended_tweet-full_text', 'lang', 'user-location', 
           'place-country_code']

# select text
tweets = select_text(tweets)
columns = ['text', 'lang', 'user-location', 'place-country', 
           'place-country_code', 'location-coordinates', 'user-screen_name']

# Create a DataFrame from `tweets`
df_tweets = pd.DataFrame(tweets, columns=columns)
# replaces NaNs by Nones
df_tweets.where(pd.notnull(df_tweets), None, inplace=True)
#
df_tweets.head()

Unnamed: 0,text,lang,user-location,place-country,place-country_code,location-coordinates,user-screen_name
0,PLEASE IF Y'ALL COULD SHARE I'D REALLY APPRECI...,en,,,,,Fa37im
1,"FIFA 20 TOTW 27 Prediction – De Bruyne, Lewand...",nl,USA,,,,tarun_patna
2,FIFA 21 стала самой дорогой игрой в PSN. Она с...,ru,Moscow,,,,gguru_ru
3,➸ New Montage #FIFA20\n➸ Position : R\LB\n➸ ¦ ...,en,,,,,Alsn_29
4,سحب على FIFA21 او قيمتها 60$ 🔥\nالشروط بسيطه:\...,ar,,,,,fg_2w


In [6]:
df_tweets.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52830 entries, 0 to 52829
Data columns (total 7 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   text                  52830 non-null  object
 1   lang                  52830 non-null  object
 2   user-location         29888 non-null  object
 3   place-country         485 non-null    object
 4   place-country_code    485 non-null    object
 5   location-coordinates  485 non-null    object
 6   user-screen_name      52830 non-null  object
dtypes: object(7)
memory usage: 2.8+ MB


__For quick checks, we take just a sample of the first 50 tweets.__

In [7]:
df_tweets_sample = df_tweets[:50].copy()

## Languages
---

In this part of the process we replace the language codes in ```lang``` with the actual language names. We do this with the auxiliary ```Countries/languages.json``` dataset.

In [8]:
with open('Countries/languages.json', 'r', encoding='utf-8') as json_file:
    languages_dict = json.load(json_file)

{k: languages_dict[k] for k in list(languages_dict)[:5]}

{'aa': {'name': 'Afar', 'native': 'Afar'},
 'ab': {'name': 'Abkhazian', 'native': 'Аҧсуа'},
 'af': {'name': 'Afrikaans', 'native': 'Afrikaans'},
 'ak': {'name': 'Akan', 'native': 'Akana'},
 'am': {'name': 'Amharic', 'native': 'አማርኛ'}}

In [9]:
names = []
for idx, row in df_tweets_sample.iterrows():
    lang = row['lang']
    if lang == 'und':
        names.append(None)
    elif lang == 'in':
        name = languages_dict['id']['name']
        names.append(name)
    elif lang == 'iw':
        name = languages_dict['he']['name']
        names.append(name)
    else:
        name = languages_dict[lang]['name']
        names.append(name)

df_tweets_sample['language'] = names
df_tweets_sample.drop(['lang'], axis=1, inplace=True)
#
df_tweets_sample.head()

Unnamed: 0,text,user-location,place-country,place-country_code,location-coordinates,user-screen_name,language
0,PLEASE IF Y'ALL COULD SHARE I'D REALLY APPRECI...,,,,,Fa37im,English
1,"FIFA 20 TOTW 27 Prediction – De Bruyne, Lewand...",USA,,,,tarun_patna,Dutch
2,FIFA 21 стала самой дорогой игрой в PSN. Она с...,Moscow,,,,gguru_ru,Russian
3,➸ New Montage #FIFA20\n➸ Position : R\LB\n➸ ¦ ...,,,,,Alsn_29,English
4,سحب على FIFA21 او قيمتها 60$ 🔥\nالشروط بسيطه:\...,,,,,fg_2w,Arabic
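
The special cases in the loop exist because ```in``` and ```iw``` are legacy codes for Indonesian and Hebrew (now ```id``` and ```he```), and ```und``` marks an undetermined language. The same logic can be sketched as a small lookup function; ```LEGACY_CODES``` and ```language_name``` are hypothetical names, the tiny ```languages``` dictionary only mimics the structure of ```languages_dict```, and, unlike the loop above, an unknown code yields ```None``` instead of a ```KeyError```.

```python
# Legacy codes mapped to their current ISO 639-1 equivalents
LEGACY_CODES = {'in': 'id', 'iw': 'he'}

def language_name(code, languages):
    """Return the language name for `code`, or None when it is unknown."""
    if code is None or code == 'und':
        return None
    code = LEGACY_CODES.get(code, code)
    entry = languages.get(code)
    return entry['name'] if entry is not None else None


# Tiny stand-in for languages_dict, with the same structure
languages = {
    'en': {'name': 'English', 'native': 'English'},
    'id': {'name': 'Indonesian', 'native': 'Bahasa Indonesia'},
    'he': {'name': 'Hebrew', 'native': 'עברית'},
}
looked_up = [language_name(c, languages) for c in ['en', 'in', 'iw', 'und', 'xx']]
```

With the real data this could be applied as ```df_tweets_sample['lang'].apply(language_name, languages=languages_dict)```.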


## Locations
---

Now we move to process the locations. We will first treat ```place``` fields and then ```user-location```.

### place-

The data in the ```place``` object is, obviously, more reliable than the ```user-location```. Therefore, although it covers only about 0.9% of our tweets (485 of 52830), we will take care of it. First, the country code in ```place-country_code``` comes in ISO 2 form, so we translate it to ISO 3 form with [country converter](https://github.com/konstantinstadler/country_converter). Then, we do the same to change the ```place-country``` names to the standard short names.

In [10]:
import country_converter as coco

# change codes to iso3 
to_iso3_func = lambda x: coco.convert(names=x, to='iso3', not_found=None) \
                            if x is not None else x

df_tweets_sample['place-country_code'] = \
                   df_tweets_sample['place-country_code'].apply(to_iso3_func)

# change name to standard name
to_std_func = lambda x: coco.convert(names=x, to='name_short', not_found=None) \
                            if x is not None else x

df_tweets_sample['place-country'] = \
                        df_tweets_sample['place-country'].apply(to_std_func)

### user-location

Here we take the manually set ```user-location``` values and translate them to country names and codes, which requires some trust in what the user typed. We do this using the [GeoPy](https://geopy.readthedocs.io/en/latest/#) library and, again, ```country_converter``` to find the country codes in ISO 3 form.

__A word of caution__: GeoPy connects to an API and, unfortunately, each call takes almost a second; the function below makes two calls per location (a forward and a reverse lookup). This makes processing ~ 50 K tweets rather slow.

In [11]:
from geopy.geocoders import Nominatim
from tqdm import tqdm

tqdm.pandas()

def geo_locator(user_location):
    
    # initialize geolocator
    geolocator = Nominatim(user_agent='Tweet_locator')

    if user_location is not None:
        try :
            # get location
            location = geolocator.geocode(user_location, language='en')
            # reverse-geocode the coordinates to get the full address
            location_exact = geolocator.reverse(
                        [location.latitude, location.longitude], language='en')
            # get country code
            c_code = location_exact.raw['address']['country_code']

            return c_code

        except Exception:
            # unresolved locations (geocode returns None) and API errors
            return None

    else : 
        return None

# apply geo locator to user-location
loc = df_tweets_sample['user-location'].progress_apply(geo_locator)
df_tweets_sample['user-country_code'] = loc

# change codes to iso3 
df_tweets_sample['user-country_code'] = \
                    df_tweets_sample['user-country_code'].apply(to_iso3_func)

# create user-country column
df_tweets_sample['user-country'] = \
                    df_tweets_sample['user-country_code'].apply(to_std_func)

# drop old column
df_tweets_sample.drop(['user-location'], axis=1, inplace=True)

#
df_tweets_sample.head()

100%|██████████| 50/50 [00:20<00:00,  2.46it/s]


Unnamed: 0,text,place-country,place-country_code,location-coordinates,user-screen_name,language,user-country_code,user-country
0,PLEASE IF Y'ALL COULD SHARE I'D REALLY APPRECI...,,,,Fa37im,English,,
1,"FIFA 20 TOTW 27 Prediction – De Bruyne, Lewand...",,,,tarun_patna,Dutch,USA,United States
2,FIFA 21 стала самой дорогой игрой в PSN. Она с...,,,,gguru_ru,Russian,RUS,Russia
3,➸ New Montage #FIFA20\n➸ Position : R\LB\n➸ ¦ ...,,,,Alsn_29,English,,
4,سحب على FIFA21 او قيمتها 60$ 🔥\nالشروط بسيطه:\...,,,,fg_2w,Arabic,,
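
Because many users share the same location string, one way to reduce the number of API calls is to resolve each distinct string only once. The sketch below wraps a lookup function with ```functools.lru_cache```; ```make_cached_locator``` is a hypothetical helper and ```fake_lookup``` is a stand-in for ```geo_locator``` so that the example runs offline. Note that failed lookups (```None```) are cached as well, so a transient API error would not be retried.

```python
from functools import lru_cache

def make_cached_locator(lookup, maxsize=None):
    """Wrap `lookup` so each distinct location string is resolved only once."""
    @lru_cache(maxsize=maxsize)
    def cached(user_location):
        if user_location is None:
            return None
        return lookup(user_location)
    return cached


# Stand-in for geo_locator that records how often it is actually called
calls = []
def fake_lookup(user_location):
    calls.append(user_location)
    return {'Moscow': 'ru', 'USA': 'us'}.get(user_location)

locate = make_cached_locator(fake_lookup)
country_codes = [locate(loc) for loc in ['USA', None, 'Moscow', 'USA', 'USA']]
```

With the real function this would read ```df_tweets_sample['user-location'].progress_apply(make_cached_locator(geo_locator))```.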


Finally, we merge the ```place-country``` and ```user-country``` columns into one, keeping the former when it exists and the latter otherwise. We do the same for the code columns.

In [12]:
countries, codes = [], []
for idx, row in df_tweets_sample.iterrows():
    if row['place-country_code'] is None:
        country = row['user-country']
        code = row['user-country_code']
        countries.append(country)
        codes.append(code)
    else :
        countries.append(row['place-country'])
        codes.append(row['place-country_code'])

df_tweets_sample['location'] = countries
df_tweets_sample['location_code'] = codes

# drop old columns
df_tweets_sample.drop(columns=['place-country', 'place-country_code', 
                               'user-country', 'user-country_code'], 
                      inplace=True)
#
df_tweets_sample.head()

Unnamed: 0,text,location-coordinates,user-screen_name,language,location,location_code
0,PLEASE IF Y'ALL COULD SHARE I'D REALLY APPRECI...,,Fa37im,English,,
1,"FIFA 20 TOTW 27 Prediction – De Bruyne, Lewand...",,tarun_patna,Dutch,United States,USA
2,FIFA 21 стала самой дорогой игрой в PSN. Она с...,,gguru_ru,Russian,Russia,RUS
3,➸ New Montage #FIFA20\n➸ Position : R\LB\n➸ ¦ ...,,Alsn_29,English,,
4,سحب على FIFA21 او قيمتها 60$ 🔥\nالشروط بسيطه:\...,,fg_2w,Arabic,,
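
The "keep the first when it exists, otherwise the second" rule can also be written without an explicit loop using pandas' ```Series.combine_first```. This is a sketch on a made-up toy frame; it differs slightly from the loop above in that each pair of columns is merged independently, rather than deciding both from ```place-country_code``` alone.

```python
import pandas as pd

# Toy frame with the four columns used in the loop above
df = pd.DataFrame({
    'place-country': ['Spain', None, None],
    'place-country_code': ['ESP', None, None],
    'user-country': ['France', 'Russia', None],
    'user-country_code': ['FRA', 'RUS', None],
})

# Keep the place value where present, otherwise fall back to the user value
df['location'] = df['place-country'].combine_first(df['user-country'])
df['location_code'] = df['place-country_code'].combine_first(
    df['user-country_code'])
```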


## Text-cleaning
---

It is now time to process the tweets' text. This involves removing non-alphabetic characters and translating non-English tweets. We will, however, keep both versions and actually use the text with emojis and other characters, as our sentiment analyzer can handle them.

To remove non-alphabetic characters, we use [spaCy](https://spacy.io) as it is quite straightforward and we do not need to specify a regular expression.

In [13]:
import spacy

nlp = spacy.load('en_core_web_sm')

def cleaner(string):
    
    # Generate list of tokens
    doc = nlp(string)
    lemmas = [token.lemma_ for token in doc]
    # Remove tokens that are not alphabetic 
    a_lemmas = [lemma for lemma in lemmas 
                                    if lemma.isalpha() or lemma == '-PRON-'] 
    # Return string after text cleaning
    return ' '.join(a_lemmas)

df_tweets_sample['text_cleaned'] = \
                            df_tweets_sample['text'].progress_apply(cleaner)
#
df_tweets_sample.head()

100%|██████████| 50/50 [00:00<00:00, 94.76it/s]


Unnamed: 0,text,location-coordinates,user-screen_name,language,location,location_code,text_cleaned
0,PLEASE IF Y'ALL COULD SHARE I'D REALLY APPRECI...,,Fa37im,English,,,PLEASE if COULD share really APPRECIATE it pla...
1,"FIFA 20 TOTW 27 Prediction – De Bruyne, Lewand...",,tarun_patna,Dutch,United States,USA,FIFA TOTW prediction De Bruyne Lewandowski amp...
2,FIFA 21 стала самой дорогой игрой в PSN. Она с...,,gguru_ru,Russian,Russia,RUS,FIFA стала самой дорогой игрой в PSN Она стоит...
3,➸ New Montage #FIFA20\n➸ Position : R\LB\n➸ ¦ ...,,Alsn_29,English,,,New Montage position designer ME Enjoy to watch
4,سحب على FIFA21 او قيمتها 60$ 🔥\nالشروط بسيطه:\...,,fg_2w,Arabic,,,سحب على او قيمتها الشروط بسيطه تابعني تابع رتو...
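
For comparison, here is a rough standard-library version of the same filtering idea. It is not equivalent to ```cleaner()```: there is no lemmatization, and tokens are simply runs of word characters, so results will differ from spaCy's tokenizer. ```simple_clean``` is a hypothetical name.

```python
import re

def simple_clean(text):
    """Keep only purely alphabetic tokens; no lemmatization is applied."""
    # \w+ matches runs of letters, digits and underscores (Unicode-aware)
    tokens = re.findall(r'\w+', text)
    # Drop anything containing digits or underscores, e.g. 'FIFA20' or '100'
    return ' '.join(tok for tok in tokens if tok.isalpha())


cleaned = simple_clean('New Montage #FIFA20 : 100% fun!')
```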


To translate the non-English tweets, we use [googletrans](https://pypi.org/project/googletrans/), which also connects to an API but is faster.

__Another word of caution:__ There is a poorly documented error discussed, _e.g._, here: https://stackoverflow.com/questions/49497391/googletrans-api-error-expecting-value-line-1-column-1-char-0. To bypass this error, I use ```np.array_split()``` to divide the dataframe into several chunks and process one chunk at a time in a loop. This works fine, but I still save each chunk's translations to a CSV file so that, if something goes wrong in any iteration, I can recompute just that chunk. I also instantiate ```Translator()``` each time.

In [14]:
import numpy as np

# select only not-null tweets not in English
mask1 = df_tweets_sample['text'].notnull()
mask2 = df_tweets_sample['language'] != 'English'
df_masked = df_tweets_sample[(mask1) & (mask2)]

# split dataframe in x equal-size pieces
df_tweets_sample_splitted = np.array_split(df_masked, 150)

def tweet_translation(df, idx):
    
    """ Translate tweets using googletrans """
    
    from googletrans import Translator
    
    translator = Translator()
    
    try:
        # work on a copy so the new column is not written to a slice
        df = df.copy()
        # translate raw tweet
        trans = df['text'].apply(translator.translate, dest='en')
        # create column extracting the translated text
        df['text_english'] = trans.apply(lambda x: x.text)
        # append to the global `translations` list defined below
        translations.append(df)
        # save data in case error happens
        df.to_csv('Twitter/Translations/translation_{}.csv'.format(idx))
   
    except Exception as e:  
        print(e, ' -- at index ', idx)
        
translations = []
for idx, df in enumerate(tqdm(df_tweets_sample_splitted)):
    tqdm._instances.clear()
    tweet_translation(df, idx)

100%|██████████| 150/150 [00:10<00:00, 14.73it/s]
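
The chunking strategy itself is independent of googletrans and can be sketched with plain lists: split the work, process each chunk inside a ```try```, and remember which chunks failed so that only those need to be recomputed. ```split_into_chunks```, ```process_chunks``` and ```fake_translate``` are hypothetical names; the fake translator stands in for the API call and fails on purpose for one input.

```python
def split_into_chunks(items, n_chunks):
    """Split `items` into `n_chunks` consecutive, nearly equal-size lists."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional item
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

def process_chunks(chunks, func):
    """Apply `func` to each chunk; collect results and failed chunk indices."""
    results, failed = {}, []
    for idx, chunk in enumerate(chunks):
        try:
            results[idx] = [func(item) for item in chunk]
        except Exception as e:
            print(e, ' -- at index ', idx)
            failed.append(idx)
    return results, failed


# Stand-in for the translation call: fails on one specific input
def fake_translate(text):
    if text == 'boom':
        raise ValueError('Expecting value: line 1 column 1 (char 0)')
    return text.upper()

chunks = split_into_chunks(['a', 'b', 'boom', 'c', 'd'], 3)
results, failed = process_chunks(chunks, fake_translate)
```

The indices in ```failed``` identify the chunks to retry, which mirrors the reason for saving one CSV file per chunk.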


In [15]:
# concatenate the chunks into a single dataframe
df_translations = pd.concat(translations)
# join it with the old one
df_english = df_tweets_sample.join(df_translations['text_english'])
#
df_english.head()

Unnamed: 0,text,location-coordinates,user-screen_name,language,location,location_code,text_cleaned,text_english
0,PLEASE IF Y'ALL COULD SHARE I'D REALLY APPRECI...,,Fa37im,English,,,PLEASE if COULD share really APPRECIATE it pla...,
1,"FIFA 20 TOTW 27 Prediction – De Bruyne, Lewand...",,tarun_patna,Dutch,United States,USA,FIFA TOTW prediction De Bruyne Lewandowski amp...,"FIFA 20 TOTW 27 Prediction – De Bruyne, Lewand..."
2,FIFA 21 стала самой дорогой игрой в PSN. Она с...,,gguru_ru,Russian,Russia,RUS,FIFA стала самой дорогой игрой в PSN Она стоит...,FIFA 21 has become the most expensive PSN game...
3,➸ New Montage #FIFA20\n➸ Position : R\LB\n➸ ¦ ...,,Alsn_29,English,,,New Montage position designer ME Enjoy to watch,
4,سحب على FIFA21 او قيمتها 60$ 🔥\nالشروط بسيطه:\...,,fg_2w,Arabic,,,سحب على او قيمتها الشروط بسيطه تابعني تابع رتو...,A draw on FIFA21 or $ 60 🔥\nThe conditions are...


We finally fill the empty entries of 'text_english' with the original, unprocessed text of the English tweets.

In [16]:
# replaces NaNs by Nones
df_english.where(pd.notnull(df_english), None, inplace=True)

# add original English tweets to text_english by replacing Nones
texts = []
for idx, row in df_english.iterrows():
    if row['text_english'] is None:
        text = row['text']
        texts.append(text)
    else :
        texts.append(row['text_english'])

df_english['text_english'] = texts

#
df_english.head()

Unnamed: 0,text,location-coordinates,user-screen_name,language,location,location_code,text_cleaned,text_english
0,PLEASE IF Y'ALL COULD SHARE I'D REALLY APPRECI...,,Fa37im,English,,,PLEASE if COULD share really APPRECIATE it pla...,PLEASE IF Y'ALL COULD SHARE I'D REALLY APPRECI...
1,"FIFA 20 TOTW 27 Prediction – De Bruyne, Lewand...",,tarun_patna,Dutch,United States,USA,FIFA TOTW prediction De Bruyne Lewandowski amp...,"FIFA 20 TOTW 27 Prediction – De Bruyne, Lewand..."
2,FIFA 21 стала самой дорогой игрой в PSN. Она с...,,gguru_ru,Russian,Russia,RUS,FIFA стала самой дорогой игрой в PSN Она стоит...,FIFA 21 has become the most expensive PSN game...
3,➸ New Montage #FIFA20\n➸ Position : R\LB\n➸ ¦ ...,,Alsn_29,English,,,New Montage position designer ME Enjoy to watch,➸ New Montage #FIFA20\n➸ Position : R\LB\n➸ ¦ ...
4,سحب على FIFA21 او قيمتها 60$ 🔥\nالشروط بسيطه:\...,,fg_2w,Arabic,,,سحب على او قيمتها الشروط بسيطه تابعني تابع رتو...,A draw on FIFA21 or $ 60 🔥\nThe conditions are...
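
The same fallback can be expressed with pandas' ```Series.fillna```, which accepts another Series and aligns it on the index. This is a sketch on a made-up toy frame. As with the loop above, any non-English tweet whose translation failed would also end up with its original text.

```python
import pandas as pd

# Toy frame: rows 0 and 2 have no translation
df = pd.DataFrame({
    'text': ['hello', 'hola', 'great game'],
    'text_english': [None, 'hello', None],
})

# Where no translation exists, fall back to the original text
df['text_english'] = df['text_english'].fillna(df['text'])
```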


## Sentiment Analysis
---

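We finally compute the sentiment of each tweet. For this, we use the ```SentimentIntensityAnalyzer``` class from [NLTK](https://www.nltk.org)'s ```nltk.sentiment.vader``` module. It relies on the ```vader_lexicon``` resource, which can be downloaded once with ```nltk.download('vader_lexicon')```.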

> _VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media._ [[Ref.]](https://medium.com/analytics-vidhya/simplifying-social-media-sentiment-analysis-using-vader-in-python-f9e6ec6fc52f)

In [17]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer

df_sentiment = df_english.copy()

# instantiate new SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()

sentiment_scores = df_sentiment['text_english'].progress_apply(
                                                            sid.polarity_scores)
sentiment = sentiment_scores.apply(lambda x: x['compound'])
df_sentiment['sentiment'] = sentiment
#
df_sentiment.head()

100%|██████████| 50/50 [00:00<00:00, 2798.14it/s]


Unnamed: 0,text,location-coordinates,user-screen_name,language,location,location_code,text_cleaned,text_english,sentiment
0,PLEASE IF Y'ALL COULD SHARE I'D REALLY APPRECI...,,Fa37im,English,,,PLEASE if COULD share really APPRECIATE it pla...,PLEASE IF Y'ALL COULD SHARE I'D REALLY APPRECI...,0.9125
1,"FIFA 20 TOTW 27 Prediction – De Bruyne, Lewand...",,tarun_patna,Dutch,United States,USA,FIFA TOTW prediction De Bruyne Lewandowski amp...,"FIFA 20 TOTW 27 Prediction – De Bruyne, Lewand...",0.0
2,FIFA 21 стала самой дорогой игрой в PSN. Она с...,,gguru_ru,Russian,Russia,RUS,FIFA стала самой дорогой игрой в PSN Она стоит...,FIFA 21 has become the most expensive PSN game...,0.5267
3,➸ New Montage #FIFA20\n➸ Position : R\LB\n➸ ¦ ...,,Alsn_29,English,,,New Montage position designer ME Enjoy to watch,➸ New Montage #FIFA20\n➸ Position : R\LB\n➸ ¦ ...,0.4939
4,سحب على FIFA21 او قيمتها 60$ 🔥\nالشروط بسيطه:\...,,fg_2w,Arabic,,,سحب على او قيمتها الشروط بسيطه تابعني تابع رتو...,A draw on FIFA21 or $ 60 🔥\nThe conditions are...,0.7096
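
The ```compound``` value kept above is a normalized score between -1 (most negative) and 1 (most positive). If a categorical label is needed, a common convention is to cut at ±0.05. The sketch below applies that convention; the threshold is an assumption that is not used anywhere else in this notebook, and ```sentiment_label``` is a hypothetical name.

```python
def sentiment_label(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse label."""
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'


# Compound scores taken from the five rows shown above
labels = [sentiment_label(s) for s in [0.9125, 0.0, 0.5267, 0.4939, 0.7096]]
```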


To conclude, we reorder the columns and save the dataframe to a csv file.

In [18]:
cols_order = ['text', 'language', 'location', 'location_code', 
              'location-coordinates', 'sentiment', 'text_english', 
              'text_cleaned', 'user-screen_name']
df_final = df_sentiment[cols_order]
#
df_final.head()

Unnamed: 0,text,language,location,location_code,location-coordinates,sentiment,text_english,text_cleaned,user-screen_name
0,PLEASE IF Y'ALL COULD SHARE I'D REALLY APPRECI...,English,,,,0.9125,PLEASE IF Y'ALL COULD SHARE I'D REALLY APPRECI...,PLEASE if COULD share really APPRECIATE it pla...,Fa37im
1,"FIFA 20 TOTW 27 Prediction – De Bruyne, Lewand...",Dutch,United States,USA,,0.0,"FIFA 20 TOTW 27 Prediction – De Bruyne, Lewand...",FIFA TOTW prediction De Bruyne Lewandowski amp...,tarun_patna
2,FIFA 21 стала самой дорогой игрой в PSN. Она с...,Russian,Russia,RUS,,0.5267,FIFA 21 has become the most expensive PSN game...,FIFA стала самой дорогой игрой в PSN Она стоит...,gguru_ru
3,➸ New Montage #FIFA20\n➸ Position : R\LB\n➸ ¦ ...,English,,,,0.4939,➸ New Montage #FIFA20\n➸ Position : R\LB\n➸ ¦ ...,New Montage position designer ME Enjoy to watch,Alsn_29
4,سحب على FIFA21 او قيمتها 60$ 🔥\nالشروط بسيطه:\...,Arabic,,,,0.7096,A draw on FIFA21 or $ 60 🔥\nThe conditions are...,سحب على او قيمتها الشروط بسيطه تابعني تابع رتو...,fg_2w


In [19]:
df_final.to_csv('Twitter/Tweets_sentiment_nb.csv')

---

Here is a snapshot of the terminal run of ```Tweets processing and sentiment.py``` over the full ~ 50 K dataset on a MacBook Pro with a 2.2 GHz Intel Core i7 processor.

![](Images/bash.jpg)