## Scrape Potentially Depressive Tweets from Twitter

We would like to gather data from Twitter based on depressive hashtags, such as #depressed, #depression, #loneliness and #hopelessness,
and then apply various techniques to remove non-depressive messages.
The script produces a filtered collection of tweets that are potentially depressive. It also removes all hashtags from the tweet text, so that a machine learning model cannot cheat by simply looking for depressive hashtags.
The final dataset will be manually reviewed and labelled, so that both the depressive and the non-depressive messages in it are correctly marked.

In [2]:
# !pip install nest_asyncio
# !pip install twint
# !pip install langdetect 
# !pip install googletrans

Collecting nest_asyncio
  Downloading nest_asyncio-1.4.0-py3-none-any.whl (5.2 kB)
Installing collected packages: nest-asyncio
Successfully installed nest-asyncio-1.4.0


In [1]:
import nest_asyncio
nest_asyncio.apply()
import pandas as pd
import twint
import re
from datetime import datetime, timedelta
import random
import json
import ast
from langdetect import detect
from googletrans import Translator
# from textblob import TextBlob


# from google.colab import drive
# drive.mount('/content/gdrive')

In [16]:

# Hashtags to search for
topics = ['#vaccine','#antivaxx','#vaxxhappened','#antivax','#vaccines','#billgates','#vaccination','#vaccineswork','#vaccineinjury','#learntherisk']

ntweets = 0
columns = ['id', 'created_at', 'date','tweet', 'hashtags',  'username', 'nlikes', 'nreplies','nretweets', 'search']

# Resume from the tweets saved by earlier runs (the file must already exist)
df = pd.read_csv('vaccine__tweets.csv')


coords = [{'city':'Amsterdam', 'lat': 52.4, 'lon': 4.89 },
          {'city':'Rotterdam', 'lat': 51.9, 'lon': 4.47 },
          {'city':'Den Haag',  'lat': 52.1, 'lon': 4.30 },
          {'city':'Utrecht',   'lat': 52.1, 'lon': 5.12 },
          {'city':'Eindhoven', 'lat': 51.4, 'lon': 5.4 },
          {'city':'Groningen', 'lat': 53.2, 'lon': 6.5 },
          {'city':'Enschede', 'lat': 52.2, 'lon': 6.8 },
          {'city':'Maastricht', 'lat': 50.85, 'lon': 5.6 }]

while ntweets < 575000:

    # Random cut-off date between 2020-01-01 and now, passed to twint as c.Until
    start_date = datetime.fromtimestamp(random.randint(int(datetime(2020,1,1).timestamp()), int(datetime.now().timestamp())))
    # random.choice samples uniformly; randint(0, len(x)) - 1 picked the last item twice as often
    topic = random.choice(topics)
    city = random.choice(coords)  # only used for logging while c.Geo is disabled
    lat = city['lat']
    lon = city['lon']

    c = twint.Config()
#     c.Geo = str(lat)+','+str(lon)+',20km'
    c.Until = start_date.strftime("%Y-%m-%d %H:%M:%S")
#     c.Format = "Tweet id: {id} | Tweet: {tweet}"
    c.Search = topic
    c.Limit = 400
#     c.Lang = 'nl'
    c.Store_Object = True
    c.Pandas = True
    c.Hide_output = True
    c.Stats = True
    c.Lowercase  = True
    c.Filter_retweets = True
    twint.run.Search(c)

    dff = twint.storage.panda.Tweets_df

    if len(dff) > 0:

#         dff['city'] = city['city']
#         dff['hashtags'] = dff.hashtags.apply(lambda x: x if topic in x else x + [topic])
        # DataFrame.append was removed in pandas 2.0; concat with ignore_index keeps the index unique
        df = pd.concat([df, dff[columns]], ignore_index=True)

    # Save a checkpoint with one row per tweet id
    df.drop_duplicates(subset='id').to_csv('vaccine__tweets.csv', index=False)

    ntweets = len(df.id.unique())

    print(ntweets,topic,start_date,len(dff),city['city'])

df = df.reset_index(drop=True)



34488 #vaccineswork 2020-02-18 05:04:11 780 Groningen
34908 #antivaxx 2020-01-31 19:03:27 420 Maastricht


CRITICAL:root:twint.output:checkData:copyrightedTweet


35307 #learntherisk  2020-05-10 09:43:43 399 Utrecht
35722 #learntherisk  2020-01-21 05:52:14 416 Maastricht
36135 #antivaxx 2020-04-16 16:06:01 413 Eindhoven
36549 #vaccineswork 2020-07-08 15:52:38 414 Rotterdam
36901 #learntherisk  2020-06-05 07:51:19 400 Rotterdam
37140 #learntherisk  2020-01-29 20:15:21 415 Maastricht
37540 #vaccineswork 2020-02-05 05:41:22 400 Amsterdam
37755 #antivaxx 2020-04-23 18:21:48 413 Utrecht
38087 #learntherisk  2020-07-15 08:49:52 407 Eindhoven
38503 #vaccines 2020-04-07 04:34:02 416 Rotterdam
38906 #antivax 2020-06-02 06:52:59 403 Amsterdam
39308 #vaccineinjury 2020-04-12 18:21:23 406 Amsterdam
39708 #vaxxhappened 2020-05-17 07:44:42 400 Utrecht
40068 #vaccines 2020-01-17 23:50:08 400 Amsterdam
40457 #learntherisk  2020-05-19 09:06:33 419 Rotterdam
40857 #antivax 2020-03-17 07:47:52 400 Den Haag
41272 #billgates 2020-04-24 12:16:18 415 Eindhoven
41685 #vaccine 2020-04-28 03:43:06 413 Utrecht
42085 #learntherisk  2020-04-23 16:51:00 400 Den Haag
42099 #v

KeyboardInterrupt: 
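
The loop above draws a random cut-off date, hashtag and city on every pass. The sampling step on its own might look like the sketch below. `random_search_window` is a hypothetical helper name, not part of twint, and the three topics are only a sample of the list used in the notebook.

```python
import random
from datetime import datetime


def random_search_window(topics, start, end, rng=random):
    """Return (topic, until): a random topic and a random datetime in [start, end]."""
    # random.choice picks each topic with equal probability
    topic = rng.choice(topics)
    # Sample a whole-second timestamp between the two bounds (both inclusive)
    ts = rng.randint(int(start.timestamp()), int(end.timestamp()))
    return topic, datetime.fromtimestamp(ts)


topics = ['#vaccine', '#antivax', '#vaccineswork']
topic, until = random_search_window(topics, datetime(2020, 1, 1), datetime(2020, 7, 31))
# The formatted string has the same shape as the value assigned to c.Until
print(topic, until.strftime("%Y-%m-%d %H:%M:%S"))
```

Passing a seeded `random.Random` instance as `rng` makes a scraping run repeatable, which can help when comparing two runs.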

In [17]:
df = pd.read_csv('vaccine__tweets.csv')
df = df.drop_duplicates(subset='id')

def detect_language(text):
    # langdetect raises on empty or non-text input; mark those rows as 'NAN'
    try:
        return detect(text)
    except Exception:
        return 'NAN'

df['lang'] = df.tweet.apply(detect_language)

# Keep only the English tweets
df = df[df.lang == 'en'].reset_index(drop = True)

df.to_csv('clean_vaccine_tweets.csv')

df

Unnamed: 0,id,created_at,date,tweet,hashtags,username,nlikes,nreplies,nretweets,search,lang
0,1267163261960630280,1590950239000,2020-05-31 13:37:19,A two-phase #STUDY provided new evidence suppo...,"['#study', '#thimerosal', '#vaccines', '#autis...",LotusOak2,22,1,23,#vaccineinjury,en
1,1259260353977954306,1589066038000,2020-05-09 18:13:58,New #STUDY: Infectious #vaccine-derived rubell...,"['#study', '#vaccine', '#learntherisk', '#vacc...",LotusOak2,6,0,7,#learntherisk,en
2,1219407996498993153,1579564496000,2020-01-20 18:54:56,"""I was paralyzed from the waist down in 2001 f...","['#vaccine', '#vaccineinjury', '#learntherisk']",LotusOak2,25,0,20,#learntherisk,en
3,1250564832702664707,1586992865000,2020-04-15 18:21:05,“Oxygen thief antivaxxer is butthurt.” via u/F...,"['#vaxxhappened', '#antivax', '#antivaxx']",VaxxHappenedBOT,0,0,0,#antivaxx,en
4,1280650314400567296,1594165802000,2020-07-07 18:50:02,Coronavirus airborne transmission: What you ne...,['#vaccineswork'],SNCCLA,2,0,0,#vaccineswork,en
...,...,...,...,...,...,...,...,...,...,...,...
48222,1228756871604920320,1581793442000,2020-02-15 14:04:02,A response by a scientific expert to a stateme...,"['#publichealth', '#vaccineswork']",RMCarpiano,11,2,5,#vaccineswork,en
48223,1228724833208406016,1581785803000,2020-02-15 11:56:43,"U.S. flu deaths hit 14,000, haven’t peaked yet...",['#vaccineswork'],UppityCancerP,4,0,1,#vaccineswork,en
48224,1228706549754761217,1581781444000,2020-02-15 10:44:04,I support vaccines because #VaccinesWork. Join...,"['#vaccineswork', '#one4gavi']",Altanchimegl1,0,0,0,#vaccineswork,en
48225,1228695381921320960,1581778781000,2020-02-15 09:59:41,Will you rush to get the vaccine for #COVID on...,"['#covid', '#influenza', '#vaccinessavelives',...",dochelmy,4,0,2,#vaccineswork,en


In [18]:
ht = []
for i in df.hashtags:
    ht += ast.literal_eval(i) 
    
pd.Series(ht).value_counts(sort = True).head(30)

#vaccine              27826
#coronavirus           8301
#covid19               8243
#antivax               7840
#vaccines              6179
#learntherisk          4928
#vaccineswork          4836
#antivaxx              4276
#billgates             3507
#vaxxhappened          3046
#vaccineinjury         2906
#vaccination           2798
#study                 1403
#covid                 1323
#informedconsent       1188
#covid_19              1172
#cdc                   1158
#covidー19              1133
#pandemic              1130
#wakeup                1097
#hpv                    998
#flu                    929
#trump                  882
#health                 849
#virus                  791
#gatesfoundation        741
#autism                 732
#vaccinessavelives      714
#gavi                   712
#un                     688
dtype: int64
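
The same tally can be built with `collections.Counter`, which avoids growing one large list first. This is a sketch under the assumption that each cell holds either a list or the text of a list, as it does after a CSV round trip; `count_hashtags` is a hypothetical helper name.

```python
import ast
from collections import Counter


def count_hashtags(hashtag_column):
    """Count hashtags across rows, where each row is a list or the text of a list."""
    counts = Counter()
    for entry in hashtag_column:
        # Rows loaded from CSV are strings such as "['#vaccine', '#cdc']"
        tags = ast.literal_eval(entry) if isinstance(entry, str) else entry
        counts.update(tags)
    return counts


rows = ["['#vaccine', '#cdc']", "['#vaccine']", "[]"]
print(count_hashtags(rows).most_common(2))
```

With the notebook's data, `count_hashtags(df.hashtags).most_common(30)` would give the counterpart of the `value_counts` call above.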

In [103]:
translator = Translator()


# # <Translated src=ko dest=en text=Good evening. pronunciation=Good evening.>
# >>> translator.translate('안녕하세요.', dest='ja')
# # <Translated src=ko dest=ja text=こんにちは。 pronunciation=Kon'nichiwa.>
# >>> translator.translate('veritas lux mea', src='la')
# # <Translated src=la dest=en text=The truth is my light pronunciation=The truth is my light>

"Why can't Mark Rutte suddenly remember anything about #group immunity?!?"

In [96]:
translator


In [6]:
# number of followers
# switch between English / Dutch
# cities
# covid19 dataset https://techcrunch.com/2020/04/29/twitter-launches-a-covid-19-dataset-of-tweets-for-approved-developers-and-researchers/
# doubt
# give more weight to tweets with reactions, likes, RTs

In [None]:
# Sampling 

In [70]:
# googletrans' Translator takes no language arguments; pass src / dest to translate() instead
translator = Translator()


'Why can&#39;t Mark Rutte suddenly remember anything about #group immunity?!? Maybe go into therapy during the summer break, how to keep the memory optimal. Or is it just lying? #coronavirus #spoedwet #vaccines https://www.youtube.com/watch?v=1rcjtQtoIFA… @MinPres @markrutte'

'Waarom kan Mark Rutte zich ineens niets meer herinneren over #groepsimmuniteit?!? Misschien in het zomerreces eens in therapie gaan, hoe het geheugen optimaal te houden. Of is het gewoon liegen?\n#coronavirus #spoedwet #vaccines\n https://www.youtube.com/watch?v=1rcjtQtoIFA\xa0…\n@MinPres @markrutte'

In [74]:
df['translation'] = 'NAN'

In [104]:
df.loc[df.lang == 'nl', 'translation'] = df.loc[df.lang == 'nl'].tweet.apply(lambda x: translator.translate(x).text)

In [108]:
# .copy() so that later assignments modify this frame rather than a view of the old one
df = df[(df.lang == 'en')|(df.lang == 'nl')].copy()

In [113]:
df.loc[df.lang =='en','translation'] = df.loc[df.lang =='en','tweet']


In [117]:
df.to_csv('vaccine_loc_translated_tweets.csv')

## 2. Filtering out the relevant rows

**Ideas for cleaning / filtering**
1. remove entries that contain positive or medical-sounding tags
2. remove entries with more than three hashtags, as they may be promotional messages
3. remove entries with @-mentions, as they may be promotional messages
4. remove entries with fewer than x chars / words
5. remove entries containing URLs, again because they are likely to be promotional messages
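
The five ideas can also be read as one row-level predicate. The sketch below is an illustration, not the notebook's implementation: `keep_tweet`, `as_list` and `POSITIVE_OR_MEDICAL` are hypothetical names, the tag set is a small sample, and the thresholds are assumptions chosen to sit close to the masks used in the cells that follow.

```python
import ast

POSITIVE_OR_MEDICAL = {"#mentalhealth", "#health", "#happiness"}  # illustrative subset


def as_list(value):
    """Parse a list stored as text in a CSV cell; pass real lists through."""
    return ast.literal_eval(value) if isinstance(value, str) else list(value)


def keep_tweet(row, min_chars=25, min_words=7, max_hashtags=3):
    """Return True if a row passes all five filtering ideas."""
    hashtags = as_list(row["hashtags"])
    if POSITIVE_OR_MEDICAL.intersection(hashtags):  # 1. positive / medical tags
        return False
    if len(hashtags) > max_hashtags:                # 2. too many hashtags
        return False
    if as_list(row["mentions"]):                    # 3. any @-mention
        return False
    text = row["tweet"]
    if len(text) <= min_chars or len(text.split()) < min_words:  # 4. too short
        return False
    if as_list(row["urls"]):                        # 5. contains a URL
        return False
    return True


row = {"tweet": "Let me just sit here and wallow in self pity #depressed",
       "hashtags": "['#depressed']", "mentions": "[]", "urls": "[]"}
print(keep_tweet(row))
```

Applied to a DataFrame this would be `df_all[df_all.apply(keep_tweet, axis=1)]`. Keeping the steps as separate masks, as the notebook does, makes it easier to inspect what each rule removes.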

In [None]:
selection_to_remove = ["#mentalhealth", "#health", "#happiness", "#mentalillness", "#happy", "#joy", "#wellbeing"]

#### 1. remove entries that contain positive or medical-sounding tags


In [26]:
mask1 = df_all.hashtags.apply(lambda x: any(item in x for item in selection_to_remove))
df_all[mask1].tweet.tail()

1988    2015: when music destroyed #mentalhealth stigma  http://goo.gl/52eKru  #despair #depression #anxiety #suicide #bipolar via .@guardian                                                                                 
1989    Be happy in 2016. Enjoy a special #HealthyMeSummit with @taniadejong #depression & #anxiety  http://ow.ly/W0387   http://fb.me/3rRZ5rnxX                                                                              
1990    Be happy in 2016. Enjoy a special #HealthyMeSummit with @taniadejong #depression & #anxiety  http://ow.ly/W0387  pic.twitter.com/b0y5KcstCe                                                                           
1993    RT mc1748 When words don't work, #arts program can help heal #veterans  http://strib.mn/1mPKarx  #PTSD #MentalHealth #NAMI #depression #anxi…                                                                         
1994    Debunking the myth that #suicides increase over the holiday season  http://nymag.com/scienceofus/201

In [27]:
# review the result of removing certain tags
df_all[~mask1].tweet.head(10)

0     New #quote : #secret_society123 #crying #depressed #selfharmmm #cutting #blood #hate #quote #anorexia #anxiety #...  http://flic.kr/p/qBXPN8 
1     DA NEW YR ALONE I AM #DEPRESSED BCAUSE OF WAT HAPPENED 2ME ND NOW YA'LL WANA MAKE MY #DEPRESSION WORSE!?!I'VE TRIED MY BEST 2GIVE CHANCES I_ 
2     @Venom_sR Because it stands out the fact that we are older and we are getting closer to dying #Depressed                                     
3     Let me just sit here and wallow in self pity #depressed                                                                                      
4     #depressed                                                                                                                                   
5     When you ask for a triple chocolate melt down and the waiter tells you no... #depressed #sosad                                               
6     First breakdown at work ),: more to come ill bet money on it #depressed                                   

In [28]:
# the results above look good, so apply mask1
df_all = df_all[~mask1]
len(df_all)

5242
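
Note that `mask1` tests the raw text of the `hashtags` cell, so a blocked tag also matches when it is the start of a longer tag. Whether that is wanted is a judgment call. The sketch below contrasts the substring test with an exact comparison; `has_blocked_tag` is a hypothetical helper and the sample cell is made up for illustration.

```python
import ast

selection_to_remove = ["#mentalhealth", "#health", "#happiness", "#mentalillness",
                       "#happy", "#joy", "#wellbeing"]


def has_blocked_tag(hashtags_cell, blocked=selection_to_remove):
    """True if any hashtag in the cell equals a blocked tag exactly."""
    tags = ast.literal_eval(hashtags_cell)  # "['#a', '#b']" -> ['#a', '#b']
    return not set(blocked).isdisjoint(tags)


cell = "['#depressed', '#healthcare']"
# Substring test, as in mask1: "#health" is found inside "#healthcare"
print(any(tag in cell for tag in selection_to_remove))
# Exact test: "#healthcare" is not in the blocked list
print(has_blocked_tag(cell))
```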

#### 2. remove entries with more than three hashtags, as they may be promotional messages


In [None]:
mask2 = df_all.hashtags.apply(lambda x: x.count("#") < 4)

In [None]:
# applying the mask2
df_all = df_all[mask2]

In [37]:
#Check dataset size 
len(df_all)

3308

In [38]:
df_all.head()


Unnamed: 0,id,conversation_id,created_at,date,time,timezone,user_id,username,name,place,tweet,mentions,urls,photos,replies_count,retweets_count,likes_count,hashtags,cashtags,link,retweet,quote_url,video,near,geo,source,user_rt_id,user_rt,retweet_id,reply_to,retweet_date
1,550438181943123968,550438181943123968,1420069663000,2014-12-31,23:47:43,UTC,2788182309,lisbethge91,FAITH!HOPE!LOVE1991,,DA NEW YR ALONE I AM #DEPRESSED BCAUSE OF WAT HAPPENED 2ME ND NOW YA'LL WANA MAKE MY #DEPRESSION WORSE!?!I'VE TRIED MY BEST 2GIVE CHANCES I_,[],[],[],0,1,0,"['#depressed', '#depression']",[],https://twitter.com/lisbethge91/status/550438181943123968,False,,0,,,,,,,"[{'user_id': '2788182309', 'username': 'lisbethge91'}]",
2,550437557969121280,550434282066681858,1420069515000,2014-12-31,23:45:15,UTC,2730976702,hazeidine_,Jordan,,@Venom_sR Because it stands out the fact that we are older and we are getting closer to dying #Depressed,['venom_sr'],[],[],1,0,0,['#depressed'],[],https://twitter.com/HazeIdine_/status/550437557969121280,False,,0,,,,,,,"[{'user_id': '2730976702', 'username': 'HazeIdine_'}, {'user_id': '780077008986464257', 'username': 'Venom_Sr'}]",
3,550436284653531136,550436284653531136,1420069211000,2014-12-31,23:40:11,UTC,217658803,kaylajean421,Kayla💚,,Let me just sit here and wallow in self pity #depressed,[],[],[],0,0,0,['#depressed'],[],https://twitter.com/kaylajean421/status/550436284653531136,False,,0,,,,,,,"[{'user_id': '217658803', 'username': 'kaylajean421'}]",
4,550430157136068608,550430157136068608,1420067750000,2014-12-31,23:15:50,UTC,106244401,fiona_day,Fiona.,,#depressed,[],[],[],0,0,0,['#depressed'],[],https://twitter.com/fiona_day/status/550430157136068608,False,,0,,,,,,,"[{'user_id': '106244401', 'username': 'fiona_day'}]",
5,550429725513244672,550429725513244672,1420067647000,2014-12-31,23:14:07,UTC,1064666461,kmulaniff713,Kevin Mulaniff,,When you ask for a triple chocolate melt down and the waiter tells you no... #depressed #sosad,[],[],[],0,0,1,"['#depressed', '#sosad']",[],https://twitter.com/KMulaniff713/status/550429725513244672,False,,0,,,,,,,"[{'user_id': '1064666461', 'username': 'KMulaniff713'}]",


#### 3. remove tweets with @-mentions, as they are sometimes retweets

In [None]:
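# mentions is stored as text: "[]" (2 chars) means no mentions, any non-empty list is at least 5 chars
mask3 = df_all.mentions.apply(lambda x: len(x) < 5)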

In [None]:
# applying mask3
df_all = df_all[mask3]

In [42]:
len(df_all)

2718
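
The `len(x) < 5` test works because the `mentions` cells are strings, so it measures characters rather than list items. A small sketch of the equivalence, assuming every list element is a quoted, non-empty string; `is_empty_list_cell` is a hypothetical helper that states the intent more directly.

```python
import ast


def is_empty_list_cell(cell):
    """True if a CSV cell holds the text of an empty list, e.g. "[]"."""
    return len(ast.literal_eval(cell)) == 0


for cell in ["[]", "['a']", "['venom_sr']"]:
    # len(cell) counts characters: "[]" has 2, the shortest non-empty list text has 5
    print(repr(cell), len(cell), len(cell) < 5, is_empty_list_cell(cell))
```

The parsed version costs a little more per row but keeps working if the column is later loaded as real lists with a different converter.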

In [43]:
# let's check the hashtags value counts again
df_all.hashtags.value_counts().head(20)

['#depressed']                               529
['#depression']                              243
['#loneliness']                              154
['#hopelessness']                            141
['#depression', '#therapy']                  87 
['#loneliness', '#solitude']                 57 
['#anxiety', '#depression']                  19 
['#meaning', '#hopelessness']                18 
['#art', '#loneliness']                      17 
['#depression', '#anxiety']                  13 
['#loneliness', '#kill', '#myth']            12 
[]                                           11 
['#depressed', '#depression']                11 
['#depression', '#helpme', '#iwantpeace']    10 
['#tms', '#depression']                      10 
['#depression', '#alcohol', '#newyears']     10 
['#youth', '#hopelessness']                  8  
['#loneliness', '#expandedcontacts']         8  
['#sad', '#depressed']                       7  
['#depression', '#notjustsad']               7  
Name: hashtags, dtyp

In [44]:
df_all.tweet.tail(10)

1959    talked about suicidal ideation with a friend last night.she confessed to having a plan of jumping off a bridge.I had no idea.#depression                                                                                       
1967    #DEPRESSION                                                                                                                                                                                                                    
1968    ur best is plenty good enough 4 anyone or anything that is meant 4U😊Don't let ppl nor circumstances kill you😘#suicideprevention #depression                                                                                    
1971    RT talkspace #Depression costs companies $52 billion/year in absenteeism & reduced productivity; results in 400 million lost work days/year…                                                                                   
1980    Sleep is extremely important, and for this author, regulating #s

#### 4. remove entries with fewer than x chars / words

In [None]:
mask4a = df_all.tweet.apply(lambda x: len(x) > 25)


In [46]:
df_all = df_all[mask4a]
len(df_all)

2611

In [None]:
mask4b = df_all.tweet.apply(lambda x: x.count(" ") > 5)

In [48]:
df_all = df_all[mask4b]
len(df_all)

2366
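
Counting spaces is only a proxy for counting words: runs of spaces inflate the count. A hedged alternative is to count whitespace-separated tokens. `long_enough` is a hypothetical helper, and its defaults are assumptions picked to mirror the two masks above (more than 25 characters, and more than five spaces, which is seven words when they are single-spaced).

```python
def long_enough(text, min_chars=26, min_words=7):
    """True if text has at least min_chars characters and min_words whitespace-separated tokens."""
    return len(text) >= min_chars and len(text.split()) >= min_words


padded = "so   tired   of   this   today"
# Nine or more spaces pass the space-count test, yet there are only five tokens
print(padded.count(" ") > 5, len(padded.split()))
print(long_enough(padded))
print(long_enough("Let me just sit here and wallow in self pity #depressed"))
```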

In [49]:
df_all.tweet

1       DA NEW YR ALONE I AM #DEPRESSED BCAUSE OF WAT HAPPENED 2ME ND NOW YA'LL WANA MAKE MY #DEPRESSION WORSE!?!I'VE TRIED MY BEST 2GIVE CHANCES I_                                                                                   
3       Let me just sit here and wallow in self pity #depressed                                                                                                                                                                        
5       When you ask for a triple chocolate melt down and the waiter tells you no... #depressed #sosad                                                                                                                                 
6       First breakdown at work ),: more to come ill bet money on it #depressed                                                                                                                                                        
8       I DONT wanna cry or feel sorry for myself. Its just so hard some


#### 5. remove entries containing URLs, as they are likely to be promotional messages


In [None]:
# urls is stored as text: "[]" (2 chars) means no URLs, any non-empty list is at least 5 chars
mask5 = df_all.urls.apply(lambda x: len(x) < 5)

In [51]:
# let's have a look at what we will be removing from the dataset
df_all[~mask5].tweet.head(10), df_all[~mask5].tweet.tail(10)

(49     And the worse part is......trying to hold on but, no one is there. #depressed #alone #reality…  http://instagram.com/p/xSHMegjiFy/                                                             
 54     #depressed? - discuss  your #depression feelings anonymously -  http://ow.ly/GsmGr   http://ow.ly/GsmGs   http://ow.ly/i/6mvQH                                                                 
 71     Happy new year ☺ Rayakan tahun yg baru dengan ini ! #Depressed  http://instagram.com/p/xR3sa5v22q/                                                                                             
 90     Is your child #depressed? Learn the #signs of childhood depression here:  http://bit.ly/WQg3z9                                                                                                 
 95     Do you feel low amidst the new year celebrations? Even so there is a quiet capacity for happiness within  http://innerspacetherapy.in/mindfulness/discovering-happy-feel-low-blue/ … #depressed


The above shows that tweets with URLs are indeed more likely to be promotional, informational or educational messages that do not reflect the user's actual emotional state, so they can be removed (or marked as negative examples).

In [52]:
df_all = df_all[mask5]
len(df_all)

1351
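
As a cross-check that does not depend on the `urls` column, the tweet text itself can be scanned for links. This is a rough sketch: `URL_PATTERN` and `contains_url` are hypothetical names, and the pattern (http(s) links plus bare pic.twitter.com links) is an assumption about what should count as a link.

```python
import re

# Rough pattern: http(s) links plus bare pic.twitter.com links
URL_PATTERN = re.compile(r'(?:https?://|pic\.twitter\.com/)\S+', re.IGNORECASE)


def contains_url(text):
    """True if the tweet text appears to contain a link."""
    return URL_PATTERN.search(text) is not None


print(contains_url("Is your child #depressed? Learn the #signs here:  http://bit.ly/WQg3z9"))
print(contains_url("Let me just sit here and wallow in self pity #depressed"))
```

In the notebook this could be compared against `mask5` with `df_all.tweet.apply(contains_url)` to see whether the two tests disagree on any rows.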

## 3. Finally, let's create a column containing the tweet text, but with all hashtags removed

This column can be used as input to the model, or sent to other software for further emotion and linguistic analysis. The idea is that, with the hashtags removed, the model and the software must examine the text itself to clarify whether the actual emotion is negative and indicative of depression.

In [None]:
df_all["mod_text"] = df_all["tweet"].apply(lambda x: re.sub(r'#\w+', '', x))

In [None]:
df_all.mod_text.head(15), df_all.mod_text.tail(15)

(1      mood can be caused by infectious diseases, nutritional deficiencies, neurological conditions, and physiological problems.                                                                                                                                                                       
 6     With all of this unnessary  family drama, I feel like moving far away and starting over again. From one thing to another I just feel . Hope I get through this                                                                                                                                   
 7     Stress na nga sa bahay, stress pa sa school😔                                                                                                                                                                                                                                                     
 8     Step 1.  Anfangen, richtig zu essen. Nicht zu wenig, nicht zu viel. & am besten ausgewogen.  Damit ich
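
The output above shows that removing a hashtag can leave doubled spaces behind (for example "I just feel . Hope"). If that matters for later processing, one option is to collapse the whitespace afterwards. `strip_hashtags` is a hypothetical helper; it uses the same `#\w+` pattern as the cell above and then normalises whitespace, which also flattens line breaks.

```python
import re


def strip_hashtags(text):
    """Remove #hashtags, then collapse the whitespace left behind."""
    without_tags = re.sub(r'#\w+', '', text)
    return re.sub(r'\s+', ' ', without_tags).strip()


print(strip_hashtags("From one thing to another I just feel #depressed . Hope I get through this"))
```

It would be applied the same way as the original lambda: `df_all["tweet"].apply(strip_hashtags)`.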

In [None]:
# let's check the hashtags value counts again
df_all.hashtags.value_counts().head(20)

['#depressed']                               296
['#depression']                              110
['#loneliness']                              78 
['#hopelessness']                            21 
['#depressed', '#stressed', '#alone']        10 
['#sad', '#depressed']                       9  
['#depression', '#anxiety']                  9  
['#stoner', '#instahookah', '#depressed']    8  
['#depression', '#depressed']                6  
['#tms', '#depression']                      6  
['#depression', '#helpme', '#iwantpeace']    5  
['#lonely', '#depressed']                    4  
['#depressed', '#lonely']                    4  
['#anxiety', '#depression']                  4  
['#depressed', '#anxious']                   4  
['#depressed', '#positive']                  3  
['#ptsd', '#depression']                     3  
['#depression', '#notjustsad']               3  
['#loneliness', '#depression']               3  
['#depressed', '#sad']                       3  
Name: hashtags, dtyp

In [None]:
df_all.columns

Index(['id', 'conversation_id', 'created_at', 'date', 'time', 'timezone',
       'user_id', 'username', 'name', 'place', 'tweet', 'mentions', 'urls',
       'photos', 'replies_count', 'retweets_count', 'likes_count', 'hashtags',
       'cashtags', 'link', 'retweet', 'quote_url', 'video', 'near', 'geo',
       'source', 'user_rt_id', 'user_rt', 'retweet_id', 'reply_to',
       'retweet_date', 'mod_text'],
      dtype='object')

In [None]:
col_list = ["id", "conversation_id", "date", "username", "mod_text", "hashtags", "tweet"]

In [None]:
df_final1 = df_all[col_list]
df_final1 = df_final1.rename(columns={"mod_text": "tweet_processed", "tweet": "tweet_original"})


In [None]:
df_final1["target"] = 1

In [28]:
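from textblob import TextBlob  # imported here because the import at the top of the notebook is commented out

df = pd.read_csv('vaccine_loc_translated_tweets.csv').drop_duplicates().reset_index(drop=True)

# Polarity of the English text, from -1 (negative) to 1 (positive)
df['sentiment'] = df.translation.apply(lambda tweet: TextBlob(tweet).polarity)

# Placeholder emotion scores: these are random numbers, not the output of a model
df['fear']  = [random.random()/1.9 for x in range(len(df))]
df['joy']  = [random.random() for x in range(len(df))]
df['anger']  = [random.random()/1.1 for x in range(len(df))]
df['sadness']  = [random.random()/1.2   for x in range(len(df))]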



Unnamed: 0.1,Unnamed: 0,id,created_at,date,tweet,hashtags,username,nlikes,nreplies,nretweets,search,city,lang,translation,sentiment,fear,joy,anger,sadness
0,0,1282223179205873664,1594540803000,2020-07-12 03:00:03,Waarom kan Mark Rutte zich ineens niets meer h...,"['#groepsimmuniteit', '#coronavirus', '#spoedw...",Politiek2014,1,0,0,#vaccines,Den Haag,nl,Why can't Mark Rutte suddenly remember anythin...,0.000000,0.305815,0.294437,0.265241,0.637511
1,2,313645475158978560,1363613883000,2013-03-18 08:38:03,@AFP This is the saddest #News for today! You ...,"['#news', '#hpv', '#health', '#vaccines']",gproev,0,0,0,#vaccines,Maastricht,en,@AFP This is the saddest #News for today! You ...,0.000000,0.015916,0.723660,0.387028,0.692675
2,3,1045340157896007680,1538063489000,2018-09-27 10:51:29,#vaccinatie voor de aanstaande #Oman reis (@ H...,"['#vaccinatie', '#oman']",RuudvanEmpel,0,0,0,#vaccinatie,Eindhoven,nl,#vaccination for the upcoming #Oman trip (@ Hu...,0.000000,0.137715,0.326222,0.215124,0.249867
3,4,1283839700298604544,1594926211000,2020-07-16 14:03:31,"Griep. Prikken, slikken of heel voorzichtig ni...","['#griep', '#vaccin', '#pandemie']",HansRdeGraaff,0,0,0,#vaccin,Utrecht,nl,"Flu. Stinging, swallowing or doing nothing ver...",-0.130000,0.012778,0.211245,0.156112,0.534460
4,5,1281688167427592192,1594413246000,2020-07-10 15:34:06,Polluh! #coronavirus #Vaccin pic.twitter.com/...,"['#coronavirus', '#vaccin']",sandravogelaar,8,1,6,#vaccin,Rotterdam,en,Polluh! #coronavirus #Vaccin pic.twitter.com/...,0.000000,0.165123,0.476035,0.048678,0.321913
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1948,2011,1258080869337698320,1588784827000,2020-05-06 12:07:07,#persconferentie Hugo de Jonge wrijft er alvas...,"['#persconferentie', 'vaccin']",CGkwek,1,1,0,vaccin,Utrecht,nl,#persconferentie Hugo de Jonge wrijft er alvas...,0.000000,0.232990,0.212387,0.862887,0.233088
1949,2012,36392727981588480,1297511677000,2011-02-12 06:54:37,Vaccines - Wreckin' Bar (ra Ra Ra) \n http://...,['vaccine'],L3FM,0,0,0,vaccine,Utrecht,en,Vaccines - Wreckin' Bar (ra Ra Ra) \n http://...,0.000000,0.308654,0.950085,0.616638,0.377362
1950,2013,1282261679380860928,1594549982000,2020-07-12 05:33:02,En vervolgens geeft zelfbenoemd wetenschapper ...,"['#coronavirus', 'vaccin']",BriemenV,0,0,0,vaccin,Amsterdam,nl,And then self-proclaimed scientist @hugodejong...,0.000000,0.050201,0.120397,0.143417,0.175324
1951,2014,1262639650067316736,1589871725000,2020-05-19 02:02:05,Kom je nog wel achter die “1.5 meter” blijft t...,['vaccin'],Dusssssss3,0,0,0,vaccin,Den Haag,nl,"You will still find out that ""1.5 meters"" rema...",-0.300000,0.090423,0.647489,0.512748,0.335186


In [29]:
df.to_csv('tweets_with_sentiment.csv')