# Check (That Tweet) Yo Self 
## Prioritizing Tweets to Fact Check
###### Part 4: Gathering User Data

After exploring the tweets we gathered, we realized it would be hugely beneficial to also have information about each user, such as their number of followers and total number of tweets. In this notebook, we loop through our previously gathered tweets to add that user data.

In [1]:
import pandas as pd
import numpy as np
from tweetscrape.users_scrape import TweetScrapperUser
import tweetscrape
import warnings
warnings.filterwarnings('ignore')
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

Importing our data

In [2]:
tweet = pd.read_csv('./data/cleaned_with_lysol.csv')

Quick check to see how the user info is returned from Twitter

In [3]:
ts = TweetScrapperUser("PulpNews")
user_info = ts.get_profile_info()

In [4]:
user_info['bio']

'The Fastest Crime News Updates on the Planet!'

This function loops through each user to grab their profile information:

In [5]:
def get_user(df):
    #Start a counter to track progress
    count = 0
    
    #Create our new columns that will eventually hold the user data
    df['user_bio'] = 0
    df['user_location'] = 0
    df['user_url'] = 0
    df['user_tweets'] = 0
    df['user_following'] = 0
    df['user_followers'] = 0
    df['user_favorites'] = 0
    
    #Loop to gather user data and assign it to the correct column
    for user in df['author']:
        count += 1
        try:
            ts = TweetScrapperUser(user)
            user_info = ts.get_profile_info()
            df.loc[df['author'] == user, 'user_bio'] = user_info['bio']
            df.loc[df['author'] == user,'user_location'] = user_info['location']
            df.loc[df['author'] == user,'user_url'] = user_info['url']
            df.loc[df['author'] == user,'user_tweets'] = user_info['tweets']
            df.loc[df['author'] == user,'user_following'] = user_info['following']
            df.loc[df['author'] == user,'user_followers'] = user_info['followers']
            df.loc[df['author'] == user,'user_favorites'] = user_info['favorites']
            if count % 100 == 0:
                print(f'{count} users have been gathered')
                df.to_csv(f'./data/users/{count}_tweet_random.csv', index=False)
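        except Exception:
            #Skip users whose profile can't be scraped (e.g. deleted or suspended accounts);
            #unlike a bare except, this still lets a manual interrupt stop the loop
            pass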
    
    #Return the final, filled in dataframe
    return df
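
The loop above makes one request per tweet row, so an author who appears several times is scraped several times. A minimal sketch of one possible alternative follows: look up each distinct author once, then merge the results back onto every tweet. This is not the code we ran. The name `add_user_info` and its `fetch_profile` parameter are hypothetical, and the sketch assumes the profile comes back as a dict with the same keys used above.

```python
import pandas as pd

# Hypothetical helper: fetch_profile is any callable that takes a username
# and returns a dict of profile fields (an assumption for illustration).
def add_user_info(df, fetch_profile,
                  fields=('bio', 'location', 'url', 'tweets',
                          'following', 'followers', 'favorites')):
    """Look up each distinct author once, then merge onto every tweet."""
    records = []
    for user in df['author'].dropna().unique():
        try:
            info = fetch_profile(user)
        except Exception:
            # Skip accounts that cannot be scraped; their rows stay empty
            continue
        record = {'author': user}
        for field in fields:
            # Missing keys become None rather than raising a KeyError
            record[f'user_{field}'] = info.get(field)
        records.append(record)

    columns = ['author'] + [f'user_{field}' for field in fields]
    users = pd.DataFrame(records, columns=columns)

    # A left merge keeps every tweet, including those with no profile data
    return df.merge(users, on='author', how='left')
```

In this notebook, `fetch_profile` could be `lambda user: TweetScrapperUser(user).get_profile_info()`. Rows for users that fail end up with missing values instead of the placeholder `0`, which makes them easier to filter out later.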

### Sampling 40k random tweets:

We already pulled around 80,000 tweets, but going back to grab the user info is very time consuming. To make it more manageable, we'll take a random sample of 40,000 tweets and grab user info for those.

In [9]:
tweet_rand = tweet.sample(n = 40_000, random_state = 21).reset_index(drop=True)

In [10]:
tweet_rand

Unnamed: 0,id,time,author,author_id,associated_tweet,text,links,hashtags,mentions,reply_count,...,hashtag_count,mention_count,word_count,char_count,link_count,text_sentiment,text_links_removed,clean_text,clean_word_count,clean_char_count
0,1254190074595553281,2020-04-25 16:26:30,Iam_helenna,215204985,1254190074595553281,"Today, we have 1182 cases in Nigeria with 35 d...",[],[''],[''],37,...,0,0,42,270,0,0.4215,"Today, we have 1182 cases in Nigeria with 35 d...",today cases nigeria deaths discharged isolatio...,28,193
1,1253828209075990531,2020-04-24 16:28:34,KerryeHill,2807727004,1253697753479331840,There's no such thing as a medical disinfectan...,[],[''],[''],1,...,0,0,40,248,0,-0.7184,There's no such thing as a medical disinfectan...,thing medical disinfectant use pulmonary syste...,19,152
2,1253460644294283265,2020-04-23 16:08:00,Lmt48430438,1232381432988930049,1253460644294283265,Waiting to see how many people drink disinfect...,[],[''],['@DarcysCartoon'],1,...,0,1,23,130,0,0.0000,Waiting to see how many people drink disinfect...,waiting see many people drink disinfectant tru...,13,93
3,1254194987945865217,2020-04-25 16:46:01,iamshollyyoung,3096323025,1254194987945865217,Today I know there's no result Nigeria can not...,['https://t.co/RRuHGBH1SI'],['#Covid_19'],[''],1,...,1,0,16,118,1,-0.2095,Today I know there's no result Nigeria can not...,today know result nigeria rig imagine ncdc rig...,15,93
4,1253835841685934081,2020-04-24 16:58:54,toddcusuman,588727638,1253835841685934081,New York rapper Fred the Godson dies at 35 aft...,['https://t.co/rXOi5YEoZl'],[''],[''],0,...,0,0,14,122,1,0.0000,New York rapper Fred the Godson dies at 35 aft...,new york rapper fred godson dies coronavirus h...,17,93
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
39995,1254190054295113729,2020-04-25 16:26:25,StevieK15,400491014,1254172638307586049,This is why you shouldn’t use hydroxychloroqui...,[],[''],[''],0,...,0,0,22,131,0,0.4019,This is why you shouldn’t use hydroxychloroqui...,use hydroxychloroquine iv disinfectant uv ther...,11,80
39996,1254193792221097985,2020-04-25 16:41:16,JasonJaytay1969,3053529784,1254093268884893696,It's the craziness of self isolation!! LOLTher...,[],[''],[''],0,...,0,0,25,137,0,-0.4559,It's the craziness of self isolation!! LOLTher...,craziness self isolation lolthere one mikey kn...,10,66
39997,1253823959319142402,2020-04-24 16:11:41,kILtedbreeder,349449067,1253823959319142402,Day 42 of social isolation...that’s two days l...,['https://t.co/rq3x6Jv58h'],[''],"['@disneyplus', '@DisneyParks']",1,...,0,2,41,262,1,-0.4215,Day 42 of social isolation...that’s two days l...,day social isolation two days longer biblical ...,28,179
39998,1254194918765068288,2020-04-25 16:45:45,HadejiaArc,1214683184484560896,1253329154415529985,Beside you said I have to go to the bank and y...,[],[''],[''],0,...,0,0,48,225,0,0.7158,Beside you said I have to go to the bank and y...,beside said go bank know operating due covid t...,23,137


As mentioned above, this was extremely time consuming. While we had hoped for 40k, grabbing just over 30k took almost 12 hours. That is why the cell below shows no execution count: after restarting the notebook, we didn't want to wait another 12 hours for it to complete.

In [None]:
tweet_rand = get_user(tweet_rand)
tweet_rand.to_csv('./data/40k_random.csv', index = False)

Unable to get last tweet timestamp


100 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


200 users have been gathered


Unable to get last tweet timestamp


300 users have been gathered
400 users have been gathered


Unable to get last tweet timestamp


500 users have been gathered


Unable to get last tweet timestamp


600 users have been gathered


Unable to get last tweet timestamp


700 users have been gathered
800 users have been gathered


Unable to get last tweet timestamp


900 users have been gathered


Unable to get last tweet timestamp


1000 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


1100 users have been gathered
1200 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp
Unable to get last tweet timestamp


1300 users have been gathered
1400 users have been gathered
1500 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


1600 users have been gathered
1700 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


1800 users have been gathered
1900 users have been gathered
2000 users have been gathered
2100 users have been gathered
2200 users have been gathered
2300 users have been gathered
2400 users have been gathered
2500 users have been gathered
2600 users have been gathered
2700 users have been gathered
2800 users have been gathered


Unable to get last tweet timestamp


2900 users have been gathered
3000 users have been gathered


Unable to get last tweet timestamp


3100 users have been gathered
3200 users have been gathered
3300 users have been gathered


Unable to get last tweet timestamp


3400 users have been gathered
3500 users have been gathered


Unable to get last tweet timestamp


3600 users have been gathered
3700 users have been gathered


Unable to get last tweet timestamp


3800 users have been gathered
3900 users have been gathered


Unable to get last tweet timestamp


4000 users have been gathered
4100 users have been gathered
4200 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


4300 users have been gathered


Unable to get last tweet timestamp


4400 users have been gathered
4500 users have been gathered
4600 users have been gathered
4700 users have been gathered
4800 users have been gathered


Unable to get last tweet timestamp


4900 users have been gathered
5000 users have been gathered


Unable to get last tweet timestamp


5100 users have been gathered
5200 users have been gathered
5400 users have been gathered
5500 users have been gathered
5600 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


5700 users have been gathered
5800 users have been gathered
5900 users have been gathered


Unable to get last tweet timestamp


6000 users have been gathered
6100 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp
Unable to get last tweet timestamp


6200 users have been gathered
6300 users have been gathered


Unable to get last tweet timestamp


6400 users have been gathered


Unable to get last tweet timestamp


6500 users have been gathered
6600 users have been gathered


Unable to get last tweet timestamp


6700 users have been gathered
6800 users have been gathered
6900 users have been gathered
7000 users have been gathered


Unable to get last tweet timestamp


7100 users have been gathered


Unable to get last tweet timestamp


7200 users have been gathered


Unable to get last tweet timestamp


7300 users have been gathered
7400 users have been gathered
7500 users have been gathered
7600 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


7700 users have been gathered


Unable to get last tweet timestamp


7800 users have been gathered
7900 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


8000 users have been gathered
8100 users have been gathered
8200 users have been gathered
8300 users have been gathered


Unable to get last tweet timestamp


8400 users have been gathered


Unable to get last tweet timestamp


8500 users have been gathered


Unable to get last tweet timestamp


8600 users have been gathered


Unable to get last tweet timestamp


8700 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


8800 users have been gathered


Unable to get last tweet timestamp


8900 users have been gathered


Unable to get last tweet timestamp


9000 users have been gathered
9100 users have been gathered
9200 users have been gathered
9300 users have been gathered
9400 users have been gathered
9500 users have been gathered
9600 users have been gathered
9700 users have been gathered
9800 users have been gathered
9900 users have been gathered


Unable to get last tweet timestamp


10000 users have been gathered
10100 users have been gathered
10200 users have been gathered


Unable to get last tweet timestamp


10300 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


10400 users have been gathered
10500 users have been gathered
10600 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


10700 users have been gathered


Unable to get last tweet timestamp


10800 users have been gathered
10900 users have been gathered
11000 users have been gathered
11100 users have been gathered
11200 users have been gathered
11300 users have been gathered


Unable to get last tweet timestamp


11400 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


11500 users have been gathered


Unable to get last tweet timestamp


11600 users have been gathered
11700 users have been gathered
11800 users have been gathered
11900 users have been gathered
12000 users have been gathered
12100 users have been gathered


Unable to get last tweet timestamp


12200 users have been gathered


Unable to get last tweet timestamp


12300 users have been gathered


Unable to get last tweet timestamp


12400 users have been gathered
12500 users have been gathered
12600 users have been gathered


Unable to get last tweet timestamp


12700 users have been gathered
12800 users have been gathered


Unable to get last tweet timestamp


12900 users have been gathered
13000 users have been gathered
13100 users have been gathered
13200 users have been gathered
13300 users have been gathered


Unable to get last tweet timestamp


13400 users have been gathered
13500 users have been gathered


Unable to get last tweet timestamp


13600 users have been gathered


Unable to get last tweet timestamp


13700 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


13900 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


14000 users have been gathered


Unable to get last tweet timestamp


14200 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp
Unable to get last tweet timestamp


14300 users have been gathered
14400 users have been gathered
14500 users have been gathered
14600 users have been gathered
14700 users have been gathered
14800 users have been gathered


Unable to get last tweet timestamp


14900 users have been gathered
15000 users have been gathered
15100 users have been gathered
15200 users have been gathered


Unable to get last tweet timestamp


15300 users have been gathered
15400 users have been gathered
15500 users have been gathered
15600 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp
Unable to get last tweet timestamp


15700 users have been gathered
15800 users have been gathered
15900 users have been gathered
16000 users have been gathered


Unable to get last tweet timestamp


16100 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp
Unable to get last tweet timestamp


16200 users have been gathered


Unable to get last tweet timestamp


16300 users have been gathered
16400 users have been gathered


Unable to get last tweet timestamp


16500 users have been gathered


Unable to get last tweet timestamp


16600 users have been gathered


Unable to get last tweet timestamp


16800 users have been gathered


Unable to get last tweet timestamp


16900 users have been gathered
17000 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


17100 users have been gathered
17200 users have been gathered
17300 users have been gathered
17400 users have been gathered


Unable to get last tweet timestamp


17500 users have been gathered
17600 users have been gathered
17700 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


17900 users have been gathered
18100 users have been gathered
18200 users have been gathered


Unable to get last tweet timestamp


18300 users have been gathered


Unable to get last tweet timestamp


18400 users have been gathered


Unable to get last tweet timestamp


18500 users have been gathered


Unable to get last tweet timestamp


18600 users have been gathered
18700 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp
Unable to get last tweet timestamp


18800 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


18900 users have been gathered
19000 users have been gathered
19100 users have been gathered
19200 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


19300 users have been gathered
19400 users have been gathered
19500 users have been gathered


Unable to get last tweet timestamp


19600 users have been gathered


Unable to get last tweet timestamp


19700 users have been gathered


Unable to get last tweet timestamp


19800 users have been gathered
19900 users have been gathered
20000 users have been gathered


Unable to get last tweet timestamp


20100 users have been gathered
20200 users have been gathered
20300 users have been gathered
20400 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


20500 users have been gathered


Unable to get last tweet timestamp


20600 users have been gathered


Unable to get last tweet timestamp


20700 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


20800 users have been gathered


Unable to get last tweet timestamp


20900 users have been gathered


Unable to get last tweet timestamp


21100 users have been gathered
21200 users have been gathered


Unable to get last tweet timestamp


21300 users have been gathered


Unable to get last tweet timestamp


21400 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


21500 users have been gathered
21600 users have been gathered
21700 users have been gathered


Unable to get last tweet timestamp


21800 users have been gathered
21900 users have been gathered


Unable to get last tweet timestamp


22000 users have been gathered
22100 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


22200 users have been gathered
22300 users have been gathered


Unable to get last tweet timestamp


22400 users have been gathered


Unable to get last tweet timestamp


22500 users have been gathered
22600 users have been gathered
22700 users have been gathered


Unable to get last tweet timestamp


22800 users have been gathered
22900 users have been gathered


Unable to get last tweet timestamp


23000 users have been gathered
23100 users have been gathered
23200 users have been gathered
23300 users have been gathered


Unable to get last tweet timestamp


23400 users have been gathered


Unable to get last tweet timestamp


23500 users have been gathered


Unable to get last tweet timestamp


23600 users have been gathered
23700 users have been gathered
23800 users have been gathered


Unable to get last tweet timestamp


23900 users have been gathered


Unable to get last tweet timestamp


24000 users have been gathered
24100 users have been gathered
24200 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


24300 users have been gathered
24400 users have been gathered


Unable to get last tweet timestamp


24500 users have been gathered
24600 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


24700 users have been gathered


Unable to get last tweet timestamp


24900 users have been gathered


Unable to get last tweet timestamp


25000 users have been gathered
25100 users have been gathered
25200 users have been gathered


Unable to get last tweet timestamp


25300 users have been gathered
25400 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp
Unable to get last tweet timestamp


25500 users have been gathered
25600 users have been gathered
25800 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp
Unable to get last tweet timestamp
Unable to get last tweet timestamp


25900 users have been gathered
26100 users have been gathered
26200 users have been gathered


Unable to get last tweet timestamp


26300 users have been gathered


Unable to get last tweet timestamp


26400 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


26500 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


26600 users have been gathered
26700 users have been gathered
26800 users have been gathered
26900 users have been gathered


Unable to get last tweet timestamp


27000 users have been gathered
27100 users have been gathered
27200 users have been gathered
27300 users have been gathered


Unable to get last tweet timestamp


27400 users have been gathered


Unable to get last tweet timestamp


27500 users have been gathered


Unable to get last tweet timestamp


27600 users have been gathered
27700 users have been gathered
27800 users have been gathered
27900 users have been gathered
28000 users have been gathered


Unable to get last tweet timestamp


28100 users have been gathered
28200 users have been gathered
28300 users have been gathered
28400 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


28500 users have been gathered
28600 users have been gathered
28700 users have been gathered
28800 users have been gathered


Unable to get last tweet timestamp


28900 users have been gathered
29000 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


29100 users have been gathered


Unable to get last tweet timestamp


29200 users have been gathered
29300 users have been gathered


Unable to get last tweet timestamp


29400 users have been gathered


Unable to get last tweet timestamp


29500 users have been gathered


Unable to get last tweet timestamp


29600 users have been gathered
29700 users have been gathered
29800 users have been gathered
29900 users have been gathered
30000 users have been gathered


Unable to get last tweet timestamp


30100 users have been gathered


Unable to get last tweet timestamp


30200 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


30300 users have been gathered
30400 users have been gathered
30500 users have been gathered
30600 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


30700 users have been gathered


Unable to get last tweet timestamp


30800 users have been gathered
30900 users have been gathered
31000 users have been gathered
31100 users have been gathered
31200 users have been gathered
31300 users have been gathered


Unable to get last tweet timestamp


31400 users have been gathered
31500 users have been gathered


Unable to get last tweet timestamp


31600 users have been gathered


Unable to get last tweet timestamp


31700 users have been gathered


Unable to get last tweet timestamp


31800 users have been gathered


Unable to get last tweet timestamp


31900 users have been gathered


Unable to get last tweet timestamp


32000 users have been gathered
32100 users have been gathered
32200 users have been gathered


Unable to get last tweet timestamp


32300 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


32400 users have been gathered
32500 users have been gathered
32600 users have been gathered
32700 users have been gathered
32800 users have been gathered
32900 users have been gathered
33000 users have been gathered


Unable to get last tweet timestamp
Unable to get last tweet timestamp


33100 users have been gathered
33200 users have been gathered


Unable to get last tweet timestamp
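
Because `get_user` writes a checkpoint file named `{count}_tweet_random.csv` to `./data/users/` every 100 rows, a long run that stops early does not have to start over. A minimal sketch of how the most recent checkpoint could be located is below. The helper name `latest_checkpoint` is hypothetical, and this is an illustration rather than a step we ran.

```python
import re
from pathlib import Path

# Hypothetical helper: finds the checkpoint with the highest row count
def latest_checkpoint(folder='./data/users'):
    """Return (count, path) for the newest checkpoint, or None if none exist."""
    pattern = re.compile(r'^(\d+)_tweet_random\.csv$')
    found = []
    for path in Path(folder).glob('*_tweet_random.csv'):
        match = pattern.match(path.name)
        if match:
            # Compare counts as integers so 1000 sorts after 900
            found.append((int(match.group(1)), path))
    return max(found, key=lambda item: item[0]) if found else None
```

The returned path can be loaded with `pd.read_csv`, and the returned count shows how far the loop had progressed when that file was written.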


Now that we have user data, we'll do some more EDA then get to modeling.