# Check (That Tweet) Yo Self 
## Prioritizing Tweets to Fact Check
###### Part 9: Application

For the last part of this project, we're going to build an app so others can go through this process themselves and find tweets to fact check.

In [1]:
from tweetscrape.search_tweets import TweetScrapperSearch
from tweetscrape.users_scrape import TweetScrapperUser
import pandas as pd
import numpy as np
import pickle
import time

First, gather the desired tweets based on keyword, date, and number of total tweets to pull:

In [2]:
def get_tweets(word, date, number):
    # Build the CSV path once so the scraper and the reader use the same file
    csv_path = f'../data/{date}_{word}_{number}tweets.csv'

    # Gather the tweets and export them to a CSV
    tweet_scrapper = TweetScrapperSearch(search_all=word, search_till_date=date,
                                         num_tweets=number, tweet_dump_path=csv_path,
                                         tweet_dump_format='csv')
    tweet_count, tweet_id, tweet_time, dump_path = tweet_scrapper.get_search_tweets()

    # Read the CSV back in as a DataFrame and drop repeated rows
    tweets = pd.read_csv(csv_path)
    tweets = tweets.drop_duplicates()

    # Renumber the rows without keeping the old index as an extra column
    tweets = tweets.reset_index(drop=True)

    return tweets

Next, look up each author's profile information and add it to the DataFrame:

In [3]:
def get_user(df):
    # Text fields start as missing so a failed lookup is not counted as "present"
    # by the notna() checks in find_clusters; count fields keep the 0 default
    for col in ['user_bio', 'user_location', 'user_url']:
        df[col] = None
    for col in ['user_tweets', 'user_following', 'user_followers', 'user_favorites']:
        df[col] = 0
    # Look up each author once, even when they have several tweets in the DataFrame
    for user in df['author'].unique():
        try:
            ts = TweetScrapperUser(user)
            user_info = ts.get_profile_info()
            for key in ['bio', 'location', 'url', 'tweets', 'following', 'followers', 'favorites']:
                df.loc[df['author'] == user, f'user_{key}'] = user_info[key]
        except Exception:
            # Skip authors whose profile cannot be scraped; their defaults stay in place
            continue
    df.to_csv('../data/tweets_users_app.csv', index=False)
    return df
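
The lookup step makes one network request per author, so it helps to fetch each unique author only once and to record failures explicitly. The sketch below illustrates that pattern in plain Python. The names `fetch_profiles` and `fake_fetch` are hypothetical stand-ins introduced for this example (they are not part of `tweetscrape`), and the profile dictionary is made-up data.

```python
def fetch_profiles(authors, fetch_profile):
    """Return {author: profile or None}, calling fetch_profile once per unique author."""
    profiles = {}
    for author in authors:
        if author in profiles:
            continue  # already looked up (or already failed) for this author
        try:
            profiles[author] = fetch_profile(author)
        except Exception:
            profiles[author] = None  # record the failure instead of silently passing
    return profiles


# Hypothetical stand-in for a real scraper call; it records every invocation.
calls = []

def fake_fetch(author):
    calls.append(author)
    if author == 'broken_user':
        raise ValueError('profile unavailable')
    return {'followers': 10, 'following': 5}


profiles = fetch_profiles(['a', 'b', 'a', 'broken_user', 'b'], fake_fetch)
print(len(calls))               # number of lookups actually made
print(profiles['broken_user'])  # failed lookups are stored as None
```

Keeping the failures in the dictionary makes it possible to report how many authors could not be scraped, which a bare `pass` hides.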

Below are a few helpful functions that engineer features used for modeling.

In [4]:
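def get_ratio(followers, following):
    # Treat "following nobody" as following one account to avoid dividing by zero
    if following == 0:
        following = 1
    # An account with no followers always has a ratio of 0
    if followers == 0:
        return 0
    return round(int(followers) / int(following), 2)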
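
The edge cases of the follower-to-following ratio are easy to get wrong, so a quick check on a few hand-picked pairs is worthwhile. This is a minimal, self-contained sketch; `follower_ratio` is a hypothetical name used only here, and the input pairs are invented for illustration.

```python
def follower_ratio(followers, following):
    """Followers per followed account, rounded to two decimals."""
    followers = int(followers)
    following = int(following)
    if following == 0:
        following = 1  # avoid dividing by zero
    return round(followers / following, 2)


# (followers, following) pairs covering the normal case and the zero cases
examples = [(200, 100), (50, 0), (0, 30), (0, 0), (1, 3)]
ratios = [follower_ratio(m, n) for m, n in examples]
print(ratios)
```

Every pair returns a number, including the ones where either count is zero, so the resulting column never contains a missing value.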

In [5]:
def make_url(author, idd):
    return f'https://twitter.com/{author}/status/{idd}'

In [6]:
def change_time(x):
    x = x / 1000
    return time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(x))
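
The timestamps arrive in milliseconds, which is why they are divided by 1000 before formatting. Because `time.localtime` depends on the machine's time zone, the same tweet can display a different time on different machines. The sketch below shows one possible alternative that always formats in UTC; `ms_to_utc_string` is a hypothetical helper, and the input values are illustrative rather than taken from the scraped data.

```python
from datetime import datetime, timezone


def ms_to_utc_string(ms):
    """Format a millisecond Unix timestamp as 'YYYY-MM-DD HH:MM:SS' in UTC."""
    seconds = ms / 1000
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime('%Y-%m-%d %H:%M:%S')


epoch = ms_to_utc_string(0)
midnight = ms_to_utc_string(1589155200000)
later = ms_to_utc_string(1589155200000 + 3723000)  # 1 h 2 min 3 s after midnight
print(epoch, midnight, later, sep='\n')
```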

In [7]:
def to_check(x):
    if (x == 0) or (x == 3) or (x == 1):
        return 'Check this tweet!'
    else:
        return 'Not a priority.'

After all the features are created, scale the data and generate predictions.

In [8]:
def find_clusters(df):
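    # Load the fitted model and scaler; the with blocks close the files afterwards
    with open('../models/knn.pkl', 'rb') as f:
        knn = pickle.load(f)
    with open('../models/standardscaler.pkl', 'rb') as f:
        ss = pickle.load(f)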

    # Flag whether each profile field is present, using that field's own column
    df['has_bio'] = df['user_bio'].notna().astype(int)
    df['has_location'] = df['user_location'].notna().astype(int)
    df['has_url'] = df['user_url'].notna().astype(int)
    df['ratio'] = [get_ratio(m, n) for m, n in zip(df['user_followers'], df['user_following'])]

    to_cluster = df[['user_tweets', 'user_following', 'user_followers',
             'ratio', 'has_url', 'has_location', 'has_bio']]

    to_cluster = to_cluster.fillna(0)
    z_cluster = ss.transform(to_cluster)
    df['groups'] = knn.predict(z_cluster)
    
    df['to_check'] = df['groups'].apply(to_check)
    
    df['time'] = df['time'].apply(change_time)
    df['tweet_url'] = [make_url(author, idd) for author, idd in zip(df['author'], df['id'])]

    df = df.sort_values('to_check')
    df.index = np.arange(1, len(df) + 1)
    columns = ['time', 'author', 'text', 'reply_count', 'favorite_count', 'retweet_count', 'tweet_url', 'groups','to_check']

    return df[columns]
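
To make the scale-then-predict step concrete, here is a pure-Python sketch of the same idea: standardize each feature with the training mean and population standard deviation, then label a new row by majority vote among its nearest neighbors. It is an illustration only, not the pickled model from the earlier parts. The two-feature toy data, the labels, and the helper names `fit_scaler`, `scale`, and `knn_predict` are all assumptions made for this example.

```python
from collections import Counter
from math import dist
from statistics import fmean, pstdev


def fit_scaler(rows):
    """Return per-column means and population standard deviations."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Constant columns get a scale of 1 so the division below is always safe
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def scale(row, means, stds):
    """Z-score one row with previously fitted statistics."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def knn_predict(train_rows, train_labels, row, k=3):
    """Majority label among the k training rows closest to row."""
    nearest = sorted(zip(train_rows, train_labels),
                     key=lambda pair: dist(pair[0], row))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy training data: (user_followers, ratio) with made-up group labels
train = [(10, 0.1), (20, 0.2), (15, 0.15), (5000, 50.0), (6000, 40.0), (5500, 45.0)]
labels = [0, 0, 0, 1, 1, 1]

means, stds = fit_scaler(train)
scaled_train = [scale(r, means, stds) for r in train]

small_account = knn_predict(scaled_train, labels, scale((30, 0.3), means, stds))
large_account = knn_predict(scaled_train, labels, scale((5800, 42.0), means, stds))
print(small_account, large_account)
```

New rows must be scaled with the statistics fitted on the training data, which is why `find_clusters` loads the saved scaler rather than fitting a new one on the scraped tweets.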

Final function to put it all together:

In [9]:
def put_it_together(word, date, number):
    tweets = get_tweets(word, date, number)
    users = get_user(tweets)
    to_check = find_clusters(users)
    return to_check

Sample call of the function with new inputs:

In [10]:
df = put_it_together('vaccine', '2020-05-12',  50)
df

| | time | author | text | reply_count | favorite_count | retweet_count | tweet_url | groups | to_check |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 2020-05-11 16:58:06 | taylorfd987 | WHEN IT COMES TO VACCINES,,,,,,,MY BODY MY CHO... | 0 | 0 | 0 | https://twitter.com/taylorfd987/status/1259996... | 1 | Check this tweet! |
| 2 | 2020-05-11 16:58:50 | TreyFlint | Will directives be enforced? A bandana is not ... | 0 | 0 | 0 | https://twitter.com/TreyFlint/status/125999641... | 1 | Check this tweet! |
| 3 | 2020-05-11 16:57:46 | Johnmcs90628020 | Trump coronavirus vaccine goal 'amazingly ambi... | 0 | 0 | 0 | https://twitter.com/Johnmcs90628020/status/125... | 1 | Check this tweet! |
| 4 | 2020-05-11 16:57:49 | Renee62540473 | No to contact tracing!!! No to House Bill 6666... | 2 | 1 | 0 | https://twitter.com/Renee62540473/status/12599... | 1 | Check this tweet! |
| 5 | 2020-05-11 16:59:28 | 1984patriot | Only an idiot would take a Gates backed vaccin... | 0 | 1 | 0 | https://twitter.com/1984patriot/status/1259996... | 1 | Check this tweet! |
| 6 | 2020-05-11 16:58:16 | Heidi559Heidi | I think the words you're looking for are vacci... | 0 | 0 | 0 | https://twitter.com/Heidi559Heidi/status/12599... | 1 | Check this tweet! |
| 7 | 2020-05-11 16:58:02 | _don_collins | Good thing Trump and Pence already have had th... | 0 | 0 | 0 | https://twitter.com/_don_collins/status/125999... | 1 | Check this tweet! |
| 8 | 2020-05-11 16:58:37 | TUSD_Dad | Ok Street Rat.. once you’ve tested everybody. ... | 1 | 2 | 0 | https://twitter.com/TUSD_Dad/status/1259996364... | 2 | Not a priority. |
| 9 | 2020-05-11 16:58:30 | majordemo | COVID-19 cases won’t peak until the government... | 0 | 0 | 0 | https://twitter.com/majordemo/status/125999633... | 2 | Not a priority. |
| 10 | 2020-05-11 16:58:29 | TeresaA123 | Completely different family of viruses, comple... | 0 | 2 | 0 | https://twitter.com/TeresaA123/status/12599963... | 2 | Not a priority. |
