# A Cleaner Guide to Twitter 🙄

I blasted through the Twitter content the other day, and my code had a few issues I didn't notice until I was already talking. Here is a cleaner walkthrough of working with user data from the REST API that should be easier to follow.

John 

In [1]:
import tweepy 
import config_twitter
import pandas as pd
import json

In [2]:
accounts = pd.Series(['Jujou_24', '@liamerox', 'I_am_a_fake_account'])

Let's retrieve the data from Twitter and store it. That way we don't need to keep making API requests and risk getting rate limited. We can re-run this code when we want to update our data (but let's not overwrite the old data when we do that!)

In [3]:
for each in accounts:
    print(each)

Jujou_24
@liamerox
I_am_a_fake_account


Now let's authenticate with the Twitter API. 

In [4]:
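# OAuthHandler takes the consumer (API) key and the consumer (API) secret, in that order.
auth = tweepy.OAuthHandler(config_twitter.API_KEY, config_twitter.API_TOKEN)
auth.set_access_token(config_twitter.ACCESS_TOKEN, config_twitter.ACCESS_TOKEN_SECRET)
# Written against Tweepy 3.x: wait_on_rate_limit_notify and compression were dropped in Tweepy 4.0.
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True, compression=True)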
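If you don't have a `config_twitter` module yet, here is one way it could look. This is a guess at the layout, not the file used here: the four attribute names come from the cell above, while the environment variable names are made up for illustration.

```python
# config_twitter.py: hypothetical layout, not the file used in this notebook.
import os

# The environment variable names below are assumptions; use whatever you exported.
API_KEY = os.environ.get("TWITTER_API_KEY", "")
API_TOKEN = os.environ.get("TWITTER_API_SECRET", "")  # the consumer secret
ACCESS_TOKEN = os.environ.get("TWITTER_ACCESS_TOKEN", "")
ACCESS_TOKEN_SECRET = os.environ.get("TWITTER_ACCESS_TOKEN_SECRET", "")

# Names of any credentials that are still empty, so problems show up early.
credentials = {
    "API_KEY": API_KEY,
    "API_TOKEN": API_TOKEN,
    "ACCESS_TOKEN": ACCESS_TOKEN,
    "ACCESS_TOKEN_SECRET": ACCESS_TOKEN_SECRET,
}
missing = [name for name, value in credentials.items() if not value]
```

Keeping the keys in environment variables rather than in the file itself means the module can be shared without leaking credentials.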

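Then we can query the API for each user and store their data. First, though, the account names need a little cleaning: a leading `@` has to go, and lowercasing keeps everything consistent.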

In [5]:
def clean_username(user_name):
    if user_name.startswith('@'):
        return user_name[1:].lower()
    else:
        return user_name.lower()

In [6]:
accounts = [clean_username(user) for user in accounts]
accounts

['jujou_24', 'liamerox', 'i_am_a_fake_account']

We now have our user account names ready for our Twitter query. Invalid accounts will still throw an error when we query them, so the loop below catches the exception and keeps going.
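Some bad names can be spotted before making any request. The sketch below assumes Twitter handles are 1 to 15 characters long and use only letters, digits, and underscores; `looks_like_handle` is a helper invented here, and it only checks the format, not whether the account exists.

```python
import re

# Assumption: handles are 1-15 characters of letters, digits, or underscores.
HANDLE_PATTERN = re.compile(r"[A-Za-z0-9_]{1,15}")


def looks_like_handle(user_name):
    """Return True if user_name could be a Twitter handle, judged by format only."""
    return HANDLE_PATTERN.fullmatch(user_name) is not None


candidates = ['jujou_24', 'liamerox', 'i_am_a_fake_account']
plausible = [name for name in candidates if looks_like_handle(name)]
implausible = [name for name in candidates if not looks_like_handle(name)]
```

A name that passes this check can still fail at the API (deleted, suspended, or never registered), so the error handling in the query loop is still needed.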

In [7]:
collected = []
not_collected = []
status = []

for user in accounts:
    try:
        user_data = api.get_user(user)
        collected.append(user_data)
        print('Finished collecting user data for ***{}***. Added account name to collected list and updated status list.\n'.format(user))
        status.append(True)
    except Exception:  # a bare except would also swallow KeyboardInterrupt
        print("Encountered a problem with ***{}***. Added account name to not_collected list and updated status list. Moving on through the list.\n".format(user))
        not_collected.append(user)
        status.append(False)

Finished collecting user data for ***jujou_24***. Added account name to collected list and updated status list.

Finished collecting user data for ***liamerox***. Added account name to collected list and updated status list.

Encountered a problem with ***i_am_a_fake_account***. Added account name to not_collected list and updated status list. Moving on through the list.



Now that we have collected the data from these users, we do not need to query the Twitter API *unless we want to update our dataset*. We can store this data and speed up our analysis workflow by avoiding waiting on Twitter to send us new data all the time.
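One way to store the raw data without clobbering an earlier run is to write each collection to its own timestamped JSON file. This is only a sketch: `save_raw_users` and the file naming scheme are invented here, and the sample record is a trimmed stand-in for the dictionaries Tweepy keeps on each `User` object.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def save_raw_users(raw_users, out_dir):
    """Write a list of plain dicts to a new timestamped JSON file and return its path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    out_path = out_dir / f"users_{stamp}.json"
    # Mode "x" raises FileExistsError instead of overwriting an existing file.
    with open(out_path, "x", encoding="utf-8") as handle:
        json.dump(raw_users, handle, ensure_ascii=False, indent=2)
    return out_path


# Stand-in for the real records, trimmed to a few fields.
raw_users = [{'id': 216058101, 'screen_name': 'Jujou_24', 'location': 'Suisse'}]

# A temporary directory keeps the demo from touching your project folder.
with tempfile.TemporaryDirectory() as tmp_dir:
    saved_path = save_raw_users(raw_users, tmp_dir)
    with open(saved_path, encoding="utf-8") as handle:
        reloaded = json.load(handle)
```

In this notebook the call might look like `save_raw_users([user._json for user in collected], 'data/raw')`, where `_json` is the private attribute that shows up when a `User` object is printed and `data/raw` is a hypothetical folder.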

In [8]:
collected[0]

User(_api=<tweepy.api.API object at 0x7f88c3b43ac8>, _json={'id': 216058101, 'id_str': '216058101', 'name': 'Julia', 'screen_name': 'Jujou_24', 'location': 'Suisse', 'profile_location': None, 'description': 'Woman | Mother | Daughter | Sister | Friend', 'url': None, 'entities': {'description': {'urls': []}}, 'protected': False, 'followers_count': 27, 'friends_count': 288, 'listed_count': 0, 'created_at': 'Mon Nov 15 17:48:24 +0000 2010', 'favourites_count': 10614, 'utc_offset': None, 'time_zone': None, 'geo_enabled': False, 'verified': False, 'statuses_count': 1074, 'lang': None, 'status': {'created_at': 'Fri Mar 06 09:37:53 +0000 2020', 'id': 1235862154521112577, 'id_str': '1235862154521112577', 'text': 'RT @odairannies: the way anastasia was my childhood icon https://t.co/KwUVpfZ2Ka', 'truncated': False, 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [{'screen_name': 'odairannies', 'name': 'ellie', 'id': 892544486206734336, 'id_str': '892544486206734336', 'indices': [3,

In [9]:
len(collected)

2

In [10]:
len(not_collected)

1

In [11]:
len(status)

3

In [12]:
status

[True, True, False]
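Because `status` lines up one-to-one with `accounts`, the two can be paired to see at a glance which lookups failed. A small sketch, with the lists retyped from the outputs above and `outcome` and `failed` as names made up for this example:

```python
import pandas as pd

# Retyped from the outputs above so the example runs on its own.
accounts = ['jujou_24', 'liamerox', 'i_am_a_fake_account']
status = [True, True, False]

# One row per account, with a flag for whether the lookup worked.
outcome = pd.DataFrame({'account': accounts, 'collected': status})

# Accounts whose lookup failed.
failed = outcome.loc[~outcome['collected'], 'account'].tolist()
```

This gives the same names as `not_collected`, but keeps successes and failures in one place, which is handy if the table is saved alongside the data.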

Let's define a function that gets the data we want from each user whose data is stored in the `collected` list. 

In [13]:
def get_the_interesting_shit(user_account):
    # here is where we will put our user data 
    user_dict = {}
    
    user_dict['unique_id_number'] = user_account.id
    user_dict['screen_name'] = user_account.screen_name
    user_dict['description'] = user_account.description
    user_dict['location'] = user_account.location
    user_dict['followers_num'] = user_account.followers_count
    user_dict['friends_num'] = user_account.friends_count
    user_dict['protection_boolean'] = user_account.protected
    
    return user_dict    
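The same fields can also be pulled from a raw user dictionary, which is useful when working from stored data instead of live `User` objects. This is a sketch, not a replacement for the function above: `FIELD_MAP` and `get_fields_from_raw` are names invented here, and the sample record is trimmed from the `collected[0]` output.

```python
# Maps the column names used in this notebook to the keys in the raw user record.
FIELD_MAP = {
    'unique_id_number': 'id',
    'screen_name': 'screen_name',
    'description': 'description',
    'location': 'location',
    'followers_num': 'followers_count',
    'friends_num': 'friends_count',
    'protection_boolean': 'protected',
}


def get_fields_from_raw(raw_user):
    """Build one row from a raw user dict; keys that are absent become None."""
    return {ours: raw_user.get(theirs) for ours, theirs in FIELD_MAP.items()}


# Trimmed copy of the record shown in the collected[0] output.
raw_user = {
    'id': 216058101,
    'screen_name': 'Jujou_24',
    'description': 'Woman | Mother | Daughter | Sister | Friend',
    'location': 'Suisse',
    'followers_count': 27,
    'friends_count': 288,
    'protected': False,
}
row = get_fields_from_raw(raw_user)
```

Using `.get` means a record with a missing key produces `None` for that column rather than stopping the whole run.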

In [14]:
the_user_data = []

for account in collected:
    the_user_data.append(get_the_interesting_shit(account))

We now have a list of dicts. That can go straight into a pandas `DataFrame`.

In [15]:
the_user_data

[{'unique_id_number': 216058101,
  'screen_name': 'Jujou_24',
  'description': 'Woman | Mother | Daughter | Sister | Friend',
  'location': 'Suisse',
  'followers_num': 27,
  'friends_num': 288,
  'protection_boolean': False},
 {'unique_id_number': 1912214276,
  'screen_name': 'liamerox',
  'description': '',
  'location': 'Gland bord du lac LEMAN ',
  'followers_num': 16,
  'friends_num': 92,
  'protection_boolean': False}]

In [16]:
df = pd.DataFrame(the_user_data)
df

| | unique_id_number | screen_name | description | location | followers_num | friends_num | protection_boolean |
|---|---|---|---|---|---|---|---|
| 0 | 216058101 | Jujou_24 | Woman \| Mother \| Daughter \| Sister \| Friend | Suisse | 27 | 288 | False |
| 1 | 1912214276 | liamerox | | Gland bord du lac LEMAN | 16 | 92 | False |


In [17]:
df.to_csv('data/users.csv', index=False)
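One thing to watch: `to_csv` will not create the `data/` folder for you, so the cell above fails if the folder is missing. A minimal sketch of a safer save follows; `save_users_csv` is a helper made up for this example, and the one-row frame is a stand-in for `df`.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_users_csv(frame, csv_path):
    """Create the parent folder if needed, write the CSV, and return its path."""
    csv_path = Path(csv_path)
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    frame.to_csv(csv_path, index=False)
    return csv_path


# Stand-in for df, trimmed to a few columns.
demo_df = pd.DataFrame([
    {'unique_id_number': 216058101, 'screen_name': 'Jujou_24', 'followers_num': 27},
])

# Write into a temporary folder and read the file back to check the round trip.
with tempfile.TemporaryDirectory() as tmp_dir:
    saved = save_users_csv(demo_df, Path(tmp_dir) / 'data' / 'users.csv')
    round_trip = pd.read_csv(saved)
```

In the notebook itself this would be `save_users_csv(df, 'data/users.csv')`. Note that it still overwrites an existing `users.csv`, so add a date to the file name if you want to keep older pulls.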