The goal of this notebook is to scrape data from the SHERP Twitter list. Specifically, I want to go through each user on that list and collect publicly available information, such as follower counts and the number of tweets and retweets. I will then analyze that data to see which users stand out and whether there are disparities in the data.

In [30]:
import csv
import tweepy

In [19]:
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
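
The credentials above are left blank so they are not published with the notebook. One possible way to fill them in without typing them into a cell is to read them from environment variables. This is only a sketch: the variable names in `CREDENTIAL_VARS` and the helper `load_credentials` are hypothetical, not something Twitter or Tweepy requires.

```python
import os

# Hypothetical variable names; use whatever names you export in your shell.
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)

def load_credentials(environ=os.environ):
    """Return the four credentials as a tuple, or raise if any is missing or empty."""
    missing = [name for name in CREDENTIAL_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(environ[name] for name in CREDENTIAL_VARS)
```

With the variables set, the cell above could become `consumer_key, consumer_secret, access_token, access_token_secret = load_credentials()`.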

In [20]:
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

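# wait_on_rate_limit makes Tweepy sleep until the rate-limit window resets
# wait_on_rate_limit_notify is a Tweepy 3.x option; it was removed in Tweepy 4.0
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

# quick check that authentication works: print the tweets on my home timeline
public_tweets = api.home_timeline()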
for tweet in public_tweets:
    print(tweet.text)

RT @ayeletw: In July, a stranger broke into our home and attempted to assault me. I wrote an essay about the incident and about the dilemma…
Honestly a mantra we should all live by
Lots of scrubs this week, eh?
OMG I shouldn't have tweeted
Excellent graphic: Where Trump went (and who he was with) leading up to his coronavirus diagnosis https://t.co/B7KxZvX0PE via @politico
.@tomfriedman making so much sense on @CNN.
"a case study in irresponsibility and mismanagement" https://t.co/uOWVqKKCp9
Joan Marks saw a glaring need for counselors to help families face the complex emotional, ethical and legal issues… https://t.co/2PDRQhpFmU
Conspiracy theories like QAnon blossom in trying times, but today they are supercharged by the tools of our hyperco… https://t.co/vAhTcCGosE
“Follow the science,” Chris Wallace said. “If I could say one thing to all of the people out there watching: Forget… https://t.co/TPQrW3mJ5b
RT @kenvogel: NOTABLE: The lead reporter on this @nytimes story chronicling Covid

In [21]:
slug = 'SHERP-Tweeters'
list_owner = 'danfagin'

Code below is adapted from: https://gist.github.com/macloo/5c69cdf5294fa97eb41d6ad950233cee

In [43]:
def get_list_members(api, owner, slug):
    members = []
    # without the Cursor you only get the first 20 list members
    # .items() yields one member at a time, across all pages
    for member in tweepy.Cursor(api.list_members, owner, slug).items():
        members.append(member)
    # create a list containing all usernames
    return [ m.screen_name for m in members ]

In [44]:
# create new CSV file and add column headings
def create_csv(filename, usernames):
    # newline='' stops the csv module from writing blank lines on Windows
    # utf-8 keeps non-ASCII names and bios intact
    with open(filename, 'w', newline='', encoding='utf-8') as csvfile:
        c = csv.writer(csvfile)
        # write the header row for CSV file
        c.writerow( [ "name",
                    "display_name",
                    "bio",
                    "favorites_count",
                    "followers_count",
                    "list_memberships",
                    "tweets_retweets_count",
                    "following_count",
                    "acct_created",
                    "location" ] )
        # add each member to the csv
        for name in usernames:
            user_info = get_userinfo(name)
            c.writerow( user_info )
    # leaving the with block closes and saves the CSV

In [45]:
def get_userinfo(name):
    # get all user data via a Tweepy API call
    user = api.get_user(screen_name = name)
    # create row data as a list
    # I am collecting info based on Twitter API: https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/overview/user-object
    # values stay as str: in Python 3, .encode('utf-8') would put b'...' literals in the CSV
    user_info = [ name,
                user.name,
                user.description,
                user.favourites_count,
                user.followers_count,
                user.listed_count,
                user.statuses_count,
                user.friends_count,
                user.created_at,
                user.location ]
    # send that one row back
    return user_info
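
`get_list_members` already receives user objects from `api.list_members` (it reads `screen_name` from each one), so a possible variation is to build each row directly from those objects instead of making one `api.get_user` call per name. The following is a minimal sketch, assuming the list-member objects carry the same fields as the `get_user` result. `user_to_row` is a hypothetical helper, and the `SimpleNamespace` with made-up values stands in for a Tweepy user so the example runs without network access.

```python
from types import SimpleNamespace

def user_to_row(user):
    """Build one CSV row from a user-like object (hypothetical helper)."""
    return [
        user.screen_name,
        user.name,
        user.description,
        user.favourites_count,
        user.followers_count,
        user.listed_count,
        user.statuses_count,
        user.friends_count,
        user.created_at,
        user.location,
    ]

# Stand-in for a Tweepy user object; every value here is invented for illustration.
example_user = SimpleNamespace(
    screen_name="example_user",
    name="Example User",
    description="Science writer",
    favourites_count=10,
    followers_count=250,
    listed_count=3,
    statuses_count=1200,
    friends_count=180,
    created_at="2015-01-01",
    location="New York",
)

row = user_to_row(example_user)
```

If that assumption holds, this would cut the number of API requests from one per member to one per page of members, which matters under rate limits.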

In [47]:
def main():
    # provide name for new CSV
    filename = "SHERP_tweeter_data.csv"
    # create list of all members of the Twitter list
    usernames = get_list_members(api, list_owner, slug)
    # create new CSV and fill it
    create_csv(filename, usernames)
    # tell us how many we got
    print("Number of rows should be %d, plus the header row." % len(usernames))

In [48]:
if __name__ == '__main__':
    main()

Number of rows should be 268, plus the header row.
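
The printed count can be checked by reading the file back. Counting lines in a text editor can be misleading, because a bio that contains a line break spans several lines of the file while still being one CSV row. A small sketch, assuming the file was written as UTF-8; `count_data_rows` is a hypothetical helper name.

```python
import csv

def count_data_rows(filename):
    """Count CSV rows after the header; quoted line breaks inside a field do not add rows."""
    with open(filename, newline='', encoding='utf-8') as csvfile:
        reader = csv.reader(csvfile)
        next(reader, None)  # skip the header row (no error if the file is empty)
        return sum(1 for _ in reader)
```

Calling `count_data_rows("SHERP_tweeter_data.csv")` after `main()` should give the same number that `main()` printed.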
