# ReThink Media Twitter API: Tutorial and Examples

This notebook will provide a user manual and example use cases for using ReThink Media's Twitter API functions. The functions in this notebook will provide the capabilities to:
- Search Tweets relevant to a query, over different time periods
- Save Tweets and Tweet metadata to a .csv file for later reference and use
- Create wordclouds for frequent keywords and hashtags
- Create plots of Tweet counts over time, with adjustable titles and axes

As an example use case for these functions, this notebook will compare the discussions around the coming out of two transgender celebrities: Caitlin Jenner and Elliot Page.

## Defining Functions

The first part of this notebook is dedicated to defining and explaining the functions mentioned above, with the example use case to follow.

### Authentication & Utility Functions

These functions are utility functions that are embedded within the main ones, and must be initialized before the others are used. Run the cells below before running the other functions.

**IMPORTANT NOTE:** The Twitter API requires API keys and other authentication tokens in order to function properly. A user must have a Twitter Developer account with these keys available in order to use the functions in this notebook. If you have these keys available, create a text file named `.env` in the home folder for your notebook environment with the following format:

```
API_KEY="your_api_key"
API_KEY_SECRET="your_secret_api_key"
BEARER_TOKEN="your_bearer_token"
ACCESS_TOKEN="your_access_token"
ACCESS_SECRET="your_secret_access_token"
```

In [7]:
# function to initialize Twitter API v1.1 instance (for 30-day and full archive search)
def init_api_1():
    
    # importing necessary modules and loading .env file
    from dotenv import load_dotenv
    import os
    import tweepy
    load_dotenv()
    
    # retrieving environment variables from .env file
    consumer_key = os.getenv("API_KEY")
    consumer_secret = os.getenv("API_KEY_SECRET")
    bearer_token = os.getenv("BEARER_TOKEN")
    access_token = os.getenv("ACCESS_TOKEN")
    access_secret = os.getenv("ACCESS_SECRET")
    
    # Twitter API authentication
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    
    # instantiating Twitter API v1.1 reference
    api_1 = tweepy.API(auth, wait_on_rate_limit=True)
    
    return api_1

In [33]:
# function to initialize Twitter API v2 instance (for 7-day search)
def init_api_2():
    # importing necessary modules and loading .env file
    from dotenv import load_dotenv
    import os
    import tweepy
    load_dotenv()
    
    # retrieving environment variables from .env file
    consumer_key = os.getenv("API_KEY")
    consumer_secret = os.getenv("API_KEY_SECRET")
    bearer_token = os.getenv("BEARER_TOKEN")
    access_token = os.getenv("ACCESS_TOKEN")
    access_secret = os.getenv("ACCESS_SECRET")
    
    # instantiating Twitter API v2 reference
    api_2 = tweepy.Client(bearer_token=bearer_token,
                         consumer_key=consumer_key,
                         consumer_secret=consumer_secret,
                         access_token=access_token,
                         access_token_secret=access_secret,
                         wait_on_rate_limit=True)
    
    return api_2

In [3]:
# function to parse Twitter API v2 response into a DataFrame of Tweet data
def tweet_df(df, response, tweet_fields):
    
    users = response.includes['users']
    user_data = {user['id']: [user['public_metrics']['followers_count'], user['verified']] for user in users}
        
    # looping through each Tweet in response, parsing data
    for i in range(len(response.data)):
        tweet = response.data[i]
        tweet_id = tweet.id
        tweet_data = {}
        for field in tweet_fields:
            if tweet[field]:
                tweet_data[field] = tweet[field]
                
                # extracting hashtags from "entities" field and adding it as its own column
                if field == "entities":
                    try:
                        hashtag_data = tweet[field]['hashtags']
                        hashtags = [hashtag['tag'] for hashtag in hashtag_data]
                        tweet_data['entities_hashtags'] = hashtags
                    except KeyError:
                        tweet_data['entities_hashtags'] = None
                
                # separating metrics from "public_metrics" field and adding them as their own column
                if field == "public_metrics":
                    metrics = list(tweet[field].keys())
                    for metric in metrics:
                        tweet_data[metric] = tweet[field][metric]
                
            else:
                tweet_data[field] = None
                if field == "entities":
                    tweet_data['entities_hashtags'] = None
        
        # adding user data to DataFrame
        user = user_data[tweet['author_id']]
        tweet_data['followers_count'] = user[0]
        tweet_data['verified'] = user[1]
        
        df.loc[tweet_id] = tweet_data
    
    return df

### Tweet Search Functions

The Twitter API has different limits on how many API requests a user can make and how many Tweets they can receive, depending on how far back the user wants to search. For this reason, there are three different Tweet search functions, and the user should choose the function that best fits their use case:

- `search_7()`: Search Tweets within the past 7 days. Unlimited API requests, 500,000 Tweets per month.
- `search_30()`: Search Tweets within the past 30 days. 250 API requests, 25,000 Tweets per month.
- `search_full()`: Search Tweets from the full archive. 50 API requests, 5,000 Tweets per month.

The Twitter API also has a limit of 100 API requests per 15-minute interval, regardless of which function is used. If the quota runs out, the functions will wait until the time limit resets, and then continue collecting Tweets.

The arguments for these functions are:
- `query`: The query to search the Twitter API for
- `start_date`: The date to start the search (default `None`). If `None`, the function will default to 7 days ago.
- `end_date`: The date to end the search (default `None`). If `None`, the function will default to today.
- `max_results`: The maximum amount of Tweets to return in the DataFrame (default 20).
- `write_csv`: Boolean, whether to save the DataFrame as a csv file or not. Default `False`.
- `filename`: Filename for the csv if `write_csv` is `True`. Default name is `search_7.csv`, `search_30.csv`, or `search_full.csv`, depending on the function used.

In [4]:
# function to retrieve Tweets from the past 7 days relevant to a query
def search_7(query, start_date=None, end_date=None, max_results=20, write_csv=False, filename="search_7.csv"):
    
    # initializing API v1.1 instance
    api_2 = init_api_2()
    
    # parsing dates passed into function
    from dateutil import parser
    from datetime import datetime
    if start_date:
        start_date = parser.parse(start_date)
        start_date = start_date.strftime("%Y%m%d%H%M")
    if end_date:
        end_date = parser.parse(end_date)
        end_date = end_date.strftime("%Y%m%d%H%M")
    
    # setting Tweet and user data to be included in response
    tweet_fields = ["text", "attachments", "author_id", "context_annotations", "conversation_id", "created_at",
                   "entities", "geo", "in_reply_to_user_id", "lang", "public_metrics", "referenced_tweets"]
    user_fields = ["public_metrics", "verified"]
    
    # initializing variables for API calls and DataFrame for Tweet data
    import pandas as pd
    next_token = None
    num_tweets = 0
    tweets = pd.DataFrame(columns=tweet_fields+['followers_count', 'verified']+
                          ['entities_hashtags','retweet_count','reply_count','like_count','quote_count'])
    tweets.index.name = "Tweet ID"
    
    # making my own pagination loop to further examine the rate limit
    num_loops = 0
    while num_tweets < max_results:
        
        # the API only retrieves between 10 and 100 Tweets per call
        # NOTE: number of API results isn't consistent. max_results=100 doesn't guarantee 100 Tweets in response
        if max_results - num_tweets >= 100:
            num_results = 100
        else:
            num_results = max_results - num_tweets if max_results - num_tweets > 10 else 10
        
        # calling API and searching Tweets over past 7 days
        response = api_2.search_recent_tweets(f"{query} lang:en", 
                                              start_time=start_date,
                                              end_time=end_date,
                                              max_results=num_results,
                                              next_token=next_token,
                                              tweet_fields=tweet_fields,
                                              expansions='author_id',
                                              user_fields=user_fields)
        
        # setting variables for the next loop
        try:
            next_token = response[3]['next_token']
        except KeyError:
            next_token = None
        num_tweets += len(response.data)
        num_loops += 1
        
        # adding Tweet data to DataFrame
        tweets = tweet_df(tweets, response, tweet_fields)
        
    # dropping "public_metrics" since all the values are unpacked, adding "total_engagements"
    tweets.drop('public_metrics', axis=1, inplace=True)
    total_engagements = tweets["retweet_count"] + tweets["reply_count"] + tweets["like_count"] + tweets["quote_count"]
    tweets["total_engagements"] = total_engagements
        
    # writing Tweet DataFrame to csv file
    if write_csv:
        tweets.to_csv(filename)
    
    return tweets

In [48]:
# function to search Tweets within the past 30 days
# utilizes both API v1.1 and v2 to be consistent with 7-day search.
def search_30(query, start_date=None, end_date=None, max_results=20, write_csv=False, filename="search_30.csv"):
    # initializing API v1.1 instance
    api_1 = init_api_1()
    
    # parsing dates passed into function
    from dateutil import parser
    from datetime import datetime
    if start_date:
        start_date = parser.parse(start_date)
        start_date = start_date.strftime("%Y%m%d%H%M")
    if end_date:
        end_date = parser.parse(end_date)
        end_date = end_date.strftime("%Y%m%d%H%M")
    
    # retrieving Tweets from the past 30 days relevant to query using tweepy's pagination function
    import tweepy
    response_1 = tweepy.Cursor(api_1.search_30_day,
                               label="30day",
                               query=f"{query} lang:en",
                               fromDate=start_date,
                               toDate=end_date,
                               maxResults=100
                              ).items(max_results)
    
    # gathering Tweet ID's in a list
    print(response_1)
    tweet_ids = [tweet._json['id'] for tweet in response_1]
    
    # setting Tweet data to be included in response_2
    tweet_fields = ["text", "attachments", "author_id", "context_annotations", "conversation_id", "created_at",
                   "entities", "geo", "in_reply_to_user_id", "lang", "public_metrics", "referenced_tweets"]
    user_fields = ["public_metrics", "verified"]
    
    # initializing variables for API v2 calls and DataFrame for Tweet data
    import pandas as pd
    num_tweets = 0
    tweets = pd.DataFrame(columns=tweet_fields+['followers_count', 'verified']+
                          ['entities_hashtags','retweet_count','reply_count','like_count','quote_count'])    
    tweets.index.name = "Tweet ID"
    
    # loop to retrieve Tweets from ID's through API v2, 100 at a time
    api_2 = init_api_2()
    
    while num_tweets < max_results:
        # slicing tweet_ids since API v2 get_tweets only takes max 100 ID's per request
        try:
            slice_ids = tweet_ids[num_tweets:num_tweets+100]
        except IndexError:
            slice_ids = tweet_ids[num_tweets:]
        if len(slice_ids) == 0:
            break

        # retrieving Tweet data from API v2 and adding to DataFrame
        response_2 = api_2.get_tweets(slice_ids, tweet_fields=tweet_fields, 
                                      expansions='author_id', user_fields=user_fields)
        tweets = tweet_df(tweets, response_2, tweet_fields)
        num_tweets += len(response_2.data)
    
    # dropping "public_metrics" since all the values are unpacked, adding "total_engagements"
    tweets.drop('public_metrics', axis=1, inplace=True)
    total_engagements = tweets["retweet_count"] + tweets["reply_count"] + tweets["like_count"] + tweets["quote_count"]
    tweets["total_engagements"] = total_engagements
    
    # writing Tweet DataFrame to csv file
    if write_csv:
        tweets.to_csv(filename)
    
    return tweets

In [49]:
# function to search Tweets within the full Tweet archive
# utilizes both API v1.1 and v2 to be consistent with 7-day search.
def search_full(query, start_date=None, end_date=None, max_results=20, write_csv=False, filename="search_full.csv"):
    # initializing API v1.1 instance
    api_1 = init_api_1()
    
    # parsing dates passed into function
    from dateutil import parser
    from datetime import datetime
    if start_date:
        start_date = parser.parse(start_date)
        start_date = start_date.strftime("%Y%m%d%H%M")
    if end_date:
        end_date = parser.parse(end_date)
        end_date = end_date.strftime("%Y%m%d%H%M")
    
    # retrieving Tweets from the full tweet archive relevant to query using tweepy's pagination function
    import tweepy
    response_1 = tweepy.Cursor(api_1.search_full_archive,
                               label="full",
                               query=f"{query} lang:en",
                               fromDate=start_date,
                               toDate=end_date,
                               maxResults=100
                              ).items(max_results)
    
    # gathering Tweet ID's in a list
    tweet_ids = [tweet._json['id'] for tweet in response_1]
    
    # setting Tweet data to be included in response
    tweet_fields = ["text", "attachments", "author_id", "context_annotations", "conversation_id", "created_at",
                   "entities", "geo", "in_reply_to_user_id", "lang", "public_metrics", "referenced_tweets"]
    user_fields = ["public_metrics", "verified"]
    
    # initializing variables for API calls and DataFrame for Tweet data
    import pandas as pd
    tweets = pd.DataFrame(columns=tweet_fields+["followers_count", "verified"]+
                          ['entities_hashtags','retweet_count','reply_count','like_count','quote_count'])
    tweets.index.name = "Tweet ID"
    
    # loop to retrieve Tweets from ID's through API v2, 100 at a time
    api_2 = init_api_2()
    num_tweets = 0
    while num_tweets < max_results:
        # slicing tweet_ids since API v2 get_tweets only takes max 100 ID's per request
        try:
            slice_ids = tweet_ids[num_tweets:num_tweets+100]
        except IndexError:
            slice_ids = tweet_ids[num_tweets:]
        if len(slice_ids) == 0:
            break

        # retrieving Tweet data from API v2 and adding to DataFrame
        response_2 = api_2.get_tweets(slice_ids, tweet_fields=tweet_fields,
                                     expansions='author_id', user_fields=user_fields)
        tweets = tweet_df(tweets, response_2, tweet_fields)
        num_tweets += len(response_2.data)
    
    # dropping "public_metrics" since all the values are unpacked, adding "total_engagements"
    tweets.drop('public_metrics', axis=1, inplace=True)
    total_engagements = tweets["retweet_count"] + tweets["reply_count"] + tweets["like_count"] + tweets["quote_count"]
    tweets["total_engagements"] = total_engagements
    
    # writing Tweets DataFrame to csv file
    if write_csv:
        tweets.to_csv(filename)
    
    return tweets

### Wordclouds

This function creates wordclouds for frequent words and hashtags in Tweet data. To avoid making any unnecessary API calls, this function takes the DataFrame created from the search functions as an input. The arguments for this function are:

- `df`: DataFrame of Tweet data, created from one of the Tweet search functions defined above.
- `query`: The query used to create `df`. If passed into the function, `query` is added to the stop words for the word cloud, so they aren't added to the cloud.
- `save_imgs`: Boolean, whether to save the images to a file or not. The filenames will be `wordcloud.png` and `hashtags.png` in the current working directory.

In [7]:
def word_cloud(df, query=None, save_imgs=False):
    # combining DataFrame text column into one long string, doing some initial pre-processing
    import pandas as pd
    tweet_text = " ".join(df["text"])
    tweet_text = tweet_text.lower()
    tweet_text = tweet_text.replace("\n", " ")
    
    # splitting string into set of words, removing hashtags, usernames, links, and retweet indicator
    word_list = set(tweet_text.split(" "))
    hash_list = {word for word in word_list if word.startswith("#")}
    user_list = {word for word in word_list if word.startswith("@")}
    link_list = {word for word in word_list if word.startswith("http")}
    word_list = {word for word in word_list if word not in hash_list.union(user_list, link_list)}
    word_list = {word for word in word_list if word != "rt"}
    
    # using nltk tokenizer to further pre-process text, removing non-alpha words
    from nltk.tokenize import word_tokenize
    import nltk
    nltk.download('punkt')
    tweet_text = " ".join(word_list)
    word_list = word_tokenize(tweet_text)
    word_list = {word for word in word_list if word.isalpha()}
    
    # joining list of words into final cleaned string
    tweet_text = " ".join(word_list)
    
    # generating word cloud
    from wordcloud import WordCloud, STOPWORDS
    import matplotlib.pyplot as plt

    stopwords = set(STOPWORDS)
    
    # adding words from query to stop words so they don't show up in the word cloud
    if query:
        stopwords.update(query.split())

    # word cloud for text
    words_fig = plt.figure()
    word_cloud = WordCloud(background_color="white", width=3000, height=2000, max_font_size=500,
                           max_words=100, prefer_horizontal=1.0, stopwords=stopwords)
    word_cloud.generate(tweet_text)
    plt.imshow(word_cloud)
    plt.axis("off")
    plt.title("Frequent keywords in Tweets", fontsize=15)
    plt.show()
    if save_imgs:
        word_cloud.to_file("wordcloud.png")

    # word cloud for hashtags
    hash_fig = plt.figure()
    word_cloud = WordCloud(background_color="white", width=3000, height=2000, max_font_size=500,
                           max_words=100, prefer_horizontal=1.0, stopwords=stopwords)
    word_cloud.generate(" ".join(hash_list))
    plt.imshow(word_cloud)
    plt.axis("off")
    plt.title("Frequent hashtags in Tweets", fontsize=15)
    plt.show()
    if save_imgs:
        word_cloud.to_file("hashtags.png")
    
    return words_fig, hash_fig

### Attention Over Time Plots

This function plots the volume of tweets relevant to a query over time. Similar to the wordcloud function, this function avoids additional API calls and takes the DataFrame from the Tweet search functions as an input. The user can adjust aspects of the plot to fit different use cases, such as the title, plot type, and x-axis labels. The arguments for this function are:

- `df`: DataFrame of Tweet data, created from one of the Tweet search functions defined above.
- `query`: The query used to create `df`. If passed into this function, adds a subtitle to the plot with the query.
- `title`: The title of the plot.
- `xlabel`: "month", "year", or "day" (default "month"). Granularity of ticks and labels on the x-axis.
- `plot_type`: "line" or "bar" (default "line"). Choose between line or bar plot for attention over time.
- `figsize`: Default (10,5). Size of the figure outputted by this function.

In [8]:
# plot function
def attention_plots(df, query=None, title="Tweet count over time", xlabel="month", plot_type="line", figsize=(10,5)):
    
    # ensuring the correct parameters have been passed
    assert plot_type in ("line", "bar"), "Please input 'line' or 'bar' into plot_type"
    assert xlabel in ("day", "month", "year"), "Please input 'day', 'month', or 'year' into xlabel"
        
    # converting dates to datetime, getting counts of tweets per day
    import pandas as pd
    df["created_at"] = pd.to_datetime(df["created_at"])
    daily_counts = test.groupby(test["created_at"].dt.date).count()
    dates = pd.to_datetime(daily_counts.index)
    
    # creating figure for plot
    import matplotlib.pyplot as plt
    figure = plt.figure(figsize=figsize)
    
    # line or bar graph, depending on input
    if plot_type == "line":
        plt.plot(daily_counts.index, daily_counts["text"])
    else:
        plt.bar(daily_counts.index, daily_counts["text"])
    
    # setting x-axis ticks to be month, day, or year, depending on input
    if xlabel == "month":
        period = "M"
        tick_labels = dates.to_period(period).unique().strftime("%b %Y")
    elif xlabel == "day":
        period = "D"
        tick_labels = dates.to_period(period).unique().strftime("%m-%d-%Y")
    elif xlabel == "year":
        period = "Y"
        tick_labels = dates.to_period(period).unique()
    tick_locs = dates.to_period(period).unique()
    plt.xticks(ticks=tick_locs, labels=tick_labels, rotation=90)
    
    # setting plot title and subtitle (if query is passed)
    plt.suptitle(title, fontsize=15)
    if query:
        plt.title(f"Query: {query}")
    plt.xlabel("Date")
    plt.ylabel("Number of Tweets")
    plt.show()
    
    return figure

## Example Use Case: Caitlin Jenner & Elliot Page

The rest of the notebook will walk through an example use case for these functions: comparing the discussions around Caitlin Jenner and Elliot Page when they came out as transgender. The example will use all of the functions defined above as a simple baseline for users to see how they work and what their outputs are.

In [1]:
# importing functions from rethink_twitter_functions.py
from rethink_twitter_functions import *

# importing a module so we can time how long the functions take
import time

# defining some search strings for the API queries
query1 = '"elliot page" OR "ellen page"'
query1_label = "Elliot Page"
query2 = '"caitlin jenner" OR "bruce jenner"'
query2_label = "Caitlin Jenner"

### Current relevance

We can get an initial idea about the difference in how these two celebrities are viewed by looking at what people are saying about them. We can use the `search_7()` function to see how people are talking about these celebrities right now.

In [2]:
# running and timing the search_7 function for Elliot Page 
start = time.time()
page_7 = search_7(query1, max_results=2000, write_csv=True)
end = time.time()

print(f"Time taken: {(end-start)/60} min")
print(f'{query1} mentioned {len(page_7)} times')
page_7.head()

Time taken: 0.3340771118799845 min
"elliot page" OR "ellen page" mentioned 2004 times


Unnamed: 0_level_0,text,attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,in_reply_to_user_id,lang,referenced_tweets,followers_count,verified,entities_hashtags,retweet_count,reply_count,like_count,quote_count,total_engagements
Tweet ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1
1465785991931871248,"RT @BoosvrouwNL: Waarom, @filmtotaal? Waarom v...",,897405572898803712,"[{'domain': {'id': '10', 'name': 'Person', 'de...",1465785991931871248,2021-11-30 20:53:08+00:00,"{'mentions': [{'start': 3, 'end': 15, 'usernam...",,,nl,"[(type, id)]",1049,False,,1,0,0,0,1
1465783896222715907,RT @Dazed: ‘That feeling you get when you fina...,,2895973654,,1465783896222715907,2021-11-30 20:44:48+00:00,"{'urls': [{'start': 88, 'end': 111, 'url': 'ht...",,,en,"[(type, id)]",690,False,,3,0,0,0,3
1465783849808396288,"RT @GeorgeTakei: Way to go, Elliot! 👏 https://...",,879459933988585472,"[{'domain': {'id': '10', 'name': 'Person', 'de...",1465783849808396288,2021-11-30 20:44:37+00:00,"{'urls': [{'start': 38, 'end': 61, 'url': 'htt...",,,en,"[(type, id)]",2461,False,,21,0,0,0,21
1465783685832232966,"Tras cambio de género, Elliot Page presume su ...",,1596010039,"[{'domain': {'id': '10', 'name': 'Person', 'de...",1465783685832232966,2021-11-30 20:43:58+00:00,"{'urls': [{'start': 62, 'end': 85, 'url': 'htt...",,,es,,14,False,,0,0,0,0,0
1465783283594272773,@nowthisnews @colbertlateshow Wow calling Elle...,,1399145309901557771,"[{'domain': {'id': '3', 'name': 'TV Shows', 'd...",1093382099778777089,2021-11-30 20:42:22+00:00,"{'annotations': [{'start': 42, 'end': 46, 'pro...",,701725963.0,en,"[(type, id)]",52,False,,0,0,0,0,0


In [3]:
# showing samples of tweets within DataFrame
page_sample = page_7.sample(n=10)
num_tweets = 1
for tweet in page_sample.text:
    print(f"Tweet {num_tweets}:")
    print(tweet,"\n")
    num_tweets += 1

Tweet 1:
Elliot Page, #39;Juno#39; star, shows off six-packs in shirtless mirror selfie https://t.co/UUM06JVw4X 

Tweet 2:
https://t.co/3IfJ7NrEl1 

Tweet 3:
RT @BreitbartNews: Well thank goodness for that. Now all us working class peasants can go back to figuring out how to afford groceries and… 

Tweet 4:
エリオット・ペイジ、シャツなしのミラーセルフィーで6パックを披露。"Oh Good New Phone Works" https://t.co/oWncuXvaPw
https://t.co/nk3MNGd77f 

Tweet 5:
Elliot Page: Η selfie χωρίς μπλούζα που ενθουσίασε τους followers του https://t.co/BPncHAljNe 

Tweet 6:
Fans are loving Elliot Page’s latest selfie: ‘You’re embodying your confidence’ https://t.co/dbK9U15jl0 

Tweet 7:
RT @MrAndyNgo: Canadian actress @EllenPage broke down in tears on @colbertlateshow talking about the alleged attack on #JussieSmollett. She… 

Tweet 8:
RT @TMZ: Elliot Page flashes THOSE washboard abs! (via @toofab)
https://t.co/YV7rrj49kH 

Tweet 9:
RT @PinkNews: Elliot Page radiates gender euphoria in shirtless selfie and fans are in awe https://t.c

In [4]:
# running and timing the search_7 function for Caitlin Jenner
start = time.time()
jenn_7 = search_7(query2, max_results=2000, write_csv=True)
end = time.time()

print(f"Time taken: {(end-start)/60} min")
print(f'{query2} mentioned {len(jenn_7)} times')
jenn_7.head()

Time taken: 0.23790675799051922 min
"caitlin jenner" OR "bruce jenner" mentioned 495 times


Unnamed: 0_level_0,text,attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,in_reply_to_user_id,lang,referenced_tweets,followers_count,verified,entities_hashtags,retweet_count,reply_count,like_count,quote_count,total_engagements
Tweet ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1
1465787633427169289,@NoLieWithBTC I'd prefer to leave Caitlin Jenn...,,369119901,,1465761727283404801,2021-11-30 20:59:39+00:00,"{'annotations': [{'start': 34, 'end': 47, 'pro...",,1.2682236904806154e+18,en,"[(type, id)]",1585,False,,0,0,0,0,0
1465781874064871427,@ChrisStigall Caitlin Jenner level of no thank...,,1183743767456747523,"[{'domain': {'id': '10', 'name': 'Person', 'de...",1465769310295674885,2021-11-30 20:36:46+00:00,"{'annotations': [{'start': 14, 'end': 27, 'pro...",,22008787.0,en,"[(type, id)]",73,False,,0,0,0,0,0
1465778526645223428,Why did Bruce Jenner break Kendall Jenner? To ...,,1322653007171837954,"[{'domain': {'id': '10', 'name': 'Person', 'de...",1465778526645223428,2021-11-30 20:23:28+00:00,"{'annotations': [{'start': 8, 'end': 19, 'prob...",,,en,,38,False,,0,0,0,0,0
1465768902361759746,Did they at least give my guy Bruce Jenner a r...,,1417493305496068103,,1465768902361759746,2021-11-30 19:45:13+00:00,"{'annotations': [{'start': 30, 'end': 41, 'pro...",,,en,,243,False,,0,3,3,0,6
1465768816604897285,Crazy to think Bruce Jenner was an Olympian li...,,1115745392040103936,"[{'domain': {'id': '10', 'name': 'Person', 'de...",1465768816604897285,2021-11-30 19:44:53+00:00,"{'annotations': [{'start': 15, 'end': 26, 'pro...",,,en,,619,False,,0,0,0,0,0


In [5]:
# showing samples of tweets within DataFrame
jenn_sample = jenn_7.sample(n=10)
num_tweets = 1
for tweet in jenn_sample.text:
    print(f"Tweet {num_tweets}:")
    print(tweet,"\n")
    num_tweets += 1

Tweet 1:
@G_Riedel @MattWalshBlog Scared liberals.  This is why they call people like Bruce Jenner “brave”, because the actual form of bravery is gone from leftist thought patterns. 

Tweet 2:
@bambi70587659 @czarlatans @osriggle @NightMovesPat @jk_rowling I'm of the mind that gender ("womanhood") is a construct but sex (female vs. male) is definitely not. Trans women *are* women, but context matters. Bringing it back to equality discussions, I find it appalling that Caitlin Jenner's prolife views would be given any weight, for ex. 

Tweet 3:
@MysterySolvent She looks like Caitlin Jenner 

Tweet 4:
@gregkellyusa i thought that was Caitlin Jenner 

Tweet 5:
@yannispappas Isnt Bruce Jenner part indigenous….. 

Tweet 6:
I always felt that my greatest asset was not my physical ability, it was my mental ability. – Bruce Jenner #leadership #quote
LIKE ▪️ SHARE ▪️ FOLLOW 

Tweet 7:
RT @Phabrick: "WHY IS IT EASIER FOR BRUCE JENNER TO CHANGE HIS GENDER, THAN IT IS FOR CASSIUS CLAY TO CHANGE HIS

Elliot Page has been mentioned over 2,000 times in the past week, whereas Caitlin Jenner has been mentioned 495 times. From the samples that we've displayed, it also seems like Elliot Page is held in higher regard than Caitlin Jenner on Twitter. We only display 10 samples for each celebrity though, so the samples may not be representative of the rest of the population.

### Names and deadnames

We can get an idea of how much these two celebrities' identities are respected by looking at how many times they are referenced by their "deadname." A deadname is the birth name that a transgender person drops when they transition, often in favor of a name that fits their gender. We can use the `search_30()` function to get an idea about how Elliot and Caitlin are viewed in the public eye, whether they are more referenced by their chosen name or their deadname.

In [6]:
# running and timing the search_30 function for Elliot Page 
start = time.time()
page_dn = search_30('"ellen page"', max_results=3000, write_csv=True)
end = time.time()

print(f"Time taken: {(end-start)/60} min")
print(f'Elliot Page has been deadnamed {len(page_dn)} times')
page_dn.head()

<tweepy.cursor.ItemIterator object at 0x7f8186efa5b0>
Time taken: 0.12924110889434814 min
Elliot Page has been deadnamed 578 times


Unnamed: 0_level_0,text,attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,in_reply_to_user_id,lang,referenced_tweets,followers_count,verified,entities_hashtags,retweet_count,reply_count,like_count,quote_count,total_engagements
Tweet ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1
1465783283594272773,@nowthisnews @colbertlateshow Wow calling Elle...,,1399145309901557771,"[{'domain': {'id': '3', 'name': 'TV Shows', 'd...",1093382099778777089,2021-11-30 20:42:22+00:00,"{'annotations': [{'start': 42, 'end': 46, 'pro...",,701725963.0,en,"[(type, id)]",52,False,,0,0,0,0,0
1465781235733696527,RT @Awake823Mama: Yes this is the same person....,,399821632,,1465781235733696527,2021-11-30 20:34:14+00:00,"{'annotations': [{'start': 47, 'end': 51, 'pro...",,,en,"[(type, id)]",411,False,,3,0,0,0,3
1465773620412788745,"""This world would be a whole lot better if we ...",{'media_keys': ['7_1465773520554708995']},3283858680,,1465773620412788745,2021-11-30 20:03:58+00:00,"{'urls': [{'start': 116, 'end': 139, 'url': 'h...",,,en,,232,False,,0,0,0,0,0
1465769020544716801,RT @MrAndyNgo: Canadian actress @EllenPage bro...,,1265727417395695623,"[{'domain': {'id': '10', 'name': 'Person', 'de...",1465769020544716801,2021-11-30 19:45:41+00:00,"{'mentions': [{'start': 3, 'end': 13, 'usernam...",,,en,"[(type, id)]",221,False,,635,0,0,0,635
1465766368104660993,@jokesdepartment Tbh it's a bit weird seeing a...,,1437224125018353671,,1465358431699750914,2021-11-30 19:35:09+00:00,"{'annotations': [{'start': 71, 'end': 75, 'pro...",,1.3900601438706767e+18,en,"[(type, id)]",86,False,,0,0,0,0,0


In [2]:
# running and timing the search_30 function for Caitlin Jenner 
start = time.time()
jenn_dn = search_30('"bruce jenner"', max_results=2000, write_csv=True)
end = time.time()

print(f"Time taken: {(end-start)/60} min")
print(f'Caitlin Jenner has been deadnamed {len(jenn_dn)} times')
jenn_dn.head()

1465794393814798340
1465778526645223428
1465768902361759746
1465768816604897285
1465764623399931908
1465748351492702209
1465744956102828034
1465741394425745413
1465739723197067282
1465739576992010244
1465725188155351040
1465720758588481539
1465712681420308480
1465712553078796292
1465710124333297681
1465697474312433681
1465695339231416328
1465694721032876053
1465691947507531780
1465674173120856064
1465670877928992777
1465646587263299584
1465646472603525122
1465634329242046464
1465605382986821633
1465596699833946231
1465593495545020417
1465591027423719425
1465589611342675972
1465560668199915520
1465557747005419521
1465551758818693120
1465546263672147975
1465513091572064256
1465510531666747396
1465499961806708739
1465493109064847366
1465492091354628106
1465467123308929026
1465449339254296578
1465445853490040835
1465435136695095306
1465431145538105348
1465430236946911237
1465411039571423235
1465410902388314120
1465410881714593814
1465410859648393231
1465410839155036167
1465410813309726729


1462397340304642052
1462397293139660803
1462397196846862337
1462397156887724033
1462397020434403328
1462397009621659657
1462396921524404231
1462365141924651011
1462364964077723652
1462363131162742788
1462362281254080526
1462359078919979008
1462358716158812160
1462349604863844360
1462349603769143299
1462345077334765570
1462313797008384002
1462309071491674112
1462306468271730688
1462305632506114052
1462304479135420418
1462281629322330119
1462267770876399621
1462266397480923137
1462266201271414785
1462257018148990977
1462237544481972229
1462235695825752064
1462218593135964160
1462216382863396873
1462182099788697600
1462178521246146564
1462176020665143307
1462175195486031875
1462175114137583624
1462174761757327362
1462173139870257153
1462152834535723010
1462149449384439812
1462148273628598275
1462127151914766338
1462123770584317958
1462122743793524736
1462120273906438151
1462102790696751111
1462102247693762561
1462101886257926146
1462094011598983168
1462093795374284808
1462088155868377094


NameError: name 'tweet_ids' is not defined

In the past month, Elliot Page has been deadnamed 578 times, and Caitlin Jenner has been deadnamed so much that the Twitter API threw an error trying to retrieve all of the Tweets.

In [10]:
api1 = init_api_1()
api1.rate_limit_status()

Unauthorized: 401 Unauthorized