# ReThink Media Twitter API: Tutorial and Examples

This notebook provides a user manual and example use cases for ReThink Media's Twitter API functions. These functions can:
- Search Tweets relevant to a query, over different time periods
- Save Tweets and Tweet metadata to a .csv file for later reference and use
- Create wordclouds for frequent keywords and hashtags
- Create plots of Tweet counts over time, with adjustable titles and axes

As an example use case for these functions, this notebook will compare the discussions around the coming out of two transgender celebrities: Caitlin Jenner and Elliot Page.

## Defining Functions

The first part of this notebook is dedicated to defining and explaining the functions mentioned above, with the example use case to follow.

### Authentication & Utility Functions

These utility functions are called inside the main functions defined later, so run the cells below before using any of the other functions.

**IMPORTANT NOTE:** The Twitter API requires API keys and other authentication tokens. To use the functions in this notebook, you need a Twitter Developer account with these keys available. If you have them, create a text file named `.env` in the folder your notebook runs from (`load_dotenv()` also searches that folder's parent folders). Use the following format:

```
API_KEY="your_api_key"
API_KEY_SECRET="your_secret_api_key"
BEARER_TOKEN="your_bearer_token"
ACCESS_TOKEN="your_access_token"
ACCESS_SECRET="your_secret_access_token"
```
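It can help to confirm that every key was loaded before calling any of the API functions. The sketch below defines a hypothetical helper, `missing_credentials()`, which is not part of ReThink Media's functions. It uses only the standard library and assumes `load_dotenv()` has already run, since python-dotenv copies the `.env` values into `os.environ` by default.

```python
import os

# Variable names read by init_api_1() and init_api_2()
REQUIRED_KEYS = ("API_KEY", "API_KEY_SECRET", "BEARER_TOKEN", "ACCESS_TOKEN", "ACCESS_SECRET")


def missing_credentials(environ=None):
    """Return the required names that are unset or empty in `environ` (default: os.environ)."""
    environ = os.environ if environ is None else environ
    return [key for key in REQUIRED_KEYS if not environ.get(key)]


# Usage, after load_dotenv() has been called:
# missing = missing_credentials()
# if missing:
#     raise RuntimeError(f"Missing from .env: {', '.join(missing)}")
```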

```python
# function to initialize Twitter API v1.1 instance (for 30-day and full archive search)
def init_api_1():

    # importing necessary modules and loading .env file
    from dotenv import load_dotenv
    import os
    import tweepy
    load_dotenv()

    # retrieving environment variables from .env file
    consumer_key = os.getenv("API_KEY")
    consumer_secret = os.getenv("API_KEY_SECRET")
    bearer_token = os.getenv("BEARER_TOKEN")
    access_token = os.getenv("ACCESS_TOKEN")
    access_secret = os.getenv("ACCESS_SECRET")

    # Twitter API authentication
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)

    # instantiating Twitter API v1.1 reference
    api_1 = tweepy.API(auth, wait_on_rate_limit=True)

    return api_1
```

```python
# function to initialize Twitter API v2 instance (for 7-day search)
def init_api_2():
    # importing necessary modules and loading .env file
    from dotenv import load_dotenv
    import os
    import tweepy
    load_dotenv()

    # retrieving environment variables from .env file
    consumer_key = os.getenv("API_KEY")
    consumer_secret = os.getenv("API_KEY_SECRET")
    bearer_token = os.getenv("BEARER_TOKEN")
    access_token = os.getenv("ACCESS_TOKEN")
    access_secret = os.getenv("ACCESS_SECRET")

    # instantiating Twitter API v2 reference
    api_2 = tweepy.Client(bearer_token=bearer_token,
                          consumer_key=consumer_key,
                          consumer_secret=consumer_secret,
                          access_token=access_token,
                          access_token_secret=access_secret,
                          wait_on_rate_limit=True)

    return api_2
```

```python
# function to parse Twitter API v2 response into a DataFrame of Tweet data
def tweet_df(df, response, tweet_fields):

    # responses with no matching Tweets have nothing to parse
    if not response.data:
        return df

    # mapping each author ID to [followers_count, verified]
    users = response.includes.get('users', [])
    user_data = {user['id']: [user['public_metrics']['followers_count'], user['verified']] for user in users}

    # looping through each Tweet in response, parsing data
    for tweet in response.data:
        tweet_id = tweet.id
        tweet_data = {}
        for field in tweet_fields:
            if tweet[field]:
                tweet_data[field] = tweet[field]

                # extracting hashtags from "entities" field and adding them as their own column
                if field == "entities":
                    try:
                        hashtag_data = tweet[field]['hashtags']
                        hashtags = [hashtag['tag'] for hashtag in hashtag_data]
                        tweet_data['entities_hashtags'] = hashtags
                    except KeyError:
                        tweet_data['entities_hashtags'] = None

                # separating metrics from "public_metrics" field and adding them as their own columns
                if field == "public_metrics":
                    for metric, value in tweet[field].items():
                        tweet_data[metric] = value

            else:
                tweet_data[field] = None
                if field == "entities":
                    tweet_data['entities_hashtags'] = None

        # adding user data (None if the author is missing from the response's includes)
        followers_count, verified = user_data.get(tweet['author_id'], [None, None])
        tweet_data['followers_count'] = followers_count
        tweet_data['verified'] = verified

        df.loc[tweet_id] = tweet_data

    return df
```
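`tweet_df()` builds one row per Tweet, and the search functions later add a `total_engagements` column. Both steps can be illustrated on plain dictionaries shaped like v2 Tweet JSON. `flatten_tweet()` is a hypothetical helper for illustration only, not part of the notebook. The field names match the v2 `public_metrics` and `entities.hashtags` objects that the code above reads.

```python
METRICS = ("retweet_count", "reply_count", "like_count", "quote_count")


def flatten_tweet(tweet):
    """Return a flat row: public_metrics unpacked, hashtags listed, engagements summed."""
    row = {key: value for key, value in tweet.items() if key != "public_metrics"}
    # unpack each public metric into its own key, like tweet_df()'s metric columns
    row.update(tweet.get("public_metrics") or {})
    # keep only the tag text of each hashtag entity (None when there are no hashtags)
    hashtags = (tweet.get("entities") or {}).get("hashtags", [])
    row["entities_hashtags"] = [hashtag["tag"] for hashtag in hashtags] or None
    # total_engagements is the sum of the four public metrics
    row["total_engagements"] = sum(row.get(metric, 0) for metric in METRICS)
    return row


example = {
    "id": "1",
    "text": "example Tweet #demo",
    "entities": {"hashtags": [{"start": 14, "end": 19, "tag": "demo"}]},
    "public_metrics": {"retweet_count": 2, "reply_count": 1, "like_count": 3, "quote_count": 0},
}
print(flatten_tweet(example)["total_engagements"])  # 6
```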

### Tweet Search Functions

The Twitter API limits how many requests a user can make and how many Tweets they can receive, and the limits depend on how far back the search goes. For this reason, there are three Tweet search functions; choose the one that best fits your use case. These limits depend on the account's access level. At the time of writing they were:

- `search_7()`: Search Tweets within the past 7 days. Unlimited API requests, 500,000 Tweets per month.
- `search_30()`: Search Tweets within the past 30 days. 250 API requests, 25,000 Tweets per month.
- `search_full()`: Search Tweets from the full archive. 50 API requests, 5,000 Tweets per month.

The Twitter API also has a limit of 100 API requests per 15-minute interval, regardless of which function is used. If this limit is reached, the functions wait until the 15-minute window resets (`wait_on_rate_limit=True`) and then continue collecting Tweets. The monthly caps cannot be waited out: once they are used up, requests fail with a `429 Too Many Requests` error.
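It helps to estimate how many requests a search needs before running it against these caps. Each request returns at most 100 Tweets, so collecting `max_results` Tweets takes at least `ceil(max_results / 100)` requests. It can take more, because a request may return fewer than 100 Tweets. The hypothetical helpers below, `requests_needed()` and `fits_monthly_quota()`, do this arithmetic. The remaining quota values are assumptions you would read from your own developer dashboard.

```python
import math

# Assumption: each request returns at most 100 Tweets (the per-request ceiling used by these functions)
TWEETS_PER_REQUEST = 100


def requests_needed(max_results, per_request=TWEETS_PER_REQUEST):
    """Smallest number of requests that could return max_results Tweets."""
    return math.ceil(max_results / per_request)


def fits_monthly_quota(max_results, requests_left, tweets_left):
    """True if a search for max_results Tweets fits within the remaining monthly caps."""
    return requests_needed(max_results) <= requests_left and max_results <= tweets_left


# e.g. search_30(..., max_results=3000) needs at least 30 of the 250 monthly requests
print(requests_needed(3000), fits_monthly_quota(3000, requests_left=250, tweets_left=25_000))
# prints: 30 True
```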

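The arguments for these functions are:
- `query`: The query to search the Twitter API for. The functions append `lang:en` to it, so only Tweets that Twitter classifies as English are returned.
- `start_date`: The date to start the search (default `None`). Pass any string that `dateutil.parser.parse()` can read, such as `"2021-11-01"`. If `None`, the search starts at the beginning of the default window: 7 days ago for `search_7()`, and 30 days ago for `search_30()` and `search_full()`.
- `end_date`: The date to end the search (default `None`). If `None`, the search runs up to the present.
- `max_results`: The maximum number of Tweets to return in the DataFrame (default 20). Fewer Tweets are returned if fewer match the query.
- `write_csv`: Boolean, whether to save the DataFrame as a csv file (default `False`).
- `filename`: Filename for the csv if `write_csv` is `True`. The default is `search_7.csv`, `search_30.csv`, or `search_full.csv`, depending on the function used.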
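The two API versions expect dates in different formats. Premium v1.1 search (`search_30()`, `search_full()`) takes `fromDate`/`toDate` as `YYYYMMDDHHmm` in UTC. API v2 (`search_7()`) takes `start_time`/`end_time` as ISO 8601 timestamps. The sketch below shows both conversions using only the standard library. `datetime.fromisoformat()` stands in for `dateutil` and accepts fewer input formats. Treating dates without a time zone as UTC is an assumption of this sketch.

```python
from datetime import datetime, timezone


def parse_date(date_string):
    """Parse an ISO 8601 date or datetime; naive values are assumed to be UTC."""
    parsed = datetime.fromisoformat(date_string)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def premium_format(date_string):
    """YYYYMMDDHHmm, the format of the v1.1 premium fromDate/toDate parameters."""
    return parse_date(date_string).strftime("%Y%m%d%H%M")


def v2_format(date_string):
    """ISO 8601 timestamp in UTC, as used by the API v2 start_time/end_time parameters."""
    return parse_date(date_string).strftime("%Y-%m-%dT%H:%M:%SZ")


print(premium_format("2021-11-01"), v2_format("2021-11-01"))
# 202111010000 2021-11-01T00:00:00Z
```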

```python
# function to retrieve Tweets from the past 7 days relevant to a query
def search_7(query, start_date=None, end_date=None, max_results=20, write_csv=False, filename="search_7.csv"):

    # initializing API v2 instance
    api_2 = init_api_2()

    # parsing dates passed into function; tweepy formats datetime objects as the
    # ISO 8601 timestamps API v2 expects (naive datetimes are sent as UTC)
    from dateutil import parser
    if start_date:
        start_date = parser.parse(start_date)
    if end_date:
        end_date = parser.parse(end_date)

    # setting Tweet and user data to be included in response
    tweet_fields = ["text", "attachments", "author_id", "context_annotations", "conversation_id", "created_at",
                    "entities", "geo", "in_reply_to_user_id", "lang", "public_metrics", "referenced_tweets"]
    user_fields = ["public_metrics", "verified"]

    # initializing variables for API calls and DataFrame for Tweet data
    import pandas as pd
    next_token = None
    num_tweets = 0
    tweets = pd.DataFrame(columns=tweet_fields+['followers_count', 'verified']+
                          ['entities_hashtags','retweet_count','reply_count','like_count','quote_count'])
    tweets.index.name = "Tweet ID"

    # paginating manually so the batch size and number of requests can be controlled
    while num_tweets < max_results:

        # the API only retrieves between 10 and 100 Tweets per call
        # NOTE: number of API results isn't consistent. max_results=100 doesn't guarantee 100 Tweets in response
        num_results = max(10, min(100, max_results - num_tweets))

        # calling API and searching Tweets over past 7 days
        response = api_2.search_recent_tweets(f"{query} lang:en",
                                              start_time=start_date,
                                              end_time=end_date,
                                              max_results=num_results,
                                              next_token=next_token,
                                              tweet_fields=tweet_fields,
                                              expansions='author_id',
                                              user_fields=user_fields)

        # adding Tweet data to DataFrame (a page can be empty)
        if response.data:
            tweets = tweet_df(tweets, response, tweet_fields)
            num_tweets += len(response.data)

        # stopping when there are no more pages; otherwise the next call restarts from the first page
        next_token = response.meta.get('next_token')
        if next_token is None:
            break

    # dropping "public_metrics" since all the values are unpacked, adding "total_engagements"
    tweets.drop('public_metrics', axis=1, inplace=True)
    total_engagements = tweets["retweet_count"] + tweets["reply_count"] + tweets["like_count"] + tweets["quote_count"]
    tweets["total_engagements"] = total_engagements

    # writing Tweet DataFrame to csv file
    if write_csv:
        tweets.to_csv(filename)

    return tweets
```
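The pagination logic of `search_7()` can be separated from the API and checked with a stand-in page fetcher. The loop has to stop as soon as a response carries no `next_token`. Without that check, the next call starts again from the first page and spends requests re-fetching Tweets that were already collected. `batch_size()`, `paginate()`, and `fake_fetch()` are hypothetical names used only for this sketch.

```python
def batch_size(max_results, collected, lower=10, upper=100):
    """Tweets to ask for in the next request, kept within the API's 10-100 range."""
    return max(lower, min(upper, max_results - collected))


def paginate(fetch_page, max_results):
    """Request pages until max_results items are collected or no pages remain.

    fetch_page(count, next_token) stands in for the API call and must return
    (items, next_token), where next_token is None on the last page.
    """
    collected = []
    next_token = None
    while len(collected) < max_results:
        items, next_token = fetch_page(batch_size(max_results, len(collected)), next_token)
        collected.extend(items)
        if next_token is None:  # last page reached: stop instead of starting over
            break
    return collected


# Usage with a fake two-page result set (150 matching items in total)
PAGES = {None: (list(range(100)), "page2"), "page2": (list(range(100, 150)), None)}


def fake_fetch(count, next_token):
    items, following = PAGES[next_token]
    return items[:count], following


print(len(paginate(fake_fetch, max_results=2000)))  # 150
```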

```python
# function to search Tweets within the past 30 days
# uses API v1.1 to find Tweet IDs, then API v2 to retrieve the same fields as the 7-day search
def search_30(query, start_date=None, end_date=None, max_results=20, write_csv=False, filename="search_30.csv"):
    # initializing API v1.1 instance
    api_1 = init_api_1()

    # parsing dates into the YYYYMMDDHHmm format used by the premium search API
    from dateutil import parser
    if start_date:
        start_date = parser.parse(start_date).strftime("%Y%m%d%H%M")
    if end_date:
        end_date = parser.parse(end_date).strftime("%Y%m%d%H%M")

    # retrieving Tweets from the past 30 days relevant to query using tweepy's pagination function
    # NOTE: label must match the name of your 30-day dev environment in the Twitter developer portal
    import tweepy
    response_1 = tweepy.Cursor(api_1.search_30_day,
                               label="30day",
                               query=f"{query} lang:en",
                               fromDate=start_date,
                               toDate=end_date,
                               maxResults=100
                              ).items(max_results)

    # gathering Tweet IDs in a list
    tweet_ids = [tweet.id for tweet in response_1]

    # setting Tweet and user data to be included in response_2
    tweet_fields = ["text", "attachments", "author_id", "context_annotations", "conversation_id", "created_at",
                    "entities", "geo", "in_reply_to_user_id", "lang", "public_metrics", "referenced_tweets"]
    user_fields = ["public_metrics", "verified"]

    # initializing DataFrame for Tweet data
    import pandas as pd
    tweets = pd.DataFrame(columns=tweet_fields+['followers_count', 'verified']+
                          ['entities_hashtags','retweet_count','reply_count','like_count','quote_count'])
    tweets.index.name = "Tweet ID"

    # retrieving Tweets from IDs through API v2, 100 at a time (get_tweets takes at most 100 IDs per request);
    # stepping by batch position keeps batches from overlapping when deleted Tweets are missing from a response
    api_2 = init_api_2()
    for start in range(0, len(tweet_ids), 100):
        slice_ids = tweet_ids[start:start+100]
        response_2 = api_2.get_tweets(slice_ids, tweet_fields=tweet_fields,
                                      expansions='author_id', user_fields=user_fields)
        if response_2.data:
            tweets = tweet_df(tweets, response_2, tweet_fields)

    # dropping "public_metrics" since all the values are unpacked, adding "total_engagements"
    tweets.drop('public_metrics', axis=1, inplace=True)
    total_engagements = tweets["retweet_count"] + tweets["reply_count"] + tweets["like_count"] + tweets["quote_count"]
    tweets["total_engagements"] = total_engagements

    # writing Tweet DataFrame to csv file
    if write_csv:
        tweets.to_csv(filename)

    return tweets
```
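`search_30()` and `search_full()` collect Tweet IDs with API v1.1 and then look them up through API v2, whose `get_tweets()` accepts at most 100 IDs per request. Below is a minimal sketch of that batching using a hypothetical `chunked()` helper. Each batch is taken by position in the ID list, not by the number of Tweets returned, so batches never overlap even when deleted Tweets are missing from a response.

```python
def chunked(ids, size=100):
    """Split a list of Tweet IDs into consecutive batches of at most `size` IDs."""
    return [ids[start:start + size] for start in range(0, len(ids), size)]


tweet_ids = list(range(250))  # stand-in for the IDs returned by the v1.1 search
print([len(batch) for batch in chunked(tweet_ids)])  # [100, 100, 50]
```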

```python
# function to search Tweets within the full Tweet archive
# uses API v1.1 to find Tweet IDs, then API v2 to retrieve the same fields as the 7-day search
def search_full(query, start_date=None, end_date=None, max_results=20, write_csv=False, filename="search_full.csv"):
    # initializing API v1.1 instance
    api_1 = init_api_1()

    # parsing dates into the YYYYMMDDHHmm format used by the premium search API
    from dateutil import parser
    if start_date:
        start_date = parser.parse(start_date).strftime("%Y%m%d%H%M")
    if end_date:
        end_date = parser.parse(end_date).strftime("%Y%m%d%H%M")

    # retrieving Tweets from the full tweet archive relevant to query using tweepy's pagination function
    # NOTE: label must match the name of your full-archive dev environment in the Twitter developer portal
    import tweepy
    response_1 = tweepy.Cursor(api_1.search_full_archive,
                               label="full",
                               query=f"{query} lang:en",
                               fromDate=start_date,
                               toDate=end_date,
                               maxResults=100
                              ).items(max_results)

    # gathering Tweet IDs in a list
    tweet_ids = [tweet.id for tweet in response_1]

    # setting Tweet and user data to be included in response_2
    tweet_fields = ["text", "attachments", "author_id", "context_annotations", "conversation_id", "created_at",
                    "entities", "geo", "in_reply_to_user_id", "lang", "public_metrics", "referenced_tweets"]
    user_fields = ["public_metrics", "verified"]

    # initializing DataFrame for Tweet data
    import pandas as pd
    tweets = pd.DataFrame(columns=tweet_fields+["followers_count", "verified"]+
                          ['entities_hashtags','retweet_count','reply_count','like_count','quote_count'])
    tweets.index.name = "Tweet ID"

    # retrieving Tweets from IDs through API v2, 100 at a time (get_tweets takes at most 100 IDs per request);
    # stepping by batch position keeps batches from overlapping when deleted Tweets are missing from a response
    api_2 = init_api_2()
    for start in range(0, len(tweet_ids), 100):
        slice_ids = tweet_ids[start:start+100]
        response_2 = api_2.get_tweets(slice_ids, tweet_fields=tweet_fields,
                                      expansions='author_id', user_fields=user_fields)
        if response_2.data:
            tweets = tweet_df(tweets, response_2, tweet_fields)

    # dropping "public_metrics" since all the values are unpacked, adding "total_engagements"
    tweets.drop('public_metrics', axis=1, inplace=True)
    total_engagements = tweets["retweet_count"] + tweets["reply_count"] + tweets["like_count"] + tweets["quote_count"]
    tweets["total_engagements"] = total_engagements

    # writing Tweets DataFrame to csv file
    if write_csv:
        tweets.to_csv(filename)

    return tweets
```

### Wordclouds

This function creates wordclouds for frequent words and hashtags in Tweet data. To avoid making any unnecessary API calls, this function takes the DataFrame created from the search functions as an input. The arguments for this function are:

- `df`: DataFrame of Tweet data, created from one of the Tweet search functions defined above.
- `query`: The query used to create `df`. If passed, the words in `query` are added to the stop words so that they don't dominate the cloud.
- `save_imgs`: Boolean, whether to save the images to files (default `False`). The files are saved as `wordcloud.png` and `hashtags.png` in the current working directory.

The function returns the two figures.
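A word cloud sizes each word by how often it appears, so the counting step must keep duplicates. Collapsing the text into a set of unique words first would give every word the same weight. Below is a standard-library sketch of the counting step using a hypothetical `keyword_counts()` helper. Regex tokenization stands in for NLTK's `word_tokenize` and is a simplification, since `[a-z]+` drops apostrophes and non-ASCII letters. The quotation marks used for exact-phrase queries are stripped before the query words become stop words.

```python
import re
from collections import Counter

# links, @mentions, and hashtags are removed before counting keywords
NON_WORDS = re.compile(r"https?://\S+|@\w+|#\w+")


def keyword_counts(texts, query=None, stopwords=()):
    """Return (word counts, hashtag counts) across Tweet texts, keeping duplicates."""
    stop = {word.lower() for word in stopwords} | {"rt"}
    if query:
        # strip the quotation marks used for exact-phrase searches before splitting
        stop.update(query.replace('"', "").lower().split())
    words, hashtags = Counter(), Counter()
    for text in texts:
        text = text.lower()
        hashtags.update(tag for tag in re.findall(r"#(\w+)", text) if tag not in stop)
        words.update(word for word in re.findall(r"[a-z]+", NON_WORDS.sub(" ", text))
                     if word not in stop)
    return words, hashtags


tweets = ["RT @someone: Elliot Page is great #film",
          "elliot page great great https://t.co/abc #Film"]
words, tags = keyword_counts(tweets, query='"elliot page"', stopwords={"is"})
print(words.most_common(1), tags.most_common(1))  # [('great', 3)] [('film', 2)]
```

Counters like these can be passed to `WordCloud.generate_from_frequencies()`, which sizes words by the given counts instead of re-tokenizing text.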

```python
def word_cloud(df, query=None, save_imgs=False):
    # combining DataFrame text column into one long string, doing some initial pre-processing
    tweet_text = " ".join(df["text"])
    tweet_text = tweet_text.lower()
    tweet_text = tweet_text.replace("\n", " ")

    # splitting string into a list of words (duplicates are kept so word frequencies are preserved),
    # separating hashtags and removing usernames, links, and the retweet indicator
    word_list = tweet_text.split()
    hash_list = [word for word in word_list if word.startswith("#")]
    word_list = [word for word in word_list
                 if not word.startswith(("#", "@", "http")) and word != "rt"]

    # using nltk tokenizer to further pre-process text, removing non-alpha words
    # (newer NLTK releases may also need nltk.download('punkt_tab'))
    from nltk.tokenize import word_tokenize
    import nltk
    nltk.download('punkt', quiet=True)
    word_list = word_tokenize(" ".join(word_list))
    word_list = [word for word in word_list if word.isalpha()]

    # joining list of words into final cleaned string
    tweet_text = " ".join(word_list)

    # generating word cloud
    from wordcloud import WordCloud, STOPWORDS
    import matplotlib.pyplot as plt

    stopwords = set(STOPWORDS)

    # adding words from query (without quotation marks) to stop words so they don't show up in the word cloud
    if query:
        stopwords.update(query.replace('"', '').lower().split())

    # word cloud for text
    words_fig = plt.figure()
    cloud = WordCloud(background_color="white", width=3000, height=2000, max_font_size=500,
                      max_words=100, prefer_horizontal=1.0, stopwords=stopwords)
    cloud.generate(tweet_text)
    plt.imshow(cloud)
    plt.axis("off")
    plt.title("Frequent keywords in Tweets", fontsize=15)
    plt.show()
    if save_imgs:
        cloud.to_file("wordcloud.png")

    # word cloud for hashtags (skipped if the Tweets contain no hashtags)
    hash_fig = None
    if hash_list:
        hash_fig = plt.figure()
        cloud = WordCloud(background_color="white", width=3000, height=2000, max_font_size=500,
                          max_words=100, prefer_horizontal=1.0, stopwords=stopwords)
        cloud.generate(" ".join(hash_list))
        plt.imshow(cloud)
        plt.axis("off")
        plt.title("Frequent hashtags in Tweets", fontsize=15)
        plt.show()
        if save_imgs:
            cloud.to_file("hashtags.png")

    return words_fig, hash_fig
```

### Attention Over Time Plots

This function plots the number of Tweets relevant to a query per day. Like the word cloud function, it takes the DataFrame from the Tweet search functions as input, so it makes no additional API calls. The title, plot type, and x-axis label granularity can be adjusted for different use cases. The arguments for this function are:

- `df`: DataFrame of Tweet data, created from one of the Tweet search functions defined above.
- `query`: The query used to create `df`. If passed, a subtitle with the query is added to the plot.
- `title`: The title of the plot.
- `xlabel`: "month", "year", or "day" (default "month"). Granularity of ticks and labels on the x-axis.
- `plot_type`: "line" or "bar" (default "line"). Whether to draw a line or bar plot.
- `figsize`: Size of the figure in inches, as (width, height) (default `(10, 5)`).
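Under the hood, the plot buckets Tweets by calendar day and then labels the ticks by month, day, or year. Because `created_at` values carry a `+00:00` offset, the days are UTC days. Below is a standard-library sketch of those two steps. `daily_counts()` and `tick_labels()` are hypothetical helpers, and the sketch assumes `created_at` strings in the format shown in the DataFrame outputs later in this notebook.

```python
from collections import Counter
from datetime import datetime


def daily_counts(timestamps):
    """Count Tweets per calendar day from created_at strings like '2021-11-18 21:14:03+00:00'."""
    days = Counter(datetime.fromisoformat(ts).date() for ts in timestamps)
    return dict(sorted(days.items()))


def tick_labels(days, fmt="%b %Y"):
    """Unique tick labels in date order; '%b %Y' mirrors xlabel='month'."""
    labels = []
    for day in days:
        label = day.strftime(fmt)
        if label not in labels:
            labels.append(label)
    return labels


created = ["2021-11-18 21:14:03+00:00", "2021-11-18 20:08:34+00:00",
           "2021-11-15 18:38:44+00:00", "2021-12-01 09:00:00+00:00"]
counts = daily_counts(created)
print(list(counts.values()), tick_labels(counts))  # [1, 2, 1] ['Nov 2021', 'Dec 2021']
```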

```python
# plot function
def attention_plots(df, query=None, title="Tweet count over time", xlabel="month", plot_type="line", figsize=(10,5)):

    # ensuring the correct parameters have been passed
    assert plot_type in ("line", "bar"), "Please input 'line' or 'bar' into plot_type"
    assert xlabel in ("day", "month", "year"), "Please input 'day', 'month', or 'year' into xlabel"

    # converting dates to datetime (without modifying df) and counting Tweets per day
    import pandas as pd
    created_at = pd.to_datetime(df["created_at"])
    daily_counts = created_at.groupby(created_at.dt.date).size()
    dates = pd.to_datetime(daily_counts.index)

    # creating figure for plot
    import matplotlib.pyplot as plt
    figure = plt.figure(figsize=figsize)

    # line or bar graph, depending on input
    if plot_type == "line":
        plt.plot(dates, daily_counts.values)
    else:
        plt.bar(dates, daily_counts.values)

    # setting x-axis ticks to be month, day, or year, depending on input
    if xlabel == "month":
        period, date_format = "M", "%b %Y"
    elif xlabel == "day":
        period, date_format = "D", "%m-%d-%Y"
    else:
        period, date_format = "Y", "%Y"
    # one tick at the start of each period, as timestamps that matplotlib can place on the date axis
    tick_locs = dates.to_period(period).unique().to_timestamp()
    plt.xticks(ticks=tick_locs, labels=tick_locs.strftime(date_format), rotation=90)

    # setting plot title and subtitle (if query is passed)
    plt.suptitle(title, fontsize=15)
    if query:
        plt.title(f"Query: {query}")
    plt.xlabel("Date")
    plt.ylabel("Number of Tweets")
    plt.show()

    return figure
```

## Example Use Case: Caitlin Jenner & Elliot Page

The rest of the notebook will walk through an example use case for these functions: comparing the discussions around Caitlin Jenner and Elliot Page when they came out as transgender. The example will use all of the functions defined above as a simple baseline for users to see how they work and what their outputs are.

```python
# importing functions from rethink_twitter_functions.py
from rethink_twitter_functions import *

# importing a module so we can time how long the functions take
import time

# defining some search strings for the API queries
page_search = '"elliot page"'
jenn_search = '"caitlin jenner"'
```

### Current relevance

We can get an initial idea of how differently these two celebrities are viewed by looking at what people are saying about them. The `search_7()` function shows how people have been talking about them over the past week.

```python
# running and timing the search_7 function for Elliot Page
start = time.time()
page_7 = search_7(page_search, max_results=2000, write_csv=True)
end = time.time()

print(f"Time taken: {(end-start)/60} min")
print(f'{page_search} mentioned {len(page_7)} times')
page_7.head()
```

```
Time taken: 0.34371581077575686 min
"elliot page" mentioned 264 times
```

```
Tweet ID,text,attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,in_reply_to_user_id,lang,referenced_tweets,followers_count,verified,entities_hashtags,retweet_count,reply_count,like_count,quote_count,total_engagements
1461442602889355268,"anyway, there's new elliot page pics. in case ...",,772915432017702912,"[{'domain': {'id': '10', 'name': 'Person', 'de...",1461442462082342913,2021-11-18 21:14:03+00:00,,,7.729154320177029e+17,en,"[(type, id)]",1306,False,,0,0,0,0,0
1461442233140346881,elliot page my beloved the light of my life ht...,"{'media_keys': ['3_1461442228530782211', '3_14...",1328850303710420993,"[{'domain': {'id': '10', 'name': 'Person', 'de...",1461442233140346881,2021-11-18 21:12:35+00:00,"{'urls': [{'start': 44, 'end': 67, 'url': 'htt...",,,en,,298,False,,0,0,0,0,0
1461442155713417220,A story of homoerotic sex in 1950s New York. S...,,1110200174398312448,"[{'domain': {'id': '10', 'name': 'Person', 'de...",1461442155713417220,2021-11-18 21:12:16+00:00,"{'annotations': [{'start': 29, 'end': 42, 'pro...",,,en,,245,False,,0,0,0,0,0
1461435798700572672,"@MiaOnSunday The child in that netflix show, h...",,715859536,"[{'domain': {'id': '10', 'name': 'Person', 'de...",1461431449370906625,2021-11-18 20:47:01+00:00,"{'annotations': [{'start': 31, 'end': 37, 'pro...",,61092130.0,en,"[(type, id)]",1039,False,,0,1,0,0,1
1461427587431940097,@llorithaine If you haven’t seen Whip It w Ell...,,1423074082715750401,"[{'domain': {'id': '10', 'name': 'Person', 'de...",1461426094431735814,2021-11-18 20:14:23+00:00,"{'annotations': [{'start': 43, 'end': 53, 'pro...",,1.4230740827157504e+18,en,"[(type, id)]",30,False,,0,0,1,0,1
```


```python
# showing samples of tweets within DataFrame
page_sample = page_7.sample(n=10)
for num_tweets, tweet in enumerate(page_sample.text, start=1):
    print(f"Tweet {num_tweets}:")
    print(tweet, "\n")
```

```
Tweet 1:
@GaymerExtofer @Asymetricalhomo @A_hungry_Fool @jeffbrutlag @ASm1thee Elliot Page addressed that directly.

It’s a fair question to ask, and it’s his prerogative to respond (or not). From there, we can take it as it is. I don’t know if Chrispy is anti, but if he wants the benefit of ambiguity he also has to deal with the repercussions of ambiguity. https://t.co/elRGEChCWe 

Tweet 2:
@LynchRegan Also I haaaaaaate being called king - feels like something an overly woke cis girl would call Elliot page when she’s performing her allyship a little too hard 

Tweet 3:
RT @lizardkingfe: @graceelavery look I wasn’t expecting grand insights into life as a butch lesbian from Graham, but you can tell he’s only… 

Tweet 4:
Actors
Cara Delevingne (Pan, Genderfluid She/Her)
Elliot Page (Trans man He/They)
Jack Dylan Grazer (Bi He/They)
Auli’i Cravalho (Bi She/Her) https://t.co/fQjdc4tS2O 

Tweet 5:
@PotatoSoup13 I DONT WANT A LADY TO TELL ME THAT THE CONGRESS SHOULD STOP ROE V WADE AND THEN 
```

```python
# running and timing the search_7 function for Caitlin Jenner
start = time.time()
jenn_7 = search_7(jenn_search, max_results=2000, write_csv=True)
end = time.time()

print(f"Time taken: {(end-start)/60} min")
print(f'{jenn_search} mentioned {len(jenn_7)} times')
jenn_7.head()
```

```
Time taken: 0.32984480063120525 min
"caitlin jenner" mentioned 119 times
```

```
Tweet ID,text,attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,in_reply_to_user_id,lang,referenced_tweets,followers_count,verified,entities_hashtags,retweet_count,reply_count,like_count,quote_count,total_engagements
1461426121560449025,RT @JennaferNyx: @BalkusAnthony There's a reas...,,1363319341958094849,,1461426121560449025,2021-11-18 20:08:34+00:00,"{'annotations': [{'start': 79, 'end': 92, 'pro...",,,en,"[(type, id)]",1046,False,,1,0,0,0,1
1461425986222923778,@BalkusAnthony There's a reason why transphobe...,,1363319341958094849,,1461422207117131777,2021-11-18 20:08:01+00:00,"{'annotations': [{'start': 62, 'end': 75, 'pro...",,1.394862254621958e+18,en,"[(type, id)]",1046,False,,1,1,2,0,4
1461421012587884550,@Mrjjrocks @dog_envier The difference between ...,,1012719733,,1461062447276564480,2021-11-18 19:48:15+00:00,"{'annotations': [{'start': 181, 'end': 194, 'p...",,195049592.0,en,"[(type, id)]",669,False,,0,1,0,0,1
1461415036090347522,@MrJGPozos @TheRealAndrew_ Ok would you vote f...,,781830667436863488,,1392653546600939523,2021-11-18 19:24:31+00:00,"{'annotations': [{'start': 49, 'end': 62, 'pro...",,1.0683172057361859e+18,en,"[(type, id)]",52,False,,0,1,0,0,1
1461413793846550529,@LibQn32 @ciara Seems they have a celebrity fo...,,2885584092,"[{'domain': {'id': '10', 'name': 'Person', 'de...",1461395756221407238,2021-11-18 19:19:34+00:00,"{'annotations': [{'start': 103, 'end': 106, 'p...",,1.0154480389896192e+18,en,"[(type, id)]",259,False,,0,0,0,0,0
```


```python
# showing samples of tweets within DataFrame
jenn_sample = jenn_7.sample(n=10)
for num_tweets, tweet in enumerate(jenn_sample.text, start=1):
    print(f"Tweet {num_tweets}:")
    print(tweet, "\n")
```

```
Tweet 1:
yall realise you can hate caitlin jenner without insulting her appearance or being transphobic right? just so we all clear 

Tweet 2:
@Tribe_XX Caitlin Jenner . Oh my 

Tweet 3:
known murderer caitlin jenner would like a word https://t.co/hc9aiuKnSX 

Tweet 4:
@rsosa8 Caitlin Jenner
Serena Williams 
Angelina Jolie

Tom Brady
Nolan Ryan 
Mick Jagger 

Tweet 5:
@SteMcC82 @SullyTrent @Ryanyates10 @20StoriesMCR No, it's a joke about Caitlin Jenner, not about trans people. 

Tweet 6:
@salsaboiii Adrian Adonis 2.0. Golddust..just a Bruce Jenner out there kicking ass..Sorry, Caitlin Jenner kicking ass. I like this storyline. 

Tweet 7:
@libsoftiktok Caitlin Jenner didn’t win the California governor’s race do to anti-trans California.  We all know California hates trans people
#Californiaismid 

Tweet 8:
@ItsOnlyMara20 @MusoniusRufus @questionbot1776 @SimpleArgonian @incompleteocean @Aly_Dar8 @Lynnia00721169 @acneonmyshirt @thatwitchyjess7 @maqart55 @shanoawarrior @Architectprod @chro
```

In the past week, 264 English-language Tweets contained the exact phrase "elliot page", compared with 119 for "caitlin jenner". The sampled Tweets also suggest that Elliot Page is held in higher regard than Caitlin Jenner on Twitter. However, we only sampled 10 Tweets per celebrity, so these samples may not be representative of all the Tweets collected.

### Names and deadnames

We can get an idea of how much these two celebrities' identities are respected by looking at how many times they are referenced by their "deadname." A deadname is the birth name that a transgender person drops when they transition, often in favor of a name that fits their gender. We can use the `search_30()` function to get an idea about how Elliot and Caitlin are viewed in the public eye, whether they are more referenced by their chosen name or their deadname.

```python
# running and timing the search_30 function for Elliot Page
start = time.time()
page_dn = search_30("ellen page", max_results=3000, write_csv=True)
end = time.time()

print(f"Time taken: {(end-start)/60} min")
print(f'Elliot Page has been deadnamed {len(page_dn)} times')
page_dn.head()
```

```
Time taken: 0.22551111777623495 min
Elliot Page has been deadnamed 946 times
```

```
Tweet ID,text,attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,in_reply_to_user_id,lang,referenced_tweets,followers_count,verified,entities_hashtags,retweet_count,reply_count,like_count,quote_count,total_engagements
1460316351428448256,RT @ZacballsPages: Ellen’s Spooky Streak page ...,{'media_keys': ['3_1454481459998302213']},1391282073957134338,"[{'domain': {'id': '119', 'name': 'Holiday', '...",1460316351428448256,2021-11-15 18:38:44+00:00,"{'mentions': [{'start': 3, 'end': 17, 'usernam...",,,en,"[(type, id)]",39,False,,11,0,0,0,11
1460275011600715783,@NNecroz @JustMightyJake That’s the studio beh...,,4550081715,,1460267304768442373,2021-11-15 15:54:28+00:00,"{'mentions': [{'start': 0, 'end': 8, 'username...",,1.0294753348471644e+18,en,"[(type, id)]",62,False,,0,1,0,0,1
1460243624025751554,RT @Feet_Addicted_: Kate Mara &amp; Ellen Page...,{'media_keys': ['7_1186737271959101442']},1199255575843889152,"[{'domain': {'id': '10', 'name': 'Person', 'de...",1460243624025751554,2021-11-15 13:49:44+00:00,"{'mentions': [{'start': 3, 'end': 18, 'usernam...",,,en,"[(type, id)]",91,False,,109,0,0,0,109
1460236132860497930,@Bolt_451 Elliot Page (née Ellen) was sitting ...,,17558699,"[{'domain': {'id': '10', 'name': 'Person', 'de...",1459527379131502595,2021-11-15 13:19:58+00:00,"{'mentions': [{'start': 0, 'end': 9, 'username...",,20631467.0,en,"[(type, id)]",456,False,,0,0,0,0,0
1460206967352446979,@dreangr selective professionalism huh. Facebo...,,1308376862184402944,"[{'domain': {'id': '46', 'name': 'Brand Catego...",1460193267245625344,2021-11-15 11:24:05+00:00,"{'mentions': [{'start': 0, 'end': 8, 'username...",,1.3716514919210803e+18,en,"[(type, id)]",34,False,,0,1,1,0,2
```


```python
# running and timing the search_30 function for Caitlin Jenner
start = time.time()
jenn_dn = search_30("bruce jenner", max_results=3000, write_csv=True)
end = time.time()

print(f"Time taken: {(end-start)/60} min")
print(f'Caitlin Jenner has been deadnamed {len(jenn_dn)} times')
jenn_dn.head()
```

```
TooManyRequests: 429 Too Many Requests
Request exceeds account’s current package request limits. Please upgrade your package and retry or contact Twitter about enterprise access.
```

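In the past month, 946 Tweets matched the query `ellen page`. Because that query is not in quotation marks, it matches any Tweet containing both words, not only the phrase "Ellen Page". The first result above, for example, is about "Ellen’s Spooky Streak page". Some matches, such as "Elliot Page (née Ellen)", mention the deadname without using it to refer to him. So 946 is an upper bound on how often Elliot Page was deadnamed. The Caitlin Jenner search returned no results. Its `429 Too Many Requests` error says the account had exceeded its package's request limits, meaning the monthly request quota ran out before the Tweets could be retrieved. The error therefore does not tell us how often Caitlin Jenner was deadnamed.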
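When a search hits a quota error partway through, `search_30()` and `search_full()` raise it from the list comprehension that gathers Tweet IDs, so any IDs fetched before the error are discarded. One way to keep them is to store items as they arrive and return the error alongside them. `collect_until_error()` below is a hypothetical wrapper sketched under that assumption, with a stand-in generator in place of a real `tweepy.Cursor`.

```python
def collect_until_error(items, error_types=(Exception,)):
    """Consume an iterator, keeping whatever it yielded before raising.

    Returns (collected_items, error), where error is None if iteration finished normally.
    """
    collected = []
    try:
        for item in items:
            collected.append(item)
    except error_types as error:
        return collected, error
    return collected, None


def flaky_results():
    """Stand-in for a paginated search that fails after two results."""
    yield 101
    yield 102
    raise RuntimeError("429 Too Many Requests")


ids, error = collect_until_error(flaky_results(), (RuntimeError,))
print(ids, error)  # [101, 102] 429 Too Many Requests
```

With tweepy, the iterator could be `(tweet.id for tweet in cursor.items(max_results))` and the error type `tweepy.errors.TooManyRequests`. If the quota is already exhausted on the first request, as appears to have happened above, the returned list is simply empty.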