# ReThink Media Twitter API

This notebook is for the development and exploration of code for ReThink Media's Twitter API Python interface. The main goals of this notebook are:

- Search Tweets: query, date (optional)
  - Past seven days
  - Past 30 days
  - Full archive
  - Language = English
- Collect Tweets in .csv file
- Add data visualization
  - Top hashtags, keywords, influencers
  - Volume over time for queries/topics

In [None]:
# importing necessary modules
from dotenv import load_dotenv
import os
import json
import numpy as np
import pandas as pd
import tweepy

load_dotenv()

## Utility Functions

Functions for general use across the different analysis functions within the notebook.

In [1]:
# function to parse Twitter API v2 response into a DataFrame of Tweet data
def tweet_df(response, tweet_fields):
    import pandas as pd
    
    # initializing DataFrame
    tweets = pd.DataFrame(columns=tweet_fields+['entities_hashtags'])
    tweets.index.name = "Tweet ID"
    
    # looping through each Tweet in response, parsing data
    # (response[0] is None when the search returns no Tweets)
    for tweet in response[0] or []:
        tweet_id = tweet.id
        # defaulting the hashtag column so every row has the same keys
        tweet_data = {'entities_hashtags': None}
        for field in tweet_fields:
            if tweet[field]:
                tweet_data[field] = tweet[field]

                # extracting hashtags from "entities" field and adding them as their own column
                if field == 'entities':
                    tweet_data['entities_hashtags'] = tweet[field].get('hashtags')
            else:
                tweet_data[field] = None
        tweets.loc[tweet_id] = tweet_data
    
    return tweets
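
One of the goals listed at the top is finding top hashtags. The sketch below shows one way to count them from the `entities_hashtags` column, assuming each non-empty entry is a list of dicts with a `tag` key, as in the outputs further down. The function name `top_hashtags` and the sample data are hypothetical and not part of the notebook's code.

```python
from collections import Counter

def top_hashtags(hashtag_column, n=10):
    """Count the most frequent hashtags in an iterable of hashtag lists."""
    counts = Counter()
    for entry in hashtag_column:
        # rows without hashtags hold None/NaN rather than a list
        if not isinstance(entry, list):
            continue
        for hashtag in entry:
            # lower-casing so "#NFT" and "#nft" are counted together
            counts[hashtag["tag"].lower()] += 1
    return counts.most_common(n)

# hypothetical sample shaped like the `entities_hashtags` column
sample = [
    [{"start": 36, "end": 47, "tag": "Stablecoin"}],
    None,
    [{"start": 0, "end": 11, "tag": "stablecoin"}, {"start": 12, "end": 16, "tag": "NFT"}],
]
print(top_hashtags(sample))
```

With a DataFrame returned by `tweet_df`, the call would be `top_hashtags(tweets["entities_hashtags"])`.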

## Authentication

The credentials below grant access to the Twitter API. I've defined them in a `.env` file and retrieve them with the code below. We then pass them to tweepy in order to instantiate a Twitter API instance.

In [None]:
# retrieving environment variables
consumer_key = os.getenv("API_KEY")
consumer_secret = os.getenv("API_KEY_SECRET")
bearer_token = os.getenv("BEARER_TOKEN")
access_token = os.getenv("ACCESS_TOKEN")
access_secret = os.getenv("ACCESS_SECRET")
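
Since `os.getenv` silently returns `None` for a variable that is not set, a quick check can catch a missing or misnamed entry in the `.env` file before authentication fails. This is a minimal sketch; the helper `missing_credentials` and the `REQUIRED_VARS` list are hypothetical names, while the variable names themselves are the ones read above.

```python
import os

# names of the variables this notebook reads from the .env file
REQUIRED_VARS = ["API_KEY", "API_KEY_SECRET", "BEARER_TOKEN", "ACCESS_TOKEN", "ACCESS_SECRET"]

def missing_credentials(env=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

missing = missing_credentials()
if missing:
    print("Missing credentials:", ", ".join(missing))
```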

In [None]:
# Twitter API authentication
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

In [2]:
# function to initialize Twitter API v1.1 instance (for 30-day and full archive search)
def init_api_1():
    
    # importing necessary modules and loading .env file
    from dotenv import load_dotenv
    import os
    import tweepy
    load_dotenv()
    
    # retrieving environment variables from .env file
    consumer_key = os.getenv("API_KEY")
    consumer_secret = os.getenv("API_KEY_SECRET")
    bearer_token = os.getenv("BEARER_TOKEN")
    access_token = os.getenv("ACCESS_TOKEN")
    access_secret = os.getenv("ACCESS_SECRET")
    
    # Twitter API authentication
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    
    # instantiating Twitter API v1.1 reference
    api_1 = tweepy.API(auth)
    
    return api_1

In [3]:
# function to initialize Twitter API v2 instance (for 7-day search)
def init_api_2():
    # importing necessary modules and loading .env file
    from dotenv import load_dotenv
    import os
    import tweepy
    load_dotenv()
    
    # retrieving environment variables from .env file
    consumer_key = os.getenv("API_KEY")
    consumer_secret = os.getenv("API_KEY_SECRET")
    bearer_token = os.getenv("BEARER_TOKEN")
    access_token = os.getenv("ACCESS_TOKEN")
    access_secret = os.getenv("ACCESS_SECRET")
    
    # instantiating Twitter API v2 reference
    api_2 = tweepy.Client(bearer_token=bearer_token,
                         consumer_key=consumer_key,
                         consumer_secret=consumer_secret,
                         access_token=access_token,
                         access_token_secret=access_secret)
    
    return api_2

## Recent Search

The search function available to us in the Standard API package is restricted to the past seven days. For searches further back in the archive, we need to subscribe to a premium API dev environment or upgrade to the Academic API package, which is given to researchers with a clear thesis or research paper goal in mind.

The query can be at most 512 characters long, and the user can specify a `start_time` and `end_time` (as `datetime` or `str` objects) within the past seven days. Queries can also include hashtags. White space is treated as an "AND" join by default, e.g., `hello world` = `hello AND world`. More information about Twitter API queries can be found [in their documentation](https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query).
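
The query rules above can be sketched as a small helper that joins terms with white space (implicit AND), adds hashtags, appends the language filter, and enforces the 512-character limit. The function `build_query` and its parameters are hypothetical; it is a standalone illustration and is not called by the search functions in this notebook.

```python
MAX_QUERY_LENGTH = 512  # limit stated above

def build_query(terms, hashtags=(), lang="en"):
    """Join keywords and hashtags with implicit AND and append a language filter."""
    parts = list(terms) + ["#" + tag.lstrip("#") for tag in hashtags]
    joined = " ".join(parts)
    query = f"({joined}) lang:{lang}"
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError(f"query is {len(query)} characters; the limit is {MAX_QUERY_LENGTH}")
    return query

print(build_query(["hello", "world"], hashtags=["python"]))
```

Checking the length locally avoids spending a request on a query the API would reject.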

The 7-day search has no cap on the number of requests but is limited to 500,000 Tweets per month.

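The `response` object is a named tuple of four items: `(data, includes, errors, meta)`.

The `data` object contains the Tweets that are retrieved, and `meta` holds metadata about the result set, such as the result count and pagination token. In this response object, `includes` and `errors` are empty; `includes` is populated only when `expansions` are requested, in which case it holds the expanded objects (such as users or media) referenced by the Tweets.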
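
The four-item structure can be illustrated with a stand-in. This sketch assumes the response behaves like a `collections.namedtuple` with the field names listed above; the `Response` type and its contents here are hypothetical and defined locally rather than imported from tweepy.

```python
from collections import namedtuple

# stand-in with the same four fields as the response described above
Response = namedtuple("Response", ("data", "includes", "errors", "meta"))

response = Response(
    data=[{"id": 1, "text": "hello world"}],
    includes={},
    errors=[],
    meta={"result_count": 1},
)

# positional and named access return the same object
assert response[0] is response.data

# tuple unpacking works too
data, includes, errors, meta = response
print(len(data), meta["result_count"])
```

This is why `tweet_df` can read the Tweets with `response[0]`.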

In [4]:
# function to retrieve Tweets from the past 7 days relevant to a query
def search_7(query, start_date=None, end_date=None, max_results=20, write_csv=False, filename="search_7.csv"):
    
    # initializing API v2 instance
    api_2 = init_api_2()
    
    # parsing dates passed into function
    from dateutil import parser
    if start_date:
        start_date = parser.parse(start_date)
    if end_date:
        end_date = parser.parse(end_date)
    
    # retrieving Tweets between start_date and end_date relevant to query
    tweet_fields = ["text", "attachments", "author_id", "context_annotations", "conversation_id", "created_at",
                   "entities", "geo", "in_reply_to_user_id", "lang", "public_metrics", "referenced_tweets"]
    response = api_2.search_recent_tweets(query=f"({query}) lang:en",
                                         start_time=start_date,
                                         end_time=end_date,
                                         max_results=max_results,
                                         tweet_fields=tweet_fields)
    
    # adding Tweet data to DataFrame
    tweets = tweet_df(response, tweet_fields)
    
    # writing Tweet DataFrame to csv file
    if write_csv:
        tweets.to_csv(filename)
    
    return tweets

In [5]:
test = search_7("hello world", max_results=100, write_csv=True)
print(len(test))
test

100


Tweet ID,text,attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,in_reply_to_user_id,lang,public_metrics,referenced_tweets,entities_hashtags
1452714631210749954,RT @SaffronSalim: Hello @cricbuzz you’ve reade...,,2894075335,"[{'domain': {'id': '6', 'name': 'Sports Event'...",1452714631210749954,2021-10-25 19:12:12+00:00,"{'mentions': [{'start': 3, 'end': 16, 'usernam...",,,en,"{'retweet_count': 6, 'reply_count': 0, 'like_c...","[(type, id)]",
1452714558167162884,"RT @jaeykie: @ENHYPEN_members Hello, Heeseung!...",,1433871563120652289,"[{'domain': {'id': '10', 'name': 'Person', 'de...",1452714558167162884,2021-10-25 19:11:55+00:00,"{'annotations': [{'start': 37, 'end': 44, 'pro...",,,en,"{'retweet_count': 1, 'reply_count': 0, 'like_c...","[(type, id)]",
1452714548700618765,"RT @hiruna454: Hello, This is the first time I...",,931607234005565441,"[{'domain': {'id': '45', 'name': 'Brand Vertic...",1452714548700618765,2021-10-25 19:11:53+00:00,"{'mentions': [{'start': 3, 'end': 13, 'usernam...",,,en,"{'retweet_count': 486, 'reply_count': 0, 'like...","[(type, id)]",
1452714531566899209,"Hello, Twitter World! https://t.co/WCNju86xyK",{'media_keys': ['3_1452714495617478656']},1452714156818407430,"[{'domain': {'id': '46', 'name': 'Brand Catego...",1452714531566899209,2021-10-25 19:11:49+00:00,"{'annotations': [{'start': 7, 'end': 13, 'prob...",,,en,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",,
1452714500419952646,"RT @ruggiere_l: HELLO AMERICA,\n\nCLIMATE CHAN...",,1074649896551178240,,1452714500419952646,2021-10-25 19:11:41+00:00,"{'annotations': [{'start': 22, 'end': 28, 'pro...",,,en,"{'retweet_count': 4, 'reply_count': 0, 'like_c...","[(type, id)]",
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1452710934116388864,@fastmoney_nft Hello guys 👋\n\nJust a reminder...,,1275108204431552512,,1452602086961582087,2021-10-25 18:57:31+00:00,"{'mentions': [{'start': 0, 'end': 14, 'usernam...",,2815437643,en,"{'retweet_count': 0, 'reply_count': 0, 'like_c...","[(type, id), (type, id)]",
1452710928290566160,"Rrrr .Hello World, Introducing BUCK #Stablecoi...",,1444983946782154758,"[{'domain': {'id': '45', 'name': 'Brand Vertic...",1452710928290566160,2021-10-25 18:57:30+00:00,"{'hashtags': [{'start': 36, 'end': 47, 'tag': ...",,,en,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",,"[{'start': 36, 'end': 47, 'tag': 'Stablecoin'}..."
1452710871436722177,@shbtart Hello guys 👋\n\nJust a reminder have ...,,1275108204431552512,,1452591842139295749,2021-10-25 18:57:16+00:00,"{'mentions': [{'start': 0, 'end': 8, 'username...",,1421945407614107651,en,"{'retweet_count': 0, 'reply_count': 0, 'like_c...","[(type, id), (type, id)]",
1452710835738984448,RT @SaffronSalim: Hello @cricbuzz you’ve reade...,,1311263530620874753,"[{'domain': {'id': '6', 'name': 'Sports Event'...",1452710835738984448,2021-10-25 18:57:07+00:00,"{'mentions': [{'start': 3, 'end': 16, 'usernam...",,,en,"{'retweet_count': 6, 'reply_count': 0, 'like_c...","[(type, id)]",


## 30-Day/Full Archive Search

With a premium development environment, we can access 30-day and full archive searches through the Twitter API without an Academic API package. This requires interfacing with API v1.1, as opposed to v2 for the Recent Search.

The 30-day search allows 250 requests and 25,000 Tweets per month, while the full archive search allows 50 requests and 5,000 Tweets per month.
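
Both quotas work out to 100 Tweets per request, which makes it easy to estimate how many requests a collection will consume. The sketch below is a rough budgeting aid under that assumption; the `QUOTAS` dictionary and `requests_needed` helper are hypothetical names.

```python
import math

# monthly quotas quoted above: (requests, Tweets)
QUOTAS = {"30day": (250, 25_000), "full": (50, 5_000)}

def requests_needed(total_tweets, per_request=100):
    """Number of requests needed to collect total_tweets at per_request Tweets each."""
    return math.ceil(total_tweets / per_request)

for label, (max_requests, max_tweets) in QUOTAS.items():
    # Tweets per request implied by each quota, and whether the full
    # Tweet allowance fits within the request allowance
    print(label, max_tweets // max_requests, requests_needed(max_tweets) <= max_requests)
```

Requesting fewer than 100 Tweets per call uses up the request quota before the Tweet quota.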

The `tweepy.models.Status` object contains a lot of data about the Tweet, such as its text, its author, and various aspects of metadata about the Tweet's creation and interactions.

In [6]:
# function to search Tweets within the past 30 days
# utilizes both API v1.1 and v2 to be consistent with 7-day search.
def search_30(query, start_date=None, end_date=None, max_results=20, write_csv=False, filename="search_30.csv"):
    # initializing API v1.1 instance
    api_1 = init_api_1()
    
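    # parsing dates passed into function and converting them to the
    # YYYYMMDDHHmm strings that the premium search endpoints expect
    from dateutil import parser
    if start_date:
        start_date = parser.parse(start_date).strftime("%Y%m%d%H%M")
    if end_date:
        end_date = parser.parse(end_date).strftime("%Y%m%d%H%M")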
    
    # retrieving Tweets from the past 30 days relevant to query
    response_1 = api_1.search_30_day(label="30day",
                                  query=f"{query} lang:en",
                                  fromDate=start_date,
                                  toDate=end_date,
                                  maxResults=max_results)
    
    # retrieving Tweet IDs to pass into API v2
    tweet_ids = [tweet._json['id'] for tweet in response_1]
    
    # initializing API v2 instance
    api_2 = init_api_2()
    
    # retrieving Tweet data from Tweet IDs
    tweet_fields = ["text", "attachments", "author_id", "context_annotations", "conversation_id", "created_at",
                   "entities", "geo", "in_reply_to_user_id", "lang", "public_metrics", "referenced_tweets"]
    response_2 = api_2.get_tweets(tweet_ids, tweet_fields=tweet_fields)
    
    # adding Tweet data to DataFrame
    tweets = tweet_df(response_2, tweet_fields)
    
    # writing Tweet DataFrame to csv file
    if write_csv:
        tweets.to_csv(filename)
    
    return tweets
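
The v2 lookup step above sends all Tweet IDs in one call. Assuming the v2 Tweet lookup endpoint accepts at most 100 IDs per request, a larger ID list would need to be split into batches first. The `chunked` helper below is a hypothetical sketch of that splitting and is not part of `search_30`.

```python
def chunked(items, size=100):
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

tweet_ids = list(range(250))  # hypothetical IDs
batches = list(chunked(tweet_ids))
print([len(batch) for batch in batches])
```

Each batch could then be passed to `get_tweets` separately and the resulting DataFrames concatenated.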

In [7]:
test30 = search_30("hello world", max_results=100, write_csv=True)
test30

Tweet ID,text,attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,in_reply_to_user_id,lang,public_metrics,referenced_tweets,entities_hashtags
1452714631210749954,RT @SaffronSalim: Hello @cricbuzz you’ve reade...,,2894075335,"[{'domain': {'id': '6', 'name': 'Sports Event'...",1452714631210749954,2021-10-25 19:12:12+00:00,"{'mentions': [{'start': 3, 'end': 16, 'usernam...",,,en,"{'retweet_count': 6, 'reply_count': 0, 'like_c...","[(type, id)]",
1452714558167162884,"RT @jaeykie: @ENHYPEN_members Hello, Heeseung!...",,1433871563120652289,"[{'domain': {'id': '10', 'name': 'Person', 'de...",1452714558167162884,2021-10-25 19:11:55+00:00,"{'annotations': [{'start': 37, 'end': 44, 'pro...",,,en,"{'retweet_count': 1, 'reply_count': 0, 'like_c...","[(type, id)]",
1452714548700618765,"RT @hiruna454: Hello, This is the first time I...",,931607234005565441,"[{'domain': {'id': '45', 'name': 'Brand Vertic...",1452714548700618765,2021-10-25 19:11:53+00:00,"{'mentions': [{'start': 3, 'end': 13, 'usernam...",,,en,"{'retweet_count': 486, 'reply_count': 0, 'like...","[(type, id)]",
1452714531566899209,"Hello, Twitter World! https://t.co/WCNju86xyK",{'media_keys': ['3_1452714495617478656']},1452714156818407430,"[{'domain': {'id': '46', 'name': 'Brand Catego...",1452714531566899209,2021-10-25 19:11:49+00:00,"{'annotations': [{'start': 7, 'end': 13, 'prob...",,,en,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",,
1452714500419952646,"RT @ruggiere_l: HELLO AMERICA,\n\nCLIMATE CHAN...",,1074649896551178240,,1452714500419952646,2021-10-25 19:11:41+00:00,"{'annotations': [{'start': 22, 'end': 28, 'pro...",,,en,"{'retweet_count': 4, 'reply_count': 0, 'like_c...","[(type, id)]",
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1452709430991097856,@DropYourNFT Hello guys 👋\n\nJust a reminder h...,,1275108204431552512,,1452699129759649795,2021-10-25 18:51:33+00:00,"{'mentions': [{'start': 0, 'end': 12, 'usernam...",,1373268723767869440,en,"{'retweet_count': 0, 'reply_count': 0, 'like_c...","[(type, id), (type, id)]",
1452709348879134722,@NFtsGrab Hello guys 👋\n\nJust a reminder have...,,1275108204431552512,,1452679349019283463,2021-10-25 18:51:13+00:00,"{'mentions': [{'start': 0, 'end': 9, 'username...",,1234798891289194498,en,"{'retweet_count': 0, 'reply_count': 0, 'like_c...","[(type, id), (type, id)]",
1452709250405486597,"RT @RhlPixels: #originalcharacters yes hello, ...","{'media_keys': ['3_1452683898270400519', '3_14...",1359683649369686017,,1452709250405486597,2021-10-25 18:50:49+00:00,"{'mentions': [{'start': 3, 'end': 13, 'usernam...",,,en,"{'retweet_count': 26, 'reply_count': 0, 'like_...","[(type, id)]","[{'start': 15, 'end': 34, 'tag': 'originalchar..."
1452709149112877059,@NFTMansa Hello guys 👋\n\nJust a reminder have...,,1285389126271766529,,1452628767478894596,2021-10-25 18:50:25+00:00,"{'mentions': [{'start': 0, 'end': 9, 'username...",,98849456,en,"{'retweet_count': 0, 'reply_count': 0, 'like_c...","[(type, id), (type, id)]",


In [8]:
# function to search Tweets across the full archive
# utilizes both API v1.1 and v2 to be consistent with 7-day search.
def search_full(query, start_date=None, end_date=None, max_results=20, write_csv=False, filename="search_full.csv"):
    # initializing API v1.1 instance
    api_1 = init_api_1()
    
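    # parsing dates passed into function and converting them to the
    # YYYYMMDDHHmm strings that the premium search endpoints expect
    from dateutil import parser
    if start_date:
        start_date = parser.parse(start_date).strftime("%Y%m%d%H%M")
    if end_date:
        end_date = parser.parse(end_date).strftime("%Y%m%d%H%M")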
    
    # retrieving Tweets from the full archive relevant to query
    response_1 = api_1.search_full_archive(label="full",
                                           query=f"{query} lang:en",
                                           fromDate=start_date,
                                           toDate=end_date,
                                           maxResults=max_results)
    
    # retrieving Tweet IDs to pass into API v2
    tweet_ids = [tweet._json['id'] for tweet in response_1]
    
    # initializing API v2 instance
    api_2 = init_api_2()
    
    # retrieving Tweet data from Tweet IDs
    tweet_fields = ["text", "attachments", "author_id", "context_annotations", "conversation_id", "created_at",
                   "entities", "geo", "in_reply_to_user_id", "lang", "public_metrics", "referenced_tweets"]
    response_2 = api_2.get_tweets(tweet_ids, tweet_fields=tweet_fields)
    
    # adding Tweet data to DataFrame
    tweets = tweet_df(response_2, tweet_fields)
    
    # writing Tweets DataFrame to csv file
    if write_csv:
        tweets.to_csv(filename)
    
    return tweets

In [9]:
test_full = search_full("hello world", max_results=100, write_csv=True)
test_full

Tweet ID,text,attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,in_reply_to_user_id,lang,public_metrics,referenced_tweets,entities_hashtags
1452714631210749954,RT @SaffronSalim: Hello @cricbuzz you’ve reade...,,2894075335,"[{'domain': {'id': '6', 'name': 'Sports Event'...",1452714631210749954,2021-10-25 19:12:12+00:00,"{'mentions': [{'start': 3, 'end': 16, 'usernam...",,,en,"{'retweet_count': 6, 'reply_count': 0, 'like_c...","[(type, id)]",
1452714558167162884,"RT @jaeykie: @ENHYPEN_members Hello, Heeseung!...",,1433871563120652289,"[{'domain': {'id': '10', 'name': 'Person', 'de...",1452714558167162884,2021-10-25 19:11:55+00:00,"{'annotations': [{'start': 37, 'end': 44, 'pro...",,,en,"{'retweet_count': 1, 'reply_count': 0, 'like_c...","[(type, id)]",
1452714548700618765,"RT @hiruna454: Hello, This is the first time I...",,931607234005565441,"[{'domain': {'id': '45', 'name': 'Brand Vertic...",1452714548700618765,2021-10-25 19:11:53+00:00,"{'mentions': [{'start': 3, 'end': 13, 'usernam...",,,en,"{'retweet_count': 486, 'reply_count': 0, 'like...","[(type, id)]",
1452714531566899209,"Hello, Twitter World! https://t.co/WCNju86xyK",{'media_keys': ['3_1452714495617478656']},1452714156818407430,"[{'domain': {'id': '46', 'name': 'Brand Catego...",1452714531566899209,2021-10-25 19:11:49+00:00,"{'annotations': [{'start': 7, 'end': 13, 'prob...",,,en,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",,
1452714500419952646,"RT @ruggiere_l: HELLO AMERICA,\n\nCLIMATE CHAN...",,1074649896551178240,,1452714500419952646,2021-10-25 19:11:41+00:00,"{'annotations': [{'start': 22, 'end': 28, 'pro...",,,en,"{'retweet_count': 4, 'reply_count': 0, 'like_c...","[(type, id)]",
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1452709430991097856,@DropYourNFT Hello guys 👋\n\nJust a reminder h...,,1275108204431552512,,1452699129759649795,2021-10-25 18:51:33+00:00,"{'mentions': [{'start': 0, 'end': 12, 'usernam...",,1373268723767869440,en,"{'retweet_count': 0, 'reply_count': 0, 'like_c...","[(type, id), (type, id)]",
1452709348879134722,@NFtsGrab Hello guys 👋\n\nJust a reminder have...,,1275108204431552512,,1452679349019283463,2021-10-25 18:51:13+00:00,"{'mentions': [{'start': 0, 'end': 9, 'username...",,1234798891289194498,en,"{'retweet_count': 0, 'reply_count': 0, 'like_c...","[(type, id), (type, id)]",
1452709250405486597,"RT @RhlPixels: #originalcharacters yes hello, ...","{'media_keys': ['3_1452683898270400519', '3_14...",1359683649369686017,,1452709250405486597,2021-10-25 18:50:49+00:00,"{'mentions': [{'start': 3, 'end': 13, 'usernam...",,,en,"{'retweet_count': 26, 'reply_count': 0, 'like_...","[(type, id)]","[{'start': 15, 'end': 34, 'tag': 'originalchar..."
1452709149112877059,@NFTMansa Hello guys 👋\n\nJust a reminder have...,,1285389126271766529,,1452628767478894596,2021-10-25 18:50:25+00:00,"{'mentions': [{'start': 0, 'end': 9, 'username...",,98849456,en,"{'retweet_count': 0, 'reply_count': 0, 'like_c...","[(type, id), (type, id)]",


## Stream

A Stream is an object that can filter and sample real-time Tweets. Since it works on live data rather than past Tweets, this is probably not what we're looking for in an analysis pipeline.

In [None]:
# instantiating Stream object
stream = tweepy.Stream(consumer_key, consumer_secret, access_token, access_secret)
stream

In [None]:
stream.sample(languages=["en"])