# ReThink Media Twitter API

This notebook is for the development and exploration of code for ReThink Media's Twitter API Python interface. The main goals of this notebook are:

- Search Tweets: query, date (optional)
  - Past seven days
  - Past 30 days
  - Full archive
  - Language = English
- Collect Tweets in .csv file
- Add data visualization
  - Top hashtags, keywords, influencers
  - Volume over time for queries/topics

In [1]:
# importing necessary modules
from dotenv import load_dotenv
import os
import json
import numpy as np
import pandas as pd
import tweepy

load_dotenv()

True

## Authentication

The variables below grant access to the Twitter API. I've defined them in a `.env` file and retrieve them with the code below. They are then passed to tweepy to instantiate a Twitter API client.

In [2]:
# retrieving environment variables
consumer_key = os.getenv("API_KEY")
consumer_secret = os.getenv("API_KEY_SECRET")
bearer_token = os.getenv("BEARER_TOKEN")
access_token = os.getenv("ACCESS_TOKEN")
access_secret = os.getenv("ACCESS_SECRET")

In [3]:
# Twitter API authentication
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
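`os.getenv` returns `None` for a name that is missing from the `.env` file, so a typo in a variable name only shows up later as an authentication failure. A minimal sketch of an early check follows; `require_env` and `REQUIRED_VARS` are hypothetical names introduced here, not part of tweepy or python-dotenv.

```python
import os

# Names of the variables this notebook expects in the .env file.
REQUIRED_VARS = ["API_KEY", "API_KEY_SECRET", "BEARER_TOKEN",
                 "ACCESS_TOKEN", "ACCESS_SECRET"]

def require_env(names, environ=None):
    """Return {name: value} for every name, or raise if any are missing or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: environ[name] for name in names}
```

Calling `require_env(REQUIRED_VARS)` right after `load_dotenv()` would fail fast with the names of any absent keys.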

In [4]:
# function to initialize Twitter API v1.1 instance (for 30-day and full archive search)
def init_api_1():
    
    # importing necessary modules and loading .env file
    from dotenv import load_dotenv
    import os
    import tweepy
    load_dotenv()
    
    # retrieving environment variables from .env file
    consumer_key = os.getenv("API_KEY")
    consumer_secret = os.getenv("API_KEY_SECRET")
    bearer_token = os.getenv("BEARER_TOKEN")
    access_token = os.getenv("ACCESS_TOKEN")
    access_secret = os.getenv("ACCESS_SECRET")
    
    # Twitter API authentication
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    
    # instantiating Twitter API v1.1 reference
    api_1 = tweepy.API(auth)
    
    return api_1

In [5]:
# function to initialize Twitter API v2 instance (for 7-day search)
def init_api_2():
    # importing necessary modules and loading .env file
    from dotenv import load_dotenv
    import os
    import tweepy
    load_dotenv()
    
    # retrieving environment variables from .env file
    consumer_key = os.getenv("API_KEY")
    consumer_secret = os.getenv("API_KEY_SECRET")
    bearer_token = os.getenv("BEARER_TOKEN")
    access_token = os.getenv("ACCESS_TOKEN")
    access_secret = os.getenv("ACCESS_SECRET")
    
    # instantiating Twitter API v2 reference
    api_2 = tweepy.Client(bearer_token=bearer_token,
                         consumer_key=consumer_key,
                         consumer_secret=consumer_secret,
                         access_token=access_token,
                         access_token_secret=access_secret)
    
    return api_2

## Recent Search

The search function available in the Standard API package is limited to the past seven days. Searching further back in the archive requires either a subscription to a premium API dev environment or an upgrade to the Academic API package, which is granted to researchers with a clear thesis or research paper goal in mind.

The query can be at most 512 characters, and the user can specify a `start_time` and `end_time` (as `datetime` or `str` objects) within the past seven days. Hashtags can be searched as well. White space is treated as an "AND" join, e.g., `hello world` = `hello AND world`. More information about Twitter API queries can be found [in their documentation](https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query).

The 7-day search allows an unlimited number of requests and up to 500,000 Tweets per month.
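The query rules above (space-separated terms joined by AND, optional hashtags, a 512-character cap) can be sketched as a small builder. `build_query` is a hypothetical helper, not a tweepy function. It wraps the terms in parentheses so that the language filter applies to the whole expression.

```python
MAX_QUERY_LENGTH = 512  # limit stated above for recent search

def build_query(terms, hashtags=(), lang="en"):
    """Join terms and hashtags with spaces (implicit AND) and append a language filter."""
    parts = list(terms) + [f"#{tag.lstrip('#')}" for tag in hashtags]
    if not parts:
        raise ValueError("at least one term or hashtag is required")
    # Parentheses keep lang: attached to the whole expression,
    # which matters once a term contains OR.
    query = f"({' '.join(parts)}) lang:{lang}"
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError(
            f"query is {len(query)} characters; the limit is {MAX_QUERY_LENGTH}"
        )
    return query

print(build_query(["hello", "world"]))  # (hello world) lang:en
```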

In [6]:
# instantiating a Twitter API v2 instance
api2 = tweepy.Client(bearer_token=bearer_token,
                consumer_key=consumer_key,
                consumer_secret=consumer_secret,
                access_token=access_token,
                access_token_secret=access_secret)
api2

<tweepy.client.Client at 0x7feb501ca880>

In [7]:
# searching for "hello world" over the past seven days.
test_field = "text"
response = api2.search_recent_tweets(query="hello world lang:en", max_results=20, tweet_fields=test_field)

The `response` object is a tuple of four items: `(data, includes, errors, meta)`.

The `data` item contains the Tweets that are retrieved, and `meta` is the metadata for the request (ID range, result count, and pagination token). In this response, `includes` and `errors` are empty; `includes` is only populated when `expansions` are requested.
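With tweepy 4.x the four items can also be read by name, because the client returns a named tuple. A minimal sketch follows, using a stand-in built with `collections.namedtuple` (not the real tweepy class) and made-up data:

```python
from collections import namedtuple

# Stand-in with the same four fields as the tuple described above.
Response = namedtuple("Response", ("data", "includes", "errors", "meta"))

example = Response(
    data=[{"id": 1, "text": "Hello, world!"}],  # illustrative Tweet
    includes={},
    errors=[],
    meta={"result_count": 1},
)

# Positional and named access refer to the same items.
print(example.data is example[0])    # True
print(example.meta["result_count"])  # 1
```

Named access (`response.data`, `response.meta`) tends to read more clearly than `response[0]` and `response[3]` in the cells that follow.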

In [8]:
# printing Tweets
for i, tweet in enumerate(response[0]):
    print(f"Tweet {i}:")
    print(tweet[test_field])

Tweet 0:
Hello, world!
Tweet 1:
RT @seachi_: Hello, it's I, local cat artiny waiting for the world to be torn asunder,,,, and also offer art while I am waiting.

#AtinyTal…
Tweet 2:
Hello World (1634922436)
Tweet 3:
@AliceMissWoW1 @LatexLadies @BdsmMinds @Oostwalum @DomainDomina @Femdom__World Hello Ladies
Tweet 4:
Hello World (1634922418)
Tweet 5:
RT @seachi_: Hello, it's I, local cat artiny waiting for the world to be torn asunder,,,, and also offer art while I am waiting.

#AtinyTal…
Tweet 6:
RT @seachi_: Hello, it's I, local cat artiny waiting for the world to be torn asunder,,,, and also offer art while I am waiting.

#AtinyTal…
Tweet 7:
@AlecBaldwin A real drama. What Alec feels now is immeasurable. Who failed, or was it intentional? Hello justice. Please (!) clear up. Is it ever possible to find out? I wish all the energy in the world for the beloved ones and Alec!!!


In [9]:
# printing metadata for Tweets in response
response[3]

{'newest_id': '1451600535921045506',
 'oldest_id': '1451600417046073367',
 'result_count': 8,
 'next_token': 'b26v89c19zqg8o3fpdv67sanfck6ichpmqutz5hphebnh'}
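The `next_token` in the metadata is how further pages of results are requested. A minimal sketch of following it, assuming a caller-supplied `fetch_page` callable (hypothetical) that takes a token and returns `(data, meta)`; the fake pages below stand in for the API.

```python
def collect_pages(fetch_page, max_pages=5):
    """Follow next_token until it disappears or max_pages is reached.

    fetch_page is any callable taking next_token (or None for the first
    page) and returning (data, meta), where meta may hold "next_token".
    """
    tweets, token = [], None
    for _ in range(max_pages):
        data, meta = fetch_page(token)
        tweets.extend(data or [])
        token = meta.get("next_token")
        if token is None:
            break
    return tweets

# Fake three-page result set standing in for the API (hypothetical data).
PAGES = {
    None: (["t1", "t2"], {"next_token": "p2"}),
    "p2": (["t3"], {"next_token": "p3"}),
    "p3": (["t4"], {}),
}

print(collect_pages(lambda token: PAGES[token]))  # ['t1', 't2', 't3', 't4']
```

With tweepy, `fetch_page` could wrap `api2.search_recent_tweets(..., next_token=token)` and return the response's data and meta items. The `max_pages` cap keeps a broad query from eating into the monthly Tweet quota.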

In [10]:
# retrieving full text of retweeted Tweet
retweet_id = response[0][0].referenced_tweets[0].id
retweet = api2.get_tweet(retweet_id)
retweet[0].text

TypeError: 'NoneType' object is not subscriptable
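The `TypeError` above is consistent with `referenced_tweets` being `None`: the search requested only the `text` field, and the first result ("Hello, world!") is an original Tweet with nothing to reference in any case. A minimal sketch of a guard follows; `first_referenced_id` is a hypothetical helper, and `SimpleNamespace` objects stand in for Tweets.

```python
from types import SimpleNamespace

def first_referenced_id(tweet):
    """Return the ID of the first referenced Tweet, or None if there is none."""
    referenced = getattr(tweet, "referenced_tweets", None)
    if not referenced:
        return None
    return referenced[0].id

# Stand-ins for API objects (illustrative values only).
original = SimpleNamespace(id=1, referenced_tweets=None)
retweet = SimpleNamespace(
    id=2, referenced_tweets=[SimpleNamespace(type="retweeted", id=1)]
)

print(first_referenced_id(original))  # None
print(first_referenced_id(retweet))   # 1
```

For the lookup to work on real results, `referenced_tweets` would also need to be included in `tweet_fields` when searching.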

In [65]:
response[0][0]['author_id']

In [89]:
# function to retrieve Tweets from the past 7 days relevant to a query
def search_7(query, start_date=None, end_date=None, max_results=20):
    
    # initializing API v2 instance
    api_2 = init_api_2()
    
    # parsing dates passed into function
    from dateutil import parser
    if start_date:
        start_date = parser.parse(start_date)
    if end_date:
        end_date = parser.parse(end_date)
    
    # retrieving Tweets between start_date and end_date relevant to query
    tweet_fields = ["text", "attachments", "author_id", "context_annotations", "conversation_id", "created_at",
                   "entities", "geo", "in_reply_to_user_id", "lang", "public_metrics", "referenced_tweets"]
    response = api_2.search_recent_tweets(query=f"({query}) lang:en",
                                         start_time=start_date,
                                         end_time=end_date,
                                         max_results=max_results,
                                         tweet_fields=tweet_fields)
    
    # adding Tweet data to DataFrame
    import pandas as pd
    
    tweets = pd.DataFrame(columns=tweet_fields+['entities_hashtags'])
    tweets.index.name = "Tweet ID"
    
    # going through Tweets in response object and parsing data in dict
    # (response[0] is None when nothing matches, hence the fallback)
    for tweet in response[0] or []:
        tweet_id = tweet.id
        tweet_data = {}
        for field in tweet_fields:
            if tweet[field]:
                tweet_data[field] = tweet[field]
                # extracting hashtags into separate column
                if field == 'entities':
                    try:
                        tweet_data['entities_hashtags'] = tweet[field]['hashtags']
                    except KeyError:
                        tweet_data['entities_hashtags'] = None
            else:
                tweet_data[field] = None
        # adding Tweet data to DataFrame
        tweets.loc[tweet_id] = tweet_data
    return tweets

In [91]:
test = search_7("hello world", max_results=100)
print(len(test))
test

100


Tweet ID,text,attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,in_reply_to_user_id,lang,public_metrics,referenced_tweets,entities_hashtags
1451639387721281539,"RT @SBphiloz4: Bon_Appétit♡S - Hello, Happy Wo...",,1065704111218335744,"[{'domain': {'id': '130', 'name': 'Multimedia ...",1451639387721281539,2021-10-22 19:59:34+00:00,"{'mentions': [{'start': 3, 'end': 13, 'usernam...",,,en,"{'retweet_count': 99, 'reply_count': 0, 'like_...","[(type, id)]",
1451639359493623811,RT @hobihongie: Hello ♡\nI'm Ari and my missio...,"{'media_keys': ['3_1451620782766018563', '3_14...",921787945807474688,"[{'domain': {'id': '10', 'name': 'Person', 'de...",1451639359493623811,2021-10-22 19:59:28+00:00,"{'hashtags': [{'start': 78, 'end': 93, 'tag': ...",,,en,"{'retweet_count': 7, 'reply_count': 0, 'like_c...","[(type, id)]","[{'start': 78, 'end': 93, 'tag': 'AtinyTalentD..."
1451639208888635399,RT @JWallet_: 📈⚡️🥳🧙🙌 Hello (#crypto) world! 💎🔥...,{'media_keys': ['3_1451511131701448706']},1392159703615016963,"[{'domain': {'id': '66', 'name': 'Interests an...",1451639208888635399,2021-10-22 19:58:52+00:00,"{'hashtags': [{'start': 28, 'end': 35, 'tag': ...",,,en,"{'retweet_count': 31, 'reply_count': 0, 'like_...","[(type, id)]","[{'start': 28, 'end': 35, 'tag': 'crypto'}]"
1451639184972787717,Hello World! Here's an interesting post! https...,,16986001,,1451639184972787717,2021-10-22 19:58:46+00:00,"{'urls': [{'start': 41, 'end': 64, 'url': 'htt...",,,en,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",,
1451639168883429377,"@CryptoTI @origintrailclub @origin_trail Bye, ...",,1373019702927187968,,1447781753855229952,2021-10-22 19:58:42+00:00,"{'cashtags': [{'start': 125, 'end': 130, 'tag'...",,863747523365683200,en,"{'retweet_count': 0, 'reply_count': 0, 'like_c...","[(type, id)]",
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1451632749564022792,"@thetitanborn I am gonna hit the wall\n"" a boy...",{'media_keys': ['3_1451632746208583680']},1417766739048288256,"[{'domain': {'id': '45', 'name': 'Brand Vertic...",1451594504411168770,2021-10-22 19:33:12+00:00,"{'urls': [{'start': 257, 'end': 280, 'url': 'h...",,1414858240752500740,en,"{'retweet_count': 0, 'reply_count': 0, 'like_c...","[(type, id)]",
1451632726285725698,"@nftjuno I am gonna hit the wall\n"" a boy leav...",{'media_keys': ['3_1451632722766712833']},1417766739048288256,"[{'domain': {'id': '45', 'name': 'Brand Vertic...",1451608598530904065,2021-10-22 19:33:06+00:00,"{'urls': [{'start': 252, 'end': 275, 'url': 'h...",,1444655950116401158,en,"{'retweet_count': 0, 'reply_count': 0, 'like_c...","[(type, id)]",
1451632699232370694,"@cherrygirl_mel I am gonna hit the wall\n"" a b...",{'media_keys': ['3_1451632695977598985']},1417766739048288256,"[{'domain': {'id': '45', 'name': 'Brand Vertic...",1451596792475226113,2021-10-22 19:33:00+00:00,"{'urls': [{'start': 259, 'end': 282, 'url': 'h...",,80468034,en,"{'retweet_count': 0, 'reply_count': 0, 'like_c...","[(type, id)]",
1451632653627654144,Hello world,,895000604908462081,,1451632653627654144,2021-10-22 19:32:49+00:00,,,,en,"{'retweet_count': 0, 'reply_count': 1, 'like_c...",,
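One of the notebook's goals is collecting Tweets in a .csv file. Columns such as `entities` and `public_metrics` hold dicts and lists, so encoding them as JSON keeps them parseable when the file is read back. A minimal standard-library sketch follows, with plain dict rows standing in for DataFrame rows; `rows_to_csv` is a hypothetical helper.

```python
import csv
import io
import json

def rows_to_csv(rows, fieldnames, out):
    """Write rows (dicts) as CSV, JSON-encoding nested dicts and lists."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        flat = {}
        for name in fieldnames:
            value = row.get(name)
            if isinstance(value, (dict, list)):
                value = json.dumps(value)  # keeps nested data parseable later
            flat[name] = "" if value is None else value
        writer.writerow(flat)

# Illustrative row; an in-memory buffer stands in for an open file.
rows = [{"Tweet ID": 1, "text": "Hello world",
         "public_metrics": {"retweet_count": 0}}]
buffer = io.StringIO()
rows_to_csv(rows, ["Tweet ID", "text", "public_metrics"], buffer)
print(buffer.getvalue())
```

With the DataFrame returned by `search_7`, rows of this shape could come from `tweets.reset_index().to_dict("records")`.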


In [17]:
test.columns

Index(['text', 'attachments', 'author_id', 'context_annotations',
       'conversation_id', 'created_at', 'entities', 'geo',
       'in_reply_to_user_id', 'lang', 'public_metrics', 'referenced_tweets'],
      dtype='object')

In [80]:
try:
    print(test.iloc[16]["entities"]["hashtags"])
except (KeyError, TypeError):
    # entities is missing (None) or has no 'hashtags' key
    print("None")

[{'start': 182, 'end': 201, 'tag': 'INVASIONINEVITABLE'}, {'start': 257, 'end': 267, 'tag': 'MarvinINU'}, {'start': 268, 'end': 275, 'tag': 'MARVIN'}]
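Hashtag lists in this shape are enough to compute the "top hashtags" goal. A minimal sketch follows, assuming each entry is either `None` or a list of dicts with a `tag` key as in the output above; `top_hashtags` is a hypothetical helper, and counting case-insensitively is a design choice made here rather than something the API does.

```python
from collections import Counter

def top_hashtags(hashtag_lists, n=10):
    """Count hashtags case-insensitively across Tweets; None entries are skipped."""
    counts = Counter()
    for hashtags in hashtag_lists:
        for entry in hashtags or []:
            counts[entry["tag"].lower()] += 1
    return counts.most_common(n)

# Illustrative input in the same shape as the entities_hashtags column.
sample = [
    [{"start": 28, "end": 35, "tag": "crypto"}],
    None,
    [{"start": 128, "end": 141, "tag": "CryptoIsland"}],
    [{"start": 28, "end": 35, "tag": "Crypto"}],
]
print(top_hashtags(sample, n=2))  # [('crypto', 2), ('cryptoisland', 1)]
```

Passing the `entities_hashtags` column of a search result to `top_hashtags` would give the ranking for that query.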


## 30-Day/Full Archive Search

The 30-day and full-archive searches are available without an Academic API package, through a premium development environment on the Twitter API. This requires interfacing with the API v1.1, as opposed to the v2 used for the Recent Search.

The 30-day search allows 250 requests and 25,000 Tweets per month, while the full-archive search allows 50 requests and 5,000 Tweets per month.

In [21]:
# initializing API v1.1
api1 = tweepy.API(auth)

In [39]:
response_30 = api1.search_30_day(label="30day", query="hello world lang:en", maxResults=20)

In [47]:
tweet_ids = [tweet._json['id'] for tweet in response_30]
tweet_ids

[1451611411759595520,
 1451611286295490564,
 1451611166778830865,
 1451611152337842177,
 1451611133790593024,
 1451611127486550030,
 1451611076663988228,
 1451611002110369793,
 1451610990035013677,
 1451610645636472834,
 1451610637130366980,
 1451610607162167305,
 1451610581279166465,
 1451610443794026499,
 1451610438668484612,
 1451610366912327691,
 1451610363758202880,
 1451610336797151233,
 1451610300852015111,
 1451610259701727235]

In [48]:
type(response_30[0])

tweepy.models.Status

The `tweepy.models.Status` object contains a lot of data about the Tweet, such as its text, its author, and various aspects of metadata about the Tweet's creation and interactions.

In [13]:
response_30[0]._json

{'created_at': 'Wed Oct 20 17:06:26 +0000 2021',
 'id': 1450871038364037120,
 'id_str': '1450871038364037120',
 'text': "RT @SaitamaDude: Hello #SaitamaWolfPack!!! \n\nI've made a youtube channel with a quick and easy video on how to purchase #Saitama easily bef…",
 'source': '<a href="https://mobile.twitter.com" rel="nofollow">Twitter Web App</a>',
 'truncated': False,
 'in_reply_to_status_id': None,
 'in_reply_to_status_id_str': None,
 'in_reply_to_user_id': None,
 'in_reply_to_user_id_str': None,
 'in_reply_to_screen_name': None,
 'user': {'id': 1397553132,
  'id_str': '1397553132',
  'name': 'koko kaye',
  'screen_name': 'koko_kaye',
  'location': None,
  'url': None,
  'description': '“You are never too old to set another goal, or to dream a new dream.” C.S. Lewis\n\nMy tweets/retweets are not financial advice. \nDYOR | Join the #SaitamaWolfPack',
  'translator_type': 'none',
  'protected': False,
  'verified': False,
  'followers_count': 870,
  'friends_count': 1046,
  'listed_co

In [151]:
for i in response_30[0]._json:
    print(i)

created_at
id
id_str
text
source
truncated
in_reply_to_status_id
in_reply_to_status_id_str
in_reply_to_user_id
in_reply_to_user_id_str
in_reply_to_screen_name
user
geo
coordinates
place
contributors
retweeted_status
is_quote_status
quote_count
reply_count
retweet_count
favorite_count
entities
favorited
retweeted
filter_level
lang
matching_rules
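The v1.1 payload is deeply nested and its `created_at` is a string rather than a `datetime`. A minimal sketch of pulling a few of the keys listed above into a flat record and parsing the timestamp follows; `summarize_status` is a hypothetical helper, and the sample dict is an abbreviated, illustrative payload.

```python
from datetime import datetime

# Matches strings such as 'Wed Oct 20 17:06:26 +0000 2021'.
V1_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"

def summarize_status(status_json):
    """Pick a few fields out of a v1.1 Tweet payload and parse its timestamp."""
    user = status_json.get("user") or {}
    return {
        "id": status_json["id"],
        "text": status_json["text"],
        "created_at": datetime.strptime(status_json["created_at"], V1_TIME_FORMAT),
        "screen_name": user.get("screen_name"),
        "is_retweet": "retweeted_status" in status_json,
    }

sample_status = {
    "created_at": "Wed Oct 20 17:06:26 +0000 2021",
    "id": 1450871038364037120,
    "text": "RT @SaitamaDude: Hello #SaitamaWolfPack!!!",
    "user": {"screen_name": "koko_kaye"},
    "retweeted_status": {},
}
print(summarize_status(sample_status)["created_at"].isoformat())
```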


In [13]:
# obsolete version of the search function
def search_30(query, start_date=None, end_date=None, max_results=20):
    # initializing API v1.1 instance
    api_1 = init_api_1()
    
    # parsing dates passed into function
    from dateutil import parser
    if start_date:
        start_date = parser.parse(start_date)
    if end_date:
        end_date = parser.parse(end_date)
    
    # retrieving Tweets from the past 30 days relevant to query
    response = api_1.search_30_day(label="30day",
                                  query=f"{query} lang:en",
                                  fromDate=start_date,
                                  toDate=end_date,
                                  maxResults=max_results)
    
    # creating DataFrame of Tweets
    import pandas as pd
    
    tweet_fields = list(response[0]._json.keys())
    tweet_fields.remove('id')
    tweet_fields.remove('id_str')
    tweets = pd.DataFrame(columns=tweet_fields)
    tweets.index.name = "Tweet ID"
    
    for i in range(len(response)):
        tweet = response[i]

        # retrieving and formatting JSON of Tweet data for DataFrame
        tweet_data = tweet._json
        tweet_id = tweet_data['id']
        del tweet_data['id']
        del tweet_data['id_str']
        
        tweets.loc[tweet_id] = tweet_data
    
    return tweets

In [92]:
# function to search Tweets within the past 30 days
# utilizes both API v1.1 and v2 to be consistent with 7-day search.
def search_30(query, start_date=None, end_date=None, max_results=20):
    # initializing API v1.1 instance
    api_1 = init_api_1()
    
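    # parsing dates passed into function; the premium endpoints expect
    # fromDate/toDate as UTC strings in yyyyMMddHHmm format
    from dateutil import parser
    if start_date:
        start_date = parser.parse(start_date).strftime("%Y%m%d%H%M")
    if end_date:
        end_date = parser.parse(end_date).strftime("%Y%m%d%H%M")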
    
    # retrieving Tweets from the past 30 days relevant to query
    response_1 = api_1.search_30_day(label="30day",
                                  query=f"{query} lang:en",
                                  fromDate=start_date,
                                  toDate=end_date,
                                  maxResults=max_results)
    
    # retrieving Tweet ID's to pass into API v2
    tweet_ids = [tweet._json['id'] for tweet in response_1]
    
    # initializing API v2 instance
    api_2 = init_api_2()
    
    # retrieving Tweet data from Tweet ID's
    tweet_fields = ["text", "attachments", "author_id", "context_annotations", "conversation_id", "created_at",
                   "entities", "geo", "in_reply_to_user_id", "lang", "public_metrics", "referenced_tweets"]
    response_2 = api_2.get_tweets(tweet_ids, tweet_fields=tweet_fields)
    
    # adding Tweet data to DataFrame
    import pandas as pd
    
    tweets = pd.DataFrame(columns=tweet_fields+['entities_hashtags'])
    tweets.index.name = "Tweet ID"
    for i in range(len(response_2[0])):
        tweet = response_2[0][i]
        tweet_id = tweet.id
        tweet_data = {}
        for field in tweet_fields:
            if tweet[field]:
                tweet_data[field] = tweet[field]
                if field == 'entities':
                    try:
                        tweet_data['entities_hashtags'] = tweet[field]['hashtags']
                    except KeyError:
                        tweet_data['entities_hashtags'] = None
            else:
                tweet_data[field] = None
        tweets.loc[tweet_id] = tweet_data
    return tweets
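The v2 Tweet lookup behind `get_tweets` accepts at most 100 IDs per request, so the function above works as written only while `max_results` stays at or below 100. A minimal sketch of splitting a longer ID list into batches follows; `chunked` is a hypothetical helper.

```python
def chunked(items, size=100):
    """Yield consecutive slices of items with at most size elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

ids = list(range(250))  # stand-in for a list of Tweet IDs
print([len(batch) for batch in chunked(ids)])  # [100, 100, 50]
```

Each batch could then be passed to `api_2.get_tweets(batch, tweet_fields=tweet_fields)` and the results appended to the same DataFrame.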

In [93]:
test30 = search_30("hello world", max_results=100)
test30

Tweet ID,text,attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,in_reply_to_user_id,lang,public_metrics,referenced_tweets,entities_hashtags
1451639725521969152,RT @EBOY_PJW: hello! this is a new account #EB...,,1443908396760002564,,1451639725521969152,2021-10-22 20:00:55+00:00,"{'hashtags': [{'start': 43, 'end': 56, 'tag': ...",,,en,"{'retweet_count': 130, 'reply_count': 0, 'like...","[(type, id)]","[{'start': 43, 'end': 56, 'tag': 'EBOYJEONGWOO..."
1451639718832222208,RT @CislaArmy: Hello @elonmusk &amp; @dogecoi...,,1436611276180963329,"[{'domain': {'id': '45', 'name': 'Brand Vertic...",1451639718832222208,2021-10-22 20:00:53+00:00,"{'hashtags': [{'start': 128, 'end': 141, 'tag'...",,,en,"{'retweet_count': 9, 'reply_count': 0, 'like_c...","[(type, id)]","[{'start': 128, 'end': 141, 'tag': 'CryptoIsla..."
1451639717720690699,RT @LibardoIsaza: Whistle_anthem\n\n// Class d...,,1192586564188168193,,1451639717720690699,2021-10-22 20:00:53+00:00,"{'mentions': [{'start': 3, 'end': 16, 'usernam...",,,en,"{'retweet_count': 7, 'reply_count': 0, 'like_c...","[(type, id)]",
1451639710548430851,RT @LibardoIsaza: Whistle_anthem\n\n// Class d...,,1398088887763939328,,1451639710548430851,2021-10-22 20:00:51+00:00,"{'mentions': [{'start': 3, 'end': 16, 'usernam...",,,en,"{'retweet_count': 7, 'reply_count': 0, 'like_c...","[(type, id)]",
1451639681905463305,RT @LibardoIsaza: Whistle_anthem\n\n// Class d...,,1419122738216771587,,1451639681905463305,2021-10-22 20:00:44+00:00,"{'mentions': [{'start': 3, 'end': 16, 'usernam...",,,en,"{'retweet_count': 7, 'reply_count': 0, 'like_c...","[(type, id)]",
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1451632111660769290,"@ShillCyber I am gonna hit the wall\n"" a boy l...",{'media_keys': ['3_1451632105901932544']},1417766739048288256,"[{'domain': {'id': '45', 'name': 'Brand Vertic...",1451592348157165568,2021-10-22 19:30:40+00:00,"{'mentions': [{'start': 0, 'end': 11, 'usernam...",,1400274373102030849,en,"{'retweet_count': 0, 'reply_count': 0, 'like_c...","[(type, id)]",
1451632105537032194,"RT @MantisShrimpp: Hello world, this is Manti...",,1390822780837539844,"[{'domain': {'id': '66', 'name': 'Interests an...",1451632105537032194,2021-10-22 19:30:38+00:00,"{'mentions': [{'start': 3, 'end': 17, 'usernam...",,,en,"{'retweet_count': 5, 'reply_count': 0, 'like_c...","[(type, id)]",
1451632084532043784,"@NFT_French I am gonna hit the wall\n"" a boy l...",{'media_keys': ['3_1451632081210122242']},1417766739048288256,"[{'domain': {'id': '45', 'name': 'Brand Vertic...",1451618169785827335,2021-10-22 19:30:33+00:00,"{'mentions': [{'start': 0, 'end': 11, 'usernam...",,1427282507867754496,en,"{'retweet_count': 0, 'reply_count': 0, 'like_c...","[(type, id)]",
1451632084506877958,RT @seungdduk: hello to seungmin being feature...,,1386405751443316738,,1451632084506877958,2021-10-22 19:30:33+00:00,"{'mentions': [{'start': 3, 'end': 13, 'usernam...",,,en,"{'retweet_count': 1105, 'reply_count': 0, 'lik...","[(type, id)]",


In [38]:
tweet_fields = ['text', 'attachments', 'author_id', 'context_annotations',
                'conversation_id', 'created_at', 'entities', 'geo',
                'in_reply_to_user_id', 'lang', 'public_metrics', 'referenced_tweets']
tweet_ids = test30.index.tolist()
api_2 = init_api_2()
response = api_2.get_tweets(tweet_ids, tweet_fields=tweet_fields)
response[0][1]['created_at']

datetime.datetime(2021, 10, 22, 17, 41, 41, tzinfo=datetime.timezone.utc)
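Because `created_at` comes back as a timezone-aware `datetime`, the "volume over time" goal reduces to bucketing timestamps. A minimal sketch that counts Tweets per UTC hour follows; `volume_by_hour` is a hypothetical helper, and it assumes timezone-aware inputs like the value shown above.

```python
from collections import Counter
from datetime import datetime, timezone

def volume_by_hour(timestamps):
    """Count Tweets per UTC hour; returns (hour_start, count) pairs in time order."""
    counts = Counter(
        ts.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
        for ts in timestamps
    )
    return sorted(counts.items())

# Illustrative timestamps in the same form as the API output.
stamps = [
    datetime(2021, 10, 22, 17, 41, 41, tzinfo=timezone.utc),
    datetime(2021, 10, 22, 17, 5, 0, tzinfo=timezone.utc),
    datetime(2021, 10, 22, 19, 59, 34, tzinfo=timezone.utc),
]
for hour, count in volume_by_hour(stamps):
    print(hour.isoformat(), count)
```

The resulting pairs can be fed to any plotting library as x and y values.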

In [150]:
test30["entities"].iloc[0]['hashtags'][0]['text']

'thecollective'

In [14]:
response_full = api1.search_full_archive(label="full", query="hello world lang:en", maxResults=20)

In [15]:
response_full[0]._json

{'created_at': 'Wed Oct 20 17:06:38 +0000 2021',
 'id': 1450871091174510593,
 'id_str': '1450871091174510593',
 'text': 'Hello World (1634749599)',
 'source': '<a href="https://www.keysight.com/" rel="nofollow">Nemo Outdoor</a>',
 'truncated': False,
 'in_reply_to_status_id': None,
 'in_reply_to_status_id_str': None,
 'in_reply_to_user_id': None,
 'in_reply_to_user_id_str': None,
 'in_reply_to_screen_name': None,
 'user': {'id': 930007300525510657,
  'id_str': '930007300525510657',
  'name': 'Bench Mark',
  'screen_name': 'bnchmrk415',
  'location': None,
  'url': None,
  'description': None,
  'translator_type': 'none',
  'protected': False,
  'verified': False,
  'followers_count': 1,
  'friends_count': 0,
  'listed_count': 0,
  'favourites_count': 0,
  'statuses_count': 55931,
  'created_at': 'Mon Nov 13 09:39:53 +0000 2017',
  'utc_offset': None,
  'time_zone': None,
  'geo_enabled': False,
  'lang': None,
  'contributors_enabled': False,
  'is_translator': False,
  'profile_backgr

In [15]:
# obsolete version of the function
def search_full(query, start_date=None, end_date=None, max_results=20):
    # initializing API v1.1 instance
    api_1 = init_api_1()
    
    # parsing dates passed into function
    from dateutil import parser
    if start_date:
        start_date = parser.parse(start_date)
    if end_date:
        end_date = parser.parse(end_date)
    
    # retrieving Tweets from the full archive relevant to query
    response = api_1.search_full_archive(label="full",
                                         query=f"{query} lang:en",
                                         fromDate=start_date,
                                         toDate=end_date,
                                         maxResults=max_results)
    
    # creating DataFrame of Tweets
    import pandas as pd
    
    tweet_fields = list(response[0]._json.keys())
    tweet_fields.remove('id')
    tweet_fields.remove('id_str')
    tweets = pd.DataFrame(columns=tweet_fields)
    tweets.index.name = "Tweet ID"
    
    for i in range(len(response)):
        tweet = response[i]

        # retrieving and formatting JSON of Tweet data for DataFrame
        tweet_data = tweet._json
        tweet_id = tweet_data['id']
        del tweet_data['id']
        del tweet_data['id_str']
        
        tweets.loc[tweet_id] = tweet_data
    
    return tweets

In [94]:
# function to search Tweets in the full archive
# utilizes both API v1.1 and v2 to be consistent with 7-day search.
def search_full(query, start_date=None, end_date=None, max_results=20):
    # initializing API v1.1 instance
    api_1 = init_api_1()
    
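    # parsing dates passed into function; the premium endpoints expect
    # fromDate/toDate as UTC strings in yyyyMMddHHmm format
    from dateutil import parser
    if start_date:
        start_date = parser.parse(start_date).strftime("%Y%m%d%H%M")
    if end_date:
        end_date = parser.parse(end_date).strftime("%Y%m%d%H%M")
    
    # retrieving Tweets from the full archive relevant to query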
    response_1 = api_1.search_full_archive(label="full",
                                           query=f"{query} lang:en",
                                           fromDate=start_date,
                                           toDate=end_date,
                                           maxResults=max_results)
    
    # retrieving Tweet ID's to pass into API v2
    tweet_ids = [tweet._json['id'] for tweet in response_1]
    
    # initializing API v2 instance
    api_2 = init_api_2()
    
    # retrieving Tweet data from Tweet ID's
    tweet_fields = ["text", "attachments", "author_id", "context_annotations", "conversation_id", "created_at",
                   "entities", "geo", "in_reply_to_user_id", "lang", "public_metrics", "referenced_tweets"]
    response_2 = api_2.get_tweets(tweet_ids, tweet_fields=tweet_fields)
    
    # adding Tweet data to DataFrame
    import pandas as pd
    
    tweets = pd.DataFrame(columns=tweet_fields+['entities_hashtags'])
    tweets.index.name = "Tweet ID"
    for i in range(len(response_2[0])):
        tweet = response_2[0][i]
        tweet_id = tweet.id
        tweet_data = {}
        for field in tweet_fields:
            if tweet[field]:
                tweet_data[field] = tweet[field]
                if field == 'entities':
                    try:
                        tweet_data['entities_hashtags'] = tweet[field]['hashtags']
                    except KeyError:
                        tweet_data['entities_hashtags'] = None
            else:
                tweet_data[field] = None
        tweets.loc[tweet_id] = tweet_data
    return tweets

In [95]:
test_full = search_full("hello world", max_results=100)
test_full

Tweet ID,text,attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,in_reply_to_user_id,lang,public_metrics,referenced_tweets,entities_hashtags
1451639940668805120,RT @LibardoIsaza: Whistle_anthem\n\n// Class d...,,1423163449052471296,,1451639940668805120,2021-10-22 20:01:46+00:00,"{'mentions': [{'start': 3, 'end': 16, 'usernam...",,,en,"{'retweet_count': 17, 'reply_count': 0, 'like_...","[(type, id)]",
1451639940043915272,RT @LibardoIsaza: Whistle_anthem\n\n// Class d...,,1076302066245685248,,1451639940043915272,2021-10-22 20:01:46+00:00,"{'mentions': [{'start': 3, 'end': 16, 'usernam...",,,en,"{'retweet_count': 17, 'reply_count': 0, 'like_...","[(type, id)]",
1451639936822693894,RT @REI: In the sixth episode of REI's new pod...,,1215411705897005062,"[{'domain': {'id': '45', 'name': 'Brand Vertic...",1451639936822693894,2021-10-22 20:01:45+00:00,"{'mentions': [{'start': 3, 'end': 7, 'username...",,,en,"{'retweet_count': 1, 'reply_count': 0, 'like_c...","[(type, id)]",
1451639922536890370,RT @LibardoIsaza: Whistle_anthem\n\n// Class d...,,1345993097981943808,,1451639922536890370,2021-10-22 20:01:42+00:00,"{'mentions': [{'start': 3, 'end': 16, 'usernam...",,,en,"{'retweet_count': 17, 'reply_count': 0, 'like_...","[(type, id)]",
1451639918795575297,RT @LibardoIsaza: Whistle_anthem\n\n// Class d...,,1407497395684917254,,1451639918795575297,2021-10-22 20:01:41+00:00,"{'mentions': [{'start': 3, 'end': 16, 'usernam...",,,en,"{'retweet_count': 17, 'reply_count': 0, 'like_...","[(type, id)]",
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1451632524065533956,RT @seungdduk: hello to seungmin being feature...,,1356893873490911234,,1451632524065533956,2021-10-22 19:32:18+00:00,"{'mentions': [{'start': 3, 'end': 13, 'usernam...",,,en,"{'retweet_count': 1105, 'reply_count': 0, 'lik...","[(type, id)]",
1451632414191673345,"@JjCurtis6 I am gonna hit the wall\n"" a boy le...",{'media_keys': ['3_1451632410693705731']},1417766739048288256,"[{'domain': {'id': '45', 'name': 'Brand Vertic...",1451577748003201024,2021-10-22 19:31:52+00:00,"{'mentions': [{'start': 0, 'end': 10, 'usernam...",,1434491472628305922,en,"{'retweet_count': 0, 'reply_count': 0, 'like_c...","[(type, id)]",
1451632382235324423,"@NFTCompanyy I am gonna hit the wall\n"" a boy ...",{'media_keys': ['3_1451632379072745476']},1417766739048288256,"[{'domain': {'id': '45', 'name': 'Brand Vertic...",1451588157028409361,2021-10-22 19:31:44+00:00,"{'mentions': [{'start': 0, 'end': 12, 'usernam...",,1286003964287295488,en,"{'retweet_count': 0, 'reply_count': 0, 'like_c...","[(type, id)]",
1451632348169125890,"@MoodyOwlNFT I am gonna hit the wall\n"" a boy ...",{'media_keys': ['3_1451632344968974338']},1417766739048288256,"[{'domain': {'id': '45', 'name': 'Brand Vertic...",1451579639684599809,2021-10-22 19:31:36+00:00,"{'mentions': [{'start': 0, 'end': 12, 'usernam...",,927499539682070528,en,"{'retweet_count': 0, 'reply_count': 0, 'like_c...","[(type, id)]",


In [97]:
test_full.entities_hashtags
for i in test_full.entities_hashtags:
    print(i)

None
None
None
None
None
None
None
None
None
None
[{'start': 43, 'end': 56, 'tag': 'EBOYJEONGWOO'}, {'start': 70, 'end': 83, 'tag': 'PARKJEONGWOO'}]
[{'start': 128, 'end': 141, 'tag': 'CryptoIsland'}]
None
None
None
None
None
None
None
None
None
[{'start': 78, 'end': 93, 'tag': 'AtinyTalentDay'}]
[{'start': 28, 'end': 35, 'tag': 'crypto'}]
None
None
None
None
[{'start': 128, 'end': 141, 'tag': 'CryptoIsland'}]
None
None
[{'start': 20, 'end': 40, 'tag': 'SB19xATINAnnivMonth'}]
[{'start': 128, 'end': 141, 'tag': 'CryptoIsland'}]
None
[{'start': 28, 'end': 35, 'tag': 'crypto'}]
None
[{'start': 43, 'end': 56, 'tag': 'EBOYJEONGWOO'}, {'start': 70, 'end': 83, 'tag': 'PARKJEONGWOO'}]
None
None
None
[{'start': 55, 'end': 62, 'tag': 'VTuber'}]
[{'start': 141, 'end': 151, 'tag': 'ponylandh'}, {'start': 152, 'end': 163, 'tag': 'digitalart'}, {'start': 164, 'end': 177, 'tag': 'illustration'}, {'start': 178, 'end': 192, 'tag': 'genshinimpact'}, {'start': 193, 'end': 199, 'tag': 'kaeya'}, {'start': 

## Stream

A Stream is an object that can filter and sample real-time Tweets.

In [11]:
# instantiating Stream object
stream = tweepy.Stream(consumer_key, consumer_secret, access_token, access_secret)
stream

<tweepy.streaming.Stream at 0x7fc4d6038130>

In [None]:
stream.sample(languages=["en"])
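A sample stream keeps running until it is disconnected, so a collection run usually needs a stopping rule. A minimal sketch of one follows: a handler that keeps the first `limit` items and then signals that it is done. `BoundedCollector` is a hypothetical class in plain Python, with strings standing in for incoming Tweets. With tweepy 4.x, the same logic would presumably sit in an `on_status` override of a `tweepy.Stream` subclass that calls `self.disconnect()` once the limit is reached.

```python
class BoundedCollector:
    """Collect up to limit items, then report that collection should stop."""

    def __init__(self, limit):
        self.limit = limit
        self.items = []

    def on_status(self, status):
        """Store status if there is room; return True while more are wanted."""
        if len(self.items) < self.limit:
            self.items.append(status)
        return len(self.items) < self.limit

collector = BoundedCollector(limit=3)
for fake_status in ["a", "b", "c", "d"]:  # stand-in for incoming Tweets
    if not collector.on_status(fake_status):
        break
print(collector.items)  # ['a', 'b', 'c']
```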