# ReThink Media Twitter API

This notebook is for the development and exploration of code for ReThink Media's Twitter API Python interface. The main goals of this notebook are:

- Search Tweets: query, date (optional)
  - Past seven days
  - Past 30 days
  - Full archive
  - Language = English
- Collect Tweets in .csv file
- Add data visualization
  - Top hashtags, keywords, influencers
  - Volume over time for queries/topics

In [None]:
# importing necessary modules
from dotenv import load_dotenv
import os
import json
import numpy as np
import pandas as pd
import tweepy

load_dotenv()

## Utility Functions

Functions for general use across the different analysis functions within the notebook.
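
To make the flattening that `tweet_df` performs more concrete, here is a minimal sketch on a plain dictionary instead of a real API response. The function name `flatten_tweet` and the sample Tweet are hypothetical and only for illustration; the field names (`entities`, `hashtags`, `tag`, `public_metrics`) are the ones used elsewhere in this notebook.

```python
def flatten_tweet(tweet):
    """Flatten one Tweet-like dict into a single row of column values."""
    row = {"text": tweet.get("text")}

    # Pull the hashtag text out of the nested "entities" field, if present.
    entities = tweet.get("entities") or {}
    tags = [hashtag["tag"] for hashtag in entities.get("hashtags", [])]
    row["entities_hashtags"] = tags or None

    # Promote each public metric to its own column.
    for name, value in (tweet.get("public_metrics") or {}).items():
        row[name] = value
    return row


# Made-up Tweet data shaped like the fields this notebook requests.
sample = {
    "text": "Hello world #python",
    "entities": {"hashtags": [{"start": 12, "end": 19, "tag": "python"}]},
    "public_metrics": {"retweet_count": 2, "reply_count": 0,
                       "like_count": 5, "quote_count": 1},
}
row = flatten_tweet(sample)
print(row)
```

A Tweet without `entities` or `public_metrics` simply yields `None` for `entities_hashtags` and no metric columns.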

In [1]:
# function to parse Twitter API v2 response into a DataFrame of Tweet data
def tweet_df(df, response, tweet_fields):
    
    # looping through each Tweet in response, parsing data
    for i in range(len(response[0])):
        tweet = response[0][i]
        tweet_id = tweet.id
        tweet_data = {}
        for field in tweet_fields:
            if tweet[field]:
                tweet_data[field] = tweet[field]
                
                # extracting hashtags from "entities" field and adding it as its own column
                if field == "entities":
                    try:
                        hashtag_data = tweet[field]['hashtags']
                        hashtags = [hashtag['tag'] for hashtag in hashtag_data]
                        tweet_data['entities_hashtags'] = hashtags
                    except KeyError:
                        tweet_data['entities_hashtags'] = None
                
                # separating metrics from "public_metrics" field and adding them as their own column
                if field == "public_metrics":
                    metrics = list(tweet[field].keys())
                    for metric in metrics:
                        tweet_data[metric] = tweet[field][metric]
                
            else:
                tweet_data[field] = None
                if field == "entities":
                    tweet_data['entities_hashtags'] = None
        
        df.loc[tweet_id] = tweet_data
    
    return df

In [2]:
# function to add follower counts to DataFrame of Tweet data
# designed to be called after the Tweet data has been parsed with tweet_df()
def author_data(api_2, df):
    author_ids = df["author_id"].unique().tolist()
    users = []
    
    # requesting users in batches since API v2 get_users only takes max 100 ID's per request
    # (stepping through author_ids directly so a batch that returns fewer users than
    # requested cannot misalign the slices or loop forever)
    for start in range(0, len(author_ids), 100):
        slice_ids = author_ids[start:start+100]
        
        # retrieving user data through API v2, adding responses to users list
        user_fields = ["public_metrics", "verified"]
        response = api_2.get_users(ids=slice_ids, usernames=None, user_fields=user_fields)
        
        # response[0] is None when no user in the batch could be retrieved
        users.extend(response[0] or [])
    
    # mapping author_id to follower counts and verified status, adding to DataFrame
    followers_count = {user['id']: user['public_metrics']['followers_count'] for user in users}
    verified = {user['id']: user['verified'] for user in users}
    df["author_followers_count"] = df["author_id"].map(followers_count)
    df["author_verified"] = df["author_id"].map(verified)
    
    return df

## Authentication

The variables below grant access to the Twitter API. I've defined them in a `.env` file and retrieve them with the code below. We then pass them into a tweepy client to instantiate a Twitter API instance.
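
Before building a client, it can help to confirm that every credential was actually loaded. The sketch below is one way to do that, assuming the five variable names this notebook reads from the `.env` file; the helper `missing_credentials` and the example values are hypothetical.

```python
import os

# Variable names this notebook reads from the .env file.
REQUIRED_VARS = ["API_KEY", "API_KEY_SECRET", "BEARER_TOKEN",
                 "ACCESS_TOKEN", "ACCESS_SECRET"]


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with a stand-in mapping instead of the real environment.
example_env = {"API_KEY": "abc", "API_KEY_SECRET": "def", "BEARER_TOKEN": ""}
missing = missing_credentials(example_env)
print(missing)
```

Calling `missing_credentials()` with no argument after `load_dotenv()` checks the real environment; an empty list means all five variables are set.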

In [None]:
# retrieving environment variables
consumer_key = os.getenv("API_KEY")
consumer_secret = os.getenv("API_KEY_SECRET")
bearer_token = os.getenv("BEARER_TOKEN")
access_token = os.getenv("ACCESS_TOKEN")
access_secret = os.getenv("ACCESS_SECRET")

In [None]:
# Twitter API authentication
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

In [3]:
# function to initialize Twitter API v1.1 instance (for 30-day and full archive search)
def init_api_1():
    
    # importing necessary modules and loading .env file
    from dotenv import load_dotenv
    import os
    import tweepy
    load_dotenv()
    
    # retrieving environment variables from .env file
    consumer_key = os.getenv("API_KEY")
    consumer_secret = os.getenv("API_KEY_SECRET")
    bearer_token = os.getenv("BEARER_TOKEN")
    access_token = os.getenv("ACCESS_TOKEN")
    access_secret = os.getenv("ACCESS_SECRET")
    
    # Twitter API authentication
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    
    # instantiating Twitter API v1.1 reference
    api_1 = tweepy.API(auth)
    
    return api_1

In [4]:
# function to initialize Twitter API v2 instance (for 7-day search)
def init_api_2():
    # importing necessary modules and loading .env file
    from dotenv import load_dotenv
    import os
    import tweepy
    load_dotenv()
    
    # retrieving environment variables from .env file
    consumer_key = os.getenv("API_KEY")
    consumer_secret = os.getenv("API_KEY_SECRET")
    bearer_token = os.getenv("BEARER_TOKEN")
    access_token = os.getenv("ACCESS_TOKEN")
    access_secret = os.getenv("ACCESS_SECRET")
    
    # instantiating Twitter API v2 reference
    api_2 = tweepy.Client(bearer_token=bearer_token,
                         consumer_key=consumer_key,
                         consumer_secret=consumer_secret,
                         access_token=access_token,
                         access_token_secret=access_secret)
    
    return api_2

## Recent Search

The search function available to us in the Standard API package restricts our search to the past seven days. For searches further back in the archive, we need to subscribe to a premium API dev environment or upgrade to the Academic API package, which is given to researchers with a clear thesis or research paper goal in mind.

The query can be at most 512 characters, and the user can specify a `start_time` and `end_time` (as `datetime` or `str` objects) within the past seven days. Queries can also include hashtags. White space is treated as an "AND" join by default, e.g., `hello world` = `hello AND world`. More information about Twitter API queries can be found [in their documentation](https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query).
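
As a rough illustration of the rules above, the sketch below assembles a query string from terms and hashtags, appends the `lang:` filter this notebook uses, and rejects anything over the 512-character limit. The helper `build_query` and its parameters are hypothetical, not part of tweepy or the Twitter API.

```python
MAX_QUERY_LENGTH = 512  # recent-search limit stated above


def build_query(terms, hashtags=(), lang="en"):
    """Join terms and hashtags with spaces (implicit AND) and add a language filter."""
    parts = list(terms) + [f"#{tag.lstrip('#')}" for tag in hashtags]
    if lang:
        parts.append(f"lang:{lang}")
    query = " ".join(parts)
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError(
            f"query is {len(query)} characters; limit is {MAX_QUERY_LENGTH}"
        )
    return query


query = build_query(["hello", "world"], hashtags=["python"])
print(query)
```

Checking the length before sending the request gives a clearer error than waiting for the API to reject the query.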

The 7-day search allows an unlimited number of requests but is capped at 500,000 Tweets per month.

The `response` object is a tuple, and it consists of four items: `(data, includes, errors, meta)`.

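The `data` object contains the Tweets that are retrieved, and `meta` is the metadata for those Tweets. In this response object, `includes` and `errors` are empty; `includes` is only populated when a request asks for `expansions` (related objects such as users or media), which this notebook does not do.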
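
The four-part structure can be illustrated without calling the API. The sketch below uses a stand-in named tuple with the same field names; the Tweet data and the `next_token` value are made up, and real responses carry tweepy objects rather than plain dictionaries.

```python
from collections import namedtuple

# Stand-in with the same four fields as the response tuple described above.
Response = namedtuple("Response", ["data", "includes", "errors", "meta"])

response = Response(
    data=[{"id": 1, "text": "Hello world"}, {"id": 2, "text": "Hello again"}],
    includes={},
    errors=[],
    meta={"result_count": 2, "next_token": "abc123"},
)

# Positional unpacking and attribute access both work on a named tuple.
data, includes, errors, meta = response
texts = [tweet["text"] for tweet in data]

# A "next_token" entry in the metadata signals that more pages are available.
has_more_pages = "next_token" in meta
print(texts, has_more_pages)
```

This is why the functions in this notebook can read the Tweets with `response[0]`: index 0 and `.data` refer to the same item.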

In [5]:
# function to retrieve Tweets from the past 7 days relevant to a query
def search_7(query, start_date=None, end_date=None, max_results=20, write_csv=False, filename="search_7.csv"):
    
    # initializing API v2 instance
    api_2 = init_api_2()
    
    # parsing dates passed into function
    from dateutil import parser
    if start_date:
        start_date = parser.parse(start_date)
    if end_date:
        end_date = parser.parse(end_date)
    
    # setting Tweet data to be included in response
    tweet_fields = ["text", "attachments", "author_id", "context_annotations", "conversation_id", "created_at",
                   "entities", "geo", "in_reply_to_user_id", "lang", "public_metrics", "referenced_tweets"]
    
    # initializing variables for API calls and DataFrame for Tweet data
    import pandas as pd
    next_token = None
    num_results = 0
    tweets = pd.DataFrame(columns=tweet_fields+
                          ['entities_hashtags','retweet_count','reply_count','like_count','quote_count'])
    tweets.index.name = "Tweet ID"
    
    # aggregating multiple pages of query results
    import tweepy
    paginator_results = tweepy.Paginator(api_2.search_recent_tweets,
                                 query=f"{query} lang:en",
                                 start_time=start_date,
                                 end_time=end_date,
                                 tweet_fields=tweet_fields
                                ).flatten(max_results)
    
    # collecting tweets in a format acceptable by tweet_df()
    response = [[tweet for tweet in paginator_results]]
        
    # adding Tweet data to DataFrame
    tweets = tweet_df(tweets, response, tweet_fields)
    num_results = len(tweets)

    # dropping "public_metrics" since all the values are unpacked, adding "total_engagements"
    tweets.drop('public_metrics', axis=1, inplace=True)
    total_engagements = tweets["retweet_count"] + tweets["reply_count"] + tweets["like_count"] + tweets["quote_count"]
    tweets["total_engagements"] = total_engagements
    
    # adding follower count and verified status to DataFrame for influence metrics
    tweets = author_data(api_2, tweets)
    
    # writing Tweet DataFrame to csv file
    if write_csv:
        tweets.to_csv(filename)
    
    return tweets

In [6]:
test = search_7("hello world", max_results=212, write_csv=True)
test

Tweet ID,text,attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,in_reply_to_user_id,lang,referenced_tweets,entities_hashtags,retweet_count,reply_count,like_count,quote_count,total_engagements,author_followers_count,author_verified
1453463653534076937,RT @KriptoBattle: Hello World ! \nWelcome to t...,,1407287050211053568,"[{'domain': {'id': '30', 'name': 'Entities [En...",1453463653534076937,2021-10-27 20:48:33+00:00,"{'mentions': [{'start': 3, 'end': 16, 'usernam...",,,en,"[(type, id)]",,5536,0,0,0,5536,86,False
1453463597556830221,@BitcoinMagazine Hello shibes all over the wor...,,1453384998225068034,"[{'domain': {'id': '65', 'name': 'Interests an...",1453463073843515398,2021-10-27 20:48:20+00:00,"{'mentions': [{'start': 0, 'end': 16, 'usernam...",,361289499,en,"[(type, id), (type, id)]",,0,0,0,0,0,7,False
1453463581580840969,RT @12emtm: Hello I'm Em Blavk! \nThe Designer...,,1881571092,,1453463581580840969,2021-10-27 20:48:16+00:00,"{'mentions': [{'start': 3, 'end': 10, 'usernam...",,,en,"[(type, id)]",,130,0,0,0,130,976,False
1453463498583908356,Hello World.,,1453461165472993280,,1453463498583908356,2021-10-27 20:47:56+00:00,,,,en,,,0,0,0,0,0,1,False
1453463470784094218,"RT @charliecorgii: Hello world, it is I! Ein f...",,2925578518,"[{'domain': {'id': '3', 'name': 'TV Shows', 'd...",1453463470784094218,2021-10-27 20:47:50+00:00,"{'mentions': [{'start': 3, 'end': 17, 'usernam...",,,en,"[(type, id)]",[cowboybebop],3,0,0,0,3,290,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1453449031758368770,"RT @IzaMidoriya: ""Oh uhm hello my name is Iza ...",,1247438424472268801,,1453449031758368770,2021-10-27 19:50:27+00:00,"{'annotations': [{'start': 42, 'end': 53, 'pro...",,,en,"[(type, id)]",,55,0,0,0,55,998,False
1453449016251858948,Hello world! Say hi to Everly Rose Vargas. 🌹 h...,{'media_keys': ['3_1453449006860832769']},1103932368,,1453449016251858948,2021-10-27 19:50:23+00:00,"{'annotations': [{'start': 23, 'end': 40, 'pro...",,,en,,,4,54,288,0,346,29288,True
1453448938917294080,Bring everyone and everything together across ...,,22032222,,1453448938917294080,2021-10-27 19:50:05+00:00,"{'hashtags': [{'start': 136, 'end': 155, 'tag'...",,,en,,[HelloToRingCentral],0,0,0,0,0,145,False
1453448820365398023,RT @CREESCEENDOO: #JohnWick\nHello world meet ...,,1259362451298156544,"[{'domain': {'id': '130', 'name': 'Multimedia ...",1453448820365398023,2021-10-27 19:49:37+00:00,"{'hashtags': [{'start': 18, 'end': 27, 'tag': ...",,,en,"[(type, id)]",[JohnWick],1,0,0,0,1,207,False


## 30-Day/Full Archive Search

With a premium development environment, we can access 30-day and full archive searches through the Twitter API without an Academic API package. This requires interfacing with API v1.1, as opposed to v2 for the Recent Search.

The 30-day search allows 250 requests and 25,000 Tweets per month, while the full archive search allows 50 requests and 5,000 Tweets per month.

Both 30-day and full archive searches have a query character limit of 256 characters.

The `tweepy.models.Status` object contains a lot of data about the Tweet, such as its text, its author, and various aspects of metadata about the Tweet's creation and interactions.
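
The search functions in this section collect Tweet IDs through v1.1 and then look the Tweets up through v2, which accepts at most 100 IDs per request. A minimal sketch of that batching step, using a hypothetical helper `batches` and made-up IDs:

```python
def batches(ids, size=100):
    """Yield consecutive slices of ids, each holding at most `size` items."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


tweet_ids = list(range(1, 251))  # 250 made-up IDs
sizes = [len(batch) for batch in batches(tweet_ids)]
print(sizes)
```

Stepping through the ID list itself, rather than counting the rows returned so far, keeps the slices aligned even when some Tweets have been deleted and a lookup returns fewer results than requested.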

In [7]:
# function to search Tweets within the past 30 days
# utilizes both API v1.1 and v2 to be consistent with 7-day search.
def search_30(query, start_date=None, end_date=None, max_results=20, write_csv=False, filename="search_30.csv"):
    # initializing API v1.1 instance
    api_1 = init_api_1()
    
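    # parsing dates passed into function and formatting them as YYYYMMDDHHmm,
    # the timestamp format the premium search endpoints expect for fromDate/toDate
    from dateutil import parser
    if start_date:
        start_date = parser.parse(start_date).strftime("%Y%m%d%H%M")
    if end_date:
        end_date = parser.parse(end_date).strftime("%Y%m%d%H%M")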
    
    # retrieving Tweets from the past 30 days relevant to query using tweepy's pagination function
    import tweepy
    response_1 = tweepy.Cursor(api_1.search_30_day,
                               label="30day",
                               query=f"{query} lang:en",
                               fromDate=start_date,
                               toDate=end_date
                              ).items(max_results)
    
    # gathering Tweet ID's in a list
    tweet_ids = [tweet._json['id'] for tweet in response_1]
    
    # setting Tweet data to be included in response
    tweet_fields = ["text", "attachments", "author_id", "context_annotations", "conversation_id", "created_at",
                   "entities", "geo", "in_reply_to_user_id", "lang", "public_metrics", "referenced_tweets"]
    
    # initializing variables for API calls and DataFrame for Tweet data
    import pandas as pd
    num_results = 0
    tweets = pd.DataFrame(columns=tweet_fields+
                          ['entities_hashtags','retweet_count','reply_count','like_count','quote_count'])    
    tweets.index.name = "Tweet ID"
    
    # loop to retrieve Tweets from ID's through API v2, 100 at a time
    # (stepping through tweet_ids directly so the loop ends even when fewer
    # than max_results Tweets come back)
    api_2 = init_api_2()
    for start in range(0, len(tweet_ids), 100):
        # slicing tweet_ids since API v2 get_tweets only takes max 100 ID's per request
        slice_ids = tweet_ids[start:start+100]
        
        # retrieving Tweet data from API v2 and adding to DataFrame
        # (skipping batches that return no data, e.g. if every Tweet was deleted)
        response_2 = api_2.get_tweets(slice_ids, tweet_fields=tweet_fields)
        if response_2[0]:
            tweets = tweet_df(tweets, response_2, tweet_fields)
    num_results = len(tweets)
    
    # dropping "public_metrics" since all the values are unpacked, adding "total_engagements"
    tweets.drop('public_metrics', axis=1, inplace=True)
    total_engagements = tweets["retweet_count"] + tweets["reply_count"] + tweets["like_count"] + tweets["quote_count"]
    tweets["total_engagements"] = total_engagements
    
    # adding follower count and verified status to DataFrame for influence metrics
    tweets = author_data(api_2, tweets)
    
    # writing Tweet DataFrame to csv file
    if write_csv:
        tweets.to_csv(filename)
    
    return tweets

In [8]:
test30 = search_30("hello world", max_results=150, write_csv=True)
test30

Tweet ID,text,attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,in_reply_to_user_id,lang,referenced_tweets,entities_hashtags,retweet_count,reply_count,like_count,quote_count,total_engagements,author_followers_count,author_verified
1453464115821830145,"RT @rifu_: @Itumade3 [EN]\nHello, Happy World!...",{'media_keys': ['3_1452610405071671297']},1226040051102076928,"[{'domain': {'id': '130', 'name': 'Multimedia ...",1453464115821830145,2021-10-27 20:50:23+00:00,"{'urls': [{'start': 60, 'end': 83, 'url': 'htt...",,,en,"[(type, id)]",,56,0,0,0,56,35,False
1453464015385075719,"RT @FionnOnFire: HELLO\n\nI need your help, Tw...",,1444972506796871683,"[{'domain': {'id': '46', 'name': 'Brand Catego...",1453464015385075719,2021-10-27 20:49:59+00:00,"{'hashtags': [{'start': 108, 'end': 119, 'tag'...",,,en,"[(type, id)]",[Worlds2021],13,0,0,0,13,21,False
1453463939820441603,RT @MrGerrenalist: Hello Journo World! @theGri...,,1167253456710774784,,1453463939820441603,2021-10-27 20:49:41+00:00,"{'mentions': [{'start': 3, 'end': 17, 'usernam...",,,en,"[(type, id)]",,55,0,0,0,55,869,False
1453463653534076937,RT @KriptoBattle: Hello World ! \nWelcome to t...,,1407287050211053568,"[{'domain': {'id': '30', 'name': 'Entities [En...",1453463653534076937,2021-10-27 20:48:33+00:00,"{'annotations': [{'start': 62, 'end': 81, 'pro...",,,en,"[(type, id)]",,5536,0,0,0,5536,86,False
1453463597556830221,@BitcoinMagazine Hello shibes all over the wor...,,1453384998225068034,"[{'domain': {'id': '65', 'name': 'Interests an...",1453463073843515398,2021-10-27 20:48:20+00:00,"{'urls': [{'start': 89, 'end': 112, 'url': 'ht...",,361289499,en,"[(type, id), (type, id)]",,0,0,0,0,0,7,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1453451993389883396,RT @universaltheory: Hello World!!! \n\nRememb...,,1447533838318686208,,1453451993389883396,2021-10-27 20:02:13+00:00,"{'mentions': [{'start': 3, 'end': 19, 'usernam...",,,en,"[(type, id)]",,5,0,0,0,5,65,False
1453451981683560456,RT @universaltheory: Hello World!!! \n\nRememb...,,1431187748073902083,,1453451981683560456,2021-10-27 20:02:10+00:00,"{'mentions': [{'start': 3, 'end': 19, 'usernam...",,,en,"[(type, id)]",,5,0,0,0,5,40,False
1453451714724511749,"RT @MarkoSilberhand: Hello World,\n\nGreetings...",{'media_keys': ['3_1453435789598208004']},898231845925646336,,1453451714724511749,2021-10-27 20:01:07+00:00,"{'mentions': [{'start': 3, 'end': 19, 'usernam...",,,en,"[(type, id)]",,2,0,0,0,2,39640,False
1453451692838686726,RT @Leiws_brown: @ShytoshiKusama Hello @Shytos...,,1082249359461359616,"[{'domain': {'id': '45', 'name': 'Brand Vertic...",1453451692838686726,2021-10-27 20:01:02+00:00,"{'mentions': [{'start': 3, 'end': 15, 'usernam...",,,en,"[(type, id)]",[nft],22,0,0,0,22,16,False


In [10]:
# function to search Tweets in the full archive
# utilizes both API v1.1 and v2 to be consistent with 7-day search.
def search_full(query, start_date=None, end_date=None, max_results=20, write_csv=False, filename="search_full.csv"):
    # initializing API v1.1 instance
    api_1 = init_api_1()
    
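    # parsing dates passed into function and formatting them as YYYYMMDDHHmm,
    # the timestamp format the premium search endpoints expect for fromDate/toDate
    from dateutil import parser
    if start_date:
        start_date = parser.parse(start_date).strftime("%Y%m%d%H%M")
    if end_date:
        end_date = parser.parse(end_date).strftime("%Y%m%d%H%M")
    
    # retrieving Tweets from the full archive relevant to query using tweepy's pagination function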
    import tweepy
    response_1 = tweepy.Cursor(api_1.search_full_archive,
                               label="full",
                               query=f"{query} lang:en",
                               fromDate=start_date,
                               toDate=end_date
                              ).items(max_results)
    
    # gathering Tweet ID's in a list
    tweet_ids = [tweet._json['id'] for tweet in response_1]
    
    # setting Tweet data to be included in response
    tweet_fields = ["text", "attachments", "author_id", "context_annotations", "conversation_id", "created_at",
                   "entities", "geo", "in_reply_to_user_id", "lang", "public_metrics", "referenced_tweets"]
    
    # initializing variables for API calls and DataFrame for Tweet data
    import pandas as pd
    num_results = 0
    tweets = pd.DataFrame(columns=tweet_fields+
                          ['entities_hashtags','retweet_count','reply_count','like_count','quote_count'])
    tweets.index.name = "Tweet ID"
    
    # loop to retrieve Tweets from ID's through API v2, 100 at a time
    # (stepping through tweet_ids directly so the loop ends even when fewer
    # than max_results Tweets come back)
    api_2 = init_api_2()
    for start in range(0, len(tweet_ids), 100):
        # slicing tweet_ids since API v2 get_tweets only takes max 100 ID's per request
        slice_ids = tweet_ids[start:start+100]
        
        # retrieving Tweet data from API v2 and adding to DataFrame
        # (skipping batches that return no data, e.g. if every Tweet was deleted)
        response_2 = api_2.get_tweets(slice_ids, tweet_fields=tweet_fields)
        if response_2[0]:
            tweets = tweet_df(tweets, response_2, tweet_fields)
    num_results = len(tweets)
    
    # dropping "public_metrics" since all the values are unpacked, adding "total_engagements"
    tweets.drop('public_metrics', axis=1, inplace=True)
    total_engagements = tweets["retweet_count"] + tweets["reply_count"] + tweets["like_count"] + tweets["quote_count"]
    tweets["total_engagements"] = total_engagements
    
    # adding follower count and verified status to DataFrame for influence metrics
    tweets = author_data(api_2, tweets)
    
    # writing Tweets DataFrame to csv file
    if write_csv:
        tweets.to_csv(filename)
    
    return tweets

In [11]:
test_full = search_full("hello world", max_results=150, write_csv=True)
test_full

Tweet ID,text,attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,in_reply_to_user_id,lang,referenced_tweets,entities_hashtags,retweet_count,reply_count,like_count,quote_count,total_engagements,author_followers_count,author_verified
1453464277306781699,RT @Lidija_Bitz: Hello World. Luke is hiring r...,,20996331,"[{'domain': {'id': '65', 'name': 'Interests an...",1453464277306781699,2021-10-27 20:51:02+00:00,"{'hashtags': [{'start': 54, 'end': 64, 'tag': ...",,,en,"[(type, id)]","[Professor, Agricultural, Genomics]",3,0,0,0,3,894,False
1453464170825863170,HELLO AMERICA\n\nMaking GUNS an issue by letti...,,1060967641962016768,,1453464170825863170,2021-10-27 20:50:37+00:00,"{'annotations': [{'start': 6, 'end': 12, 'prob...",,,en,,,0,0,0,0,0,9146,False
1453464115821830145,"RT @rifu_: @Itumade3 [EN]\nHello, Happy World!...",{'media_keys': ['3_1452610405071671297']},1226040051102076928,"[{'domain': {'id': '130', 'name': 'Multimedia ...",1453464115821830145,2021-10-27 20:50:23+00:00,"{'urls': [{'start': 60, 'end': 83, 'url': 'htt...",,,en,"[(type, id)]",,56,0,0,0,56,35,False
1453464015385075719,"RT @FionnOnFire: HELLO\n\nI need your help, Tw...",,1444972506796871683,"[{'domain': {'id': '46', 'name': 'Brand Catego...",1453464015385075719,2021-10-27 20:49:59+00:00,"{'hashtags': [{'start': 108, 'end': 119, 'tag'...",,,en,"[(type, id)]",[Worlds2021],13,0,0,0,13,21,False
1453463939820441603,RT @MrGerrenalist: Hello Journo World! @theGri...,,1167253456710774784,,1453463939820441603,2021-10-27 20:49:41+00:00,"{'mentions': [{'start': 3, 'end': 17, 'usernam...",,,en,"[(type, id)]",,55,0,0,0,55,869,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1453452234025357313,"hello , local cluster，This world is silly",,1298558054376431616,,1453452234025357313,2021-10-27 20:03:11+00:00,,,,en,,,0,0,0,0,0,0,False
1453452156124680192,RT @SharonObara: Hello fam❤We did it once and ...,,285559327,,1453452156124680192,2021-10-27 20:02:52+00:00,"{'mentions': [{'start': 3, 'end': 15, 'usernam...",,,en,"[(type, id)]",,70,0,0,0,70,1949,False
1453451993389883396,RT @universaltheory: Hello World!!! \n\nRememb...,,1447533838318686208,,1453451993389883396,2021-10-27 20:02:13+00:00,"{'annotations': [{'start': 49, 'end': 64, 'pro...",,,en,"[(type, id)]",,5,0,0,0,5,65,False
1453451981683560456,RT @universaltheory: Hello World!!! \n\nRememb...,,1431187748073902083,,1453451981683560456,2021-10-27 20:02:10+00:00,"{'annotations': [{'start': 49, 'end': 64, 'pro...",,,en,"[(type, id)]",,5,0,0,0,5,40,False


## Stream

A Stream is an object that can filter and sample real-time Tweets. Since it delivers Tweets as they are posted rather than searching past ones, this is probably not what we're looking for in an analysis pipeline.

In [None]:
# instantiating Stream object
stream = tweepy.Stream(consumer_key, consumer_secret, access_token, access_secret)
stream

In [None]:
stream.sample(languages=["en"])