# ReThink Media Twitter API

This notebook is for the development and exploration of code for ReThink Media's Twitter API Python interface. The main goals of this notebook are:

- Search Tweets: query, date (optional)
  - Past seven days
  - Past 30 days
  - Full archive
  - Language = English
- Collect Tweets in .csv file
- Add data visualization
  - Top hashtags, keywords, influencers
  - Volume over time for queries/topics

In [2]:
# importing necessary modules
from dotenv import load_dotenv
import os
import json
import numpy as np
import pandas as pd
import tweepy

load_dotenv()

True

## Authentication

The variables below grant access to the Twitter API. I've defined them in a `.env` file, which `load_dotenv()` loads into the environment, and I retrieve them with `os.getenv()`. We then pass those variables into tweepy's authentication handler and client to instantiate Twitter API objects.
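Because `os.getenv()` silently returns `None` for a missing key, a typo in the `.env` file only surfaces later as an authentication error. A minimal sketch that fails fast instead, assuming the five variable names used in this notebook; `require_env` and `REQUIRED_VARS` are hypothetical names, not part of `python-dotenv` or tweepy.

```python
import os

# credential names this notebook reads from the .env file
REQUIRED_VARS = ["API_KEY", "API_KEY_SECRET", "BEARER_TOKEN",
                 "ACCESS_TOKEN", "ACCESS_SECRET"]


def require_env(names=REQUIRED_VARS):
    """Hypothetical helper: return {name: value}, or raise if any variable is unset or empty."""
    values = {name: os.getenv(name) for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return values
```

Called right after `load_dotenv()`, `creds = require_env()` either returns every credential (e.g. `creds["BEARER_TOKEN"]`) or names exactly which ones are missing.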

In [3]:
# retrieving environment variables
consumer_key = os.getenv("API_KEY")
consumer_secret = os.getenv("API_KEY_SECRET")
bearer_token = os.getenv("BEARER_TOKEN")
access_token = os.getenv("ACCESS_TOKEN")
access_secret = os.getenv("ACCESS_SECRET")

In [4]:
# Twitter API authentication
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

In [20]:
# function to initialize Twitter API v1.1 instance (for 30-day and full archive search)
def init_api_1():
    
    # importing necessary modules
    from dotenv import load_dotenv
    import os
    import tweepy
    
    # retrieving environment variables from .env file
    consumer_key = os.getenv("API_KEY")
    consumer_secret = os.getenv("API_KEY_SECRET")
    bearer_token = os.getenv("BEARER_TOKEN")
    access_token = os.getenv("ACCESS_TOKEN")
    access_secret = os.getenv("ACCESS_SECRET")
    
    # Twitter API authentication
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    
    # instantiating Twitter API v1.1 reference
    api_1 = tweepy.API(auth)
    
    return api_1

In [23]:
# function to initialize Twitter API v2 instance (for 7-day search)
def init_api_2():
    # importing necessary modules
    from dotenv import load_dotenv
    import os
    import tweepy

    # loading the .env file and retrieving environment variables
    load_dotenv()
    bearer_token = os.getenv("BEARER_TOKEN")
    consumer_key = os.getenv("API_KEY")
    consumer_secret = os.getenv("API_KEY_SECRET")
    access_token = os.getenv("ACCESS_TOKEN")
    access_secret = os.getenv("ACCESS_SECRET")

    # instantiating Twitter API v2 reference
    return tweepy.Client(bearer_token=bearer_token,
                         consumer_key=consumer_key,
                         consumer_secret=consumer_secret,
                         access_token=access_token,
                         access_token_secret=access_secret)

## Recent Search

With the Standard API package and no premium dev subscription, search is restricted to Tweets from the past seven days. Searching further back in the archive requires either a premium API dev environment or the Academic API package, which is granted to researchers with a clear thesis or research goal in mind.

A query can be at most 512 characters long, and the user can optionally bound it with a `start_time` and `end_time` (as `datetime` objects or ISO 8601 strings such as `2021-10-18T00:00:00Z`) that fall within the past seven days. Queries can include hashtags (e.g., `#NoCode`) as well as plain keywords. Whitespace between terms acts as an implicit AND, so `hello world` matches Tweets containing both `hello` and `world`, while `OR` matches either term. More information about Twitter API queries can be found [in their documentation](https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query).

The 7-day search has no monthly cap on the number of requests (requests are still rate-limited per 15-minute window), but it is limited to 500,000 Tweets per month.
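The query rules above (implicit AND on whitespace, `OR` for alternatives, hashtags, the `lang:` operator, and the 512-character cap) can be folded into a small query builder. This is a minimal sketch; `build_query` and its parameters are hypothetical names, and multi-word phrases would still need to be passed with their own quotes.

```python
MAX_QUERY_LENGTH = 512  # Standard recent-search limit described above


def build_query(all_terms=(), any_terms=(), hashtags=(), lang="en"):
    """Hypothetical helper: compose a v2 search query string.

    all_terms: joined by whitespace (implicit AND)
    any_terms: grouped in parentheses and joined with OR
    hashtags:  required hashtags; a leading '#' is added if missing
    lang:      appended as a lang: operator unless None
    """
    parts = list(all_terms)
    parts += [tag if tag.startswith("#") else f"#{tag}" for tag in hashtags]
    if any_terms:
        parts.append("(" + " OR ".join(any_terms) + ")")
    if lang:
        parts.append(f"lang:{lang}")
    query = " ".join(parts)
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError(f"query is {len(query)} characters; the limit is {MAX_QUERY_LENGTH}")
    return query
```

For example, `build_query(all_terms=["hello", "world"])` yields `hello world lang:en`, the same query used in the next cell, and the result can be passed directly as `query=` to `search_recent_tweets`.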

In [45]:
# instantiating a Twitter API v2 instance
api2 = tweepy.Client(bearer_token=bearer_token,
                     consumer_key=consumer_key,
                     consumer_secret=consumer_secret,
                     access_token=access_token,
                     access_token_secret=access_secret)
api2

<tweepy.client.Client at 0x7f02beaec820>

In [46]:
# searching for "hello world" over the past seven days.
response = api2.search_recent_tweets(query="hello world lang:en", max_results=20, expansions="referenced_tweets.id")

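The `response` object is a `namedtuple` of four items, `(data, includes, errors, meta)`, so its fields can be read by position (`response[0]`) or by name (`response.data`).

The `data` field contains the retrieved Tweets, and `meta` holds metadata about the result set (newest and oldest Tweet IDs, the result count, and a `next_token` for fetching the next page). `includes` holds any objects requested through the `expansions` parameter, such as referenced Tweets or author profiles, and `errors` lists any partial errors the API reported.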
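Named access is easier to read than `response[0]` and `response[3]`, and with `expansions="referenced_tweets.id"` the referenced Tweets should arrive under `includes["tweets"]`, so indexing them by ID can replace a separate `get_tweet` call per retweet. The sketch below uses a stand-in namedtuple with the same four field names and `SimpleNamespace` placeholders instead of real tweepy objects, so it runs without credentials; `index_includes` and the `demo_` data are hypothetical.

```python
from collections import namedtuple
from types import SimpleNamespace

# stand-in with the same field names as tweepy's Response namedtuple
Response = namedtuple("Response", ("data", "includes", "errors", "meta"))


def index_includes(response):
    """Hypothetical helper: map Tweet ID -> expanded Tweet from response.includes."""
    return {tweet.id: tweet for tweet in response.includes.get("tweets", [])}


# placeholder data shaped like a search result with expansions="referenced_tweets.id"
demo_original = SimpleNamespace(id=2, text="Hello world! (full original text)")
demo_retweet = SimpleNamespace(id=1, text="RT @someone: Hello world!...",
                               referenced_tweets=[SimpleNamespace(id=2, type="retweeted")])
demo_response = Response(data=[demo_retweet], includes={"tweets": [demo_original]},
                         errors=[], meta={"result_count": 1})

expanded = index_includes(demo_response)
for tweet in demo_response.data:
    for ref in tweet.referenced_tweets or []:
        print(ref.type, expanded[ref.id].text)  # prints: retweeted Hello world! (full original text)
```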

In [47]:
# printing Tweets (response.data is the list of returned Tweets)
for i, tweet in enumerate(response.data):
    print(f"Tweet {i}:")
    print(tweet.text + "\n")

Tweet 0:
RT @NoCode_November: Hello world! 👶

I wanted to wait until Twitter updates my Twitter card but they're taking too long 🤷‍♀️

I am here to…

Tweet 1:
HELLO World of Twitter, please help me. DO YOU GUYS OR GALS, know of any place which prints embroidered patches of cloth? Like embroidery of a logo on some cloth. Please RT. It's reallyyyyyy important. 💪🙏😭😍🥰🙌

Tweet 2:
Hello World (1634752986)

Tweet 3:
Hello in there, @jrpsaki....what color is the sky in your world? https://t.co/w9zbCHpBVW https://t.co/06EX1hY93H

Tweet 4:
RT @elyxionbieber: Can y'all imagine the exo chatroom? Kyungsoo sending voice messages telling everyone to have a good day. Kai responding…

Tweet 5:
RT @BLEUSSlDE: SAY HELLO TO MY HOPE WORLD 🌍 https://t.co/WN136FUvr1

Tweet 6:
hello world

Tweet 7:
@pjmzwhore Hello are you okay?.. I know we don't know eachother but I'm here okay? Dont do it, I know the world is hard and it's very very evil but you can get through this, please talk to me.

Tweet 8:
hello world

In [94]:
# printing metadata for Tweets in response
response[3]

{'newest_id': '1450885307889426436',
 'oldest_id': '1450884517594546179',
 'result_count': 20,
 'next_token': 'b26v89c19zqg8o3fpdv5sr0kag6odo2pxlfk0m3q1pmyl'}
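The `next_token` in `meta` points at the next page of results, and passing it back as the `next_token` argument of `search_recent_tweets` continues the same search. A minimal sketch of that loop, assuming a `search` callable that behaves like `api2.search_recent_tweets`; `collect_pages` is a hypothetical helper (tweepy 4.x also provides `tweepy.Paginator`, which wraps the same token handling).

```python
def collect_pages(search, query, max_pages=5, **kwargs):
    """Hypothetical helper: follow meta["next_token"] across up to max_pages pages."""
    tweets, token = [], None
    for _ in range(max_pages):
        # only send next_token once we actually have one
        page_kwargs = dict(kwargs, next_token=token) if token else kwargs
        response = search(query=query, **page_kwargs)
        tweets.extend(response.data or [])  # data is None on an empty page
        token = (response.meta or {}).get("next_token")
        if token is None:  # no token means this was the last page
            break
    return tweets
```

For example, `collect_pages(api2.search_recent_tweets, "hello world lang:en", max_pages=3, max_results=100)` would gather up to 300 Tweets, each page counting toward the monthly Tweet cap.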

In [54]:
# retrieving full text of retweeted Tweet
retweet_id = response[0][0].referenced_tweets[0].id
retweet = api2.get_tweet(retweet_id)
retweet[0].text

"Hello world! 👶\n\nI wanted to wait until Twitter updates my Twitter card but they're taking too long 🤷\u200d♀️\n\nI am here to help you build 💪 &amp; launch 🚀 your #NoCode projects using at least 1 of the participating NoCode tools!\n\nClaim your free tickets NOW! https://t.co/Ul652SvkhN"

In [65]:
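# author ID of the first Tweet (not returned here, because the search above did not request author_id via expansions or tweet_fields)
response[0][0]['author_id']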

In [147]:
# function to retrieve English-language Tweets from the past 7 days relevant to a query
def search_7(query, start_date=None, end_date=None, max_results=20):
    from dateutil import parser

    # initializing API v2 instance
    api_2 = init_api_2()

    # parsing date strings passed into function (e.g. "2021-10-18")
    start_time = parser.parse(start_date) if start_date else None
    end_time = parser.parse(end_date) if end_date else None

    # retrieving Tweets between start_date and end_date relevant to query
    response = api_2.search_recent_tweets(query=f"({query}) lang:en",
                                          start_time=start_time,
                                          end_time=end_time,
                                          max_results=max_results,
                                          expansions=["author_id", "referenced_tweets.id"])

    # collecting Tweet data keyed by Tweet ID (response.data is None when nothing matches)
    rows = {}
    for tweet in response.data or []:
        referenced_ids = ([ref.id for ref in tweet.referenced_tweets]
                          if tweet.referenced_tweets else None)
        rows[tweet.id] = [tweet.text, tweet.author_id, referenced_ids]

    # building the DataFrame in one step instead of appending row by row
    tweets = pd.DataFrame.from_dict(rows, orient="index",
                                    columns=["Text", "Author ID", "Referenced Tweet IDs"])
    tweets.index.name = "Tweet ID"
    return tweets

In [150]:
test = search_7("hello world")
print(len(test))
test

20


| Tweet ID | Text | Author ID | Referenced Tweet IDs |
|---|---|---|---|
| 1450922732149956613 | RT @LauraHotWifeX: Hello Twitter,\n\nI’m takin... | 801555020172972037 | [1450911864041398273] |
| 1450922727787749377 | RT @vcbxvs: Hello world, Twitter deleted my ac... | 1235652376914038784 | [1450909359953874957] |
| 1450922690382950401 | Hello twitter world hehe💓 Just in case that yo... | 3281433068 | |
| 1450922674922737664 | RT @AdiPolak: Getting into data science/engine... | 2717113375 | [1450681808220270592] |
| 1450922623634853889 | RT @elyxionbieber: Can y'all imagine the exo c... | 1355136001681682434 | [1450846125041143811] |
| 1450922529434988544 | @SmutNetwork Hello world! I'm the Emperor and ... | 1210412532025847809 | [1450900063564222467] |
| 1450922523839930371 | RT @Scenario3d: Hello, world 👋\n#TeamScenario ... | 19071963 | [1450842523845472260] |
| 1450922504478994432 | RT @SolDucksNFT: Hello world!\n\nWelcome to th... | 35068584 | [1450135607930404867] |
| 1450922387525013509 | RT @AdiPolak: Getting into data science/engine... | 619846999 | [1450681808220270592] |
| 1450922357086957581 | RT @akcentofficial: Hello Pakistan, \nThis tim... | 1440634045960908806 | [1448676916324614145] |


## 30-Day/Full Archive Search

Without the Academic API package, we can still reach the 30-day and full archive search endpoints through a premium development environment. These endpoints belong to API v1.1, so they are accessed through `tweepy.API` rather than the v2 `tweepy.Client` used for Recent Search, and each call passes the environment's label from the developer portal (`30day` and `full` below) as the `label` argument.

The 30-day search is limited to 250 requests and 25,000 Tweets per month, while the full archive search is limited to 50 requests and 5,000 Tweets per month.
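Both caps work out to 100 Tweets per request (25,000 / 250 and 5,000 / 50), so requesting smaller pages burns through the request budget before the Tweet budget. A small budgeting sketch, assuming the monthly figures above; `QUOTAS`, `requests_needed`, and `fits_quota` are hypothetical names.

```python
import math

# monthly limits quoted above: (requests, Tweets)
QUOTAS = {"30day": (250, 25_000), "full_archive": (50, 5_000)}


def requests_needed(n_tweets, per_request=100):
    """Number of requests required when each returns at most per_request Tweets."""
    return math.ceil(n_tweets / per_request)


def fits_quota(endpoint, n_tweets, per_request=100):
    """Hypothetical check: would this collection stay within both monthly caps?"""
    max_requests, max_tweets = QUOTAS[endpoint]
    return n_tweets <= max_tweets and requests_needed(n_tweets, per_request) <= max_requests
```

With `maxResults=20`, as in the cells below, pulling 5,000 Tweets from the full archive would take `requests_needed(5000, 20)` = 250 requests, five times the 50-request cap, whereas pages of 100 fit exactly.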

In [9]:
# initializing API v1.1
api1 = tweepy.API(auth)

In [10]:
response_30 = api1.search_30_day(label="30day", query="hello world lang:en", maxResults=20)

In [11]:
len(response_30)

20

In [12]:
type(response_30[0])

tweepy.models.Status

Each `tweepy.models.Status` object wraps the raw v1.1 Tweet JSON (available through its `._json` attribute), which includes the Tweet's text, its author (`user`), and metadata about its creation and interactions.
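Toward the CSV and top-hashtags goals, it helps to flatten the fields we care about from `._json` into one row per Tweet. This is a sketch under assumptions about the v1.1 payload: retweets carry the original Tweet under `retweeted_status`, and premium search returns the untruncated text of long Tweets under `extended_tweet.full_text` (when that key is absent we fall back to `text`). `status_to_row` is a hypothetical helper.

```python
def status_to_row(tweet_json):
    """Hypothetical helper: pick CSV-friendly fields out of a v1.1 Tweet dict (Status._json)."""
    # for retweets, read text and hashtags from the original Tweet
    source = tweet_json.get("retweeted_status", tweet_json)
    extended = source.get("extended_tweet") or {}
    entities = extended.get("entities") or source.get("entities") or {}
    return {
        "id": tweet_json["id_str"],
        "created_at": tweet_json["created_at"],
        "screen_name": tweet_json["user"]["screen_name"],
        "is_retweet": "retweeted_status" in tweet_json,
        "text": extended.get("full_text", source.get("text", "")),
        "hashtags": [tag["text"] for tag in entities.get("hashtags", [])],
    }
```

For example, `pd.DataFrame([status_to_row(s._json) for s in response_30]).to_csv("tweets_30day.csv", index=False)` would write the 30-day results to a CSV file (the filename is only illustrative).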

In [13]:
response_30[0]._json

{'created_at': 'Wed Oct 20 17:06:26 +0000 2021',
 'id': 1450871038364037120,
 'id_str': '1450871038364037120',
 'text': "RT @SaitamaDude: Hello #SaitamaWolfPack!!! \n\nI've made a youtube channel with a quick and easy video on how to purchase #Saitama easily bef…",
 'source': '<a href="https://mobile.twitter.com" rel="nofollow">Twitter Web App</a>',
 'truncated': False,
 'in_reply_to_status_id': None,
 'in_reply_to_status_id_str': None,
 'in_reply_to_user_id': None,
 'in_reply_to_user_id_str': None,
 'in_reply_to_screen_name': None,
 'user': {'id': 1397553132,
  'id_str': '1397553132',
  'name': 'koko kaye',
  'screen_name': 'koko_kaye',
  'location': None,
  'url': None,
  'description': '“You are never too old to set another goal, or to dream a new dream.” C.S. Lewis\n\nMy tweets/retweets are not financial advice. \nDYOR | Join the #SaitamaWolfPack',
  'translator_type': 'none',
  'protected': False,
  'verified': False,
  'followers_count': 870,
  'friends_count': 1046,
  'listed_co

In [151]:
# listing the top-level fields of the raw Tweet JSON
for key in response_30[0]._json:
    print(key)

created_at
id
id_str
text
source
truncated
in_reply_to_status_id
in_reply_to_status_id_str
in_reply_to_user_id
in_reply_to_user_id_str
in_reply_to_screen_name
user
geo
coordinates
place
contributors
retweeted_status
is_quote_status
quote_count
reply_count
retweet_count
favorite_count
entities
favorited
retweeted
filter_level
lang
matching_rules


In [14]:
response_full = api1.search_full_archive(label="full", query="hello world lang:en", maxResults=20)

In [15]:
response_full[0]._json

{'created_at': 'Wed Oct 20 17:06:38 +0000 2021',
 'id': 1450871091174510593,
 'id_str': '1450871091174510593',
 'text': 'Hello World (1634749599)',
 'source': '<a href="https://www.keysight.com/" rel="nofollow">Nemo Outdoor</a>',
 'truncated': False,
 'in_reply_to_status_id': None,
 'in_reply_to_status_id_str': None,
 'in_reply_to_user_id': None,
 'in_reply_to_user_id_str': None,
 'in_reply_to_screen_name': None,
 'user': {'id': 930007300525510657,
  'id_str': '930007300525510657',
  'name': 'Bench Mark',
  'screen_name': 'bnchmrk415',
  'location': None,
  'url': None,
  'description': None,
  'translator_type': 'none',
  'protected': False,
  'verified': False,
  'followers_count': 1,
  'friends_count': 0,
  'listed_count': 0,
  'favourites_count': 0,
  'statuses_count': 55931,
  'created_at': 'Mon Nov 13 09:39:53 +0000 2017',
  'utc_offset': None,
  'time_zone': None,
  'geo_enabled': False,
  'lang': None,
  'contributors_enabled': False,
  'is_translator': False,
  'profile_backgr

## Stream

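A `tweepy.Stream` connects to Twitter's v1.1 streaming API to receive Tweets in real time: `filter()` delivers Tweets matching given keywords, users, or locations, while `sample()` delivers a small random sample of all public Tweets.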
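A plain `Stream`, like the one created in the next cell, does not store or print the Tweets it receives; in tweepy 4.x the usual pattern is to subclass `tweepy.Stream`, override a callback such as `on_status`, and call `disconnect()` to stop. A minimal sketch, assuming tweepy 4.x: `CollectLimitMixin` is a hypothetical mixin that keeps statuses and disconnects after `max_statuses` of them, written as a mixin so its logic can be exercised with a dummy base class rather than a live connection.

```python
class CollectLimitMixin:
    """Hypothetical mixin: collect statuses, then disconnect after max_statuses of them.

    Intended use: class SampleStream(CollectLimitMixin, tweepy.Stream): pass
    """

    def __init__(self, *args, max_statuses=100, **kwargs):
        super().__init__(*args, **kwargs)  # forwards credentials to the stream base class
        self.max_statuses = max_statuses
        self.collected = []

    def on_status(self, status):
        # the stream calls on_status once per incoming Tweet (a tweepy.models.Status)
        self.collected.append(status)
        if len(self.collected) >= self.max_statuses:
            self.disconnect()  # ask the stream to stop its sample()/filter() loop
```

With `SampleStream` defined as in the docstring, `SampleStream(consumer_key, consumer_secret, access_token, access_secret, max_statuses=50)` followed by `.sample(languages=["en"])` should stop after 50 English Tweets, leaving them in `.collected`.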

In [11]:
# instantiating Stream object
stream = tweepy.Stream(consumer_key, consumer_secret, access_token, access_secret)
stream

<tweepy.streaming.Stream at 0x7fc4d6038130>

In [None]:
stream.sample(languages=["en"])