# ReThink Media Twitter API

This notebook is for the development and exploration of code for ReThink Media's Twitter API Python interface. The main goals of this notebook are:

- Search Tweets: query, date (optional)
  - Past seven days
  - Past 30 days
  - Full archive
  - Language = English
- Collect Tweets in .csv file
- Add data visualization
  - Top hashtags, keywords, influencers
  - Volume over time for queries/topics

In [1]:
# importing necessary modules
from dotenv import load_dotenv
import os
import json
import numpy as np
import pandas as pd
import tweepy

load_dotenv()

True

## Authentication

The variables below grant access to the Twitter API. I've defined them in a `.env` file and retrieve them with the code below. We then pass them to a tweepy client to create a Twitter API instance.

In [2]:
# retrieving environment variables
consumer_key = os.getenv("API_KEY")
consumer_secret = os.getenv("API_KEY_SECRET")
bearer_token = os.getenv("BEARER_TOKEN")
access_token = os.getenv("ACCESS_TOKEN")
access_secret = os.getenv("ACCESS_SECRET")
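`os.getenv` returns `None` when a variable is missing, so a typo in the `.env` file only surfaces later as an authentication error. The sketch below fails early instead. The helper name `load_credentials` and the `REQUIRED_VARS` list are illustrative assumptions, not part of tweepy or python-dotenv.

```python
import os

# Variable names copied from the cell above; adjust them if your .env file differs.
REQUIRED_VARS = ["API_KEY", "API_KEY_SECRET", "BEARER_TOKEN", "ACCESS_TOKEN", "ACCESS_SECRET"]


def load_credentials(names=REQUIRED_VARS, env=None):
    """Return a dict of credentials, raising if any variable is missing or empty.

    Hypothetical helper: `env` defaults to os.environ and exists mainly for testing.
    """
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in names}
```

Calling `load_credentials()` after `load_dotenv()` would return all five values in one dictionary, or name the ones that are absent.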

In [3]:
# Twitter API authentication
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

In [4]:
# instantiating a Twitter API instance
api = tweepy.Client(bearer_token=bearer_token,
                consumer_key=consumer_key,
                consumer_secret=consumer_secret,
                access_token=access_token,
                access_token_secret=access_secret)
api

<tweepy.client.Client at 0x7fc4d6126700>

## Recent Search

Without a premium API dev subscription, the search function in the Standard API package is limited to the past seven days. To search further back in the archive, we need to subscribe to a premium API dev environment or upgrade to the Academic API package, which is granted to researchers with a clear thesis or research paper goal in mind.

The query can be at most 512 characters long, and the user can specify a `start_time` and `end_time` (as `datetime` or `str` objects) within the past seven days. The user can also search for hashtags. White space between terms acts as an "AND" join, i.e., `hello world` matches Tweets that contain both `hello` and `world`. More information about Twitter API queries can be found [in their documentation](https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query).
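The query rules above can be captured in a small helper. This is a minimal sketch: `build_query` and `MAX_QUERY_LENGTH` are hypothetical names introduced here, and the function only assembles and length-checks the string; it does not validate operators against Twitter's query grammar.

```python
MAX_QUERY_LENGTH = 512  # character limit quoted in the text above


def build_query(*terms, hashtags=(), lang="en"):
    """Join terms with spaces (an implicit AND), then append hashtags and a language filter."""
    parts = list(terms)
    # Accept hashtags with or without the leading "#".
    parts += [tag if tag.startswith("#") else f"#{tag}" for tag in hashtags]
    if lang:
        parts.append(f"lang:{lang}")
    query = " ".join(parts)
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError(
            f"Query is {len(query)} characters; the limit is {MAX_QUERY_LENGTH}."
        )
    return query
```

The result can be passed directly as the `query` argument of `api.search_recent_tweets`, for example `build_query("hello", "world")`.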

In [42]:
# searching for "hello world" over the past seven days.
response = api.search_recent_tweets(query="hello world lang:en", max_results=20, tweet_fields=["referenced_tweets"])

The `response` object is a named tuple with four fields: `(data, includes, errors, meta)`.

The `data` field contains the retrieved Tweets, and `meta` holds metadata about the result set, such as the result count and a pagination token. In this response object, `includes` and `errors` are empty; `includes` is only populated when additional objects (such as users or media) are requested through the `expansions` parameter.
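Because the response is a named tuple, its fields can be read by name as well as by position, which tends to be easier to follow than `response[0]` and `response[3]`. The sketch below assumes an object with `data` and `meta` attributes shaped like the response above; `summarize_response` is a hypothetical helper name.

```python
def summarize_response(response):
    """Return the text of each Tweet and the result count reported in the metadata."""
    # `data` can be None when the query matches nothing, so fall back to an empty list.
    tweets = response.data or []
    texts = [tweet["text"] for tweet in tweets]
    count = response.meta.get("result_count", len(texts))
    return texts, count
```

For the search above, `texts, count = summarize_response(response)` would give the Tweet texts and the number of results in one call.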

In [43]:
# printing Tweets
for i, tweet in enumerate(response.data):
    print(f"Tweet {i}:")
    print(tweet["text"] + "\n")

Tweet 0:
RT @MarkoSilberhand: Hello World,

#Halloween is just around the corner . . .

... are you already prepared ?

🐾     🎃     😊     🎃     🐾 ht…

Tweet 1:
RT @SB19Official: 🎉 #SB19xATINAnnivMonth

Hello to all A'TIN all around the world 💙

We are glad to announce that we will be celebrating ou…

Tweet 2:
RT @MarkoSilberhand: Hello World,

... because today is Monday . . .

Greetings from Bilbo !

🐾     💕     😊 https://t.co/LOWn7pLyIL

Tweet 3:
RT @KeysandAnklets: Hello everyone. I've been working on this for a very long time and I"m happy to finally make it public knowledge and fo…

Tweet 4:
RT @MarkoSilberhand: Hello World,

October 17, 2021
A foggy day in Bavaria . . .

... here are some impressions...

🐾       💕       😊 https…

Tweet 5:
Hello world, the Iraqi people are demonstrating against the election results

@Pontifex_es @UNarabic @UNICCairo @ChinaIraq @bbcmundo @bbcarabicalerts https://t.co/PxyJwuYkHU https://t.co/6lJKsi3jse

Tweet 6:
RT @sideframestudio: @berframe Hello 

In [40]:
# printing metadata for Tweets in response
response[3]

{'newest_id': '1450523929580347395',
 'oldest_id': '1450522936138289152',
 'result_count': 20,
 'next_token': 'b26v89c19zqg8o3fpdv5skkl0vnw1kmoonar7upoakbr1'}
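The `next_token` value in the metadata points to the next page of results. A minimal sketch of following it is shown below; `collect_pages` is a hypothetical helper, and `search_fn` is assumed to behave like `api.search_recent_tweets` (accepting `query` and `next_token` keyword arguments and returning an object with `data` and `meta`). tweepy also ships a `Paginator` class that may be a better fit for the same job.

```python
def collect_pages(search_fn, query, max_pages=3, **kwargs):
    """Call search_fn repeatedly, following next_token, and return all Tweets found."""
    tweets = []
    token = None
    for _ in range(max_pages):
        params = dict(kwargs)
        if token is not None:
            params["next_token"] = token  # only sent from the second page onward
        response = search_fn(query=query, **params)
        tweets.extend(response.data or [])
        token = response.meta.get("next_token")
        if token is None:  # no further pages available
            break
    return tweets
```

A call such as `collect_pages(api.search_recent_tweets, "hello world lang:en", max_pages=3, max_results=20)` would gather up to three pages; each page counts against the rate limit.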

## 30-Day/Full Archive Search

With a premium development environment, we can run 30-day and full-archive searches through the Twitter API without an Academic API package. This requires using API v1.1, rather than the v2 API used for Recent Search. The `label` argument in the calls below is the name of the dev environment configured in the developer portal.

In [44]:
# initializing API v1.1
api1 = tweepy.API(auth)

In [47]:
response_30 = api1.search_30_day(label="30day", query="hello world lang:en", maxResults=20)

In [49]:
len(response_30)

20

In [51]:
type(response_30[0])

tweepy.models.Status

The `tweepy.models.Status` object contains a lot of data about the Tweet, such as its text, its author, and various aspects of metadata about the Tweet's creation and interactions.
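Since one goal of this notebook is to collect Tweets in a .csv file, the sketch below flattens the dictionary stored in `Status._json` into a row and writes rows with the standard `csv` module. The helper names and the chosen columns are illustrative. It also assumes that truncated Tweets carry their full text under `extended_tweet` → `full_text`, which should be checked against a real payload.

```python
import csv

FIELDNAMES = ["id", "created_at", "screen_name", "text"]


def status_to_row(status_json):
    """Pick a few fields from a v1.1 Tweet payload (the dict stored in Status._json)."""
    user = status_json.get("user") or {}
    # Assumption: when 'truncated' is True, the full text is in extended_tweet.full_text.
    extended = status_json.get("extended_tweet") or {}
    return {
        "id": status_json.get("id_str"),
        "created_at": status_json.get("created_at"),
        "screen_name": user.get("screen_name"),
        "text": extended.get("full_text", status_json.get("text")),
    }


def write_rows(rows, path):
    """Write the row dictionaries to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
```

For example, `write_rows([status_to_row(s._json) for s in response_30], "tweets_30day.csv")` would save the 30-day results; the file name here is a placeholder.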

In [68]:
response_30[0]._json

{'created_at': 'Tue Oct 19 20:13:43 +0000 2021',
 'id': 1450555783477960704,
 'id_str': '1450555783477960704',
 'text': "Hello everyone! I finally made a Twitter for my Sims 4 Machinima World! I'm currently trying to find out the kinks… https://t.co/l8SdcJtkM4",
 'display_text_range': [0, 140],
 'source': '<a href="https://mobile.twitter.com" rel="nofollow">Twitter Web App</a>',
 'truncated': True,
 'in_reply_to_status_id': None,
 'in_reply_to_status_id_str': None,
 'in_reply_to_user_id': None,
 'in_reply_to_user_id_str': None,
 'in_reply_to_screen_name': None,
 'user': {'id': 1450536534499811331,
  'id_str': '1450536534499811331',
  'name': 'Esmee Simnematic',
  'screen_name': 'EsmeeSimnematic',
  'location': 'SimNation',
  'url': 'https://www.youtube.com/channel/UCBSK-8JM9CLzhwuw_bBuNfA',
  'description': '⚠️CURRENTLY UNDER CONSTRUCTION⚠️\nThis is my official Twitter account for my Sims 4 Machinima world, series, BTS screenshots and videos! 18+ minors DNI',
  'translator_type': 'none

In [69]:
response_full = api1.search_full_archive(label="full", query="hello world lang:en", maxResults=20)

In [70]:
response_full[0]._json

{'created_at': 'Tue Oct 19 20:38:43 +0000 2021',
 'id': 1450562073205874697,
 'id_str': '1450562073205874697',
 'text': '@KMbappe Eberhard grossgasteiger 📸\nhello sen doubt the best photographer in the world amazing shot.… https://t.co/BhX4WLQPxO',
 'display_text_range': [0, 140],
 'source': '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
 'truncated': True,
 'in_reply_to_status_id': None,
 'in_reply_to_status_id_str': None,
 'in_reply_to_user_id': 1725137533,
 'in_reply_to_user_id_str': '1725137533',
 'in_reply_to_screen_name': 'KMbappe',
 'user': {'id': 1431013728384860165,
  'id_str': '1431013728384860165',
  'name': 'ahmsbeny',
  'screen_name': 'hamidon2021',
  'location': 'Roche-la-Molière, France',
  'url': None,
  'description': 'Noormale',
  'translator_type': 'none',
  'protected': False,
  'verified': False,
  'followers_count': 11,
  'friends_count': 25,
  'listed_count': 0,
  'favourites_count': 213,
  'statuses_count': 878,
  'create

## Stream

A Stream is an object that can filter and sample Tweets in real time. It keeps running until it is disconnected or interrupted.

In [11]:
# instantiating Stream object
stream = tweepy.Stream(consumer_key, consumer_secret, access_token, access_secret)
stream

<tweepy.streaming.Stream at 0x7fc4d6038130>

In [17]:
stream.sample(languages=["en"])

KeyboardInterrupt: 
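The sample above ran until it was interrupted by hand. One way to stop automatically is to count incoming Tweets and disconnect once a limit is reached. The sketch below is a mixin that does not import tweepy itself; it assumes the class it is combined with calls `on_status` for each Tweet and provides a `disconnect()` method, as `tweepy.Stream` does in the version used here. `LimitedCollector` and `limit` are hypothetical names.

```python
class LimitedCollector:
    """Mixin that stores incoming statuses and disconnects after `limit` of them."""

    limit = 10  # number of Tweets to keep before stopping

    def on_status(self, status):
        # Create the per-instance list lazily so no custom __init__ is needed.
        collected = self.__dict__.setdefault("collected", [])
        if len(collected) >= self.limit:
            return  # ignore anything that arrives after the limit was reached
        collected.append(status)
        if len(collected) >= self.limit:
            self.disconnect()  # assumed to be provided by the stream base class


# Assumed usage with the stream class from this notebook:
#   class LimitedStream(LimitedCollector, tweepy.Stream):
#       limit = 20
#   stream = LimitedStream(consumer_key, consumer_secret, access_token, access_secret)
#   stream.sample(languages=["en"])
#   stream.collected  # the Tweets gathered before disconnecting
```

Listing the mixin first in the base classes lets its `on_status` take precedence over the default handler.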