In [75]:
import numpy as np
import pandas as pd
import json
from subprocess import Popen, PIPE
import datetime

The Twitter API's standard search only covers tweets from the past seven days and returns at most 100 tweets per request. I would like to search for the most popular tweets in this time frame, but the 'popular' option of the 'result_type' parameter appears buggy. Let's look at tweets from the past week that:
- contain the hashtags 'books' and 'bookquotes'
- were not retweets
- were written in the English language

The Twitter API search tool:

https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/api-reference/get-search-tweets
        
Guide to the standard search operators:

https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/guides/standard-operators

Search operators reference (rules and filtering):

https://developer.twitter.com/en/docs/twitter-api/v1/rules-and-filtering/search-operators

URL encoding reference:

https://www.w3schools.com/tags/ref_urlencode.ASP
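Rather than percent-encoding the query by hand, Python's standard-library `urllib.parse` can do it. A minimal sketch: `quote_plus` encodes '#' and ':' and turns spaces into '+', reproducing the hand-encoded query used in the next cell, and `urlencode` applies the same encoding to every parameter. The parameter set (including `lang`, used here to restrict results to English) is an assumption mirroring the search criteria listed above.

```python
from urllib.parse import quote_plus, urlencode

# plain-text search query: both hashtags, excluding retweets
raw_query = "#books #bookquotes -filter:retweets"

# quote_plus percent-encodes '#' and ':' and replaces spaces with '+'
encoded_query = quote_plus(raw_query)
print(encoded_query)  # %23books+%23bookquotes+-filter%3Aretweets

# urlencode encodes each value the same way and joins the pairs with '&'
params = {
    "q": raw_query,
    "lang": "en",              # assumption: English-only results
    "result_type": "recent",
    "count": 100,              # converted to the string "100"
    "tweet_mode": "extended",  # return the untruncated 'full_text'
}
url = "/1.1/search/tweets.json?" + urlencode(params)
print(url)
```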

In [58]:
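# define search parameters:
query = "%23books+%23bookquotes+-filter%3Aretweets"  # URL-encoded "#books #bookquotes -filter:retweets"
lang = "en"             # restrict results to English-language tweets
count = 100             # maximum number of tweets per request
result_type = "recent"  # 'popular' appears buggy, so sort by recency

# place search parameters into the appropriate URL
# (tweet_mode=extended returns the untruncated 'full_text' field):
url = "{api}?q={q}&lang={lang}&result_type={result_type}&count={count}&tweet_mode=extended".format(
    api="/1.1/search/tweets.json",
    q=query,
    lang=lang,
    result_type=result_type,
    count=count)

# run twurl and capture its output (as bytes):
cmd = ["twurl", url]
process = Popen(cmd, stdout=PIPE, stderr=PIPE)
stdout, stderr = process.communicate()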

## Understanding output from /1.1/search/tweets.json
The API's tweet search returns a collection of tweets matching the query. `process.communicate()` returns the subprocess's output as a 'bytes' object, which is made easier to work with using Python's json library: json.loads() deserializes the bytes instance into a Python object.

In [59]:
output = json.loads(stdout)
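The twurl cell captures `stderr` but never inspects it, and the exit status is not checked, so a failed request would only surface later as a JSON decoding error or a missing key. A minimal sketch of a more defensive wrapper; the helper name `run_json_command` is hypothetical, and it assumes that v1.1 error responses carry a top-level "errors" list.

```python
import json
import subprocess


def run_json_command(cmd):
    """Run a command (e.g. ["twurl", url]) and parse its stdout as JSON.

    Raises RuntimeError if the process exits with a non-zero status or if
    the parsed payload contains a top-level "errors" entry.
    """
    completed = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if completed.returncode != 0:
        raise RuntimeError("command failed ({}): {}".format(
            completed.returncode, completed.stderr.decode(errors="replace")))

    # json.loads accepts bytes directly (Python 3.6+)
    payload = json.loads(completed.stdout)

    # assumption: API errors look like {"errors": [{"code": ..., "message": ...}]}
    if isinstance(payload, dict) and "errors" in payload:
        raise RuntimeError("API returned errors: {}".format(payload["errors"]))
    return payload
```

In the notebook this would replace the Popen and json.loads steps with `output = run_json_command(["twurl", url])`.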

json.loads() returns a Python dictionary with two key-value pairs:
1. The 'statuses' key holds a list of the matching tweets (status objects)
2. The 'search_metadata' key holds the search's metadata

'search_metadata' is a straightforward dictionary with basic information about the completed search. 

In [60]:
output["search_metadata"]

{'completed_in': 0.089,
 'max_id': 1345023568363577344,
 'max_id_str': '1345023568363577344',
 'next_results': '?max_id=1342376243757543427&q=%23books%20%23bookquotes%20-filter%3Aretweets&count=100&include_entities=1&result_type=recent',
 'query': '%23books+%23bookquotes+-filter%3Aretweets',
 'refresh_url': '?since_id=1345023568363577344&q=%23books%20%23bookquotes%20-filter%3Aretweets&result_type=recent&include_entities=1',
 'count': 100,
 'since_id': 0,
 'since_id_str': '0'}
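Because each request returns at most 100 tweets, the 'next_results' field is what lets the search continue further back in time: it is a ready-made query string whose `max_id` marks where the next, older page should start. A minimal sketch of inspecting it with `urllib.parse.parse_qs`; since the string above does not carry `tweet_mode=extended`, the sketch re-appends it (an assumption about how the follow-up request should be built).

```python
from urllib.parse import parse_qs

# 'next_results' value copied from the search_metadata output above
next_results = ("?max_id=1342376243757543427&q=%23books%20%23bookquotes"
                "%20-filter%3Aretweets&count=100&include_entities=1&result_type=recent")

# parse_qs decodes the percent-escapes and maps each key to a list of values
params = parse_qs(next_results.lstrip("?"))
print(params["max_id"][0])  # 1342376243757543427
print(params["q"][0])       # #books #bookquotes -filter:retweets

# the next page is the endpoint plus next_results; tweet_mode=extended is
# re-added so that 'full_text' is still returned
next_url = "/1.1/search/tweets.json" + next_results + "&tweet_mode=extended"
```

`next_url` can then be passed to twurl exactly like `url` in the search cell, repeating until 'next_results' no longer appears in the metadata.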

'statuses' is a Python list of dictionaries, one per tweet. Each dictionary contains nested dictionaries, lists, strings, and other types. Let's take a closer look at the top-level keys of the first tweet and the types of their values.

In [61]:
keys,items = [],[]
for key, item in output["statuses"][0].items():
    keys.append(key)
    items.append(type(item))

# display the keys and value types as a Pandas DataFrame for
# easy viewing:
pd.DataFrame(
    [keys,items],
    index=["key","item type"]
    ).transpose()

|   | key | item type |
|---|---|---|
| 0 | created_at | `<class 'str'>` |
| 1 | id | `<class 'int'>` |
| 2 | id_str | `<class 'str'>` |
| 3 | full_text | `<class 'str'>` |
| 4 | truncated | `<class 'bool'>` |
| 5 | display_text_range | `<class 'list'>` |
| 6 | entities | `<class 'dict'>` |
| 7 | extended_entities | `<class 'dict'>` |
| 8 | metadata | `<class 'dict'>` |
| 9 | source | `<class 'str'>` |


In [62]:
output["statuses"][0]

{'created_at': 'Fri Jan 01 15:06:22 +0000 2021',
 'id': 1345023568363577344,
 'id_str': '1345023568363577344',
 'full_text': '#WritingCommunity #fantasybooks #indieauthors #bookquotes #readerscommunity #quotes #books \nExcerpt from The Exiled: Of Shade and Shadow https://t.co/TkrW5i0AVu',
 'truncated': False,
 'display_text_range': [0, 136],
 'entities': {'hashtags': [{'text': 'WritingCommunity', 'indices': [0, 17]},
   {'text': 'fantasybooks', 'indices': [18, 31]},
   {'text': 'indieauthors', 'indices': [32, 45]},
   {'text': 'bookquotes', 'indices': [46, 57]},
   {'text': 'readerscommunity', 'indices': [58, 75]},
   {'text': 'quotes', 'indices': [76, 83]},
   {'text': 'books', 'indices': [84, 90]}],
  'symbols': [],
  'user_mentions': [],
  'urls': [],
  'media': [{'id': 1345023565943480321,
    'id_str': '1345023565943480321',
    'indices': [137, 160],
    'media_url': 'http://pbs.twimg.com/media/Eqp7zpaXIAEc-Qq.png',
    'media_url_https': 'https://pbs.twimg.com/media/Eqp7zpaXIAEc
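Because each status nests dictionaries inside dictionaries, pandas' `json_normalize` (available as `pd.json_normalize` in pandas 1.0 and later) can flatten them into dotted column names in one call, as an alternative to walking the keys by hand. A sketch on a hypothetical, heavily trimmed status; real status objects carry many more fields.

```python
import pandas as pd

# hypothetical, trimmed-down status shaped like an item of output["statuses"]
statuses = [
    {
        "id": 1345023568363577344,
        "full_text": "#books #bookquotes sample text",
        "entities": {"hashtags": [{"text": "books"}, {"text": "bookquotes"}]},
        "user": {"screen_name": "ScharaReeves"},
    },
]

# nested dictionaries become dotted columns such as 'user.screen_name';
# lists (like 'entities.hashtags') stay as single cells
flat = pd.json_normalize(statuses)
print(sorted(flat.columns))
# ['entities.hashtags', 'full_text', 'id', 'user.screen_name']
```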

# Process search results
Let's judge a tweet's success by its number of retweets and its number of favorites. We are also interested in the tweet text (number of hashtags, length, etc.), the time and day of the week the tweet was posted, and whether or not an image was attached. The output dictionary keys corresponding to most of these quantities are:
- retweet_count
- favorite_count
- full_text
- created_at
- id
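The text and timing quantities above can be derived directly from these keys. A minimal sketch using a hypothetical helper `tweet_features`; it counts hashtags from the tweet's 'entities' rather than parsing the text, and is exercised on a trimmed copy of the first tweet shown earlier (its retweet and favorite counts are taken from the results table at the end of this notebook).

```python
import datetime


def tweet_features(tweet):
    """Derive simple text and timing features from one status dictionary."""
    created = datetime.datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    return {
        "num_hashtags": len(tweet["entities"]["hashtags"]),
        "text_length": len(tweet["full_text"]),
        "weekday": created.strftime("%A"),  # full weekday name in the default C locale
        "hour_utc": created.hour,
        "has_media": "media" in tweet["entities"],
        "retweets": tweet["retweet_count"],
        "favorites": tweet["favorite_count"],
    }


# trimmed copy of output["statuses"][0]
sample = {
    "created_at": "Fri Jan 01 15:06:22 +0000 2021",
    "full_text": ("#WritingCommunity #fantasybooks #indieauthors #bookquotes "
                  "#readerscommunity #quotes #books \nExcerpt from The Exiled: Of Shade and Shadow "
                  "https://t.co/TkrW5i0AVu"),
    "entities": {
        "hashtags": [{"text": t} for t in ["WritingCommunity", "fantasybooks", "indieauthors",
                                            "bookquotes", "readerscommunity", "quotes", "books"]],
        "media": [{"type": "photo"}],
    },
    "retweet_count": 1,
    "favorite_count": 3,
}

features = tweet_features(sample)
print(features["weekday"], features["hour_utc"], features["num_hashtags"])  # Friday 15 7
```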

Each tweet's 'entities' key holds a dictionary of the hashtags, symbols, user mentions, URLs, and attached media found in the tweet.

In [69]:
output["statuses"][0]["entities"]

{'hashtags': [{'text': 'WritingCommunity', 'indices': [0, 17]},
  {'text': 'fantasybooks', 'indices': [18, 31]},
  {'text': 'indieauthors', 'indices': [32, 45]},
  {'text': 'bookquotes', 'indices': [46, 57]},
  {'text': 'readerscommunity', 'indices': [58, 75]},
  {'text': 'quotes', 'indices': [76, 83]},
  {'text': 'books', 'indices': [84, 90]}],
 'symbols': [],
 'user_mentions': [],
 'urls': [],
 'media': [{'id': 1345023565943480321,
   'id_str': '1345023565943480321',
   'indices': [137, 160],
   'media_url': 'http://pbs.twimg.com/media/Eqp7zpaXIAEc-Qq.png',
   'media_url_https': 'https://pbs.twimg.com/media/Eqp7zpaXIAEc-Qq.png',
   'url': 'https://t.co/TkrW5i0AVu',
   'display_url': 'pic.twitter.com/TkrW5i0AVu',
   'expanded_url': 'https://twitter.com/ScharaReeves/status/1345023568363577344/photo/1',
   'type': 'photo',
   'sizes': {'medium': {'w': 1118, 'h': 1200, 'resize': 'fit'},
    'large': {'w': 1419, 'h': 1523, 'resize': 'fit'},
    'small': {'w': 634, 'h': 680, 'resize': 'fit

The 'media' key within 'entities' is a list of the attached media, and each item in the list is itself a dictionary.

In [71]:
output["statuses"][0]["entities"]["media"][0]

{'id': 1345023565943480321,
 'id_str': '1345023565943480321',
 'indices': [137, 160],
 'media_url': 'http://pbs.twimg.com/media/Eqp7zpaXIAEc-Qq.png',
 'media_url_https': 'https://pbs.twimg.com/media/Eqp7zpaXIAEc-Qq.png',
 'url': 'https://t.co/TkrW5i0AVu',
 'display_url': 'pic.twitter.com/TkrW5i0AVu',
 'expanded_url': 'https://twitter.com/ScharaReeves/status/1345023568363577344/photo/1',
 'type': 'photo',
 'sizes': {'medium': {'w': 1118, 'h': 1200, 'resize': 'fit'},
  'large': {'w': 1419, 'h': 1523, 'resize': 'fit'},
  'small': {'w': 634, 'h': 680, 'resize': 'fit'},
  'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}}}

The 'type' key holds the media type.

In [72]:
output["statuses"][0]["entities"]["media"][0]["type"]

'photo'
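One caveat worth checking against the Twitter documentation: as we read the v1.1 docs, 'entities' holds at most a single media object, while tweets with native media also carry an 'extended_entities' object (it appears in the key listing above) that lists every attached photo, video, or GIF. A minimal sketch of a hypothetical helper that prefers 'extended_entities' and falls back to 'entities':

```python
import numpy as np


def media_types(tweet):
    """Return the attached media types joined by ';', or NaN if there are none.

    Prefers 'extended_entities' (assumed to list all attached media) and
    falls back to 'entities' (assumed to hold at most the first item).
    """
    source = tweet.get("extended_entities") or tweet.get("entities", {})
    media = source.get("media", [])
    if not media:
        return np.nan
    return ";".join(item["type"] for item in media)


# hypothetical tweets illustrating the three cases
two_photos = {
    "entities": {"media": [{"type": "photo"}]},
    "extended_entities": {"media": [{"type": "photo"}, {"type": "photo"}]},
}
video = {
    "entities": {"media": [{"type": "photo"}]},
    "extended_entities": {"media": [{"type": "video"}]},
}
text_only = {"entities": {"hashtags": []}}

print(media_types(two_photos))  # photo;photo
print(media_types(video))       # video
print(media_types(text_only))   # nan
```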

Let's extract the relevant information and store it in a Pandas DataFrame.

In [89]:
tweet_id = []
tweet_datetime = []
tweet_body = []
tweet_media_type = []
retweets = []
favorites = []
for tweet in output["statuses"]:

    # convert 'created_at' (e.g. 'Fri Jan 01 15:06:22 +0000 2021') to a
    # timezone-aware datetime, then drop the tzinfo (the offset is +0000, i.e. UTC):
    aware_utc = datetime.datetime.strptime(
        tweet["created_at"],
        "%a %b %d %H:%M:%S %z %Y")
    naive_utc = aware_utc.replace(tzinfo=None)

    # store basic tweet info:
    tweet_id.append(tweet["id"])
    tweet_datetime.append(naive_utc)
    tweet_body.append(tweet["full_text"])

    # store retweet and favorite counts:
    retweets.append(tweet["retweet_count"])
    favorites.append(tweet["favorite_count"])

    # store the attached media type(s), joined by ';'; tweets without
    # attachments have no 'media' key and get NaN:
    media = tweet["entities"].get("media", [])
    if media:
        tweet_media_type.append(";".join(mm["type"] for mm in media))
    else:
        tweet_media_type.append(np.nan)

# create a Pandas DataFrame with one column per field; building it from a
# dict (rather than transposing a list of lists) keeps each column's dtype:
results = pd.DataFrame({
    "tweet_id": tweet_id,
    "tweet_datetime": tweet_datetime,
    "tweet_body": tweet_body,
    "tweet_media": tweet_media_type,
    "num_retweets": retweets,
    "num_favorites": favorites,
})

# display results:
results

|   | tweet_id | tweet_datetime | tweet_body | tweet_media | num_retweets | num_favorites |
|---|---|---|---|---|---|---|
| 0 | 1345023568363577344 | 2021-01-01 15:06:22 | #WritingCommunity #fantasybooks #indieauthors ... | photo | 1 | 3 |
| 1 | 1344829959983034370 | 2021-01-01 02:17:02 | “As long as people like me are unwilling to ta... | photo | 0 | 1 |
| 2 | 1344783402130698241 | 2020-12-31 23:12:02 | “This was a fast paced book, and I highly reco... | photo | 0 | 2 |
| 3 | 1344749880061825025 | 2020-12-31 20:58:50 | And to end the year, a special treat. #patrick... | photo | 0 | 2 |
| 4 | 1344735587002146819 | 2020-12-31 20:02:02 | “How long will we hide in the shadows while th... | photo | 1 | 1 |
| 5 | 1344700604816498689 | 2020-12-31 17:43:02 | “A tension filled climactic ending” https://t.... | photo | 0 | 4 |
| 6 | 1344531080544718849 | 2020-12-31 06:29:24 | #mythsandmusic #blackmagickseries #books #whit... | photo | 0 | 0 |
| 7 | 1344360506979569664 | 2020-12-30 19:11:36 | This is going to be a fun project. #patrickkni... | photo | 0 | 1 |
| 8 | 1344347528716759042 | 2020-12-30 18:20:02 | "I can't go home, and I can't stay here. Unles... | photo | 1 | 0 |
| 9 | 1344338216275554309 | 2020-12-30 17:43:02 | "You were dead a long time ago. It's simply a ... | photo | 0 | 0 |
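With the results in a DataFrame, the success measure chosen earlier (retweets and favorites) can be used to rank tweets and compare days of the week. A minimal sketch using a hypothetical helper `engagement_summary`; combining the two counts into a single equally weighted `engagement` score is an assumption, and the three sample rows are copied from the table above (tweet text omitted). The `to_numeric`/`to_datetime` calls guard against columns stored with object dtype.

```python
import pandas as pd


def engagement_summary(df):
    """Rank tweets by retweets + favorites and average that score per weekday."""
    df = df.copy()
    # assumed success score: retweets and favorites weighted equally
    df["engagement"] = pd.to_numeric(df["num_retweets"]) + pd.to_numeric(df["num_favorites"])
    ranked = df.sort_values("engagement", ascending=False)
    weekday = pd.to_datetime(df["tweet_datetime"]).dt.day_name()
    by_weekday = df.groupby(weekday)["engagement"].mean()
    return ranked, by_weekday


# three rows copied from the results table above
sample = pd.DataFrame({
    "tweet_id": [1345023568363577344, 1344829959983034370, 1344700604816498689],
    "tweet_datetime": ["2021-01-01 15:06:22", "2021-01-01 02:17:02", "2020-12-31 17:43:02"],
    "num_retweets": [1, 0, 0],
    "num_favorites": [3, 1, 4],
})

ranked, by_weekday = engagement_summary(sample)
print(ranked[["tweet_id", "engagement"]])
print(by_weekday)  # Friday 2.5, Thursday 4.0
```

On the full data this would be `engagement_summary(results)`. With only ten tweets spread over a few days, the per-weekday averages are illustrative rather than conclusive.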
