# Searching Tweets in Python 

## Copyleft 2020 Forrest Sheng Bao 

To get the code working, you need a Twitter Developer account.
Then create a file `credentials.py` and put your Twitter API credentials in it, like this (the keys and secret below are only examples and do not work):

```python
consumer_key = "xvz1evFS4wEEPTGEFPHBog"
consumer_secrete = "L8qq9PZyRg6ieKGEKhZolGC0vJWLw8iEJ88DRdyOg"
bearer_token = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"  # Optional; you won't have it until you finish Step 1.
```

Opinions expressed here do not reflect those of Iowa State University and Iowa NPR. 

# Step 0: Load libraries 

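```python
import credentials  # a user script containing keys, secrets, and tokens

import json
import base64
import copy

# use two (diversity!) libraries for making web requests
import requests        # for authentication
import urllib.request  # for crawling; a bare `import urllib` does not guarantee urllib.request is loaded
import urllib.parse    # for URL-encoding search queries
```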

# Step 1: Get Twitter authentication 

We only need OAuth 2.0 application-only (Bearer token) authentication because the script below only accesses public tweets. There is no need for user-context OAuth 1.0a, which is for accessing user-specific data.

The code below sends a request to Twitter's server with your Twitter developer credentials (not your Twitter username and password). If they are correct, the server returns a Bearer access token.
Include that token in the headers of all subsequent search requests.

If you already have a valid Bearer token, you can skip this step.
The next step assumes the Bearer token is saved as `bearer_token` in the `credentials.py` file.

For more details, see:
https://developer.twitter.com/en/docs/basics/authentication/oauth-2-0/application-only 
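The credential-encoding part of this step can be isolated as a small pure function that is easy to check offline. This is a sketch assuming the recipe in the application-only guide linked above: URL-encode the key and secret (RFC 1738), join them with a colon, and Base64-encode the result. The name `basic_auth_header` is hypothetical.

```python
import base64
import urllib.parse

def basic_auth_header(consumer_key: str, consumer_secret: str) -> str:
    """Build the Authorization header value for the token request (sketch)."""
    # URL-encode each credential; a no-op for typical alphanumeric keys
    key = urllib.parse.quote(consumer_key, safe="")
    secret = urllib.parse.quote(consumer_secret, safe="")
    pair = f"{key}:{secret}".encode("utf-8")
    # b64encode returns bytes; decode to str so no "b'...'" prefix leaks in
    return "Basic " + base64.b64encode(pair).decode("ascii")

print(basic_auth_header("xvz1evFS4wEEPTGEFPHBog",
                        "L8qq9PZyRg6ieKGEKhZolGC0vJWLw8iEJ88DRdyOg"))
```

Decoding to `str` matters: wrapping the bytes in `str()` instead would put the literal characters `b'` into the header and the server would reject it.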

```python
consumer_key = credentials.consumer_key
consumer_secrete = credentials.consumer_secrete

# Twitter expects "key:secret", Base64-encoded, in a Basic Authorization header
bearer = ":".join([consumer_key, consumer_secrete])
# b64encode returns bytes; decode it, otherwise str() would produce "b'...'"
bearer_base64 = base64.b64encode(bearer.encode('utf-8')).decode('ascii')

r = requests.post('https://api.twitter.com/oauth2/token',
                  data={"grant_type": "client_credentials"},
                  headers={"Authorization": "Basic " + bearer_base64,
                           "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8"}
                  )
r.raise_for_status()  # stop here if the credentials were rejected

reply = json.loads(r.content)
bearer_token = reply['access_token']
```


# Step 2: Search tweets using tags

Let's do a very basic search: find recent tweets with the hashtag `#coronavirus`. Note that Twitter's free standard search API only searches tweets from the past 7 days.

Twitter's official API guide doesn't spell out how to include the Bearer token in a search request. The answer is to send it in an `Authorization: Bearer <token>` header; see this Stack Overflow thread:
https://stackoverflow.com/questions/53002662/get-user-information-in-twitter-api-using-bearer-token
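In code, that tip amounts to one extra header. The sketch below only builds a `urllib.request.Request` without sending it, so it runs offline; `build_search_request` and the token string are hypothetical placeholders.

```python
import urllib.parse
import urllib.request

def build_search_request(bearer_token: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a v1.1 standard search request."""
    url = ("https://api.twitter.com/1.1/search/tweets.json?"
           + urllib.parse.urlencode({"q": query}))
    # The Bearer token travels in the Authorization header, not in the URL
    return urllib.request.Request(url, headers={"Authorization": "Bearer " + bearer_token})

req = build_search_request("AAAA-example-token", "#coronavirus")
print(req.full_url)
print(req.get_header("Authorization"))
```

Passing `req` to `urllib.request.urlopen` would then send a GET request with that header attached.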

See more at: 
* https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets
* https://developer.twitter.com/en/docs/tweets/search/guides/standard-operators
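Operators from the second link can be combined in the `q` parameter, and `urllib.parse.urlencode` takes care of escaping them (`#`, spaces, `:` and so on) instead of hand-replacing characters. A sketch assuming the v1.1 endpoint used throughout this notebook; `search_url` is a hypothetical helper, and `-filter:retweets` (exclude retweets) is just one example operator.

```python
from urllib.parse import urlencode

def search_url(base: str, query: str, **params) -> str:
    """Compose a search URL; urlencode escapes the query and the extra params."""
    return base + "?" + urlencode({"q": query, **params})

url = search_url("https://api.twitter.com/1.1/search/tweets.json",
                 "#coronavirus -filter:retweets",
                 lang="en", result_type="recent", count=5, tweet_mode="extended")
print(url)
```

`urlencode` encodes spaces as `+` and converts non-string values such as `count=5` with `str()`.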


```python
import json
import urllib.parse
import urllib.request

def tag_search(bearer_token, search_url_base, tag, lang, result_type, count):
    """Basic search using tags

    Return recent tweets containing `tag` as a list of dicts:
        lang:        language code, e.g., 'en'
        result_type: 'mixed' (default), 'popular', or 'recent'
        count:       number of tweets to return, 1 to 100 (default 15)
    """
    # urlencode escapes '#' as %23 (and any spaces or other special characters)
    params = {"q": tag,
              "lang": lang,
              "result_type": result_type,
              "count": count,
              "tweet_mode": "extended"}  # return untruncated text in 'full_text'
    search_url = search_url_base + "?" + urllib.parse.urlencode(params)

    print("Searching URL...", search_url)

    # the Bearer token from Step 1 goes in the Authorization header
    request_headers = {"Authorization": "Bearer " + bearer_token}

    request = urllib.request.Request(search_url, headers=request_headers)
    with urllib.request.urlopen(request) as reply:
        tweets = json.loads(reply.read().decode('utf-8'))

    print("Done")
    return tweets['statuses']


# To try it out, uncomment the lines below.
# bearer_token = credentials.bearer_token
# query_tag = "#coronavirus"
# search_url_base = "https://api.twitter.com/1.1/search/tweets.json"
# tweets = tag_search(bearer_token, search_url_base, query_tag, "en", "popular", "5")
```


Twitter returns very verbose information about each tweet, so you may want to distill it down to the fields you care about. In the example below, we keep only the fields specified in a list; a nested field is given as a list of keys, e.g., `['user', 'screen_name']` for the author's screen name.
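To isolate the nested-key idea, here is a minimal standalone sketch that uses `functools.reduce` to walk a key path. The helper names `get_field` and `field_name` and the sample tweet dict are made up for illustration; real tweets carry many more fields.

```python
from functools import reduce

def get_field(tweet: dict, key):
    """Return tweet[key] for a str key, or follow a list of keys into nested dicts."""
    if isinstance(key, str):
        return tweet[key]
    return reduce(lambda d, k: d[k], key, tweet)

def field_name(key) -> str:
    """Name under which a field is stored: the key itself, or the path joined by '_'."""
    return key if isinstance(key, str) else "_".join(key)

# A made-up, heavily trimmed tweet dict for illustration only
tweet = {"full_text": "Hello #python",
         "created_at": "Sun Feb 16 04:14:53 +0000 2020",
         "user": {"screen_name": "example_user", "followers_count": 10}}
info_keys = ["full_text", "created_at", ["user", "screen_name"]]

slim = {field_name(k): get_field(tweet, k) for k in info_keys}
print(slim)
```

Reading nested values never modifies the tweet, so no copy of it is needed.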

```python
def distill_tweets(tweets, info_keys, show_url):
    """Keep only the fields listed in `info_keys` from each tweet.

    tweets: a list of tweets as dicts, the result of an extended search.
    info_keys: a str for a top-level field, or a list of keys for a nested
        field, e.g., ['user', 'screen_name'] is stored as 'user_screen_name'.
    show_url: if True, print each tweet with its URL. This assumes info_keys
        includes 'full_text', 'created_at', and ['user', 'screen_name'].

    Top-level keys of each tweet are:
              ['contributors', 'coordinates', 'created_at',
               'display_text_range', 'entities',
               'favorite_count', 'favorited', 'full_text', 'geo',
               'id', 'id_str', 'in_reply_to_screen_name', 'in_reply_to_status_id',
               'in_reply_to_status_id_str', 'in_reply_to_user_id', 'in_reply_to_user_id_str',
               'is_quote_status', 'lang', 'metadata',
               'place', 'possibly_sensitive',
               'retweet_count', 'retweeted',
               'source', 'truncated', 'user'] # author
    """
    new_tweets = []
    for counter, old_tweet in enumerate(tweets, start=1):
        new_tweet = {}
        for key in info_keys:
            if isinstance(key, str):
                new_tweet[key] = old_tweet[key]
            elif isinstance(key, list):
                # walk down the nested dicts; read-only, so no copy is needed
                x = old_tweet
                for i in key:
                    x = x[i]
                new_tweet["_".join(key)] = x
        new_tweets.append(new_tweet)
        if show_url:
            print(str(counter) + ".", end=" ")
            print("By", new_tweet['user_screen_name'], "at", new_tweet["created_at"])
            print("https://twitter.com/i/web/status/" + old_tweet['id_str'])
            print(new_tweet['full_text'])
            print()
    return new_tweets

# To try it out, uncomment lines below
# info_keys = ["full_text", "created_at", ['user','screen_name']]
# new_tweets = distill_tweets(tweets, info_keys, True)
```



# Step 3: Put everything together 

If you just want something that works with all the defaults, edit the last line of the cell below: specify a hashtag and how many results you want (at most 100 per request).
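Each request returns at most 100 tweets. To collect more, one common approach with the v1.1 search API is to page backwards with its `max_id` parameter (return tweets with an ID less than or equal to `max_id`), setting it to one less than the smallest ID already seen. The sketch below assumes a hypothetical `fetch_page(count, max_id)` callable that performs one search request (for instance, a variant of `tag_search` extended with a `max_id` parameter) and returns the `statuses` list; a fake in-memory fetcher stands in so the example runs offline.

```python
def next_max_id(statuses):
    """Return the max_id for the next (older) page, or None if the page was empty."""
    if not statuses:
        return None
    return min(s["id"] for s in statuses) - 1

def collect(fetch_page, how_many, page_size=100):
    """Call the hypothetical fetch_page(count, max_id) until how_many tweets are collected."""
    tweets, max_id = [], None
    while len(tweets) < how_many:
        page = fetch_page(min(page_size, how_many - len(tweets)), max_id)
        if not page:  # no older tweets left in the search window
            break
        tweets.extend(page)
        max_id = next_max_id(page)
    return tweets[:how_many]

# Offline demo: a fake "server" holding tweets with ids 250..1, newest first
FAKE = [{"id": i} for i in range(250, 0, -1)]

def fake_fetch(count, max_id):
    pool = [t for t in FAKE if max_id is None or t["id"] <= max_id]
    return pool[:count]

result = collect(fake_fetch, 230)
print(len(result), result[0]["id"], result[-1]["id"])
```

Every page is a separate request and counts against the rate limit.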

With the free standard API and application-only (Bearer token) authentication, you can make up to 450 search requests per 15-minute window.
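That works out to 15 × 60 / 450 = 2 seconds per request on average. Rather than hard-coding a delay, a script can read the rate-limit headers that v1.1 responses carry, `x-rate-limit-remaining` and `x-rate-limit-reset` (a UTC epoch timestamp). A minimal sketch; `seconds_to_wait` is a hypothetical helper, and the header names are assumed to match Twitter's v1.1 rate-limiting documentation.

```python
import time

MIN_SPACING = 15 * 60 / 450  # average seconds between requests to stay under the limit

def seconds_to_wait(headers, now=None):
    """Seconds to pause before the next search, based on v1.1 rate-limit headers.

    Returns 0.0 while requests remain in the current window, otherwise the
    time until the window resets. With a plain dict the keys must be lowercase;
    the header objects from requests and urllib match names case-insensitively.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("x-rate-limit-remaining", 1))
    if remaining > 0:
        return 0.0
    reset_at = int(headers.get("x-rate-limit-reset", now))
    return max(0.0, float(reset_at) - now)

print(MIN_SPACING)
print(seconds_to_wait({"x-rate-limit-remaining": "0", "x-rate-limit-reset": "1000"}, now=900))
```

Inside `tag_search`, one could pass `reply.headers` from `urlopen` to this helper and `time.sleep()` for the returned number of seconds.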

```python
def lazy_guy_package(query_tag, how_many):
    """Search `query_tag`, print up to `how_many` (1 to 100) popular English tweets, and return them."""
    info_keys = ["full_text", "created_at", ['user', 'screen_name']]
    bearer_token = credentials.bearer_token
    search_url_base = "https://api.twitter.com/1.1/search/tweets.json"

    tweets = tag_search(bearer_token, search_url_base, query_tag, "en", "popular", how_many)

    print("\nSearch result:")

    return distill_tweets(tweets, info_keys, True)

# just specify one hashtag, and how many (15) most popular results from the past 7 days you want
new_tweets = lazy_guy_package("#coronavirus", 15)
```




Searching URL... https://api.twitter.com/1.1/search/tweets.json?q=%23coronavirus&lang=en&result_type=popular&count=15&tweet_mode=extended
Done

Search result:
1. By abhijitmajumder at Sun Feb 16 04:14:53 +0000 2020
https://twitter.com/i/web/status/1228895497923940352
This is beyond eerie. A 1981 thriller by Dean Koontz predicted the #Coronavirus nightmare, pinpointing it to supposedly biological weapons labs in China’s Wuhan! https://t.co/LYIIdEnsEL

2. By HawleyMO at Sun Feb 16 00:31:47 +0000 2020
https://twitter.com/i/web/status/1228839355491504128
Ah, so now Beijing has appointed a virulently anti-Christian, anti-faith party hack to manage #HongKong. This is #China’s priority - while #coronavirus spreads like wildfire https://t.co/uNeF6GFOgz

3. By globaltimesnews at Mon Feb 17 09:18:17 +0000 2020
https://twitter.com/i/web/status/1229334238631084032
Japan's boy band #ARASHI #嵐 announced on #Weibo their concert scheduled to open in Beijing in April has to be canceled due to #COVID19.