# Searching Tweets in Python 

## Copyleft 2020 Forrest Sheng Bao 

To get the code working, you need a Twitter Developer account. 
Then create a file `credentials.py` and put your Twitter API credentials in it, like this (the keys and secret below are only examples and do not work): 

```
consumer_key = "xvz1evFS4wEEPTGEFPHBog"
consumer_secrete = "L8qq9PZyRg6ieKGEKhZolGC0vJWLw8iEJ88DRdyOg"
bearer_token = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"  # This one is optional. You won't have it until you finish Step 1. 
```

Opinions expressed here do not reflect those of Iowa State University and Iowa NPR. 

# Step 0: Load libraries 

In [3]:
import credentials # a user script containing keys, secrets, and tokens 

import json
import base64
import copy

# use two (diversity!) libraries for making web requests
import requests # for authentication
import urllib.request   # for crawling; the submodule must be imported explicitly 

# Step 1: Get Twitter authentication (do it only once, unless you want a new token)

We only need OAuth 2.0 application-only authentication because the script below only accesses public tweets. There is no need for OAuth 1.0, which is for accessing user-specific data. 

The code below sends a request to Twitter's server with your Twitter developer credentials (not your Twitter username and password), using HTTP Basic authentication. If the credentials are correct, the server returns a Bearer access token. 
Include that token in the headers of all future search queries. 

If you already have a valid Bearer token, you can skip this step. 
The next step assumes the Bearer token is saved in the `credentials.py` file. 

For more details: see 
https://developer.twitter.com/en/docs/basics/authentication/oauth-2-0/application-only 
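
As a rough illustration of the credential encoding described above, here is a minimal sketch of how the `Authorization` header for the token request can be built. The function name `basic_auth_header` and the key and secret values are made up for this example; they are not part of Twitter's API.

```python
import base64

def basic_auth_header(consumer_key, consumer_secret):
    """Build the value of the Authorization header for the token request."""
    pair = "{}:{}".format(consumer_key, consumer_secret)
    # b64encode returns bytes; decode them so the header is a plain str.
    # Calling str() on the bytes would produce "b'...'" instead.
    encoded = base64.b64encode(pair.encode("utf-8")).decode("ascii")
    return "Basic " + encoded

# Placeholder credentials, for illustration only
header = basic_auth_header("my_key", "my_secret")
print(header)
```

Decoding the part after `Basic ` with `base64.b64decode` should give back `my_key:my_secret`, which is a quick way to check the header before sending it.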

In [4]:
consumer_key = credentials.consumer_key 
consumer_secrete = credentials.consumer_secrete

bearer = ":".join([consumer_key, consumer_secrete])
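# decode the bytes into a str; str(bytes) would give "b'...'" and Twitter would reject it
bearer_base64 = base64.b64encode(bearer.encode('utf-8')).decode('ascii')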

r = requests.post('https://api.twitter.com/oauth2/token',
                   data={"grant_type":"client_credentials"},
                   headers = {"Authorization": "Basic " + bearer_base64, 
                              "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8"}
                   )

reply = json.loads(r.content)
bearer_token = reply['access_token']


KeyError: 'access_token'
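
When Twitter rejects the credentials (for example, the placeholder keys shown at the top), the reply contains no `access_token` field and the lookup fails with `KeyError: 'access_token'`. A small check can make that failure easier to read. This is only a sketch: the helper name `extract_bearer_token` is hypothetical, and both sample replies are made up for illustration, so the exact error payload Twitter sends may look different.

```python
import json

def extract_bearer_token(reply_text):
    """Return the access token from a token reply, or raise a readable error."""
    reply = json.loads(reply_text)
    if "access_token" not in reply:
        raise ValueError("Twitter did not return a token. Reply was: {}".format(reply))
    return reply["access_token"]

# Made-up replies, for illustration only
good_reply = '{"token_type": "bearer", "access_token": "AAAA"}'
bad_reply = '{"errors": [{"message": "Unable to verify your credentials"}]}'

print(extract_bearer_token(good_reply))
try:
    extract_bearer_token(bad_reply)
except ValueError as error:
    print(error)
```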

# Step 2: Search tweets using tags

Let's do a very basic search: find recent tweets with the tag `#coronavirus`. Note that Twitter's free/basic API only allows searching within the past 7 days. 

Twitter's official API guide does not seem to mention how to include the Bearer token in a search request. 
This Stack Overflow thread shows how: 
https://stackoverflow.com/questions/53002662/get-user-information-in-twitter-api-using-bearer-token

See more at: 
* https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets
* https://developer.twitter.com/en/docs/tweets/search/guides/standard-operators
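
To make the role of the Bearer token concrete, here is a minimal sketch that builds (but does not send) a search request. It lets `urllib.parse.urlencode` escape the query, which is an alternative to escaping `#` by hand. The function name `build_search_request` and the token value are placeholders invented for this example.

```python
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"

def build_search_request(bearer_token, query, result_type="recent", count=15):
    """Prepare a search request with the Bearer token in its headers."""
    params = {
        "q": query,
        "result_type": result_type,
        "count": count,
        "tweet_mode": "extended",
    }
    # urlencode percent-escapes special characters such as "#" and ":"
    url = SEARCH_URL + "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"Authorization": "Bearer " + bearer_token})

# Placeholder token, for illustration only; nothing is sent over the network here
request = build_search_request("PLACEHOLDER_TOKEN", "#coronavirus", count=5)
print(request.full_url)
print(request.get_header("Authorization"))
```

Passing the prepared request to `urllib.request.urlopen` would then perform the actual search, given a valid token.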

##  Step 2.1: Get raw search result 


In [19]:
def tag_search(bearer_token, search_url_base, query_tag, result_type, count): 
    """Basic search using tags 

    return recent tweets of a tag (a query like "@user" is searched as "from:user"), 
              in a result_type (mixed -- default, popular, recent), 
              for count (1 to 100, default 15) amount
         as a list of dicts
    """
    search_url = search_url_base + "?q=" +\
                 query_tag.replace("#", "%23").replace("@", "from%3A") + "&" +\
                 "result_type={}".format(result_type) + "&" +\
                 "count={}".format(count) + "&" + \
                 "tweet_mode=extended"
                #  "lang={}".format(lang) + "&" +\
    print ("Searching URL...", search_url)

    request_headers = {"Authorization":"Bearer " + bearer_token}

    request = urllib.request.Request(search_url, headers=request_headers)
    reply = urllib.request.urlopen(request) 
    tweets = reply.read()
    tweets = json.loads(tweets.decode('utf-8'))
    
    print ("Done")
    return tweets['statuses']


# To try it out, uncomment the lines below. 
# bearer_token = credentials.bearer_token
# query_tag = "#coronavirus"
# search_url_base = "https://api.twitter.com/1.1/search/tweets.json"
# tweets = tag_search(bearer_token, search_url_base, query_tag, "popular", "5")


## Step 2.2: Distill the search result

Twitter returns very verbose information about each tweet, so you can distill it down to the fields you care about. In the example below, we only keep the fields that are specified in a list. 
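
To show what this distillation looks like on a single tweet, here is a small self-contained sketch. The helper `pick_fields` and the sample tweet are made up for illustration; the sample only mimics the nesting of a real search result (`user`, `entities`) and is not real data.

```python
def pick_fields(tweet, info_keys):
    """Keep only the requested fields of one tweet dict."""
    # Fields that live inside nested dicts need their own lookup rule
    nested = {
        "user_screen_name": lambda t: t["user"]["screen_name"],
        "user_location": lambda t: t["user"]["location"],
        "hashtags": lambda t: [h["text"] for h in t["entities"]["hashtags"]],
        "mentions": lambda t: [m["screen_name"] for m in t["entities"]["user_mentions"]],
    }
    return {key: nested[key](tweet) if key in nested else tweet[key] for key in info_keys}

# A made-up tweet with the same nesting as a search result
sample_tweet = {
    "id_str": "1",
    "created_at": "Thu Feb 27 15:19:21 +0000 2020",
    "full_text": "Hello #world from @someone",
    "user": {"screen_name": "example_user", "location": "Ames, IA"},
    "entities": {
        "hashtags": [{"text": "world"}],
        "user_mentions": [{"screen_name": "someone"}],
    },
    "retweet_count": 0,
}

distilled = pick_fields(sample_tweet, ["full_text", "user_screen_name", "hashtags", "mentions"])
print(distilled)
```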

In [10]:
def distill_tweets(tweets, info_keys, show_url):
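    """
    tweets: a list of tweets as dicts, result of extended search. 
    info_keys: a list of strs, the fields to keep. 
    show_url: bool, if True, print each tweet together with its URL; 
              this needs "user_screen_name", "created_at", and "full_text" in info_keys. 

    """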
    counter  = 1 
    new_tweets = []
    for old_tweet in tweets: 
        new_tweet = {}
        for key in info_keys:
            if key == "user_screen_name":
                new_tweet[key] = old_tweet["user"]["screen_name"]
            elif key == "user_location":
                new_tweet[key] = old_tweet["user"]["location"]
            elif key == "hashtags":
                new_tweet[key] = [x["text"] for x in old_tweet["entities"]["hashtags"]]
            elif key == "mentions":
                new_tweet[key] = [x["screen_name"] for x in old_tweet["entities"]["user_mentions"]]
            else: 
                new_tweet[key] = old_tweet[key]
        new_tweets.append(new_tweet)
        if show_url:
            print (str(counter)+  ".", end = " ")
            print ("By", new_tweet['user_screen_name'], "at", new_tweet["created_at"])
            print("https://twitter.com/i/web/status/"+old_tweet['id_str'])
            print (new_tweet['full_text'])
            print ()
        counter  += 1 
    return new_tweets

# To try it out, uncomment lines below
# info_keys = ["full_text", "created_at", "user_screen_name", "user_location", "hashtags", "mentions"]
# tweets[0].keys()
# new_tweets = distill_tweets(tweets, info_keys, True)
# new_tweets

## Step 2.3: Save query results into text

In [11]:
def save_csv(dump_to, tweets, info_keys): 
    """Dump distilled search results into a tab-separated text file. 

    dump_to: str, path to the file to be written 
    tweets: list of dicts, as returned by distill_tweets 
    info_keys: list of strs, the fields kept by distill_tweets; 
               currently unused because the column names are read from the first tweet
    """
    if len(tweets) > 0 : 
        keys = list(tweets[0].keys())
    else: 
        keys = []

    with open(dump_to, 'w', encoding='utf-8') as f:
        f.write("\t".join(keys) + "\n")
        for tweet in tweets: 
            line = []
            for key in keys: 
                value = tweet[key]
                if type(value) == list:
                    # hashtags and mentions are lists of strs; join them with commas
                    line.append(",".join(value))
                elif value is None:
                    # some fields, e.g., the user location, can be empty
                    line.append("")
                else: 
                    line.append(str(value))
            # keep each tweet on one line
            line = "\t".join(line).replace("\n", " ")
            f.write(line + "\n")
 
    return None 

# save_csv("coronavirus.tsv", new_tweets, info_keys)
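
Tweet text can contain tabs and line breaks, which are easy to mishandle when joining fields by hand. As an alternative sketch (not the approach used in `save_csv`), Python's `csv` module can quote such fields so that they survive a round trip. The helper name `rows_to_tsv` and the sample row are made up for illustration.

```python
import csv
import io

def rows_to_tsv(tweets, keys):
    """Serialize a list of distilled tweets (dicts) as tab-separated text."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter="\t")
    writer.writerow(keys)
    for tweet in tweets:
        row = []
        for key in keys:
            value = tweet.get(key)
            if isinstance(value, list):
                value = ",".join(value)  # e.g., hashtags or mentions
            row.append("" if value is None else value)
        writer.writerow(row)
    return buffer.getvalue()

# A made-up row whose text contains both a line break and a tab
sample = [{"full_text": "line one\nline two\twith a tab",
           "hashtags": ["a", "b"],
           "user_location": None}]
columns = ["full_text", "hashtags", "user_location"]

text = rows_to_tsv(sample, columns)
# Reading the text back recovers the original fields unchanged
parsed = list(csv.reader(io.StringIO(text, newline=""), delimiter="\t"))
print(parsed)
```

To write to disk instead, the same writer can be given a file opened with `open(path, "w", newline="", encoding="utf-8")`.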


# Step 3: Put everything together 

If you just want something that works with all the defaults, edit the last line of the cell below: specify a hashtag and how many results you want returned. 

With the free/basic Twitter API, you can search up to 450 times in a 15-minute window.
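
If you loop over many hashtags, that limit works out to one search every two seconds on average. Here is a minimal sketch of pacing a loop accordingly. The helpers `min_interval` and `throttled` are hypothetical names introduced for this example, and spacing requests evenly is just one simple way to stay under the limit.

```python
import time

def min_interval(max_requests, window_seconds):
    """Average gap (in seconds) between requests that stays within the limit."""
    return window_seconds / max_requests

def throttled(items, interval, sleep=time.sleep):
    """Yield items one at a time, pausing `interval` seconds between them."""
    for index, item in enumerate(items):
        if index > 0:
            sleep(interval)
        yield item

# 450 searches per 15-minute window
interval = min_interval(450, 15 * 60)
print("Wait at least", interval, "seconds between searches")
```

A loop such as `for hashtag in throttled(hashtags, interval):` would then space out the searches, with `hashtags` being your own list of tags.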

In [25]:
def lazy_guy_package(query_tag, how_many, info_keys):
    bearer_token = credentials.bearer_token
    search_url_base = "https://api.twitter.com/1.1/search/tweets.json"
    
    tweets = tag_search(bearer_token, search_url_base, query_tag, "popular", how_many)
    
    print ("\n Search result:")
    
    
    new_tweets = distill_tweets(tweets, info_keys, True)

    # save_csv returns None, so there is nothing to keep; the file is tab-separated despite its .csv name
    save_csv(query_tag.replace("@","").replace("#","")+"_search.csv", new_tweets, info_keys)

    return tweets 

# for hashtag in ["#noplant19", "#plant19", "#harvest19", "#noharvest19"]:
# for hashtag in ["#coronavirus"]:
for hashtag in ["@SecPompeo"]:
    tweets=lazy_guy_package(hashtag, 100, info_keys = ["full_text", "created_at", "user_screen_name", "user_location", "hashtags", "mentions"]) 
# Just specify a hashtag (or @user), how many of the most popular results from the past 7 days you want (up to 100), and which fields of the tweets you care about.




Searching URL... https://api.twitter.com/1.1/search/tweets.json?q=from%3ASecPompeo&result_type=popular&count=100&tweet_mode=extended
Done

 Search result:
1. By SecPompeo at Thu Feb 27 15:19:21 +0000 2020
https://twitter.com/i/web/status/1233048984723152897
President @realDonaldTrump's first official trip to India this week demonstrates the value the U.S. places on the #USIndia partnership. Democratic traditions unite us, shared interests bond us, and under the President's leadership our partnership has and will only grow stronger. https://t.co/FbmOenZB26

2. By SecPompeo at Thu Feb 27 02:15:11 +0000 2020
https://twitter.com/i/web/status/1232851640698404864
Today, we again reaffirm #Crimea is #Ukraine. The United States does not and will not ever recognize Russia’s claims of sovereignty over the peninsula. We call on Russia to end its occupation of Crimea.

3. By SecPompeo at Wed Feb 26 23:09:50 +0000 2020
https://twitter.com/i/web/status/1232804996271525889
#Iran’s regime is facing a 

In [30]:
import json
# save the raw (undistilled) search result; the with-block closes the file when done
with open("SecPompeo.json", 'w', encoding='utf-8') as f:
    json.dump(tweets, f, indent=2)