# Text Analytics | BAIS:6100
# Module 5. Using Twitter APIs

Instructor: Kang-Pyo Lee 

Topics to be covered:
- Searching for tweets using a search query
- Retrieving tweets from a user's timeline

### *** Run the cells that make API requests only when needed, and keep Twitter's API rate limits in mind.

https://developer.twitter.com/en/docs/basics/rate-limits <br>
https://developer.twitter.com/en/docs/basics/rate-limiting
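Each v1.1 API response reports the state of the current rate-limit window in the `x-rate-limit-limit`, `x-rate-limit-remaining`, and `x-rate-limit-reset` headers. The reset value is a UTC epoch time in seconds. The following is a minimal sketch of turning those values into a wait time. The function name `seconds_to_wait` and the one-second safety margin are illustrative assumptions, and how you read the headers depends on your client library.

```python
import time


def seconds_to_wait(remaining, reset_epoch, now=None, margin=1.0):
    """Return how many seconds to pause before the next request.

    remaining   -- value of the x-rate-limit-remaining header
    reset_epoch -- value of the x-rate-limit-reset header (UTC epoch seconds)
    margin      -- extra buffer in seconds (an arbitrary, assumed default)
    """
    if int(remaining) > 0:
        return 0.0  # Requests are still available in the current window.
    if now is None:
        now = time.time()
    # Wait until the window resets, plus a small safety margin.
    return max(0.0, float(reset_epoch) - now) + margin


# Example: no requests left and the window resets 120 seconds from "now".
print(seconds_to_wait(remaining=0, reset_epoch=1_000_120, now=1_000_000))  # → 121.0
```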

## Importing Modules

In [None]:
# ! pip install --user --upgrade twitter

In [None]:
import twitter

## Connecting to the Twitter APIs

In [None]:
# Fill in the four variables with your own Twitter API credentials.

CONSUMER_KEY = ""
CONSUMER_SECRET = ""
ACCESS_TOKEN = ""
ACCESS_TOKEN_SECRET = ""
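Hard-coding credentials in a notebook makes them easy to leak when the notebook is shared. One alternative is to read them with `os.environ`. The sketch below assumes you have already exported the four values as environment variables. The variable names and the helper `load_credentials` are hypothetical.

```python
import os

# Hypothetical environment variable names; use whatever names you exported.
CREDENTIAL_VARS = {
    "CONSUMER_KEY": "TWITTER_CONSUMER_KEY",
    "CONSUMER_SECRET": "TWITTER_CONSUMER_SECRET",
    "ACCESS_TOKEN": "TWITTER_ACCESS_TOKEN",
    "ACCESS_TOKEN_SECRET": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(env=os.environ):
    """Return the four credentials as a dict, failing loudly if any is missing."""
    missing = [var for var in CREDENTIAL_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[var] for name, var in CREDENTIAL_VARS.items()}
```

Usage: `creds = load_credentials()` followed by `CONSUMER_KEY = creds["CONSUMER_KEY"]`, and so on for the other three values.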

In [None]:
# Establish a connection to the Twitter APIs.

auth = twitter.oauth.OAuth(ACCESS_TOKEN, ACCESS_TOKEN_SECRET, CONSUMER_KEY, CONSUMER_SECRET)
twitter_api = twitter.Twitter(auth=auth)

## Searching for Tweets Using a Search Query

In [None]:
# Make the first search call to the Twitter API.

q = "iowa"
search_results = twitter_api.search.tweets(q=q, count=100, lang="en", result_type="mixed")

https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets.html

- `q`: a search query of 500 characters maximum, including operators
- `count`: the number of tweets to return per page, up to a maximum of 100 
- `lang`: restricts tweets to the given language 
- `result_type`: specifies what type of search results you would prefer to receive (mixed | recent | popular)
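These four parameters can be collected and checked before a request is spent. The following is a minimal sketch. The helper name `build_search_params` is hypothetical, the lower bound of 1 for `count` is an assumption, and the checks only mirror the limits listed above.

```python
VALID_RESULT_TYPES = {"mixed", "recent", "popular"}


def build_search_params(q, count=100, lang="en", result_type="mixed"):
    """Return keyword arguments for a search call, checking the documented limits."""
    if not q or len(q) > 500:
        raise ValueError("q must be 1 to 500 characters long, including operators")
    if not 1 <= count <= 100:
        raise ValueError("count must be between 1 and 100")
    if result_type not in VALID_RESULT_TYPES:
        raise ValueError("result_type must be one of: mixed, recent, popular")
    return {"q": q, "count": count, "lang": lang, "result_type": result_type}


params = build_search_params("iowa")
print(params)
# With a connection in place, the call would then be:
# search_results = twitter_api.search.tweets(**params)
```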

Note that the standard search API has a 7-day limit: you can only search for tweets published in the past 7 days.
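Each tweet's `created_at` value records when it was posted, so you can check the 7-day window locally. The following is a minimal sketch. It assumes the v1.1 `created_at` format shown in Twitter's tweet object documentation, for example `Wed Oct 10 20:19:24 +0000 2018`. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Format of the created_at field in v1.1 tweet objects,
# e.g. "Wed Oct 10 20:19:24 +0000 2018".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(created_at):
    """Convert a created_at string into a timezone-aware datetime."""
    return datetime.strptime(created_at, CREATED_AT_FORMAT)


def within_search_window(created_at, now=None, days=7):
    """Return True if the tweet is no older than `days` days relative to `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - parse_created_at(created_at) <= timedelta(days=days)


print(parse_created_at("Wed Oct 10 20:19:24 +0000 2018"))  # → 2018-10-10 20:19:24+00:00
```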

In [None]:
type(search_results)

In [None]:
search_results.keys()

In [None]:
search_results["search_metadata"]

In [None]:
len(search_results["statuses"])

A status refers to an individual tweet.

In [None]:
search_results["statuses"][0]

https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object
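In the v1.1 tweet object, `text` may be cut off for tweets longer than 140 characters, in which case the `truncated` field is true. For a retweet, the top-level `text` begins with `RT @user:` and may also be shortened. Requesting tweets with `tweet_mode="extended"` makes the API return a `full_text` field instead. The helper below is a minimal sketch that prefers the most complete text available in a status dict. Its name and fallback order are assumptions, not part of this notebook.

```python
def get_tweet_text(status):
    """Return the most complete text available in a tweet (status) dict."""
    # For retweets, the original tweet holds the untruncated text.
    source = status.get("retweeted_status", status)
    # 'full_text' is present when tweets are requested with tweet_mode="extended";
    # otherwise fall back to the (possibly truncated) 'text' field.
    return source.get("full_text") or source.get("text", "")


sample = {
    "text": "RT @someone: A long tweet that gets cut...",
    "retweeted_status": {"full_text": "A long tweet that gets cut off in compatibility mode."},
}
print(get_tweet_text(sample))
```

For retweets, this design returns the original tweet's text and drops the `RT @user:` prefix. The `is_retweet` column written later in this notebook still records that the tweet was a retweet.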

In [None]:
search_results["statuses"][0]["text"]

In [None]:
search_results["statuses"][0]["user"]

In [None]:
for status in search_results["statuses"][:30]:
    print(status["text"])

## Searching for More Tweets

In [None]:
from urllib.parse import parse_qsl

N = 30                     # Number of additional calls to the search API.

#########################################################################
# 'results' will be used for accumulating all incoming data from Twitter.
# Start by storing the tweets from the first call in 'results'.
#########################################################################

results = []
results += search_results["statuses"]

#########################################################################
# Make up to N more search calls with the same query, following the
# 'next_results' value until Twitter stops providing one.
#########################################################################

for _ in range(N):
    try:
        next_results = search_results["search_metadata"]["next_results"]
    except KeyError:
        break    # No more pages are available for this query.
    
    # 'next_results' is a URL query string such as "?max_id=...&q=iowa&...".
    # parse_qsl() splits it into key-value pairs and decodes percent-encoded
    # characters (e.g., "%23" -> "#"), which a plain split on '=' would not.
    kwargs = dict(parse_qsl(next_results[1:]))
    search_results = twitter_api.search.tweets(**kwargs)
    
    print("%d tweets retrieved." % len(search_results["statuses"]))
    
    ##########################################################
    # Add the current search results to the overall results.
    ##########################################################
    
    results += search_results["statuses"]
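The `next_results` value is a URL query string, so parameter values such as `q` arrive percent-encoded. For example, a hashtag query `#iowa` appears as `%23iowa`. The sketch below uses the standard library's `urllib.parse.parse_qsl` to turn such a string into keyword arguments with decoded values. The sample string and the helper name are made up for illustration.

```python
from urllib.parse import parse_qsl


def next_results_to_kwargs(next_results):
    """Convert a 'next_results' string such as '?max_id=1&q=abc' into a dict."""
    # Drop the leading '?' and decode each key=value pair.
    return dict(parse_qsl(next_results.lstrip("?")))


# Hypothetical value; real ones come from search_results["search_metadata"].
sample = "?max_id=1051234567890&q=%23iowa%20caucus&count=100&include_entities=1"
print(next_results_to_kwargs(sample))
# → {'max_id': '1051234567890', 'q': '#iowa caucus', 'count': '100', 'include_entities': '1'}
```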

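Due to Twitter's API rate policy, the search endpoint allows only 180 requests per 15-minute window with user authentication. The loop above makes at most N + 1 = 31 calls in total, including the first one, which stays well within that limit.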
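If you raise N, it can help to spread the calls evenly over the window instead of using up the quota in a burst. The following is a minimal sketch that assumes the limit of 180 requests per 15 minutes. `Pacer` is a hypothetical helper, and its `clock` and `sleep` arguments exist only so that the pacing can be tested without actually waiting.

```python
import time


class Pacer:
    """Enforce a minimum interval between consecutive calls.

    With limit=180 and window_seconds=900 the interval is 5 seconds,
    which keeps a loop at or under 180 calls per 15 minutes.
    """

    def __init__(self, limit=180, window_seconds=15 * 60,
                 clock=time.monotonic, sleep=time.sleep):
        self.interval = window_seconds / limit
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Sleep only as long as needed since the previous wait(), then record the time."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now
```

In the search loop, you would create `pacer = Pacer()` once and call `pacer.wait()` immediately before each `twitter_api.search.tweets(**kwargs)` call. This spaces the requests about 5 seconds apart.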

In [None]:
len(results)

## Saving Data in a CSV File

It's always a good idea to save the collected data, which is held only temporarily in memory, to a file so that you can access it more easily later.

In [None]:
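def cleanse_text(text):
    # Some fields (e.g., a user's location or description) can be null (None).
    if text is None:
        return ""
    
    # Replace line breaks and tabs with spaces so that words are not glued
    # together and each record stays on one line in the tab-separated file.
    text = text.replace("\n", " ")
    text = text.replace("\r", " ")
    text = text.replace("\t", " ")
    # Remove double quotes, which CSV readers may interpret as quoting characters.
    text = text.replace("\"", "")
    
    return text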

In [None]:
###################################################################################
# Write the data to a CSV file.
###################################################################################

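import os

os.makedirs("outcome", exist_ok=True)    # Create the output folder if it does not already exist.

with open(file="outcome/twitter_search_data.csv", mode="w", encoding="utf8") as fw: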
    
    ###########################################################################################
    # Write the 14 column names on the first row.
    # A tab (\t) acts as a separator between columns and a new line (\n) between rows. 
    ###########################################################################################
    
    fw.write("id\t" +
             "created_at\t" +
             "text\t" +
             "is_retweet\t" +
             "retweet_created_at\t" +
             "retweet_count\t" +
             "user_id\t" +
             "user_name\t" +
             "user_screen_name\t" + 
             "user_created_at\t" +
             "user_followers_count\t" +
             "user_statuses_count\t" +
             "user_location\t" +
             "user_desc\n")

    ###########################################################################################
    # Write the actual data starting from the second row by iterating over the 'results'.
    # A tab (\t) acts as a separator between columns and a new line (\n) as the end of a line. 
    # Make sure the order of column names matches the order in which the column values are written.
    ###########################################################################################
    
    for status in results:
        fw.write(status["id_str"] + "\t")
        fw.write(status["created_at"] + "\t")
        fw.write(cleanse_text(status["text"]) + "\t")
        
        if "retweeted_status" in status:
            fw.write("1\t")
            fw.write(status["retweeted_status"]["created_at"] + "\t")
        else:
            fw.write("0\t")
            fw.write("\t")
        
        fw.write(str(status["retweet_count"]) + "\t")
        fw.write(status["user"]["id_str"] + "\t")
        fw.write(status["user"]["name"] + "\t")
        fw.write(status["user"]["screen_name"] + "\t")
        fw.write(status["user"]["created_at"] + "\t")
        fw.write(str(status["user"]["followers_count"]) + "\t")
        fw.write(str(status["user"]["statuses_count"]) + "\t")
        fw.write(cleanse_text(status["user"]["location"]) + "\t")
        fw.write(cleanse_text(status["user"]["description"]))    # Last column: no trailing tab.
        fw.write("\n")
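Instead of stripping characters by hand, Python's built-in `csv` module can write tab-separated rows and quote any field that contains a tab, a newline, or a quote character, so the original text survives intact. The sketch below is an alternative to the cell above, not the notebook's method. It writes only a subset of the columns, and the helper names `status_to_row` and `write_statuses_tsv` are hypothetical.

```python
import csv
import io

COLUMNS = ["id", "created_at", "text", "is_retweet", "retweet_count", "user_screen_name"]


def status_to_row(status):
    """Flatten one tweet (status) dict into a list of column values."""
    return [
        status["id_str"],
        status["created_at"],
        status["text"],
        1 if "retweeted_status" in status else 0,
        status["retweet_count"],
        status["user"]["screen_name"],
    ]


def write_statuses_tsv(statuses, fileobj):
    """Write a header row plus one row per status as tab-separated values."""
    writer = csv.writer(fileobj, delimiter="\t")
    writer.writerow(COLUMNS)
    for status in statuses:
        writer.writerow(status_to_row(status))


# Usage with a real file (the csv module docs recommend newline=""):
# with open("outcome/twitter_search_data.csv", "w", encoding="utf8", newline="") as fw:
#     write_statuses_tsv(results, fw)
```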

## Retrieving Tweets from a User's Timeline

https://twitter.com/cnnbrk

In [None]:
# Retrieve recent tweets from the timeline of CNN Breaking News (@cnnbrk) on Twitter.  

kwargs = {"screen_name": "cnnbrk", "count": 200, "include_rts": "true", "since_id": 1}
statuses = twitter_api.statuses.user_timeline(**kwargs)

https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline

- `screen_name`: the screen name of the user for whom to return results
- `count`: the number of Tweets to try and retrieve, up to a maximum of 200 per distinct request
- `include_rts`: when set to false, the timeline will strip any native retweets (though they will still count toward both the maximal length of the timeline and the slice selected by the count parameter)
- `since_id`: returns results with an ID greater than (that is, more recent than) the specified ID
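`since_id` is useful for incremental collection. After one run, remember the largest tweet ID you received and pass it as `since_id` next time, so that only newer tweets come back. The following is a minimal sketch of that bookkeeping. The helper name is hypothetical, and it uses the numeric `id` field of each status dict.

```python
def next_since_id(statuses, previous_since_id=1):
    """Return the largest tweet ID seen so far, for use as the next since_id."""
    ids = [status["id"] for status in statuses]
    return max(ids + [previous_since_id])


# Example with made-up IDs: the next request would use since_id=305.
batch = [{"id": 301}, {"id": 305}, {"id": 297}]
print(next_since_id(batch))  # → 305
```

Usage: `kwargs["since_id"] = next_since_id(statuses, kwargs["since_id"])` before the next collection run.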

In [None]:
len(statuses)

In [None]:
statuses[0]

## Retrieving More Tweets

In [None]:
#########################################################################
# 'results' will be used for accumulating all incoming data from Twitter.
# Start by storing the tweets from the first call in 'results'.
#########################################################################

results = []
results += statuses

#########################################################################
# Make up to N more user timeline calls, stopping early if a call
# returns no tweets.
#########################################################################

N = 15          # Maximum number of additional calls to be made.

i = 1
while (i <= N) and (len(statuses) > 0):
    
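    ##########################################################
    # Set 'max_id' to one less than the smallest (oldest) ID
    # collected so far, so the next call returns only older tweets.
    ##########################################################
    
    kwargs["max_id"] = min([status["id"] for status in statuses]) - 1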
    statuses = twitter_api.statuses.user_timeline(**kwargs)
    
    print("%d tweets retrieved." %len(statuses))
    
    ##########################################################
    # Add the current results to the overall results.
    ##########################################################
    
    results += statuses
    i += 1

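- `max_id`: returns results with an ID less than (that is, older than) or equal to the specified ID. Because the bound is inclusive, the loop sets `max_id` to the smallest ID collected so far minus 1, so the oldest tweet already retrieved is not fetched again.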
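To see how this `max_id` paging walks backwards through a timeline without fetching any tweet twice, the sketch below replays the logic against a fake, in-memory timeline instead of the live API. `fake_user_timeline` and `collect_timeline` are hypothetical stand-ins used only to illustrate the paging. Unlike the notebook's loop, `max_calls` here counts the initial call too.

```python
def fake_user_timeline(timeline, count, max_id=None, **kwargs):
    """Mimic user_timeline: newest-first tweets with id <= max_id, at most `count`."""
    eligible = [t for t in timeline if max_id is None or t["id"] <= max_id]
    return sorted(eligible, key=lambda t: t["id"], reverse=True)[:count]


def collect_timeline(fetch, count, max_calls):
    """Page backwards through a timeline using max_id."""
    kwargs = {"count": count}
    statuses = fetch(**kwargs)
    results = list(statuses)
    calls = 1
    while calls < max_calls and statuses:
        # max_id is inclusive, so subtract 1 to avoid re-fetching the oldest tweet.
        kwargs["max_id"] = min(status["id"] for status in statuses) - 1
        statuses = fetch(**kwargs)
        results += statuses
        calls += 1
    return results


timeline = [{"id": i} for i in range(1, 11)]  # ten made-up tweets with IDs 1..10


def fetch(**kw):
    return fake_user_timeline(timeline, **kw)


collected = collect_timeline(fetch, count=4, max_calls=10)
print([t["id"] for t in collected])  # → [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
```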

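Due to Twitter's API rate policy, you can make only 900 user timeline requests per 15-minute window with user authentication. In addition, this endpoint can return only up to 3,200 of a user's most recent tweets. With `count` set to 200, the initial call plus N = 15 more calls (16 × 200 = 3,200) is enough to reach that cap.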

In [None]:
len(results)

## Saving Data in a CSV File

In [None]:
with open(file="outcome/twitter_user_timeline_data.csv", mode="w", encoding="utf8") as fw:
    fw.write("id\t" +
             "created_at\t" +
             "text\t" +
             "is_retweet\t" +
             "retweet_created_at\t" +
             "retweet_count\t" +
             "user_id\n")

    for status in results:
        fw.write(status["id_str"] + "\t")
        fw.write(status["created_at"] + "\t")
        fw.write(cleanse_text(status["text"]) + "\t")
        
        if "retweeted_status" in status:
            fw.write("1\t")
            fw.write(status["retweeted_status"]["created_at"] + "\t")
        else:
            fw.write("0\t")
            fw.write("\t")
        
        fw.write(str(status["retweet_count"]) + "\t")
        fw.write(status["user"]["id_str"])
        fw.write("\n")
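To check the saved file, you can read it back with the `csv` module, using the tab as the delimiter and the first row as the header. The following is a minimal sketch that counts original tweets versus retweets using the `is_retweet` column. The function name is hypothetical. `csv.QUOTE_NONE` is used because the file above was written without any quoting.

```python
import csv
from collections import Counter


def count_retweets(fileobj):
    """Count rows by the is_retweet column of a tab-separated tweet file."""
    # QUOTE_NONE: the file was written without quoting, so treat '"' literally.
    reader = csv.DictReader(fileobj, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter(row["is_retweet"] for row in reader)
    return {"original": counts["0"], "retweet": counts["1"]}


# Usage with the file written above:
# with open("outcome/twitter_user_timeline_data.csv", encoding="utf8") as fr:
#     print(count_retweets(fr))
```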