## Using the Twitter API: Bot Tutorial

##### Background: What is an API?

An Application Programming Interface (API) is the means by which a piece of software exposes some of its underlying functionality. Ideally an API is well documented so that application programmers can easily interact with it. 

We will look at a specific type of API: a Web API, which is an interface exposed by a website.

The practice of publishing APIs has allowed web communities to create an open architecture for sharing content and data. In this way, content that is created in one place can be dynamically posted and updated in multiple locations on the web. For example, Amazon or eBay APIs allow developers to use the existing retail infrastructure to create specialized web stores. Other APIs allow for:

- Smartphone applications (for accessing Twitter, LinkedIn, Facebook, etc.);
- Maps with location data (like Yelp);
- Online purchases (verification of credit-card data); and
- Sharing content between social networking sites.

###### Twitter API 

Many APIs will require you to establish an authorization key. For the Twitter API, you must create an application here: https://apps.twitter.com/app/new<br>

I can provide you with temporary keys to my bot's account during class. 

Otherwise, fill out the application linked above by following these instructions:<br>
Enter a Name and Description. You can use a placeholder like http://www.google.com for the Website.<br>
Leave the Callback URL empty.<br>
Submit the form.

On the following page go to the Keys and Access Tokens tab and make a note of the <strong>API Key</strong> and <strong>API Secret</strong>. Scroll down and create an Access Token. Make a note of the <strong>Access Token</strong> and <strong>Access Token Secret</strong>.

In [None]:
api_key = 'ZPuTHu7jnNc8lHBfAeGYSOxVF'
api_secret = 'atHZU01QAGhc115KsBtKY7NrtuQ3xf6A0lTTd6xIwIKMGx7zvS'
access_token = '2420979512-KpKq2xtQifJHfqzfkUJjM6URLrucNYoF7rK4jE3'
access_secret = 'eFLBON1KQeAi5oqdEwZCn3bFGgNqznF5UPqKQ2cF5riMH'
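Hard-coding keys in a notebook makes them easy to leak when the file is shared. As a minimal sketch of one alternative, the cell below reads the four values from environment variables. The variable names (`TWITTER_API_KEY` and so on) and the helper `load_twitter_keys` are assumptions made for this illustration; neither Twitter nor tweepy requires them.

```python
import os

def load_twitter_keys(env=os.environ):
    """Read the four credentials from a mapping of environment variables.

    The variable names below are hypothetical; use whatever names you set.
    """
    names = {
        'api_key': 'TWITTER_API_KEY',
        'api_secret': 'TWITTER_API_SECRET',
        'access_token': 'TWITTER_ACCESS_TOKEN',
        'access_secret': 'TWITTER_ACCESS_SECRET',
    }
    # Fail early with a clear message if any variable is missing
    missing = [var for var in names.values() if var not in env]
    if missing:
        raise KeyError('Missing environment variables: ' + ', '.join(missing))
    return {key: env[var] for key, var in names.items()}

# Demonstration with a stand-in dictionary instead of the real environment
demo_env = {
    'TWITTER_API_KEY': 'k',
    'TWITTER_API_SECRET': 's',
    'TWITTER_ACCESS_TOKEN': 't',
    'TWITTER_ACCESS_SECRET': 'ts',
}
keys = load_twitter_keys(demo_env)
print(sorted(keys))
```

Calling `load_twitter_keys()` with no argument reads from the real environment, and the returned values can be assigned to `api_key`, `api_secret`, `access_token`, and `access_secret`.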

###### Luckily, someone has already created many of the important functions to interact with the Twitter API and its data. We will use tweepy to perform some actions within this notebook file. 

If you have not installed this package yet, go to the "Anaconda Prompt" terminal on your machine and execute:
<code>pip install tweepy</code>

In [None]:
import time
import io
import tweepy #package we will use to interact with the API. 

In [None]:
#Authorization
auth = tweepy.OAuthHandler(api_key, api_secret)
auth.set_access_token(access_token, access_secret)

api = tweepy.API(auth)

Now let's have some fun with all the Twitter data at your disposal!

In [None]:
#Print to the console the tweets from your timeline
public_tweets = api.home_timeline()
for tweet in public_tweets:
    print(tweet.text)

In [None]:
# Get information about a user
user = api.get_user('twitter')

print(user.screen_name)
#get follower count
print(user.followers_count)

In [None]:
#see how the friends() method works
for friend in user.friends():
    print(friend.screen_name)
#friends are the accounts that this user follows; comparing them with the user's followers shows who follows back
#This is how websites like UnfollowMe, etc. work! You could start making money off kids who are worried about their follower ratio!

It's very common to get data from an API as a JSON object, since JSON is useful for storing data. However, it isn't always the most readable: there are tons of brackets, arrays, and colons that organize the object. You will need to understand the structure of the JSON in order to parse and then comprehend the data.
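To make the bracket-heavy structure easier to read, here is a small sketch using Python's built-in `json` module. The JSON text is made up for illustration (it is not real Twitter output); it shows how a JSON array becomes a list, a JSON object becomes a dictionary, and how indented printing exposes the nesting.

```python
import json

# Made-up JSON text shaped like a small API response (not real Twitter data)
raw = ('[{"trends": [{"name": "#Example", "tweet_volume": 1200}, '
       '{"name": "Sample Topic", "tweet_volume": null}]}]')

parsed = json.loads(raw)   # the JSON array becomes a Python list
first = parsed[0]          # each JSON object becomes a Python dict
names = [trend["name"] for trend in first["trends"]]

# Indented output makes the nesting of lists and dictionaries visible
print(json.dumps(parsed, indent=2))
print(names)
```

Note that JSON `null` is converted to Python's `None`, which matters when a field is sometimes empty.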

In [None]:
#The trends_closest feature finds the locations with trending-topic data closest to a given lat/long
lat = 38.0293
long = -78.4767 #cville!
api.trends_closest(lat, long)

In [None]:
json_object = api.trends_place(2480894) #this is the WOEID for Richmond. Feel free to find a different location
print(json_object)

As we see above, there are a lot of other things you can analyze, such as the volume of tweets or the amount of promoted content.

In [None]:
#The JSON array's outer level is a list with only one item
json_object[0]
#json_object[1] --> fails since index out of range

In [None]:
#Now the JSON object acts like a dictionary, or a set of key-value pairs. The key we want is "trends"
json_object[0].get('trends')

##### At last we have a list of dictionaries that each have the key 'name', which is the trend we are looking to capture.
There are other fields you are welcome to look at and analyze (e.g. promoted, tweet_volume), but let's just get a list of all the current trends for this location. For that we will need a for loop:

In [None]:
trends = []
for dictionary in json_object[0].get("trends"):
    trends.append(dictionary.get("name"))
    
for trend in trends:
    print(trend)
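As a sketch of how the other fields might be used, the cell below ranks trends by `tweet_volume`. The sample list is invented and only mimics the shape of `json_object[0].get("trends")`; the helper name `top_trends` is hypothetical, and the sketch assumes `tweet_volume` can be missing or `None` for some trends.

```python
# Invented sample shaped like the list of trend dictionaries (not real data)
sample_trends = [
    {'name': '#Example', 'tweet_volume': 1200},
    {'name': 'Sample Topic', 'tweet_volume': None},
    {'name': '#Another', 'tweet_volume': 56000},
    {'name': 'Third Thing', 'tweet_volume': 30500},
]

def top_trends(trends, n=3):
    """Return (name, tweet_volume) pairs for the n highest-volume trends."""
    # Skip trends whose volume is missing or None so sorting does not fail
    with_volume = [t for t in trends if t.get('tweet_volume') is not None]
    ranked = sorted(with_volume, key=lambda t: t['tweet_volume'], reverse=True)
    return [(t['name'], t['tweet_volume']) for t in ranked[:n]]

for name, volume in top_trends(sample_trends, n=2):
    print(name, volume)
```

With live data, the same function could be called as `top_trends(json_object[0].get("trends"))`.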

## Now let's look at searching for a specific subject/hashtag
For documentation on a search query: <br>
http://docs.tweepy.org/en/v3.5.0/api.html#geo-methods


API.search(q[, lang][, locale][, rpp][, page][, since_id][, geocode][, show_user])
Returns tweets that match a specified query.

Parameters:	
<strong>q</strong> – the search query string<br>
<strong>lang</strong> – Restricts tweets to the given language, given by an ISO 639-1 code.<br>
<strong>locale</strong> – Specify the language of the query you are sending. This is intended for language-specific clients and the default should work in the majority of cases.<br>
<strong>rpp</strong> – The number of tweets to return per page, up to a max of 100.<br>
<strong>page</strong> – The page number (starting at 1) to return, up to a max of roughly 1500 results (based on rpp * page).<br>
<strong>since_id</strong> – Returns only statuses with an ID greater than (that is, more recent than) the specified ID.<br>
<strong>geocode</strong> – Returns tweets by users located within a given radius of the given latitude/longitude. The location is preferentially taken from the Geotagging API, but will fall back to their Twitter profile. The parameter value is specified by “latitude,longitude,radius”, where radius units must be specified as either “mi” (miles) or “km” (kilometers). Note that you cannot use the near operator via the API to geocode arbitrary locations; however, you can use this geocode parameter to search near geocodes directly.<br>
<strong>show_user</strong> – When true, prepends “<user>:” to the beginning of the tweet. This is useful for readers that do not display Atom’s author field. The default is false.
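The `geocode` value is easy to get wrong because it packs three things into one string. A minimal sketch of a helper that builds it is shown below; the function name `make_geocode` and its range checks are assumptions for illustration and are not part of tweepy.

```python
def make_geocode(lat, long, radius, unit='mi'):
    """Build a "latitude,longitude,radius" string with a mi or km unit."""
    if unit not in ('mi', 'km'):
        raise ValueError('unit must be "mi" or "km"')
    if not -90 <= lat <= 90 or not -180 <= long <= 180:
        raise ValueError('latitude/longitude out of range')
    if radius <= 0:
        raise ValueError('radius must be positive')
    return '{},{},{}{}'.format(lat, long, radius, unit)

# Ten miles around the Charlottesville coordinates used earlier
cville = make_geocode(38.0293, -78.4767, 10)
print(cville)
```

The resulting string could then be passed as the `geocode` argument described in the parameter list above, for example `api.search('Trump', geocode=cville)`.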

In [None]:
#For future text analysis projects, we can search for a specific subject, not just trends, and get the text that is relevant

tweets = api.search('Trump', rpp = 100)

In [None]:
print(type(tweets))
print(type(tweets[0]))
print(tweets[0])

As we see above, tweets is of type SearchResults. Each element of the search results is a Status object. We can access .text on each of these Statuses to get the text and use it for later analysis.

In [None]:
for tweet in tweets:
    print(tweet.text)

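The search returned only 15 tweets, which is the endpoint's default page size. The `rpp` argument does not appear to take effect here; `count` is the page-size parameter in Twitter's v1.1 search documentation. Let's try to collect data continuously instead.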

In [97]:
start_time = time.time() #grabs the system time
keyword_list = ['US','Politics'] #track list

In [96]:
import io
import time
import tweepy

#Listener Class Override
class listener(tweepy.StreamListener):

    def __init__(self, start_time, time_limit=60):
        super().__init__()  # let tweepy set up the base listener
        self.time = start_time
        self.limit = time_limit
        self.tweet_data = []

    def on_data(self, data):
        # Called once per raw message; keep collecting until the time limit is reached
        if (time.time() - self.time) < self.limit:
            self.tweet_data.append(data)
            return True

        # Time is up: write everything collected as one JSON array
        with io.open('raw_tweets.json', 'w', encoding='utf-8') as saveFile:
            saveFile.write(u'[\n')
            saveFile.write(','.join(self.tweet_data))
            saveFile.write(u'\n]')
        return False  # returning False disconnects the stream

    def on_error(self, status):
        # Print the HTTP status code reported by the stream
        print(status)

In [None]:
twitterStream = tweepy.Stream(auth, listener(start_time, time_limit=20)) #initialize Stream object with a time out limit
twitterStream.filter(track=keyword_list, languages=['en'])  #call the filter method to run the Stream Object

ProtocolError: ('Connection broken: IncompleteRead(7626 bytes read)', IncompleteRead(7626 bytes read))
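When a run finishes without a connection error like the one above, the listener leaves a JSON array in `raw_tweets.json`. The sketch below shows one way to pull the tweet text back out. The sample string is a made-up stand-in for the file contents, the helper name `tweet_texts` is hypothetical, and the sketch assumes some stream messages (such as limit notices) have no `text` field.

```python
import json

def tweet_texts(json_array_text):
    """Return the 'text' field of every record that has one."""
    records = json.loads(json_array_text)
    # Skip records without a 'text' key instead of raising KeyError
    return [record["text"] for record in records if "text" in record]

# Made-up stand-in for the contents of raw_tweets.json
sample = ('[{"id": 1, "text": "first tweet"}, '
          '{"limit": {"track": 5}}, '
          '{"id": 2, "text": "second tweet"}]')
texts = tweet_texts(sample)
print(texts)
```

To use the real file, read it with `io.open('raw_tweets.json', encoding='utf-8')` and pass the contents to `tweet_texts`.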

<strong>Important Note:</strong> To prevent bots from spamming, Twitter will restrict your access or boot you if you perform too many actions automatically. 
You can limit the usage of your cursor to stay within the rate limit.

In [None]:
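# In this example, the handler is time.sleep(15 * 60),
# but you can of course handle it in any way you want.

def limit_handled(cursor):
    while True:
        try:
            yield cursor.next()
        except tweepy.RateLimitError:
            # Rate limit hit: wait out the 15-minute window, then retry
            time.sleep(15 * 60)
        except StopIteration:
            # No more items: end the generator cleanly (required on Python 3.7+)
            return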
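Waiting 15 minutes makes this pattern awkward to try out. The sketch below reproduces the same retry-after-waiting idea offline: `FakeRateLimitError` and `FlakySource` are hypothetical stand-ins for `tweepy.RateLimitError` and a tweepy cursor, and the sleep function is passed in so a test can record the wait instead of actually pausing.

```python
import time

class FakeRateLimitError(Exception):
    """Hypothetical stand-in for tweepy.RateLimitError."""

class FlakySource:
    """Iterator that raises FakeRateLimitError on one chosen call."""
    def __init__(self, values, fail_at):
        self.values = list(values)
        self.fail_at = fail_at
        self.calls = 0

    def __iter__(self):
        return self

    def __next__(self):
        self.calls += 1
        if self.calls == self.fail_at:
            raise FakeRateLimitError()
        if not self.values:
            raise StopIteration
        return self.values.pop(0)

def limit_handled_demo(source, wait_seconds=15 * 60, sleep=time.sleep):
    """Yield items from source, waiting and retrying after a rate-limit error."""
    while True:
        try:
            yield next(source)
        except FakeRateLimitError:
            sleep(wait_seconds)   # wait, then loop around and retry
        except StopIteration:
            return                # source exhausted: stop cleanly

waits = []  # record requested waits instead of sleeping
items = list(limit_handled_demo(FlakySource(['a', 'b', 'c'], fail_at=2),
                                sleep=waits.append))
print(items, waits)
```

Injecting `sleep` is a design choice for testability; in real use the default `time.sleep` applies.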

In [None]:
#this prints every one of your followers with fewer than 300 friends while making sure you stay within the rate limit
#(replace the print with follower.follow() to actually follow them back)
#try at your own risk
#for follower in limit_handled(tweepy.Cursor(api.followers).items()):
#    if follower.friends_count < 300:
#        print(follower.screen_name)

#### Try out more actions on your own using the tweepy documentation here:
http://docs.tweepy.org/en/v3.5.0/api.html
#### Or try importing data to tweet, doing analytics on your timeline or a friend's timeline, or using another API (I have used the Yelp and Google Maps APIs and can help you out with that)