## Using the Twitter API: Bot Tutorial

##### Background: What is an API?

An Application Programming Interface (API) is the means by which a piece of software exposes some of its underlying functionality. Ideally an API is well documented so that application programmers can easily interact with it. 

We will look at a specific type of API: a Web API, which is an interface exposed by a website.

The practice of publishing APIs has allowed web communities to create an open architecture for sharing content and data. In this way, content that is created in one place can be dynamically posted and updated in multiple locations on the web. For example, Amazon or eBay APIs allow developers to use the existing retail infrastructure to create specialized web stores. Other APIs allow for:

- Smartphone applications (for accessing Twitter, LinkedIn, Facebook, etc.);
- Maps with location data (like Yelp);
- Online purchases (verification of credit-card data); and
- Sharing content between social networking sites.

###### Twitter API 

Many APIs will require you to establish an authorization key. For the Twitter API, you must create an application here: https://apps.twitter.com/app/new<br>

I can provide you with temporary keys to my bot's account during class. 

Otherwise, fill out the application linked above by following these instructions:<br>
Write in a Name and Description. You can put in a placeholder like http://www.google.com for the Website.<br>
Leave the Callback URL empty.<br>
Submit the form.

On the following page go to the Keys and Access Tokens tab and make a note of the <strong>API Key</strong> and <strong>API Secret</strong>. Scroll down and create an Access Token. Make a note of the <strong>Access Token</strong> and <strong>Access Token Secret</strong>.

In [9]:
api_key = 'ZPuTHu7jnNc8lHBfAeGYSOxVF'
api_secret = 'atHZU01QAGhc115KsBtKY7NrtuQ3xf6A0lTTd6xIwIKMGx7zvS'
access_token = '2420979512-KpKq2xtQifJHfqzfkUJjM6URLrucNYoF7rK4jE3'
access_secret = 'eFLBON1KQeAi5oqdEwZCn3bFGgNqznF5UPqKQ2cF5riMH'
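Hard-coding keys in a notebook makes them easy to leak when the file is shared. As a minimal sketch of one alternative (not part of the original notebook), the four values could be read from environment variables instead. The variable names such as `TWITTER_API_KEY` and the helper `load_twitter_keys` are assumptions made up for illustration.

```python
import os

# Hypothetical environment variable names; choose whatever names you like.
KEY_NAMES = {
    'api_key': 'TWITTER_API_KEY',
    'api_secret': 'TWITTER_API_SECRET',
    'access_token': 'TWITTER_ACCESS_TOKEN',
    'access_secret': 'TWITTER_ACCESS_SECRET',
}

def load_twitter_keys(environ=os.environ):
    """Return a dict of the four credentials, or raise if any is missing."""
    missing = [env for env in KEY_NAMES.values() if not environ.get(env)]
    if missing:
        raise KeyError('Missing environment variables: ' + ', '.join(missing))
    return {name: environ[env] for name, env in KEY_NAMES.items()}
```

With the variables set in your shell, `keys = load_twitter_keys()` followed by `api_key = keys['api_key']` (and so on) would stand in for the literal strings above.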

###### Luckily, someone has already created many of the important functions to interact with the Twitter API and its data. We will use tweepy to perform some actions within this notebook file. 

If you have not installed this package yet, go to the "Anaconda Prompt" terminal on your machine and execute:
<code>pip install tweepy</code>

In [51]:
import time
import io
import tweepy #package we will use to interact with the API. 
from tweepy import StreamListener, Stream

In [52]:
#Authorization
auth = tweepy.OAuthHandler(api_key, api_secret)
auth.set_access_token(access_token, access_secret)

api = tweepy.API(auth)

Now let's have some fun with all the Twitter data at your disposal!

In [53]:
#Print to the console the tweets from your timeline
public_tweets = api.home_timeline()
for tweet in public_tweets:
    print(tweet.text)

Imagine being taken away from your mother &amp; never seeing her again 🐯😢 https://t.co/y8ZVwL8Emn
I am in Brussels! Hi Brussels!
Take a look at some of the greatest teams in @VirginiaSports history
https://t.co/jqda7PRJbw https://t.co/VpuA5GueA0
RT @realDonaldTrump: An interesting cartoon that is circulating. http://t.co/OPG2R2ytkr
#PennywiseLuredMeWith fries 🎈🍟 https://t.co/2UUOsSVx3H
RT @TweetLikeAGirI: me after going to a 1 hour lecture and playing on my phone the whole time https://t.co/2jOD1gynRQ
RT @AynRandPaulRyan: What a headline!! 🤣🤣🤣 https://t.co/eLIsA69sPP
Eating animals is weird (like the shirt says). RT if you agree!

👕 ➡️ https://t.co/07MXMh7wQc https://t.co/vSHtlcWhIu
If you're unfamiliar w/ Nobel winner Ishiguro, suggest starting w/ his earliest work—A Pale View of Hills &amp; An Artist of the Floating World
Thank god there's no audio dayaruci @eosborne_makeup 💕 https://t.co/HlCz2o5LAD
RT @TheBlackSheep99: with all this hate in the world, it's refreshing to see that tru

In [54]:
# Get information about a user
user = api.get_user('twitter')

print(user.screen_name)
#get follower count
print(user.followers_count)

Twitter
62001909


In [55]:
#see how the friends() method works
for friend in user.friends():
    print(friend.screen_name)
#friends are the accounts that the user follows; comparing them with the user's followers shows who follows back
#This is how websites like UnfollowMe, etc. work! You could start making money off kids who are worried about their follower ratio!

AdsAPI
pichette
TwitterFashnJP
TwitterA11y
TwitterComms
TwitterSportsCA
TwitterVideoIN
TwitterREW
TwitterAdsHelp
TwitterMktgFR
JoinTheFlockJP
TwitterMktgBR
TwitterMediaBR
TwitterSportsJP
TwitterDevJP
TwitterTVJP
TwitterAmplify
PeriscopeHelp
TwitterLifeline
TwitterSportsAU


It's very common to get data from an API as a JSON object, since JSON is a convenient format for storing and exchanging data. However, it isn't always the most readable: there are tons of brackets, arrays, and colons that organize the object. You will need to understand the structure of the JSON in order to parse and then comprehend the data.
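To make the nesting concrete, here is a small sketch using Python's built-in `json` module on a made-up JSON string (the trend names and numbers are invented for illustration). tweepy hands back already-parsed Python objects in the cells that follow, so this only shows what happens underneath.

```python
import json

# A small, made-up JSON string shaped like a trends response.
raw = ('[{"trends": [{"name": "#Example", "tweet_volume": 1200}, '
       '{"name": "Sample Topic", "tweet_volume": null}]}]')

parsed = json.loads(raw)   # JSON array  -> Python list
first = parsed[0]          # JSON object -> Python dict
names = [t['name'] for t in first['trends']]

# Pretty-print with indentation to make the nesting visible.
print(json.dumps(parsed, indent=2))
print(names)
```

Note that JSON `null` becomes Python `None`, which is why some `tweet_volume` values print as `None` in the outputs below.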

In [56]:
#The trends_closest feature finds the locations with trend data closest to a given lat/long
lat = 38.0293
long = -78.4767 #cville!
api.trends_closest(lat, long)

[{'country': 'United States',
  'countryCode': 'US',
  'name': 'Richmond',
  'parentid': 23424977,
  'placeType': {'code': 7, 'name': 'Town'},
  'url': 'http://where.yahooapis.com/v1/place/2480894',
  'woeid': 2480894}]

In [57]:
json_object = api.trends_place(2480894) #this is the WOEID for Richmond. Feel free to find a different location
print(json_object)

[{'trends': [{'name': '#WorldTeachersDay', 'url': 'http://twitter.com/search?q=%23WorldTeachersDay', 'promoted_content': None, 'query': '%23WorldTeachersDay', 'tweet_volume': 67626}, {'name': '#ONA17', 'url': 'http://twitter.com/search?q=%23ONA17', 'promoted_content': None, 'query': '%23ONA17', 'tweet_volume': None}, {'name': '#ThursdayThoughts', 'url': 'http://twitter.com/search?q=%23ThursdayThoughts', 'promoted_content': None, 'query': '%23ThursdayThoughts', 'tweet_volume': 50969}, {'name': '#NYCC', 'url': 'http://twitter.com/search?q=%23NYCC', 'promoted_content': None, 'query': '%23NYCC', 'tweet_volume': 46151}, {'name': 'Kazuo Ishiguro', 'url': 'http://twitter.com/search?q=%22Kazuo+Ishiguro%22', 'promoted_content': None, 'query': '%22Kazuo+Ishiguro%22', 'tweet_volume': 94979}, {'name': '#ANAMasters', 'url': 'http://twitter.com/search?q=%23ANAMasters', 'promoted_content': None, 'query': '%23ANAMasters', 'tweet_volume': None}, {'name': 'Lollapalooza', 'url': 'http://twitter.com/searc

As we see above, there are a lot of other things you can analyze, such as the volume of tweets or the amount of promoted content.
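As one possible example of such an analysis, the sketch below ranks trends by `tweet_volume`. It runs on a made-up sample with the same shape as the `'trends'` list printed above; the helper name `rank_by_volume` is hypothetical. Trends whose volume is `None` are skipped, since they cannot be compared with numbers.

```python
# Made-up sample in the same shape as the 'trends' list in the output above.
sample_trends = [
    {'name': '#TopicA', 'promoted_content': None, 'tweet_volume': 67626},
    {'name': '#TopicB', 'promoted_content': None, 'tweet_volume': None},
    {'name': 'Topic C', 'promoted_content': None, 'tweet_volume': 94979},
]

def rank_by_volume(trends):
    """Return (name, volume) pairs sorted by volume, skipping unknown volumes."""
    known = [(t['name'], t['tweet_volume']) for t in trends
             if t.get('tweet_volume') is not None]
    return sorted(known, key=lambda pair: pair[1], reverse=True)

ranked = rank_by_volume(sample_trends)
print(ranked)
```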

In [58]:
#The JSON array's outer level is a list with only one item
json_object[0]
#json_object[1] --> fails since index out of range

{'as_of': '2017-10-05T15:38:27Z',
 'created_at': '2017-10-05T15:35:59Z',
 'locations': [{'name': 'Richmond', 'woeid': 2480894}],
 'trends': [{'name': '#WorldTeachersDay',
   'promoted_content': None,
   'query': '%23WorldTeachersDay',
   'tweet_volume': 67626,
   'url': 'http://twitter.com/search?q=%23WorldTeachersDay'},
  {'name': '#ONA17',
   'promoted_content': None,
   'query': '%23ONA17',
   'tweet_volume': None,
   'url': 'http://twitter.com/search?q=%23ONA17'},
  {'name': '#ThursdayThoughts',
   'promoted_content': None,
   'query': '%23ThursdayThoughts',
   'tweet_volume': 50969,
   'url': 'http://twitter.com/search?q=%23ThursdayThoughts'},
  {'name': '#NYCC',
   'promoted_content': None,
   'query': '%23NYCC',
   'tweet_volume': 46151,
   'url': 'http://twitter.com/search?q=%23NYCC'},
  {'name': 'Kazuo Ishiguro',
   'promoted_content': None,
   'query': '%22Kazuo+Ishiguro%22',
   'tweet_volume': 94979,
   'url': 'http://twitter.com/search?q=%22Kazuo+Ishiguro%22'},
  {'name': '

In [59]:
#Now the JSON object acts like a dictionary, or a set of key-value pairs. Its keys are 'as_of', 'created_at', 'locations', and 'trends'
json_object[0].get('trends')

##### At last we have a list of dictionaries that each have the key 'name', which is the trend we are looking to capture.
There are other fields you are welcome to look at and analyze (e.g. promoted_content, tweet_volume), but let's just get a list of all the current trends for this location. For that we will need a for loop:

In [60]:
trends = []
for dictionary in json_object[0].get("trends"):
    trends.append(dictionary.get("name"))
    
for trend in trends:
    print(trend)

#WorldTeachersDay
#ONA17
#ThursdayThoughts
#NYCC
Kazuo Ishiguro
#ANAMasters
Lollapalooza
Tropical Storm Nate
Kellyanne Conway
National Space Council
Bill Haslam
Bernie Mac
Repeal the Second Amendment
Stephen Strasburg
Angry GOP
Stevan Ridley
Top House Republicans
Raising Dion
Ivana Trump
Austin Maddox
UBIQ
Arcade Edition
Energy East
Blue Orbit
NFL's Joe Lockhart
OLB Tony Washington Jr.
Bruce Arena
Gwynne Shotwell
#PennywiseLuredMeWith
#CatalystATL
#AM2DM
#RockHall2018
#EEDay2017
#MyHomeTownIn4Words
#WorldBalletDay
#DoSomethingNiceDay
#JOOHONEYDAY
#AutomotiveTVandMovies
#LeadingtheWay17
#Worlds2017
#EarnHistory
#DreamActNow
#SYRvAUS
#FORRB2B
#FutureXLive
#AsifAtUSIP
#Trees4Threes
#7for7Yugyeom
#LiveAtUrban
#MilanoTorino


## Now let's look at searching for a specific subject/hashtag
For documentation on a search query: <br>
http://docs.tweepy.org/en/v3.5.0/api.html#geo-methods


API.search(q[, lang][, locale][, rpp][, page][, since_id][, geocode][, show_user])
Returns tweets that match a specified query.

Parameters:	
<strong>q</strong> – the search query string<br>
<strong>lang</strong> – Restricts tweets to the given language, given by an ISO 639-1 code.<br>
<strong>locale</strong> – Specify the language of the query you are sending. This is intended for language-specific clients and the default should work in the majority of cases.<br>
<strong>rpp</strong> – The number of tweets to return per page, up to a max of 100.<br>
<strong>page</strong> – The page number (starting at 1) to return, up to a max of roughly 1500 results (based on rpp * page).<br>
<strong>since_id</strong> – Returns only statuses with an ID greater than (that is, more recent than) the specified ID.<br>
<strong>geocode</strong> – Returns tweets by users located within a given radius of the given latitude/longitude. The location is preferentially taken from the Geotagging API, but will fall back to their Twitter profile. The parameter value is specified by “latitude,longitude,radius”, where radius units must be specified as either “mi” (miles) or “km” (kilometers). Note that you cannot use the near operator via the API to geocode arbitrary locations; however you can use this geocode parameter to search near geocodes directly.<br>
<strong>show_user</strong> – When true, prepends “<user>:” to the beginning of the tweet. This is useful for readers that do not display Atom’s author field. The default is false.
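The geocode format is easy to get wrong by hand, so here is a minimal sketch of a helper that builds the “latitude,longitude,radius” string described above. The function name `make_geocode` is hypothetical and not part of tweepy.

```python
def make_geocode(lat, long, radius, unit='mi'):
    """Build the "latitude,longitude,radius" string described above."""
    if unit not in ('mi', 'km'):
        raise ValueError("unit must be 'mi' or 'km'")
    return '{},{},{}{}'.format(lat, long, radius, unit)

# Ten miles around the Charlottesville coordinates used earlier.
cville = make_geocode(38.0293, -78.4767, 10)
print(cville)
```

The resulting string could then be passed as the `geocode` argument, for example `api.search('Trump', geocode=cville)`.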

In [61]:
#For future text analysis projects, we can search for a specific subject, not just trends, and get the text that is relevant

tweets = api.search('Trump', rpp = 100)

In [62]:
print(type(tweets))
print(type(tweets[0]))
print(tweets[0])

<class 'tweepy.models.SearchResults'>
<class 'tweepy.models.Status'>
Status(_api=<tweepy.api.API object at 0x107a8f4e0>, _json={'created_at': 'Thu Oct 05 15:38:00 +0000 2017', 'id': 915964293811752960, 'id_str': '915964293811752960', 'text': 'RT @ericonederful: Two typos in a row. One more and I get a business degree from Trump University.', 'truncated': False, 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [{'screen_name': 'ericonederful', 'name': 'Mr. Onederful®', 'id': 21968079, 'id_str': '21968079', 'indices': [3, 17]}], 'urls': []}, 'metadata': {'iso_language_code': 'en', 'result_type': 'recent'}, 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 4686094543, 'id_str': '4686094543', 'name': 'HighlyFunctionalDork', 'screen_name': 'DorkusRegis', 'locat

As we see above, tweets is of type "SearchResults". Each element of the search results is a "Status" object. We can access .text on each of these Statuses to get the tweet text and use it for later analysis.
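As a hedged sketch of what such later analysis might look like, the block below counts hashtags across a list of tweet texts. The strings are stand-ins invented for illustration, and the helper `count_hashtags` is hypothetical; in the notebook the list could be built with `[tweet.text for tweet in tweets]`.

```python
import re
from collections import Counter

# Stand-in strings; in the notebook these would come from tweet.text.
texts = [
    'RT @someone: Loving #python and #data today',
    'Another #Python tweet https://t.co/abc123',
]

def count_hashtags(texts):
    """Count hashtags (case-insensitive) across a list of tweet texts."""
    counts = Counter()
    for text in texts:
        counts.update(tag.lower() for tag in re.findall(r'#\w+', text))
    return counts

hashtag_counts = count_hashtags(texts)
print(hashtag_counts.most_common())
```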

In [63]:
for tweet in tweets:
    print(tweet.text)

RT @ericonederful: Two typos in a row. One more and I get a business degree from Trump University.
RT @ananavarro: If only Trump Administration treated transgenders with same respect it treats Putin...or Putin with same animosity it treat…
RT @yoyoha: Current cost of Trump's golf weekends so far: $71,556,561 https://t.co/sPn5FEeKYI
@NathanH16238941 @AbeFroman @Vets_4_Trump @pastormarkburns The point isn't if they have the right to kneel They do… https://t.co/FMmpwfHXAg
RT @DineshDSouza: Excellent! We will revisit this when the earth gets noticeably hotter than it was when we were kids https://t.co/mHlktgVG…
RT @TalbertSwan: @TaIbertSwan @realDonaldTrump Poor misguided soul, #Trump Has done nothing to make America great. He’s polarized the natio…
If ur against Trump it shows u can't see down the rd , not even a short distance, Hes got a corrupt gov to fight for awhile longer and bo to
@StellaStanding @IU_Mike @EllenMorris1222 @marcorubio @TaxPolicyCenter It doesn't make up for it. 
http

The search returned only 15 tweets, which is the API's default page size. Let's try to continuously collect data instead.

In [64]:
start_time = time.time() #grabs the system time
keyword_list = ['US','Politics'] #track list

In [65]:
import io
#Listener Class Override
class listener(StreamListener):

    def __init__(self, start_time, time_limit=60):
        super().__init__()  # let StreamListener set up its own state
        self.time = start_time
        self.limit = time_limit
        self.tweet_data = []

    def on_data(self, data):
        # Keep collecting raw JSON strings until the time limit is reached
        if (time.time() - self.time) < self.limit:
            self.tweet_data.append(data)
            return True

        # Time is up: write everything collected as a single JSON array
        with io.open('raw_tweets.json', 'w', encoding='utf-8') as saveFile:
            saveFile.write(u'[\n')
            saveFile.write(u','.join(self.tweet_data))
            saveFile.write(u'\n]')
        return False  # returning False tells tweepy to disconnect the stream

    def on_error(self, status):
        print(status)

In [None]:
twitterStream = Stream(auth, listener(start_time, time_limit=20)) #initialize Stream object with a time out limit
twitterStream.filter(track=keyword_list, languages=['en'])  #call the filter method to run the Stream Object

KeyboardInterrupt: 
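Once the stream has finished, raw_tweets.json holds a JSON array of the messages that were received. The sketch below shows one way to read such a file back and pull out the tweet text. The helper `load_tweet_texts` is hypothetical, and the demo runs on a tiny made-up file in a temporary directory so that your real raw_tweets.json is left untouched.

```python
import io
import json
import os
import tempfile

def load_tweet_texts(path):
    """Read a JSON array of stream messages and return their 'text' fields."""
    with io.open(path, encoding='utf-8') as f:
        messages = json.load(f)
    # Streams also deliver non-tweet messages (e.g. limit notices)
    # that have no 'text' key, so those are skipped.
    return [m['text'] for m in messages if 'text' in m]

# Demo on a made-up sample file.
sample = [{'text': 'hello world'}, {'limit': {'track': 3}}, {'text': 'second tweet'}]
demo_path = os.path.join(tempfile.mkdtemp(), 'sample_tweets.json')
with io.open(demo_path, 'w', encoding='utf-8') as f:
    f.write(json.dumps(sample))

demo_texts = load_tweet_texts(demo_path)
print(demo_texts)
```

For the real data, `load_tweet_texts('raw_tweets.json')` would return the text of every tweet collected by the stream.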

<strong>Important Note:</strong> To prevent bots from spamming, Twitter will restrict your access or boot you if you perform too many actions automatically. 
You can limit the usage of your cursor to stay within the rate limit.

In [None]:
# In this example, the handler is time.sleep(15 * 60),
# but you can of course handle it in any way you want.

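def limit_handled(cursor):
    while True:
        try:
            yield cursor.next()
        except tweepy.RateLimitError:
            time.sleep(15 * 60)  # wait out the 15-minute rate-limit window
        except StopIteration:
            return  # cursor exhausted; end the generator cleanly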

In [None]:
#this will print the screen name of every one of your followers with fewer than 300 friends while making sure you stay within the rate limit
#try at your own risk
#for follower in limit_handled(tweepy.Cursor(api.followers).items()):
#    if follower.friends_count < 300:
#        print(follower.screen_name)

#### Try out more actions on your own using the tweepy documentation here:
http://docs.tweepy.org/en/v3.5.0/api.html
#### Or try importing data to tweet, doing analytics on your timeline or a friend's timeline, or using another API (I have used the Yelp and Google Maps APIs and can help you out with those)