# Twitter API

## Lecture Notes

##### Why Twitter?


"While physics and math may tell us how the universe began, they are not much use in predicting Human Behavior because there are far too many Equations to Solve" 
    
    -Stephan Hawking
    

There are a lot of other great APIs out there as well that you should keep in mind.

[Wikipedia](https://www.mediawiki.org/wiki/API:Main_page)  
[Stack Exchange](https://api.stackexchange.com/docs)  
[Spotify](http://spotipy.readthedocs.io/en/latest/)

Often there's already a Python package with a much easier interface for using the API, such as spotipy, linked above.

#### REST API vs Streaming API


REST:  
    - Query user accounts using OAuth
    - Allows you to access 'historical' tweets
    - Only lets you access tweets going back 7 days

STREAM: 
    - Essentially a long-running request (left open) using OAuth
    - Gives access to a real-time stream of data
       
Check out the [twitter developer documentation](https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets) for more info.

#### REST API

In [None]:
from __future__ import print_function
import pandas as pd

In [None]:
#!pip install requests_oauthlib

In [None]:
import requests
from requests_oauthlib import OAuth1

# OAuth ~ a standard way to authorize API requests without sharing your password

In [None]:
#!pip install cnfg

In [None]:
# Importing our Config - note that with this setup we can safely conceal
# our personal keys but still share generalizable code

# Replace the path below with the correct path to your config file
import cnfg
config = cnfg.load("Documents/twitter_api/twitter_config")

oauth = OAuth1(config["consumer_key"],
               config["consumer_secret"],
               config["access_token"],
               config["access_token_secret"])

In [None]:
response = requests.get("https://api.twitter.com/1.1/statuses/user_timeline.json",
                        auth=oauth)

tweets = response.json()

for key in tweets[0].keys():
    print(key)
    

In [None]:
# I only have one tweet (I'm not really a Twitter user)
for tweet in tweets:
    print(tweet['text'])

Now let's use the Twitter API to search for 5 tweets about our favorite topic.

In [None]:
parameters = {"q": "gradient boosting", "count":5}
response = requests.get("https://api.twitter.com/1.1/search/tweets.json",
                        params = parameters,
                        auth=oauth)

from pprint import pprint
pprint(response.json()['search_metadata'])

In [None]:
tweets = response.json()['statuses']

print('PAGE 1')
for tweet in tweets:
    print(tweet['id'], tweet['text'])

In [None]:
search_url = "https://api.twitter.com/1.1/search/tweets.json"
# 'next_results' is only present when there is another page of results
next_results = response.json()['search_metadata'].get('next_results')

if next_results:
    response = requests.get(search_url + next_results, auth=oauth)

    print('PAGE 2')
    for tweet in response.json()['statuses']:
        print(tweet['text'])
else:
    print('No more pages')
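
The cells above fetch pages by hand. The same pattern can be sketched as a loop that keeps following `next_results` until it is absent. This is a sketch, not the notebook's method: `iter_search_pages` and `fetch_page` are hypothetical names, and the fake responses only mimic the `statuses` / `search_metadata` shape shown above so the example runs without network access or credentials.

```python
def iter_search_pages(fetch_page, first_url, base_url, max_pages=5):
    """Yield each page's list of statuses, following 'next_results' links.

    fetch_page: callable taking a URL and returning the decoded JSON dict.
    Stops when a page has no 'next_results' or after max_pages pages.
    """
    url = first_url
    for _ in range(max_pages):
        payload = fetch_page(url)
        yield payload["statuses"]
        next_results = payload.get("search_metadata", {}).get("next_results")
        if not next_results:
            break
        # next_results is a query string such as "?max_id=...&q=..."
        url = base_url + next_results

# Fake two-page response standing in for the real API (illustrative data only).
BASE = "https://api.twitter.com/1.1/search/tweets.json"
FAKE_PAGES = {
    BASE + "?q=demo": {
        "statuses": [{"id": 2, "text": "second"}],
        "search_metadata": {"next_results": "?max_id=1&q=demo"},
    },
    BASE + "?max_id=1&q=demo": {
        "statuses": [{"id": 1, "text": "first"}],
        "search_metadata": {},
    },
}

pages = list(iter_search_pages(FAKE_PAGES.__getitem__, BASE + "?q=demo", BASE))
for number, statuses in enumerate(pages, start=1):
    print("PAGE", number, [tweet["text"] for tweet in statuses])
```

Against the real API, `fetch_page` would be something like `lambda url: requests.get(url, auth=oauth).json()`. The `max_pages` cap is there to avoid burning through the rate limit by accident.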

#### STREAMING API ~ TWEEPY

In [None]:
# !pip install tweepy

In [None]:
import tweepy

auth = tweepy.OAuthHandler(config["consumer_key"],
                           config["consumer_secret"])
auth.set_access_token(config["access_token"],
                      config["access_token_secret"])

api=tweepy.API(auth)

In [None]:
max_tweets=1

# tweepy's Cursor handles pagination for us

for tweet in tweepy.Cursor(api.search,q="data science").items(max_tweets):
    print(tweet)

In [None]:
results=[]

for tweet in tweepy.Cursor(api.search,q="gradient boosting").items(10):
    results.append(tweet)

#### Import tweets into Pandas

In [None]:
#  import pandas as pd
def structure_results(results):
    id_list=[tweet.id for tweet in results]
    data=pd.DataFrame(id_list,columns=['id'])
    
    data["text"]= [tweet.text.encode('utf-8') for tweet in results]
    data["datetime"]=[tweet.created_at for tweet in results]
    data["Location"]=[tweet.place for tweet in results]
    
    return data

In [None]:
data=structure_results(results)
data.head()

For more on tweepy and using it to set up a true stream, check out the [tweepy doc](http://docs.tweepy.org/en/v3.4.0/streaming_how_to.html) as well as this [nice guide from dataquest](https://www.dataquest.io/blog/streaming-data-python/). 

Combining these steps with writing to a mongo database as below, you can set up a live data acquisition pipeline that continually grabs tweets from Twitter as they are posted and stores them locally for later static analysis/modeling. We'll show an example at the very bottom of the notebook.

#### Import Tweets into MongoDB

Install mongo locally with brew:    
```
brew update    
brew install mongodb
```
After downloading mongo, we want to create a place for mongo data files to live.  Run:    
```
sudo mkdir -p /data/db
```
Make sure that the /data/db directory has the right permissions:

```
# replace <your_username> with your Mac username; you will be prompted for your password
sudo chown <your_username> /data/db
```

(Your username is just your Mac username. You can double-check it by running `whoami` in the terminal.)


Run mongo daemon:
```
mongod
```

(To work with mongo directly, run `mongo` in a separate terminal.)

Once the mongo daemon is running, we can create a **database** "example" with a **collection** "datascience" as below. Note that the database and collection only exist once we insert tweets into them, since mongo is lazy by design.

In [None]:
#!pip install pymongo

In [None]:
import json
from pymongo import MongoClient


client = MongoClient(port=27017)
db = client.example
tweets = db.datascience

In [None]:
for tweet in results:
    data={}
    data['tweet']=tweet.text.encode('utf-8') 
    data['datetime']=tweet.created_at
    tweets.insert_one(data)

In [None]:
tweets.find_one()

In [None]:
tweets

In [None]:
# Collection.count() was deprecated in PyMongo 3.7 and removed in 4.0
tweets.count_documents({})

Finally, here is a quick example that pulls it all together into a live data acquisition feed and store. Unless it hits an error, it runs continually and grabs new tweets as they are posted, so let's leave it running for about a minute and see what we get. You need mongod still running for this.

If you want to use this for a project, you'll want to add some robustness to the stream listener - check out the tweepy doc for more info.
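
One piece of that robustness is reconnecting when the stream drops. Here is a minimal sketch of retrying with exponential backoff, using only the standard library. `run_with_backoff` and `flaky_connect` are hypothetical names invented for this example, and the delays and retry count are arbitrary choices, not values recommended by tweepy or Twitter.

```python
import time

def run_with_backoff(connect, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Call connect(); on failure wait base_delay * 2**attempt and retry.

    Re-raises the last exception once max_retries attempts have failed.
    """
    for attempt in range(max_retries):
        try:
            return connect()
        except Exception as error:
            if attempt == max_retries - 1:
                raise
            delay = base_delay * 2 ** attempt
            print("attempt {} failed ({}); retrying in {}s".format(
                attempt + 1, error, delay))
            sleep(delay)

# Stand-in for a connection that fails twice before succeeding.
calls = {"count": 0}

def flaky_connect():
    calls["count"] += 1
    if calls["count"] < 3:
        raise IOError("connection dropped")
    return "connected"

waits = []  # record the delays instead of actually sleeping
result = run_with_backoff(flaky_connect, sleep=waits.append)
print(result, waits)
```

In a real pipeline the callable would wrap the call that opens the stream, and the default `time.sleep` would be left in place so the waits actually happen.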

In [None]:
client = MongoClient(port=27017)
db = client.example_live
livetweets = db.datascience

livetweets.drop() # only if we want to start from scratch

class StreamListener(tweepy.StreamListener):

    def on_status(self, status):
        
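        # skip retweets: a retweet carries a 'retweeted_status' attribute
        # (status.retweeted only says whether *we* retweeted the tweet)
        if hasattr(status, 'retweeted_status'):
            return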
    
        # store tweet and creation date
        data = {}
        data['tweet'] = status.text
        data['datetime'] = status.created_at

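        # insert into db
        try:
            livetweets.insert_one(data)
        except Exception as error:
            # report the failure instead of silently swallowing it
            print('insert failed:', error)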

stream_listener = StreamListener()
stream = tweepy.Stream(auth=api.auth, listener=stream_listener)
stream.filter(track=['data science'])

In [None]:
# Collection.count() was deprecated in PyMongo 3.7 and removed in 4.0
livetweets.count_documents({})

In [None]:
livetweets.find_one()

Nice!