# Twitter API

## Why Twitter?


    
1. Copy the file `dot.twitter_config` to your home directory with the following command. The copy is named `.twitter_config`: the `dot` prefix exists only in the course repo so that the file is not hidden there (on Unix-like systems, names starting with `.` are hidden).
    ```console
    cp dot.twitter_config $HOME/.twitter_config
    ```  
2. Create a Twitter account if you don't have one. You may need to add a cell phone number to your account before Twitter lets you create an app. If your account is new, write a couple of test tweets. Then navigate to [apps.twitter.com](https://apps.twitter.com/) and sign in.
3. Select `Create New App`. Enter a name, a description, and a website (you can use http://thisismetis.com if you don't have one). Accept the developer agreement and select `Create your Twitter application`.
4. Navigate to the `Keys and Access Tokens` tab and select `Create my access token`. Copy the following values into the file `.twitter_config` in your home directory:
    * Consumer Key
    * Consumer Secret
    * Access Token
    * Access Token Secret
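
The notebook below reads `~/.twitter_config` with `ast.literal_eval` and looks up four keys, so the file is presumably a single Python dictionary literal. The sketch below shows that assumed layout (the placeholder values and the helper name `missing_keys` are hypothetical) plus a quick check that all four keys are present before you continue.

```python
import ast

# Assumed layout of ~/.twitter_config: one Python dict literal whose keys
# match the names the notebook reads later. Values are placeholders.
EXAMPLE_CONFIG = """{
    "consumer_key": "YOUR_CONSUMER_KEY",
    "consumer_secret": "YOUR_CONSUMER_SECRET",
    "access_token": "YOUR_ACCESS_TOKEN",
    "access_token_secret": "YOUR_ACCESS_TOKEN_SECRET",
}"""

REQUIRED_KEYS = {"consumer_key", "consumer_secret",
                 "access_token", "access_token_secret"}


def missing_keys(config_text):
    """Parse the config text and return the required keys it lacks."""
    config = ast.literal_eval(config_text)
    if not isinstance(config, dict):
        raise ValueError("~/.twitter_config should contain a dict literal")
    return REQUIRED_KEYS - set(config)


print(missing_keys(EXAMPLE_CONFIG))  # set()
```

To check your real file, pass in the text of `os.path.expanduser("~/.twitter_config")` after reading it; an empty set means every key is there.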


    

## REST API vs. Streaming API


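REST:

- Request/response queries (for example a user's timeline or a search), authenticated with OAuth
- Access to past ("historical") tweets, as far back as each endpoint allows

STREAM:

- Essentially one long-running request, authenticated with OAuth, that the client keeps open
- Delivers a real-time stream of tweets as they are posted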
       

### REST API

On your laptop, install `requests-oauthlib`, which adds OAuth 1 support to `requests` (conda installs `requests` along with it as a dependency).

```console
conda install -y -c conda-forge requests-oauthlib 
```

```python
from __future__ import print_function
import requests
from requests_oauthlib import OAuth1

import pandas as pd
import os
import ast
```

Load your credentials from the configuration file you just created and build an `OAuth1` object that `requests` uses to sign each call.

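```python
# Read the credentials dict from ~/.twitter_config.
# ast.literal_eval only accepts Python literals, so it is safer than eval().
config_path = os.path.expanduser("~/.twitter_config")
with open(config_path, "r") as f:
    config = ast.literal_eval(f.read())

# OAuth1 signs every request made with auth=oauth.
oauth = OAuth1(config["consumer_key"],
               config["consumer_secret"],
               config["access_token"],
               config["access_token_secret"])
```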

Using this `oauth` object, request your most recent tweets from the `statuses/user_timeline` endpoint and list the fields each tweet contains.

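```python
response = requests.get("https://api.twitter.com/1.1/statuses/user_timeline.json",
                        auth=oauth)
response.raise_for_status()  # fail here on 401/429 etc. instead of later

tweets = response.json()  # a list of tweet dicts, most recent first

# Print the field names available on a tweet.
for key in tweets[0].keys():
    print(key)
```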

```
created_at
id
id_str
text
truncated
entities
source
in_reply_to_status_id
in_reply_to_status_id_str
in_reply_to_user_id
in_reply_to_user_id_str
in_reply_to_screen_name
user
geo
coordinates
place
contributors
is_quote_status
retweet_count
favorite_count
favorited
retweeted
lang
```


```python
for tweet in tweets[:5]:
    print(tweet['text'])
```

```
@prb_data Thaddeus and I are curious where your profile pic is taken at:D
Fuck hackers. im making a new account. Fuck.
```

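The `created_at` field listed above arrives as a string, not a datetime. In Twitter's v1.1 payloads it looks like `Wed Oct 10 20:19:24 +0000 2018`; the sketch below assumes that format and converts it with `datetime.strptime`. The sample tweet dict and the helper name `parse_created_at` are hypothetical.

```python
from datetime import datetime

# Assumed v1.1 timestamp format, e.g. "Wed Oct 10 20:19:24 +0000 2018".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(tweet):
    """Return the tweet's created_at string as a timezone-aware datetime."""
    return datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT)


# Hypothetical tweet dict containing only the fields this sketch needs.
sample = {"id": 1, "created_at": "Wed Oct 10 20:19:24 +0000 2018", "text": "hello"}
when = parse_created_at(sample)
print(when.isoformat())  # 2018-10-10T20:19:24+00:00
```

On the real response you could apply it to every tweet, e.g. `[parse_created_at(t) for t in tweets]`.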

Search for tweets matching the query "data science" (`count` limits each page to two tweets).

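```python
from pprint import pprint

parameters = {"q": "data science", "count": 2}  # count = tweets per page
response = requests.get("https://api.twitter.com/1.1/search/tweets.json",
                        params=parameters,
                        auth=oauth)

# search_metadata describes the query and how to fetch the next page.
pprint(response.json()['search_metadata'])
# pprint(response.json())  # uncomment to see the full response
```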

```
{'completed_in': 0.031,
 'count': 2,
 'max_id': 866834597408985088,
 'max_id_str': '866834597408985088',
 'next_results': '?max_id=866834405473501183&q=data%20science&count=2&include_entities=1',
 'query': 'data+science',
 'refresh_url': '?since_id=866834597408985088&q=data%20science&include_entities=1',
 'since_id': 0,
 'since_id_str': '0'}
```

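The `next_results` value is a ready-made query string for the following page. If you would rather pass parameters through `requests.get(..., params=...)` than concatenate strings onto a URL, the standard library can turn it back into a dict. This sketch (the helper name `next_results_to_params` is hypothetical) uses the `next_results` string from the output above.

```python
from urllib.parse import parse_qsl


def next_results_to_params(next_results):
    """Convert a '?key=value&...' query string into a dict of parameters."""
    # parse_qsl decodes percent-escapes, so %20 becomes a space.
    return dict(parse_qsl(next_results.lstrip("?")))


next_results = "?max_id=866834405473501183&q=data%20science&count=2&include_entities=1"
print(next_results_to_params(next_results))
# {'max_id': '866834405473501183', 'q': 'data science', 'count': '2', 'include_entities': '1'}
```

The resulting dict could then be passed as `params=` together with `auth=oauth` to the same search URL.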

```python
tweets = response.json()['statuses']

print('PAGE 1')
for tweet in tweets[:1]:
    print(tweet['id'], tweet['text'])
```

```
PAGE 1
866834597408985088 RT @Intellipaat: What is #DataScience? Read more to know what is Data Science all about!! https://t.co/zF0sAvDBUs
```


Get the next page of tweets. The `next_results` field of `search_metadata` holds the query string for that page.

```python
response.json()['search_metadata']
```

```
{'completed_in': 0.02,
 'count': 2,
 'max_id': 866834405473501183,
 'max_id_str': '866834405473501183',
 'next_results': '?max_id=866834031832440832&q=data%20science&count=2&include_entities=1',
 'query': 'data+science',
 'refresh_url': '?since_id=866834405473501183&q=data%20science&include_entities=1',
 'since_id': 0,
 'since_id_str': '0'}
```

```python
search_url = "https://api.twitter.com/1.1/search/tweets.json"
# next_results already carries max_id, q, count, etc., so append it to the base URL.
next_page_url = search_url + response.json()['search_metadata']['next_results']
response = requests.get(next_page_url, auth=oauth)

print('PAGE 2')
for tweet in response.json()['statuses'][:5]:
    print(tweet['text'])
```

```
PAGE 2
Cuánto pagarías por un periódico con las noticias de mañana? Aún no lo hacemos pero estamos cerca, DATA SCIENCE el… https://t.co/VXRd3rtX4S
RT @passionatechica: Why did .@therriaultphd - Former Director of Data Science for the DNC (2014-2016) delete a tweet mocking #SethRich ???…
```

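Repeating the step above by hand gets tedious. One way to follow `next_results` automatically is a small generator. To keep it testable offline, the sketch takes a `fetch_json(url)` callable as a parameter (a hypothetical name; in this notebook it would wrap `requests.get(url, auth=oauth).json()`), and it assumes the API omits `next_results` once there are no further pages.

```python
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def iter_search_pages(fetch_json, first_query, max_pages=5):
    """Yield the 'statuses' list of each search page, following next_results.

    fetch_json: callable taking a full URL and returning the decoded JSON dict.
    first_query: query string for page 1, e.g. "?q=data%20science&count=2".
    """
    query = first_query
    for _ in range(max_pages):
        page = fetch_json(SEARCH_URL + query)
        yield page.get("statuses", [])
        # Assumption: next_results is missing or empty on the last page.
        query = page.get("search_metadata", {}).get("next_results")
        if not query:
            break


# Offline demo: two fake pages stand in for real API responses.
fake_pages = {
    SEARCH_URL + "?q=x": {"statuses": [{"id": 3}, {"id": 2}],
                          "search_metadata": {"next_results": "?max_id=1&q=x"}},
    SEARCH_URL + "?max_id=1&q=x": {"statuses": [{"id": 1}],
                                   "search_metadata": {}},
}
pages = list(iter_search_pages(fake_pages.__getitem__, "?q=x"))
print([[t["id"] for t in page] for page in pages])  # [[3, 2], [1]]
```

Against the live API, a call might look like `iter_search_pages(lambda url: requests.get(url, auth=oauth).json(), "?q=data%20science&count=2", max_pages=3)`. The `max_pages` cap keeps a loose query from burning through your rate limit.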

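### Tweepy

Tweepy is a Python client that wraps both Twitter's REST API and its Streaming API. The examples below use Tweepy with the REST search endpoint, where its `Cursor` handles pagination for you. Install tweepy on your laptop with the following command: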
```console
conda install -y -c conda-forge tweepy
```

```python
import tweepy

# The same four credentials as above, in Tweepy's OAuth handler.
auth = tweepy.OAuthHandler(config["consumer_key"],
                           config["consumer_secret"])
auth.set_access_token(config["access_token"],
                      config["access_token_secret"])

api = tweepy.API(auth)
```

```python
max_tweets = 10

# tweepy.Cursor handles pagination; .items(n) stops after n tweets.
# (In Tweepy 4.x, api.search was renamed api.search_tweets.)
for tweet in tweepy.Cursor(api.search, q="word2vec").items(max_tweets):
    print(tweet.text)
```

```
RT @DataJunkie: After watching @RichardSocher’s lecture on word2vec, I finally see that word2vec is not just a neural net version of LSA. C…
RT @DataJunkie: After watching @RichardSocher’s lecture on word2vec, I finally see that word2vec is not just a neural net version of LSA. C…
After watching @RichardSocher’s lecture on word2vec, I finally see that word2vec is not just a neural net version of LSA. Catching up.
RT @ledell: Awesome to see @juliasilge from @StackOverflow using @h2oai's #Word2Vec on SO comments!  

w2v #rstats demo here: https://t.co/…
Awesome to see @juliasilge from @StackOverflow using @h2oai's #Word2Vec on SO comments!  

w2v #rstats demo here:… https://t.co/TvdDiFTNK9
RT @juliasilge: I trained a word2vec model on @StackOverflow comments. 😮 #rstats (attn: @ledell) https://t.co/8h5L1Bwf4H
RT @juliasilge: I trained a word2vec model on @StackOverflow comments. 😮 #rstats (attn: @ledell) https://t.co/8h5L1Bwf4H
RT @juliasilge: I trained a word2vec model on @StackOverflow 
```

```python
# Collect the first 10 matching tweets (Tweepy Status objects) in a list.
results = list(tweepy.Cursor(api.search, q="word2vec").items(10))
```

```python
results[0].text
```

```
'RT @DataJunkie: After watching @RichardSocher’s lecture on word2vec, I finally see that word2vec is not just a neural net version of LSA. C…'
```

### Import tweets into Pandas

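```python
def structure_results(results):
    """Collect the id, text, timestamp, and place of each tweet into a DataFrame."""
    id_list = [tweet.id for tweet in results]
    data = pd.DataFrame(id_list, columns=['id'])

    # On Python 3, .encode('utf-8') turns the text into bytes, which is why
    # the output below shows b'...'; drop it to keep the text as str.
    data["text"]     = [tweet.text.encode('utf-8') for tweet in results]
    data["datetime"] = [tweet.created_at for tweet in results]
    data["Location"] = [tweet.place for tweet in results]

    return data
```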

```python
data = structure_results(results)
data
```

| | id | text | datetime | Location |
|---|---|---|---|---|
| 0 | 866830626699132928 | `b'RT @DataJunkie: After watching @RichardSoche...` | 2017-05-23 01:38:20 | |
| 1 | 866830057204965378 | `b'RT @DataJunkie: After watching @RichardSoche...` | 2017-05-23 01:36:05 | |
| 2 | 866829417753923584 | `b'After watching @RichardSocher\xe2\x80\x99s l...` | 2017-05-23 01:33:32 | |
| 3 | 866824373239296000 | `b"RT @ledell: Awesome to see @juliasilge from ...` | 2017-05-23 01:13:30 | |
| 4 | 866824368806010880 | `b"Awesome to see @juliasilge from @StackOverfl...` | 2017-05-23 01:13:28 | |
| 5 | 866823990597349376 | `b'RT @juliasilge: I trained a word2vec model o...` | 2017-05-23 01:11:58 | |
| 6 | 866823787580334081 | `b'RT @juliasilge: I trained a word2vec model o...` | 2017-05-23 01:11:10 | |
| 7 | 866821852143865856 | `b'RT @juliasilge: I trained a word2vec model o...` | 2017-05-23 01:03:28 | |
| 8 | 866821848872308737 | `b'I trained a word2vec model on @StackOverflow...` | 2017-05-23 01:03:28 | |
| 9 | 866735437796311040 | `b'Funny visualization of word2vec https://t.co...` | 2017-05-22 19:20:06 | |

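`structure_results` builds the frame column by column, looping over `results` once per column. An alternative sketch builds one dict per tweet and hands the whole list to `pd.DataFrame` in a single call; it also keeps `text` as a `str` (no `.encode`), so Python 3 shows plain text instead of `b'...'`. The helper name `tweets_to_records` is hypothetical, and the `SimpleNamespace` objects are stand-ins for Tweepy's `Status` objects that expose only the attributes the original function reads.

```python
from datetime import datetime
from types import SimpleNamespace


def tweets_to_records(results):
    """Turn Tweepy-style status objects into a list of plain dicts."""
    return [
        {
            "id": tweet.id,
            "text": tweet.text,  # kept as str, no .encode('utf-8')
            "datetime": tweet.created_at,
            "Location": tweet.place,
        }
        for tweet in results
    ]


# Hypothetical stand-ins exposing only the attributes read above.
fake_results = [
    SimpleNamespace(id=2, text="first tweet",
                    created_at=datetime(2017, 5, 23, 1, 38, 20), place=None),
    SimpleNamespace(id=1, text="second tweet",
                    created_at=datetime(2017, 5, 22, 19, 20, 6), place=None),
]
records = tweets_to_records(fake_results)
print([r["id"] for r in records])  # [2, 1]
```

With real results, `pd.DataFrame(tweets_to_records(results))` produces the same four columns, taking column names from the dict keys.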

### Import tweets into MongoDB

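Make sure the SSH tunnel to MongoDB that you started earlier this morning is still running. If not, start it with the command below, which forwards local port 12345 to MongoDB's default port 27017 on the host `myaws`: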

```console
ssh -NL 12345:localhost:27017 myaws
```
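
A quick way to confirm the tunnel is up before creating a `MongoClient` is to check that something accepts TCP connections on local port 12345. This sketch (the helper name `port_is_open` is hypothetical) only tests the port; it does not confirm that MongoDB itself is answering at the other end.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat the tunnel as down.
        return False


print(port_is_open("localhost", 12345))
```

If this prints `False`, restart the `ssh -NL` command above before running the next cell.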

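```python
from pymongo import MongoClient

# Connect through the SSH tunnel: local port 12345 -> MongoDB on myaws.
client = MongoClient(port=12345)
db = client.legislation
tweets = db.news  # collection that will hold the tweets
```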

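```python
for tweet in results:
    # One document per tweet. On Python 3, .encode('utf-8') stores the text
    # as bytes (BSON binary), hence the b'...' below; drop it to store a string.
    doc = {}
    doc['tweet'] = tweet.text.encode('utf-8')
    doc['datetime'] = tweet.created_at
    tweets.insert_one(doc)
```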

```python
tweets.find_one()
```

```
{'_id': ObjectId('59239ab0c79810573df709a4'),
 'datetime': datetime.datetime(2017, 5, 23, 1, 38, 20),
 'tweet': b'RT @DataJunkie: After watching @RichardSocher\xe2\x80\x99s lecture on word2vec, I finally see that word2vec is not just a neural net version of LSA. C\xe2\x80\xa6'}
```
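
Each run of the insert loop adds new documents, so re-running that cell stores the same tweets twice. One option (an assumption about how you might want to deduplicate, not part of the original exercise) is to use the tweet's own id as MongoDB's `_id` and upsert. The hypothetical helper below only builds the document; with PyMongo you would then call `tweets.replace_one({"_id": doc["_id"]}, doc, upsert=True)` for each one. The `SimpleNamespace` object is a stand-in for a Tweepy `Status`.

```python
from datetime import datetime
from types import SimpleNamespace


def tweet_to_document(tweet):
    """Build a MongoDB document keyed by the tweet id.

    Using the tweet id as _id means an upsert overwrites an existing copy
    instead of inserting a duplicate.
    """
    return {
        "_id": tweet.id,
        "tweet": tweet.text,  # stored as a string, not bytes
        "datetime": tweet.created_at,
    }


# Hypothetical stand-in for a tweepy Status object.
status = SimpleNamespace(id=866830626699132928, text="example tweet",
                         created_at=datetime(2017, 5, 23, 1, 38, 20))
doc = tweet_to_document(status)
print(doc["_id"])  # 866830626699132928
```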