# Analysing Twitter Data with Python

Outline:
- Credentials
- Collecting Data
- Loading and Accessing Tweets
- Streaming API

The `tweepy` module can be used to collect Twitter data with the Streaming API.

In [17]:
# !pip install tweepy

In [41]:
from tweepy import OAuthHandler, API
# import tweepy

import json

## Credentials

First, we need to create a Twitter account, validate it, and then create a Twitter developer account. The developer account can be created on the [Twitter developer](https://developer.twitter.com/en/apps) page.

Second, we create an app to generate a _Consumer Key_, a _Consumer Secret_, an _Access Token_, and an _Access Token Secret_.

The steps to generate keys are as follows:
- Create a Twitter account and validate it with a phone number,
- Create a Twitter developer account,
- Create an app,
- Generate keys and tokens.

The app's API keys must be kept secure. In particular, do not commit API keys or access tokens to publicly accessible version control hosts such as GitHub or Bitbucket.

The Twitter credentials (keys and tokens) can be kept locally in a JSON file with the following format.

In [None]:
{"consumer_key":"API key",
 "consumer_secret":"API secret key",
 "access_token_key":"Access token",
 "access_token_secret":"Access token secret"
}

In [42]:
# Load Twitter app information
with open('twitter_cred.json','r') as file:
    twitter_cred = json.load(file)

In [43]:
consumer_key = twitter_cred['consumer_key']
consumer_secret = twitter_cred['consumer_secret']
access_token = twitter_cred['access_token_key']
access_token_secret = twitter_cred['access_token_secret']
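A missing or misspelled field in the credentials file only shows up later as a `KeyError` or a failed authentication. The sketch below is one possible way to check the file up front. The helper name `load_twitter_credentials` and the constant `REQUIRED_KEYS` are illustrative assumptions, not part of `tweepy`; the field names match the JSON format shown above.

```python
import json

# Field names taken from the credentials file format shown above.
REQUIRED_KEYS = (
    "consumer_key",
    "consumer_secret",
    "access_token_key",
    "access_token_secret",
)


def load_twitter_credentials(path):
    """Load the credentials JSON file and fail early if a field is missing or empty.

    Hypothetical helper for illustration; not part of tweepy.
    """
    with open(path, "r") as file:
        cred = json.load(file)

    # Collect every required field that is absent or has an empty value
    missing = [key for key in REQUIRED_KEYS if not cred.get(key)]
    if missing:
        raise ValueError("Missing credential fields: " + ", ".join(missing))
    return cred


# Usage, assuming the file exists next to the notebook:
# twitter_cred = load_twitter_credentials('twitter_cred.json')
```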

The `tweepy` library needs these keys and tokens to authenticate with Twitter.

In [44]:
# Consumer key authentication
auth = OAuthHandler(consumer_key, consumer_secret)

# Access key authentication
auth.set_access_token(access_token, access_token_secret)

# Set up the API with the authentication handler
api = API(auth)

We can print the username to see if our account is properly authenticated.

In [None]:
user = api.me()
print(user.name)

## Collecting Data

In [148]:
# most retweeted tweet
status = api.get_status('849813577770778624')

In [149]:
# Serialize the status to a JSON-formatted string
tweet_json = json.dumps(status._json, indent=2)
print(tweet_json)

{
  "created_at": "Thu Apr 06 02:38:40 +0000 2017",
  "id": 849813577770778624,
  "id_str": "849813577770778624",
  "text": "HELP ME PLEASE. A MAN NEEDS HIS NUGGS https://t.co/4SrfHmEMo3",
  "truncated": false,
  "entities": {
    "hashtags": [],
    "symbols": [],
    "user_mentions": [],
    "urls": [],
    "media": [
      {
        "id": 849813572351737856,
        "id_str": "849813572351737856",
        "indices": [
          38,
          61
        ],
        "media_url": "http://pbs.twimg.com/media/C8sk8QlUwAAR3qI.jpg",
        "media_url_https": "https://pbs.twimg.com/media/C8sk8QlUwAAR3qI.jpg",
        "url": "https://t.co/4SrfHmEMo3",
        "display_url": "pic.twitter.com/4SrfHmEMo3",
        "expanded_url": "https://twitter.com/carterjwm/status/849813577770778624/photo/1",
        "type": "photo",
        "sizes": {
          "small": {
            "w": 382,
            "h": 680,
            "resize": "fit"
          },
          "thumb": {
            "w": 150,
         

In [106]:
type(tweet_json), type(status)

(str, tweepy.models.Status)

In [82]:
status.text

'HELP ME PLEASE. A MAN NEEDS HIS NUGGS https://t.co/4SrfHmEMo3'

In [90]:
status.user.name

'Carter Wilkerson'

In [108]:
status.created_at

datetime.datetime(2017, 4, 6, 2, 38, 40)

In [144]:
# Get the available WOEID of a location 
#api.trends_available()

# Worldwide trends (WOEID=1)
trends_ww = api.trends_place(1)

# Name of the 4th trend in the worldwide list (index 3)
trends_ww[0]['trends'][3]['name']

'#SözleşmeliKadrosuzUykusuz'
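The trends response is a list containing one dictionary whose `'trends'` entry holds the individual trends. As a rough sketch, the trends could be ranked by their `tweet_volume` field, which may be `None` for some entries. The helper name `top_trends_by_volume` and the sample data are made up for illustration and only mimic the shape of the response.

```python
def top_trends_by_volume(trends_response, n=3):
    """Return the names of the n trends with the highest tweet_volume.

    Hypothetical helper; expects the same structure as trends_ww above.
    """
    trends = trends_response[0]["trends"]
    # tweet_volume can be None, so keep only trends that report a volume
    with_volume = [t for t in trends if t.get("tweet_volume") is not None]
    ranked = sorted(with_volume, key=lambda t: t["tweet_volume"], reverse=True)
    return [t["name"] for t in ranked[:n]]


# Made-up sample that mimics the response structure (not real trend data)
sample_response = [
    {
        "trends": [
            {"name": "#example_a", "tweet_volume": 12000},
            {"name": "#example_b", "tweet_volume": None},
            {"name": "#example_c", "tweet_volume": 54000},
        ]
    }
]

print(top_trends_by_volume(sample_response, n=2))
```

With a real response, the call would be `top_trends_by_volume(trends_ww)`.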

## Loading and Accessing Tweets

Tweets are collected from the Streaming API in **JSON** format, so we need to convert the data into a Python data structure.

In [151]:
# Convert from JSON to Python object
tweet = json.loads(tweet_json)

# Print tweet text
print(tweet['text'])

# Print tweet id
print(tweet['id'])

HELP ME PLEASE. A MAN NEEDS HIS NUGGS https://t.co/4SrfHmEMo3
849813577770778624


In [158]:
# Print user handle
print(tweet['user']['screen_name'])

# Print user follower count
print(tweet['user']['followers_count'])

# Print user location
print(tweet['user']['location'])

# Print user description
print(tweet['user']['description'])

carterjwm
100261
Reno, NV - San Diego, CA
I kinda like chicken nuggets
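In the parsed dictionary, `created_at` is a plain string, whereas `status.created_at` above was a `datetime`. The following sketch shows one way to pull a few fields into a flat record and parse the timestamp. The helper name `summarize_tweet` and the constant `TWITTER_TIME_FORMAT` are assumptions for illustration; the sample values are the ones printed above.

```python
from datetime import datetime

# Format of the created_at string, e.g. "Thu Apr 06 02:38:40 +0000 2017"
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def summarize_tweet(tweet):
    """Flatten a tweet dictionary into a small record (hypothetical helper)."""
    user = tweet.get("user", {})
    return {
        "id": tweet.get("id"),
        "text": tweet.get("text"),
        # Convert the timestamp string into a timezone-aware datetime
        "created_at": datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT),
        "screen_name": user.get("screen_name"),
        "followers_count": user.get("followers_count"),
    }


# Reduced copy of the tweet shown above, so the sketch runs on its own
sample_tweet = {
    "created_at": "Thu Apr 06 02:38:40 +0000 2017",
    "id": 849813577770778624,
    "text": "HELP ME PLEASE. A MAN NEEDS HIS NUGGS https://t.co/4SrfHmEMo3",
    "user": {"screen_name": "carterjwm", "followers_count": 100261},
}

summary = summarize_tweet(sample_tweet)
print(summary["screen_name"], summary["created_at"].isoformat())
```

Using `.get` for the optional fields means a missing key yields `None` instead of raising a `KeyError`.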


## Streaming API

The Streaming API allows us to collect real-time Twitter data, either as a sample of all tweets or filtered by keywords.
>Using the streaming api has three steps.
 - Create a class inheriting from StreamListener
 - Using that class create a Stream object
 - Connect to the Twitter API using the Stream. [Source: tweepy](https://tweepy.readthedocs.io/en/v3.5.0/streaming_how_to.html)

The code below runs `SListener.py`, which defines the stream listener class. [Source](https://github.com/SocialDataAnalytics-Winter2018/lab04/blob/master/slistener.py)

In [52]:
%run SListener.py

Alternatively, we can load the file into the cell with the `%load` magic command. (`%load?` for more info)

In [25]:
# %load SListener.py
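The contents of `SListener.py` are not reproduced in this notebook. To give an idea of the first of the three steps, here is a minimal sketch of a listener that appends each incoming tweet to a file, one JSON object per line. The class name `FileListener`, its parameters, and the output file are assumptions and do not describe the linked script. `StreamListener` exists in tweepy 3.x only, so the sketch falls back to a plain base class when the import fails.

```python
import json

try:
    from tweepy import StreamListener  # available in tweepy 3.x
except ImportError:
    StreamListener = object  # fallback so the sketch runs without tweepy 3.x


class FileListener(StreamListener):
    """Hypothetical listener that stores tweets as line-delimited JSON."""

    def __init__(self, output_path, max_tweets=100):
        super().__init__()
        self.output_path = output_path
        self.max_tweets = max_tweets
        self.count = 0

    def on_data(self, raw_data):
        message = json.loads(raw_data)
        # Skip non-tweet messages such as delete or limit notices
        if "text" in message:
            with open(self.output_path, "a", encoding="utf-8") as out:
                out.write(json.dumps(message) + "\n")
            self.count += 1
        # Returning False tells tweepy to close the stream
        return self.count < self.max_tweets

    def on_error(self, status_code):
        # 420 signals rate limiting; disconnect instead of retrying
        if status_code == 420:
            return False
        return True
```

Such a listener would be passed to `Stream(auth, FileListener('tweets.json'))` in the same way as the `SListener` object; the file name here is only an example.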

In [53]:
from tweepy import Stream

# Instantiate the SListener object 
listen = SListener(api)

# Instantiate the Stream object
stream = Stream(auth, listen)

There are various streams available through Tweepy. We'll use _filter_ in this notebook.

In [None]:
# Set up words to track
keywords_to_track = ['#datascience', '#python']

# Begin collecting data
stream.filter(track=keywords_to_track)