Let us start by inspecting Twitter's robots.txt file: https://twitter.com/robots.txt

The **robots exclusion standard** (also known as the robots exclusion protocol, or simply robots.txt) is a standard used by websites to communicate with web crawlers and other web robots. The standard specifies how to tell a web robot which areas of the website should not be processed or scanned.
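Python's standard library can interpret these rules with `urllib.robotparser`. The minimal sketch below parses a small, made-up robots.txt from a string (it is *not* Twitter's actual file), so it runs offline. For a live site you would instead call `set_url()` followed by `read()`. The helper name `build_parser` is our own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only (not Twitter's actual file)
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Make sure a fetch time is recorded: can_fetch() returns False for
    # every URL while no fetch time is set
    parser.modified()
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)
print(parser.can_fetch("MyCrawler", "https://example.com/search?q=python"))  # False
print(parser.can_fetch("MyCrawler", "https://example.com/about"))            # True
```

The rules are checked in order, so `/search...` hits the `Disallow` line first, while every other path falls through to `Allow: /`.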

## Twitter-side preparation ##

Twitter allows you to access a small part of the complete Twitter stream through its Application Programming Interface (API). However, this stream is only accessible through authorized requests to the Twitter API. This means that we need to register our application with Twitter, which will provide us with the necessary access keys and tokens.

### What is an API? ###
... an application programming interface (API) is a set of subroutine definitions, communication protocols, and tools for building software. In general terms, it is a set of clearly defined methods of communication between various components. A good API makes it easier to develop a computer program by providing all the building blocks, which are then put together by the programmer. An API may be for a web-based system, operating system, database system, computer hardware, or software library. An API specification can take many forms, but often includes specifications for routines, data structures, object classes, variables, or remote calls.
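For a web API such as Twitter's, the "remote calls" in this definition boil down to HTTP requests sent to documented endpoint URLs, with the request options passed as query parameters. As a minimal illustration, the sketch below only *builds* such a request URL with the standard library. It assumes the legacy v1.1 standard search endpoint, and the helper `build_search_url` is hypothetical. Nothing is sent, because a real request must also be signed with the OAuth credentials obtained in the next steps.

```python
from urllib.parse import urlencode

# Assumption: the legacy Twitter REST API v1.1 standard search endpoint
SEARCH_ENDPOINT = "https://api.twitter.com/1.1/search/tweets.json"


def build_search_url(query: str, count: int = 10, lang: str = "en") -> str:
    """Build (but do not send) a search request URL from query parameters."""
    params = {"q": query, "count": count, "lang": lang}
    # urlencode percent-encodes special characters such as '#' (spaces become '+')
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


print(build_search_url("#python"))
# https://api.twitter.com/1.1/search/tweets.json?q=%23python&count=10&lang=en
```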

### Create a Twitter account ###
Only registered Twitter users can create applications. Our first step, therefore, is to create a Twitter account. Please visit the [Twitter](https://twitter.com/) website and, if you do not have an account yet, create one.

### Register an application ###
To access Twitter data programmatically, we need to create an application that interacts with the Twitter API. To create this application, first visit [https://apps.twitter.com/](https://apps.twitter.com/), log in to Twitter (if you are not already logged in), and click the button labelled "Create New App".

NB: from August 16th, 2018, you have to *apply* for a developer account.

### Save credentials to a JSON file ###

In [None]:
import json

# Enter your keys/secrets as strings in the following fields
credentials = {}  
credentials["CONSUMER_KEY"] = ""
credentials["CONSUMER_SECRET"] = ""
credentials["ACCESS_TOKEN"] = ""
credentials["ACCESS_SECRET"] = ""

# Save the credentials object to file
with open("twitter_credentials.json", "w") as file:  
    json.dump(credentials, file)

### Read credentials from a JSON file ###

In [None]:
import json

# Load credentials from json file
with open("twitter_credentials.json", "r") as file:
    creds = json.load(file)
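Because the credentials file is first written with empty strings as placeholders, a forgotten key only shows up later as a confusing authentication error. A small helper can load the file and fail early with a clear message. The name `load_credentials` is our own (hypothetical), and the usage below writes dummy placeholder values, not real keys, to a temporary directory.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")


def load_credentials(path):
    """Load credentials from JSON and fail early if any key is missing or empty."""
    with open(path, "r") as file:
        creds = json.load(file)
    missing = [key for key in REQUIRED_KEYS if not creds.get(key)]
    if missing:
        raise ValueError(f"Missing or empty credentials: {', '.join(missing)}")
    return creds


# Usage with dummy placeholder values (not real keys)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "twitter_credentials.json")
    with open(path, "w") as file:
        json.dump({"CONSUMER_KEY": "abc", "CONSUMER_SECRET": "",
                   "ACCESS_TOKEN": "def", "ACCESS_SECRET": "ghi"}, file)
    try:
        load_credentials(path)
    except ValueError as err:
        print(err)  # Missing or empty credentials: CONSUMER_SECRET
```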


## API wrappers: Tweepy and Twython ##

### Tweepy module ###
We need to install the third-party Python package [Tweepy](http://docs.tweepy.org/en/v3.5.0/) (e.g. with `pip install tweepy`), which provides a convenient Python interface to the Twitter API. The linked documentation is for Tweepy 3.5.

First we authenticate with OAuth and create the API wrapper object:

In [None]:
# Authenticate with OAuth 1.0a (user context) and create the API wrapper
import tweepy

authentication = tweepy.OAuthHandler(creds['CONSUMER_KEY'], creds['CONSUMER_SECRET'])
authentication.set_access_token(creds['ACCESS_TOKEN'], creds['ACCESS_SECRET'])
# wait_on_rate_limit=True makes Tweepy wait until the rate limit replenishes
api = tweepy.API(authentication, wait_on_rate_limit=True)

In [None]:
# Print the 10 most recent tweets on the authenticated user's home timeline
for tweet in tweepy.Cursor(api.home_timeline).items(10):
    print()
    print(tweet.text)

In [None]:
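# api.friends lists accounts you follow (api.followers lists your followers;
# Tweepy 4.x renames these to api.get_friends / api.get_followers)
for friend in tweepy.Cursor(api.friends).items():
    print(friend.name)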
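Twitter returns long result lists such as timelines or friend lists in pages, and `tweepy.Cursor` hides that pagination: `.items(n)` keeps requesting pages until it has yielded `n` items, or all of them when called without an argument. The sketch below mimics the idea in plain Python. `iterate_items` and `fake_fetch_page` are hypothetical stand-ins, not Tweepy's implementation, and the fake endpoint serves made-up names so the code runs offline.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page-fetcher signature: takes a cursor, returns (items, next_cursor)
PageFetcher = Callable[[Optional[int]], Tuple[List[str], Optional[int]]]


def iterate_items(fetch_page: PageFetcher, limit: Optional[int] = None) -> Iterator[str]:
    """Yield items page by page until the pages run out or `limit` is reached."""
    cursor: Optional[int] = None
    yielded = 0
    while True:
        items, next_cursor = fetch_page(cursor)
        for item in items:
            if limit is not None and yielded >= limit:
                return
            yield item
            yielded += 1
        if next_cursor is None:  # no more pages
            return
        cursor = next_cursor


# Fake "API" serving 7 names in pages of 3 (stands in for a real endpoint)
NAMES = [f"user{i}" for i in range(7)]


def fake_fetch_page(cursor: Optional[int]) -> Tuple[List[str], Optional[int]]:
    start = cursor or 0
    end = start + 3
    next_cursor = end if end < len(NAMES) else None
    return NAMES[start:end], next_cursor


print(list(iterate_items(fake_fetch_page, limit=5)))
# ['user0', 'user1', 'user2', 'user3', 'user4']
```

Because pages are fetched lazily inside the generator, a small limit such as `items(10)` costs only as many requests as needed to reach it.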

### Twython module ###
[Twython](https://twython.readthedocs.io/en/latest/) is another Python wrapper for the Twitter API; besides the REST endpoints, it also supports the Streaming API, which we use further below.

In [None]:
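# Import Twython class
from twython import Twython

# Instantiate an object, authenticating with all four credentials (OAuth 1.0a user context)
python_tweets = Twython(creds['CONSUMER_KEY'], creds['CONSUMER_SECRET'],
                        creds['ACCESS_TOKEN'], creds['ACCESS_SECRET'])

# Create our query: up to 10 popular English-language tweets matching "Trump"
query = {'q': 'Trump',
         'result_type': 'popular',
         'count': 10,
         'lang': 'en',
         }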

In [None]:
import pandas as pd

# Search tweets and collect the fields we are interested in
dict_ = {'user': [], 'date': [], 'text': [], 'favorite_count': []}
for status in python_tweets.search(**query)['statuses']:
    dict_['user'].append(status['user']['screen_name'])
    dict_['date'].append(status['created_at'])
    dict_['text'].append(status['text'])
    dict_['favorite_count'].append(status['favorite_count'])

# Structure data in a pandas DataFrame for easier manipulation
df = pd.DataFrame(dict_)
df.sort_values(by='favorite_count', inplace=True, ascending=False)
df.to_csv("output.csv", index=False)  # don't write the row index as an extra column

df.head(10)

## Streaming Tweets with Twython ##

In [None]:
from twython import TwythonStreamer
import csv

# Keep only the fields we need from a raw tweet
def process_tweet(tweet):
    d = {}
    d['hashtags'] = [hashtag['text'] for hashtag in tweet['entities']['hashtags']]
    d['text'] = tweet['text']
    d['user'] = tweet['user']['screen_name']
    d['user_loc'] = tweet['user']['location']
    return d


# Create a class that inherits from TwythonStreamer
class MyStreamer(TwythonStreamer):

    # Called for every message received from the stream
    def on_success(self, data):

        # Only collect tweets in English; non-tweet messages (e.g. limit
        # notices) have no 'lang' key, so .get() skips them safely
        if data.get('lang') == 'en':
            tweet_data = process_tweet(data)
            self.save_to_csv(tweet_data)

    # Called when the stream returns a non-200 status code
    # (headers=None keeps this compatible with Twython versions that pass headers)
    def on_error(self, status_code, data, headers=None):
        print(status_code, data)
        self.disconnect()

    # Append each tweet to the CSV file, in the column order
    # hashtags, text, user, user_loc
    def save_to_csv(self, tweet):
        with open('saved_tweets.csv', 'a', newline='', encoding='utf-8') as file:
            writer = csv.writer(file)
            writer.writerow(list(tweet.values()))
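Two fragilities in the streaming cell are worth knowing about: `process_tweet` raises `KeyError` whenever a field is absent, and `save_to_csv` relies on dictionary insertion order for its column order. Below is a minimal sketch of a more defensive variant. `process_tweet_safe`, `write_rows` and `FIELDNAMES` are hypothetical names, and `sample` is a made-up, heavily trimmed tweet containing only the fields used here. It writes to an in-memory buffer so it runs without the Twitter API; inside `save_to_csv` you would pass the opened file instead.

```python
import csv
import io
from typing import Any, Dict, Iterable

# Explicit column order for the CSV file (same fields as process_tweet)
FIELDNAMES = ["hashtags", "text", "user", "user_loc"]


def process_tweet_safe(tweet: Dict[str, Any]) -> Dict[str, Any]:
    """Like process_tweet, but tolerates missing keys instead of raising KeyError."""
    entities = tweet.get("entities", {})
    user = tweet.get("user", {})
    return {
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "text": tweet.get("text", ""),
        "user": user.get("screen_name", ""),
        "user_loc": user.get("location"),
    }


def write_rows(file_obj, rows: Iterable[Dict[str, Any]]) -> None:
    """Write dicts as CSV rows, always in FIELDNAMES order."""
    writer = csv.DictWriter(file_obj, fieldnames=FIELDNAMES)
    for row in rows:
        writer.writerow(row)


# Hypothetical, heavily trimmed tweet payload for illustration
sample = {
    "text": "Reading about #python and #data",
    "entities": {"hashtags": [{"text": "python"}, {"text": "data"}]},
    "user": {"screen_name": "example_user", "location": None},
}

buffer = io.StringIO()
write_rows(buffer, [process_tweet_safe(sample)])
print(buffer.getvalue())
```

`csv.DictWriter` looks each value up by field name, so the column order no longer depends on how the dictionary happened to be built; a `None` location is written as an empty field.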

In [None]:
# Instantiate from our streaming class
stream = MyStreamer(creds['CONSUMER_KEY'], creds['CONSUMER_SECRET'],
                    creds['ACCESS_TOKEN'], creds['ACCESS_SECRET'])
# Start the stream; this call blocks until the stream is disconnected
# (e.g. by on_error) or the kernel is interrupted
stream.statuses.filter(track='Trump')
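According to Twitter's streaming API documentation, the `track` parameter of `statuses/filter` is a comma-separated list of phrases: commas act as logical ORs, spaces within a phrase act as logical ANDs, and matching ignores case. The helper below (`matches_track` is a hypothetical name) is only a rough local approximation of that rule. It can be handy for checking which saved tweets a given `track` value would plausibly have selected, but it is not how Twitter implements matching.

```python
import string


def matches_track(text: str, track: str) -> bool:
    """Rough local approximation of the `track` rule: commas mean OR, spaces mean AND.

    Terms are compared case-insensitively after stripping surrounding punctuation,
    so "#Trump" and "Trump." both match the term "trump". Twitter's real matching
    also checks some entity fields (e.g. expanded URLs, mentioned screen names);
    this sketch looks at the tweet text only.
    """
    words = {w.strip(string.punctuation).lower() for w in text.split()}
    for phrase in track.split(","):
        terms = [t.strip(string.punctuation).lower() for t in phrase.split()]
        # A phrase matches only if every one of its terms occurs in the text
        if terms and all(term in words for term in terms):
            return True
    return False


print(matches_track("Breaking: #Trump speaks today", "Trump"))         # True
print(matches_track("The Twitter API is documented", "twitter api"))  # True
print(matches_track("Twitter news", "twitter api,python"))            # False
```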

## Analysis ##

In [None]:
import pandas as pd

# save_to_csv writes no header row; its columns are hashtags, text, user, user_loc
DF = pd.read_csv("saved_tweets.csv", header=None,
                 names=["hashtags", "content", "screen_name", "location"])
print(DF)
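Because `save_to_csv` passes each list of hashtags through `csv.writer`, the hashtags column ends up holding the list's string form, e.g. `['python', 'data']`. A natural first analysis is to count the most frequent hashtags. The sketch below (hypothetical helper `count_hashtags`) turns those strings back into lists with `ast.literal_eval` and tallies them with `collections.Counter`. Lowercasing, to treat hashtags case-insensitively, is our own choice, and the sample cells are made up; with the DataFrame above you could pass `DF["hashtags"]` instead.

```python
import ast
from collections import Counter
from typing import Iterable


def count_hashtags(cells: Iterable) -> Counter:
    """Count hashtags from CSV cells holding stringified lists like "['a', 'b']"."""
    counts: Counter = Counter()
    for cell in cells:
        if not isinstance(cell, str):  # e.g. NaN for an empty cell
            continue
        tags = ast.literal_eval(cell)  # safely turn "['a', 'b']" back into a list
        counts.update(tag.lower() for tag in tags)
    return counts


# Made-up cells for illustration
sample_cells = ["['Python', 'data']", "[]", "['python']"]
print(count_hashtags(sample_cells).most_common(2))  # [('python', 2), ('data', 1)]
```

`ast.literal_eval` only accepts Python literals, so unlike `eval` it cannot execute arbitrary code hidden in the CSV file.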