# An Introduction to Social Media APIs
In this tutorial, we are going to learn how to make basic requests to the Facebook and Twitter APIs (application programming interfaces).

For each platform, we will build scripts that perform the same set of actions:

1. Print 100 to 200 posts to the screen (100 per request is the maximum for Facebook; Twitter's maximum is 200)

2. Save that data to a CSV file

3. Loop our script a few times to increase the number of posts we download

# Facebook
We will begin with Facebook. Before we write our first script, let's get acquainted with the Graph API Explorer console.

For this exercise, we are going to look at posts made from Donald Trump's Facebook page. The user handle for Trump's Facebook page is @DonaldTrump.

![Trump's Facebook handle](https://www.dropbox.com/s/y7smhoginpxa72t/trumpfbhandle.jpg?raw=1)

### 1. First, get to know the Facebook Graph API
* Visit the Facebook Graph API Explorer. This is where you can experiment with the API and see what you get back.
    * https://developers.facebook.com/tools/explorer/
* (If the Access Token bar is empty, click Get Token.)
* Enter the user handle of the Facebook account you want to look at, followed by **/posts** (e.g. **donaldtrump/posts**).

As you will see, the API returns three or four pieces of information for each post:

1. "created_time"
2. "message" (if present)
3. "story" (if present)
4. "id"
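Because "message" and "story" only appear on some posts, code that reads them should expect them to be missing. A minimal sketch, run against a hand-written sample shaped like the Explorer output (the IDs, timestamps and text are invented for illustration), that falls back from "message" to "story" using `dict.get`; the helper name `post_summary` is hypothetical:

```python
# Hypothetical sample shaped like the Graph API Explorer's /posts output;
# the IDs, timestamps and text are invented for illustration.
sample_response = {
    "data": [
        {"created_time": "2018-06-01T12:00:00+0000",
         "message": "Example post text",
         "id": "111_222"},
        {"created_time": "2018-05-31T09:30:00+0000",
         "story": "Example Page updated their cover photo.",
         "id": "111_333"},
        {"created_time": "2018-05-30T18:45:00+0000",
         "id": "111_444"},
    ]
}


def post_summary(post):
    """Return the post's message, else its story, else a placeholder."""
    # dict.get returns None instead of raising KeyError when a key is absent
    return post.get("message") or post.get("story") or "(no text)"


for post in sample_response["data"]:
    print(post["id"], "|", post_summary(post))
```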

*This information is useful, but there is a lot more available to us. We just have to tell the API what we want. Just a few of the additional fields are illustrated here:*

![Examples of Graph API syntax](https://www.dropbox.com/s/pmucwa2tm70i0e5/fbgraphnames.jpg?raw=1)

### 2. Add some additional fields to your search
* You can make your search more specific by adding more syntax. Take a look at the documentation about Facebook posts: the Fields table lists all of the pieces of information the API can return about an individual post. The syntax you need is in the Name column (e.g. the documentation tells us that 'from' returns "Information (name and id) about the Profile that created the Post.").
    * https://developers.facebook.com/docs/graph-api/reference/v3.0/post
* Update your search by adding **?fields=** followed by the syntax from the Name column of the Fields table, separated by commas, e.g.
    * donaldtrump/posts***?fields=created_time,from,message,link,shares,comments***
* Return to the Graph API Explorer console and do a new search containing additional fields.
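Typing the query string by hand gets fiddly as the field list grows. A small sketch, using only Python's standard `urllib.parse` module, that assembles the same kind of URL the Explorer builds; the function name `build_posts_url` is hypothetical, and the `v3.0` version string simply mirrors the one used in this tutorial:

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_posts_url(handle, fields, access_token, version="v3.0", limit=100):
    """Build a Graph API /posts URL with the fields, limit and token as query parameters."""
    query = urlencode({
        "fields": ",".join(fields),   # e.g. "created_time,from,message"
        "limit": limit,
        "access_token": access_token,
    })
    return "https://graph.facebook.com/%s/%s/posts?%s" % (version, handle, query)


url = build_posts_url("donaldtrump", ["created_time", "from", "message", "link"], "YOUR_TOKEN")
print(url)
```

`urlencode` percent-encodes the commas in the field list as `%2C`; standard URL decoding on the server should turn them back into commas.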

### 3. Writing our first script to interact with the Facebook Graph API: Print a user's last 100 posts to the screen
In this script, we are going to:

* Ask the Facebook Graph API for the last 100 posts, then...
   * Tell Python to go through each one and look for text in the message field, then...
        * Print the text in the message field to the screen (if it exists)
        
**TASKS:**

1) Copy the Access Token from the Graph API Explorer and paste it between the quote marks after **access_token** in the code below.

2) Enter the ID of the account you would like to look at between the speech marks after **publisher_id**, e.g. 'donaldtrump', 'nytimes', 'breitbart'

Run the code in the cell using SHIFT + ENTER.

```python
# Tell Python we want to use the Requests library (http://docs.python-requests.org/en/master/)
import requests

# Copy the Access Token from the Graph API Explorer console
access_token = ''
# Enter the Facebook handle of the page you want to scrape, e.g. 'nytimes' or 'Breitbart'
publisher_id = 'donaldtrump'

# Tell the API what fields you want it to return (copy everything after 'fields=' from the Graph API console)
# Possible fields for posts can be found here: https://developers.facebook.com/docs/graph-api/reference/v3.0/post/
post_fields = 'created_time,from,message,link,shares,comments'

# Construct the Facebook Graph API URL to be opened
graph_url = 'https://graph.facebook.com/v3.0/' + publisher_id + '/posts/?fields=%s&limit=100&access_token=%s' % (post_fields, access_token)
# Print the Graph API URL to the screen (not necessary - just useful for checking in a browser)
print(graph_url)

# Get the contents of the URL using Requests
r = requests.get(graph_url)
# Stop with an error if the API rejected the request (e.g. a missing or expired token)
r.raise_for_status()
# Extract the JSON from the Graph page
data = r.json()
# Identify the JSON pertaining to the ~100 Facebook posts returned by the API
posts = data['data']

# Go through each of the posts returned by the API
for post in posts:
    # For each post, display the text ('message') followed by a new line to make the text easier to read ('\n')
    try:
        post_text = post['message']
        print(post_text + '\n')
    except KeyError:
        # Not every post has a 'message' field
        print('No message \n')
```
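If the token is empty or has expired, the Graph API sends back an `error` object instead of `data`, and looking up `['data']` fails with a `KeyError` that doesn't say why. A hedged sketch of a helper (the name `extract_posts` is hypothetical) that surfaces the API's own error message instead; the sample error payload is invented, but follows the `error`/`message`/`type`/`code` shape of Graph API error responses:

```python
def extract_posts(payload):
    """Return the list of posts from a Graph API response, or raise a readable error."""
    if "error" in payload:
        error = payload["error"]
        # Surface the API's own explanation instead of a bare KeyError on 'data'
        raise RuntimeError("Graph API error %s (%s): %s" % (
            error.get("code"), error.get("type"), error.get("message")))
    return payload.get("data", [])


# Invented example of an error response, e.g. from an expired token
sample_error = {"error": {"message": "Error validating access token.",
                          "type": "OAuthException", "code": 190}}

try:
    extract_posts(sample_error)
except RuntimeError as exc:
    print(exc)
```

In the script above you could call `posts = extract_posts(r.json())` in place of looking up `['data']` directly.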

### ... BONUS: with a couple more lines of very similar code you can get the text of the comments
With a very slight alteration to the code at the bottom of the script above, we can change our script so that it prints the first batch of comments on each post instead. The process it goes through is:

* Ask the Facebook Graph API for the most recent 100 posts, then...
   * Tell Python to go through each one and look for comments, then...
        * Pull out the first page of comments that the API included with the post (if any exist), then...
            * Print the text in the comment message field to the screen
            
Again, paste your Access Token into the **access_token** variable.

Then run the code using SHIFT + ENTER.

```python
# Tell Python we want to use the Requests library
import requests

# Copy the Access Token from the Graph API Explorer console
access_token = ''
# Enter the Facebook handle of the page you want to scrape, e.g. 'nytimes' or 'Breitbart'
publisher_id = 'donaldtrump'

# Tell the API what fields you want it to return (copy everything after 'fields=' from the Graph API console)
# Possible fields for posts can be found here: https://developers.facebook.com/docs/graph-api/reference/v3.0/post/
post_fields = 'created_time,from,message,link,shares,comments'

# Construct the Facebook Graph API URL to be opened
graph_url = 'https://graph.facebook.com/v3.0/' + publisher_id + '/posts/?fields=%s&limit=100&access_token=%s' % (post_fields, access_token)
# Print the Graph API URL to the screen (not necessary - just useful for checking in a browser)
print(graph_url)

# Open and read the URL using Requests
r = requests.get(graph_url)
# Stop with an error if the API rejected the request (e.g. a missing or expired token)
r.raise_for_status()
# Extract the JSON from the Graph page
data = r.json()
# Identify the JSON pertaining to the ~100 Facebook posts returned by the API
posts = data['data']

# Go through each of the posts returned by the API
for post in posts:
    # THIS IS WHERE THE NEW CODE STARTS
    # Find the JSON for the comments (if any exist)
    try:
        # Extract the first set of comments and assign them to a variable named 'comments'
        comments = post['comments']['data']
    except KeyError:
        print('No comments \n')
        continue

    # Go through each of the comments and print its text
    for comment in comments:
        comment_text = comment.get('message', '')
        print(comment_text + '\n')
```
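Keeping track of which post each comment belongs to makes the output easier to analyse later. A minimal sketch, run against invented sample data rather than the live API, that pairs each comment's text with its parent post's ID; it assumes, as the script above does, that the embedded comments sit under `['comments']['data']`, and that a post without comments may have no `comments` key at all. The helper name `comment_rows` is hypothetical:

```python
# Hypothetical posts in the shape returned when 'comments' is among the requested fields;
# all IDs and text are invented.
posts = [
    {"id": "111_222", "message": "First post",
     "comments": {"data": [{"id": "c1", "message": "Great!"},
                           {"id": "c2", "message": "Agreed."}]}},
    {"id": "111_333", "message": "Second post"},  # no 'comments' key at all
]


def comment_rows(posts):
    """Yield (post_id, comment_text) pairs, skipping posts without comments."""
    for post in posts:
        # Default to an empty list when the post has no 'comments' key
        for comment in post.get("comments", {}).get("data", []):
            yield post["id"], comment.get("message", "")


rows = list(comment_rows(posts))
print(rows)
```

Rows in this shape can be passed straight to a CSV writer's `writerow`, one comment per line.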

### ...Now let's add a little extra code to our first script so the user's posts get saved to a CSV file
Viewing posts on screen is fine, but chances are you are going to want to download data from the API so you can perform some kind of analysis.

Python has a module that makes it really easy to save data to CSV files. These can then be viewed with spreadsheet software (Google Sheets, Excel, Numbers, etc.) or a text editor.
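The main reason to use the `csv` module rather than gluing fields together with commas is that it quotes any field containing a comma, a quote mark or a line break, all of which are common in social media posts. A small self-contained sketch using invented rows and an in-memory `io.StringIO` buffer in place of a real file:

```python
import csv
import io

# Invented rows whose messages would break a naive ",".join(...) approach
rows = [
    ("2018-06-01T12:00:00+0000", "https://example.com/a", "Hello, world"),
    ("2018-05-31T09:30:00+0000", "No link", 'He said "hi"\nthen left'),
]

# io.StringIO stands in for a real file so the output can be inspected directly
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["created_time", "link", "message"])  # header row
writer.writerows(rows)

print(buffer.getvalue())

# Reading the text back recovers the original fields intact
buffer.seek(0)
parsed = list(csv.reader(buffer))
```

When writing to a real file, opening it with `newline=''`, as the `csv` documentation recommends, keeps embedded line breaks intact and avoids blank rows on Windows.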

The code below builds upon our first script and saves the 'created_time' (timestamp), 'link' and 'message' of each post to a CSV file named **'facebook_posts.csv'**.

This file will be saved to the directory from which you are running this notebook, e.g. Desktop.

This script is going to:

* Create a file named 'facebook_posts.csv' and initiate Python's CSV writer
* Ask the Facebook Graph API for the most recent 100 posts, then...
   * Tell Python to go through each one and look for a 'created_time', 'link' and 'message', then...
        * Bundle those three ('created_time', 'link' and 'message') together into a variable we'll call 'post_info', then...
        * Add the contents of that 'post_info' variable to our CSV and print it to screen
* Finally, close the CSV file
   
You know the drill:
* Paste your Access Token, then run the code (SHIFT + ENTER)

```python
# Import the 'csv' module so we can easily write to a CSV file
import csv
# Import the 'requests' module to interact with web pages
import requests

# Copy your Access Token from the Graph API Explorer: https://developers.facebook.com/tools/explorer
access_token = ''

# Create an output file, name it and give it write privileges ('w')
# newline='' stops blank rows appearing on Windows; UTF-8 copes with emoji and other non-ASCII text
csv_file = open('facebook_posts.csv', 'w', newline='', encoding='utf-8')
# Initiate the CSV writer
writer = csv.writer(csv_file)
# Write a header row so the columns are labelled in your spreadsheet
writer.writerow(['created_time', 'link', 'message'])

# Enter the Facebook handle of the page you want to scrape, e.g. 'nytimes', 'Breitbart', etc.
publisher_id = 'donaldtrump'

# Enter the post fields you want to be returned by the Facebook API (copy these from the Graph API Explorer)
post_fields = 'created_time,from,message,link,shares,comments'

# Construct the Facebook Graph API URL to be opened
graph_url = 'https://graph.facebook.com/v3.0/' + publisher_id + '/posts/?fields=%s&limit=100&access_token=%s' % (post_fields, access_token)
# Print your graph URL
print(graph_url)

# Open and read your graph URL using Requests
r = requests.get(graph_url)
# Stop with an error if the API rejected the request (e.g. a missing or expired token)
r.raise_for_status()
# Extract the JSON from the Graph page
data = r.json()
# Identify the JSON pertaining to the ~100 Facebook posts returned by the API
posts = data['data']

# Extract the data we want from every post returned by the API
for post in posts:
    # Extract the post's timestamp
    timestamp = post['created_time']

    # Extract the post's link (not every post has one)
    try:
        link = post['link']
    except KeyError:
        link = 'No link'

    # Extract the 'message' text from the post
    try:
        message = post['message']
    except KeyError:
        message = 'No message'

    post_info = timestamp, link, message

    # Write the contents of 'post_info' to the output file
    writer.writerow(post_info)
    # Print the contents to screen so we can see what we're getting
    print(post_info)

# Close the CSV file you have been writing to
csv_file.close()
```

### ...finally, let's add a little code so the script goes through multiple pages of results
It's possible that you will want to download more than the 100 posts you get on the first page of results.

Conveniently, the last thing the API tells us, at the bottom of the page, is the URL of the next page of results, stored in ['paging']['next'].

![The address of the next page of results](https://www.dropbox.com/s/znvk1ah5mtbxoe9/paging.jpg?raw=1)

One (slightly clunky) way of working through multiple pages of results is to use a for loop.

Each time round the loop, the code below reads the URL of the next page of results and puts it into the graph_url variable, so that when the loop returns to the beginning, it uses this new URL.

The code below repeats itself three times, i.e. it goes through three pages of results, giving up to 300 posts.

```python
# Import the 'csv' module so we can easily write to a CSV file
import csv
# Import the 'requests' module to interact with web pages
import requests

# Copy your Access Token from the Graph API Explorer: https://developers.facebook.com/tools/explorer
access_token = ''

# Create an output file, name it and give it write privileges ('w')
# newline='' stops blank rows appearing on Windows; UTF-8 copes with emoji and other non-ASCII text
csv_file = open('facebook_posts_large.csv', 'w', newline='', encoding='utf-8')
# Initiate the CSV writer
writer = csv.writer(csv_file)
# Write a header row so the columns are labelled in your spreadsheet
writer.writerow(['created_time', 'link', 'message'])

# Enter the Facebook handle of the page you want to scrape, e.g. 'nytimes', 'Breitbart', etc.
publisher_id = 'donaldtrump'

# Enter the post fields you want to be returned by the Facebook API (copy these from the Graph API Explorer)
post_fields = 'created_time,from,message,link,shares,comments'

# Construct the initial Facebook Graph API URL to be opened
graph_url = 'https://graph.facebook.com/v3.0/' + publisher_id + '/posts/?fields=%s&limit=100&access_token=%s' % (post_fields, access_token)

# Set a loop to repeat (up to) three times
for i in range(0, 3):
    # Print your graph URL
    print(graph_url)

    # Open and read your graph URL using Requests
    r = requests.get(graph_url)
    # Stop with an error if the API rejected the request
    r.raise_for_status()
    # Extract the JSON from the Graph page
    data = r.json()
    # Identify the JSON pertaining to the ~100 Facebook posts returned by the API
    posts = data['data']

    # Extract the data we want from every post returned by the API
    for post in posts:
        # Extract the post's timestamp
        timestamp = post['created_time']

        # Extract the post's link (not every post has one)
        try:
            link = post['link']
        except KeyError:
            link = 'No link'

        # Extract the 'message' text from the post
        try:
            message = post['message']
        except KeyError:
            message = 'No message'

        post_info = timestamp, link, message

        # Write the contents of 'post_info' to the output file
        writer.writerow(post_info)

    # THIS IS NEW CODE
    # Identify the URL of the next page of results; the last page has no 'next', so stop there
    next_page = data.get('paging', {}).get('next')
    if next_page is None:
        break
    # Update graph_url so that when the loop restarts it gathers data from the next page
    graph_url = next_page

# Close the CSV file you have been writing to
csv_file.close()

print("Finished downloading posts!")
```
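The paging logic in the cell above can also be pulled out into a small generator that stops cleanly on the last page. A hedged sketch: the function name `follow_pages` is hypothetical, and passing in the fetching function (rather than calling `requests` directly) is a design choice that lets the logic be tried offline against invented sample pages:

```python
def follow_pages(fetch_json, first_url, max_pages):
    """Yield posts page by page, following ['paging']['next'] until it runs out.

    fetch_json is any function that takes a URL and returns the parsed JSON,
    e.g. lambda url: requests.get(url).json() when talking to the real API.
    """
    url = first_url
    for _ in range(max_pages):
        page = fetch_json(url)
        for post in page.get("data", []):
            yield post
        # The final page has no 'next' link, so stop instead of raising KeyError
        url = page.get("paging", {}).get("next")
        if url is None:
            break


# Invented two-page response set, keyed by URL, standing in for the live API
fake_pages = {
    "page1": {"data": [{"id": "1"}, {"id": "2"}], "paging": {"next": "page2"}},
    "page2": {"data": [{"id": "3"}], "paging": {}},
}

posts = list(follow_pages(fake_pages.get, "page1", max_pages=3))
print([post["id"] for post in posts])
```

Asking for three pages here yields only three posts, because the generator stops as soon as a page lacks a `next` link.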

# Twitter
Let's turn our attention to Twitter.

To use the Twitter API, we need to create an app:

* Go to https://apps.twitter.com
* Click Create New App
* Fill out the Name, Description and Website fields, agree to the Developer Agreement and click Create your Twitter application

Once your app has been created...

* Click the 'Keys and Access Tokens' tab
* We are going to need:
    * Consumer Key (API Key)
    * Consumer Secret (API Secret)
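The scripts below ask you to paste these two values straight into the notebook, which makes them easy to leak if you share it. One alternative, sketched here under the assumption that you have set two environment variables yourself (the names `TWITTER_APP_KEY` and `TWITTER_APP_SECRET`, and the helper `load_twitter_keys`, are hypothetical), is to read them with `os.environ`:

```python
import os


def load_twitter_keys():
    """Read the app's key and secret from environment variables.

    TWITTER_APP_KEY and TWITTER_APP_SECRET are hypothetical names chosen for
    this example; use whatever names you set in your own environment.
    """
    app_key = os.environ.get("TWITTER_APP_KEY")
    app_secret = os.environ.get("TWITTER_APP_SECRET")
    if not app_key or not app_secret:
        raise RuntimeError("Set TWITTER_APP_KEY and TWITTER_APP_SECRET before running.")
    return app_key, app_secret
```

In the scripts below, `APP_KEY, APP_SECRET = load_twitter_keys()` would then replace the two pasted strings.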

### 1. Take a look at what comes back from the API
Unlike Facebook, Twitter no longer has an API console for us to play around with.

However, there are a number of Python modules that will do the heavy lifting for you and make it very easy to gather tweets.

We are going to use Twython. (https://twython.readthedocs.io/en/latest/)

As with Facebook, the data that comes back from the Twitter API is a long stream of JSON, some of which is nested (e.g. "hashtags" is nested within "entities").
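Reaching nested values means chaining lookups, one level at a time. A minimal sketch against a trimmed, invented tweet (real responses contain many more fields) that pulls the hashtag texts out of "entities"; the helper name `hashtags` is hypothetical:

```python
# Trimmed, invented tweet: hashtags live inside the nested 'entities' object
sample_tweet = {
    "id": 1,
    "full_text": "Example tweet about #Python and #APIs",
    "entities": {
        "hashtags": [
            {"text": "Python", "indices": [20, 27]},
            {"text": "APIs", "indices": [32, 37]},
        ],
        "user_mentions": [],
        "urls": [],
    },
}


def hashtags(tweet):
    """Return the hashtag texts from a tweet's nested 'entities' object."""
    return [tag["text"] for tag in tweet.get("entities", {}).get("hashtags", [])]


print(hashtags(sample_tweet))
```

Each hashtag's `indices` give its start and end position in the tweet text, so `sample_tweet["full_text"][20:27]` is `#Python`.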

* **Have a look at a sample response for a user's tweets:** https://github.com/twitterdev/tweet-updates/blob/master/samples/initial/compatibility_extended_13996.json

*N.B. If you want to view the documentation for the Twitter API it is here: https://developer.twitter.com/en/docs/api-reference-index*

*The docs for Get Tweets timeline, which we are interacting with via Twython, are here: https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline*

### 2. Use the Twython library to print 200 tweets
We are going to use the script below to get a user's last 200 tweets.

Twython makes this so easy that we can get a user's tweets back with about 10 lines of code, most of which are copied directly from Twython's documentation.

1) Paste your Consumer Key (API Key) between the speech marks after **APP_KEY**

2) Paste your Consumer Secret (API Secret) between the speech marks after **APP_SECRET**

3) Enter the Twitter handle of the account you want to look at between the speech marks after **user_id**, e.g. 'realdonaldtrump', 'nytimes', etc.

```python
from twython import Twython

# Paste the Consumer Key (API Key) from apps.twitter.com
APP_KEY = ''
# Paste the Consumer Secret (API Secret) from apps.twitter.com
APP_SECRET = ''

# The following three lines are taken from https://twython.readthedocs.io/en/latest/usage/starting_out.html#oauth2
# They use app-only (OAuth 2) authentication: create a client with the key and secret,
# exchange them for an access token, then create a new client that uses that token
twitter = Twython(APP_KEY, APP_SECRET, oauth_version=2)
ACCESS_TOKEN = twitter.obtain_access_token()
twitter = Twython(APP_KEY, access_token=ACCESS_TOKEN)

# Enter the Twitter handle of the account you want to look at
user_id = 'realdonaldtrump'

# Get JSON containing up to 200 of the user's most recent tweets
# (tweet_mode='extended' returns the untruncated text in 'full_text' instead of 'text')
tweets = twitter.get_user_timeline(screen_name=user_id, count=200, tweet_mode='extended')

# Go through each of the tweets in the 'tweets' JSON
for tweet in tweets:
    # Print the text of the tweet followed by a new line ('\n')
    print(tweet['full_text'] + '\n')
```
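One quirk worth knowing: as far as I'm aware, for retweets the `full_text` field holds "RT @user: " followed by text that can still be cut short, while the complete original sits in a nested `retweeted_status` object. A hedged sketch (the helper name `full_tweet_text` is hypothetical, and the sample tweets are invented and heavily trimmed) that prefers the nested text when it is present:

```python
def full_tweet_text(tweet):
    """Return a tweet's text, using the original tweet's full_text for retweets."""
    original = tweet.get("retweeted_status")
    if original is not None:
        # Rebuild the "RT @user: " prefix around the untruncated original text
        return "RT @%s: %s" % (original["user"]["screen_name"], original["full_text"])
    return tweet["full_text"]


# Invented, trimmed examples in the extended-mode shape
plain = {"full_text": "Just a normal tweet"}
retweet = {
    "full_text": "RT @someone: A long tweet that gets cut o...",
    "retweeted_status": {"full_text": "A long tweet that gets cut off when retweeted",
                         "user": {"screen_name": "someone"}},
}

print(full_tweet_text(plain))
print(full_tweet_text(retweet))
```

In the loop above, `print(full_tweet_text(tweet) + '\n')` would replace the direct `tweet['full_text']` lookup.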

### ... Now, let's save those tweets to a CSV file again
Just as we did with the Facebook posts, we are going to save these 200 tweets to a CSV file.

This time the file name is **tweets.csv**

The resulting CSV should contain each of the following for each tweet:
* The timestamp ('created_at')
* The ID of the tweet ('id')
* The contents of the tweet ('full_text')
* The number of retweets the tweet has received ('retweet_count')
* The number of times the tweet has been favorited ('favorite_count')

N.B. Don't forget to fill out the **APP_KEY** and **APP_SECRET** variables again. You can also change **user_id** if you want to get tweets from a different account.

```python
import csv
from twython import Twython

# Paste the Consumer Key (API Key) from apps.twitter.com
APP_KEY = ''
# Paste the Consumer Secret (API Secret) from apps.twitter.com
APP_SECRET = ''

# These three lines are taken from https://twython.readthedocs.io/en/latest/usage/starting_out.html#oauth2
# They use app-only (OAuth 2) authentication: create a client with the key and secret,
# exchange them for an access token, then create a new client that uses that token
twitter = Twython(APP_KEY, APP_SECRET, oauth_version=2)
ACCESS_TOKEN = twitter.obtain_access_token()
twitter = Twython(APP_KEY, access_token=ACCESS_TOKEN)

# Enter the Twitter handle of the account you want to look at
user_id = 'realdonaldtrump'

# Get JSON containing up to 200 of the user's most recent tweets
tweets = twitter.get_user_timeline(screen_name=user_id, count=200, tweet_mode='extended')

# Create an output file, name it and give it write privileges ('w')
# newline='' stops blank rows appearing on Windows; UTF-8 copes with emoji and other non-ASCII text
csv_file = open('tweets.csv', 'w', newline='', encoding='utf-8')
# Initiate the CSV writer
writer = csv.writer(csv_file)
# Write a header row so the columns are labelled in your spreadsheet
writer.writerow(['created_at', 'id', 'full_text', 'retweet_count', 'favorite_count'])

# Go through each of the tweets in the 'tweets' JSON
for tweet in tweets:
    # Extract the timestamp of the tweet
    timestamp = tweet['created_at']
    # Extract the ID of the tweet
    tweet_id = tweet['id']
    # Extract the text of the tweet
    text = tweet['full_text']
    # Extract the number of retweets
    retweets = tweet['retweet_count']
    # Extract the number of favourites the tweet has got
    favourites = tweet['favorite_count']

    tweet_info = timestamp, tweet_id, text, retweets, favourites

    # Write the contents of 'tweet_info' to the output file
    writer.writerow(tweet_info)
    # Print the contents to screen so we can see what we're getting
    print(tweet_info)

# Close the CSV file you have been writing to
csv_file.close()
```
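The 'created_at' values end up in the CSV as strings such as `Wed Oct 10 20:19:24 +0000 2018`. If you want to sort or group tweets by date later, here is a sketch of converting them with the standard library. The helper name `parse_created_at` is hypothetical, and the format string assumes Python's default C locale for the English day and month abbreviations:

```python
from datetime import datetime, timezone


def parse_created_at(value):
    """Convert Twitter's created_at string into a timezone-aware datetime."""
    # Format example: 'Wed Oct 10 20:19:24 +0000 2018'
    return datetime.strptime(value, "%a %b %d %H:%M:%S %z %Y")


stamp = parse_created_at("Wed Oct 10 20:19:24 +0000 2018")
print(stamp.isoformat())  # 2018-10-10T20:19:24+00:00
```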

### Finally... let's download thousands of tweets instead of hundreds
The Twitter API will only return up to 3,200 of a user's most recent tweets.

We know we can get up to 200 at a time, so we just need to repeat that request 16 times (16 * 200 = 3,200), each time using max_id to ask for tweets older than the oldest one we have already seen.

We're going to do that with a simple for loop that repeats itself 16 times.

The resulting tweets are going to be saved to a file named **tweets_long.csv**

```python
import csv
from twython import Twython

# Paste the Consumer Key (API Key) from apps.twitter.com
APP_KEY = ''
# Paste the Consumer Secret (API Secret) from apps.twitter.com
APP_SECRET = ''

# These three lines are taken from https://twython.readthedocs.io/en/latest/usage/starting_out.html#oauth2
# They use app-only (OAuth 2) authentication: create a client with the key and secret,
# exchange them for an access token, then create a new client that uses that token
twitter = Twython(APP_KEY, APP_SECRET, oauth_version=2)
ACCESS_TOKEN = twitter.obtain_access_token()
twitter = Twython(APP_KEY, access_token=ACCESS_TOKEN)

# Create an output file, name it and give it write privileges ('w')
# newline='' stops blank rows appearing on Windows; UTF-8 copes with emoji and other non-ASCII text
csv_file = open('tweets_long.csv', 'w', newline='', encoding='utf-8')
# Initiate the CSV writer
writer = csv.writer(csv_file)
# Write a header row so the columns are labelled in your spreadsheet
writer.writerow(['created_at', 'id', 'full_text', 'retweet_count', 'favorite_count'])

# Enter the Twitter handle of the account you want to look at
user_id = 'realdonaldtrump'

# Find the most recent tweet in the user's timeline
latest_tweet = twitter.get_user_timeline(screen_name=user_id, count=1)
# Start from the ID of the latest tweet (max_id includes the tweet with that ID)
max_id = latest_tweet[0]['id']

# Set up a loop that will repeat up to 16 times
for i in range(0, 16):
    # Get JSON containing up to 200 tweets with IDs less than or equal to max_id
    tweets = twitter.get_user_timeline(screen_name=user_id, count=200, tweet_mode='extended', max_id=max_id)

    # Stop early if there are no older tweets left to fetch
    if not tweets:
        break

    # Go through each of the tweets in the 'tweets' JSON
    for tweet in tweets:
        # Extract the timestamp of the tweet
        timestamp = tweet['created_at']
        # Extract the ID of the tweet
        tweet_id = tweet['id']
        # Extract the text of the tweet
        text = tweet['full_text']
        # Extract the number of retweets
        retweets = tweet['retweet_count']
        # Extract the number of favourites the tweet has got
        favourites = tweet['favorite_count']

        # Group the scraped tweet information together in a variable called tweet_info
        tweet_info = timestamp, tweet_id, text, retweets, favourites

        # Write the contents of 'tweet_info' to the output file
        writer.writerow(tweet_info)
        # Print the contents to screen so we can see what we're getting
        print(tweet_info)

    # Next time, ask only for tweets older than the oldest one in this batch
    max_id = min(tweet['id'] for tweet in tweets) - 1

# Close the CSV file you have been writing to
csv_file.close()
```
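Because each request depends on the oldest ID from the previous one, the `max_id` logic in the cell above is easy to get wrong by one. A hedged offline sketch of the same idea: the names `collect_timeline` and `fake_fetch` are hypothetical, and `fake_fetch` only imitates `get_user_timeline` against invented IDs. It shows the loop collecting every tweet exactly once and stopping when a request comes back empty:

```python
def collect_timeline(fetch_page, max_requests=16):
    """Collect tweets newest-first by repeatedly lowering max_id.

    fetch_page(max_id) must return a list of tweet dicts (newest first) whose
    IDs are <= max_id, or the newest tweets when max_id is None.
    """
    collected = []
    max_id = None
    for _ in range(max_requests):
        batch = fetch_page(max_id)
        if not batch:
            break  # nothing older is available
        collected.extend(batch)
        # max_id is inclusive, so subtract 1 to avoid fetching the oldest tweet twice
        max_id = min(tweet["id"] for tweet in batch) - 1
    return collected


# Invented timeline of 450 tweets with IDs 450 (newest) down to 1
fake_ids = list(range(450, 0, -1))


def fake_fetch(max_id, count=200):
    """Stand-in for get_user_timeline(..., count=200, max_id=max_id)."""
    eligible = [i for i in fake_ids if max_id is None or i <= max_id]
    return [{"id": i} for i in eligible[:count]]


tweets = collect_timeline(fake_fetch)
print(len(tweets), tweets[0]["id"], tweets[-1]["id"])  # 450 450 1
```

For the real API, `fetch_page` would wrap `twitter.get_user_timeline(...)`, passing `max_id` only when it is not `None`.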