# Mining the Social Web, 2nd Edition

## Chapter 2: Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More

This IPython Notebook provides an interactive way to follow along with and explore the numbered examples from [_Mining the Social Web (2nd Edition)_](http://bit.ly/135dHfs). The intent behind this notebook is to reinforce the concepts from the sample code in a fun, convenient, and effective way. This notebook assumes that you are reading along with the book and have the context of the discussion as you work through these exercises.

In the unlikely event that you've stumbled across this notebook outside of its context on GitHub, [you can find the full source code repository here](http://bit.ly/16kGNyb).

## Copyright and Licensing

You are free to use or adapt this notebook for any purpose you'd like. However, please respect the [Simplified BSD License](https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition/blob/master/LICENSE.txt) that governs its use.

# Facebook API Access

Facebook implements OAuth 2.0 as its standard authentication mechanism, but provides a convenient way for you to get an _access token_ for development purposes, and we'll opt to take advantage of that convenience in this notebook. For details on implementing an OAuth flow with Facebook (all from within IPython Notebook), see the \_AppendixB notebook from the [IPython Notebook Dashboard](/).

For this first example, log in to your Facebook account and go to https://developers.facebook.com/tools/explorer/ to obtain an access token and set its permissions. You will need to assign that token to the ACCESS_TOKEN variable in the code cell below.

Be sure to explore the permissions that are available by clicking on the "Get Access Token" button that's on the page and exploring all of the tabs available. For example, you will need to set the "friends_likes" option under the "Friends Data Permissions" since this permission is used by the script below but is not a basic permission and is not enabled by default. 

<img src="files/resources/ch02-facebook/images/FB_GraphExplorer_perms.png" width="300px" /><br />

In [6]:
# Copy the value you just got from the inline frame, paste it into this variable, and execute this cell.
# Keep in mind that you could have just gone to https://developers.facebook.com/tools/access_token/
# and retrieved the "User Token" value from the Access Token Tool

ACCESS_TOKEN = 'EAAZAsJrV56zoBALnblZAMWIqVSfPkd2wmu0owxonkyNN6uo7o3O9qiMtGDfXNjdoE0gc4g1xvDxAM7nZC7WBkeFEWCLZBCMJtf6JZATNJ80KwZBKGdIFPeQoCazdfBXS5Jnbh03LwtU5HZCZAjTktCKfCoxJrPBV36zzM9hVirNWCertaMgXQrlO'
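A token pasted directly into a cell travels with the notebook whenever it is shared. As a minimal sketch of one alternative, the token could be read from an environment variable instead. The variable name `FACEBOOK_ACCESS_TOKEN` and the helper `load_access_token` are hypothetical names chosen for this illustration; they are not part of the book's code or of any Facebook tooling.

```python
import os

def load_access_token(env_var='FACEBOOK_ACCESS_TOKEN', default=''):
    """Return the token stored in env_var, or default if it is unset."""
    # env_var is a hypothetical name; use whatever name you exported.
    # strip() drops stray whitespace that often sneaks in when pasting.
    return os.environ.get(env_var, default).strip()

# Empty string if the variable has not been exported
token_from_env = load_access_token()
```

If this approach suits you, `ACCESS_TOKEN = load_access_token()` could stand in for the hard-coded assignment above.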

## Example 1. Making Graph API requests over HTTP

In [7]:
import requests # pip install requests
import json

base_url = 'https://graph.facebook.com/me'

# Get 10 likes for 10 friends
fields = 'id,name,friends.limit(10).fields(likes.limit(10))'

url = '%s?fields=%s&access_token=%s' % \
    (base_url, fields, ACCESS_TOKEN,)

# This API is HTTP-based and could be requested in the browser,
# with a command line utility like curl, or using just about
# any programming language by making a request to the URL.
# Click the hyperlink that appears in your notebook output
# when you execute this code cell to see for yourself...
print("url: {}".format(url))

# Interpret the response as JSON and convert back
# to Python data structures
content = requests.get(url).json()

# Pretty-print the JSON and display it
print("json.dumps: {}".format(json.dumps(content, indent=1)))

url: https://graph.facebook.com/me?fields=id,name,friends.limit(10).fields(likes.limit(10))&access_token=EAAZAsJrV56zoBALnblZAMWIqVSfPkd2wmu0owxonkyNN6uo7o3O9qiMtGDfXNjdoE0gc4g1xvDxAM7nZC7WBkeFEWCLZBCMJtf6JZATNJ80KwZBKGdIFPeQoCazdfBXS5Jnbh03LwtU5HZCZAjTktCKfCoxJrPBV36zzM9hVirNWCertaMgXQrlO
json.dumps: {
 "name": "Eugene Quinn",
 "friends": {
  "summary": {
   "total_count": 131
  },
  "data": []
 },
 "id": "10202899666329944"
}
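When a request fails (an expired token, a missing permission, a deprecated field), the Graph API generally answers with a JSON document that carries an `error` object rather than the data you asked for. The sketch below shows one way to surface that early. The helper name `check_graph_response` is hypothetical, and the exact keys inside `error` (`code`, `message`) are an assumption about the response shape that you should verify against a real failing response.

```python
def check_graph_response(content):
    """Return content unchanged, or raise if it carries an 'error' object."""
    error = content.get('error')
    if error is not None:
        # 'code' and 'message' are assumed key names; .get() keeps this
        # from failing if the real response uses different ones.
        raise RuntimeError('Graph API error {}: {}'.format(
            error.get('code'), error.get('message')))
    return content

# A successful response passes straight through
ok = check_graph_response({'name': 'Eugene Quinn', 'id': '10202899666329944'})
print(ok['name'])
```

In Example 1 this could wrap the decoded response, as in `content = check_graph_response(requests.get(url).json())`.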


Note: If you attempt to run a query for all of your friends' likes and it appears to hang, it is probably because you have a lot of friends who have a lot of likes. If this happens, you may need to add limits and offsets to the fields in the query as described in Facebook's [field expansion](https://developers.facebook.com/docs/reference/api/field_expansion/) documentation. However, the <code>facebook</code> library that we'll use in the next example handles some of these issues, so it's recommended that you hold off and try it out first. This initial example is just to illustrate that Facebook's API is built on top of HTTP.

A couple of field limit/offset examples that illustrate the possibilities follow:

<code>
fields = 'id,name,friends.limit(10).fields(likes)'            # Get all likes for 10 friends 
fields = 'id,name,friends.offset(10).limit(10).fields(likes)' # Get all likes for 10 more friends 
fields = 'id,name,friends.fields(likes.limit(10))'            # Get 10 likes for all friends 
</code>
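The field strings above follow a regular pattern, so they can be assembled programmatically instead of edited by hand. The following is a minimal sketch; `friends_likes_fields` and `build_url` are hypothetical helpers written for this illustration, and they cover only the `friends`/`likes` combinations shown above. `build_url` uses the standard library's `urlencode`, which percent-escapes the commas and parentheses in the field string.

```python
from urllib.parse import urlencode

def friends_likes_fields(friend_limit=None, friend_offset=None, like_limit=None):
    """Build a field-expansion string in the style of the examples above."""
    friends = 'friends'
    if friend_offset is not None:
        friends += '.offset({})'.format(friend_offset)
    if friend_limit is not None:
        friends += '.limit({})'.format(friend_limit)
    likes = 'likes'
    if like_limit is not None:
        likes += '.limit({})'.format(like_limit)
    return 'id,name,{}.fields({})'.format(friends, likes)

def build_url(fields, access_token, base_url='https://graph.facebook.com/me'):
    """Join the base URL and a percent-encoded query string."""
    query = urlencode({'fields': fields, 'access_token': access_token})
    return '{}?{}'.format(base_url, query)

print(friends_likes_fields(friend_limit=10))                    # all likes, 10 friends
print(friends_likes_fields(friend_offset=10, friend_limit=10))  # all likes, 10 more friends
print(friends_likes_fields(like_limit=10))                      # 10 likes, all friends
```

Keeping the limits as keyword arguments makes it straightforward to step through friends in batches by increasing `friend_offset` on each request.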

## Example 2. Querying the Graph API with Python

In [10]:
import facebook # pip install facebook-sdk
import json

# A helper function to pretty-print Python objects as JSON

def pp(o): 
    print("json.dumps: {}".format(json.dumps(o, indent=1)))

# Create a connection to the Graph API with your access token

g = facebook.GraphAPI(ACCESS_TOKEN)

# Execute a few sample queries

print('---------------')
print('Me')
print('---------------')
pp(g.get_object('me'))
print('---------------')
print('My Friends')
print('---------------')
pp(g.get_connections('me', 'friends'))
print('---------------')
print('Social Web')
print('---------------')
pp(g.request('search', {'q' : 'social web', 'type' : 'page'}))

---------------
Me
---------------
json.dumps: {
 "name": "Eugene Quinn",
 "id": "10202899666329944"
}
---------------
My Friends
---------------
json.dumps: {
 "summary": {
  "total_count": 131
 },
 "data": []
}
---------------
Social Web
---------------
json.dumps: {
 "paging": {
  "cursors": {
   "after": "MjQZD",
   "before": "MAZDZD"
  },
  "next": "https://graph.facebook.com/v2.8/search?access_token=EAAZAsJrV56zoBALnblZAMWIqVSfPkd2wmu0owxonkyNN6uo7o3O9qiMtGDfXNjdoE0gc4g1xvDxAM7nZC7WBkeFEWCLZBCMJtf6JZATNJ80KwZBKGdIFPeQoCazdfBXS5Jnbh03LwtU5HZCZAjTktCKfCoxJrPBV36zzM9hVirNWCertaMgXQrlO&q=social+web&type=page&limit=25&after=MjQZD"
 },
 "data": [
  {
   "name": "WEB Social Agency",
   "id": "501464483358126"
  },
  {
   "name": "Social website",
   "id": "198566960212311"
  },
  {
   "name": "My Social Web",
   "id": "76277666156"
  },
  {
   "name": "Katie Williamsen Web + Social Media Consulting, LLC",
   "id": "1542982969346291"
  },
  {
   "name": "Grafdom - Web Design & Social Med
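The search result above includes a `paging` object whose `next` value is a URL for the following batch of results. A minimal sketch of walking those links follows. The generator `iter_pages` and its `fetch` parameter are hypothetical names for this illustration; `fetch` is any callable that takes a URL and returns the decoded JSON for that page, and `max_pages` is a safety cap so that a large result set cannot run away.

```python
def iter_pages(first_page, fetch, max_pages=5):
    """Yield items from first_page and from pages reachable via paging.next."""
    page, seen = first_page, 0
    while page is not None and seen < max_pages:
        for item in page.get('data', []):
            yield item
        seen += 1
        # Stop when the response has no 'next' link
        next_url = page.get('paging', {}).get('next')
        page = fetch(next_url) if next_url else None

# Stand-in data so the sketch runs without network access
fake_pages = {'page-2': {'data': [{'name': 'C'}]}}
first = {'data': [{'name': 'A'}, {'name': 'B'}],
         'paging': {'next': 'page-2'}}

for item in iter_pages(first, fake_pages.get):
    print(item['name'])
```

Against the live API, `fetch` could be something like `lambda url: requests.get(url).json()`, with the first page coming from the `g.request` call above.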

## Example 3. Results for a Graph API query for Mining the Social Web

In [11]:
# Get an instance of Mining the Social Web
# Using the page name also works if you know it.
# e.g. 'MiningTheSocialWeb' or 'CrossFit'
mtsw_id = '146803958708175'
pp(g.get_object(mtsw_id))

json.dumps: {
 "name": "Mining the Social Web",
 "id": "146803958708175"
}


## Example 4. Querying the Graph API for Open Graph objects by their URLs

In [12]:
# MTSW catalog link
pp(g.get_object('http://shop.oreilly.com/product/0636920030195.do'))

# PCI catalog link
pp(g.get_object('http://shop.oreilly.com/product/9780596529321.do'))

json.dumps: {
 "share": {
  "comment_count": 0,
  "share_count": 288
 },
 "og_object": {
  "updated_time": "2017-01-25T22:17:57+0000",
  "type": "book",
  "description": "Facebook, Twitter, LinkedIn, Google+, and other social web properties generate a wealth of valuable social data, but how can you tap into this data and discover who\u2019s connecting with whom, which insights are lurking just beneath the surface,...",
  "id": "465351090213998",
  "title": "Mining the Social Web"
 },
 "id": "http://shop.oreilly.com/product/0636920030195.do"
}
json.dumps: {
 "share": {
  "comment_count": 0,
  "share_count": 184
 },
 "og_object": {
  "updated_time": "2017-01-22T22:58:43+0000",
  "type": "book",
  "description": "This fascinating book demonstrates how you can build web applications to mine the enormous amount of data created by people on the Internet. With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets...",
  "id": "10150339462353143
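The results above nest the interesting values under `share` and `og_object`. As a small sketch, the helper below flattens one such result into the few values you might want to compare across URLs. `summarize_shared_url` is a hypothetical name, and the sample dictionary is a trimmed copy of the first result shown above, not fresh data.

```python
def summarize_shared_url(obj):
    """Pick the URL, title, and share count out of a URL lookup result."""
    return {
        'url': obj.get('id'),
        'title': obj.get('og_object', {}).get('title'),
        'share_count': obj.get('share', {}).get('share_count', 0),
    }

# Trimmed copy of the first result shown above
mtsw = {
    'share': {'comment_count': 0, 'share_count': 288},
    'og_object': {'type': 'book', 'title': 'Mining the Social Web'},
    'id': 'http://shop.oreilly.com/product/0636920030195.do',
}
print(summarize_shared_url(mtsw))
```

Using `.get()` with defaults keeps the helper from raising if a URL has no Open Graph object or no recorded shares.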

## Example 5. Comparing likes between Coke and Pepsi fan pages

In [13]:
# Find Pepsi and Coke in search results

pp(g.request('search', {'q' : 'pepsi', 'type' : 'page', 'limit' : 5}))
pp(g.request('search', {'q' : 'coke', 'type' : 'page', 'limit' : 5}))

# Use the ids to query for likes

pepsi_id = '56381779049' # Could also use 'PepsiUS'
coke_id = '40796308305'  # Could also use 'CocaCola'

# A quick way to format integers with commas every 3 digits
def int_format(n): return "{:,}".format(n)

print("Pepsi likes: {}".format(int_format(g.get_object(pepsi_id)['likes'])))
print("Coke likes: {}".format(int_format(g.get_object(coke_id)['likes'])))

json.dumps: {
 "paging": {
  "cursors": {
   "after": "NAZDZD",
   "before": "MAZDZD"
  },
  "next": "https://graph.facebook.com/v2.8/search?access_token=EAAZAsJrV56zoBALnblZAMWIqVSfPkd2wmu0owxonkyNN6uo7o3O9qiMtGDfXNjdoE0gc4g1xvDxAM7nZC7WBkeFEWCLZBCMJtf6JZATNJ80KwZBKGdIFPeQoCazdfBXS5Jnbh03LwtU5HZCZAjTktCKfCoxJrPBV36zzM9hVirNWCertaMgXQrlO&q=pepsi&type=page&limit=5&after=NAZDZD"
 },
 "data": [
  {
   "name": "PepsiCo",
   "id": "260431051694"
  },
  {
   "name": "Pepsi",
   "id": "56381779049"
  },
  {
   "name": "Pepsithai",
   "id": "63619711274"
  },
  {
   "name": "Pepsi Center",
   "id": "111829892187838"
  },
  {
   "name": "PepsiCo.",
   "id": "175063185853562"
  }
 ]
}
json.dumps: {
 "paging": {
  "cursors": {
   "after": "NAZDZD",
   "before": "MAZDZD"
  },
  "next": "https://graph.facebook.com/v2.8/search?access_token=EAAZAsJrV56zoBALnblZAMWIqVSfPkd2wmu0owxonkyNN6uo7o3O9qiMtGDfXNjdoE0gc4g1xvDxAM7nZC7WBkeFEWCLZBCMJtf6JZATNJ80KwZBKGdIFPeQoCazdfBXS5Jnbh03LwtU5HZCZAjTktCKfCoxJrPBV3

KeyError: 'likes'
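The `KeyError` indicates that the object returned for the page has no `likes` key. Example 3 shows the same thing: by default, only `name` and `id` come back. The sketch below is one defensive way to read the count. It assumes that newer Graph API versions expose a page's like count under `fan_count` and only when that field is requested explicitly, for example with `g.get_object(pepsi_id, fields='fan_count')`. Treat both the field name and that call as assumptions to verify against the Graph API version you are using. `page_like_count` is a hypothetical helper.

```python
def page_like_count(page):
    """Return a page's like count from 'fan_count' or the older 'likes' key."""
    # 'fan_count' is the assumed newer field name; 'likes' is the key the
    # original example expected.
    for key in ('fan_count', 'likes'):
        if key in page:
            return page[key]
    return None  # Neither field was present in the response

print(page_like_count({'name': 'Pepsi', 'id': '56381779049'}))
```

Returning `None` instead of raising lets the calling code decide whether a missing count is an error or merely a field that was not requested.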

## Example 6. Querying a page for its "feed" and "links" connections

In [14]:
pp(g.get_connections(pepsi_id, 'feed'))
pp(g.get_connections(pepsi_id, 'links'))

pp(g.get_connections(coke_id, 'feed'))
pp(g.get_connections(coke_id, 'links'))

json.dumps: {
 "paging": {
  "previous": "https://graph.facebook.com/v2.8/56381779049/feed?since=1485356483&access_token=EAAZAsJrV56zoBALnblZAMWIqVSfPkd2wmu0owxonkyNN6uo7o3O9qiMtGDfXNjdoE0gc4g1xvDxAM7nZC7WBkeFEWCLZBCMJtf6JZATNJ80KwZBKGdIFPeQoCazdfBXS5Jnbh03LwtU5HZCZAjTktCKfCoxJrPBV36zzM9hVirNWCertaMgXQrlO&limit=25&__paging_token=enc_AdC6RrZAhE2MxqebzrZCXmkdaFoItvxt5jCik9IkFXAWZC6pVFU0tRuZAkMhDtfl9Oq1f5FMqs24WeY8FGyZADAZBk9zBT&__previous=1",
  "next": "https://graph.facebook.com/v2.8/56381779049/feed?access_token=EAAZAsJrV56zoBALnblZAMWIqVSfPkd2wmu0owxonkyNN6uo7o3O9qiMtGDfXNjdoE0gc4g1xvDxAM7nZC7WBkeFEWCLZBCMJtf6JZATNJ80KwZBKGdIFPeQoCazdfBXS5Jnbh03LwtU5HZCZAjTktCKfCoxJrPBV36zzM9hVirNWCertaMgXQrlO&limit=25&until=1482177600&__paging_token=enc_AdB11cP9ANZCTTaFER2hE8WDsETSrEBuUWGx3VK0YD0J2yrEZBZBwwGXFUvYzc7pLhOutWOaa3gPUria1P1K0RxEyCv"
 },
 "data": [
  {
   "message": "Allen Robinson looking fly with the 11 day countdown to #PepsiHalftime! Join the #FanCountdown with your own \ud83d\udcf9  a

GraphAPIError: (#12) links field is deprecated for versions v2.4 and higher
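Because the `links` request raised an error, the two Coke queries after it in the cell never ran. A minimal sketch of guarding each request so that one failure does not abort the rest follows. `get_connections_safely` is a hypothetical helper; its `fetch` parameter is any callable with the same `(object_id, connection_name)` signature as `g.get_connections`, and the stand-in `fake_fetch` only imitates the failure shown above.

```python
def get_connections_safely(fetch, object_id, names):
    """Fetch each named connection, recording failures instead of aborting."""
    results, errors = {}, {}
    for name in names:
        try:
            results[name] = fetch(object_id, name)
        except Exception as exc:  # e.g. facebook.GraphAPIError in this notebook
            errors[name] = str(exc)
    return results, errors

# Stand-in for g.get_connections so the sketch runs offline
def fake_fetch(object_id, name):
    if name == 'links':
        raise ValueError('(#12) links field is deprecated for versions v2.4 and higher')
    return {'data': []}

results, errors = get_connections_safely(fake_fetch, '56381779049', ['feed', 'links'])
print(sorted(results))
print(errors)
```

With the real client, the call would look like `get_connections_safely(g.get_connections, pepsi_id, ['feed', 'links'])`, and narrowing the `except` clause to `facebook.GraphAPIError` would avoid hiding unrelated bugs.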