#Mining the Social Web, 2nd Edition

##Chapter 7: Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More

This IPython Notebook provides an interactive way to follow along with and explore the numbered examples from [_Mining the Social Web (2nd Edition)_](http://bit.ly/135dHfs). The intent behind this notebook is to reinforce the concepts from the sample code in a fun, convenient, and effective way. This notebook assumes that you are reading along with the book and have the context of the discussion as you work through these exercises.

In the somewhat unlikely event that you've somehow stumbled across this notebook outside of its context on GitHub, [you can find the full source code repository here](http://bit.ly/16kGNyb).

## Copyright and Licensing

You are free to use or adapt this notebook for any purpose you'd like. However, please respect the [Simplified BSD License](https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition/blob/master/LICENSE.txt) that governs its use.

## Example 1. Programmatically obtaining a personal API access token for accessing GitHub's API

In [8]:
import requests
from getpass import getpass
import json

username = '' # Your GitHub username
password = '' # Your GitHub password

# Note that credentials will be transmitted over a secure SSL connection
url = 'https://github.com/login/oauth/authorize'
note = 'Mining the Social Web, 2nd Ed.'
post_data = {'scopes':['repo'],'note': note }

response = requests.post(
    url,
    auth = (username, password),
    data = json.dumps(post_data),
    )   

print("API response:", response.text)
print('\n')
print("Your OAuth token is", response.json()['token'])

# Go to https://github.com/settings/applications to revoke this token

API response: Cookies must be enabled to use GitHub.




JSONDecodeError: Expecting value: line 1 column 1 (char 0)

In [9]:
response.json()

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

## Example 2. Making direct HTTP requests to GitHub's API

In [11]:
import json
import requests

# An unauthenticated request that doesn't contain an ?access_token=xxx query string
url = "https://api.github.com/repos/ptwobrussell/Mining-the-Social-Web/stargazers"
response = requests.get(url)

# Display one stargazer

print(json.dumps(response.json()[0], indent=1)+ '\n')

# Display headers
for (k,v) in response.headers.items():
    print(k, "=>", v)

{
 "login": "rdempsey",
 "id": 224,
 "avatar_url": "https://avatars2.githubusercontent.com/u/224?v=4",
 "gravatar_id": "",
 "url": "https://api.github.com/users/rdempsey",
 "html_url": "https://github.com/rdempsey",
 "followers_url": "https://api.github.com/users/rdempsey/followers",
 "following_url": "https://api.github.com/users/rdempsey/following{/other_user}",
 "gists_url": "https://api.github.com/users/rdempsey/gists{/gist_id}",
 "starred_url": "https://api.github.com/users/rdempsey/starred{/owner}{/repo}",
 "subscriptions_url": "https://api.github.com/users/rdempsey/subscriptions",
 "organizations_url": "https://api.github.com/users/rdempsey/orgs",
 "repos_url": "https://api.github.com/users/rdempsey/repos",
 "events_url": "https://api.github.com/users/rdempsey/events{/privacy}",
 "received_events_url": "https://api.github.com/users/rdempsey/received_events",
 "type": "User",
 "site_admin": false
}

Server => GitHub.com
Date => Sun, 29 Apr 2018 21:38:18 GMT
Content-Type => applic

## Example 3. Using PyGithub to query for stargazers of a particular repository

In [10]:
from github import Github

# XXX: Specify your own access token here

ACCESS_TOKEN = ''

# Specify a username and repository of interest for that user.

USER = 'ptwobrussell'
REPO = 'Mining-the-Social-Web'

client = Github(ACCESS_TOKEN, per_page=100)
user = client.get_user(USER)
repo = user.get_repo(REPO)

# Get a list of people who have bookmarked the repo.
# Since you'll get a lazy iterator back, you have to traverse
# it if you want to get the total number of stargazers.

stargazers = [ s for s in repo.get_stargazers() ]
print("Number of stargazers", len(stargazers))

Number of stargazers 1151


## Example 4. Constructing a trivial property graph

In [14]:
import networkx as nx

# Create a directed graph

g = nx.DiGraph()

# Add an edge to the directed graph from X to Y

g.add_edge('X', 'Y')

# Print some statistics about the graph

print(nx.info(g)+'\n')

# Get the nodes and edges from the graph

print("Nodes:", g.nodes())
print("Edges:", g.edges())
print('\n')


# Get node properties

print("X props:", g.node['X'])
print("Y props:", g.node['Y'])

# Get edge properties

print("X=>Y props:", g['X']['Y'])
print('\n')

# Update a node property

g.node['X'].update({'prop1' : 'value1'})
print("X props:", g.node['X'])
print('\n')

# Update an edge property

g['X']['Y'].update({'label' : 'label1'})
print("X=>Y props:", g['X']['Y'])

Name: 
Type: DiGraph
Number of nodes: 2
Number of edges: 1
Average in degree:   0.5000
Average out degree:   0.5000

Nodes: ['X', 'Y']
Edges: [('X', 'Y')]


X props: {}
Y props: {}
X=>Y props: {}


X props: {'prop1': 'value1'}


X=>Y props: {'label': 'label1'}


## Example 5. Constructing an ego graph of a repository and its stargazers

In [15]:
# Expand the initial graph with (interest) edges pointing each direction for 
# additional people interested. Take care to ensure that user and repo nodes 
# do not collide by appending their type.

import networkx as nx

g = nx.DiGraph()
g.add_node(repo.name + '(repo)', type='repo', lang=repo.language, owner=user.login)

for sg in stargazers:
    g.add_node(sg.login + '(user)', type='user')
    g.add_edge(sg.login + '(user)', repo.name + '(repo)', type='gazes')

## Example 6. Introducing some handy graph operations

In [16]:
# Poke around in the current graph to get a better feel for how NetworkX works

print(nx.info(g))
print('\n')
print(g.node['Mining-the-Social-Web(repo)'])
print(g.node['ptwobrussell(user)'])
print('\n')
print(g['ptwobrussell(user)']['Mining-the-Social-Web(repo)'])
# The next line would throw a KeyError since no such edge exists:
# print g['Mining-the-Social-Web(repo)']['ptwobrussell(user)']
print('\n')
print(g['ptwobrussell(user)'])
print(g['Mining-the-Social-Web(repo)'])
print('\n')
print(g.in_edges(['ptwobrussell(user)']))
print(g.out_edges(['ptwobrussell(user)']))
print('\n')
print(g.in_edges(['Mining-the-Social-Web(repo)']))
print(g.out_edges(['Mining-the-Social-Web(repo)']))

Name: 
Type: DiGraph
Number of nodes: 1152
Number of edges: 1151
Average in degree:   0.9991
Average out degree:   0.9991


{'type': 'repo', 'lang': 'JavaScript', 'owner': 'ptwobrussell'}
{'type': 'user'}


{'type': 'gazes'}


{'Mining-the-Social-Web(repo)': {'type': 'gazes'}}
{}


[]
[('ptwobrussell(user)', 'Mining-the-Social-Web(repo)')]


[('rdempsey(user)', 'Mining-the-Social-Web(repo)'), ('frac(user)', 'Mining-the-Social-Web(repo)'), ('prb(user)', 'Mining-the-Social-Web(repo)'), ('mcroydon(user)', 'Mining-the-Social-Web(repo)'), ('batasrki(user)', 'Mining-the-Social-Web(repo)'), ('twleung(user)', 'Mining-the-Social-Web(repo)'), ('kevinchiu(user)', 'Mining-the-Social-Web(repo)'), ('nikolay(user)', 'Mining-the-Social-Web(repo)'), ('tswicegood(user)', 'Mining-the-Social-Web(repo)'), ('ngpestelos(user)', 'Mining-the-Social-Web(repo)'), ('darron(user)', 'Mining-the-Social-Web(repo)'), ('brunojm(user)', 'Mining-the-Social-Web(repo)'), ('rgaidot(user)', 'Mining-the-Social-Web(repo)'), ('

## Example 7. Calculating degree, betweenness, and closeness centrality measures on the Krackhardt kite graph

In [17]:
from operator import itemgetter
from IPython.display import HTML
from IPython.core.display import display

display(HTML('<img src="files/resources/ch07-github/kite-graph.png" width="400px">'))

# The classic Krackhardt kite graph
kkg = nx.generators.small.krackhardt_kite_graph()

print("Degree Centrality")
print(sorted(nx.degree_centrality(kkg).items(), 
             key=itemgetter(1), reverse=True))
print('\n')

print("Betweenness Centrality")
print(sorted(nx.betweenness_centrality(kkg).items(), 
             key=itemgetter(1), reverse=True))
print('\n')

print("Closeness Centrality")
print(sorted(nx.closeness_centrality(kkg).items(), 
             key=itemgetter(1), reverse=True))

Degree Centrality
[(3, 0.6666666666666666), (5, 0.5555555555555556), (6, 0.5555555555555556), (0, 0.4444444444444444), (1, 0.4444444444444444), (2, 0.3333333333333333), (4, 0.3333333333333333), (7, 0.3333333333333333), (8, 0.2222222222222222), (9, 0.1111111111111111)]


Betweenness Centrality
[(7, 0.38888888888888884), (5, 0.23148148148148148), (6, 0.23148148148148148), (8, 0.2222222222222222), (3, 0.10185185185185183), (0, 0.023148148148148143), (1, 0.023148148148148143), (2, 0.0), (4, 0.0), (9, 0.0)]


Closeness Centrality
[(5, 0.6428571428571429), (6, 0.6428571428571429), (3, 0.6), (7, 0.6), (0, 0.5294117647058824), (1, 0.5294117647058824), (2, 0.5), (4, 0.5), (8, 0.42857142857142855), (9, 0.3103448275862069)]


## Example 8. Adding additional interest edges to the graph through the inclusion of "follows" edges

In [19]:
# Add (social) edges from the stargazers' followers. This can take a while 
# because of all of the potential API calls to GitHub. The approximate number
# of requests for followers for each iteration of this loop can be calculated as
# math.ceil(sg.get_followers() / 100.0) per the API returning up to 100 items
# at a time.

import sys

for i, sg in enumerate(stargazers):
    
    # Add "follows" edges between stargazers in the graph if any relationships exist
    try:
        for follower in sg.get_followers():
            if follower.login + '(user)' in g:
                g.add_edge(follower.login + '(user)', sg.login + '(user)', 
                           type='follows')
    except Exception: #ssl.SSLError
        print >> sys.stderr, "Encountered an error fetching followers for", \
                             sg.login, "Skipping."
        print >> sys.stderr, e

    print("Processed", i+1, " stargazers. Num nodes/edges in graph", \
          g.number_of_nodes(), "/", g.number_of_edges())
    print("Rate limit remaining", client.rate_limiting)

Processed 1  stargazers. Num nodes/edges in graph 1152 / 1153
Rate limit remaining (4981, 5000)
Processed 2  stargazers. Num nodes/edges in graph 1152 / 1156
Rate limit remaining (4980, 5000)
Processed 3  stargazers. Num nodes/edges in graph 1152 / 1158
Rate limit remaining (4979, 5000)
Processed 4  stargazers. Num nodes/edges in graph 1152 / 1164
Rate limit remaining (4977, 5000)
Processed 5  stargazers. Num nodes/edges in graph 1152 / 1165
Rate limit remaining (4976, 5000)
Processed 6  stargazers. Num nodes/edges in graph 1152 / 1167
Rate limit remaining (4975, 5000)
Processed 7  stargazers. Num nodes/edges in graph 1152 / 1168
Rate limit remaining (4974, 5000)
Processed 8  stargazers. Num nodes/edges in graph 1152 / 1171
Rate limit remaining (4971, 5000)
Processed 9  stargazers. Num nodes/edges in graph 1152 / 1181
Rate limit remaining (4966, 5000)
Processed 10  stargazers. Num nodes/edges in graph 1152 / 1184
Rate limit remaining (4965, 5000)
Processed 11  stargazers. Num nodes/edg

Processed 87  stargazers. Num nodes/edges in graph 1152 / 1373
Rate limit remaining (4858, 5000)
Processed 88  stargazers. Num nodes/edges in graph 1152 / 1376
Rate limit remaining (4857, 5000)
Processed 89  stargazers. Num nodes/edges in graph 1152 / 1378
Rate limit remaining (4856, 5000)
Processed 90  stargazers. Num nodes/edges in graph 1152 / 1379
Rate limit remaining (4855, 5000)
Processed 91  stargazers. Num nodes/edges in graph 1152 / 1384
Rate limit remaining (4853, 5000)
Processed 92  stargazers. Num nodes/edges in graph 1152 / 1385
Rate limit remaining (4852, 5000)
Processed 93  stargazers. Num nodes/edges in graph 1152 / 1389
Rate limit remaining (4851, 5000)
Processed 94  stargazers. Num nodes/edges in graph 1152 / 1390
Rate limit remaining (4850, 5000)
Processed 95  stargazers. Num nodes/edges in graph 1152 / 1391
Rate limit remaining (4849, 5000)
Processed 96  stargazers. Num nodes/edges in graph 1152 / 1394
Rate limit remaining (4848, 5000)
Processed 97  stargazers. Num 

Processed 172  stargazers. Num nodes/edges in graph 1152 / 1804
Rate limit remaining (4519, 5000)
Processed 173  stargazers. Num nodes/edges in graph 1152 / 1806
Rate limit remaining (4517, 5000)
Processed 174  stargazers. Num nodes/edges in graph 1152 / 1807
Rate limit remaining (4516, 5000)
Processed 175  stargazers. Num nodes/edges in graph 1152 / 1810
Rate limit remaining (4515, 5000)
Processed 176  stargazers. Num nodes/edges in graph 1152 / 1810
Rate limit remaining (4514, 5000)
Processed 177  stargazers. Num nodes/edges in graph 1152 / 1810
Rate limit remaining (4513, 5000)
Processed 178  stargazers. Num nodes/edges in graph 1152 / 1812
Rate limit remaining (4512, 5000)
Processed 179  stargazers. Num nodes/edges in graph 1152 / 1814
Rate limit remaining (4511, 5000)
Processed 180  stargazers. Num nodes/edges in graph 1152 / 1817
Rate limit remaining (4509, 5000)
Processed 181  stargazers. Num nodes/edges in graph 1152 / 1817
Rate limit remaining (4508, 5000)
Processed 182  starg

Processed 256  stargazers. Num nodes/edges in graph 1152 / 1942
Rate limit remaining (4423, 5000)
Processed 257  stargazers. Num nodes/edges in graph 1152 / 1942
Rate limit remaining (4422, 5000)
Processed 258  stargazers. Num nodes/edges in graph 1152 / 1943
Rate limit remaining (4421, 5000)
Processed 259  stargazers. Num nodes/edges in graph 1152 / 1943
Rate limit remaining (4420, 5000)
Processed 260  stargazers. Num nodes/edges in graph 1152 / 1944
Rate limit remaining (4419, 5000)
Processed 261  stargazers. Num nodes/edges in graph 1152 / 1945
Rate limit remaining (4418, 5000)
Processed 262  stargazers. Num nodes/edges in graph 1152 / 1946
Rate limit remaining (4417, 5000)
Processed 263  stargazers. Num nodes/edges in graph 1152 / 1946
Rate limit remaining (4416, 5000)
Processed 264  stargazers. Num nodes/edges in graph 1152 / 1946
Rate limit remaining (4415, 5000)
Processed 265  stargazers. Num nodes/edges in graph 1152 / 1948
Rate limit remaining (4414, 5000)
Processed 266  starg

Processed 341  stargazers. Num nodes/edges in graph 1152 / 2037
Rate limit remaining (4321, 5000)
Processed 342  stargazers. Num nodes/edges in graph 1152 / 2042
Rate limit remaining (4318, 5000)
Processed 343  stargazers. Num nodes/edges in graph 1152 / 2042
Rate limit remaining (4317, 5000)
Processed 344  stargazers. Num nodes/edges in graph 1152 / 2043
Rate limit remaining (4316, 5000)
Processed 345  stargazers. Num nodes/edges in graph 1152 / 2044
Rate limit remaining (4315, 5000)
Processed 346  stargazers. Num nodes/edges in graph 1152 / 2046
Rate limit remaining (4314, 5000)
Processed 347  stargazers. Num nodes/edges in graph 1152 / 2046
Rate limit remaining (4313, 5000)
Processed 348  stargazers. Num nodes/edges in graph 1152 / 2046
Rate limit remaining (4312, 5000)
Processed 349  stargazers. Num nodes/edges in graph 1152 / 2046
Rate limit remaining (4311, 5000)
Processed 350  stargazers. Num nodes/edges in graph 1152 / 2046
Rate limit remaining (4310, 5000)
Processed 351  starg

Processed 425  stargazers. Num nodes/edges in graph 1152 / 2109
Rate limit remaining (4224, 5000)
Processed 426  stargazers. Num nodes/edges in graph 1152 / 2109
Rate limit remaining (4223, 5000)
Processed 427  stargazers. Num nodes/edges in graph 1152 / 2111
Rate limit remaining (4222, 5000)
Processed 428  stargazers. Num nodes/edges in graph 1152 / 2113
Rate limit remaining (4220, 5000)
Processed 429  stargazers. Num nodes/edges in graph 1152 / 2113
Rate limit remaining (4219, 5000)
Processed 430  stargazers. Num nodes/edges in graph 1152 / 2114
Rate limit remaining (4218, 5000)
Processed 431  stargazers. Num nodes/edges in graph 1152 / 2115
Rate limit remaining (4217, 5000)
Processed 432  stargazers. Num nodes/edges in graph 1152 / 2117
Rate limit remaining (4216, 5000)
Processed 433  stargazers. Num nodes/edges in graph 1152 / 2117
Rate limit remaining (4215, 5000)
Processed 434  stargazers. Num nodes/edges in graph 1152 / 2123
Rate limit remaining (4212, 5000)
Processed 435  starg

Processed 509  stargazers. Num nodes/edges in graph 1152 / 2184
Rate limit remaining (4120, 5000)
Processed 510  stargazers. Num nodes/edges in graph 1152 / 2184
Rate limit remaining (4119, 5000)
Processed 511  stargazers. Num nodes/edges in graph 1152 / 2184
Rate limit remaining (4118, 5000)
Processed 512  stargazers. Num nodes/edges in graph 1152 / 2186
Rate limit remaining (4117, 5000)
Processed 513  stargazers. Num nodes/edges in graph 1152 / 2186
Rate limit remaining (4116, 5000)
Processed 514  stargazers. Num nodes/edges in graph 1152 / 2187
Rate limit remaining (4115, 5000)
Processed 515  stargazers. Num nodes/edges in graph 1152 / 2187
Rate limit remaining (4114, 5000)
Processed 516  stargazers. Num nodes/edges in graph 1152 / 2187
Rate limit remaining (4113, 5000)
Processed 517  stargazers. Num nodes/edges in graph 1152 / 2187
Rate limit remaining (4112, 5000)
Processed 518  stargazers. Num nodes/edges in graph 1152 / 2190
Rate limit remaining (4111, 5000)
Processed 519  starg

Processed 593  stargazers. Num nodes/edges in graph 1152 / 2231
Rate limit remaining (4033, 5000)
Processed 594  stargazers. Num nodes/edges in graph 1152 / 2235
Rate limit remaining (4031, 5000)
Processed 595  stargazers. Num nodes/edges in graph 1152 / 2235
Rate limit remaining (4030, 5000)
Processed 596  stargazers. Num nodes/edges in graph 1152 / 2236
Rate limit remaining (4029, 5000)
Processed 597  stargazers. Num nodes/edges in graph 1152 / 2237
Rate limit remaining (4028, 5000)
Processed 598  stargazers. Num nodes/edges in graph 1152 / 2237
Rate limit remaining (4027, 5000)
Processed 599  stargazers. Num nodes/edges in graph 1152 / 2237
Rate limit remaining (4026, 5000)
Processed 600  stargazers. Num nodes/edges in graph 1152 / 2237
Rate limit remaining (4025, 5000)
Processed 601  stargazers. Num nodes/edges in graph 1152 / 2237
Rate limit remaining (4024, 5000)
Processed 602  stargazers. Num nodes/edges in graph 1152 / 2237
Rate limit remaining (4023, 5000)
Processed 603  starg

Processed 677  stargazers. Num nodes/edges in graph 1152 / 2274
Rate limit remaining (3946, 5000)
Processed 678  stargazers. Num nodes/edges in graph 1152 / 2274
Rate limit remaining (3945, 5000)
Processed 679  stargazers. Num nodes/edges in graph 1152 / 2275
Rate limit remaining (3944, 5000)
Processed 680  stargazers. Num nodes/edges in graph 1152 / 2276
Rate limit remaining (3943, 5000)
Processed 681  stargazers. Num nodes/edges in graph 1152 / 2277
Rate limit remaining (3942, 5000)
Processed 682  stargazers. Num nodes/edges in graph 1152 / 2278
Rate limit remaining (3941, 5000)
Processed 683  stargazers. Num nodes/edges in graph 1152 / 2279
Rate limit remaining (3940, 5000)
Processed 684  stargazers. Num nodes/edges in graph 1152 / 2280
Rate limit remaining (3939, 5000)
Processed 685  stargazers. Num nodes/edges in graph 1152 / 2283
Rate limit remaining (3938, 5000)
Processed 686  stargazers. Num nodes/edges in graph 1152 / 2283
Rate limit remaining (3937, 5000)
Processed 687  starg

TypeError: unsupported operand type(s) for >>: 'builtin_function_or_method' and 'OutStream'

## Example 9. Exploring the updated graph's "follows" edges

In [20]:
from operator import itemgetter
from collections import Counter

# Let's see how many social edges we added since last time.
print(nx.info(g))
print('\n')

# The number of "follows" edges is the difference
print(len([e for e in g.edges_iter(data=True) if e[2]['type'] == 'follows']))
print('\n')

# The repository owner is possibly one of the more popular users in this graph.
print(len([e 
           for e in g.edges_iter(data=True) 
               if e[2]['type'] == 'follows' and e[1] == 'ptwobrussell(user)']))
print('\n')

# Let's examine the number of adjacent edges to each node
print(sorted([n for n in g.degree_iter()], key=itemgetter(1), reverse=True)[:10])
print('\n')

# Consider the ratio of incoming and outgoing edges for a couple of users with 
# high node degrees...

# A user who follows many but is not followed back by many.

print(len(g.out_edges('hcilab(user)')))
print(len(g.in_edges('hcilab(user)')))
print('\n')

# A user who is followed by many but does not follow back.

print(len(g.out_edges('ptwobrussell(user)')))
print(len(g.in_edges('ptwobrussell(user)')))
print('\n')

c = Counter([e[1] for e in g.edges_iter(data=True) if e[2]['type'] == 'follows'])
popular_users = [ (u, f) for (u, f) in c.most_common() if f > 1 ]
print("Number of popular users", len(popular_users))
print("Top 10 popular users:", popular_users[:10])

Name: 
Type: DiGraph
Number of nodes: 1152
Number of edges: 2421
Average in degree:   2.1016
Average out degree:   2.1016


1270


127


[('Mining-the-Social-Web(repo)', 1151), ('angusshire(user)', 347), ('kennethreitz(user)', 169), ('ptwobrussell(user)', 128), ('VagrantStory(user)', 77), ('trietptm(user)', 50), ('daimajia(user)', 39), ('rohithadassanayake(user)', 39), ('JT5D(user)', 28), ('hammer(user)', 26)]


0
0


1
127


Number of popular users 202
Top 10 popular users: [('kennethreitz(user)', 166), ('ptwobrussell(user)', 127), ('daimajia(user)', 36), ('hammer(user)', 20), ('jakubroztocil(user)', 20), ('isnowfy(user)', 19), ('japerk(user)', 18), ('dgryski(user)', 13), ('tswicegood(user)', 10), ('ZoomQuiet(user)', 10)]


## Example 10. Snapshotting (pickling) the graph's state to disk

In [None]:
# Save your work by serializing out (pickling) the graph
nx.write_gpickle(g, "resources/ch07-github/data/github.gpickle.1")

# How to restore the graph...
# import networkx as nx
# g = nx.read_gpickle("resources/ch07-github/data/github.gpickle.1")

## Example 11. Applying centrality measures to the interest graph

In [None]:
from operator import itemgetter

# Create a copy of the graph so that we can iteratively mutate the copy
# as needed for experimentation

h = g.copy()

# Remove the seed of the interest graph, which is a supernode, in order
# to get a better idea of the network dynamics

h.remove_node('Mining-the-Social-Web(repo)')

# XXX: Remove any other nodes that appear to be supernodes.
# Filter any other nodes that you can by threshold
# criteria or heuristics from inspection.

# Display the centrality measures for the top 10 nodes


dc = sorted(nx.degree_centrality(h).items(), 
            key=itemgetter(1), reverse=True)

print "Degree Centrality"
print dc[:10]
print

bc = sorted(nx.betweenness_centrality(h).items(), 
            key=itemgetter(1), reverse=True)

print "Betweenness Centrality"
print bc[:10]
print

print "Closeness Centrality"
cc = sorted(nx.closeness_centrality(h).items(), 
            key=itemgetter(1), reverse=True)
print cc[:10]

## Example 12. Adding starred repositories to the graph

In [23]:
# Let's add each stargazer's additional starred repos and add edges 
# to find additional interests.

MAX_REPOS = 500

for i, sg in enumerate(stargazers):
    print(sg.login)
    try:
        for starred in sg.get_starred()[:MAX_REPOS]: # Slice to avoid supernodes
            g.add_node(starred.name + '(repo)', type='repo', lang=starred.language, \
                       owner=starred.owner.login)
            g.add_edge(sg.login + '(user)', starred.name + '(repo)', type='gazes')
    except Exception: #ssl.SSLError:
        print("Encountered an error fetching starred repos for", sg.login, "Skipping.")

    print("Processed", i+1, "stargazers' starred repos")
    print("Num nodes/edges in graph", g.number_of_nodes(), "/", g.number_of_edges())
    print("Rate limit", client.rate_limiting)

rdempsey
Encountered an error fetching starred repos for rdempsey Skipping.
Processed 1 stargazers' starred repos
Num nodes/edges in graph 1152 / 2421
Rate limit (59, 60)
frac
Encountered an error fetching starred repos for frac Skipping.
Processed 2 stargazers' starred repos
Num nodes/edges in graph 1152 / 2421
Rate limit (58, 60)
prb
Encountered an error fetching starred repos for prb Skipping.
Processed 3 stargazers' starred repos
Num nodes/edges in graph 1152 / 2421
Rate limit (57, 60)
mcroydon
Encountered an error fetching starred repos for mcroydon Skipping.
Processed 4 stargazers' starred repos
Num nodes/edges in graph 1152 / 2421
Rate limit (56, 60)
batasrki
Encountered an error fetching starred repos for batasrki Skipping.
Processed 5 stargazers' starred repos
Num nodes/edges in graph 1152 / 2421
Rate limit (55, 60)
twleung
Encountered an error fetching starred repos for twleung Skipping.
Processed 6 stargazers' starred repos
Num nodes/edges in graph 1152 / 2421
Rate limit (54

Encountered an error fetching starred repos for lsinger Skipping.
Processed 50 stargazers' starred repos
Num nodes/edges in graph 1152 / 2421
Rate limit (10, 60)
yy
Encountered an error fetching starred repos for yy Skipping.
Processed 51 stargazers' starred repos
Num nodes/edges in graph 1152 / 2421
Rate limit (9, 60)
daemianmack
Encountered an error fetching starred repos for daemianmack Skipping.
Processed 52 stargazers' starred repos
Num nodes/edges in graph 1152 / 2421
Rate limit (8, 60)
ricardoalmeida
Encountered an error fetching starred repos for ricardoalmeida Skipping.
Processed 53 stargazers' starred repos
Num nodes/edges in graph 1152 / 2421
Rate limit (7, 60)
gerad
Encountered an error fetching starred repos for gerad Skipping.
Processed 54 stargazers' starred repos
Num nodes/edges in graph 1152 / 2421
Rate limit (6, 60)
amitagrawal
Encountered an error fetching starred repos for amitagrawal Skipping.
Processed 55 stargazers' starred repos
Num nodes/edges in graph 1152 / 2

Encountered an error fetching starred repos for shurik Skipping.
Processed 100 stargazers' starred repos
Num nodes/edges in graph 1152 / 2421
Rate limit (0, 60)
zonovo
Encountered an error fetching starred repos for zonovo Skipping.
Processed 101 stargazers' starred repos
Num nodes/edges in graph 1152 / 2421
Rate limit (0, 60)
matagus
Encountered an error fetching starred repos for matagus Skipping.
Processed 102 stargazers' starred repos
Num nodes/edges in graph 1152 / 2421
Rate limit (0, 60)
phernandez
Encountered an error fetching starred repos for phernandez Skipping.
Processed 103 stargazers' starred repos
Num nodes/edges in graph 1152 / 2421
Rate limit (0, 60)
vshulyak
Encountered an error fetching starred repos for vshulyak Skipping.
Processed 104 stargazers' starred repos
Num nodes/edges in graph 1152 / 2421
Rate limit (0, 60)
elg0nz
Encountered an error fetching starred repos for elg0nz Skipping.
Processed 105 stargazers' starred repos
Num nodes/edges in graph 1152 / 2421
Rate

Encountered an error fetching starred repos for johnthedebs Skipping.
Processed 150 stargazers' starred repos
Num nodes/edges in graph 1152 / 2421
Rate limit (0, 60)
ptwobrussell
Encountered an error fetching starred repos for ptwobrussell Skipping.
Processed 151 stargazers' starred repos
Num nodes/edges in graph 1152 / 2421
Rate limit (0, 60)
bububa
Encountered an error fetching starred repos for bububa Skipping.
Processed 152 stargazers' starred repos
Num nodes/edges in graph 1152 / 2421
Rate limit (0, 60)
barbietunnie
Encountered an error fetching starred repos for barbietunnie Skipping.
Processed 153 stargazers' starred repos
Num nodes/edges in graph 1152 / 2421
Rate limit (0, 60)
hezila
Encountered an error fetching starred repos for hezila Skipping.
Processed 154 stargazers' starred repos
Num nodes/edges in graph 1152 / 2421
Rate limit (0, 60)
sudar
Encountered an error fetching starred repos for sudar Skipping.
Processed 155 stargazers' starred repos
Num nodes/edges in graph 115

KeyboardInterrupt: 

**NOTE: Given that Example 12 is potentially a very time-consuming example to run, be sure to snapshot your work**

In [24]:
# Save your work by serializing out another snapshot of the graph
nx.write_gpickle(g, "resources/ch07-github/data/github.gpickle.2")

#import networkx as nx
#g = nx.read_gpickle("resources/ch07-github/data/github.gpickle.2")

FileNotFoundError: [Errno 2] No such file or directory: 'resources/ch07-github/data/github.gpickle.2'

Consider analysis similar to _Example 12_ here. Create a copy of the graph and be selective in pruning it or extracting subgraphs of interst.

## Example 13. Exploring the graph after updates with additional starred repositories

In [25]:
# Poke around: how to get users/repos
from operator import itemgetter

print(nx.info(g))
print('\n')

# Get a list of repositories from the graph.

repos = [n for n in g.nodes_iter() if g.node[n]['type'] == 'repo']

# Most popular repos

print("Popular repositories")
print(sorted([(n,d) 
              for (n,d) in g.in_degree_iter() 
                  if g.node[n]['type'] == 'repo'], \
             key=itemgetter(1), reverse=True)[:10])
print('\n')

# Projects gazed at by a user

print("Respositories that ptwobrussell has bookmarked")
print([(n,g.node[n]['lang']) 
       for n in g['ptwobrussell(user)'] 
           if g['ptwobrussell(user)'][n]['type'] == 'gazes'])
print('\n')

# Programming languages for each user

print("Programming languages ptwobrussell is interested in")
print(list(set([g.node[n]['lang'] 
                for n in g['ptwobrussell(user)'] 
                    if g['ptwobrussell(user)'][n]['type'] == 'gazes'])))
print('\n')

# Find supernodes in the graph by approximating with a high number of 
# outgoing edges

print("Supernode candidates")
print(sorted([(n, len(g.out_edges(n))) 
              for n in g.nodes_iter() 
                  if g.node[n]['type'] == 'user' and len(g.out_edges(n)) > 500], \
             key=itemgetter(1), reverse=True))

Name: 
Type: DiGraph
Number of nodes: 1152
Number of edges: 2421
Average in degree:   2.1016
Average out degree:   2.1016


Popular repositories
[('Mining-the-Social-Web(repo)', 1151)]


Respositories that ptwobrussell has bookmarked
[('Mining-the-Social-Web(repo)', 'JavaScript')]


Programming languages ptwobrussell is interested in
['JavaScript']


Supernode candidates
[]


## Example 14. Updating the graph to include nodes for programming languages

In [26]:
# Iterate over all of the repos, and add edges for programming languages 
# for each person in the graph. We'll also add edges back to repos so that 
# we have a good point to "pivot" upon.

repos = [n 
         for n in g.nodes_iter() 
             if g.node[n]['type'] == 'repo']

for repo in repos:
    lang = (g.node[repo]['lang'] or "") + "(lang)"
    
    stargazers = [u 
                  for (u, r, d) in g.in_edges_iter(repo, data=True) 
                     if d['type'] == 'gazes'
                 ]
    
    for sg in stargazers:
        g.add_node(lang, type='lang')
        g.add_edge(sg, lang, type='programs')
        g.add_edge(lang, repo, type='implements')

## Example 15. Sample queries for the final graph

In [27]:
# Some stats

print(nx.info(g))
print('\n')

# What languages exist in the graph?

print([n 
       for n in g.nodes_iter() 
           if g.node[n]['type'] == 'lang'])
print('\n')

# What languages do users program with?
print([n 
       for n in g['ptwobrussell(user)'] 
           if g['ptwobrussell(user)'][n]['type'] == 'programs'])

# What is the most popular programming language?
print("Most popular languages")
print(sorted([(n, g.in_degree(n))
 for n in g.nodes_iter() 
     if g.node[n]['type'] == 'lang'], key=itemgetter(1), reverse=True)[:10])
print('\n')

# How many users program in a particular language?
python_programmers = [u 
                      for (u, l) in g.in_edges_iter('Python(lang)') 
                          if g.node[u]['type'] == 'user']
print("Number of Python programmers:", len(python_programmers))
print('\n')

javascript_programmers = [u for 
                          (u, l) in g.in_edges_iter('JavaScript(lang)') 
                              if g.node[u]['type'] == 'user']
print("Number of JavaScript programmers:", len(javascript_programmers))
print('\n')

# What users program in both Python and JavaScript?
print("Number of programmers who use JavaScript and Python")
print(len(set(python_programmers).intersection(set(javascript_programmers))))

# Programmers who use JavaScript but not Python
print("Number of programmers who use JavaScript but not Python")
print(len(set(javascript_programmers).difference(set(python_programmers))))

# XXX: Can you determine who is the most polyglot programmer?

Name: 
Type: DiGraph
Number of nodes: 1153
Number of edges: 3573
Average in degree:   3.0989
Average out degree:   3.0989


['JavaScript(lang)']


['JavaScript(lang)']
Most popular languages
[('JavaScript(lang)', 1151)]


Number of Python programmers: 0


Number of JavaScript programmers: 1151


Number of programmers who use JavaScript and Python
0
Number of programmers who use JavaScript but not Python
1151


**NOTE: Optionally, snapshot the final graph**

In [None]:
# Save your work by serializing out another snapshot of the graph
nx.write_gpickle(g, "resources/ch07-github/data/github.gpickle.3")

#import networkx as nx
#g = nx.read_gpickle("resources/ch07-github/data/github.gpickle.3")

## Example 16. Graph visualization of the social network for the original interest graph

In [29]:
import os
import json
from IPython.display import IFrame
from IPython.core.display import display
from networkx.readwrite import json_graph

print("Stats on the full graph")
print(nx.info(g))
print('\n')

# Create a subgraph from a collection of nodes. In this case, the
# collection is all of the users in the original interest graph

mtsw_users = [n for n in g if g.node[n]['type'] == 'user']
h = g.subgraph(mtsw_users)

print("Stats on the extracted subgraph")
print(nx.info(h))

# Visualize the social network of all people from the original interest graph.

d = json_graph.node_link_data(h)
json.dump(d, open('resources/ch07-github/force.json', 'w'))


# IPython Notebook can serve files and display them into
# inline frames. Prepend the path with the 'files' prefix.

# A D3 template for displaying the graph data.
viz_file = 'files/resources/ch07-github/force.html'

# Display the D3 visualization.

display(IFrame(viz_file, '100%', '600px'))

Stats on the full graph
Name: 
Type: DiGraph
Number of nodes: 1153
Number of edges: 3573
Average in degree:   3.0989
Average out degree:   3.0989


Stats on the extracted subgraph
Name: 
Type: DiGraph
Number of nodes: 1151
Number of edges: 1270
Average in degree:   1.1034
Average out degree:   1.1034
