# Querying the GitHub API for repositories and organizations

By Stuart Geiger and Jamie Whitacre, made at a SciPy 2016 sprint. See the rendered, interactive, embeddable map [here](http://staeiou.github.io/jupyter-orgs-map.html).

In [1]:
!pip install pygithub
!pip install geopy
!pip install ipywidgets
!pip install ipyleaflet



In [2]:
from github import Github

In [3]:
# These are my private login credentials, stored in ghlogin.py
import ghlogin


In [4]:
g = Github(login_or_token=ghlogin.gh_user, password=ghlogin.gh_passwd)
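
GitHub no longer accepts account passwords for API requests, so a personal access token is the safer thing to pass in. Here is a minimal sketch of reading one from the environment instead of a local module. The variable name `GITHUB_TOKEN` and the helper `read_github_token` are my own assumptions, not part of PyGithub, and newer PyGithub releases may prefer a different constructor argument than `login_or_token`.

```python
import os

def read_github_token(environ=os.environ, var_name="GITHUB_TOKEN"):
    """Return the token stored in `var_name`, or None if it is unset or blank."""
    token = environ.get(var_name, "").strip()
    return token or None

token = read_github_token()
if token is None:
    print("No token found; requests would be unauthenticated (60 per hour).")
else:
    # With PyGithub installed, the token could be used like this:
    #   from github import Github
    #   g = Github(login_or_token=token)
    print("Token found; it can be passed as Github(login_or_token=token).")
```

Keeping the token out of the notebook means it does not end up in a published copy by accident.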

With this Github object, you can get all kinds of GitHub objects, which you can then further explore.

In [5]:
user = g.get_user("staeiou")


In [6]:
print(user.name)
print(user.created_at)
print(user.location)

Stuart Geiger
2013-06-14 00:25:39
Berkeley, CA


In [7]:
repo = g.get_repo("jupyter/notebook")

In [8]:
print(repo.name)
print(repo.description)
print(repo.organization)
print(repo.organization.name)
print(repo.language)


notebook
Jupyter Interactive Notebook
<github.Organization.Organization object at 0x7f2c26dd0d30>
Project Jupyter
JavaScript


There are lots of properties of the various objects (repos, users, organizations), but there are also methods that return lists of objects. You need to iterate through these lists, or access them with indexes. What you usually get from these lists are also objects that have their own properties and methods.
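
As far as I know, PyGithub fetches these lists lazily, one page of results at a time, so taking only the first few items avoids requesting everything. The sketch below illustrates that idea with a stand-in generator (`fake_paginated_commits` is hypothetical and makes no API calls); `itertools.islice` stops the iteration early.

```python
from itertools import islice

def fake_paginated_commits(total, page_size=30):
    """Stand-in for a lazily paginated API list: yields items one page at a time."""
    for start in range(0, total, page_size):
        # In a real client, this is where one HTTP request per page would happen.
        print("fetching page starting at", start)
        for i in range(start, min(start + page_size, total)):
            yield "commit-{}".format(i)

# Only the first page is "fetched", even though 1000 items are available.
first_five = list(islice(fake_paginated_commits(1000), 5))
print(first_five)
```

The same `islice` pattern should work on any iterable, which is why it is a convenient way to peek at a long result list without using up many requests.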

In [9]:
commits = repo.get_commits()
commit = commits[0]
print("Author name: ", commit.author.name)
print("Committer name: ", commit.committer.name)
print("Lines added: ", commit.stats.additions)
print("Lines deleted: ", commit.stats.deletions)
print("Commit message:\n---------\n", commit.commit.message)

Author name:  Matthias Bussonnier
Committer name:  GitHub Web Flow
Lines added:  5
Lines deleted:  0
Commit message:
---------
 Merge pull request #1614 from staeiou/master

Add info on how to launch master branch install


In [11]:
import datetime

In [12]:
one_month_ago = datetime.datetime.now() - datetime.timedelta(days=30)
net_lines_added = 0
num_commits = 0

for commit in repo.get_commits(since = one_month_ago):
    net_lines_added += commit.stats.additions
    net_lines_added -= commit.stats.deletions
    num_commits += 1
    
print(net_lines_added, num_commits)

76 29


In [14]:
issues = repo.get_issues()
for issue in issues:
    last_updated_delta = datetime.datetime.now() - issue.updated_at
    if last_updated_delta > datetime.timedelta(days=365):
        print(issue.title, last_updated_delta.days)
    

Getting rid of IPython global. 391
Semantic highlighting.  407
Is it possible to add notebook metadata, like a description which shows up in the notebook file view? 407
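
One thing to watch in the comparison above: GitHub timestamps are in UTC, while `datetime.datetime.now()` is local time, and some PyGithub versions may return timezone-aware datetimes, which cannot be subtracted from naive ones. Here is a small sketch of a helper that handles both cases. The name `days_since` and the sample dates are hypothetical, and treating naive values as UTC is an assumption.

```python
import datetime

def days_since(timestamp, now=None):
    """Whole days between `timestamp` and `now`, treating naive datetimes as UTC."""
    utc = datetime.timezone.utc
    if now is None:
        now = datetime.datetime.now(utc)
    if timestamp.tzinfo is None:
        timestamp = timestamp.replace(tzinfo=utc)
    if now.tzinfo is None:
        now = now.replace(tzinfo=utc)
    return (now - timestamp).days

# Hypothetical values: a naive "updated_at" and an aware "now".
updated = datetime.datetime(2015, 6, 26, 12, 0, 0)
now = datetime.datetime(2016, 7, 21, 12, 0, 0, tzinfo=datetime.timezone.utc)
print(days_since(updated, now))
```

In the loop above, `days_since(issue.updated_at) > 365` could then stand in for the timedelta comparison.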


## Organizations

Organizations are objects too, which have similar properties:

In [15]:
org = g.get_organization("jupyter")

In [16]:
print(org.name)
print(org.created_at)
print(org.html_url)

Project Jupyter
2014-04-23 21:36:43
https://github.com/jupyter


The API has a get_public_members() function, but it just shows those who are on the "people" board on the [organization's page](https://github.com/jupyter). You can also see that if someone doesn't have a field set, it returns None. Some people just have usernames set without full names.

In [17]:
for member in org.get_public_members():
    print(member.name, member.url)

Matthias Bussonnier https://api.github.com/users/Carreau
JamieW https://api.github.com/users/JamiesHQ
Corey Stubbs https://api.github.com/users/Lull3rSkat3r
Sylvain Corlay https://api.github.com/users/SylvainCorlay
Afshin Darian https://api.github.com/users/afshin
Steven Silvester https://api.github.com/users/blink1073
Safia Abdalla https://api.github.com/users/captainsafia
Dave Willmer https://api.github.com/users/dwillmer
Fernando Perez https://api.github.com/users/fperez
Paul Ivanov https://api.github.com/users/ivanov
None https://api.github.com/users/jakirkham
Jason Grout https://api.github.com/users/jasongrout
Jonathan Frederic https://api.github.com/users/jdfreder
Jessica B. Hamrick https://api.github.com/users/jhamrick
Min RK https://api.github.com/users/minrk
Peter Parente https://api.github.com/users/parente
Mike https://api.github.com/users/poplav
Kyle Kelley https://api.github.com/users/rgbkrk
Sumit Sahrawat https://api.github.com/users/sumitsahrawat
Thomas Kluyver https://a

We can go through all the repositories in the organization with the get_repos() function. It returns a list of repository objects, which have their own properties and methods.

In [18]:
repo.name

'notebook'
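
To give a sense of what iterating over those repository objects can look like, here is a sketch that tallies the primary language of each repo. It runs on stand-in objects so it makes no API calls; `count_repo_languages` and the sample repos are hypothetical, and with a live connection the iterable could be `org.get_repos()` instead.

```python
from collections import Counter
from types import SimpleNamespace

def count_repo_languages(repos):
    """Tally the primary language of each repository-like object in `repos`."""
    return Counter(repo.language for repo in repos)

# Stand-in objects with the same attribute names used earlier (name, language).
sample_repos = [
    SimpleNamespace(name="notebook", language="JavaScript"),
    SimpleNamespace(name="example-a", language="Python"),
    SimpleNamespace(name="example-b", language="Python"),
    SimpleNamespace(name="example-c", language=None),  # no language detected
]

for language, count in count_repo_languages(sample_repos).most_common():
    print(language, count)
```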

## Rate limiting

Now that we have made a few requests, we can see what our rate limit is. If you are logged in, you get 5,000 requests per hour. If you are not, you only get 60 per hour. You can use attributes of the Github object to see your limit, usage, and reset time. We have used fewer than 100 of our 5,000 requests with these calls.

In [20]:
g.rate_limiting

(4908, 5000)

In [21]:
reset_time = g.rate_limiting_resettime
reset_time

1469149774

This value is in seconds since the UTC epoch (Jan 1st, 1970), so we have to convert it. Here is a quick function that takes a GitHub object, queries the API to find our next reset time, and converts it to minutes.

In [22]:
import datetime
def minutes_to_reset(github):
    reset_time = github.rate_limiting_resettime
    timedelta_to_reset = datetime.datetime.fromtimestamp(reset_time) - datetime.datetime.now()
    # total_seconds() covers the whole interval; .seconds would drop the days component
    return timedelta_to_reset.total_seconds() / 60
    

In [23]:
minutes_to_reset(g)

58.11666666666667
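
The same arithmetic can be checked without calling the API. This sketch uses the reset time and the `rate_limiting` tuple shown above; the helper name `minutes_until_reset` and the "current time" value are hypothetical, with the latter chosen only to illustrate the calculation.

```python
import datetime

def minutes_until_reset(reset_epoch, now_epoch):
    """Minutes from `now_epoch` to `reset_epoch`, both in seconds since the UTC epoch."""
    return (reset_epoch - now_epoch) / 60

reset_epoch = 1469149774  # reset time from the output above
reset_utc = datetime.datetime.fromtimestamp(reset_epoch, tz=datetime.timezone.utc)
print(reset_utc.isoformat())

# Hypothetical "current time", 3487 seconds before the reset.
now_epoch = reset_epoch - 3487
print(minutes_until_reset(reset_epoch, now_epoch))

remaining, limit = (4908, 5000)  # rate_limiting tuple from the output above
print("requests used:", limit - remaining)
```

Working directly with epoch seconds sidesteps any local-time versus UTC confusion, since both values are on the same clock.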

## Getting location data for an organization's contributors
### Mapping and geolocation

Before we get into how to query GitHub for contributor locations, we know we will have to get location coordinates for each contributor, and then plot them on a map. So we are going to do that first.

For geolocation, we are using geopy's geolocator object, which is based on OpenStreetMap's Nominatim API. Nominatim takes in any arbitrary location data and then returns a location object, which includes the best latitude and longitude coordinates it can find.

This does mean that we will have more error than if we did this manually, and there might be vastly different levels of accuracy. For example, if someone just has "UK" as their location, it will show up in the geographic center of the UK, which is somewhere on the edge of the Lake District. "USA" resolves to somewhere in Kansas. However, you can get very specific location data if you put in more detail.

In [24]:
from geopy.geocoders import Nominatim

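# Recent geopy releases require a user_agent; the name used here is arbitrary
geolocator = Nominatim(user_agent="github-org-map")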
uk_loc = geolocator.geocode("UK")
print(uk_loc.longitude,uk_loc.latitude)

us_loc = geolocator.geocode("USA")
print(us_loc.longitude,us_loc.latitude)

bids_loc = geolocator.geocode("Doe Library, Berkeley CA, 94720 USA")
print(bids_loc.longitude,bids_loc.latitude)

-3.2765752 54.7023545
-100.4458824 39.7837304
-122.259492086406 37.87219435


We can plot points on a map using ipyleaflet and ipywidgets. We first set up a map object, which is created with various parameters. Then we create Marker objects, which are then appended to the map. We then display the map inline in this notebook.

In [25]:
import ipywidgets

from ipyleaflet import (
    Map,
    Marker,
    TileLayer, ImageOverlay,
    Polyline, Polygon, Rectangle, Circle, CircleMarker,
    GeoJSON,
    DrawControl
)

center = [30.0, 5.0]
zoom = 2
m = Map(default_tiles=TileLayer(opacity=1.0), center=center, zoom=zoom, layout=ipywidgets.Layout(height="600px"))

uk_mark = Marker(location=[uk_loc.latitude,uk_loc.longitude])
uk_mark.visible
m += uk_mark

us_mark = Marker(location=[us_loc.latitude,us_loc.longitude])
us_mark.visible
m += us_mark

bids_mark = Marker(location=[bids_loc.latitude,bids_loc.longitude])
bids_mark.visible
m += bids_mark

### Querying GitHub for location data

For our mapping script, we want to get profiles for everyone who has made a commit to any of the repositories in the Jupyter organization, find their location (if any), then add it to a list. The API has a get_contributors function for repo objects, which returns a list of contributors ordered by number of commits, but not one that works across all repos in an org. So we have to iterate through all the repos in the org and run the get_contributors method for each one. We also want to make sure we don't add duplicates to our list, which would over-represent some areas, so we keep track of people in a dictionary.

I've written a few functions to make it easy to retrieve and map an organization's contributors.

In [26]:
def get_org_contributor_locations(github, org_name):
    """
    For a GitHub organization, get location for contributors to any repo in the org.
    
    Returns a dictionary of {user URLs : geopy Locations}, then a dictionary of various metadata.
    
    """
    
    # Set up empty dictionaries and metadata variables
    contributor_locs = {}
    locations = []
    none_count = 0
    error_count = 0
    user_loc_count = 0
    duplicate_count = 0
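    # Recent geopy releases require a user_agent; the name used here is arbitrary
    geolocator = Nominatim(user_agent="github-org-map")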

    
    # For each repo in the organization
    for repo in github.get_organization(org_name).get_repos():
        #print(repo.name)
        
        # For each contributor in the repo        
        for contributor in repo.get_contributors():
            print('.', end="")
            # If the contributor_locs dictionary doesn't have an entry for this user
            if contributor_locs.get(contributor.url) is None:
                
                # Try-Except block to handle API errors
                try:
                    # If the contributor has no location in profile
                    if(contributor.location is None):
                        #print("No Location")
                        none_count += 1
                    else:
                        # Get coordinates for location string from Nominatim API
                        location=geolocator.geocode(contributor.location)

                        #print(contributor.location, " | ", location)
                        
                        # Add a new entry to the dictionary. Key is user's URL, value is geocoded location object
                        contributor_locs[contributor.url] = location
                        user_loc_count += 1
                except Exception:
                    print('!', end="")
                    error_count += 1
            else:
                duplicate_count += 1
                
    return contributor_locs,{'no_loc_count':none_count, 'user_loc_count':user_loc_count, 
                             'duplicate_count':duplicate_count, 'error_count':error_count}
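
Many contributors share the same location string, and Nominatim's public usage policy asks for no more than about one request per second, so caching lookups can save both time and requests. The function above does not do this; what follows is only a sketch of one way it could be added. `make_cached_geocoder` and `fake_geocode` are hypothetical names, and the fake geocoder stands in for `geolocator.geocode` so that no network calls are made.

```python
def make_cached_geocoder(geocode, pause=None):
    """Wrap `geocode` so each distinct location string is looked up only once."""
    cache = {}

    def cached(location_string):
        # Normalize so "Washington, DC" and "washington, dc " share one entry.
        key = location_string.strip().lower()
        if key not in cache:
            cache[key] = geocode(location_string)  # None results are cached too
            if pause is not None:
                pause()  # e.g. lambda: time.sleep(1) to stay under a rate limit
        return cache[key]

    return cached

calls = []

def fake_geocode(text):
    """Stand-in geocoder that records each lookup it receives."""
    calls.append(text)
    return (38.8949549, -77.0366455) if "washington" in text.lower() else None

lookup = make_cached_geocoder(fake_geocode)
for text in ["Washington, DC", "washington, dc ", "Nowhere in particular"]:
    print(text, "->", lookup(text))
print("geocoder calls:", len(calls))
```

With a real geolocator, `make_cached_geocoder(geolocator.geocode)` could replace the direct `geolocator.geocode` call inside the loop.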


With this, we can easily query an organization. The U.S. Digital Service (org name: usds) is a small org that works well for testing. It takes about a second per contributor to get this data, so we want to test on small orgs.

In [27]:
usds_locs, usds_metadata = get_org_contributor_locations(g,'usds')

...............................

In [28]:
usds_metadata

{'duplicate_count': 1,
 'error_count': 0,
 'no_loc_count': 8,
 'user_loc_count': 22}

We are going to explore this dataset, but not plot names or usernames. I'm a bit hesitant to publish location data with unique identifiers, even if people put that information in their profiles. 

In [29]:
usds_locs_nousernames = []
for contributor, location in usds_locs.items():
    usds_locs_nousernames.append(location)
usds_locs_nousernames

[Location(D,C, Buccaneer Ridge Drive, Johnson City, Washington County, Tennessee, 37614, United States of America, (36.29885175, -82.3591932141095, 0.0)),
 Location(Washington, District of Columbia, United States of America, (38.8949549, -77.0366455, 0.0)),
 Location(東京都, 日本, (34.2255804, 139.294774527387, 0.0)),
 Location(Seattle, King County, Washington, United States of America, (47.6038321, -122.3300623, 0.0)),
 Location(Washington, District of Columbia, United States of America, (38.8949549, -77.0366455, 0.0)),
 Location(Washington, District of Columbia, United States of America, (38.8949549, -77.0366455, 0.0)),
 Location(Washington, District of Columbia, United States of America, (38.8949549, -77.0366455, 0.0)),
 Location(Dayton, Montgomery County, Ohio, United States of America, (39.7589478, -84.1916068, 0.0)),
 Location(United States of America, (39.7837304, -100.4458824, 0.0)),
 Location(D,C, Buccaneer Ridge Drive, Johnson City, Washington County, Tennessee, 37614, United Stat

Now we can map this data using another function I have written.

In [30]:
def map_location_dict(map_obj,org_location_dict):
    """
    Maps the locations in a dictionary of {ids : geopy Locations}.
    
    Must be passed a map object, then the dictionary. Returns the map object.
    
    """
    for username, location in org_location_dict.items():
        if(location is not None):
            mark = Marker(location=[location.latitude,location.longitude])
            mark.visible
            map_obj += mark
            

    return map_obj
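
Because several contributors can geocode to exactly the same point (Washington, DC appears several times in the output above), their markers sit on top of each other on the map. A quick sketch of how to count locations per coordinate pair follows. `count_by_coordinates` is a hypothetical helper, and the sample data uses stand-in objects that only mimic the `latitude` and `longitude` attributes of geopy Locations.

```python
from collections import Counter
from types import SimpleNamespace

def count_by_coordinates(location_dict, precision=3):
    """Count how many non-None locations fall on each rounded (lat, lon) pair."""
    counts = Counter()
    for location in location_dict.values():
        if location is not None:
            key = (round(location.latitude, precision),
                   round(location.longitude, precision))
            counts[key] += 1
    return counts

# Stand-in data shaped like the {ids : Locations} dictionary used above.
sample = {
    "user-a": SimpleNamespace(latitude=38.8949549, longitude=-77.0366455),
    "user-b": SimpleNamespace(latitude=38.8949549, longitude=-77.0366455),
    "user-c": SimpleNamespace(latitude=47.6038321, longitude=-122.3300623),
    "user-d": None,  # geocoding found nothing for this profile
}

for coords, n in count_by_coordinates(sample).most_common():
    print(coords, n)
```

These counts could be used to size markers or label them, rather than stacking identical ones.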

In [31]:
center = [30.0,5.0]
zoom = 2
usds_map = Map(default_tiles=TileLayer(opacity=1.0), center=center, zoom=zoom, layout=ipywidgets.Layout(height="600px"))

usds_map = map_location_dict(usds_map, usds_locs)

In [33]:
usds_map
