# Contributors to California Civic Data Coalition repositories

By Ben Welsh

This analysis draws on the public list of contributors that GitHub compiles for each repository. The data was last harvested on Dec. 18, 2016, using a Python script that queries GitHub's API, and the results are saved in [contributors.csv](https://github.com/california-civic-data-coalition/django-calaccess-raw-data/blob/master/example/network-analysis/contributors.csv).
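The harvesting script is linked rather than reproduced. As a rough sketch only, the kind of request it makes could look like the following. It assumes GitHub's REST endpoints for listing a repository's contributors (`/repos/{owner}/{repo}/contributors`) and for fetching a user's public profile (`/users/{login}`). The helper names `contributors_url`, `user_url`, `fetch_json` and `to_rows` are hypothetical and do not come from the coalition's script. The row layout mirrors the nine columns of `contributors.csv`.

```python
import json
from urllib.request import Request, urlopen

# Hypothetical helpers: a sketch, not the coalition's actual harvesting script.
API_ROOT = "https://api.github.com"


def contributors_url(owner, repo):
    """URL for GitHub's "list repository contributors" endpoint."""
    return f"{API_ROOT}/repos/{owner}/{repo}/contributors"


def user_url(login):
    """URL for a single user's public profile."""
    return f"{API_ROOT}/users/{login}"


def fetch_json(url, token=None):
    """GET a URL and decode its JSON body (requires network access)."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"token {token}"
    with urlopen(Request(url, headers=headers)) as response:
        return json.load(response)


def to_rows(repo, contributors, profiles):
    """Join contributor records with profile dicts into flat rows.

    `contributors` is the decoded contributors list; `profiles` maps
    login -> decoded profile. Missing profile fields become None.
    """
    rows = []
    for contributor in contributors:
        profile = profiles.get(contributor["login"], {})
        rows.append({
            "repo": repo,
            "login": contributor["login"],
            "name": profile.get("name"),
            "email": profile.get("email"),
            "company": profile.get("company"),
            "location": profile.get("location"),
            "bio": profile.get("bio"),
            "avatar_url": contributor["avatar_url"],
            "contributions": contributor["contributions"],
        })
    return rows


# Offline usage with a made-up payload in the shape the API returns.
sample_contributors = [
    {"login": "alice", "avatar_url": "https://example.com/a.png", "contributions": 5},
]
sample_profiles = {"alice": {"name": "Alice", "company": "Example Co"}}
rows = to_rows("django-calaccess-raw-data", sample_contributors, sample_profiles)
print(rows)
```

Only `to_rows` runs offline; the fetch helpers need network access, and a full harvest would also have to follow the `Link` header GitHub uses to paginate long contributor lists.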

In [1]:
import pandas as pd
import numpy as np

## Load in the data

In [2]:
table = pd.read_csv("./contributors.csv")

In [3]:
table.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 183 entries, 0 to 182
Data columns (total 9 columns):
repo             183 non-null object
login            183 non-null object
name             141 non-null object
email            93 non-null object
company          93 non-null object
location         118 non-null object
bio              27 non-null object
avatar_url       183 non-null object
contributions    183 non-null int64
dtypes: int64(1), object(8)
memory usage: 12.9+ KB
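The non-null counts above translate directly into gaps: 90 of 183 rows (about 49 percent) have no `company`, and 156 of 183 (about 85 percent) have no `bio`. A quick way to get those shares for every column is `isna().mean()`. The sketch below runs it on a small made-up frame rather than the real file.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for contributors.csv with some blank fields.
toy = pd.DataFrame({
    "login": ["a", "b", "c", "d"],
    "company": ["X", np.nan, "Y", np.nan],
    "bio": [np.nan, np.nan, np.nan, "hi"],
})

# Fraction of rows missing a value in each column, worst first.
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)
```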


In [4]:
table = table.replace(np.nan, "")

In [5]:
table.login = table.login.str.strip().str.lower()
table.company = table.company.str.strip()
table.location = table.location.str.strip()
table.avatar_url = table.avatar_url.str.strip()
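Filling blanks with an empty string before any string cleanup matters for later steps: `groupby` drops NaN keys by default, so a contributor with no listed company would silently vanish from any grouping that includes `company`. The toy data below is made up to show both effects.

```python
import numpy as np
import pandas as pd

# Made-up rows: messy logins and one missing company.
raw = pd.DataFrame({
    "login": [" Alice ", "BOB"],
    "company": ["Example Co ", np.nan],
    "contributions": [4, 2],
})

# Without filling, the NaN company key is dropped and BOB disappears.
before = raw.groupby(["login", "company"]).contributions.sum()
print(before)

# Fill blanks first, then strip whitespace and lowercase logins.
clean = raw.replace(np.nan, "")
clean["login"] = clean["login"].str.strip().str.lower()
clean["company"] = clean["company"].str.strip()
after = clean.groupby(["login", "company"]).contributions.sum()
print(after)
```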

In [6]:
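corrections = pd.read_csv("contributors-corrections.csv")
# Normalize headers and logins so the "login" merge key matches the cleaned table
# (assumes the file's login header may carry stray whitespace or capitals).
corrections.columns = corrections.columns.str.strip().str.lower()
corrections["login"] = corrections["login"].str.strip().str.lower()

In [7]:
table = table.merge(corrections, on="login", how="left")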

In [None]:
table.company = table.corrected_company.fillna(table.company)

In [None]:
table.drop('corrected_company', axis=1, inplace=True)
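The correction step follows a common pattern: left-merge a lookup table, prefer its value where present, then drop the helper column. The sketch below reproduces that pattern on made-up data and adds one safeguard that is not in the original cells: `validate="many_to_one"`, which makes pandas raise a `MergeError` if a login appears twice in the corrections and would otherwise duplicate contributor rows.

```python
import pandas as pd

# Made-up stand-ins for the cleaned table and the corrections file.
table = pd.DataFrame({
    "login": ["alice", "bob", "carol"],
    "company": ["", "Example Co", "Old Name"],
})
corrections = pd.DataFrame({
    "login": ["alice", "carol"],
    "corrected_company": ["Example Co", "New Name"],
})

# Left merge keeps every contributor; validate= guards against duplicate logins.
merged = table.merge(corrections, on="login", how="left", validate="many_to_one")

# Use the corrected value where one exists, otherwise keep the original.
merged["company"] = merged["corrected_company"].fillna(merged["company"])
merged = merged.drop(columns="corrected_company")
print(merged)
```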

In [None]:
table[table.company == ''].sort_values("login")

In [None]:
table.head(25)

## Contribution rankings

In [None]:
table.groupby("repo", as_index=False).contributions.sum().sort_values("contributions", ascending=False)

In [None]:
table.groupby('login', as_index=False).contributions.sum().sort_values("contributions", ascending=False).head(10)

In [None]:
table.groupby('company', as_index=False).contributions.sum().sort_values("contributions", ascending=False).head(10)

In [None]:
table.groupby('location', as_index=False).contributions.sum().sort_values("contributions", ascending=False).head(10)
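The four ranking cells repeat one group, sum and sort chain with a different key. A small helper can express it once. `top_by` is a hypothetical name, and the frame below is made up. Note that because blanks were filled with `""`, everyone without a listed company or location is pooled into a single `""` group, which can rank surprisingly high.

```python
import pandas as pd


def top_by(df, key, n=10):
    """Sum contributions per value of `key` and return the top `n` rows."""
    return (
        df.groupby(key, as_index=False)["contributions"]
        .sum()
        .sort_values("contributions", ascending=False)
        .head(n)
        .reset_index(drop=True)
    )


# Made-up contributions across two repos.
toy = pd.DataFrame({
    "repo": ["raw", "raw", "site", "site"],
    "login": ["alice", "bob", "alice", "carol"],
    "company": ["Example Co", "", "Example Co", "Other Org"],
    "contributions": [10, 3, 5, 7],
})

print(top_by(toy, "login"))
print(top_by(toy, "company"))
```

To keep the blank group out of a ranking, filter first, for example `top_by(toy[toy.company != ""], "company")`.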

## Unique contributors

In [None]:
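# Each row is one contributor to one repo, so rows per repo = contributors per repo.
table.groupby("repo").size().reset_index(name="contributors").sort_values("contributors", ascending=False)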
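Counting rows per repo equals counting contributors only if each login appears at most once per repo, which should hold for GitHub's contributor list but can break if an earlier merge duplicates rows. `nunique` counts distinct logins regardless. The made-up frame below repeats one login to show the difference.

```python
import pandas as pd

# Made-up rows; "carol" appears twice under "site".
toy = pd.DataFrame({
    "repo": ["raw", "raw", "site", "site", "site", "site"],
    "login": ["alice", "bob", "alice", "carol", "dave", "carol"],
})

# Rows per repo (counts the duplicate) vs. distinct logins per repo.
rows_per_repo = toy.groupby("repo").size()
logins_per_repo = toy.groupby("repo")["login"].nunique().sort_values(ascending=False)
print(rows_per_repo)
print(logins_per_repo)
```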

In [None]:
unique_contributors = table.groupby(["login", "company", "location", "avatar_url"]).contributions.sum().reset_index()
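Grouping on four columns yields one row per contributor only if each login carries the same `company`, `location` and `avatar_url` on every repo row. If any of those strings differ, say by a stray space that survived cleanup, the same login splits into several rows. A quick check, sketched here on made-up data, is to flag logins that still appear more than once after grouping.

```python
import pandas as pd

# Made-up rows: alice's company differs by a trailing space between repos.
toy = pd.DataFrame({
    "login": ["alice", "alice", "bob"],
    "company": ["Example Co", "Example Co ", "Other Org"],
    "location": ["LA", "LA", "SF"],
    "avatar_url": ["a.png", "a.png", "b.png"],
    "contributions": [10, 5, 3],
})

unique = (
    toy.groupby(["login", "company", "location", "avatar_url"])
    .contributions.sum()
    .reset_index()
)

# keep=False marks every row of a duplicated login, not just the repeats.
dupes = unique[unique.duplicated("login", keep=False)]
print(dupes)
```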

In [None]:
unique_contributors.info()

In [None]:
unique_contributors.sort_values("login").head(25)

In [None]:
unique_contributors.to_csv("./unique-contributors.csv", index=False)