# Compute Project Stats

* This notebook uses the GitHub [GraphQL API](https://developer.github.com/v4/) to compute the number of open and 
  closed bugs pertaining to Kubeflow GitHub Projects
  * Stats are broken down by labels
* Results are plotted using [plotly](https://plot.ly)
  * Plots are currently published on plot.ly for sharing; they are publicly viewable by anyone
  
## Setup GitHub

* You will need a GitHub personal access token in order to use the GitHub API
* See these [instructions](https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/) for creating a personal access token
  * You will need the scopes:
    * repo
    * read:org    
* Set the environment variable `GITHUB_TOKEN` to pass your token to the code
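
As a rough illustration of how a token in `GITHUB_TOKEN` ends up in a GraphQL call, the sketch below builds (but does not send) a request that would count open issues carrying one label. The query text and the `build_request` helper are assumptions made for this example; they are not taken from `project_stats.py`, which may structure its queries differently.

```python
import json
import os
import urllib.request

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# Illustrative query only; not necessarily what project_stats.py sends.
QUERY = """
query($owner: String!, $name: String!, $label: String!) {
  repository(owner: $owner, name: $name) {
    issues(states: OPEN, labels: [$label]) {
      totalCount
    }
  }
}
"""


def build_request(token, owner, name, label):
    """Build a POST request for the GitHub GraphQL endpoint (hypothetical helper)."""
    payload = json.dumps({
        "query": QUERY,
        "variables": {"owner": owner, "name": name, "label": label},
    }).encode("utf-8")
    return urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=payload,
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
    )


# The token comes from the environment, as described above.
token = os.environ.get("GITHUB_TOKEN")
if token is None:
    print("GITHUB_TOKEN is not set; API calls would be rejected")
```

Passing the result of `build_request(token, "kubeflow", "kubeflow", "area/katib")` to `urllib.request.urlopen` would send the query; the response is JSON with the count under `data.repository.issues.totalCount`.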

## Setup Plot.ly Online

* In order to use plot.ly to publish the plot you need to create a plot.ly account and get an API key
* Follow plot.ly's [getting started guide](https://plot.ly/python/getting-started/)
* Store your API key in `~/.plotly/.credentials`
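
Before publishing, it can help to confirm that the credentials file is in place. The check below is a minimal sketch that assumes the file is JSON and contains `username` and `api_key` entries; those key names and the `missing_credentials` helper are assumptions for illustration, so adjust them if your plot.ly setup stores something different.

```python
import json
from pathlib import Path

# Assumed default location, matching the path mentioned above.
DEFAULT_PATH = Path.home() / ".plotly" / ".credentials"


def missing_credentials(path=DEFAULT_PATH, required=("username", "api_key")):
    """Return the required keys that are absent or empty in the credentials file."""
    path = Path(path)
    if not path.exists():
        # No file at all: every required key is missing.
        return list(required)
    data = json.loads(path.read_text())
    return [key for key in required if not data.get(key)]


missing = missing_credentials()
if missing:
    print("plot.ly credentials incomplete, missing:", ", ".join(missing))
```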

In [177]:
# Use plotly cufflinks to plot data frames:
# https://plot.ly/ipython-notebooks/cufflinks/
# Instructions for offline plotting:
# https://plot.ly/python/getting-started/#initialization-for-offline-plotting
#
# Instructions for online plotting (you will need to set up an account):
# https://plot.ly/python/getting-started/
import plotly
import plotly.plotly as py
import plotly.graph_objs as go
import cufflinks as cf
from importlib import reload
import itertools

In [194]:
import project_stats
reload(project_stats)


<module 'project_stats' from '/home/jlewi/git_kubeflow-community/scripts/project_stats.py'>

In [195]:
c = project_stats.ProjectStats()
c.main()

Make plots showing different groups of labels

* The columns of the stats data frame form a multi-level index
* See [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html) for instructions on multi-level indexes
  * We specify a list of tuples, where each tuple gives the item to select at the corresponding level of the index
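
The tuple-based selection described above can be tried on a small stand-in frame. This is only a sketch: the `toy` data frame, its label names, and its values are made up for illustration and do not reflect the contents of `c.stats`.

```python
import itertools

import pandas as pd

counts = ["open", "total"]
labels = ["area/a", "area/b", "area/c"]  # hypothetical labels

# Two-level column index: first level is the count type, second is the label.
all_columns = pd.MultiIndex.from_product([counts, labels], names=["count", "label"])
toy = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6], [2, 3, 4, 5, 6, 7]],
    index=pd.to_datetime(["2019-01-28", "2019-01-29"]),
    columns=all_columns,
)

# Each tuple names one column: (item at level 0, item at level 1).
wanted = list(itertools.product(counts, ["area/a", "area/c"]))
subset = toy.loc[:, wanted]

# The last row holds the most recent counts.
latest = subset.iloc[-1]
print(latest)
```

Because the selection is a list of full tuples, the resulting columns follow the order of the list, which makes it easy to control how series are grouped in a plot.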

In [198]:
counts = ["open", "total"]
labels = ["cuj/build-train-deploy", "cuj/multi-user", "area/katib"]
#labels = ["priority/p0", "priority/p1", "priority/p2"]
columns = list(itertools.product(counts, labels))

c.stats.loc[:, columns].iplot(kind='scatter', width=5, filename='project-stats', title='0.5.0 Issue Count')

In [213]:
c.stats.loc[:, columns].iloc[-1]

       label                 
open   cuj/build-train-deploy     4
       cuj/multi-user            16
       area/katib                 6
total  cuj/build-train-deploy     7
       cuj/multi-user            16
       area/katib                 6
Name: 2019-01-29 11:50:50.418719, dtype: int64