# Compute Project Stats

* This notebook uses the GitHub [GraphQL API](https://developer.github.com/v4/) to compute the number of open and 
  closed bugs pertaining to Kubeflow GitHub Projects
  * Stats are broken down by labels
* Results are plotted using [plotly](https://plot.ly)
  * Plots are currently published on plot.ly for sharing; they are publicly viewable by anyone
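
The per-label breakdown can be illustrated with a small, self-contained sketch. It is not the code in `project_stats.py`. The `count_by_label` helper and the shape of the issue dictionaries are assumptions made for illustration; only the `OPEN`/`CLOSED` states and the `nolabels` bucket mirror names that appear in the API and in the output of this notebook.

```python
from collections import Counter


def count_by_label(issues):
    """Count open and total issues per label.

    `issues` is assumed to be an iterable of dicts with a "state" key
    ("OPEN" or "CLOSED") and a "labels" key (list of label names).
    """
    open_counts = Counter()
    total_counts = Counter()
    for issue in issues:
        # Issues without any label are grouped under "nolabels".
        labels = issue["labels"] or ["nolabels"]
        for label in labels:
            total_counts[label] += 1
            if issue["state"] == "OPEN":
                open_counts[label] += 1
    return open_counts, total_counts


# Hypothetical sample data, not real Kubeflow issues.
issues = [
    {"state": "OPEN", "labels": ["priority/p0", "area/katib"]},
    {"state": "CLOSED", "labels": ["priority/p0"]},
    {"state": "OPEN", "labels": []},
]
open_counts, total_counts = count_by_label(issues)
```

An issue with several labels is counted once under each of its labels, so the per-label totals can add up to more than the number of issues.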
  
## Setup GitHub

* You will need a GitHub personal access token in order to use the GitHub API
* See these [instructions](https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/) for creating a personal access token
  * You will need the scopes:
    * repo
    * read:org    
* Set the environment variable `GITHUB_TOKEN` to pass your token to the code
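
As a rough sketch of how the token might be consumed, the snippet below reads `GITHUB_TOKEN` and prepares an authenticated GraphQL request without sending it. The `build_graphql_request` helper is hypothetical and is not part of `project_stats.py`; the endpoint and the `Authorization: bearer` header follow GitHub's GraphQL documentation.

```python
import json
import os

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"


def build_graphql_request(query, token=None):
    """Return (url, headers, body) for a GitHub GraphQL query.

    Falls back to the GITHUB_TOKEN environment variable when no token
    is passed explicitly.
    """
    token = token or os.environ.get("GITHUB_TOKEN")
    if not token:
        raise ValueError("Set the GITHUB_TOKEN environment variable")
    headers = {
        "Authorization": "bearer " + token,
        "Content-Type": "application/json",
    }
    body = json.dumps({"query": query})
    return GITHUB_GRAPHQL_URL, headers, body


# "example-token" is a placeholder; a real personal access token is needed
# to actually send the request.
url, headers, body = build_graphql_request(
    "query { viewer { login } }", token="example-token")
```

The returned values can be handed to any HTTP client as a `POST` request; failing early when the token is missing gives a clearer error than an HTTP 401 from the API.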

## Setup Plot.ly Online

* In order to use plot.ly to publish the plot you need to create a plot.ly account and get an API key
* Follow plot.ly's [getting started guide](https://plot.ly/python/getting-started/)
* Store your API key in `~/.plotly/.credentials`
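
Before publishing, it can help to check that the credentials file is in place. The sketch below assumes the file is JSON containing at least `username` and `api_key` entries, which is how the plotly tooling has typically written it; the `read_plotly_credentials` helper is hypothetical.

```python
import json
import os


def read_plotly_credentials(path=None):
    """Load the plot.ly credentials file and check the expected keys.

    Assumes a JSON file with non-empty "username" and "api_key" entries.
    """
    path = path or os.path.expanduser("~/.plotly/.credentials")
    with open(path) as f:
        creds = json.load(f)
    missing = [key for key in ("username", "api_key") if not creds.get(key)]
    if missing:
        raise ValueError("Missing entries in %s: %s" % (path, ", ".join(missing)))
    return creds
```

Calling `read_plotly_credentials()` with no arguments checks the default location and raises an error if the file is absent or incomplete.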

In [8]:
# Use plotly cufflinks to plot pandas DataFrames:
# https://plot.ly/ipython-notebooks/cufflinks/
#
# Instructions for offline plotting:
# https://plot.ly/python/getting-started/#initialization-for-offline-plotting
#
# Instructions for online plotting (you will need to set up an account):
# https://plot.ly/python/getting-started/
import plotly
import plotly.plotly as py
import plotly.graph_objs as go
import cufflinks as cf
from importlib import reload
import itertools

In [9]:
import project_stats
reload(project_stats)


<module 'project_stats' from '/home/jlewi/git_kubeflow-community/scripts/project_stats.py'>

In [10]:
c = project_stats.ProjectStats()
c.main()

Make plots showing different groups of labels

* The columns of `c.stats` form a multi-level index: the first level is the count type (e.g. `open` or `total`) and the second level is the label
* See [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html) for instructions on multilevel indexes
   * To select columns, we specify a list of tuples where each tuple gives the item to select at each level of the index
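
The tuple-based selection can be tried on a small stand-in frame. The data below is made up for illustration and only mimics the layout of `c.stats` (a time index and two-level columns); the variable names `stats` and `selected` are assumptions.

```python
import datetime
import itertools

import pandas as pd

counts = ["open", "total"]
all_labels = ["priority/p0", "priority/p1", "priority/p2"]

# Two-level columns: count type on the first level, label on the second.
column_index = pd.MultiIndex.from_product([counts, all_labels], names=[None, "label"])
time_index = pd.DatetimeIndex(["2018-12-30", "2019-01-03", "2019-01-07"], name="time")
stats = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6],
     [2, 3, 4, 5, 6, 7],
     [3, 4, 5, 6, 7, 8]],
    index=time_index,
    columns=column_index,
)

# Each tuple names one column: (count type, label).
labels = ["priority/p0", "priority/p1"]
columns = list(itertools.product(counts, labels))

# Boolean mask over the time index, combined with the column tuples in .loc.
recent = stats.index > datetime.datetime(2019, 1, 1)
selected = stats.loc[recent, columns]
```

`selected` keeps the two rows after the start date and the four `(count, label)` columns that were requested, in the order the tuples were listed.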

In [14]:
import datetime

# Select the open and total counts for each label of interest.
counts = ["open", "total"]
#labels = ["cuj/build-train-deploy", "cuj/multi-user", "area/katib"]
labels = ["priority/p0", "priority/p1", "priority/p2"]
columns = list(itertools.product(counts, labels))

# Only plot samples taken after the start date.
start = datetime.datetime(2019, 1, 1)
i = c.stats.index > start
c.stats.loc[i, columns].iplot(kind='scatter', width=5, filename='project-stats', title='0.5.0 Issue Count')


In [12]:
c.stats.iloc[-1][columns]

       label                 
open   cuj/build-train-deploy    25
       cuj/multi-user            14
       area/katib                 8
total  cuj/build-train-deploy    33
       cuj/multi-user            19
       area/katib                 9
Name: 2019-02-16 00:33:48, dtype: int64

In [13]:
import datetime
start = datetime.datetime(2019, 1, 1)

# Boolean mask selecting the samples taken after the start date.
i = c.stats.index > start
c.stats.loc[i]


| | open | open | open | open | open | open | open | open | open | open | ... | total | total | total | total | total | total | total | total | total | total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| label | addition/feature | api/v1alpha2 | area/0.3.0 | area/0.4.0 | area/0.5.0 | area/1.0.0 | area/ambassador | area/api | area/api/v1beta1 | area/bootstrap | ... | improvement/optimization | kind/bug | kind/feature | nolabels | platform/azure | platform/gcp | priority/p0 | priority/p1 | priority/p2 | priority/p3 |
| time | | | | | | | | | | | | | | | | | | | | | |
| 2019-01-03 12:16:47 | 4 | 4 | 1 | 2 | 17 | 1 | 1 | 2 | 1 | 8 | ... | 1 | 1 | 1 | 1 | 1 | 12 | 7 | 72 | 22 | 2 |
| 2019-01-03 19:45:38 | 4 | 4 | 1 | 2 | 17 | 1 | 1 | 2 | 1 | 8 | ... | 1 | 1 | 1 | 1 | 1 | 12 | 7 | 73 | 22 | 2 |
| 2019-01-04 02:34:32 | 4 | 4 | 1 | 2 | 17 | 1 | 1 | 2 | 1 | 8 | ... | 1 | 1 | 1 | 1 | 1 | 12 | 7 | 73 | 23 | 2 |
| 2019-01-04 02:35:17 | 4 | 4 | 1 | 2 | 17 | 1 | 1 | 2 | 1 | 8 | ... | 1 | 1 | 1 | 1 | 1 | 12 | 7 | 73 | 24 | 2 |
| 2019-01-06 21:20:15 | 4 | 4 | 1 | 2 | 17 | 1 | 1 | 2 | 1 | 8 | ... | 1 | 1 | 1 | 1 | 1 | 12 | 7 | 74 | 24 | 2 |
| 2019-01-06 21:51:20 | 4 | 4 | 1 | 2 | 17 | 1 | 1 | 2 | 1 | 8 | ... | 1 | 1 | 1 | 1 | 1 | 12 | 7 | 74 | 25 | 2 |
| 2019-01-07 01:40:27 | 4 | 4 | 1 | 2 | 17 | 1 | 1 | 2 | 1 | 8 | ... | 1 | 2 | 1 | 1 | 1 | 12 | 7 | 75 | 25 | 2 |
| 2019-01-07 01:51:43 | 4 | 4 | 1 | 2 | 17 | 1 | 1 | 2 | 1 | 8 | ... | 1 | 2 | 1 | 1 | 1 | 12 | 7 | 76 | 25 | 2 |
| 2019-01-07 04:19:07 | 4 | 4 | 1 | 2 | 17 | 1 | 1 | 2 | 1 | 8 | ... | 1 | 2 | 1 | 1 | 1 | 12 | 7 | 77 | 25 | 2 |
| 2019-01-07 14:36:30 | 4 | 4 | 1 | 2 | 17 | 1 | 1 | 2 | 1 | 8 | ... | 1 | 2 | 1 | 1 | 1 | 12 | 7 | 78 | 25 | 2 |
