# Compute Project Stats

* This notebook uses the GitHub [GraphQL API](https://developer.github.com/v4/) to compute the number of open and 
  closed bugs pertaining to Kubeflow GitHub Projects
  * Stats are broken down by labels
* Results are plotted using [plotly](https://plot.ly)
  * Plots are currently published on plot.ly for sharing; they are publicly viewable
  
## Setup GitHub

* You will need a GitHub personal access token in order to use the GitHub API
* See these [instructions](https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/) for creating a personal access token
  * You will need the scopes:
    * repo
    * read:org    
* Set the environment variable `GITHUB_TOKEN` to pass your token to the code
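
As a minimal sketch of how the token might be used (Python 3, standard library only), the snippet below builds an authenticated request for the GitHub GraphQL endpoint without sending it. The helper name `build_graphql_request` and the `viewer` query are illustrative assumptions; they are not taken from `project_stats.py`.

```python
import json
import os
import urllib.request

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# Illustrative query: the login of the user who owns the token.
VIEWER_QUERY = "query { viewer { login } }"


def build_graphql_request(query, variables=None, token=None):
    """Build (but do not send) an authenticated GraphQL POST request.

    Hypothetical helper: falls back to the GITHUB_TOKEN environment
    variable when no token is passed explicitly.
    """
    token = token or os.environ.get("GITHUB_TOKEN")
    if not token:
        raise ValueError("Set the GITHUB_TOKEN environment variable first")
    payload = json.dumps({"query": query, "variables": variables or {}})
    return urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=payload.encode("utf-8"),
        headers={
            "Authorization": "bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# To actually send the request (requires network access and a valid token):
# with urllib.request.urlopen(build_graphql_request(VIEWER_QUERY)) as resp:
#     print(json.load(resp))
```

If the commented-out call fails with an HTTP 401 error, the token is most likely missing, expired, or mistyped.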

## Setup Plot.ly Online

* In order to use plot.ly to publish the plot you need to create a plot.ly account and get an API key
* Follow plot.ly's [getting started guide](https://plot.ly/python/getting-started/)
* Store your API key in `~/.plotly/.credentials`
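
Before publishing a plot it can help to confirm that the credentials file is in place. The sketch below assumes the file is JSON with `username` and `api_key` entries, which is the layout written by plotly's credential setup; the helper name `load_plotly_credentials` is hypothetical.

```python
import json
import os


def load_plotly_credentials(path="~/.plotly/.credentials"):
    """Return the parsed credentials, or raise if a required entry is empty.

    Assumes the file is JSON containing "username" and "api_key" keys.
    """
    with open(os.path.expanduser(path)) as f:
        creds = json.load(f)
    missing = [key for key in ("username", "api_key") if not creds.get(key)]
    if missing:
        raise ValueError("Missing plot.ly credentials: " + ", ".join(missing))
    return creds
```

Calling `load_plotly_credentials()` with no arguments checks the default location named above.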

In [8]:
# Use plotly cufflinks to plot data frames
# https://plot.ly/ipython-notebooks/cufflinks/
# instructions for offline plotting
# https://plot.ly/python/getting-started/#initialization-for-offline-plotting
#
# Follow the instructions for online plotting:
# https://plot.ly/python/getting-started/
# You will need to setup an account
import plotly
import plotly.plotly as py
import plotly.graph_objs as go
import cufflinks as cf
# Python 3 only: reload() is no longer a builtin there, so uncomment the next line
#from importlib import reload
import itertools

In [15]:
import project_stats
reload(project_stats)


<module 'project_stats' from 'project_stats.py'>

In [17]:
c = project_stats.ProjectStats()
c.main()

Make plots showing different groups of labels

* The columns of `c.stats` form a multi-level index: the first level is the count type (`open` or `total`) and the second level is the label
* See [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html) for instructions on multi-level indexes
  * To select columns, we pass a list of tuples; each tuple names the item to select at each level of the index, in order
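
To make the tuple-based selection concrete, here is a small self-contained sketch on a toy frame shaped like `c.stats`. The frame `stats` and its values are made up for illustration; only the selection pattern mirrors the cell below.

```python
import itertools

import pandas as pd

# Toy stand-in for c.stats: rows are timestamps, columns are (count, label) pairs.
cols = pd.MultiIndex.from_product(
    [["open", "total"], ["priority/p0", "priority/p1", "area/docs"]],
    names=[None, "label"],
)
stats = pd.DataFrame(
    [[1, 6, 3, 5, 43, 4],
     [1, 7, 3, 5, 44, 4]],
    index=pd.to_datetime(["2019-01-03", "2019-01-07"]),
    columns=cols,
)
stats.index.name = "time"

# Each tuple is (level-0 value, level-1 value); the list picks four columns.
wanted = list(itertools.product(["open", "total"], ["priority/p0", "priority/p1"]))
subset = stats.loc[:, wanted]

# The most recent row, restricted to the selected columns.
latest = subset.iloc[-1]
```

Labels left out of `wanted` (here `area/docs`) are dropped from `subset` at both the `open` and `total` levels.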

In [18]:
counts = ["open", "total"]
#labels = ["cuj/build-train-deploy", "cuj/multi-user", "area/katib"]
labels = ["priority/p0", "priority/p1", "priority/p2"]
columns = list(itertools.product(counts, labels))

import datetime
start=datetime.datetime(2019, 1, 1)

i = c.stats.index > start
#c.stats.iloc[i]
c.stats.loc[i, columns].iplot(kind='scatter', width=5, filename='project-stats', title='0.5.0 Issue Count')


In [19]:
c.stats.iloc[-1][columns]

       label      
open   priority/p0      1
       priority/p1      6
       priority/p2      2
total  priority/p0     23
       priority/p1    130
       priority/p2     14
Name: 2019-04-07 15:14:07, dtype: int64

In [20]:
import datetime
start=datetime.datetime(2019, 1, 1)

i = c.stats.index > start
c.stats.iloc[i]


| time | open: addition/feature | open: api/v1alpha2 | open: area/0.4.0 | open: area/0.5.0 | open: area/ambassador | open: area/api | open: area/api/v1beta1 | open: area/bootstrap | open: area/centraldashboard | open: area/docs | ... | total: kind/feature | total: kind/investigate | total: nolabels | total: platform/azure | total: platform/gcp | total: platform/onprem | total: priority/p0 | total: priority/p1 | total: priority/p2 | total: priority/p3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2019-01-03 12:16:47 | 2 | 4 | 4 | 12 | 3 | 1 | 1 | 4 | 0 | 3 | ... | 1 | 0 | 0 | 1 | 6 | 0 | 5 | 43 | 10 | 1 |
| 2019-01-04 07:19:45 | 2 | 4 | 4 | 12 | 3 | 1 | 1 | 4 | 0 | 3 | ... | 1 | 0 | 0 | 1 | 6 | 0 | 5 | 43 | 10 | 1 |
| 2019-01-04 08:08:16 | 2 | 4 | 4 | 12 | 3 | 1 | 1 | 4 | 0 | 3 | ... | 1 | 0 | 0 | 1 | 6 | 0 | 5 | 43 | 10 | 1 |
| 2019-01-04 14:12:55 | 2 | 4 | 4 | 12 | 3 | 1 | 1 | 4 | 0 | 3 | ... | 1 | 0 | 0 | 1 | 6 | 0 | 5 | 43 | 10 | 1 |
| 2019-01-05 20:48:50 | 2 | 4 | 3 | 12 | 3 | 1 | 1 | 4 | 0 | 3 | ... | 1 | 0 | 0 | 1 | 6 | 0 | 5 | 43 | 10 | 1 |
| 2019-01-07 01:40:27 | 2 | 4 | 3 | 12 | 3 | 1 | 1 | 4 | 0 | 3 | ... | 1 | 0 | 0 | 1 | 6 | 0 | 5 | 44 | 10 | 1 |
| 2019-01-07 01:51:43 | 2 | 4 | 3 | 12 | 3 | 1 | 1 | 4 | 0 | 3 | ... | 1 | 0 | 0 | 1 | 6 | 0 | 5 | 45 | 10 | 1 |
| 2019-01-07 04:19:07 | 2 | 4 | 3 | 12 | 3 | 1 | 1 | 4 | 0 | 3 | ... | 1 | 0 | 0 | 1 | 6 | 0 | 5 | 46 | 10 | 1 |
| 2019-01-07 14:36:30 | 2 | 4 | 3 | 12 | 3 | 1 | 1 | 4 | 0 | 4 | ... | 1 | 0 | 0 | 1 | 6 | 0 | 5 | 47 | 10 | 1 |
| 2019-01-07 15:08:48 | 2 | 4 | 3 | 11 | 3 | 1 | 1 | 3 | 0 | 4 | ... | 1 | 0 | 0 | 1 | 6 | 0 | 5 | 47 | 10 | 1 |


In [21]:
c.data

| | delta | label | time | total_delta |
|---|---|---|---|---|
| 0 | 1.0 | area/kfctl | 2019-04-04 05:30:30 | 1.0 |
| 1 | 1.0 | priority/p0 | 2019-04-04 05:30:30 | 1.0 |
| 2 | 1.0 | area/kfctl | 2019-02-04 18:18:17 | 1.0 |
| 3 | 1.0 | priority/p1 | 2019-02-04 18:18:17 | 1.0 |
| 4 | 1.0 | area/pipelines | 2019-03-21 00:25:25 | 1.0 |
| 5 | 1.0 | priority/p1 | 2019-03-21 00:25:25 | 1.0 |
| 6 | 1.0 | area/0.5.0 | 2019-03-15 21:22:55 | 1.0 |
| 7 | 1.0 | priority/p1 | 2019-03-15 14:56:27 | 1.0 |
| 8 | 1.0 | area/docs | 2019-01-07 14:36:30 | 1.0 |
| 9 | 1.0 | area/jupyter | 2019-01-07 14:36:30 | 1.0 |
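
One plausible way to relate per-event rows like those above to the cumulative `open` and `total` columns is a pivot followed by a running sum. This is only a sketch under stated assumptions, not necessarily what `project_stats.py` does: it assumes `delta` is the change in open issues and `total_delta` the change in all issues, and the closing event (`delta` of -1) is hypothetical, since the rows shown contain only +1 values.

```python
import pandas as pd

# Small sample of events; the last row is a hypothetical "issue closed" event.
events = pd.DataFrame({
    "time": pd.to_datetime([
        "2019-01-07 14:36:30",
        "2019-02-04 18:18:17",
        "2019-03-15 14:56:27",
        "2019-03-20 00:00:00",
    ]),
    "label": ["area/docs", "priority/p1", "priority/p1", "priority/p1"],
    "delta": [1.0, 1.0, 1.0, -1.0],
    "total_delta": [1.0, 1.0, 1.0, 0.0],
})


def cumulative_counts(events):
    """Pivot events to one column per (count, label) and take a running sum."""
    by_time = events.pivot_table(
        index="time",
        columns="label",
        values=["delta", "total_delta"],
        aggfunc="sum",
        fill_value=0,
    )
    counts = by_time.sort_index().cumsum()
    # Rename the first column level to match the "open"/"total" naming.
    return counts.rename(columns={"delta": "open", "total_delta": "total"}, level=0)


counts = cumulative_counts(events)
```

With this layout, `counts.iloc[-1]` gives the latest open and total counts per label, in the same shape as the `c.stats.iloc[-1][columns]` output shown earlier.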
