# Compute Project Stats

* This notebook uses the GitHub [GraphQL API](https://developer.github.com/v4/) to compute the number of open and 
  closed bugs pertaining to Kubeflow GitHub Projects
  * Stats are broken down by labels
* Results are plotted using [plotly](https://plot.ly)
  * Plots are currently published on plot.ly for sharing; they are publicly viewable
  
## Setup GitHub

* You will need a GitHub personal access token in order to use the GitHub API
* See these [instructions](https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/) for creating a personal access token
  * You will need the scopes:
    * repo
    * read:org    
* Set the environment variable `GITHUB_TOKEN` to pass your token to the code
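
As a rough illustration of how the token reaches the API, the sketch below builds an authenticated request for GitHub's GraphQL endpoint using only the standard library. The helper name `build_graphql_request` is hypothetical and is not part of `project_stats`, which may construct its requests differently.

```python
import json
import os
import urllib.request

# Public endpoint of the GitHub GraphQL (v4) API.
GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"


def build_graphql_request(query, token=None):
    """Build (but do not send) an authenticated GraphQL request.

    Hypothetical helper for illustration only. Falls back to the
    GITHUB_TOKEN environment variable when no token is passed.
    """
    token = token or os.environ.get("GITHUB_TOKEN")
    if not token:
        raise RuntimeError("Set the GITHUB_TOKEN environment variable first")
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=body,  # supplying a body makes this a POST request
        headers={
            "Authorization": "bearer " + token,
            "Content-Type": "application/json",
        },
    )
```

Passing the result of `build_graphql_request("query { viewer { login } }")` to `urllib.request.urlopen` should return the login of the account that owns the token, which is a quick way to check that the token is set correctly.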

## Setup Plot.ly Online

* In order to use plot.ly to publish the plot you need to create a plot.ly account and get an API key
* Follow plot.ly's [getting started guide](https://plot.ly/python/getting-started/)
* Store your API key in `~/.plotly/.credentials`
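
Before publishing a plot, it can help to confirm that the credentials file is in place. The sketch below is a minimal check and assumes the file is JSON with `username` and `api_key` fields, which is the layout plotly writes; the helper name `missing_credentials` is hypothetical.

```python
import json
import os


def missing_credentials(path="~/.plotly/.credentials"):
    """Return the credential fields that are absent or empty in the file.

    Hypothetical helper: assumes a JSON file containing at least
    "username" and "api_key".
    """
    required = ("username", "api_key")
    full_path = os.path.expanduser(path)
    if not os.path.exists(full_path):
        # No file at all, so every required field is missing.
        return list(required)
    with open(full_path) as f:
        stored = json.load(f)
    return [key for key in required if not stored.get(key)]
```

An empty list from `missing_credentials()` suggests that both fields are set; anything else names the fields that still need a value.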

In [3]:
# Use plotly cufflinks to plot data frames:
# https://plot.ly/ipython-notebooks/cufflinks/
# Instructions for offline plotting:
# https://plot.ly/python/getting-started/#initialization-for-offline-plotting
#
# Instructions for online plotting (you will need to set up an account):
# https://plot.ly/python/getting-started/
import plotly
import plotly.plotly as py
import plotly.graph_objs as go
import cufflinks as cf
#from importlib import reload
import itertools

In [4]:
import project_stats
#reload(project_stats)


In [5]:
c = project_stats.ProjectStats(project="0.6.0")
c.main()

Make plots showing different groups of labels

* The columns of `c.stats` form a multi-level index: the first level is the count (for example `open` or `total`) and the second level is the label
* See [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html) for instructions on multi-level indexes
  * To select columns, we specify a list of tuples, where each tuple gives the item to select at each level of the index
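
The selection described above can be sketched on a small, made-up data frame. The frame `stats` below is hypothetical and only imitates the two-level column layout of `c.stats`; its values are invented for illustration.

```python
import datetime
import itertools

import pandas as pd

# Hypothetical miniature of the stats frame: two column levels
# (count, label), indexed by time. The values are made up.
column_index = pd.MultiIndex.from_product(
    [["open", "total"], ["priority/p0", "priority/p1", "kind/bug"]],
    names=["count", "label"],
)
stats = pd.DataFrame(
    [[1, 4, 2, 3, 6, 5], [2, 5, 2, 4, 8, 6]],
    index=pd.to_datetime(["2019-01-03", "2019-01-04"]),
    columns=column_index,
)

# One (count, label) tuple per column to keep.
counts = ["open", "total"]
labels = ["priority/p0", "priority/p1"]
columns = list(itertools.product(counts, labels))

# Select those columns for every row.
subset = stats.loc[:, columns]

# Combine the column selection with a boolean row mask on the time index.
recent = stats.loc[stats.index > datetime.datetime(2019, 1, 3), columns]
```

Here `subset` keeps the four priority columns and drops both `kind/bug` columns, while `recent` additionally keeps only the rows recorded after the cutoff.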

In [6]:
import datetime

# First level of the column index: which counts to plot.
counts = ["open", "total"]
# Second level of the column index: which labels to plot.
#labels = ["cuj/build-train-deploy", "cuj/multi-user", "area/katib"]
labels = ["priority/p0", "priority/p1", "priority/p2"]
# One (count, label) tuple per column to select.
columns = list(itertools.product(counts, labels))

# Only plot data points recorded after this date.
start = datetime.datetime(2019, 1, 1)
i = c.stats.index > start
c.stats.loc[i, columns].iplot(kind='scatter', width=5, filename='project-stats', title='{0} Issue Count'.format(c.project))






In [7]:
c.stats.iloc[-1][columns]

       label      
open   priority/p0      8
       priority/p1    105
       priority/p2     32
total  priority/p0     10
       priority/p1    129
       priority/p2     35
Name: 2019-05-03 22:57:02, dtype: int64

In [8]:
import datetime
start=datetime.datetime(2019, 1, 1)

i = c.stats.index > start
c.stats.iloc[i]


| time | open: addition/feature | open: area/0.3.0 | open: area/0.4.0 | open: area/0.5.0 | open: area/0.6.0 | open: area/1.0.0 | open: area/bootstrap | open: area/build-release | open: area/centraldashboard | open: area/deployment | ... | total: improvement/enhancement | total: kind/bug | total: kind/feature | total: nolabels | total: platform/aws | total: platform/gcp | total: platform/minikube | total: priority/p0 | total: priority/p1 | total: priority/p2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2019-01-03 19:45:38 | 0 | 1 | 3 | 5 | 0 | 1 | 7 | 2 | 0 | 0 | ... | 1 | 0 | 0 | 1 | 0 | 9 | 1 | 3 | 32 | 14 |
| 2019-01-04 02:34:32 | 0 | 1 | 3 | 5 | 0 | 1 | 7 | 3 | 0 | 0 | ... | 1 | 0 | 0 | 1 | 0 | 9 | 1 | 3 | 32 | 15 |
| 2019-01-04 02:35:17 | 0 | 1 | 3 | 5 | 0 | 1 | 7 | 4 | 0 | 0 | ... | 1 | 0 | 0 | 1 | 0 | 9 | 1 | 3 | 32 | 16 |
| 2019-01-06 21:20:15 | 0 | 1 | 3 | 5 | 0 | 1 | 7 | 4 | 0 | 0 | ... | 1 | 0 | 0 | 1 | 0 | 9 | 1 | 3 | 33 | 16 |
| 2019-01-06 21:51:20 | 0 | 1 | 3 | 5 | 0 | 1 | 7 | 4 | 0 | 0 | ... | 1 | 0 | 0 | 1 | 0 | 9 | 1 | 3 | 33 | 17 |
| 2019-01-07 15:06:42 | 0 | 1 | 3 | 5 | 0 | 1 | 7 | 4 | 0 | 0 | ... | 1 | 0 | 0 | 1 | 0 | 9 | 1 | 3 | 34 | 17 |
| 2019-01-08 21:01:44 | 0 | 1 | 3 | 5 | 0 | 1 | 7 | 4 | 0 | 0 | ... | 1 | 0 | 0 | 1 | 0 | 9 | 1 | 3 | 34 | 18 |
| 2019-01-11 23:45:09 | 0 | 1 | 3 | 5 | 0 | 1 | 7 | 4 | 0 | 0 | ... | 1 | 0 | 0 | 1 | 0 | 9 | 1 | 3 | 35 | 18 |
| 2019-01-14 00:38:51 | 0 | 1 | 3 | 5 | 0 | 1 | 7 | 4 | 0 | 0 | ... | 1 | 0 | 0 | 1 | 0 | 9 | 1 | 3 | 36 | 18 |
| 2019-01-14 00:44:31 | 0 | 1 | 3 | 5 | 0 | 1 | 7 | 4 | 0 | 0 | ... | 1 | 0 | 0 | 1 | 0 | 9 | 1 | 3 | 37 | 18 |


In [9]:
c.data

| | time | delta | label | total_delta |
|---|---|---|---|---|
| 0 | 2019-05-03 22:57:02 | 1.0 | area/testing | 1.0 |
| 1 | 2019-05-03 22:57:02 | 1.0 | help wanted | 1.0 |
| 2 | 2019-05-03 22:57:02 | 1.0 | priority/p1 | 1.0 |
| 3 | 2019-05-03 22:40:54 | 1.0 | area/testing | 1.0 |
| 4 | 2019-05-03 22:40:54 | 1.0 | help wanted | 1.0 |
| 5 | 2019-05-03 22:40:54 | 1.0 | priority/p1 | 1.0 |
| 6 | 2019-05-01 04:14:04 | 1.0 | area/front-end | 1.0 |
| 7 | 2019-05-01 04:14:04 | 1.0 | priority/p1 | 1.0 |
| 8 | 2019-04-04 18:55:51 | 1.0 | priority/p1 | 1.0 |
| 9 | 2019-04-30 22:04:11 | 1.0 | priority/p1 | 1.0 |
