# Compute Project Stats
*Ack: Derived from project_stats.ipynb and project_stats.py*

* This notebook uses the GitHub [GraphQL API](https://developer.github.com/v4/) to compute the number of open and 
  closed bugs pertaining to Kubeflow GitHub Projects
  * Stats are broken down by labels
* Results are plotted using [plotly](https://plot.ly)
  * Plots are currently published on plot.ly for sharing; they are publicly viewable by anyone
  
## Setup GitHub

* You will need a GitHub personal access token in order to use the GitHub API
* See these [instructions](https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/) for creating a personal access token
  * You will need the scopes:
    * repo
    * read:org    
* Set the environment variable `GITHUB_TOKEN` to pass your token to the code
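The `project_stats.py` module is not reproduced here, so this is only a minimal sketch of how a GraphQL call can be authenticated with `GITHUB_TOKEN`, using just the standard library. The helper names `github_token` and `build_graphql_request` and the example query are illustrative assumptions, not the notebook's own code.

```python
import json
import os
import urllib.request

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# Hypothetical example query: count the open issues in one repository.
# The queries actually issued by project_stats.py are not shown in this notebook.
OPEN_ISSUES_QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    issues(states: [OPEN]) { totalCount }
  }
}
"""


def github_token():
    """Read the token from GITHUB_TOKEN, failing with a clear message if it is unset."""
    token = os.environ.get("GITHUB_TOKEN")
    if not token:
        raise RuntimeError(
            "Set the GITHUB_TOKEN environment variable to a GitHub personal access token")
    return token


def build_graphql_request(query, variables, token):
    """Build (but do not send) an authenticated POST request to the GraphQL API."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=body,
        headers={"Authorization": f"bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the request to `urllib.request.urlopen` sends it; GitHub returns JSON with the query result under the `data` key. Building the request separately keeps the token handling testable without network access.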

## Setup Plot.ly Online

* To publish plots on plot.ly, you need to create a plot.ly account and get an API key
* Follow plot.ly's [getting started guide](https://plot.ly/python/getting-started/)
* Store your API key in `~/.plotly/.credentials`

```python
# Using Plotly v4.1.1 doesn't require account creation. Works offline.
# https://plot.ly/python/getting-started/
import plotly.graph_objs as go
import itertools
import math
import pandas
```

```python
import project_stats
```

## Process issues from the specified Project

```python
c = project_stats.ProjectStats(project="KF1.0")
c.main()
```

Make plots showing different groups of labels.

* The columns of `c.stats` are a multi-level index (`pandas.MultiIndex`)
* See the pandas [MultiIndex / advanced indexing guide](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html) for details
  * To select columns, pass a list of tuples; each tuple gives the item to select at the corresponding level of the index
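A toy illustration of tuple-based selection. It assumes, as the `stats_df['open'][effort_labels]` lookups in this notebook imply, that the first column level holds `open`/`total` and the second holds label names; the level names, labels, and counts here are made up.

```python
import pandas as pd

# Toy stand-in for c.stats (made-up labels and counts): the first column
# level is the count type, the second is the GitHub label.
columns = pd.MultiIndex.from_tuples(
    [("open", "area/docs"), ("open", "priority/p0"),
     ("total", "area/docs"), ("total", "priority/p0")],
    names=["count", "label"],
)
stats = pd.DataFrame(
    [[3, 1, 5, 2],
     [2, 1, 6, 2]],
    index=pd.to_datetime(["2019-09-17", "2019-09-18"]),
    columns=columns,
)

# One tuple per column: each element picks the item at that level.
p0 = stats.loc[:, [("open", "priority/p0"), ("total", "priority/p0")]]

# Selecting only the first level drops it, leaving label-named columns.
open_counts = stats["open"]
```

The second form is what the effort functions use: `stats_df['open']` yields a frame indexed by label, which can then be sliced with a plain list of label names.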

### Compute expected effort 


```python
# Cost of each effort label, in workdays per engineer
effort_labels_costs = {
    "effort/1-day": 1,
    "effort/1-days": 1,
    "effort/3-days": 3,
    "effort/5-days": 5,
    "effort/2-weeks": 10,
    # Removed temporarily because no issues in KF1.0 use this label yet
    # "effort/2-weeks+": 20,
}

effort_labels = list(effort_labels_costs.keys())
```

```python
def current_effort_distribution(stats_df, open_or_total='open'):
    """Return the latest row of issue counts for each effort label."""
    return stats_df[open_or_total][effort_labels].tail(1)

def open_effort_distribution(stats_df):
    return current_effort_distribution(stats_df, 'open')

def total_effort_distribution(stats_df):
    return current_effort_distribution(stats_df, 'total')

def current_effort_weeks(stats_df, open_or_total='open'):
    """Convert the latest effort-label counts into engineer-weeks, rounded up."""
    e = current_effort_distribution(stats_df, open_or_total)
    days = sum(e.iloc[0, e.columns.get_loc(label)] * effort_labels_costs[label]
               for label in effort_labels)
    # Assume 5 workdays per week
    return math.ceil(days / 5)

def open_effort_weeks(stats_df):
    return current_effort_weeks(stats_df, 'open')

def total_effort_weeks(stats_df):
    return current_effort_weeks(stats_df, 'total')
```

```python
open_effort_distribution(c.stats)
```

| time | effort/1-day | effort/1-days | effort/3-days | effort/5-days | effort/2-weeks |
|---|---|---|---|---|---|
| 2019-09-18 13:18:46 | 4 | 6 | 7 | 7 | 7 |


```python
total_effort_distribution(c.stats)
```

| time | effort/1-day | effort/1-days | effort/3-days | effort/5-days | effort/2-weeks |
|---|---|---|---|---|---|
| 2019-09-18 13:18:46 | 4 | 6 | 7 | 7 | 7 |


```python
print(open_effort_weeks(c.stats))
```

28


```python
print(total_effort_weeks(c.stats))
```

28
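The 28 weeks reported above can be checked by hand from the 2019-09-18 counts in the effort tables and the label costs. This standalone check restates both as plain dicts rather than reading them from `c.stats`:

```python
import math

# Label costs in workdays (same values as effort_labels_costs) and the
# open counts from the 2019-09-18 snapshot shown in the tables above.
costs = {"effort/1-day": 1, "effort/1-days": 1, "effort/3-days": 3,
         "effort/5-days": 5, "effort/2-weeks": 10}
open_counts = {"effort/1-day": 4, "effort/1-days": 6, "effort/3-days": 7,
               "effort/5-days": 7, "effort/2-weeks": 7}

# 4*1 + 6*1 + 7*3 + 7*5 + 7*10 workdays
days = sum(open_counts[label] * cost for label, cost in costs.items())
weeks = math.ceil(days / 5)  # 5 workdays per week, rounded up
print(days, weeks)  # 136 28
```

Since 136 / 5 = 27.2, rounding up gives 28. The open and total figures match because the open and total counts in the snapshot are identical.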


```python
# Open vs. total issue counts for each effort label, as a grouped bar chart
df = pandas.DataFrame({}, index=effort_labels)
df['open'] = open_effort_distribution(c.stats).values[0]
df['total'] = total_effort_distribution(c.stats).values[0]

data = [
    go.Bar(
        x=effort_labels,
        y=df[col].values,
        name=col,
    )
    for col in df.columns
]

go.Figure(data=data)
```

## Compute Tracking Issue Stats


```python
import os
import re
# Github is provided by the PyGithub package
from github import Github
```

```python
# NOTE: Set the GITHUB_TOKEN environment variable to your GitHub access token
gh = Github(os.environ['GITHUB_TOKEN'])
# Find the Kubeflow org among the orgs the token's user belongs to
kf_org, = [o for o in gh.get_user().get_orgs() if o.name == 'Kubeflow']
kf_org.name
```

'Kubeflow'

```python
def parse_issue_url(url_str):
    """Parse a GitHub issue URL to extract the repo name and the issue id.

    url_str: URL of the issue, e.g. https://github.com/kubeflow/kfctl/issues/18
    returns (repo_name, issue_id)
    """
    url_parts = url_str.split('/')
    issue_id = url_parts[-1]
    repo_name = url_parts[-3]
    return (repo_name, int(issue_id))
```
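`parse_issue_url` relies purely on position: the id is the last path segment and the repo is third from the end. A trailing slash therefore makes `int('')` fail, and non-issue URLs pass through unchecked. A stricter variant might look like this; `parse_issue_url_strict` is hypothetical and assumes URLs of the form `https://github.com/<owner>/<repo>/issues/<number>`.

```python
from urllib.parse import urlparse


def parse_issue_url_strict(url_str):
    """Hypothetical stricter variant of parse_issue_url.

    Accepts only https://github.com/<owner>/<repo>/issues/<number>
    (a trailing slash is tolerated) and returns (repo_name, issue_id).
    """
    parsed = urlparse(url_str)
    # Drop empty segments produced by leading or trailing slashes
    parts = [p for p in parsed.path.split("/") if p]
    if parsed.netloc != "github.com" or len(parts) != 4 or parts[2] != "issues":
        raise ValueError(f"not a GitHub issue URL: {url_str!r}")
    return parts[1], int(parts[3])
```

For example, `parse_issue_url_strict('https://github.com/kubeflow/kfctl/issues/18')` returns `('kfctl', 18)`, while a `/pull/` URL raises `ValueError` instead of being silently accepted.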

```python
def get_issue(gh_org, repo_name, issue_id):
    """Fetch the specified GitHub issue.

    gh_org: PyGithub Organization that owns the repo.
    repo_name: string, name of the repo.
    issue_id: int, issue number.
    """
    return gh_org.get_repo(repo_name).get_issue(issue_id)
```

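```python
def extract_referenced_issues(content):
    """Collect the GitHub issue references in the content.

    content: string, markdown body of the tracking issue.
    Matches short references such as "#123" and qualified ones such as
    "kubeflow/tf-operator#123".
    """
    # Allow digits, underscores, dots and hyphens in org/repo names
    # (e.g. tf-operator), not just lowercase letters and slashes.
    ref_issues = re.findall(r'[\w./-]*#\d+', content)
    return ref_issues
```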
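The pattern `[a-z/]*#\d+` accepts only lowercase letters and slashes before `#`, so a hyphenated repo name is cut short and then looked up as the wrong repo. A quick comparison against a widened character class, using a made-up tracking-issue body:

```python
import re

# Hypothetical tracking-issue body mixing short and qualified references.
body = """
- [ ] #4026 central dashboard
- [x] kubeflow/kfctl#18
- [ ] kubeflow/tf-operator#123
"""

original = re.findall(r'[a-z/]*#\d+', body)
widened = re.findall(r'[\w./-]*#\d+', body)
print(original)  # ['#4026', 'kubeflow/kfctl#18', 'operator#123']
print(widened)   # ['#4026', 'kubeflow/kfctl#18', 'kubeflow/tf-operator#123']
```

With the narrow pattern, `operator#123` would be resolved against a repo named `operator`, which does not exist in the Kubeflow org.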

```python
def get_issue_status(fq_issue_id, cur_repo):
    """Identify whether a GitHub issue is open or closed.

    fq_issue_id: string, GitHub-style issue reference ("#123" or "org/repo#123").
    cur_repo: string, name of the current repository. All short-form references
              are assumed to belong to this repository.

    returns 'open' or 'closed'.
    """
    parts = fq_issue_id.split('#')
    issue_id = int(parts[-1])
    repo_name = cur_repo
    if len(parts[0]) > 0:
        # Qualified reference: keep only the repo part of "org/repo"
        repo_name = parts[0].split('/')[-1]
    iss = get_issue(kf_org, repo_name, issue_id)
    return iss.state
```
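The parsing half of `get_issue_status` does not need the network. Pulling it into a pure function makes it easy to test; `split_issue_ref` is a hypothetical refactor, not part of `project_stats.py`.

```python
def split_issue_ref(fq_issue_id, cur_repo):
    """Hypothetical pure helper mirroring the parsing in get_issue_status.

    "#12" -> (cur_repo, 12); "org/repo#12" -> ("repo", 12).
    """
    prefix, _, number = fq_issue_id.partition("#")
    # Short references belong to the current repo; qualified ones name their own
    repo_name = prefix.split("/")[-1] if prefix else cur_repo
    return repo_name, int(number)
```

`get_issue_status` could then call `split_issue_ref` and keep only the `get_issue(...).state` lookup, leaving the string handling covered by plain unit tests.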

```python
def tracking_issue_state(issue_url):
    """Compute (closed, total) counts for the issues referenced by a tracking issue."""
    repo_name, tr_issue_id = parse_issue_url(issue_url)
    tr_issue = get_issue(kf_org, repo_name, tr_issue_id)
    # An issue with an empty description has body None
    ref_issues = extract_referenced_issues(tr_issue.body or '')
    total_issues = len(ref_issues)
    ref_issue_states = [get_issue_status(r, repo_name) for r in ref_issues]
    closed = ref_issue_states.count('closed')
    return closed, total_issues
```
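`tracking_issue_state` returns raw counts, and the completion percentage `closed * 100.0 / total` is undefined for a tracking issue that references no other issues. A small pure sketch of that summary step with a zero guard; `summarize_states` is a hypothetical helper:

```python
def summarize_states(states):
    """Hypothetical helper: (closed, total, percent) for a list of issue states.

    states: list of 'open' / 'closed' strings, as returned by get_issue_status.
    An empty list yields 0 percent instead of raising ZeroDivisionError.
    """
    closed = states.count("closed")
    total = len(states)
    percent = closed * 100.0 / total if total else 0.0
    return closed, total, percent
```

For instance, `summarize_states(['open', 'closed', 'closed'])` gives 2 closed out of 3, about 66.7 percent.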

```python
# Tracking issues for KF 1.0
kf1_tracking_issues = {
    'Kfctl': 'https://github.com/kubeflow/kfctl/issues/18',
    'Central Dashboard': 'https://github.com/kubeflow/kubeflow/issues/4026',
    'Notebooks Manager UI': 'https://github.com/kubeflow/kubeflow/issues/4062',
    'Notebooks Controller': 'https://github.com/kubeflow/kubeflow/issues/3656',
    'Profiles Controller': 'https://github.com/kubeflow/kubeflow/issues/4058',
    'KFServing Deployments': 'https://github.com/kubeflow/kubeflow/issues/4061',
}
```

```python
def plot_tracking_stats():
    """Fetch closed/total counts for each KF 1.0 tracking issue and plot
    percent complete as a horizontal bar chart."""
    rows = {}
    print('Fetching stats on...')
    for name, url in kf1_tracking_issues.items():
        print(name)
        closed, total = tracking_issue_state(url)
        # Guard against tracking issues that reference no other issues
        percent = closed * 100.0 / total if total else 0.0
        rows[name] = {'closed': closed, 'total': total, 'percent': percent}
    print("Done")
    tracking_df = pandas.DataFrame.from_dict(rows, orient='index')
    # Label each bar with "closed/total"
    bar_texts = [f"{r.closed}/{r.total}" for r in tracking_df.itertuples()]
    data = [
        go.Bar(
            y=tracking_df.index.values,
            x=tracking_df['percent'],
            orientation='h',
            text=bar_texts,
        )
    ]
    return go.Figure(data=data)
```

```python
plot_tracking_stats()
```

Fetching stats on...
Kfctl
Central Dashboard
Notebooks Manager UI
Notebooks Controller
Profiles Controller
KFServing Deployments
Done
