# Triage Issue

* This is a helper notebook that accompanies the triage module
* The notebook serves the following purposes
  1. It can be used to generate plots showing how promptly the Kubeflow project triages issues
  1. It provides snippets of code that can be executed to triage issues
  1. It provides snippets of code that can be used to collect some of the information needed by the triage code
* Issues needing triage are added to the [Needs Triage Project](https://github.com/orgs/kubeflow/projects/26)
  * Kubeflow maintainers can look at the Kanban board to identify issues needing triage and triage them

## Setup

* The cells below import required libraries and do some other housekeeping

In [6]:
import matplotlib
import importlib
import logging
import sys
import os
import datetime
from dateutil import parser as dateutil_parser
import glob
import json
import numpy as np
import pandas as pd
# A bit of a hack to set the path correctly
sys.path = [os.path.abspath(os.path.join(os.getcwd(), "..", "..", "py"))] + sys.path

logging.basicConfig(level=logging.INFO,
                    format=('%(levelname)s|%(asctime)s'
                            '|%(message)s|%(pathname)s|%(lineno)d|'),
                    datefmt='%Y-%m-%dT%H:%M:%S',
                    )
logging.getLogger().setLevel(logging.INFO)

In [2]:
%matplotlib inline

In [7]:
import code_intelligence
from code_intelligence import graphql
from issue_triage import triage
importlib.reload(triage)

<module 'issue_triage.triage' from '/home/jovyan/git_kubeflow-code-intelligence/py/issue_triage/triage.py'>

In [4]:
client = graphql.GraphQLClient()

## Update Needs Triage For Recently Updated Issues

* The cells below invoke the code that re-evaluates the triage state of recently updated issues
* If an issue needs triage, it is added to the [Needs Triage Kanban board](https://github.com/orgs/kubeflow/projects/26)
* If an issue in the [Needs Triage Kanban board](https://github.com/orgs/kubeflow/projects/26) has been triaged, it is removed from the Kanban board
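
The two rules above amount to a small decision per issue. The following is a minimal sketch of that decision only; `board_action` is a hypothetical helper written for illustration and is not part of the `triage` module.

```python
def board_action(needs_triage: bool, on_board: bool) -> str:
    """Return the change to make to the Kanban board for one issue.

    Hypothetical helper: it only restates the two rules above.
    """
    if needs_triage and not on_board:
        return "add"
    if not needs_triage and on_board:
        return "remove"
    # Either the issue is already on the board and still needs triage,
    # or it is triaged and not on the board.
    return "none"


for needs_triage, on_board in [(True, False), (True, True), (False, True), (False, False)]:
    print(needs_triage, on_board, board_action(needs_triage, on_board))
```

Writing the decision as a pure function of two booleans keeps it easy to check without calling the GitHub API.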

In [8]:
importlib.reload(triage)
triager = triage.IssueTriage()

In [7]:
repos = ["examples", "fairing", "kubeflow", "kfserving", "manifests", "metadata", "pytorch-operator",
         "testing", "tf-operator", "website"]

# Look at issues updated in the last 24 weeks, measured from midnight today
today = datetime.datetime.now()
today = datetime.datetime(year=today.year, month=today.month, day=today.day)
start_time = today - datetime.timedelta(weeks=24)

issue_filter = {
    "since": start_time.isoformat(),
}

for name in repos:
    repo = "kubeflow/" + name
    triager.triage(repo, issue_filter=issue_filter)

INFO|2019-11-26T15:37:26|kubeflow/examples has a total of 174 issues|/home/jlewi/git_kubeflow-code-intelligence/py/issue_triage/triage.py|374|
INFO|2019-11-26T15:37:26|Processing shard 0|/home/jlewi/git_kubeflow-code-intelligence/py/issue_triage/triage.py|539|
INFO|2019-11-26T15:37:26|Issue https://github.com/kubeflow/examples/issues/3:
state:Issue doesn't need attention.
|/home/jlewi/git_kubeflow-code-intelligence/py/issue_triage/triage.py|677|
INFO|2019-11-26T15:37:26|Issue https://github.com/kubeflow/examples/issues/5:
state:Issue doesn't need attention.
|/home/jlewi/git_kubeflow-code-intelligence/py/issue_triage/triage.py|677|
INFO|2019-11-26T15:37:26|Issue https://github.com/kubeflow/examples/issues/7:
state:Issue doesn't need attention.
|/home/jlewi/git_kubeflow-code-intelligence/py/issue_triage/triage.py|677|
INFO|2019-11-26T15:37:26|Issue https://github.com/kubeflow/examples/issues/32:
state:Issue doesn't need attention.
|/home/jlewi/git_kubeflow-code-intelligence/py/issue_tria

## Update Needs Triage Kanban Board

* The code below processes all issues in the Needs Triage Kanban board and removes issues that have already been triaged

In [None]:
importlib.reload(triage)
triager = triage.IssueTriage()
triager.update_kanban_board()

<function extract_stack at 0x7efdb3e8c6a8>|/home/jovyan/git_kubeflow-code-intelligence/py/code_intelligence/graphql.py|30|
INFO|2020-04-20T13:40:49|Issue https://github.com/kubeflow/website/issues/1911:
state:Issue needs triage:
	 Issue needs one of the priorities ['priority/p0', 'priority/p1', 'priority/p2', 'priority/p3']
	 Issue needs an area label
|/home/jovyan/git_kubeflow-code-intelligence/py/issue_triage/triage.py|677|
INFO|2020-04-20T13:40:49|Issue https://github.com/kubeflow/website/issues/1911 already in triage project|/home/jovyan/git_kubeflow-code-intelligence/py/issue_triage/triage.py|751|
INFO|2020-04-20T13:40:49|Issue https://github.com/kubeflow/kubeflow/issues/4968:
state:Issue needs triage:
	 Issue needs one of the priorities ['priority/p0', 'priority/p1', 'priority/p2', 'priority/p3']
	 Issue needs an area label
|/home/jovyan/git_kubeflow-code-intelligence/py/issue_triage/triage.py|677|
INFO|2020-04-20T13:40:49|Issue https://github.com/kubeflow/kubeflow/issues/4968 al

## Download Issues

* The cells below use GitHub's GraphQL API to download all issues in a specified repository that have been updated since `start_time`
* The downloaded issues are stored in `.data`; this makes it easy to rerun the processing without needing to redownload the issues
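
The caching pattern described above can be sketched in isolation. This is an illustration under stated assumptions, not the code in `triage.py`: `write_shards` and `cached_download` are hypothetical helpers, the file name pattern is made up, and the shard size of 100 is only inferred from how the notebook sizes its dataframes later on.

```python
import json
import os
import tempfile


def write_shards(issues, out_dir, shard_size=100):
    """Write issues to numbered JSON files with at most shard_size issues each."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for shard, start in enumerate(range(0, len(issues), shard_size)):
        # Hypothetical file name pattern; the real module may name shards differently.
        path = os.path.join(out_dir, f"issues-{shard:03d}.json")
        with open(path, "w") as hf:
            json.dump(issues[start:start + shard_size], hf)
        paths.append(path)
    return paths


def cached_download(out_dir, fetch):
    """Call fetch() and write shards only when out_dir does not exist yet."""
    if os.path.exists(out_dir):
        return False
    write_shards(fetch(), out_dir)
    return True


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, ".data", "issues", "kubeflow", "kubeflow", "20190520")
    # Fake issues stand in for the GraphQL results.
    fake_issues = [{"url": f"https://example.com/issues/{n}"} for n in range(250)]
    first = cached_download(target, lambda: fake_issues)
    second = cached_download(target, lambda: fake_issues)
    shard_count = len(os.listdir(target))

print(first, second, shard_count)
```

Because the directory name includes the start date, changing the time window produces a new directory and therefore a fresh download.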

In [8]:
today = datetime.datetime.now()
today = datetime.datetime(year=today.year, month=today.month, day=today.day)

start_time = today - datetime.timedelta(weeks=24)

In [9]:
issue_filter = {
    "since": start_time.isoformat(),
}
start_time_day = start_time.strftime("%Y%m%d")
repo = "kubeflow/kubeflow"
issues_dir = os.path.join(os.getcwd(), ".data", "issues", repo, start_time_day)

if os.path.exists(issues_dir):
    logging.info("Issues data already exists; not redownloading")
else:
    triager = triage.IssueTriage()
    # Any exception propagates unchanged, so no try/except is needed here
    triager.download_issues(repo, issues_dir)

INFO|2019-11-04T09:49:51|kubeflow/kubeflow has a total of 1111 issues|/home/jlewi/git_kubeflow-code-intelligence/py/issue_triage/triage.py|378|
INFO|2019-11-04T09:49:51|initializing the shard writer|/home/jlewi/git_kubeflow-code-intelligence/py/issue_triage/triage.py|381|
INFO|2019-11-04T09:49:51|Wrote shard 0|/home/jlewi/git_kubeflow-code-intelligence/py/issue_triage/triage.py|410|
INFO|2019-11-04T09:49:58|Wrote shard 1|/home/jlewi/git_kubeflow-code-intelligence/py/issue_triage/triage.py|410|
INFO|2019-11-04T09:50:07|Wrote shard 2|/home/jlewi/git_kubeflow-code-intelligence/py/issue_triage/triage.py|410|
INFO|2019-11-04T09:50:15|Wrote shard 3|/home/jlewi/git_kubeflow-code-intelligence/py/issue_triage/triage.py|410|


KeyboardInterrupt: 

## Compute Triage Stats

* The cells below compute a time series indicating the number of untriaged issues as a function of time
* The graph is used to determine whether the backlog of untriaged issues is increasing or decreasing
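
The idea behind the time series can be shown on toy data before running it on real issues: each issue contributes +1 when it is created and -1 when it is triaged, and the running sum of those events is the number of untriaged issues. The dates below are invented for illustration.

```python
import pandas as pd

# Three made-up issues; the third has not been triaged yet.
toy_issues = pd.DataFrame({
    "created_at": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
    "triaged_at": pd.to_datetime(["2020-01-04", "2020-01-05", None]),
})

# +1 at each creation time, -1 at each triage time.
toy_opened = pd.Series(1, index=toy_issues["created_at"])
toy_done = toy_issues.dropna(subset=["triaged_at"])
toy_triaged = pd.Series(-1, index=toy_done["triaged_at"])

# Sorting the events by time and taking the running sum gives the backlog.
toy_untriaged = pd.concat([toy_opened, toy_triaged]).sort_index().cumsum()
print(toy_untriaged)
```

With these dates the backlog rises to 3 on January 3 and falls back to 1 after the two triage events.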

In [None]:
shard_files = glob.glob(os.path.join(issues_dir, "*.json"))

def init_df(offset=0, size=300):
    """Initialize a dataframe of the specified size."""
    return pd.DataFrame({
        "time": [datetime.datetime.now()] * size,
        "delta": np.zeros(size),
    }, index=offset + np.arange(size))


def init_issue_df(offset=0, size=300):
    """Initialize a dataframe of placeholder issue rows of the specified size."""
    return pd.DataFrame({
        "created_at": [datetime.datetime(year=2050, month=1, day=1)] * size,
        "triaged_at": [datetime.datetime(year=2050, month=1, day=1)] * size,
        # Use False rather than the bool type itself as the placeholder value
        "closed": [False] * size,
        "url": [""] * size,
        "needs_triage": [False] * size,
    }, index=offset + np.arange(size))
    
    
def grow_df(df, offset=0, size=300):
    return pd.concat([df, init_df(offset, size)])

num_issues = 0

triage_stats = init_df(size=len(shard_files) * 100 * 2)
issues_df = init_issue_df(size=len(shard_files) * 100)

issues_index = 0

for f in shard_files:
    logging.info("Processing %s", f)
    with open(f) as hf:
        issues = json.load(hf)

    delta = 2 * len(issues)
    if num_issues + delta  > triage_stats.shape[0]:
        # Grow the dataframe
        triage_stats = grow_df(triage_stats, offset=triage_stats.shape[0], size=delta)

    
    if issues_index + len(issues) > issues_df.shape[0]:
        # Grow with init_issue_df so the new rows have the same columns as issues_df
        issues_df = pd.concat(
            [issues_df, init_issue_df(offset=issues_df.shape[0], size=len(issues))])

    for i in issues:        
        info = triage.TriageInfo.from_issue(i)
        
        create_time = dateutil_parser.parse(info.issue["createdAt"])
        
        # Assign through the dataframe; chained assignment may write to a copy
        issues_df.at[issues_index, "created_at"] = create_time
        issues_df.at[issues_index, "url"] = info.issue["url"]
        issues_df.at[issues_index, "needs_triage"] = info.needs_triage

        if not info.needs_triage and not info.triaged_at:
            raise ValueError("Issue doesn't need triage but triaged at time not set")
        if info.triaged_at:
            issues_df.at[issues_index, "triaged_at"] = info.triaged_at

        issues_df.at[issues_index, "closed"] = bool(info.closed_at)
        issues_index += 1


issues_df = issues_df[:issues_index]

In [None]:
# Filter issues to issues created after start_time
indexes = issues_df["created_at"] > start_time
issues_df = issues_df.iloc[indexes.values]

* Compute a series containing the number of untriaged issues as a function of time

In [None]:
opened = pd.Series([1]*issues_df.shape[0], index=issues_df["created_at"])
triaged_issues = issues_df.iloc[(issues_df["needs_triage"] == False).values]
triaged = pd.Series([-1]*triaged_issues.shape[0], index=triaged_issues["triaged_at"])

deltas = pd.concat([opened, triaged])
deltas = deltas.sort_index()
untriaged = deltas.cumsum()

### Plot Number of Untriaged Issues Over Time

* The graph below shows the number of untriaged issues over time
* Ideally this graph should hover near zero, which indicates that Kubeflow is triaging issues in a timely fashion
* If the number of untriaged issues is increasing over time then the Kubeflow project isn't keeping up with incoming issues
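
One way to put a number on "increasing over time" is to fit a straight line to the series and look at its slope. This is a rough sketch rather than part of the notebook's method; `backlog_slope` is a hypothetical helper and a linear fit is only one possible summary of the trend.

```python
import numpy as np
import pandas as pd


def backlog_slope(series: pd.Series) -> float:
    """Least-squares slope of a time-indexed series, in issues per day."""
    elapsed = series.index - series.index[0]
    days = np.asarray(elapsed.total_seconds(), dtype=float) / 86400.0
    slope, _intercept = np.polyfit(days, series.to_numpy(dtype=float), 1)
    return float(slope)


# Toy series: one backlog that grows by one issue per day, one that stays flat.
toy_index = pd.date_range("2020-01-01", periods=5, freq="D")
growing = pd.Series([1, 2, 3, 4, 5], index=toy_index)
steady = pd.Series([1, 0, 1, 0, 1], index=toy_index)

print(backlog_slope(growing), backlog_slope(steady))
```

A clearly positive slope suggests the backlog is growing; a slope near zero suggests triage is keeping pace with new issues.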

In [None]:
from matplotlib import pylab 
pylab.plot(untriaged.index, untriaged.values, '.-')
pylab.title("Untriaged issues in " + repo)

## Triage a Single Issue

* This cell can be used to triage a single issue
* It's useful if you want to find out the specific reasons why an issue isn't considered triaged
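
The log output earlier in this notebook reports two reasons: a missing priority label and a missing area label. The sketch below reproduces just those two checks on a list of label names. It is not the logic in `triage.py`, which may apply further conditions; `triage_reasons` is a hypothetical helper, and treating any label that starts with `area/` as an area label is an assumption.

```python
# Priority labels as they appear in the log messages above.
PRIORITY_LABELS = ["priority/p0", "priority/p1", "priority/p2", "priority/p3"]


def triage_reasons(labels):
    """Return the reasons an issue with these labels still needs triage."""
    reasons = []
    if not any(label in PRIORITY_LABELS for label in labels):
        reasons.append(f"Issue needs one of the priorities {PRIORITY_LABELS}")
    # Assumption: area labels are the ones prefixed with "area/".
    if not any(label.startswith("area/") for label in labels):
        reasons.append("Issue needs an area label")
    return reasons


print(triage_reasons(["kind/bug"]))
print(triage_reasons(["priority/p1", "area/docs"]))
```

An empty list means neither of the two checks found anything missing.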

In [None]:
importlib.reload(triage)
triager = triage.IssueTriage()
url = "https://github.com/kubeflow/kubeflow/issues/1583"
issue_info = triager.triage_issue(url)

## Fetch Card Id

* This snippet is useful for looking up ids in a project; the query returns the name and id of each column in the project.
* We use it to fetch the card id that `triage.py` should add issues needing triage to.
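
Once the query has run, the ids can be pulled out of the nested response. The sketch below assumes that `client.run_query` returns the decoded JSON response as a dictionary with a top-level `data` key shaped like the query; that return shape is an assumption, `column_ids` is a hypothetical helper, and the sample ids are placeholders rather than real GitHub ids.

```python
def column_ids(results):
    """Map each column name to its id for every project in the response."""
    columns = {}
    for edge in results["data"]["organization"]["projects"]["edges"]:
        for column in edge["node"]["columns"]["nodes"]:
            columns[column["name"]] = column["id"]
    return columns


# Hand-written sample with the same nesting as the query; all values are placeholders.
sample_results = {
    "data": {
        "organization": {
            "projects": {
                "totalCount": 1,
                "edges": [{
                    "node": {
                        "name": "Example project",
                        "url": "https://example.com/projects/1",
                        "columns": {
                            "totalCount": 2,
                            "pageInfo": {"endCursor": "placeholder", "hasNextPage": False},
                            "nodes": [
                                {"name": "To do", "id": "placeholder-id-1"},
                                {"name": "Done", "id": "placeholder-id-2"},
                            ],
                        },
                    },
                }],
            },
        },
    },
}

print(column_ids(sample_results))
```

If `hasNextPage` is true in a real response, more columns exist than the first 20 the query requests, and the query would need to be repeated with a cursor.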

In [None]:
project_query="""query projectCards($org: String!, $project: String!) {
  organization(login: $org) {
    projects(last: 1, search: $project) {
      totalCount
      edges {
        node {
          name
          url
          columns(first: 20) {
            totalCount
            pageInfo {
              endCursor
              hasNextPage
            }
            nodes {
              name
              id
            }
          }
        }
      }
    }
  }
}

"""
variables = {
    "org": "kubeflow",
    "project": "Bug Triage",
}
results = client.run_query(project_query, variables)
results