<a id='intro'></a>
## Scatter Diagram for Task Completion Time

The goal of this notebook is to generate a project diagram showing the daily mean, Q1, Q3, and standard deviation of story cycle times by:

* reading issue data from a CSV file
* calculating the daily mean/Q1/Q3/std
* calculating the all-time average (median) of cycle time
* plotting all findings on a single diagram

In [20]:
import pandas as pd
import math
import numpy as np

from plotly.offline import plot
import plotly.graph_objs as go

In [21]:
df = pd.read_csv(
    'github_issues.csv',
    usecols=['issue_id', 'done_fd', 'issue_type', 'in_progress', 'sign_off'],
    parse_dates=['done_fd'],
).dropna()
# completion time (days) = time in progress + time in sign-off
df['complete'] = df['in_progress'] + df['sign_off']
df.head(3)

,issue_id,issue_type,in_progress,sign_off,done_fd,complete
5,418,bug,0,1,2020-03-12,1
8,415,spike,2,4,2020-03-17,6
9,414,story,0,1,2020-03-11,1
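
To try the loading step without the original `github_issues.csv`, the following minimal sketch reads the three rows shown above from an in-memory CSV. The file layout (column order, no extra columns) is an assumption; only the values come from the output above.

```python
import io
import pandas as pd

# Three rows copied from the df.head(3) output; the CSV layout itself is assumed.
sample_csv = io.StringIO(
    "issue_id,issue_type,in_progress,sign_off,done_fd\n"
    "418,bug,0,1,2020-03-12\n"
    "415,spike,2,4,2020-03-17\n"
    "414,story,0,1,2020-03-11\n"
)

sample = pd.read_csv(
    sample_csv,
    usecols=['issue_id', 'done_fd', 'issue_type', 'in_progress', 'sign_off'],
    parse_dates=['done_fd'],
).dropna()

# Same derived column as in the notebook cell.
sample['complete'] = sample['in_progress'] + sample['sign_off']
print(sample[['issue_id', 'complete']])
```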


In [22]:
# filter issues by type: keep only bugs, stories, and tech debt
types = ['bug', 'story', 'techdebt']
df = df[df.issue_type.isin(types)]
df.head(3)

,issue_id,issue_type,in_progress,sign_off,done_fd,complete
5,418,bug,0,1,2020-03-12,1
9,414,story,0,1,2020-03-11,1
13,410,bug,0,5,2020-03-10,5


In [23]:
# build the hover label by concatenating the issue ID and issue type
df["info"] = df["issue_id"].astype(str) + ',' + df["issue_type"]
df.head(3)

,issue_id,issue_type,in_progress,sign_off,done_fd,complete,info
5,418,bug,0,1,2020-03-12,1,"418,bug"
9,414,story,0,1,2020-03-11,1,"414,story"
13,410,bug,0,5,2020-03-10,5,"410,bug"


In [24]:
median = df['complete'].median()
print(median)

4.0
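
The median is used here as the all-time reference, presumably because cycle times tend to be skewed by a few long-running issues. A small sketch with hypothetical values (not taken from `github_issues.csv`) shows how a single outlier moves the mean but not the median:

```python
from statistics import mean, median

# Hypothetical cycle times in days; the last value is a deliberate outlier.
cycle_times = [1, 1, 2, 4, 5, 6, 30]

print(median(cycle_times))  # 4
print(mean(cycle_times))    # 7
```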


In [25]:
# calculate the daily mean, standard deviation, Q1, and Q3
def q1(x):
    return x.quantile(0.25)

def q3(x):
    return x.quantile(0.75)

f = {'complete': ['mean', 'std', q1, q3]}
df_define = df.groupby('done_fd').agg(f)

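# std is NaN for days with a single issue; fill those with that day's only completion time
df_define[('complete', 'std')] = df_define[('complete', 'std')].fillna(
    df.groupby('done_fd')['complete'].last())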

df_define.head(3)

,complete,complete,complete,complete
done_fd,mean,std,q1,q3
2019-09-25,0.0,0.0,0.0,0.0
2019-10-07,6.0,6.0,6.0,6.0
2019-10-14,9.8,6.797058,6.0,14.0
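
The `fillna` step in the cell above is needed because pandas computes the sample standard deviation (`ddof=1`), which is undefined for a day with only one completed issue. The sketch below reproduces that on hypothetical data; the frame `demo` and its values are illustrative only.

```python
import pandas as pd

# Hypothetical data: one issue closed on the first day, three on the second.
demo = pd.DataFrame({
    'done_fd': pd.to_datetime(
        ['2020-01-01', '2020-01-02', '2020-01-02', '2020-01-02']),
    'complete': [6, 2, 4, 6],
})

grouped = demo.groupby('done_fd')['complete']
daily_std = grouped.std()                  # NaN when a day has a single issue
filled = daily_std.fillna(grouped.last())  # same fill rule as the notebook cell

print(daily_std.isna().tolist())  # [True, False]
print(filled.tolist())            # [6.0, 2.0]
```

With this fill rule a single-issue day gets a band from 0 to twice its value, which matches the `2019-10-07` row above (mean 6.0, std 6.0). Filling with 0 instead would collapse the band to the point itself; which is preferable depends on how the band is meant to be read.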


## Plotting
Here we display several scatter traces in one layout:

* mean per day
* upper and lower bound per day (mean ± std)
* all-time median
* individual issues as dots

In [32]:
upper_bound = go.Scatter(
    name='Upper Bound',
    x=df_define.complete.index,
    y=df_define.complete['mean'] + df_define.complete['std'],
    mode='lines',
    marker=dict(color="#444"),
    line=dict(width=0),
    fillcolor='rgba(68, 68, 68, 0.3)',
    fill='tonexty')

trace = go.Scatter(
    name='Daily Mean',
    x=df_define.complete.index,
    y=df_define.complete['mean'],
    mode='lines',
    line=dict(color='rgb(31, 119, 180)'),
    fillcolor='rgba(68, 68, 68, 0.3)',
    fill='tonexty')

lower_bound = go.Scatter(
    name='Lower Bound',
    x=df_define.complete.index,
    y=df_define.complete['mean'] - df_define.complete['std'],
    marker=dict(color="#444"),
    fillcolor='rgba(68, 68, 68, 0.3)',
    line=dict(width=0),
    mode='lines')

average = go.Scatter(
    name='Median',
    x=df.done_fd,
    y=[median] * len(df),
    mode='lines')

dots = go.Scatter(name="Issue",
                  x=df.done_fd, 
                  y=df.complete, 
                  hovertext=df['info'].astype(str),
                  mode='markers', 
                  marker=dict(
                    color=df['complete'], #set color equal to a variable
                    colorscale='Viridis', # one of plotly colorscales
                    showscale=True)
                 )

# Trace order can be important
# with continuous error bars
data = [lower_bound, trace, upper_bound, average, dots]

layout = go.Layout(
    hovermode='closest',
    yaxis=dict(title='Completed in Days'),
    title='Story/TechDebt/Bug - Completion Time Diagram<br>Notice the hover text!',
    showlegend = False)

fig = go.Figure(data=data, layout=layout)
plot(fig, filename='control_diagram.html')

'file:///Users/fatmaurek/Notebooks/Github Project Management/control_diagram.html'