# Software Development Capability Analysis
## by Marc Vitalis

## Preliminary Wrangling

> Briefly introduce your dataset here.

In [None]:
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
from pandas.api.types import CategoricalDtype

%matplotlib inline

> Load in your dataset and describe its properties through the questions below.
Try and motivate your exploration goals through this section.

In [None]:
workitems = pd.read_csv('workitems_master.csv')
workitems.head()

In [None]:
workitems.info()

**Convert Dates to Date type**

The date columns are stored as strings (`object` dtype), so we first convert them to datetime.

In [None]:
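# bracket notation keeps column assignment explicit and avoids clashes with DataFrame attributes
workitems['new'] = pd.to_datetime(workitems['new'])
workitems['doing'] = pd.to_datetime(workitems['doing'])
workitems['done'] = pd.to_datetime(workitems['done'])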

workitems.head()
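The conversion can be tried on a small made-up frame before running it on the real file. This is only a sketch: the column names `new`, `doing`, and `done` come from the dataset, while the sample values and the choice of `errors='coerce'` are my own assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for the real data; every value here is invented for illustration.
sample = pd.DataFrame({
    'new':   ['2019-01-02 09:00:00', '2019-01-03 10:30:00'],
    'doing': ['2019-01-04 08:00:00', 'not a date'],
    'done':  ['2019-01-07 17:00:00', '2019-01-05 12:00:00'],
})

date_columns = ['new', 'doing', 'done']
for column in date_columns:
    # errors='coerce' (an assumption, not used in the notebook) turns
    # unparseable strings into NaT instead of raising an exception
    sample[column] = pd.to_datetime(sample[column], errors='coerce')

print(sample.dtypes)
print('unparseable doing values:', sample['doing'].isna().sum())
```

Counting the `NaT` values after the conversion is a quick way to see whether any date stamps failed to parse.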

In [None]:
workitems.info()

**Convert Category Types**

In [None]:
workitems.workitem_type.value_counts()

In [None]:
workitem_types = CategoricalDtype(categories = ['Story', 'Bug', 'Issue'], ordered=False)
workitems['workitem_type'] = workitems['workitem_type'].astype(workitem_types)
workitems.head()
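Before relying on the categorical column, it may be worth checking for labels outside the three listed categories. The sketch below uses an invented series; the `'Task'` label is hypothetical and only there to show what an unexpected value looks like.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Invented labels; 'Task' is a hypothetical value that is not in the category list.
raw_types = pd.Series(['Story', 'Bug', 'Issue', 'Task', 'Bug'])

categories = ['Story', 'Bug', 'Issue']
workitem_types = CategoricalDtype(categories=categories, ordered=False)

# Find labels that are not part of the declared categories
known = raw_types.isin(categories)
unexpected = raw_types[~known]

# Mask unknown labels as missing, then convert to the categorical dtype
converted = raw_types.where(known).astype(workitem_types)

print(unexpected.tolist())
print(converted.value_counts())
```

If `unexpected` is empty for the real data, the category list covers every work item type.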

**Extract `actual_work`**

In [None]:
workitems.info()

In [None]:
# elapsed time between starting work (doing) and finishing it (done)
elapsed = workitems['done'] - workitems['doing']
workitems['actual_work'] = elapsed.dt.days

# items finished in under a full day count as one day of work
zero_work = workitems['actual_work'] == 0
workitems.loc[zero_work, 'actual_work'] = 1

# drop items with 0.5 hours of elapsed time or less (effectively zero effort);
# .copy() avoids SettingWithCopyWarning when columns are added later
workitems = workitems[(elapsed / pd.Timedelta(hours=1)) > 0.5].copy()

workitems.info()
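To make the `actual_work` rule concrete, here is a minimal sketch on three invented work items. The timestamps are made up, and `clip(lower=1)` is my shorthand for the "zero days becomes one day" step; it gives the same result here because every remaining row has a positive duration.

```python
import pandas as pd

# Three invented items: a multi-day one, a same-day one, and a 10-minute one.
sample = pd.DataFrame({
    'doing': pd.to_datetime(['2019-01-04 08:00', '2019-01-04 09:00', '2019-01-04 09:00']),
    'done':  pd.to_datetime(['2019-01-07 17:00', '2019-01-04 15:00', '2019-01-04 09:10']),
})

elapsed = sample['done'] - sample['doing']

# Keep only items with more than half an hour between doing and done
worked = (elapsed / pd.Timedelta(hours=1)) > 0.5
sample = sample[worked].copy()

# Whole days elapsed, with anything under a full day rounded up to one day
sample['actual_work'] = elapsed[worked].dt.days.clip(lower=1)

print(sample['actual_work'].tolist())
```

The multi-day item keeps its whole-day count, the same-day item becomes one day, and the 10-minute item is dropped.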

### What is the structure of your dataset?

> The dataset consists of 2393 work items with 10 features: `workitem_type`, `estimate`, `words`, `rel` (release), `sprint`, `assigned_to`, `new` (date started), `doing` (date work started), `done`, and `actual_work` (`done` - `doing`). The main variables of interest are the date stamps for the work. Others, such as `sprint`, `rel`, and `assigned_to`, simply describe the work item.

### What is/are the main feature(s) of interest in your dataset?

> I'm most interested in how the other variables affect `actual_work`. The goal is to find the patterns that influence the actual work.

### What features in the dataset do you think will help support your investigation into your feature(s) of interest?

> The work items in the dataset span three (3) SDLC patterns (non-structured, semi-agile, scrum). The date stamps (`new`, `doing`, `done`) are very important: they let me extract information such as the day of the week or the month, and observe over time whether each change of SDLC pattern improved things or made them worse. As a bonus, I can also look at how the titles correlate with the actual work.
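As a rough sketch of the date-based features mentioned above, the weekday and month can be pulled out of a datetime column with the `.dt` accessor. The two timestamps and the new column names `done_weekday` and `done_month` are hypothetical, chosen only for this example.

```python
import pandas as pd

# Invented completion dates for illustration.
sample = pd.DataFrame({
    'done': pd.to_datetime(['2019-01-07 17:00', '2019-02-15 12:00']),
})

# Day name (English by default) and month number of the completion date
sample['done_weekday'] = sample['done'].dt.day_name()
sample['done_month'] = sample['done'].dt.month

print(sample)
```

The same accessor works for `new` and `doing`, so each stage of a work item can get its own weekday and month columns.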

## Univariate Exploration

> First to explore is the main point of interest, `actual_work`.

In [None]:
binsize = 20
bins = np.arange(0, workitems.actual_work.max()+binsize, binsize)

plt.figure(figsize=[8, 5])
plt.hist(data = workitems, x = 'actual_work', bins = bins)
plt.xlabel('Actual Work (Days)')
plt.show()
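The bin edges in the histogram cell come from `np.arange`, which excludes its stop value. The sketch below uses a hypothetical maximum of 45 days to show why `binsize` is added to the maximum.

```python
import numpy as np

binsize = 20
max_days = 45  # hypothetical maximum of actual_work, for illustration only

# Without the extra binsize, the last edge stops short of the maximum
short_bins = np.arange(0, max_days, binsize)

# With it, the last edge lies at or beyond the maximum, so every value gets a bin
bins = np.arange(0, max_days + binsize, binsize)

print(short_bins)
print(bins)
```

With the padded stop value, the final edge always reaches the largest observation, so no work item falls outside the histogram.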

> Make sure that, after every plot or related series of plots, that you
include a Markdown cell with comments about what you observed, and what
you plan on investigating next.

### Discuss the distribution(s) of your variable(s) of interest. Were there any unusual points? Did you need to perform any transformations?

> Your answer here!

### Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

> Your answer here!

## Bivariate Exploration

> In this section, investigate relationships between pairs of variables in your
data. Make sure the variables that you cover here have been introduced in some
fashion in the previous section (univariate exploration).

### Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

> Your answer here!

### Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

> Your answer here!

## Multivariate Exploration

> Create plots of three or more variables to investigate your data even
further. Make sure that your investigations are justified, and follow from
your work in the previous sections.

### Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

> Your answer here!

### Were there any interesting or surprising interactions between features?

> Your answer here!

> At the end of your report, make sure that you export the notebook as an
html file from the `File > Download as... > HTML` menu. Make sure you keep
track of where the exported file goes, so you can put it in the same folder
as this notebook for project submission. Also, make sure you remove all of
the quote-formatted guide notes like this one before you finish your report!