# Jupyter Notebook and Python test area

**Important shortcut: Press "Ctrl" + "Enter" to execute a cell.**

Try it out! Click on the next cell (grey box) and press "Ctrl" + "Enter".

In [None]:
"hello world"

Test

# Analysis
## Using Production Coverage Measures to delete Parts of a Software System
### Background

Developers of the Java application "Spring PetClinic" said that a lot of its code isn't used at all.

### Our Task

Before the application is migrated to the new platform, an analysis should find out which parts of the system are still in use and which are not (and may be deleted safely).

## The Dataset

### Origin

To understand how much code isn't used, we recorded the executed code in production with the coverage tool [JaCoCo](http://www.jacoco.org/). The measurement took place between 21st Oct 2017 and 27th Oct 2017. The results were exported into a CSV file using the JaCoCo command line tool with the following command:

```bash
java -jar jacococli.jar report "C:\Temp\jacoco.exec" --classfiles \
C:\dev\repos\spring-petclinic\target\classes --csv jacoco.csv
```

### File
The CSV file `../dataset/jacoco_production_measure.csv` contains the line coverage that was recorded for each class during the measurement period.

### Content

The file contains a subset of the dataset created above with just the relevant columns:

* `PACKAGE`: The package name of the Java source code file
* `CLASS`: The name of the Java class
* `LINE_COVERED`: The number of lines that were executed
* `LINE_MISSED`: The number of lines that weren't executed
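
As a rough sketch of how such a subset could be produced from a full JaCoCo CSV report, the following snippet reads only the four columns listed above by using the `usecols` parameter of `pd.read_csv`. The package names, class names and numbers in the sample are invented for illustration and are not taken from the real measurement.

```python
import io

import pandas as pd

# Hypothetical excerpt of a full JaCoCo CSV report (all values are made up).
raw_report = io.StringIO(
    "GROUP,PACKAGE,CLASS,INSTRUCTION_MISSED,INSTRUCTION_COVERED,"
    "BRANCH_MISSED,BRANCH_COVERED,LINE_MISSED,LINE_COVERED,"
    "COMPLEXITY_MISSED,COMPLEXITY_COVERED,METHOD_MISSED,METHOD_COVERED\n"
    "demo,org.example.web,OwnerController,10,90,1,5,3,27,2,10,1,8\n"
    "demo,org.example.util,LegacyHelper,40,0,4,0,12,0,6,0,3,0\n"
)

relevant_columns = ["PACKAGE", "CLASS", "LINE_COVERED", "LINE_MISSED"]

# usecols keeps the file's column order, so reorder explicitly afterwards.
subset = pd.read_csv(raw_report, usecols=relevant_columns)[relevant_columns]
print(subset)
```

Reading only the needed columns keeps the data frame small and makes the later steps easier to follow.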

## Our Analysis

### Step 1: Load the dataset

###### <span style="color:green;">SOLUTION <small>(Click the arrow on the left side if a hint is needed)</small></span>

In [None]:
import pandas as pd

coverage = pd.read_csv("../dataset/jacoco_production_measure.csv")
coverage.head()

### Step 2: Data preparation
Create additional columns that show the total number of lines of a class and the ratio of the covered lines to all lines of a class.

###### <span style="color:green;">SOLUTION <small>(Click the arrow on the left side if a hint is needed)</small></span>

In [None]:
coverage['lines'] = coverage.LINE_COVERED + coverage.LINE_MISSED
coverage['ratio'] = coverage.LINE_COVERED / coverage.lines
coverage.head()
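
One detail worth keeping in mind: a class without any measurable lines has a total of zero lines, so its ratio becomes `NaN`. The sketch below illustrates this on a small, made-up data frame (the package and class names are hypothetical). Whether such rows occur in the real dataset is an assumption that should be checked.

```python
import pandas as pd

# Made-up sample data; the third class has no measurable lines at all.
sample = pd.DataFrame({
    "PACKAGE": ["org.example.web", "org.example.util", "org.example.model"],
    "CLASS": ["OwnerController", "LegacyHelper", "MarkerInterface"],
    "LINE_COVERED": [8, 0, 0],
    "LINE_MISSED": [2, 5, 0],
})

sample["lines"] = sample.LINE_COVERED + sample.LINE_MISSED
# 0 / 0 yields NaN in pandas instead of raising an error.
sample["ratio"] = sample.LINE_COVERED / sample.lines

# Count the rows whose ratio could not be computed.
print(sample["ratio"].isna().sum())
print(sample)
```

By default, `mean()` skips `NaN` values, so such rows would not distort a later average, but they would also not show up in it.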

### Step 3: Modelling
Calculate the mean coverage ratio per package.

###### <span style="color:green;">SOLUTION <small>(Click the arrow on the left side if a hint is needed)</small></span>

In [None]:
mean_coverage = coverage.groupby("PACKAGE")['ratio'].mean()
mean_coverage
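
Note that the mean of the class ratios treats every class equally, regardless of its size. As a minimal sketch on made-up numbers (the names below are hypothetical), the following comparison shows how this differs from a ratio that is weighted by the number of lines per package. It is meant as an illustration of the difference, not as a replacement for the solution above.

```python
import pandas as pd

# Made-up sample: one large, barely used class and one tiny, fully used class.
sample = pd.DataFrame({
    "PACKAGE": ["org.example.web", "org.example.web", "org.example.util"],
    "CLASS": ["BigController", "TinyFormatter", "LegacyHelper"],
    "LINE_COVERED": [10, 10, 0],
    "LINE_MISSED": [90, 0, 20],
})
sample["lines"] = sample.LINE_COVERED + sample.LINE_MISSED
sample["ratio"] = sample.LINE_COVERED / sample.lines

# Variant used in this notebook: every class counts the same.
unweighted = sample.groupby("PACKAGE")["ratio"].mean()

# Alternative: sum the lines per package first, then compute one ratio.
totals = sample.groupby("PACKAGE")[["LINE_COVERED", "lines"]].sum()
weighted = totals.LINE_COVERED / totals.lines

comparison = pd.DataFrame({
    "mean_of_class_ratios": unweighted,
    "line_weighted_ratio": weighted,
})
print(comparison)
```

In this sample, the small but fully covered class lifts the unweighted mean of its package well above the share of lines that were actually executed.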

### Step 4: Visualization
Create a horizontal bar chart that visualizes the mean coverage ratio for each package.

###### <span style="color:green;">SOLUTION <small>(Click the arrow on the left side if a hint is needed)</small></span>

In [None]:
%matplotlib inline
mean_coverage.plot(kind='barh');