<a href="https://colab.research.google.com/github/munich-ml/MLPy2020/blob/master/15_Logfiles_w_NumPy_blank.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Logfile challenge

## setup

The following code block prepares the coding challenge:
1. clone the GitHub project `MLPy2020` (or pull the latest changes if it is already present)
1. open the file `MLPy2020/datasets/logfile.csv` and store its content in the variable `s`


In [0]:
import os
import numpy as np

# Get munich-ml repo from GitHub
if "MLPy2020" in os.listdir():
    !git -C MLPy2020 pull
else:
    !git clone https://github.com/munich-ml/MLPy2020/

with open(os.path.join("MLPy2020", "datasets", "logfile.csv"), "r") as file:
    s = file.read()

# reuse `parse_logfile_string`

The function `parse_logfile_string` has been moved to `MLPy_helper_funcs.py`.

In [0]:
from MLPy2020.MLPy_helper_funcs import parse_logfile_string

In [0]:
log = parse_logfile_string(s)
log

In [0]:
log.keys()

# use NumPy

In [0]:
data = np.array(log["data"])

In [0]:
data

In [0]:
data[:, 0]

Indexing in NumPy is straightforward: `data[:, 0]` selects all rows (`:`) of column `0`.
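A few more indexing patterns, shown on a small stand-in array. The values are made up for illustration and are not taken from `logfile.csv`:

```python
import numpy as np

# Hypothetical stand-in for log["data"]: 4 rows (samples) x 3 columns (signals)
data = np.array([[0.0, 1.0, 10.0],
                 [1.0, 2.0, 20.0],
                 [2.0, 3.0, 30.0],
                 [3.0, 4.0, 40.0]])

first_col = data[:, 0]   # all rows, column 0 -> shape (4,)
first_row = data[0, :]   # row 0, all columns -> shape (3,)
head = data[:2, 1:]      # first two rows, columns 1 and 2 -> shape (2, 2)
last_col = data[:, -1]   # negative indices count from the end

print(first_col)         # [0. 1. 2. 3.]
print(head.shape)        # (2, 2)
```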



## Exercise
Write a wrapper function `parse_logfile_string_to_numpy(s)` that wraps the original `parse_logfile_string` function and converts the list-of-lists `data` field into a `numpy.ndarray`.

`parse_logfile_string_to_numpy` shall return a dict of the form `{'params': <class 'dict'>, 'names': <class 'list'>, 'data': <class 'numpy.ndarray'>}`.

In [0]:
def parse_logfile_string_to_numpy(s):
    
    return None

## Testing `parse_logfile_string_to_numpy` 

In [0]:
log = parse_logfile_string_to_numpy(s)
log

In [0]:
type(log["data"])

## Create `apply_calibration` function

In [0]:
def apply_calibration(log):
    for param, cal_factor in log["params"].items():
        if "calibration factor" in param:
            sig = param.split(" ")[-1]
            col_index = log["names"].index(sig)

            log["data"][:, col_index] *= float(cal_factor)
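The non-obvious part of `apply_calibration` is how a parameter key is mapped to a column. The sketch below isolates that lookup, assuming keys of the form `calibration factor <signal name>` with the factor stored as a string. This key format is a hypothetical example; the actual keys in `logfile.csv` may differ.

```python
# Hypothetical params and names; the real ones come from logfile.csv
params = {"calibration factor sig1": "2.5", "sample rate": "100"}
names = ["sig0", "sig1", "sig2"]

lookups = []
for param, cal_factor in params.items():
    if "calibration factor" in param:      # skip unrelated parameters
        sig = param.split(" ")[-1]         # last word of the key: "sig1"
        col_index = names.index(sig)       # column holding that signal: 1
        lookups.append((sig, col_index, float(cal_factor)))

print(lookups)   # [('sig1', 1, 2.5)]
```

Note that `list.index` raises a `ValueError` if the signal named in a parameter does not appear in `names`.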

In [0]:
log = parse_logfile_string_to_numpy(s)

print("data head before calibration:")
print(log["data"][:3, :])

apply_calibration(log)

print("data head after calibration:")
print(log["data"][:3, :])

## `in-place` operation



`apply_calibration` works **in place**: the `*=` operator overwrites the columns of the original `log["data"]` array.

Thus, one has to remember whether `apply_calibration` has already been executed; running it twice applies every calibration factor twice.
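A minimal sketch of the difference between in-place and out-of-place scaling, on a made-up 2x2 array with an arbitrary factor of 2. It also shows why "out-of-place" needs care: a basic slice such as `raw[:, 1]` is a view that shares memory with the original, while `.copy()` creates independent data.

```python
import numpy as np

cal_factor = 2.0

# In place: `*=` writes into the existing array, so a second run scales again
data = np.array([[0.0, 1.0],
                 [1.0, 2.0]])
data[:, 1] *= cal_factor
data[:, 1] *= cal_factor            # accidental second run
print(data[:, 1])                   # scaled by 4, not by 2

# Out of place: `*` returns a new array and leaves the original untouched
raw = np.array([[0.0, 1.0],
                [1.0, 2.0]])
calibrated = raw[:, 1] * cal_factor
print(raw[:, 1], calibrated)

# Basic slicing returns a view, so `*=` on it would still modify `raw`
view = raw[:, 1]
col_copy = raw[:, 1].copy()
print(np.shares_memory(view, raw))      # True
print(np.shares_memory(col_copy, raw))  # False
```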

## adding a new column `sig1_cal`


In [0]:
col_idx = 1
row_max = 5

In [0]:
log["data"][:row_max, :]

In [0]:
log["data"][:row_max, col_idx]

In [0]:
new_col = log["data"][:row_max, col_idx] * 2   # arbitrary cal factor 2
new_col


In [0]:
new_col.shape

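`new_col` is a 1D array with shape `(row_max,)`, while `log["data"][:row_max, :]` is 2D. `np.hstack` and `np.concatenate` require all inputs to have the same number of dimensions, so they can't be used directly. `np.column_stack`, on the other hand, treats 1D inputs as columns.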
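The sketch below compares the options on a small toy block. Reshaping the 1D array into a column vector (`new_col[:, np.newaxis]` or `new_col.reshape(-1, 1)`) makes `np.hstack` and `np.concatenate` usable, and all three approaches then give the same result:

```python
import numpy as np

block = np.arange(6.0).reshape(3, 2)   # 2D, shape (3, 2)
new_col = block[:, 1] * 2              # 1D, shape (3,)

# Mixing 2D and 1D inputs fails for np.hstack (and np.concatenate)
hstack_failed = False
try:
    np.hstack([block, new_col])
except ValueError as err:
    hstack_failed = True
    print("hstack failed:", err)

# After turning new_col into a (3, 1) column vector, both work
via_hstack = np.hstack([block, new_col[:, np.newaxis]])
via_concat = np.concatenate([block, new_col.reshape(-1, 1)], axis=1)

# np.column_stack accepts the 1D array directly
via_column_stack = np.column_stack([block, new_col])
print(via_column_stack.shape)          # (3, 3)
```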

In [0]:
new_data = np.column_stack([log["data"][:row_max, :], new_col])
new_data[:row_max, :]

## Exercise 
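Modify `apply_calibration` so that it works **out-of-place**: keep the original signal columns, append each calibrated signal as a new column (e.g. `sig1_cal`), and add its name to `log["names"]`.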

In [0]:
def apply_calibration(log):
    for param, cal_factor in log["params"].items():
        if "calibration factor" in param:
            sig = param.split(" ")[-1]
            col_index = log["names"].index(sig)

            ...

## Testing `apply_calibration`

In [0]:
log = parse_logfile_string_to_numpy(s)

print("data head before calibration:")
print(log["names"])
print(log["data"][:3, :])

apply_calibration(log)

print("data head after calibration:")
print(log["names"])
print(log["data"][:3, :])

# Matplotlib

In [0]:
import matplotlib.pyplot as plt

In [0]:
plt.plot(log["data"][:,0], log["data"][:,1], "o");

## Exercise
Create a function `plot` that takes two inputs:
- the `log` container, and
- `sig_names`, a list of names

Plot every signal listed in `sig_names`, include a legend, and print a warning message for each name that is not in `log["names"]`.

In [0]:
def plot(log, sig_names):
    ...
    plt.plot(x, y, "o", label="label")
    ...
    plt.grid(), plt.legend(), plt.tight_layout();

## Testing `plot`

In [0]:
plot(log, sig_names=["sig0_cal", "sig1_cal", "sig1", "sig42", "sig2_cal"])

In [0]:
plot(log, sig_names=log["names"])