<a href="https://colab.research.google.com/github/munich-ml/MLPy2020/blob/master/17_Logfiles_w_Pandas_blank.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Logfile challenge

## Setup

The following code block prepares the coding challenge:
1. clone the GitHub project `MLPy2020` (or pull the latest changes if it is already cloned)
1. import the `parse_logfile_string` helper function
1. open the file `logfile.csv` and store its content in the variable `s`


In [0]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Get munich-ml repo from GitHub
if "MLPy2020" in os.listdir():
    !git -C MLPy2020 pull
else:
    !git clone https://github.com/munich-ml/MLPy2020/

from MLPy2020.MLPy_helper_funcs import parse_logfile_string

with open(os.path.join("MLPy2020", "datasets", "logfile.csv"), "r") as file:
    s = file.read()

    

# Reuse functions from the **NumPy** session



In [0]:
def apply_calibration(log):
    for param, cal_factor in log["params"].items():
        if "calibration factor" in param:
            sig = param.split(" ")[-1]
            col_index = log["names"].index(sig)

            # append the new column with calibrated data to the right
            new_col = log["data"][:, col_index] * float(cal_factor)
            log["data"] = np.column_stack([log["data"], new_col])

            # also add the new name to the names list
            log["names"].append(sig + "_cal") 

In [0]:
def plot(log, sig_names):
    x = log["data"][:, log["names"].index("x")]
    for sig_name in sig_names:
        if sig_name in log["names"]:
            y = log["data"][:, log["names"].index(sig_name)]
            plt.plot(x, y, "o", label=sig_name)
        else:
            print("Warning, '{}' is not in 'log['names']'!".format(sig_name))
    plt.grid()
    plt.legend()
    plt.tight_layout()

In [0]:
def parse_logfile_string_to_numpy(s):
    """
    Wraps the 'parse_logfile_string' function and converts the "data" list of lists to a NumPy array.
    """
    log_dict = parse_logfile_string(s)
    log_dict["data"] = np.array(log_dict["data"])
    return log_dict

In [0]:
log = parse_logfile_string_to_numpy(s)

print("data head before calibration:")
print(log["names"])
print(log["data"][:3, :])

apply_calibration(log)

print("data head after calibration:")
print(log["names"])
print(log["data"][:3, :])

In [0]:
plot(log, sig_names=["sig0_cal", "sig1_cal", "sig1", "sig42", "sig2_cal"])

... the NumPy version still works.

Next, let's introduce **Pandas**...

# Exercise
1. Create a function `parse_logfile_string_to_pandas` that returns the log with `data` and `names` combined in a `pandas.DataFrame`
1. Rework the `apply_calibration` function to accept the new `DataFrame`
1. Rework the `plot` function to accept the new `DataFrame`

## Testing `parse_logfile_string_to_pandas` 

In [0]:
def parse_logfile_string_to_pandas(s):
    log = parse_logfile_string(s)
    ...
    return None

In [0]:
log = parse_logfile_string_to_pandas(s)
log["data"].head()
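As a hint for this step, here is a minimal sketch of how a list of lists and a list of column names can be combined into a `DataFrame`. It assumes that `parse_logfile_string` returns a dict with the keys `"params"`, `"names"` and `"data"`, as the NumPy cells above suggest. The names `raw_log` and `to_pandas` and all sample values are hypothetical and only serve the illustration; this is one possible approach, not the reference solution.

```python
import pandas as pd

# Hypothetical stand-in for the dict returned by parse_logfile_string.
# The values below are invented for illustration only.
raw_log = {
    "params": {"calibration factor sig0": "2.0"},
    "names": ["x", "sig0", "sig1"],
    "data": [[0.0, 1.0, 10.0], [1.0, 2.0, 20.0], [2.0, 3.0, 30.0]],
}

def to_pandas(log):
    """Return a shallow copy of log whose "data" entry is a DataFrame
    with the entries of log["names"] as column labels."""
    out = dict(log)
    out["data"] = pd.DataFrame(log["data"], columns=log["names"])
    return out

log_df = to_pandas(raw_log)
print(log_df["data"].head())
```

With the column labels stored in the `DataFrame` itself, a column can be selected by name, for example `log_df["data"]["sig0"]`, instead of looking up its index in `names` first.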

## Testing `apply_calibration` 

In [0]:
def apply_calibration(log):
#  predecessor NumPy version
#    for param, cal_factor in log["params"].items():
#        if "calibration factor" in param:
#            sig = param.split(" ")[-1]
#            col_index = log["names"].index(sig)
#
#            # append the new column with calibrated data to the right
#            new_col = log["data"][:, col_index] * float(cal_factor)
#            log["data"] = np.column_stack([log["data"], new_col])
#
#            # also add the new name to the names list
#            log["names"].append(sig + "_cal") 
    
    for param, cal_factor in log["params"].items():
        if "calibration factor" in param:
            sig = param.split(" ")[-1]
            ...

In [0]:
apply_calibration(log)
log["data"].head()
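One possible shape of the reworked calibration, shown on made-up data: with a `DataFrame`, a new column can be created by assigning to a new column label, so neither `np.column_stack` nor a separate names list is needed. The name `apply_calibration_df`, the dict `log_df` and its values are hypothetical; the assumed layout of `log["params"]` follows the NumPy version above.

```python
import pandas as pd

# Hypothetical log dict with made-up values, for illustration only.
log_df = {
    "params": {"calibration factor sig0": "2.0", "operator": "n/a"},
    "data": pd.DataFrame({"x": [0.0, 1.0], "sig0": [1.0, 2.0], "sig1": [10.0, 20.0]}),
}

def apply_calibration_df(log):
    """Add a '<sig>_cal' column for every 'calibration factor <sig>' parameter."""
    for param, cal_factor in log["params"].items():
        if "calibration factor" in param:
            sig = param.split(" ")[-1]
            # Assigning to a new label appends the column on the right.
            log["data"][sig + "_cal"] = log["data"][sig] * float(cal_factor)

apply_calibration_df(log_df)
print(log_df["data"].head())
```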

## Testing `plot` 

In [0]:
def plot(log, sig_names):
#  predecessor NumPy version
#    x = log["data"][:, log["names"].index("x")]
#    for sig_name in sig_names:
#        if sig_name in log["names"]:
#            y = log["data"][:, log["names"].index(sig_name)]
#            plt.plot(x, y, "o", label=sig_name)
#        else:
#            print("Warning, '{}' is not in 'log['names']'!".format(sig_name))
#    plt.grid(), plt.legend(), plt.tight_layout()

    ...
    
    plt.grid()
    plt.legend()
    plt.tight_layout()
 

In [0]:
plot(log, sig_names=["sig0_cal", "sig1_cal", "sig1", "sig42", "sig2_cal"])
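A sketch of how the plot function might look with a `DataFrame`, again on made-up data: the membership test moves from `log["names"]` to the column labels of the `DataFrame`. The name `plot_df`, the sample dict and the choice to return the Matplotlib `Axes` object are assumptions made for this illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_df(log, sig_names):
    """Plot the requested signal columns over column 'x' and return the Axes."""
    df = log["data"]
    fig, ax = plt.subplots()
    for sig_name in sig_names:
        if sig_name in df.columns:
            ax.plot(df["x"], df[sig_name], "o", label=sig_name)
        else:
            print("Warning, '{}' is not a column of log['data']!".format(sig_name))
    ax.grid()
    ax.legend()
    fig.tight_layout()
    return ax

# Hypothetical log dict with made-up values, for illustration only.
log_df = {"data": pd.DataFrame({"x": [0.0, 1.0, 2.0], "sig1": [10.0, 20.0, 30.0]})}
ax = plot_df(log_df, ["sig1", "sig42"])
```

As in the NumPy version, a signal name that is not available (here `sig42`) only triggers a warning and is skipped.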

# ToDo: Object orientation