<a href="https://colab.research.google.com/github/munich-ml/MLPy2020/blob/master/12_Logfile_challenge_blank.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Logfile challenge

## setup

The following code block prepares the coding challenge:
1. clone GitHub project `MLPy2020`
1. import and execute `create_csv` helper function
1. open file `logfile.csv` and store content into variable `s`


In [0]:
import os

# Get munich-ml repo from GitHub
if "MLPy2020" in os.listdir():
    !git -C MLPy2020 pull
else:
    !git clone https://github.com/munich-ml/MLPy2020/

path = os.path.join("MLPy2020", "datasets", "logfile.csv")

with open(path, "r") as file:
    s = file.read()

## The task

After executing the setup, the variable `s` should be available, holding the content of a logfile.

The task is to **parse the logfile content**.

In [0]:
print(s)

# Python built-in functions

The Python interpreter has a number of functions built in. They are available without any import.

Complete list of built-in functions: https://docs.python.org/3/library/functions.html

Some prominent examples:
- `print`
- `type`
- `min`
- `len`
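
A minimal sketch of these four built-ins in use. The string `sample` is made up for illustration and is not the real logfile content.

```python
# Hypothetical sample text, not the real logfile
sample = "header\nsample rate,1000"

print(sample)            # writes the text to the output
kind = type(sample)      # the class of the object, here str
length = len(sample)     # number of characters, including the "\n"
smallest = min(3, 1, 2)  # smallest of the given arguments
print(kind, length, smallest)
```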




# `str` object

## Slicing

Single characters of the string can be accessed using their index in brackets `[index]`:
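
A short sketch of single-character indexing, using a made-up string `text` instead of the logfile content:

```python
text = "logfile"   # hypothetical sample string

first = text[0]    # indices start at 0
third = text[2]    # third character
last = text[-1]    # negative indices count from the end
print(first, third, last)
```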




Ranges can also be indexed by `[start_index : end_index]`.
The `start_index` is included, the `end_index` is not!
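
The range rule can be tried out on a small, made-up string (`text` is an assumption for illustration):

```python
text = "logfile"   # hypothetical sample string

head = text[0:3]   # characters 0, 1 and 2; index 3 is excluded
tail = text[3:]    # an omitted end index means "up to the end"
print(head, tail)
```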

## `str` methods

The `str` object not only contains **data**, it also contains **functions**, called **methods**.

Split the logfile content `s` on the *new line character* `\n`
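
A minimal sketch of calling `str` methods. The three-line `sample` text is an assumption standing in for the real logfile content `s`:

```python
# Hypothetical three-line text standing in for the logfile content `s`
sample = "header\nsample rate,1000\nmeasurements"

lines = sample.split("\n")   # returns a list of str, one item per line
shouted = sample.upper()     # another str method; returns a new string
print(lines)
```

For the challenge itself, the same call would be `lines = s.split("\n")`.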

# `list` object

`list` objects also support slicing
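
A short sketch of list indexing and slicing. The list `lines` below is hypothetical and only mimics the result of splitting a logfile:

```python
lines = ["header", "sample rate,1000", "", "measurements"]  # hypothetical

first_line = lines[0]    # a single item
first_two = lines[:2]    # a new list holding items 0 and 1
last_line = lines[-1]    # negative indices count from the end
print(first_line, first_two, last_line)
```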

# `for` loops

The next step in the *logfile challenge* requires looping. 

- **`for`, `in`** are keywords in Python, as well as
- **`while`**,
- and of course there are more keywords: [all Python keywords](https://docs.python.org/3/reference/lexical_analysis.html#keywords)

For a *C-style* `for` loop such as:

>`lines[0]`
>
>`lines[1]`
>
>`...`

we need an `index` variable. 
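
One way to write such an index-based loop in Python is with `range` and `len`. The list `lines` is a made-up stand-in for the split logfile:

```python
lines = ["header", "sample rate,1000", "measurements"]  # hypothetical

visited = []
for index in range(len(lines)):   # index runs 0, 1, 2
    visited.append(lines[index])
    print(index, lines[index])
```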


### `list` objects are iterable

Iterable objects can be looped without using an index!
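
A minimal sketch of looping directly over the items, again with a hypothetical `lines` list:

```python
lines = ["header", "sample rate,1000", "measurements"]  # hypothetical

lengths = []
for line in lines:            # `line` takes the value of each item in turn
    lengths.append(len(line))
    print(line)
```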

## `enumerate` function

If an index is required, use the built-in function `enumerate`.
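
A short sketch of `enumerate`, which yields `(index, item)` pairs. The `lines` list and the searched keyword are assumptions for illustration:

```python
lines = ["header", "sample rate,1000", "", "measurements"]  # hypothetical

found = []
for line_no, line in enumerate(lines):   # line_no counts from 0
    if line == "measurements":
        found.append(line_no)
    print(line_no, line)
```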

# Dictionary `dict`

In [0]:
idxs = dict()

In [0]:
type(idxs)

Adding more items to the dictionary
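
A minimal sketch of adding items by key assignment. The line numbers stored here are invented and do not come from the real logfile:

```python
idxs = dict()

idxs["header"] = 0                        # new key: the item is added
idxs["measurements"] = 3
idxs["names"] = idxs["measurements"] + 1  # values are looked up by key
idxs["header"] = 1                        # existing key: value is overwritten
print(idxs)
```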

# Table of Python containers
|Python container|list|tuple|set|dictionary|
|---|---|---|---|---|
|creation|`li = [1, 1, "Hi!"]`|`tup = (1, 1, "Hi!")`|`se = set([1, 1, "Hi!"])`|`d = {"Musk":"Elon", "Bezos":"Jeff"}`|
|print return|`[1, 1, 'Hi!']`|`(1, 1, 'Hi!')`|`{1, 'Hi!'}`|`{'Musk': 'Elon', 'Bezos': 'Jeff'}`|
|mutability|mutable|**immutable**|mutable|mutable|
|slicing|yes, `li[0]` --> `1`|yes, `tup[0]` --> `1`|no slicing (unordered)|no slicing, access by key|
|primary usage|basic container with<br>`append()` method|use if data doesn't<br>change|- set operations: `union`, `difference`,<br>- get unique values|lookup table|
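
The creation examples from the table can be run as a quick check of mutability, uniqueness and key lookup. This is only a sketch using the table's own values:

```python
li = [1, 1, "Hi!"]
tup = (1, 1, "Hi!")
se = set([1, 1, "Hi!"])
d = {"Musk": "Elon", "Bezos": "Jeff"}

li.append(2)                   # lists are mutable
tuple_error = None
try:
    tup[0] = 5                 # tuples are immutable: raises TypeError
except TypeError as err:
    tuple_error = err
unique_count = len(se)         # the duplicate 1 is stored only once
first_name = d["Musk"]         # dictionary lookup by key
print(li, unique_count, first_name)
```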



# Return to the 'Logfile challenge'

## Functions

A function is needed that:
- takes a `str` input
- converts the `str` to a `float`
- removes `Ohms` text
- removes `mOhms` text and divides the value by 1000

Before programming the actual functionality, it is good practice to write the test bench first:
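
One possible shape for such a test bench is a list of input strings with their expected results. The test values are invented, and the `string_to_float` body below is only a sketch written to satisfy them, not necessarily the notebook's final implementation:

```python
# test bench: (input, expected output in Ohms) pairs, chosen for illustration
tests = [("3.5", 3.5), ("2 Ohms", 2.0), ("500 mOhms", 0.5)]

def string_to_float(s):
    """Sketch: convert '3.5', '2 Ohms' or '500 mOhms' to a float in Ohms."""
    s = s.strip()
    if s.endswith("mOhms"):                   # check "mOhms" before "Ohms"
        return float(s[:-len("mOhms")]) / 1000
    if s.endswith("Ohms"):
        return float(s[:-len("Ohms")])
    return float(s)

results = []
for text, expected in tests:
    result = string_to_float(text)
    results.append(result)
    print("{!r:>12} -> {} (expected {})".format(text, result, expected))
```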


How to convert a `str` to a `float`?

In [0]:
float("3.33")

**Status** on the logfile challenge. We got:
- `data` in a list of lists (N x 4)
- `names` list with column names
- `params` parameter dictionary

# Complete `parse_logfile_string` function

Let's wrap the functionality we have so far into a function `parse_logfile_string` that:
- takes the logfile string `s` as input and
- returns `params`, `names` and `data`

In [0]:
def parse_logfile_string(s):
    # split the input string on "\n" new line
    lines = s.split("\n")

    # create a look-up table of sections and line numbers
    idxs = dict()
    for lineNo, line in enumerate(lines):
        if line in ['measurements', "header"]:
            idxs[line] = lineNo 
    idxs["names"] = idxs["measurements"] + 1
    idxs["params_begin"] = idxs["header"] + 1
    idxs["params_end"] = idxs["measurements"] - 1
    idxs["data"] = idxs["names"] + 1

    # parse the column names
    names = lines[idxs["names"]].split(",")

    # parse the params_lines list(str) into params dict{param: value}
    params = dict()
    for line in lines[idxs["params_begin"] : idxs["params_end"]]:
        key, value = line.split(",")
        params[key] = value

    # convert str to float, incl. "Ohms" / "mOhms" unit removal
    def string_to_float(item):
        idx = item.find("Ohms")
        if idx > 0:
            number = item.split(" ")[0]
            prefix = item[idx-1]   # " " for Ohms, "m" for mOhms
            return float(number) * {" ": 1, "m": 0.001}[prefix]
        return float(item)

    # parse data_lines list(str) into data list(list(floats))
    data = list()
    for data_line in lines[idxs["data"] :]:
        row = list()
        for item in data_line.split(","):
            row.append(string_to_float(item))
        data.append(row)

    return {"params": params, "names": names, "data": data}

In [0]:
log = parse_logfile_string(s)
log

In [0]:
log.keys()

Done!

We successfully created a **function** `parse_logfile_string(s)` that parses the input text string `s` and returns a `dict` with 
- the header parameters `params`, 
- the actual data `data`,
- the data column names `names` 

# Limitations of Python basic containers

The Python general-purpose container `list` worked great for reading items from the file and appending them. However, it is not ideal for further mathematical processing, as we will see.

## Apply calibration factors

The task:
- search the `params` for `calibration factor sig?` keys
- multiply all column values of `sig?` with the respective `calibration factor`


In [0]:
log = parse_logfile_string(s)
params = log["params"]
data = log["data"]
names = log["names"]

In [0]:
for param, cal_factor in params.items():
    if "calibration factor" in param:
        sig = param.split(" ")[-1]
        print("Signal={}, cal_factor={}".format(sig, cal_factor))

Next step is to index the data: Get all rows of a specific column...
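
With plain lists, getting a column means visiting every row. The sketch below uses small invented stand-ins for `names`, `data` and `params` (the column names and values are assumptions, not the real logfile content):

```python
# Hypothetical stand-ins for the parser output
names = ["x", "sig0", "sig1"]
data = [[0.0, 1.0, 10.0],
        [1.0, 2.0, 20.0]]
params = {"calibration factor sig1": "2.0"}

for param, cal_factor in params.items():
    if "calibration factor" in param:
        sig = param.split(" ")[-1]
        col = names.index(sig)               # column position of the signal
        column = [row[col] for row in data]  # all rows of that one column
        for row in data:                     # scale the column in place
            row[col] = row[col] * float(cal_factor)

print(column)
print(data)
```

Note that the parameter values are still strings after parsing, hence the `float(cal_factor)` conversion.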

## Array slicing for visualization

Another common usage is **arbitrary array slicing** for data visualization (e.g. using `matplotlib`)
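
A small sketch of why this is awkward with a list of lists: rows can be sliced directly, but columns cannot. The `data` values are made up for illustration:

```python
data = [[0.0, 1.0, 10.0],
        [1.0, 2.0, 20.0],
        [2.0, 3.0, 30.0]]   # hypothetical measurement rows

rows = data[0:2]               # row slicing works directly
not_a_column = data[:][0]      # still the first ROW, not the first column
x = [row[0] for row in data]   # a column needs a loop or comprehension
y = [row[2] for row in data]
print(x, y)
```

Lists such as `x` and `y` could then be handed to a plotting function.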
