<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/80x15.png" /></a><div align="center">This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.</div>

---

## Exercise 6.A

Try loading the file `values2.txt` with the `load_data()` function from Exercise 5.D: what exception does Python raise?
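One quick way to answer this question is to call `int()` directly on the kinds of strings that `values2.txt` contains. The sample lines in the sketch below are assumptions: `299740.0` is the floating-point example quoted further down in this section, while `n/a` is a made-up stand-in for non-numeric text. The helper name `int_or_error_name` is also hypothetical.

```python
def int_or_error_name(text):
    """Return int(text), or the name of the exception that int() raises."""
    try:
        return int(text)
    except Exception as err:  # deliberately broad: we want to discover the type
        return type(err).__name__


# hypothetical sample lines, each ending with a newline as when read from a file
for line in ['299850\n', '299740.0\n', 'n/a\n']:
    print(repr(line), '->', int_or_error_name(line))
```

The first string converts (`int()` ignores surrounding whitespace, including the trailing newline), while the other two report `ValueError`. That is the exception the `except` clause in `load_data2()` catches.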

Write a new `load_data2()` function that works just like `load_data()` but in addition *ignores* any line that does not contain an integer number.

In [1]:
def load_data2(filename):
    result = []
    with open(filename) as stream:
        for line in stream:
            try:
                num = int(line)
            except ValueError:
                # ignore line
                continue
            result.append(num)
    return result

If your solution is correct, evaluating the following cell should produce `[299850, 300070, 299930]` as output:

In [2]:
load_data2('values2.txt')

[299850, 300070, 299930]

### Bonus points

Re-write `load_data2()` so that it has exactly the same output as `load_data()`, i.e. minimize the number of rejected input lines.

From inspection of file `values2.txt`, or from tracing the `load_data2()` code, we can see that the input lines that raise errors are of two kinds:

1. Lines that contain a *floating-point* number (e.g., `299740.0`)
2. Lines that contain non-numeric text.

One simple way of minimizing errors is thus to first convert to floating-point, and *then* convert to integer:

In [3]:
def load_data3(filename):
    result = []
    with open(filename) as stream:
        for line in stream:
            try:
                num = int(float(line))
            except ValueError:
                # ignore line
                continue
            result.append(num)
    return result

If the code is correct, the output will now be the same as for the `load_data()` function of exercise 4.A:

In [4]:
load_data3('values2.txt')

[299850, 299740, 299900, 300070, 299930]
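The `int(float(line))` trick is simple, but it has two side effects worth knowing. First, the fractional part is truncated rather than rounded. Second, every value passes through a double-precision float, so integers larger than 2**53 (about 9 × 10**15) may not be represented exactly. The sketch below tries `int()` first and falls back to `float()` only when needed. It is one possible refinement, not necessarily what `load_data()` does, and the helper name `parse_int` is made up for illustration.

```python
def parse_int(text):
    """Parse text as an int, falling back to float() only when int() fails."""
    try:
        # exact for integer literals of any size; surrounding whitespace is ignored
        return int(text)
    except ValueError:
        # e.g. '299740.0' -> 299740; raises ValueError again for non-numeric text
        return int(float(text))


# int(float(...)) truncates towards zero instead of rounding
print(int(float('299740.9')))         # 299740

# int(float(...)) goes through a double-precision float,
# which cannot represent every large integer exactly
big = '12345678901234567891'
print(int(float(big)) == int(big))    # False
print(parse_int(big) == int(big))     # True: int() is tried first

print(parse_int('299740.0'))          # 299740
```

Note also that a line reading `inf` would make `int(float(line))` raise `OverflowError`, which the `except ValueError` clause in `load_data3()` does not catch. The same applies to `parse_int()`.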

----

## Exercise 6.B

Write a function `read_csv(p)` which reads a CSV
(*Comma-Separated Values*) file and returns a list of all
rows in it.  A *row* will be represented as a Python list of
(string) items.

In [5]:
def read_csv(path):
    with open(path, 'r') as input_file:
        rows = []
        for line in input_file:
            parts = line.split(',')
            # remove spaces before and after any comma
            # (to be conformant to CSV usage, we should also remove quotes)
            row = []
            for item in parts:
                row.append(item.strip())
            # commit row
            rows.append(row)
        return rows

If your solution is correct, evaluating the following cell should produce a Python list whose first item is the list `['NAME', 'EXITCODE', 'MEM', 'WALLTIME', 'CPUTIME']`, the second is the list `['check_image_set-make_batch_folder', '0', '0', '0', '0.004']`, etc.

In [6]:
read_csv('data.csv')

[['NAME', 'EXITCODE', 'MEM', 'WALLTIME', 'CPUTIME'],
 ['check_image_set-make_batch_folder', '0', '0', '0', '0.004'],
 ['check_image_set-check_image_set', '0', '0', '5', '0.836'],
 ['convert_tiff_to_png-clean_up', '0', '0', '1', '0.072'],
 ['convert_tiff_to_png-create_new_batch', '0', '0', '4', '0.488'],
 ['convert_tiff_to_png-convert_PNGs_listfiles', '0', '0', '1', '0.176'],
 ['convert_tiff_to_png-convert_PNGs_run__p0', '0', '0', '0', '0'],
 ['illum_corr-prepare_batches', '0', '0', '19', '13.96'],
 ['illum_corr-illumination_statistics_listfiles', '0', '0', '0', '0.14'],
 ['illum_corr-illumination_statistics_run__p0',
  '0',
  '3048',
  '1475',
  '1197.56']]
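The comment in `read_csv()` points out that real CSV files may quote fields. For comparison, Python's standard library `csv` module implements the quoting rules, including commas inside quoted fields. The CSV text below is hypothetical, and `io.StringIO` stands in for an open file.

```python
import csv
import io

# hypothetical CSV text with a quoted field that contains a comma
text = 'NAME,EXITCODE\n"job, with comma",0\n'

# naive approach used above: split on every comma, then strip spaces
naive = [[item.strip() for item in line.split(',')]
         for line in io.StringIO(text)]

# csv.reader understands quoting
proper = list(csv.reader(io.StringIO(text)))

print(naive)   # second row wrongly has 3 fields, with the quotes left in
print(proper)  # [['NAME', 'EXITCODE'], ['job, with comma', '0']]
```

Unlike the `strip()` call in `read_csv()`, `csv.reader` keeps spaces around fields by default. Passing `skipinitialspace=True` drops the spaces that follow a delimiter.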

### Advanced

Make `read_csv()` into a *generator* that iterates over rows.

In [7]:
def read_csv(path):
    with open(path, 'r') as input_file:
        for line in input_file:
            parts = line.split(',')
            # remove spaces before and after any comma
            # (to be conformant to CSV usage, we should also remove quotes)
            row = []
            for item in parts:
                row.append(item.strip())
            # commit row
            yield row

If your solution is correct, evaluating the two cells below should produce the list `['NAME', 'EXITCODE', 'MEM', 'WALLTIME', 'CPUTIME']` in the first case, and the list `['check_image_set-make_batch_folder', '0', '0', '0', '0.004']` in the second.

In [8]:
g = read_csv('data.csv')

next(g)

['NAME', 'EXITCODE', 'MEM', 'WALLTIME', 'CPUTIME']

In [9]:
next(g)

['check_image_set-make_batch_folder', '0', '0', '0', '0.004']
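A practical consequence of the generator version is that rows are produced on demand. The sketch below repeats the generator in a more compact form and writes a small hypothetical sample file with `tempfile` in place of `data.csv`. It takes only the first rows with `itertools.islice` and shows that a generator object can be iterated only once.

```python
import itertools
import os
import tempfile


def read_csv(path):
    # same generator as above, written more compactly
    with open(path, 'r') as input_file:
        for line in input_file:
            yield [item.strip() for item in line.split(',')]


# hypothetical sample file standing in for data.csv
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write('NAME, EXITCODE\njob_a, 0\njob_b, 1\njob_c, 0\n')
    sample_path = tmp.name

# take only the first two rows; later lines are never split into rows
first_two = list(itertools.islice(read_csv(sample_path), 2))
print(first_two)       # [['NAME', 'EXITCODE'], ['job_a', '0']]

# a generator object can be consumed only once
g = read_csv(sample_path)
all_rows = list(g)
print(len(all_rows))   # 4
print(list(g))         # [] because g is already exhausted

os.remove(sample_path)
```

To iterate over the file a second time, call `read_csv()` again to obtain a fresh generator.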

### More advanced

How would you modify `read_csv()` so that it is possible to specify what
types the CSV file's columns are?  Can you implement it so that a
row is a list of items of the right type (i.e., not all strings)?

A simple way is to add a parameter `converters`, a list of functions used to convert each field in a row from a string (as read from the file) to the right Python type. Some rows could however be malformed (e.g., they contain fewer fields, or some fields fail to convert), so we must decide what to do with them: ignore them, or raise an error?

In [10]:
def read_csv(path, converters):
    with open(path, 'r') as input_file:
        for line in input_file:
            parts = line.split(',')
            if len(parts) != len(converters):
                # malformed row: here we choose to raise an error
                raise RuntimeError("Number of items in line does not match number of converters")
            row = []
            for conv, item in zip(converters, parts):
                # remove spaces before and after any comma
                # (to be conformant to CSV usage, we should also remove quotes),
                # then convert the field with the matching converter
                row.append(conv(item.strip()))
            # commit row
            yield row
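The sketch below implements the other policy mentioned above, ignoring malformed rows, and keeps the strict one behind a flag. The `skip_bad_rows` parameter and the sample data are hypothetical. Unlike the `read_csv()` above, this version raises `ValueError` for a wrong number of fields, so that a single `except` clause handles both failure modes.

```python
import os
import tempfile


def read_csv(path, converters, skip_bad_rows=False):
    """Yield rows whose fields are converted by the matching converter.

    skip_bad_rows is a hypothetical option: when True, malformed rows
    are silently ignored; when False, the error propagates to the caller.
    """
    with open(path, 'r') as input_file:
        for line in input_file:
            parts = [item.strip() for item in line.split(',')]
            try:
                if len(parts) != len(converters):
                    # ValueError (instead of RuntimeError) so that one
                    # except clause covers both kinds of malformed row
                    raise ValueError("Number of items in line does not match number of converters")
                row = [conv(item) for conv, item in zip(converters, parts)]
            except ValueError:
                if skip_bad_rows:
                    continue   # ignore the malformed row
                raise          # re-raise for the caller to handle
            yield row


# hypothetical sample data with a header, a bad field and a short row
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write('NAME,EXITCODE,CPUTIME\n'
              'job_a,0,0.004\n'
              'job_b,oops,0.8\n'
              'job_c,1\n'
              'job_d,0,13.96\n')
    sample_path = tmp.name

convs = [str, int, float]

# "ignore" policy: the header (int('EXITCODE') fails), job_b and job_c are dropped
rows = list(read_csv(sample_path, convs, skip_bad_rows=True))
print(rows)   # [['job_a', 0, 0.004], ['job_d', 0, 13.96]]

# "raise" policy: the very first line (the header) already fails
try:
    list(read_csv(sample_path, convs))
    strict_failed = False
except ValueError:
    strict_failed = True
print(strict_failed)   # True

os.remove(sample_path)
```

Since the header row cannot be converted with numeric converters, a solution for the real `data.csv` would probably need to treat the first line separately, for example with an extra parameter telling the function to skip it.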