## Generating data

Consider the following data file.

In [1]:
%cat data/measurement.txt

case dim temperature
1 1 -0.5
2 1 0.0
3 1 0.5
4 2 -0.5
5 2 0.0
6 2 0.5
7 3 -0.5
8 3 0.0
9 3 0.5


Write Python code that generates this output in the notebook.

In [2]:
print('case dim temperature')
case = 0
for dim in range(1, 4):
    for temperature in [-0.5, 0.0, 0.5]:
        case += 1
        print(f'{case} {dim} {temperature}')

case dim temperature
1 1 -0.5
2 1 0.0
3 1 0.5
4 2 -0.5
5 2 0.0
6 2 0.5
7 3 -0.5
8 3 0.0
9 3 0.5
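
The same table can also be generated without a manually incremented counter. The following is a minimal sketch, assuming `itertools.product` for the nested loops and `enumerate` for the case number; the names `dims`, `temperatures` and `lines` are introduced here for illustration only.

```python
import itertools

dims = range(1, 4)
temperatures = [-0.5, 0.0, 0.5]

lines = ['case dim temperature']
# product() varies the rightmost iterable fastest, just like the inner loop
# of the nested for-loops; enumerate() supplies the case number
for case, (dim, temperature) in enumerate(itertools.product(dims, temperatures), start=1):
    lines.append(f'{case} {dim} {temperature}')
print('\n'.join(lines))
```

Collecting the lines in a list first makes it easy to check them or write them to a file later.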


## Lists

Start from the following list of two elements, the strings `'a'` and `'b'`.

In [3]:
l = ['a', 'b']

Find the length of the list.

In [4]:
len(l)

2

Append a new element `'c'` to the list.

In [5]:
l.append('c')

Remove the last element of the list.

In [6]:
l.pop()

'c'

Insert `'c'` at index 1 of the list.

In [7]:
l.insert(1, 'c')
l

['a', 'c', 'b']

Remove the element at index 1 from the list.

In [8]:
l.pop(1)
l

['a', 'b']

Extend the list with the list `['c', 'd']`.

In [9]:
l.extend(['c', 'd'])
l

['a', 'b', 'c', 'd']

In [10]:
assert l == ['a', 'b', 'c', 'd'], f'got {l}'

Get the first element of the list.

In [11]:
v = l[0]
v

'a'

In [12]:
assert v == 'a', f'got {v}'

Get the second element of the list.

In [13]:
l[1]

'b'

Get the last element of the list.

In [14]:
v = l[-1]
v

'd'

In [15]:
assert v == 'd', f'got {v}'

Get the second-to-last element of the list.

In [16]:
v = l[-2]
v

'c'

In [17]:
assert v == 'c', f'got {v}'

Replace the third element by `'?'`.

In [18]:
l[2] = '?'

In [19]:
assert l == ['a', 'b', '?', 'd'], f'got {l}'

Create a list with the elements 1 to 5.

In [20]:
numbers = list(range(1, 6))
numbers

[1, 2, 3, 4, 5]

Create a list that contains the third up to and including the fourth element of that list.

In [21]:
sub_list = numbers[2:4]
sub_list

[3, 4]

In [22]:
assert sub_list == [3, 4], f'got {sub_list}'

Create a sublist that contains the first three elements of the list.

In [23]:
sub_list = numbers[:3]
sub_list

[1, 2, 3]

In [24]:
assert sub_list == [1, 2, 3], f'got {sub_list}'

Create a sublist that contains all elements of the list, starting from the third.

In [25]:
sub_list = numbers[2:]
sub_list

[3, 4, 5]

In [26]:
assert sub_list == [3, 4, 5]

Create a sublist that contains every second element of the original list.

In [27]:
sub_list = numbers[::2]
sub_list

[1, 3, 5]

In [28]:
assert sub_list == [1, 3, 5], f'got {sub_list}'

Create a sublist that runs from the last element of the original list back to the third element, in reverse order.

In [29]:
sub_list = numbers[-1:-1 - 3:-1]
sub_list

[5, 4, 3]

In [30]:
assert sub_list == [5, 4, 3], f'got {sub_list}'
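
Slices with negative indices and a negative step can be hard to read. As a hedged illustration, the sketch below uses the built-in `slice` type and its `indices` method to show which positions such a slice selects; the names `s` and `picked` are introduced for illustration only.

```python
numbers = [1, 2, 3, 4, 5]

# the same slice as numbers[-1:-1 - 3:-1], written as an explicit slice object
s = slice(-1, -4, -1)

# slice.indices(length) resolves negative values for a sequence of that length
start, stop, step = s.indices(len(numbers))
print(start, stop, step)

# the resolved values select the same positions as a range
picked = [numbers[i] for i in range(start, stop, step)]
print(picked)
```

The stop index is exclusive, which is why the slice stops at `-4` to include the third element at index `-3`.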

Create a list that is the reverse of the original list.

In [31]:
reversed_numbers = numbers[::-1]
reversed_numbers

[5, 4, 3, 2, 1]

In [32]:
assert list(reversed(numbers)) == reversed_numbers, f'got {reversed_numbers}'

Assign `'a'`, `'b'`, `'c'` to the first, third and last elements of the list.

In [33]:
numbers[::2] = 'a', 'b', 'c'

In [34]:
assert numbers == ['a', 2, 'b', 4, 'c'], f'got {numbers}'
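
Assignment to a slice with a step only works when the number of new values matches the number of selected positions. The sketch below illustrates this on a separate list; `values` and `raised` are names introduced for illustration only.

```python
values = [1, 2, 3, 4, 5]

# a slice with a step other than 1 needs exactly as many new values
# as positions it selects: here 3 positions, but only 2 values
raised = False
try:
    values[::2] = 'a', 'b'
except ValueError as error:
    raised = True
    print(f'ValueError: {error}')

# a plain slice (step 1) may change the length of the list
values[1:3] = ['x']
print(values)
```

The failed assignment leaves the list unchanged, so the second assignment operates on the original five elements.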

## List comprehensions

Create a list using a comprehension with the elements -1.5, -1.0, ..., 1.0, 1.5.

In [35]:
numbers = [0.5*x for x in range(-3, 4)]

In [36]:
assert numbers == [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5], f'got {numbers}'

## String formatting

Print the value of $\pi$ with 3 digits after the decimal point.

In [37]:
import math

In [38]:
print(f'{math.pi:.3f}')

3.142


Print a two-column table with the columns separated by tabs.  The first column shows the numbers from 0 to 10 as floating point values with one digit after the decimal point (always 0 in this case); the second column shows their square roots with 5 digits after the decimal point.  For bonus points, ensure that the values are aligned on the decimal point.

In [39]:
for x in (float(x) for x in range(11)):
    print(f'{x:>4.1f}\t{math.sqrt(x):>10.5f}')

 0.0	   0.00000
 1.0	   1.00000
 2.0	   1.41421
 3.0	   1.73205
 4.0	   2.00000
 5.0	   2.23607
 6.0	   2.44949
 7.0	   2.64575
 8.0	   2.82843
 9.0	   3.00000
10.0	   3.16228


## Modifying data

Print the contents of the file `data/measurement.txt`, replacing negative temperature values by 0.  You should get something similar to the output below.

In [40]:
%cat data/clipped_measurements.txt

case dim temperature
1 1 0.0000
2 1 0.0000
3 1 0.5000
4 2 0.0000
5 2 0.0000
6 2 0.5000
7 3 0.0000
8 3 0.0000
9 3 0.5000


In [41]:
with open('data/measurement.txt') as file:
    print(file.readline().strip())
    for line in file:
        case, dim, temperature = (conversion(value) for value, conversion in zip(line.strip().split(), (int, int, float)))
        if temperature < 0.0:
            temperature = 0.0
        print(f'{case} {dim} {temperature:.4f}')

case dim temperature
1 1 0.0000
2 1 0.0000
3 1 0.5000
4 2 0.0000
5 2 0.0000
6 2 0.5000
7 3 0.0000
8 3 0.0000
9 3 0.5000
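
The explicit test for negative values can also be expressed with the built-in `max` function. The following is a minimal sketch; it assumes an `io.StringIO` object with a few sample lines as a stand-in for the data file so that it runs without the file being present, and `clipped_lines` is a name introduced for illustration only.

```python
import io

# stand-in for data/measurement.txt (assumption: same layout, fewer lines)
file = io.StringIO('case dim temperature\n1 1 -0.5\n2 1 0.0\n3 1 0.5\n')

# the header line is copied unchanged
clipped_lines = [file.readline().strip()]
for line in file:
    case, dim, temperature = line.split()
    # max() replaces negative temperatures by 0.0 and keeps all other values
    clipped_lines.append(f'{case} {dim} {max(float(temperature), 0.0):.4f}')
print('\n'.join(clipped_lines))
```

Since `case` and `dim` are printed unchanged, they do not need to be converted to integers here.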


## Data types

How many distinct values for `dim` are in the file `data/measurement.txt`?  Also, print the values as a tab-separated list.

In [42]:
dims = set()
with open('data/measurement.txt') as file:
    _ = file.readline()
    for line in file:
        _, dim, _ = (conversion(value) for value, conversion in zip(line.strip().split(), (int, int, float)))
        dims.add(dim)
print(f'number dims: {len(dims)}')
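# sets are unordered, so sort for a stable output; the join is kept outside an
# f-string because backslashes in f-string expressions require Python 3.12+
print('dims: ' + '\t'.join(str(dim) for dim in sorted(dims)))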

number dims: 3
dims: 1	2	3


The conversion code was likely copied and pasted from the previous problem.  Instead, write a function `parse_line` that takes a line containing data, and returns the values on that line converted to the appropriate types.

In [43]:
def parse_line(line: str) -> tuple[int, int, float]:
    return tuple(conversion(value) for value, conversion in zip(line.strip().split(), (int, int, float)))

Use the function for the previous problem.

In [44]:
dims = set()
with open('data/measurement.txt') as file:
    _ = file.readline()
    for line in file:
        _, dim, _ = parse_line(line)
        dims.add(dim)
print(f'number dims: {len(dims)}')
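# sets are unordered, so sort for a stable output; the join is kept outside an
# f-string because backslashes in f-string expressions require Python 3.12+
print('dims: ' + '\t'.join(str(dim) for dim in sorted(dims)))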

number dims: 3
dims: 1	2	3


Define a named tuple to represent the output of the `parse_line` function.

In [45]:
import typing

In [46]:
class LineData(typing.NamedTuple):
    case: int
    dim: int
    temperature: float

In [49]:
def parse_line(line: str) -> LineData:
    values = line.strip().split()
    conversions = (int, int, float)
    # build a LineData instance so the return value matches the annotation
    return LineData(*(conversion(value) for value, conversion in zip(values, conversions)))
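
The benefit of a named tuple is that fields can be accessed by name. The sketch below repeats the definitions so that it runs on its own; this variant of `parse_line` constructs a `LineData` explicitly, and the sample line is made up for illustration.

```python
import typing


class LineData(typing.NamedTuple):
    case: int
    dim: int
    temperature: float


def parse_line(line: str) -> LineData:
    # split() without arguments also discards the trailing newline
    case, dim, temperature = line.split()
    return LineData(int(case), int(dim), float(temperature))


data = parse_line('4 2 -0.5\n')
print(data.dim, data.temperature)   # access by field name
case, dim, temperature = data       # unpacking works as for a plain tuple
```

Code that only needs one field, such as `dim`, can then use `parse_line(line).dim` instead of unpacking into placeholder variables.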

How many times does each dimension occur in the data?

In [50]:
dim_count = {}
with open('data/measurement.txt') as file:
    _ = file.readline()
    for line in file:
        _, dim, _ = parse_line(line)
        if dim not in dim_count:
            dim_count[dim] = 0
        dim_count[dim] += 1
for dim, count in dim_count.items():
    print(f'{dim}: {count}')

1: 3
2: 3
3: 3
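
Counting occurrences is a common enough task that the standard library offers `collections.Counter` for it. The following is a minimal sketch; it assumes an `io.StringIO` object with a few sample lines as a stand-in for the data file so that it runs without the file being present.

```python
import collections
import io

# stand-in for data/measurement.txt (assumption: same layout, fewer lines)
file = io.StringIO('case dim temperature\n1 1 -0.5\n2 1 0.0\n3 2 0.5\n')

_ = file.readline()  # skip the header line
# Counter counts how often each value produced by the generator occurs
dim_count = collections.Counter(int(line.split()[1]) for line in file)
for dim, count in sorted(dim_count.items()):
    print(f'{dim}: {count}')
```

A `Counter` returns 0 for keys it has not seen, so no explicit initialization of the counts is needed.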
