*Edited: 2022-09-13*

# Unit 2.2 &mdash; Additional Examples

## Question 1

Write a pure function `f` which returns the minimum element and the maximum element of a list. The returned value will be a tuple.

For example, `f([5, 2, 8, 3, 2, 7])` returns `(2, 8)`.

### Solution

The easiest way to do this is using built-in functions!

In [None]:
def f(x):
    return min(x), max(x)

This is a perfectly acceptable solution.

As an exercise, let's think about how else we could do this.

For example, we could sort the data, but if we do it like this:

In [None]:
def f(x):
    x.sort()
    return x[0], x[-1]

it works, but it has the side effect of changing the input list.

This means our function is **not** a pure function!

Let's try it and see:

In [None]:
data = [1, 100, 1, 4, -8, 2, 9, 19, 10]

print(f(data))
print(data) # check to make sure it hasn't changed the list - sadly it has!

We could use a non-mutating sort and variable reassignment, which fixes this problem.

In [None]:
def f(x):
    x = sorted(x)
    return x[0], x[-1]
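
As a quick check (a minimal sketch reusing the same sample list as above), the `sorted` version leaves the caller's list in its original order:

```python
def f(x):
    # sorted() builds a new list, so the caller's list is not modified
    x = sorted(x)
    return x[0], x[-1]

data = [1, 100, 1, 4, -8, 2, 9, 19, 10]

print(f(data))  # (-8, 100)
print(data)     # still in its original order
```

Reassigning `x` inside the function only changes what the local name refers to. It does not touch the list that was passed in.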

We also discussed how we could do this manually, with iteration.

We need variables to store the state of our search, `smallest` for the minimum found so far and `biggest` for the maximum found so far, and we update them as we look at each element.

In [None]:
def f(x):
    
    smallest = None # to start with we don't have a smallest or biggest element
    biggest = None
    
    for e in x:
        
        # have we found a new smallest element?
        if smallest is None or e < smallest:
            smallest = e
        
        # have we found a new biggest element?
        if biggest is None or e > biggest:
            biggest = e
            
    return smallest, biggest

In [None]:
data = [1, 100, 1, 4, -8, 2, 9, 19, 10]

print(f(data))
print(data) # check to make sure it hasn't changed the list - it hasn't
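
One difference between the approaches shows up with an empty list. In the sketch below, the names `f_builtin` and `f_loop` are made up just so both versions can sit side by side, and the loop iterates over its argument `x`. The built-in version raises a `ValueError`, while the loop version returns its starting values:

```python
def f_builtin(x):
    return min(x), max(x)

def f_loop(x):
    smallest = None
    biggest = None
    for e in x:
        if smallest is None or e < smallest:
            smallest = e
        if biggest is None or e > biggest:
            biggest = e
    return smallest, biggest

print(f_loop([]))  # (None, None)

try:
    f_builtin([])
except ValueError as err:
    # min() and max() refuse to work on an empty list
    print('built-in version failed:', err)
```

Which behaviour is preferable depends on what the caller expects, so it is worth deciding deliberately.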

## Question 2

Write a pure function `evens` which takes a list and returns the even elements as a new list.

### Solution

In the previous example the starting value of our iteration was `None` - a placeholder for not having any known max or min items yet.

This uses the same principle of a starting value. Our starting value is `[]` (an empty list with no items yet) and we will append to it.

The starting value should typically be the value which we would like to return if there was no data.

In [None]:
def evens(x):
    even = []
    for e in x:
        if e % 2 == 0:
            even.append(e)
    return even

In [None]:
data = [1, 100, 1, 4, -8, 2, 9, 19, 10]

result = evens(data)

result
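
To see the starting-value idea in action, here is a small check (a sketch that repeats the definition of `evens` from above so it runs on its own). With no data the function returns its starting value `[]`, and the input list is left untouched:

```python
def evens(x):
    even = []
    for e in x:
        if e % 2 == 0:
            even.append(e)
    return even

data = [1, 100, 1, 4, -8, 2, 9, 19, 10]

print(evens([]))    # []
print(evens(data))  # [100, 4, -8, 2, 10]
print(data)         # unchanged
```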

We could also do this using the `filter` operation. You may wish to try filtering instead of using an `if` statement.

In [None]:
def is_even(e):
    return e % 2 == 0

def evens(x):
    even = []
    for e in filter(is_even, x):
        even.append(e)
    return even

In [None]:
data = [1, 100, 1, 4, -8, 2, 9, 19, 10]

result = evens(data)

result

Since the boolean function `is_even` is a one-line pure function only used in one place, we could use a lambda expression instead.

To use a lambda, we put the logic `e % 2 == 0` into the lambda like `lambda e: e % 2 == 0` and use this in place of the function reference.

If this is a little hard to follow, it's still okay to use the above version with an explicitly named function.

In [None]:
def evens(x):
    even = []
    for e in filter(lambda e: e % 2 == 0, x):
        even.append(e)
    return even

In [None]:
data = [1, 100, 1, 4, -8, 2, 9, 19, 10]

result = evens(data)

result

Since our logic is only doing a filter on a collection and nothing else, we can just convert the result of the filter to a list and return that!

In [None]:
def evens(x):
    return list(filter(lambda e: e % 2 == 0, x))

In [None]:
data = [1, 100, 1, 4, -8, 2, 9, 19, 10]

result = evens(data)

result

We have a solution which is a lot more compact, though perhaps a little harder to follow if the reader isn't familiar with filter and lambdas.

## Question 3

We have a slightly awkward file containing some HTML. We want to extract a list of people from `html.txt`. Here are the file contents:

    <table style="width:100%">
      <tr>
        <th>Firstname</th>
        <th>Lastname</th> 
        <th>Age</th>
      </tr>
      <tr>
        <td>Jill</td>
        <td>Smith</td> 
        <td>50</td>
      </tr>
      <tr>
        <td>Eve</td>
        <td>Jackson</td> 
        <td>94</td>
      </tr>
    </table>

Write a program to extract the people as a `dict` mapping names to ages e.g. `{'Jill Smith' : 50, 'Eve Jackson' : 94}`.

### Solution

We know we have to start with what we would return if there were no data, `{}`, and build up from there.

Whether a line holds a first name, last name, or age is determined by how many lines in we are from the start of a `<tr>` block, so we will keep a counter called `counter`. It is set to `0` when we see a `<tr>` and increased by one on every other line.

In [None]:
people = {}

counter = 0

first = None
last = None
age = None

with open('html.txt') as file:

    for line in file:
        
        line = line.strip()
        
        # counter resets at the start of each <tr> block
        if line == '<tr>':
            counter = 0
        else:
            counter += 1
        
        # found an element of data
        if line.startswith('<td>'):
            
            if counter == 1:
                first = line[4:-5]
            elif counter == 2:
                last = line[4:-5]
            elif counter == 3:
                age = int(line[4:-5])
                # if we have all three items, add to the dict
                people[f'{first} {last}'] = age

In [None]:
people

This code could still use some tidying. For example, the logic of stripping off `<td>` and `</td>` is repeated three times and is rather obscurely implemented as `[4:-5]`.

A good thing to do would be to pull out a function, which we'll call `striptd`.

In [None]:
def striptd(line):
    # removes <td></td>
    return line[4:-5]

people = {}

counter = 0

first = None
last = None
age = None

with open('html.txt') as file:
    
    for line in file:
        
        line = line.strip()
        
        # counter resets at the start of each <tr> block
        if line == '<tr>':
            counter = 0
        else:
            counter += 1
            
        # found an element of data
        if line.startswith('<td>'):
            
            if counter == 1:
                first = striptd(line)
            elif counter == 2:
                last = striptd(line)
            elif counter == 3:
                age = int(striptd(line))
                # if we have all three items, add to the dict
                people[f'{first} {last}'] = age

In [None]:
people

Can you think of any other ways to make this code clearer?
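
One possible answer, offered only as a sketch: collect the contents of every `<td>` line first, then take them three at a time. It makes the same assumptions about the file as the code above (one tag per line, three `<td>` lines per person). The function name `parse_people` is made up for this example, and it accepts any iterable of lines so it can be tried without the file.

```python
def striptd(line):
    # removes <td></td>
    return line[4:-5]

def parse_people(lines):
    # keep only the data lines, with the tags stripped off
    cells = []
    for line in lines:
        line = line.strip()
        if line.startswith('<td>'):
            cells.append(striptd(line))

    # every three cells make one person: first name, last name, age
    # (an incomplete group at the end is ignored)
    people = {}
    for i in range(0, len(cells) - 2, 3):
        first, last, age = cells[i], cells[i + 1], cells[i + 2]
        people[f'{first} {last}'] = int(age)
    return people

sample = [
    '<tr>',
    '  <td>Jill</td>',
    '  <td>Smith</td> ',
    '  <td>50</td>',
    '</tr>',
]

print(parse_people(sample))  # {'Jill Smith': 50}
```

With the real file, this could be called inside `with open('html.txt') as file:` as `people = parse_people(file)`. Separating "find the data" from "group the data" removes the need for `counter` and the three placeholder variables.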

When you are processing a messy data file, you need to look at the file, and make some assumptions about its structure.

Then write code to deal with each part of the problem.