# Python Practice 1: Solutions

This notebook contains solutions to the practice problems in the notebook [Algorithms: Thinking in Code](https://github.com/gwu-libraries/gwlibraries-workshops/blob/master/python-practice/python_practice_1.ipynb).

## Problem 1: Finding Palindromes

<details>
    <summary><h3>Click for a solution</h3></summary>
<pre>
<code>
def is_palindrome(p):
    return p == p[::-1]
</code>
</pre>
</details>

## Problem 2: Run-Length Encoding

<details>
    <summary><h3>Click for a solution</h3></summary>
<pre>
<code>
def make_rle(sequence):
    rle = ""
    if not sequence:  # an empty sequence has no runs to encode
        return ""
    count = 1
    for char, next_char in zip(sequence[:-1], sequence[1:]):
        if char == next_char:
            count += 1
        else:
            rle += f"{char}{count}"
            count = 1
    rle += f"{sequence[-1]}{count}"
    return rle
</code>
</pre>
</details>

<details>
    <summary><h3>Click for another solution</h3></summary>
<pre>
<code>
from itertools import groupby
def make_rle_g(sequence):
    rle = [f"{g[0]}{len(list(g[1]))}" for g in groupby(sequence)]
    return "".join(rle)
</code>
</pre>
</details>
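
As a quick sanity check, here is a minimal sketch that runs a `groupby`-based encoder on a sample string. The input `"AAABBC"` is a made-up example, not one taken from the practice notebook.

```python
from itertools import groupby

def make_rle_g(sequence):
    # groupby yields (value, iterator over the consecutive run of that value)
    return "".join(f"{char}{len(list(run))}" for char, run in groupby(sequence))

sample = "AAABBC"  # hypothetical input
encoded = make_rle_g(sample)
print(encoded)  # A3B2C1
```

Because `groupby` simply yields nothing for an empty sequence, this version returns an empty string in that case without any special handling.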

## Problem 3: Aggregating by Groups

<details>
    <summary><h3>Click for a solution</h3></summary>
<pre>
<code>
def average_by_group(scores):
    score_dict = {}
    for score in scores:
        score_n = int(score)
        if score_n in score_dict:
            score_dict[score_n].append(float(score))
        else:
            score_dict[score_n] = [float(score)]
    averages = []
    for score_list in score_dict.values():
        averages.append(sum(score_list) / len(score_list))
    return sorted(averages)
</code>
</pre>
</details>

<details>
    <summary><h3>Click for another solution</h3></summary>
<pre>
<code>
# Same as above but with a single loop
def average_by_group(scores):
    score_dict = {}
    for score in scores:
        score_n = int(score)
        if score_n in score_dict:
            previous_avg, n = score_dict[score_n]
            score_dict[score_n] = ((previous_avg * n + float(score)) / (n + 1), n + 1)
        else:
            score_dict[score_n] = (float(score), 1)
    return sorted([v[0] for v in score_dict.values()])
</code>
</pre>
</details>

<details>
    <summary><h3>Click for yet another solution</h3></summary>
<pre>
<code>
from itertools import groupby
def average_by_group(scores):
    # groupby only groups consecutive items, so sort the scores first
    scores = sorted(float(score) for score in scores)
    groups = {n: list(score_group) for (n, score_group) in groupby(scores, key=int)}
    return [sum(score_list)/len(score_list) for score_list in groups.values()]
</code>
</pre>
</details>

## Problem 4: Rolling Weather Windows

We'll talk through this solution step by step before we write out a single function.

### Step 1 

After retrieving the CSV file from the Internet and reading it into a Python data structure, our first step is to sort it.

```python
from csv import DictReader
from urllib.request import urlretrieve
urlretrieve('https://corgis-edu.github.io/corgis/datasets/csv/weather/weather.csv', './weather.csv')
with open('./weather.csv') as f:
    reader = DictReader(f)
    weather_data = [r for r in reader]
```

We can use the optional `key` argument to the `sorted` function to tell the function which data points we want to sort on.

When sorting a Python list of dictionaries, it's common to use a **lambda function** as the `key` argument. 

A lambda is basically a one-line function without a name. 

To create a lambda, you write the word `lambda`, followed by any arguments the function should accept, followed by a colon, followed by the body of the function. The lambda function will return the result of whatever happens to the right of the colon.
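
To make the syntax concrete, here is a small sketch comparing a named function with the equivalent lambda. The names `second_item` and `pairs` are invented for illustration and do not come from the weather dataset.

```python
# A named function that returns the second item of a pair
def second_item(pair):
    return pair[1]

pairs = [("b", 3), ("a", 1), ("c", 2)]  # hypothetical data

# Sorting with the named function as the key...
sorted_with_def = sorted(pairs, key=second_item)

# ...and with a lambda that does the same thing inline
sorted_with_lambda = sorted(pairs, key=lambda pair: pair[1])

print(sorted_with_lambda)  # [('a', 1), ('c', 2), ('b', 3)]
```

Both calls produce the same result; the lambda just saves us from defining a function we only need once.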

In sorting our dataset, we write a lambda function to extract two data points: the weather station location, and the date of the observation. Since we want to compute the rolling mean _for each weather station_, we'll need to sort the list by station first and then by date.

We could do this in two steps, by calling `sorted` twice. But since the station name and the date are both already Python strings and are sortable, we can combine them into a single sort key, using an f-string. That approach is shown below.

```python
weather_data = sorted(weather_data, key=lambda w: f"{w['Station.Location']}{w['Date.Full']}")
```
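
Another option, sketched here on made-up records, is to have the lambda return a tuple. Python compares tuples element by element, so this sorts by station first and then by date without concatenating the two strings. The records below are hypothetical stand-ins for rows of the dataset, and the sketch assumes the dates are strings in year-month-day order, so that sorting them as text also sorts them chronologically.

```python
# Hypothetical records using the same field names as the weather data
records = [
    {"Station.Location": "Birmingham, AL", "Date.Full": "2016-01-10"},
    {"Station.Location": "Aberdeen, SD", "Date.Full": "2016-01-10"},
    {"Station.Location": "Birmingham, AL", "Date.Full": "2016-01-03"},
    {"Station.Location": "Aberdeen, SD", "Date.Full": "2016-01-03"},
]

# The key is a (station, date) tuple: ties on station are broken by date
records_sorted = sorted(records, key=lambda w: (w["Station.Location"], w["Date.Full"]))

for w in records_sorted:
    print(w["Station.Location"], w["Date.Full"])
```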

Let's test to make sure we've sorted the list correctly.

```python
for w in weather_data:
    print(f"{w['Station.Location']} {w['Date.Full']}")
```

### Step 2

Now that the data has been sorted, we can compute the rolling average.

How do we approach this? Well, since we want to see the rolling averages _for each weather station_ (as opposed to for the whole country), grouping by weather station is probably a good start. 

We can use the `groupby` function to do this, using a lambda function as the `key` argument, just as we did with `sorted`. This time, our lambda function is only picking out the value of the `Station.Location` element, since we do _not_ want to group by date. (As a rule of thumb, with a rolling average, you don't usually want to group by the time series element.)

Let's test this approach first to make sure it works. In the code below, we inspect the first group obtained by grouping the sorted data by `Station.Location`. The `itertools.groupby` function actually returns a Python iterator, which is a data structure that is meant to be consumed within a loop. In order to get just the first element, however, we can use the `next` function, which simply advances the iterator by one iteration.

Then we can inspect the two parts of the `groupby` result for the first iteration: the group key, which in this case should be `Aberdeen, SD`, and the group itself, which is an iterator over all the records corresponding to that weather station. Wrapping the group in `list` lets us see those records.

```python
from itertools import groupby
g = next(groupby(weather_data, key=lambda w: w["Station.Location"]))
print(g[0])
list(g[1])
```
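
To see the shape of what `groupby` yields without downloading the dataset, here is a minimal sketch on a tiny hypothetical list (the `station` and `temp` fields are invented for illustration). It also shows why we sorted first: `groupby` only groups items that are next to each other.

```python
from itertools import groupby

# Hypothetical readings, already sorted by station
readings = [
    {"station": "A", "temp": 10},
    {"station": "A", "temp": 12},
    {"station": "B", "temp": 20},
]

grouped = groupby(readings, key=lambda r: r["station"])

# Each step of the iterator is a (key, group) pair; the group is itself an iterator
first_key, first_group = next(grouped)
first_records = list(first_group)
print(first_key, first_records)

# On unsorted input, the same key shows up as two separate groups
unsorted_keys = [key for key, _ in groupby("ABA")]
print(unsorted_keys)  # ['A', 'B', 'A']
```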

With confidence that our `groupby` expression will work, we can now use it in a `for` loop to calculate the rolling average for each group.

To do the latter, we need to loop over each record in the group, which corresponds to one week's average temperature. We know that the data are already sorted by date (and by definition, `groupby` doesn't change the sort order), so we can define the rolling average for each date as the average of the **current** average temperature value and all of the previous values (for that group). 

With one exception: the first data point doesn't have any previous values, so its rolling average will be the same as that date's average temp.

The field that contains the temperature data is called `Data.Temperature.Avg Temp`. We will create a new field to hold the rolling average, called `Data.Temperature.Rolling Temp`. 

And we'll use Python's `enumerate` function to keep track of how many data points need to be averaged at any given time. That will allow us to use the previous value for the rolling average to compute the current value (without having to sum all the previous temperature values every time). 

As in the solution to Problem 3, we're exploiting the fact that an average can be updated without revisiting the numbers that went into it. If `x` is the average of `n` numbers (`n1`, `n2`, `n3`, etc.), then the new average, `x1`, that also includes one more number, `n4`, is given by the following formula:

```
x1 = ((x*n) + n4) / (n+1)
```
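
As a quick check of this formula, the sketch below computes the averages incrementally and compares them with averages computed directly from the raw values. The temperatures are made-up numbers. Note that the index `i` from `enumerate` equals the number of values already averaged, which plays the role of `n` in the formula.

```python
temps = [30.0, 34.0, 29.0, 35.0]  # hypothetical values

# Incremental version: reuse the previous average each time
running = []
for i, value in enumerate(temps):
    if i == 0:
        running.append(value)
    else:
        running.append((running[-1] * i + value) / (i + 1))

# Direct version: re-sum all values seen so far at every step
direct = [sum(temps[: i + 1]) / (i + 1) for i in range(len(temps))]

print(running)
print(direct)
```

The two lists should agree, up to floating-point rounding.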

Finally, we `append` each result as we compute it to a new list, `weather_rolling`. That way, we can easily access the previous rolling average for our computations: it will be the last element in the `weather_rolling` list.

```python
weather_rolling = []
for _, group in groupby(weather_data, key=lambda w: w["Station.Location"]):
    for i, record in enumerate(group):
        avg_temp = float(record['Data.Temperature.Avg Temp'])
        if i == 0:
            # The first record in each group has no previous values
            record['Data.Temperature.Rolling Temp'] = avg_temp
        else:
            # i is the number of values already averaged for this station
            previous = weather_rolling[-1]['Data.Temperature.Rolling Temp']
            record['Data.Temperature.Rolling Temp'] = (previous * i + avg_temp) / (i + 1)
        weather_rolling.append(record)
```