## Problem 4 (*optional*) - Parsing daily temperatures

**This is an optional task for those who want more practice.**

This problem is more challenging as we provide only minimal instructions for completing the given tasks. You will need to search through the pandas documentation (and other resources) for help. We will cover data aggregation in more detail during Lesson 6, so this is a good opportunity to get a head start for next week!

In this problem, the aim is to aggregate the hourly temperature data for the Helsinki Kumpula and Rovaniemi weather stations to the daily level. Currently, there are at most three measurements per hour in the data, as you can see from the `YR--MODAHRMN` column (Year-Month-Day-Hour-Minute in Greenwich Mean Time, GMT):

```
    USAF  YR--MODAHRMN  TEMP  MAX  MIN  Celsius
0  28450  201705010000  31.0  NaN  NaN       -1
1  28450  201705010020  30.0  NaN  NaN       -1
2  28450  201705010050  30.0  NaN  NaN       -1
3  28450  201705010100  31.0  NaN  NaN       -1
4  28450  201705010120  30.0  NaN  NaN       -1
```
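
One way to work with the `YR--MODAHRMN` values is to parse them into proper timestamps. The following is a minimal sketch on a hand-typed copy of the rows above; the `sample` DataFrame and the `TIME` and `DAY` column names are assumptions made for illustration and are not part of the exercise data.

```python
import pandas as pd

# Hand-typed copy of the five sample rows shown above (illustration only)
sample = pd.DataFrame({
    "YR--MODAHRMN": [201705010000, 201705010020, 201705010050,
                     201705010100, 201705010120],
    "Celsius": [-1, -1, -1, -1, -1],
})

# Parse the integer timestamps: %Y%m%d%H%M = year, month, day, hour, minute
sample["TIME"] = pd.to_datetime(sample["YR--MODAHRMN"].astype(str),
                                format="%Y%m%d%H%M")

# Keep only the calendar day, which can later serve as a grouping key
sample["DAY"] = sample["TIME"].dt.date

print(sample[["YR--MODAHRMN", "TIME", "DAY"]])
```

Parsing to timestamps is optional here; slicing the first eight characters of the string form gives an equivalent day key.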

The output should contain mean, max, and min Celsius temperatures for each day (for example, one mean temperature value for the 1st of May and so on).

### What to do

- Your task is to summarize the information for each day by aggregating (grouping) the DataFrame.
- The output should be a new DataFrame where you have calculated the mean, max, and min Celsius temperatures for each day separately based on hourly values.
- Repeat the task for the two data sets you created in Problem 2 (May-August temperatures from Rovaniemi and Kumpula).

Don't forget to:

- Include useful comments in your code
- Push your solution to GitHub

### Hint

You can find help in the [official pandas documentation](https://pandas.pydata.org/pandas-docs/stable/) and on Google. Don't hesitate to ask for tips in Discord!
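
As a starting point, the grouping step can be sketched with `DataFrame.groupby` followed by `agg`. The example below is a minimal sketch on made-up values: the `toy` DataFrame and its numbers are invented for illustration and are not taken from the course data. It shows the general pattern only, not the expected solution.

```python
import pandas as pd

# Made-up measurements for two days (values are illustrative, not real data)
toy = pd.DataFrame({
    "YR--MODAHRMN": [201705010000, 201705010020, 201705011250,
                     201705020000, 201705020020],
    "Celsius": [-1, 0, 4, 2, 6],
})

# The first eight characters of the timestamp identify the day (YYYYMMDD)
toy["YRMODA"] = toy["YR--MODAHRMN"].astype(str).str[:8]

# One row per day with the mean, max, and min of the Celsius column
daily = toy.groupby("YRMODA")["Celsius"].agg(["mean", "max", "min"])

print(daily)
```

The result is indexed by day; calling `reset_index()` on it would turn the day back into an ordinary column.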

```python
# YOUR CODE HERE
import pandas as pd

kumpula = pd.read_csv('Kumpula_temps_May_Aug_2017.csv')
rovaniemi = pd.read_csv('Rovaniemi_temps_May_Aug_2017.csv')

# convert dates to strings, shorten strings to just YRMODA
kumpula['YRMODA'] = kumpula['YR--MODAHRMN'].astype(str).str.slice(start=0, stop=8)
rovaniemi['YRMODA'] = rovaniemi['YR--MODAHRMN'].astype(str).str.slice(start=0, stop=8)

# assign groups for each day
grouped_kumpula = kumpula.groupby('YRMODA')
grouped_rovaniemi = rovaniemi.groupby('YRMODA')

# for each day, loop through each entry and find mean, max, min
yrmoda_kumpula = []
mean_kumpula = []
max_kumpula = []
min_kumpula = []
for key, group in grouped_kumpula:
    yrmoda_kumpula.append(key)
    mean_kumpula.append(group['Celsius'].mean().round(0).astype(int))
    max_kumpula.append(group['Celsius'].max())
    min_kumpula.append(group['Celsius'].min())

yrmoda_rovaniemi = []
mean_rovaniemi = []
max_rovaniemi = []
min_rovaniemi = []
for key, group in grouped_rovaniemi:
    yrmoda_rovaniemi.append(key)
    mean_rovaniemi.append(group['Celsius'].mean().round(0).astype(int))
    max_rovaniemi.append(group['Celsius'].max())
    min_rovaniemi.append(group['Celsius'].min())

# compile results into a new dataframe
new_kumpula = pd.DataFrame()
new_kumpula['YRMODA'] = yrmoda_kumpula
new_kumpula['MEAN'] = mean_kumpula
new_kumpula['MAX'] = max_kumpula
new_kumpula['MIN'] = min_kumpula
print(new_kumpula)

new_rovaniemi = pd.DataFrame()
new_rovaniemi['YRMODA'] = yrmoda_rovaniemi
new_rovaniemi['MEAN'] = mean_rovaniemi
new_rovaniemi['MAX'] = max_rovaniemi
new_rovaniemi['MIN'] = min_rovaniemi
print(new_rovaniemi)
```

```
       YRMODA  MEAN  MAX  MIN
0    20170501     8   12    3
1    20170502    10   16    2
2    20170503     9   13    4
3    20170504     7   11    3
4    20170505    10   17    2
..        ...   ...  ...  ...
118  20170827    11   14    6
119  20170828    12   16    9
120  20170829    14   17    8
121  20170830    17   19   15
122  20170831    17   19   16

[123 rows x 4 columns]
       YRMODA  MEAN  MAX  MIN
0    20170501     2    7   -1
1    20170502     3    7    1
2    20170503     2    4   -1
3    20170504     4    9   -1
4    20170505     7   12    1
..        ...   ...  ...  ...
118  20170827     8   10    5
119  20170828     9   13    3
120  20170829    11   12    8
121  20170830    11   14    9
122  20170831    12   17    8

[123 rows x 4 columns]
```
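
Both tables have 123 rows, which matches the number of days from 1 May to 31 August (31 + 30 + 31 + 31). A quick sanity check along these lines can confirm that no day is missing and that the minimum never exceeds the maximum. The sketch below is one possible check; the `check_daily` helper and the small `demo` table are hypothetical and only mimic the column layout printed above.

```python
import pandas as pd


def check_daily(daily, start, end):
    """Return the days missing from `daily` between `start` and `end` (inclusive).

    Also asserts that MIN <= MAX on every row. `daily` is expected to have the
    columns YRMODA (string, YYYYMMDD), MAX and MIN, as in the output above.
    """
    expected = pd.date_range(start, end, freq="D").strftime("%Y%m%d")
    missing = sorted(set(expected) - set(daily["YRMODA"]))
    assert (daily["MIN"] <= daily["MAX"]).all(), "MIN exceeds MAX on some day"
    return missing


# Hypothetical table covering 1-3 May with one day (2 May) deliberately left out
demo = pd.DataFrame({
    "YRMODA": ["20170501", "20170503"],
    "MEAN": [8, 9],
    "MAX": [12, 13],
    "MIN": [3, 4],
})

print(check_daily(demo, "2017-05-01", "2017-05-03"))
```

Applied to a complete daily table for the full period, for example `check_daily(new_kumpula, "2017-05-01", "2017-08-31")`, the helper should return an empty list if every day is present.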
