## Problem 4 (*optional*) - Parsing daily temperatures

**This is an optional task for those who want more practice.**

This problem is more challenging as we provide only minimal instructions for completing the given tasks. You will need to search through the pandas documentation (and other resources) for help. We will cover data aggregation in more detail during Lesson 6, so this is a good opportunity to get a head start for next week!

In this problem, the aim is to aggregate the hourly temperature data for the Helsinki Kumpula and Rovaniemi weather stations to the daily level. Currently, there are (at most) three measurements per hour in the data, as you can see from the `YR--MODAHRMN` column (Year-Month-Day-Hour-Minute in Greenwich Mean Time, GMT):

```
    USAF  YR--MODAHRMN  TEMP  MAX  MIN  Celsius
0  28450  201705010000  31.0  NaN  NaN       -1
1  28450  201705010020  30.0  NaN  NaN       -1
2  28450  201705010050  30.0  NaN  NaN       -1
3  28450  201705010100  31.0  NaN  NaN       -1
4  28450  201705010120  30.0  NaN  NaN       -1
```

The output should contain mean, max, and min Celsius temperatures for each day (for example, one mean temperature value for the 1st of May and so on).

### What to do

- Your task is to summarize the information for each day by aggregating (grouping) the DataFrame.
- The output should be a new DataFrame where you have calculated the mean, max, and min Celsius temperatures for each day separately based on hourly values.
- Repeat the task for the two data sets you created in Problem 2 (May-August temperatures from Rovaniemi and Kumpula).
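
As a rough illustration of the grouping idea in the list above, the sketch below builds a tiny DataFrame by hand and computes one mean, max, and min per day. The values and the `DAY` column name are made up for illustration and are not taken from the course data.

```python
import pandas as pd

# Hypothetical toy data: two days with a few Celsius readings each
toy = pd.DataFrame({
    "DAY": ["0501", "0501", "0501", "0502", "0502"],
    "CELSIUS": [-1, 0, 2, 3, 5],
})

# Group the rows by day, then compute mean, max and min of the readings
daily = toy.groupby("DAY")["CELSIUS"].agg(["mean", "max", "min"])
print(daily)
```

The result has one row per unique value of the grouping column, which is the shape the task asks for.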

Don't forget to:

- Include useful comments in your code
- Push your solution to GitHub

### Hint

You can find help in the [official pandas documentation](https://pandas.pydata.org/pandas-docs/stable/) and on Google. Don't hesitate to ask for tips in Slack!

In [1]:
import pandas as pd

In [2]:
# Import May-August temperature data for Kumpula and Rovaniemi

# Kumpula: file path, then data
Kumpula_may_aug_fp = r"Kumpula_temps_May_Aug_2017.csv"
Kumpula_may_aug = pd.read_csv(Kumpula_may_aug_fp)

# Rovaniemi: file path, then data
Rovaniemi_may_aug_fp = r"Rovaniemi_temps_May_Aug_2017.csv"
Rovaniemi_may_aug = pd.read_csv(Rovaniemi_may_aug_fp)


In [3]:
# Check Kumpula data
Kumpula_may_aug.head()

Unnamed: 0.1,Unnamed: 0,USAF,YR--MODAHRMN,TEMP,MAX,MIN,CELSIUS
0,8770,29980,201705010000,37.0,,,3
1,8771,29980,201705010100,37.0,,,3
2,8772,29980,201705010200,37.0,,,3
3,8773,29980,201705010300,37.0,,,3
4,8774,29980,201705010400,39.0,,,4


In [4]:
Kumpula_may_aug.tail()

Unnamed: 0.1,Unnamed: 0,USAF,YR--MODAHRMN,TEMP,MAX,MIN,CELSIUS
2919,11689,29980,201708311900,64.0,,,18
2920,11690,29980,201708312000,64.0,,,18
2921,11691,29980,201708312100,64.0,,,18
2922,11692,29980,201708312200,64.0,,,18
2923,11693,29980,201708312300,64.0,,,18


In [5]:
# Check Rovaniemi data
Rovaniemi_may_aug.head()

Unnamed: 0.1,Unnamed: 0,USAF,YR--MODAHRMN,TEMP,MAX,MIN,CELSIUS
0,0,28450,201705010000,31.0,,,-1
1,1,28450,201705010020,30.0,,,-1
2,2,28450,201705010050,30.0,,,-1
3,3,28450,201705010100,31.0,,,-1
4,4,28450,201705010120,30.0,,,-1


In [6]:
Rovaniemi_may_aug.tail()

Unnamed: 0.1,Unnamed: 0,USAF,YR--MODAHRMN,TEMP,MAX,MIN,CELSIUS
8762,8765,28450,201708312220,46.0,,,8
8763,8766,28450,201708312250,46.0,,,8
8764,8767,28450,201708312300,48.0,,,9
8765,8768,28450,201708312320,46.0,,,8
8766,8769,28450,201708312350,48.0,,,9


In [7]:
# Convert YR--MODAHRMN to string (Kumpula)
Kumpula_may_aug['YR--MODAHRMN_STR'] = Kumpula_may_aug['YR--MODAHRMN'].astype(str)
Kumpula_may_aug.head()

Unnamed: 0.1,Unnamed: 0,USAF,YR--MODAHRMN,TEMP,MAX,MIN,CELSIUS,YR--MODAHRMN_STR
0,8770,29980,201705010000,37.0,,,3,201705010000
1,8771,29980,201705010100,37.0,,,3,201705010100
2,8772,29980,201705010200,37.0,,,3,201705010200
3,8773,29980,201705010300,37.0,,,3,201705010300
4,8774,29980,201705010400,39.0,,,4,201705010400


In [8]:
# Select the month and day (characters 4-7, i.e. MMDD) from YR--MODAHRMN_STR
Kumpula_may_aug['Month_day'] = Kumpula_may_aug['YR--MODAHRMN_STR'].str.slice(start=4, stop=8)

# Preview outcome
Kumpula_may_aug.head()

Unnamed: 0.1,Unnamed: 0,USAF,YR--MODAHRMN,TEMP,MAX,MIN,CELSIUS,YR--MODAHRMN_STR,Month_day
0,8770,29980,201705010000,37.0,,,3,201705010000,0501
1,8771,29980,201705010100,37.0,,,3,201705010100,0501
2,8772,29980,201705010200,37.0,,,3,201705010200,0501
3,8773,29980,201705010300,37.0,,,3,201705010300,0501
4,8774,29980,201705010400,39.0,,,4,201705010400,0501
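
Slicing out the month and day works here because the data covers a single year (2017). As an alternative sketch, the timestamp could be parsed with `pd.to_datetime` so that the grouping key also carries the year. The sample values and the `TIME` and `DATE` column names below are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample mimicking the integer YR--MODAHRMN column
sample = pd.DataFrame({
    "YR--MODAHRMN": [201705010000, 201705010020, 201705020100],
    "CELSIUS": [-1, -1, 3],
})

# Parse the timestamps; the format string mirrors Year-Month-Day-Hour-Minute
sample["TIME"] = pd.to_datetime(
    sample["YR--MODAHRMN"].astype(str), format="%Y%m%d%H%M"
)

# Keep only the calendar date as a string such as "2017-05-01"
sample["DATE"] = sample["TIME"].dt.strftime("%Y-%m-%d")

# Daily mean temperature, one value per calendar date
daily_mean = sample.groupby("DATE")["CELSIUS"].mean()
print(daily_mean)
```

A date-based key like this would stay unambiguous if the data were later extended to several years.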


In [9]:
# Group Kumpula data
Kumpula_grouped_MODA = Kumpula_may_aug.groupby('Month_day')

In [10]:
type(Kumpula_grouped_MODA)

pandas.core.groupby.generic.DataFrameGroupBy

In [11]:
len(Kumpula_grouped_MODA)

123

In [12]:
Kumpula_grouped_MODA.groups.keys()

dict_keys(['0501', '0502', '0503', '0504', '0505', '0506', '0507', '0508', '0509', '0510', '0511', '0512', '0513', '0514', '0515', '0516', '0517', '0518', '0519', '0520', '0521', '0522', '0523', '0524', '0525', '0526', '0527', '0528', '0529', '0530', '0531', '0601', '0602', '0603', '0604', '0605', '0606', '0607', '0608', '0609', '0610', '0611', '0612', '0613', '0614', '0615', '0616', '0617', '0618', '0619', '0620', '0621', '0622', '0623', '0624', '0625', '0626', '0627', '0628', '0629', '0630', '0701', '0702', '0703', '0704', '0705', '0706', '0707', '0708', '0709', '0710', '0711', '0712', '0713', '0714', '0715', '0716', '0717', '0718', '0719', '0720', '0721', '0722', '0723', '0724', '0725', '0726', '0727', '0728', '0729', '0730', '0731', '0801', '0802', '0803', '0804', '0805', '0806', '0807', '0808', '0809', '0810', '0811', '0812', '0813', '0814', '0815', '0816', '0817', '0818', '0819', '0820', '0821', '0822', '0823', '0824', '0825', '0826', '0827', '0828', '0829', '0830', '0831'])
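
To look inside a grouped object before aggregating, `get_group` and plain iteration can be useful. The following is a small sketch on made-up data; only the `Month_day` and `CELSIUS` column names are borrowed from the cells above.

```python
import pandas as pd

# Hypothetical toy data with the same key column as above
toy = pd.DataFrame({
    "Month_day": ["0501", "0501", "0502"],
    "CELSIUS": [3, 4, 7],
})
grouped = toy.groupby("Month_day")

# Pull out all rows that belong to a single key
first_day = grouped.get_group("0501")

# Iterating over the grouped object yields (key, sub-DataFrame) pairs
for key, group in grouped:
    print(key, len(group))
```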

In [13]:
# Create a dataframe with the mean, min and max of the Celsius column in the grouped Kumpula data
Kumpula_grouped_MODA.agg(
    CELSIUS_MEAN=pd.NamedAgg(column='CELSIUS', aggfunc='mean'),
    CELSIUS_MIN=pd.NamedAgg(column='CELSIUS', aggfunc='min'),
    CELSIUS_MAX=pd.NamedAgg(column='CELSIUS', aggfunc='max'),
)

Month_day,CELSIUS_MEAN,CELSIUS_MIN,CELSIUS_MAX
0501,7.625000,3,12
0502,9.750000,2,16
0503,9.208333,4,13
0504,6.666667,3,11
0505,10.250000,2,17
...,...,...,...
0827,10.625000,6,14
0828,11.826087,9,16
0829,14.500000,8,17
0830,16.833333,15,19


In [14]:
# Convert YR--MODAHRMN to string (Rovaniemi)
Rovaniemi_may_aug['YR--MODAHRMN_STR'] = Rovaniemi_may_aug['YR--MODAHRMN'].astype(str)
Rovaniemi_may_aug.head()

Unnamed: 0.1,Unnamed: 0,USAF,YR--MODAHRMN,TEMP,MAX,MIN,CELSIUS,YR--MODAHRMN_STR
0,0,28450,201705010000,31.0,,,-1,201705010000
1,1,28450,201705010020,30.0,,,-1,201705010020
2,2,28450,201705010050,30.0,,,-1,201705010050
3,3,28450,201705010100,31.0,,,-1,201705010100
4,4,28450,201705010120,30.0,,,-1,201705010120


In [15]:
# Select the month and day (characters 4-7, i.e. MMDD) from YR--MODAHRMN_STR
Rovaniemi_may_aug['Month_day'] = Rovaniemi_may_aug['YR--MODAHRMN_STR'].str.slice(start=4, stop=8)

# Preview outcome
Rovaniemi_may_aug.head()

Unnamed: 0.1,Unnamed: 0,USAF,YR--MODAHRMN,TEMP,MAX,MIN,CELSIUS,YR--MODAHRMN_STR,Month_day
0,0,28450,201705010000,31.0,,,-1,201705010000,0501
1,1,28450,201705010020,30.0,,,-1,201705010020,0501
2,2,28450,201705010050,30.0,,,-1,201705010050,0501
3,3,28450,201705010100,31.0,,,-1,201705010100,0501
4,4,28450,201705010120,30.0,,,-1,201705010120,0501


In [16]:
# Group Rovaniemi data
Rovaniemi_grouped_MODA = Rovaniemi_may_aug.groupby('Month_day')

In [17]:
len(Rovaniemi_grouped_MODA)

123

In [18]:
# Create a dataframe with the mean, min and max of the CELSIUS column in the grouped Rovaniemi data
# Note: aggregate the grouped object. Calling .agg() on the ungrouped Rovaniemi_may_aug
# returns one whole-period summary (mean 10.351317, min -7.0, max 24.0) instead of daily values.
Rovaniemi_grouped_MODA.agg(
    CELSIUS_MEAN=('CELSIUS', 'mean'),
    CELSIUS_MIN=('CELSIUS', 'min'),
    CELSIUS_MAX=('CELSIUS', 'max'),
)
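
Because the same steps are repeated for both stations, they could be wrapped in a small helper. The sketch below is one possible arrangement, not part of the original solution: `daily_summary` is a hypothetical name and the toy data are made up.

```python
import pandas as pd


def daily_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return daily mean, min and max of CELSIUS, keyed by month and day (MMDD)."""
    # Work on a copy so the caller's DataFrame is left untouched
    df = df.copy()
    # Characters 4-7 of YYYYMMDDhhmm hold the month and day
    df["Month_day"] = df["YR--MODAHRMN"].astype(str).str.slice(start=4, stop=8)
    # Aggregate the grouped object so that each day gets its own row
    return df.groupby("Month_day").agg(
        CELSIUS_MEAN=("CELSIUS", "mean"),
        CELSIUS_MIN=("CELSIUS", "min"),
        CELSIUS_MAX=("CELSIUS", "max"),
    )


# Hypothetical toy data in the same layout as the station files
toy = pd.DataFrame({
    "YR--MODAHRMN": [201705010000, 201705010100, 201705020000],
    "CELSIUS": [3, 5, 7],
})
summary = daily_summary(toy)
print(summary)
```

With the course data, this could be called as `daily_summary(Kumpula_may_aug)` and `daily_summary(Rovaniemi_may_aug)`, which avoids copying the cells and aggregating the wrong object by accident.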
