# Exercise 6: Weather anomalies (10 points)

The aim of this exercise is to analyze historical weather data. In Problem 1, you will read in a tricky data file and explore its contents. In Problem 2, you will convert and aggregate the data from daily temperatures in Fahrenheit to monthly average temperatures in Celsius. In Problem 3, you will analyze weather anomalies by comparing monthly average temperatures to a long-term average.

### Tips for completing this exercise

- Use **exactly** the same variable names as in the instructions. Your answers are graded automatically, and the tests rely on the variable names and formatting given in the instructions.
- **Please do not**:

    - **Change the file names**. Do all of your editing in the provided `Exercise-6-problems-1-3.ipynb` file (this file).
    - **Copy/paste cells in this notebook**. We use an automated grading system that will fail if there are copies of code cells.
    - **Change the existing cell types**. You can add cells, but changing the cell types for existing cells (from code to markdown, for example) will also cause the automated grader to fail. 

## Problem 1 - Reading in a tricky data file (2 points)

Your first task for this exercise is to read in the data file [data/1091402.txt](data/1091402.txt) to a variable called `data`. Pay attention to the structure of the input data and to the no-data values.
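The two things to watch for here, a separator row and a sentinel no-data value, can be tried out on a small in-memory sample before touching the real file. The sketch below is only an illustration: the sample text and the name `demo` are made up, and only a subset of the real columns is used.

```python
import io

import pandas as pd

# Hypothetical sample mimicking the layout described above:
# a header row, a separator row of dashes, then whitespace-delimited values.
sample = io.StringIO(
    "STATION DATE TAVG TMAX TMIN\n"
    "------- -------- ---- ---- ----\n"
    "GHCND:FIE00142080 19520101 37 39 34\n"
    "GHCND:FIE00142080 19520102 -9999 37 34\n"
    "GHCND:FIE00142080 19520103 33 36 -9999\n"
)

demo = pd.read_csv(
    sample,
    sep=r"\s+",         # one or more whitespace characters between fields
    skiprows=[1],       # drop the dashed separator row (0-based index 1)
    na_values=[-9999],  # treat -9999 as missing
)

print(demo)
print(demo.isna().sum())
```

If the separator row were not skipped, the numeric columns would be read as text, so checking the column dtypes is a quick way to confirm the file was parsed as intended.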

### Scores for this problem

**Your score on this problem will be based on the following criteria:**

- Reading the data into a variable called `data` using pandas
    - Skipping the second row of the datafile that contains `----------` characters that don't belong to the data
    - Converting the no-data values (`-9999`) into `NaN`
- Calculating basic statistics from the data
- Including comments that explain what most lines in the code do

### Part 1 (1 point)

You should start by loading the data file.

- Read the data file into the variable `data`
    - Skip the second row
    - Convert the no-data values (`-9999`) into `NaN`

In [1]:
import pandas as pd


# Name of the file to be read
filename = r"data/1091402.txt"

# Read the whitespace-delimited file into the variable `data`,
# skipping the dashed second row and treating -9999 as NaN
data = pd.read_csv(filename,
                   sep=r"\s+",
                   skiprows=[1],
                   na_values=[-9999])


In [2]:
# Check that the dataframe looks ok:
data.head()

Unnamed: 0,STATION,ELEVATION,LATITUDE,LONGITUDE,DATE,PRCP,TAVG,TMAX,TMIN
0,GHCND:FIE00142080,51,60.3269,24.9603,19520101,0.31,37.0,39.0,34.0
1,GHCND:FIE00142080,51,60.3269,24.9603,19520102,,35.0,37.0,34.0
2,GHCND:FIE00142080,51,60.3269,24.9603,19520103,0.14,33.0,36.0,
3,GHCND:FIE00142080,51,60.3269,24.9603,19520104,0.05,29.0,30.0,25.0
4,GHCND:FIE00142080,51,60.3269,24.9603,19520105,0.06,27.0,30.0,25.0


In [3]:
# Check the last rows of the data (there should be some NaN values)
data.tail()

Unnamed: 0,STATION,ELEVATION,LATITUDE,LONGITUDE,DATE,PRCP,TAVG,TMAX,TMIN
23711,GHCND:FIE00142080,51,60.3269,24.9603,20170930,,47.0,49.0,44.0
23712,GHCND:FIE00142080,51,60.3269,24.9603,20171001,0.04,47.0,48.0,45.0
23713,GHCND:FIE00142080,51,60.3269,24.9603,20171002,,47.0,49.0,46.0
23714,GHCND:FIE00142080,51,60.3269,24.9603,20171003,0.94,47.0,,44.0
23715,GHCND:FIE00142080,51,60.3269,24.9603,20171004,0.51,52.0,56.0,


In [4]:
#data['TAVG_FAHR'].isna().sum()

### Part 2 (1 point)

In this section, you will calculate some basic statistics of the input data.

- Calculate how many no-data (NaN) values there are in the `TAVG` column
    - Assign your answer to a variable called `tavg_nodata_count`

In [5]:
# The number of no-data values in the TAVG column
tavg_nodata_count = data['TAVG'].isna().sum()


In [6]:
# Print out the solution:
print(f'Number of no-data values in column "TAVG": {tavg_nodata_count}')

Number of no-data values in column "TAVG": 3308


- Calculate how many no-data (NaN) values there are for the `TMIN` column
    - Assign your answer into a variable called `tmin_nodata_count`

In [7]:
# The number of no-data values in the TMIN column
tmin_nodata_count = data['TMIN'].isna().sum()

In [8]:
# Print out the solution:
print(f'Number of no-data values in column "TMIN": {tmin_nodata_count}')

Number of no-data values in column "TMIN": 365


- Calculate the total number of days covered by this data file
    - Assign your answer into a variable called `day_count`

In [9]:
# The total number of days found in the data
day_count = len(data)


In [10]:
# Print out the solution:
print(f'Number of days: {day_count}')

Number of days: 23716


- Find the date of the oldest (first) observation
    - Assign your answer into a variable called `first_obs`

In [11]:
# The date of the oldest observation (smallest YYYYMMDD value, parsed as a date)
first_obs = pd.to_datetime(str(data['DATE'].min()), format='%Y%m%d')

In [12]:
# Print out the solution:
print(f'Date of the first observation: {first_obs}')

Date of the first observation: 1952-01-01 00:00:00


- Find the date of the most recent (last) observation
    - Assign your answer into a variable called `last_obs`

In [13]:
# The date of the last observation (largest YYYYMMDD value, parsed as a date)
last_obs = pd.to_datetime(str(data['DATE'].max()), format='%Y%m%d')


In [14]:
# Print out the solution:
print(f'Date of the last observation: {last_obs}')


Date of the last observation: 2017-10-04 00:00:00


- Find the average temperature for the whole data file (all observations) from column `TAVG`
    - Assign your answer into a variable called `avg_temp`

In [15]:
# Average temperature for the whole data
avg_temp = data['TAVG'].mean()

In [16]:
# Print out the solution:
print(f'Average temperature (F) for the whole dataset: {round(avg_temp, 2)}')

Average temperature (F) for the whole dataset: 41.32


- Find the average `TMAX` temperature over the Summer of 1969 (months May, June, July, and August of the year 1969)
    - Assign your answer into a variable called `avg_temp_1969`
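Comparing the integer `DATE` values directly works because the `YYYYMMDD` form sorts chronologically. As an alternative, the dates can be parsed into timestamps and filtered by range. The sketch below uses a made-up four-row frame (`df`, `DATE_DT`, and `summer_mean` are illustrative names, not part of the exercise).

```python
import pandas as pd

# Hypothetical rows; DATE is an integer in YYYYMMDD form as in the data file
df = pd.DataFrame(
    {
        "DATE": [19690430, 19690501, 19690831, 19690901],
        "TMAX": [40.0, 50.0, 70.0, 90.0],
    }
)

# Parse the integers into real timestamps
df["DATE_DT"] = pd.to_datetime(df["DATE"].astype(str), format="%Y%m%d")

# Boolean mask for 1 May to 31 August 1969, both ends inclusive
in_summer = df["DATE_DT"].between("1969-05-01", "1969-08-31")

# Mean of TMAX over the selected days only
summer_mean = df.loc[in_summer, "TMAX"].mean()
print(summer_mean)
```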

In [17]:
# The average TMAX temperature over the Summer of 1969
TMAX_summer_1969 = data['TMAX'].loc[(data['DATE'] >= 19690501) & (data['DATE'] <= 19690831)]
avg_temp_1969 = TMAX_summer_1969.mean()
data_summer_1969 = data.loc[(data['DATE'] >= 19690501) & (data['DATE'] <= 19690831)]
data_summer_1969

Unnamed: 0,STATION,ELEVATION,LATITUDE,LONGITUDE,DATE,PRCP,TAVG,TMAX,TMIN
6054,GHCND:FIE00142080,51,60.3269,24.9603,19690501,0.00,,41.0,33.0
6055,GHCND:FIE00142080,51,60.3269,24.9603,19690502,0.00,,48.0,31.0
6056,GHCND:FIE00142080,51,60.3269,24.9603,19690503,0.00,,44.0,27.0
6057,GHCND:FIE00142080,51,60.3269,24.9603,19690504,0.00,,48.0,29.0
6058,GHCND:FIE00142080,51,60.3269,24.9603,19690505,0.00,,55.0,31.0
...,...,...,...,...,...,...,...,...,...
6172,GHCND:FIE00142080,51,60.3269,24.9603,19690827,0.10,,64.0,54.0
6173,GHCND:FIE00142080,51,60.3269,24.9603,19690828,0.00,,66.0,52.0
6174,GHCND:FIE00142080,51,60.3269,24.9603,19690829,0.03,,68.0,50.0
6175,GHCND:FIE00142080,51,60.3269,24.9603,19690830,0.27,,64.0,57.0


In [18]:
# This test print should print a number
print(f'Average temperature (F) for the Summer of 69: {round(avg_temp_1969, 2)}')


Average temperature (F) for the Summer of 69: 67.82


## Problem 2 - Calculating monthly average temperatures (3 points)

For this problem your goal is to calculate monthly average temperatures in degrees Celsius from the daily Fahrenheit values we have in the data file. You can continue working with the same DataFrame that you used in Problem 1.

### Scores for this problem

**Your score on this problem will be based on the following criteria:**

- Calculating the monthly average temperatures in degrees Celsius for each month in the dataset (i.e., for each month of each year)
    - You should store the monthly average temperatures in a new Pandas DataFrame called `monthly_data`
    - `monthly_data` should contain a new column called `temp_celsius` that holds the monthly average temperatures in Celsius
    - Convert the `TAVG` values from Fahrenheit into Celsius and store the output in the `temp_celsius` column
- Including comments that explain what most lines in the code do

*Hint: you can start by creating a new column with a label for each month and then continue grouping the data based on this information.*
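The hint can be sketched on a tiny made-up frame: build a year-month label from the integer date, then group on that label. The names `daily` and `monthly` and the sample values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical daily values in Fahrenheit; DATE is an integer in YYYYMMDD form
daily = pd.DataFrame(
    {
        "DATE": [19520101, 19520102, 19520201, 19520202],
        "TAVG": [32.0, 50.0, 14.0, 32.0],
    }
)

# Convert to Celsius first, then label each row with its year and month
daily["temp_celsius"] = (daily["TAVG"] - 32) / 1.8
daily["YEAR_MONTH"] = daily["DATE"].astype(str).str.slice(start=0, stop=6)

# One mean value per year-month label
monthly = daily.groupby("YEAR_MONTH")[["temp_celsius"]].mean()
print(monthly)
```

Grouping on a single `YYYYMM` label gives a flat index, whereas grouping on separate year and month columns gives a two-level index; either can serve as the basis for `monthly_data`.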

In [19]:
def fahr_to_cels(temp_fahr):
    """Function to convert a temperature in Fahrenheit to Celsius
    
    Parameters
    ----------
    temp_fahr: int|float
        A temperature in Fahrenheit
        
    Returns
    -------
    temp_celsius: float
        A temperature in Celsius
    """
    temp_celsius = (temp_fahr - 32) / 1.8
    
    return temp_celsius
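Because the function body is plain arithmetic, it should also accept a whole pandas Series, in which case the conversion is applied element-wise and missing values stay missing. A minimal sketch, with a compact copy of the function and made-up sample values (`temps_f` and `temps_c` are illustrative names):

```python
import pandas as pd


def fahr_to_cels(temp_fahr):
    """Convert Fahrenheit to Celsius (same formula as the function above)."""
    return (temp_fahr - 32) / 1.8


# Hypothetical sample temperatures in Fahrenheit, including a missing value
temps_f = pd.Series([32.0, 212.0, None])

# Passing the Series directly applies the arithmetic to every element
temps_c = fahr_to_cels(temps_f)
print(temps_c)
```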

In [20]:
# Data checkpoint (an independent copy, so later edits to `data` cannot affect it)
data_chkPt = data.copy()

In [21]:
# Rename the TAVG column to TAVG_FAHR
new_col_name = {'TAVG':'TAVG_FAHR'}
data = data.rename(columns=new_col_name)

# Add a new column containing the celsius values of TAVG
data['temp_celsius'] = data['TAVG_FAHR'].apply(fahr_to_cels)

# Add a new column containing the string form of the date column
data['DATE_STR'] = data['DATE'].astype(str)

# Add a new column containing only year and month
data['YEAR_MONTH'] = data['DATE_STR'].str.slice(start=0, stop=6)

# Add a new column containing only year
data['YEAR'] = data['DATE_STR'].str.slice(start=0, stop=4)

# Add a new column containing only month
data['MONTH'] = data['DATE_STR'].str.slice(start=4, stop=6)

data

Unnamed: 0,STATION,ELEVATION,LATITUDE,LONGITUDE,DATE,PRCP,TAVG_FAHR,TMAX,TMIN,temp_celsius,DATE_STR,YEAR_MONTH,YEAR,MONTH
0,GHCND:FIE00142080,51,60.3269,24.9603,19520101,0.31,37.0,39.0,34.0,2.777778,19520101,195201,1952,01
1,GHCND:FIE00142080,51,60.3269,24.9603,19520102,,35.0,37.0,34.0,1.666667,19520102,195201,1952,01
2,GHCND:FIE00142080,51,60.3269,24.9603,19520103,0.14,33.0,36.0,,0.555556,19520103,195201,1952,01
3,GHCND:FIE00142080,51,60.3269,24.9603,19520104,0.05,29.0,30.0,25.0,-1.666667,19520104,195201,1952,01
4,GHCND:FIE00142080,51,60.3269,24.9603,19520105,0.06,27.0,30.0,25.0,-2.777778,19520105,195201,1952,01
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23711,GHCND:FIE00142080,51,60.3269,24.9603,20170930,,47.0,49.0,44.0,8.333333,20170930,201709,2017,09
23712,GHCND:FIE00142080,51,60.3269,24.9603,20171001,0.04,47.0,48.0,45.0,8.333333,20171001,201710,2017,10
23713,GHCND:FIE00142080,51,60.3269,24.9603,20171002,,47.0,49.0,46.0,8.333333,20171002,201710,2017,10
23714,GHCND:FIE00142080,51,60.3269,24.9603,20171003,0.94,47.0,,44.0,8.333333,20171003,201710,2017,10


In [22]:
# Group data by the YEAR and MONTH columns
data_YrMon_group = data.groupby(by=["YEAR", "MONTH"])


In [23]:
# Monthly average temperatures data
monthly_data = data_YrMon_group[['TAVG_FAHR', 'TMAX', 'TMIN', 'temp_celsius']].mean()
monthly_data


Unnamed: 0_level_0,Unnamed: 1_level_0,TAVG_FAHR,TMAX,TMIN,temp_celsius
YEAR,MONTH,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
1952,01,29.478261,33.263158,27.545455,-1.400966
1952,02,24.800000,29.142857,17.761905,-4.000000
1952,03,13.807692,26.045455,-0.625000,-10.106838
1952,04,39.607143,49.920000,29.884615,4.226190
1952,05,44.666667,53.304348,33.916667,7.037037
...,...,...,...,...,...
2017,06,56.300000,65.533333,48.566667,13.500000
2017,07,60.290323,69.709677,52.516129,15.716846
2017,08,60.290323,68.064516,53.774194,15.716846
2017,09,52.333333,58.850000,47.625000,11.296296


In [24]:
# This test print should print the length of variable monthly_data
print(len(monthly_data))

790


In [25]:
# This test print should print the column names of monthly_data
print(monthly_data.columns.values)

['TAVG_FAHR' 'TMAX' 'TMIN' 'temp_celsius']


In [26]:
# This test print should print the mean of temp_celsius
print(monthly_data['temp_celsius'].mean())

5.097114347669992


In [27]:
# This test print should print the median of temp_celsius
print(round(monthly_data['temp_celsius'].median(), 2))

4.73


## Problem 3 - Calculating temperature anomalies (5 points)

Our goal in this problem is to calculate monthly temperature anomalies in order to see how temperatures have changed over time, relative to an observation period between 1952-1980. You can continue working with the same data that you used in Problems 1 and 2.

**Your score on this problem will be based on the following criteria:**

### Part 1:

- Calculating ***the mean temperature for each month over the period from 1952 up to and including 1980*** in a new DataFrame called `reference_temps`
    - You should end up with 12 values, 1 mean temperature for each month during the time period (see example table and figure below).
    - The columns in the new DataFrame should be `month` and `ref_temp`
    
Your `reference_temps` dataframe should have the following structure: 1 value for each month of the year (12 total), where each value represents an average over the observation period 1952-1980. The `ref_temp` temperatures should be in degrees Celsius.
   
| month    | ref_temp         |
|----------|------------------|
| 01       | -5.838761        |
| 02       | -7.064088        |
| 03       | -3.874213        |
| ...      | ...              |
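One possible way to arrive at a table of this shape is sketched below on made-up data: filter to the reference period, average by calendar month, and turn the result into ordinary `month` and `ref_temp` columns. The frame `daily`, its values, and the name `reference` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical daily records; MONTH is a two-character string label
daily = pd.DataFrame(
    {
        "YEAR": [1952, 1980, 1981, 1952, 1980],
        "MONTH": ["01", "01", "01", "02", "02"],
        "temp_celsius": [-6.0, -4.0, 10.0, -8.0, -6.0],
    }
)

# Keep only the reference period, 1952 to 1980 inclusive
in_period = daily.loc[(daily["YEAR"] >= 1952) & (daily["YEAR"] <= 1980)]

# Mean per calendar month, returned as ordinary columns
reference = (
    in_period.groupby("MONTH")["temp_celsius"]
    .mean()
    .reset_index()
    .rename(columns={"MONTH": "month", "temp_celsius": "ref_temp"})
)
print(reference)
```

The 1981 row is excluded by the filter, so it does not influence the January reference value.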

### Part 2:

- Calculating **a temperature anomaly for every month** in the `monthly_data` DataFrame using the corresponding monthly average temperature for each of the 12 months:
    - In order to achieve this you need to make **a table join** (see [hints for this week](https://geo-python.github.io/site/lessons/L5/exercise-6.html)) between `monthly_data` and `reference_temps` based on the month.
    - Temperature anomaly is calculated as the difference between the temperature for a given month (`temp_celsius` column in `monthly_data`) and the corresponding monthly reference temperature (`ref_temp` column in `reference_temps`)
    - Store the result in a new column `"diff"` 
    
As the output of the table join and the calculation, you should have three new columns in the `monthly_data` DataFrame: 
1. `diff`: The temperature anomaly, i.e. the difference between the temperature for a given month (e.g., February 1960) and the mean temperature during the reference period (e.g., for February 1952 to 1980), 
2. `month`: The month for that row of observations
3. `ref_temp`: The monthly reference temperature

A summary of the relationships between the `monthly_data` and `reference_temps` DataFrames, as well as how the `diff` value should be calculated in the `monthly_data` DataFrame is presented in the figure below.

![Exercise 6 dataframes](img/exercise-6-dataframes.png)<br/>
*Figure 1. Relationships between the `monthly_data` and `reference_temps` DataFrames.*

You should finally report which month had the greatest weather anomaly during the observed time period.

Remember to include comments in your code.

In [28]:
# Selecting the data from 1952 to 1980 inclusive
data_1952_1980 = data.loc[(data['DATE'] >= 19520101) & (data['DATE'] <= 19801231)]

# Group the selected data by month
data_1952_1980_grpMonth = data_1952_1980.groupby(by="MONTH")

# Mapping used to rename the "temp_celsius" column to "ref_temp"
ch_col_name = {"temp_celsius": "ref_temp"}

# Create the "reference_temps" dataframe (only the numeric column is averaged;
# the grouping key "MONTH" becomes the index)
reference_temps = data_1952_1980_grpMonth[["temp_celsius"]].mean()
reference_temps = reference_temps.rename(columns=ch_col_name)
reference_temps




Unnamed: 0_level_0,ref_temp
MONTH,Unnamed: 1_level_1
1,-5.877342
2,-6.990482
3,-3.84127
4,2.427875
5,9.522613
6,14.711898
7,16.498881
8,15.022075
9,9.91092
10,4.947222


In [29]:
# Monthly_data checkpoint (an independent copy that keeps the YEAR/MONTH index)
monthly_data1 = monthly_data.copy()

# Create a new "monthly_data" joining the "reference_temps" data to the initial monthly data
monthly_data = monthly_data.merge(reference_temps, on='MONTH', how="left")
monthly_data

Unnamed: 0_level_0,TAVG_FAHR,TMAX,TMIN,temp_celsius,ref_temp
MONTH,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
01,29.478261,33.263158,27.545455,-1.400966,-5.877342
02,24.800000,29.142857,17.761905,-4.000000,-6.990482
03,13.807692,26.045455,-0.625000,-10.106838,-3.841270
04,39.607143,49.920000,29.884615,4.226190,2.427875
05,44.666667,53.304348,33.916667,7.037037,9.522613
...,...,...,...,...,...
06,56.300000,65.533333,48.566667,13.500000,14.711898
07,60.290323,69.709677,52.516129,15.716846,16.498881
08,60.290323,68.064516,53.774194,15.716846,15.022075
09,52.333333,58.850000,47.625000,11.296296,9.910920
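The output above shows only `MONTH` in the index, so the year of each row can no longer be read directly from the joined table. One way to keep it, sketched here on made-up data, is to turn the index levels into ordinary columns before merging. The frames `monthly`, `reference`, and `joined` and their values are illustrative assumptions, not the exercise data.

```python
import pandas as pd

# Hypothetical monthly means indexed by (YEAR, MONTH), as in Problem 2
monthly = pd.DataFrame(
    {
        "YEAR": ["1952", "1952", "1987"],
        "MONTH": ["01", "02", "01"],
        "temp_celsius": [-1.0, -4.0, -18.0],
    }
).set_index(["YEAR", "MONTH"])

# Hypothetical reference temperatures indexed by MONTH
reference = pd.DataFrame(
    {"MONTH": ["01", "02"], "ref_temp": [-6.0, -7.0]}
).set_index("MONTH")

# Turn both indexes into columns so YEAR survives the join
joined = monthly.reset_index().merge(
    reference.reset_index(), on="MONTH", how="left"
)

# Anomaly: monthly mean minus the reference mean for that calendar month
joined["diff"] = joined["temp_celsius"] - joined["ref_temp"]
print(joined)
```

With `YEAR` and `MONTH` kept as columns, each anomaly can be traced back to its year and month without consulting a separate copy of the data.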


In [30]:
# Add the "diff" column 
monthly_data["diff"] = monthly_data['temp_celsius'] - monthly_data['ref_temp']
monthly_data

Unnamed: 0_level_0,TAVG_FAHR,TMAX,TMIN,temp_celsius,ref_temp,diff
MONTH,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
01,29.478261,33.263158,27.545455,-1.400966,-5.877342,4.476376
02,24.800000,29.142857,17.761905,-4.000000,-6.990482,2.990482
03,13.807692,26.045455,-0.625000,-10.106838,-3.841270,-6.265568
04,39.607143,49.920000,29.884615,4.226190,2.427875,1.798315
05,44.666667,53.304348,33.916667,7.037037,9.522613,-2.485576
...,...,...,...,...,...,...
06,56.300000,65.533333,48.566667,13.500000,14.711898,-1.211898
07,60.290323,69.709677,52.516129,15.716846,16.498881,-0.782036
08,60.290323,68.064516,53.774194,15.716846,15.022075,0.694771
09,52.333333,58.850000,47.625000,11.296296,9.910920,1.385377


In [31]:
# Check the monthly data:
monthly_data.head()

Unnamed: 0_level_0,TAVG_FAHR,TMAX,TMIN,temp_celsius,ref_temp,diff
MONTH,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
1,29.478261,33.263158,27.545455,-1.400966,-5.877342,4.476376
2,24.8,29.142857,17.761905,-4.0,-6.990482,2.990482
3,13.807692,26.045455,-0.625,-10.106838,-3.84127,-6.265568
4,39.607143,49.92,29.884615,4.22619,2.427875,1.798315
5,44.666667,53.304348,33.916667,7.037037,9.522613,-2.485576


In [32]:
#monthly_data['temp_celsius'].isna().sum()

In [33]:
# Print out descriptive statistics for the relevant columns:
monthly_data[["temp_celsius", "ref_temp", "diff"]].describe()

Unnamed: 0,temp_celsius,ref_temp,diff
count,682.0,790.0,682.0
mean,5.097114,4.389928,0.704662
std,8.483949,8.273489,2.537382
min,-17.97491,-6.990482,-12.097568
25%,-1.685185,-3.84127,-0.840714
50%,4.726105,4.947222,0.762644
75%,12.87037,13.511653,2.32168
max,22.329749,16.498881,8.161117


Remember also to calculate which month had the largest temperature anomaly during the observed time period in comparison with the reference data. Use the cell below to calculate and print out the answers. Note, you may want to consider the largest absolute value of the temperature anomaly, as well as the largest positive and negative anomalies.
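The three quantities mentioned above (largest negative, largest positive, and largest absolute anomaly) can each be located with `idxmin` or `idxmax` rather than by sorting and reading off values. A minimal sketch, assuming the year and month are available as columns; the frame `anomalies` and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical anomalies; YEAR and MONTH are kept as ordinary columns
anomalies = pd.DataFrame(
    {
        "YEAR": ["1987", "1990", "2000"],
        "MONTH": ["01", "02", "03"],
        "diff": [-12.1, 8.2, 0.5],
    }
)

# idxmin/idxmax return the row label of the extreme value
coldest = anomalies.loc[anomalies["diff"].idxmin()]
warmest = anomalies.loc[anomalies["diff"].idxmax()]
largest_abs = anomalies.loc[anomalies["diff"].abs().idxmax()]

print(f"Largest negative anomaly: {coldest['YEAR']}-{coldest['MONTH']}")
print(f"Largest positive anomaly: {warmest['YEAR']}-{warmest['MONTH']}")
print(f"Largest absolute anomaly: {largest_abs['YEAR']}-{largest_abs['MONTH']}")
```

This avoids comparing against hard-coded floating-point values, which can fail silently if the data or rounding changes.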

In [34]:
# Add a column containing the absolute temperature anomaly
monthly_data['abs_diff'] = abs(monthly_data['diff'])

# Sorting the "monthly_data" by the anomaly('diff') column
monthly_data.sort_values(by="diff", ascending=True).head()


Unnamed: 0_level_0,TAVG_FAHR,TMAX,TMIN,temp_celsius,ref_temp,diff,abs_diff
MONTH,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
1,-0.354839,6.290323,-6.451613,-17.97491,-5.877342,-12.097568,12.097568
1,4.064516,11.451613,-3.096774,-15.519713,-5.877342,-9.642371,9.642371
12,9.032258,15.580645,1.806452,-12.759857,-4.168044,-8.591813,8.591813
2,4.785714,11.571429,-1.607143,-15.119048,-6.990482,-8.128566,8.128566
2,5.809524,14.277778,-5.866667,-14.550265,-6.990482,-7.559782,7.559782


In [35]:
#monthly_data1

In [36]:
# The month with the highest negative temperature anomaly. The month and its corresponding year were acquired from the 
# initial "monthly_data" in Problem 2
highest_negative_diff = monthly_data1.loc[(monthly_data1['temp_celsius'] <= -17.974910)]
highest_negative_diff

Unnamed: 0_level_0,Unnamed: 1_level_0,TAVG_FAHR,TMAX,TMIN,temp_celsius
YEAR,MONTH,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
1987,1,-0.354839,6.290323,-6.451613,-17.97491


In [37]:
# Sorting the "monthly_data" by the anomaly('diff') column
monthly_data.sort_values(by="diff", ascending=False).head()

Unnamed: 0_level_0,TAVG_FAHR,TMAX,TMIN,temp_celsius,ref_temp,diff,abs_diff
MONTH,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
2,34.107143,37.642857,30.107143,1.170635,-6.990482,8.161117,8.161117
12,37.903226,40.548387,32.354839,3.27957,-4.168044,7.447614,7.447614
2,32.75,37.571429,27.5,0.416667,-6.990482,7.407149,7.407149
2,32.62069,35.965517,28.586207,0.344828,-6.990482,7.33531,7.33531
2,32.214286,35.392857,27.928571,0.119048,-6.990482,7.10953,7.10953


In [38]:
# The month with the highest positive temperature anomaly. The month and its corresponding year were acquired from the 
# initial "monthly_data" in Problem 2
highest_positive_diff = monthly_data1.loc[(monthly_data1['temp_celsius'] <= 1.170635)].sort_values(by='temp_celsius', ascending=False)
highest_positive_diff.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,TAVG_FAHR,TMAX,TMIN,temp_celsius
YEAR,MONTH,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
1990,2,34.107143,37.642857,30.107143,1.170635
1975,11,34.1,38.3,28.4,1.166667
1985,4,33.933333,40.866667,27.4,1.074074
1958,4,33.833333,41.965517,24.964286,1.018519
1992,3,33.645161,37.258065,30.741935,0.913978


In [39]:
monthly_data1

Unnamed: 0_level_0,Unnamed: 1_level_0,TAVG_FAHR,TMAX,TMIN,temp_celsius
YEAR,MONTH,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
1952,01,29.478261,33.263158,27.545455,-1.400966
1952,02,24.800000,29.142857,17.761905,-4.000000
1952,03,13.807692,26.045455,-0.625000,-10.106838
1952,04,39.607143,49.920000,29.884615,4.226190
1952,05,44.666667,53.304348,33.916667,7.037037
...,...,...,...,...,...
2017,06,56.300000,65.533333,48.566667,13.500000
2017,07,60.290323,69.709677,52.516129,15.716846
2017,08,60.290323,68.064516,53.774194,15.716846
2017,09,52.333333,58.850000,47.625000,11.296296


In [40]:
# Summer (June, July, August) temperatures in Helsinki (mean of the numeric columns)
helsinki_groupMon_mean = data.groupby(by='MONTH').mean(numeric_only=True)
helsinki_groupMon_mean.iloc[5:8]



Unnamed: 0_level_0,ELEVATION,LATITUDE,LONGITUDE,DATE,PRCP,TAVG_FAHR,TMAX,TMIN,temp_celsius
MONTH,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
6,51.0,60.3269,24.9603,19850720.0,0.073886,58.432836,67.731682,49.154522,14.684909
7,51.0,60.3269,24.9603,19849020.0,0.091389,63.110155,71.645978,54.056987,17.283419
8,51.0,60.3269,24.9603,19847430.0,0.106603,60.081097,68.457242,52.030572,15.600609


In [41]:
# Summer (June, July, August) temperatures in Helsinki (standard deviation)
helsinki_groupMon_std = data[['MONTH', 'TMAX', 'TMIN', 'TAVG_FAHR', 'temp_celsius']].groupby(by='MONTH').std()
helsinki_groupMon_std.iloc[5:8]

Unnamed: 0_level_0,TMAX,TMIN,TAVG_FAHR,temp_celsius
MONTH,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
6,7.383669,6.064493,6.174929,3.430516
7,6.323164,5.373896,5.192221,2.884567
8,6.463779,5.764585,5.306949,2.948305


### On to Problem 4 (*optional*)

Now you can continue to the *optional* [Problem 4](Exercise-6-problem-4.ipynb)