# Problem 4 (optional)

This is an optional task for more advanced students. 

## What to do

1. Start by downloading your own data (daily summaries from **1959 to August 2018**) for **Sodankyla Lokka** (note that the place name must be typed without the letter `ä`) from the [NOAA Climate Data Online Search](https://www.ncdc.noaa.gov/cdo-web/search?datasetid=GHCND). After changing the year in the date selection panel, make sure to also click on the starting day (and the ending day). Once the search has finished, click “Add to cart” for the selected station, then go to the cart. Select the ``Custom GHCN-Daily Text`` format for the output file and hit continue.

    - From the `Station Detail & Data Flag Options`, select these two attributes: Station Name and Geographic Location. **Notice:** Do **NOT** include data flags because they make the data difficult to read. Use **Standard** units.
    - Also select Precipitation and Temperature, which are found under a separate button below.
    - On the next page, enter your own email address; the weather data will be sent there after a short moment.

2. After you have downloaded the data, you should first:

    - Calculate the average temperature using columns `TMAX` and `TMIN` and insert those values into a new column called `TAVG`.

3. Next, you should use the approaches learned during this week and used in Problem 3 to answer / do the following:

    - Calculate the temperature anomalies in Sodankyla, i.e. the difference between `reference_temps` and the average temperature for each month (see Problem 3).
    - Calculate the monthly temperature differences between the Sodankyla and Helsinki stations:
        - How different are the summer temperatures (June, July, August) between Helsinki (used in Problems 1-3) and Sodankyla station?
        - What were the summer mean temperatures for both of these stations?
        - What were the summer standard deviations for both of these stations?
    - Store the monthly differences in a DataFrame and save it as a `CSV` file in your own Exercise repository for this week.
4. Upload your script and data to GitHub.
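
The station comparison in step 3 could be sketched roughly as follows. This is only an illustration under stated assumptions: the temperature values are made up, and the names `helsinki`, `sodankyla`, `monthly_diff` and the output file name `monthly_differences.csv` are hypothetical. In practice the two input tables would be the monthly averages computed for each station.

```python
import pandas as pd

# Made-up monthly mean temperatures (Celsius) for two stations
helsinki = pd.DataFrame({
    "DATE": ["2017-06", "2017-07", "2017-08", "2017-09"],
    "temp_celsius": [14.0, 16.0, 15.0, 11.0],
})
sodankyla = pd.DataFrame({
    "DATE": ["2017-06", "2017-07", "2017-08", "2017-09"],
    "temp_celsius": [10.0, 14.0, 12.0, 6.0],
})

# Align the two stations on the year-month label and take the difference
monthly_diff = helsinki.merge(sodankyla, on="DATE", suffixes=("_helsinki", "_sodankyla"))
monthly_diff["diff"] = (
    monthly_diff["temp_celsius_helsinki"] - monthly_diff["temp_celsius_sodankyla"]
)

# Keep only the summer months (June, July, August)
month = pd.to_datetime(monthly_diff["DATE"], format="%Y-%m").dt.month
summer = monthly_diff[month.isin([6, 7, 8])]

# Summer mean and standard deviation for both stations
summer_stats = summer[["temp_celsius_helsinki", "temp_celsius_sodankyla"]].agg(["mean", "std"])
print(summer_stats)

# Save the monthly differences (hypothetical file name)
monthly_diff.to_csv("monthly_differences.csv", index=False)
```

Merging on `DATE` keeps only the months present for both stations, so months missing from either record drop out of the comparison.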

In [4]:
import pandas as pd

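# Read in data
# Fields are separated by whitespace, the second line of the file is skipped,
# and -9999 is treated as a missing value.
# The station name itself contains spaces, so each row has more fields than the
# header; pandas places the extra leading fields in the index (see output below).
data = pd.read_csv(r"data/3969405.txt", sep=r"\s+", skiprows=[1], na_values=["-9999"])
data.head()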

Unnamed: 0,Unnamed: 1,STATION,STATION_NAME,ELEVATION,LATITUDE,LONGITUDE,DATE,PRCP,TMAX,TMIN
GHCND:FIE00146538,SODANKYLA,LOKKA,FI,240,67.8206,27.7503,19590801,0.0,60.0,35.0
GHCND:FIE00146538,SODANKYLA,LOKKA,FI,240,67.8206,27.7503,19590802,0.0,71.0,33.0
GHCND:FIE00146538,SODANKYLA,LOKKA,FI,240,67.8206,27.7503,19590803,0.12,71.0,42.0
GHCND:FIE00146538,SODANKYLA,LOKKA,FI,240,67.8206,27.7503,19590804,0.17,65.0,55.0
GHCND:FIE00146538,SODANKYLA,LOKKA,FI,240,67.8206,27.7503,19590805,0.1,68.0,56.0


In [5]:
# Select key columns and discard the leftover station fields held in the index
data = data[["DATE","TMAX", "TMIN"]].reset_index(drop=True)
data.head()

Unnamed: 0,DATE,TMAX,TMIN
0,19590801,60.0,35.0
1,19590802,71.0,33.0
2,19590803,71.0,42.0
3,19590804,65.0,55.0
4,19590805,68.0,56.0
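
The two `Unnamed` columns in the first output, and the need for `reset_index(drop=True)`, come from how `pd.read_csv` treats rows that have more fields than the header: the extra leading fields become the index. The small reproduction below illustrates this. The file content is invented for the example (including the station ID and the dashed second line), so treat it as a sketch rather than a copy of the real download.

```python
import io
import pandas as pd

# Invented miniature file: 5 header names, but 7 whitespace-separated fields
# per row, because the station name "SODANKYLA LOKKA FI" contains spaces
sample = io.StringIO(
    "STATION STATION_NAME DATE TMAX TMIN\n"
    "------- ------------ ---- ---- ----\n"
    "GHCND:X0001 SODANKYLA LOKKA FI 19590801 60 35\n"
    "GHCND:X0001 SODANKYLA LOKKA FI 19590802 71 -9999\n"
)

df = pd.read_csv(sample, sep=r"\s+", skiprows=[1], na_values=["-9999"])

# The two surplus leading fields form a two-level index
print(df.index.nlevels)

# The trailing columns still line up with their header names
print(df[["DATE", "TMAX", "TMIN"]])

# Dropping the index leaves a plain 0..n-1 row index
clean = df[["DATE", "TMAX", "TMIN"]].reset_index(drop=True)
```

Because the surplus fields are consumed from the left, the `STATION` and `STATION_NAME` columns end up holding fragments of the station name, while `DATE`, `TMAX` and `TMIN` remain correct.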


In [6]:
# Create TAVG column
data["TAVG"] = (data["TMAX"] + data["TMIN"]) / 2
data.head()

Unnamed: 0,DATE,TMAX,TMIN,TAVG
0,19590801,60.0,35.0,47.5
1,19590802,71.0,33.0,52.0
2,19590803,71.0,42.0,56.5
3,19590804,65.0,55.0,60.0
4,19590805,68.0,56.0,62.0


In [7]:
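# Truncate each date to year and month (first six digits of YYYYMMDD),
# then store it as a "YYYY-MM" string to use as the monthly grouping key
data["DATE"] = data["DATE"].astype(str).str.slice(start=0, stop=6)
data["DATE"] = pd.to_datetime(data["DATE"], format="%Y%m", exact=False).dt.strftime("%Y-%m")
data.head()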

Unnamed: 0,DATE,TMAX,TMIN,TAVG
0,1959-08,60.0,35.0,47.5
1,1959-08,71.0,33.0,52.0
2,1959-08,71.0,42.0,56.5
3,1959-08,65.0,55.0,60.0
4,1959-08,68.0,56.0,62.0
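
As an alternative to slicing strings, the year-month label can be derived by parsing the full date and converting it to a monthly period. The sketch below uses a few illustrative `YYYYMMDD` integers, not the downloaded data, and the variable names are hypothetical.

```python
import pandas as pd

# Illustrative dates stored as YYYYMMDD integers
dates = pd.Series([19590801, 19590802, 19590915])

# Parse the full date, then reduce it to a monthly period
parsed = pd.to_datetime(dates.astype(str), format="%Y%m%d")
year_month = parsed.dt.to_period("M")

# String labels in the same "YYYY-MM" form as above
labels = year_month.astype(str)
print(labels.tolist())
```

Parsing the whole date first has the side benefit that malformed dates raise an error instead of being truncated silently.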


In [8]:
# Create table with monthly averages
# Group by month and find the average temperature (still in Fahrenheit);
# as_index=False keeps DATE as a regular column
monthly_data = data.groupby("DATE", as_index=False)["TAVG"].mean()

# Convert to celsius and rename column
monthly_data["TAVG"] = (monthly_data["TAVG"] - 32) / 1.8
monthly_data = monthly_data.rename(columns={"TAVG": "temp_celsius"})

monthly_data.head()

Unnamed: 0,DATE,temp_celsius
0,1959-08,11.424731
1,1959-09,3.796296
2,1959-10,-2.016129
3,1959-11,-7.101852
4,1959-12,-13.225806
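
The cell above averages the Fahrenheit values first and converts to Celsius afterwards. Because the conversion is linear, the order does not affect the result, which the following check illustrates using the five daily `TAVG` values displayed earlier. The helper name `fahr_to_celsius` is an assumption made for this example.

```python
import pandas as pd

# First five daily TAVG values (Fahrenheit) from the output shown earlier
tavg_f = pd.Series([47.5, 52.0, 56.5, 60.0, 62.0])


def fahr_to_celsius(temp_f):
    """Convert Fahrenheit to Celsius (works on scalars and Series)."""
    return (temp_f - 32) / 1.8


# Average first, then convert (the order used in this notebook)
mean_then_convert = fahr_to_celsius(tavg_f.mean())

# Convert each day first, then average
convert_then_mean = fahr_to_celsius(tavg_f).mean()

print(round(mean_then_convert, 3), round(convert_then_mean, 3))
```

The same would not hold for a non-linear transformation, so this shortcut is specific to unit conversions of this kind.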


In [9]:
# Calculate reference temps from 1959-1980

# Create new dataframe
reference_temps = pd.DataFrame()
monthly_data["DATE"] = pd.to_datetime(monthly_data["DATE"])
reference_temps["DATE"] = monthly_data["DATE"].dt.month.unique()

# Create a copy with filtered dates
reference_data = monthly_data[(monthly_data["DATE"].dt.year >= 1959) & (monthly_data["DATE"].dt.year <= 1980)].copy()

# Calculate averages for each month and add to new dataframe
ref_temp = reference_data.groupby(reference_data["DATE"].dt.month)["temp_celsius"].mean()
reference_temps = reference_temps.merge(ref_temp, on="DATE")

# Sort new dataframe by month and rename columns
reference_temps = reference_temps.sort_values(by="DATE", ascending=True).reset_index(drop=True)
reference_temps = reference_temps.rename(columns={"DATE": "Month", "temp_celsius": "ref_temp"})
reference_temps.head()

Unnamed: 0,Month,ref_temp
0,1,-16.153425
1,2,-16.21925
2,3,-11.184289
3,4,-4.104938
4,5,3.321386


1. Calculate temperature anomalies

In [11]:
# Join reference data to complete monthly data

# Build a matching two-digit month key ("01".."12") in both tables, then merge
# Reference months are integers (1..12), so zero-pad them as strings
reference_temps["Month"] = reference_temps["Month"].astype(str).str.zfill(2)
# Monthly dates are datetimes, so format the month directly
monthly_data["Month"] = monthly_data["DATE"].dt.strftime("%m")
monthly_data = monthly_data.merge(reference_temps, on="Month")

# Calculate difference between monthly averages and reference temperature
monthly_data["Diff"] = monthly_data["temp_celsius"] - monthly_data["ref_temp"]

# Clean up dates and display results
monthly_data["DATE"] = monthly_data["DATE"].dt.strftime("%Y-%m")
monthly_data

Unnamed: 0,DATE,temp_celsius,Month,ref_temp,Diff
0,1959-08,11.424731,08,10.635753,0.788978
1,1959-09,3.796296,09,5.119444,-1.323148
2,1959-10,-2.016129,10,-1.918459,-0.097670
3,1959-11,-7.101852,11,-7.987963,0.886111
4,1959-12,-13.225806,12,-13.147401,-0.078405
...,...,...,...,...,...
675,2018-04,-1.814815,04,-4.104938,2.290123
676,2018-05,8.234767,05,3.321386,4.913381
677,2018-06,9.851852,06,10.246101,-0.394250
678,2018-07,17.706093,07,12.944727,4.761366
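
The anomaly calculation can also be written without a merge by mapping each row's month onto the per-month reference means. The sketch below uses made-up temperatures and assumes the same 1959-1980 reference period; the column names `month`, `ref_temp` and `diff` are chosen for the example.

```python
import pandas as pd

# Made-up monthly mean temperatures (Celsius)
monthly = pd.DataFrame({
    "DATE": pd.to_datetime(
        ["1960-01", "1960-07", "1970-01", "1970-07", "2018-01", "2018-07"],
        format="%Y-%m",
    ),
    "temp_celsius": [-16.0, 12.0, -14.0, 14.0, -10.0, 17.0],
})
monthly["month"] = monthly["DATE"].dt.month

# Reference mean for each calendar month over 1959-1980 (inclusive)
in_reference = monthly["DATE"].dt.year.between(1959, 1980)
ref_by_month = monthly[in_reference].groupby("month")["temp_celsius"].mean()

# Look up the reference value for every row and subtract it
monthly["ref_temp"] = monthly["month"].map(ref_by_month)
monthly["diff"] = monthly["temp_celsius"] - monthly["ref_temp"]
print(monthly)
```

Using an integer month as the lookup key avoids the string formatting needed to make the two tables mergeable, at the cost of not having a separate `reference_temps` table to inspect.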


2. Calculate monthly difference between Sodankyla and Helsinki stations 