# Problem 4 (*optional*)

**Attention**: *As of 7:00 on 9 October 2023, the NOAA Climate Data Online website is down, so it is not possible to complete this problem. Sorry :(.*

This optional problem is an opportunity to practice calculating weather anomalies for another location. You get to start from scratch and download the data yourself from NOAA.

## What to do

1. Start by downloading your own data (daily summaries from **1959 through August 2018**) for **Sodankyla Lokka** (note that the place name is written without the letter `ä`) from the [NOAA Climate Data Online Search](https://www.ncdc.noaa.gov/cdo-web/search?datasetid=GHCND). After changing the year in the date selection panel, make sure to also select the starting day (and ending day). Once the search has finished, click “Add to cart” for the selected station, then go to the cart. Select the ``Custom GHCN-Daily Text`` format for the output file and click continue.

    - From the `Station Detail & Data Flag Options`, choose these two attributes: Station Name and Geographic Location. **Notice:** Do **NOT** include data flags because they make the data difficult to read. Use **Standard** units.
    - Also select Precipitation and Temperature, which are under a separate button below.
    - On the next page, enter your own email address; the weather data will be sent there after a short while.

2. After you have downloaded the data, you should first:

    - Calculate the average temperature using columns `TMAX` and `TMIN` and insert those values into a new column called `TAVG`.

3. Next, you should use the approaches learned during this week and used in Problem 3 to answer / do the following:

    - Calculate the temperature anomalies in Sodankylä, i.e., the difference between `reference_temps` and the average temperature for each month (see Problem 3).
    - Calculate the monthly temperature differences between the Sodankylä and Helsinki stations.
        - How different are the summer temperatures (June, July, August) between Helsinki (used in Problems 1-3) and Sodankylä?
        - What were the summer mean temperatures for both of these stations?
        - What were the summer standard deviations for both of these stations?
    - Calculate the monthly differences in a DataFrame and save it (as a `CSV` file) into your own Exercise repository for this week.
4. Upload your notebook and data to GitHub.

In [1]:
import os

def find_file(root_folder, filename):
    for root, dirs, files in os.walk(root_folder):
        if filename in files:
            return os.path.join(root, filename)
    return "File not found."
file_path = find_file(os.getcwd(), '3664866.txt')
file_path

'/home/jovyan/Exercise_geopython/exercise-6-rafimt/data/3664866.txt'
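
The lookup above returns the string `"File not found."` when nothing matches, which a later `pd.read_csv` call would try to open as a path. A minimal alternative sketch, assuming `pathlib` is acceptable here, returns `None` instead so the caller can test for a miss. The helper name `find_file_or_none` and the temporary demo folder are hypothetical and not part of the exercise.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_file_or_none(root_folder: str, filename: str) -> Optional[Path]:
    """Return the first path under root_folder named filename, or None."""
    # rglob walks the tree recursively; next() stops at the first match
    return next(Path(root_folder).rglob(filename), None)


# Demo in a throwaway directory (hypothetical layout, not the real repository)
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    data_dir.mkdir()
    (data_dir / "3664866.txt").write_text("placeholder")

    found = find_file_or_none(tmp, "3664866.txt")
    missing = find_file_or_none(tmp, "does_not_exist.txt")
    print(found is not None, missing is None)
```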

In [2]:
import pandas as pd
fp = r'/home/jovyan/Exercise_geopython/exercise-6-rafimt/data/3664866.txt'


# sep=r"\s+" splits on runs of whitespace (same behavior as the deprecated delim_whitespace=True)
data = pd.read_csv(fp, sep=r"\s+", na_values=[-9999], skiprows=[1])

# YOUR CODE HERE
data.tail()

Unnamed: 0,Unnamed: 1,STATION,STATION_NAME,DATE,PRCP,TMAX,TMIN
GHCND:FIE00146538,SODANKYLA,LOKKA,FI,20180827,0.04,55.0,43.0
GHCND:FIE00146538,SODANKYLA,LOKKA,FI,20180828,0.0,59.0,31.0
GHCND:FIE00146538,SODANKYLA,LOKKA,FI,20180829,0.0,65.0,32.0
GHCND:FIE00146538,SODANKYLA,LOKKA,FI,20180830,0.02,65.0,48.0
GHCND:FIE00146538,SODANKYLA,LOKKA,FI,20180831,0.0,59.0,46.0
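
The two unnamed leading columns in the output above appear because the station name `SODANKYLA LOKKA FI` contains spaces: each data row splits into eight fields while the header has six, so pandas uses the two extra leading fields as the index and `STATION` ends up holding `LOKKA`. The sketch below reproduces this on a tiny stand-in for the file and shows one possible workaround with explicit column names. The dashed separator row, the placeholder names `NAME_1`, `NAME_2`, and `COUNTRY`, and the assumption that every station name has exactly three tokens are illustrative guesses, not facts about the real file.

```python
import io

import pandas as pd

# Tiny stand-in for the NOAA text file; the dashed second row is an assumption
raw = """STATION STATION_NAME DATE PRCP TMAX TMIN
------- ------------ ---- ---- ---- ----
GHCND:FIE00146538 SODANKYLA LOKKA FI 20180827 0.04 55 43
GHCND:FIE00146538 SODANKYLA LOKKA FI 20180828 0.00 59 31
"""

# Same settings as the cell above: the six header names land on the last six fields
shifted = pd.read_csv(io.StringIO(raw), sep=r"\s+", na_values=[-9999], skiprows=[1])
print(shifted.columns.tolist())

# Workaround sketch: skip both header rows and name all eight fields explicitly
names = ["STATION", "NAME_1", "NAME_2", "COUNTRY", "DATE", "PRCP", "TMAX", "TMIN"]
fixed = pd.read_csv(
    io.StringIO(raw), sep=r"\s+", na_values=[-9999], skiprows=[0, 1], names=names
)
# Rebuild the two-word station name from its parts
fixed["STATION_NAME"] = fixed["NAME_1"] + " " + fixed["NAME_2"]
print(fixed[["STATION", "STATION_NAME", "DATE", "TMAX", "TMIN"]])
```

The temperature and date columns are unaffected by the shift, which is why the calculations in the following cells still work on `data` as loaded.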


In [3]:
print(data["TMIN"].max())

66.0


In [4]:
data["TAVG"] = (data["TMAX"] + data["TMIN"])/2
data["TAVG"].max()

75.0

In [5]:
def fahr_to_celsius(temp_fahrenheit):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_fahrenheit - 32) / 1.8


data["temp_celsius"] = data["TAVG"].apply(fahr_to_celsius)

# Build a YYYYMM key by slicing the first six characters of the YYYYMMDD date
data["MONTH"] = data["DATE"].astype(str).str.slice(start=0, stop=6).astype(int)

# Mean temperature for each year-month combination
monthly_data = data.groupby("MONTH")["temp_celsius"].mean().reset_index()

data.tail()

Unnamed: 0,Unnamed: 1,STATION,STATION_NAME,DATE,PRCP,TMAX,TMIN,TAVG,temp_celsius,MONTH
GHCND:FIE00146538,SODANKYLA,LOKKA,FI,20180827,0.04,55.0,43.0,49.0,9.444444,201808
GHCND:FIE00146538,SODANKYLA,LOKKA,FI,20180828,0.0,59.0,31.0,45.0,7.222222,201808
GHCND:FIE00146538,SODANKYLA,LOKKA,FI,20180829,0.0,65.0,32.0,48.5,9.166667,201808
GHCND:FIE00146538,SODANKYLA,LOKKA,FI,20180830,0.02,65.0,48.0,56.5,13.611111,201808
GHCND:FIE00146538,SODANKYLA,LOKKA,FI,20180831,0.0,59.0,46.0,52.5,11.388889,201808
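
As an alternative to string slicing, the year-month key can be derived from parsed dates, and the unit conversion can be applied to the whole column at once instead of through `apply`. The following is a minimal sketch on made-up values; the frame `daily` and its numbers are hypothetical, and only the column names follow the cell above.

```python
import pandas as pd

# Synthetic daily records (values are invented for illustration)
daily = pd.DataFrame({
    "DATE": [20180730, 20180731, 20180801, 20180802],
    "TAVG": [68.0, 50.0, 41.0, 59.0],  # degrees Fahrenheit
})

# Vectorized Fahrenheit-to-Celsius conversion on the whole column
daily["temp_celsius"] = (daily["TAVG"] - 32) / 1.8

# Parse YYYYMMDD integers into timestamps, then rebuild the YYYYMM integer key
dates = pd.to_datetime(daily["DATE"].astype(str), format="%Y%m%d")
daily["MONTH"] = dates.dt.year * 100 + dates.dt.month

# Mean temperature per year-month
monthly = daily.groupby("MONTH")["temp_celsius"].mean().reset_index()
print(monthly)
```

Parsing the dates also rejects malformed values early, since `pd.to_datetime` raises an error for strings that do not match the given format.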


In [6]:
# Monthly means within the 1951-1980 reference period (the data here start in 1959)
reference_temps = monthly_data.rename(columns={"temp_celsius": "ref_temp", "MONTH": "month"})
reference_temps = reference_temps.loc[
    (reference_temps["month"] >= 195101) & (reference_temps["month"] < 198101)
].copy()  # copy so the column assignment below does not act on a view

# Keep only the calendar month (last two digits of YYYYMM) as an integer
reference_temps["month"] = reference_temps["month"].astype(str).str.slice(start=4, stop=6).astype(int)

# Average over all reference years for each calendar month
reference_temps = reference_temps.groupby("month")["ref_temp"].mean().reset_index()

reference_temps

Unnamed: 0,month,ref_temp
0,1,-16.153425
1,2,-16.21925
2,3,-11.184289
3,4,-4.104938
4,5,3.423411
5,6,10.291667
6,7,12.934991
7,8,10.635753
8,9,5.119444
9,10,-1.918459


In [7]:
# Extract the calendar month from the YYYYMM key so it can be matched to reference_temps
monthly_data["month"] = monthly_data["MONTH"].astype(str).str.slice(start=4, stop=6).astype(int)
monthly_data = monthly_data.merge(reference_temps, on="month", how="outer")

# Anomaly: observed monthly mean minus the reference mean for that calendar month
monthly_data["diff"] = monthly_data["temp_celsius"] - monthly_data["ref_temp"]
monthly_data.head()

Unnamed: 0,MONTH,temp_celsius,month,ref_temp,diff
0,195901,,1,-16.153425,
1,196001,-19.121864,1,-16.153425,-2.968439
2,196101,-11.182796,1,-16.153425,4.970629
3,196201,-15.421147,1,-16.153425,0.732278
4,196301,-18.145161,1,-16.153425,-1.991736
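
The outer merge above groups the rows by calendar month, so the result is no longer in chronological order (all Januaries come first). A minimal sketch of the same anomaly calculation that keeps the time order is given below. It uses invented values, computes the toy reference from all rows rather than a fixed reference period, and relies on the `validate` argument of `merge` to check that each calendar month occurs only once in the reference table.

```python
import pandas as pd

# Invented monthly means; MONTH is a YYYYMM integer as in the cells above
monthly = pd.DataFrame({
    "MONTH": [196001, 196002, 196101, 196102],
    "temp_celsius": [-19.0, -15.0, -11.0, -17.0],
})
monthly["month"] = monthly["MONTH"] % 100  # calendar month from YYYYMM

# Toy reference: mean per calendar month over all rows (not a real 30-year period)
reference = (
    monthly.groupby("month")["temp_celsius"].mean().rename("ref_temp").reset_index()
)

# Left merge keeps every observation; validate checks one reference row per month
merged = monthly.merge(reference, on="month", how="left", validate="many_to_one")
merged["diff"] = merged["temp_celsius"] - merged["ref_temp"]

# Restore chronological order explicitly
merged = merged.sort_values("MONTH").reset_index(drop=True)
print(merged)
```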


In [9]:
len(monthly_data)

687

In [10]:
# define output filename
Lokka = "Lokka_temperature_anomaly.csv"

# Save dataframe to csv
monthly_data.to_csv(Lokka, sep=",", index=False, float_format="%.1f")

In [11]:
file_path = find_file(os.getcwd(), "Helsinki_temperature_anomaly.csv")
file_path

'/home/jovyan/Exercise_geopython/exercise-6-rafimt/Helsinki_temperature_anomaly.csv'

In [12]:
fp = r'/home/jovyan/Exercise_geopython/exercise-6-rafimt/Helsinki_temperature_anomaly.csv'
monthly_data_Helsinki = pd.read_csv(fp)

# Preview the Helsinki rows from January 1959 through December 2016.
# MONTH holds YYYYMM integers, so both bounds are written in that form.
helsinki_subset = monthly_data_Helsinki.loc[
    (monthly_data_Helsinki["MONTH"] >= 195901) & (monthly_data_Helsinki["MONTH"] < 201701)
]
helsinki_subset.tail()

Unnamed: 0,MONTH,temp_celsius,month,ref_temp,diff
785,201212,-6.6,12,-4.2,-2.5
786,201312,1.4,12,-4.2,5.5
787,201412,-1.1,12,-4.2,3.0
788,201512,2.2,12,-4.2,6.4
789,201612,-0.8,12,-4.2,3.4


In [13]:
monthly_data_Lokka = monthly_data
monthly_data_Lokka.tail()
print(len(monthly_data_Lokka))
print(len(monthly_data_Helsinki))

687
790


In [16]:
Temp_Diff = pd.DataFrame()
# Temp_Diff = monthly_data_Helsinki.merge(monthly_data_Lokka, on='MONTH', how='outer')
Temp_Diff["MONTH"] = monthly_data_Helsinki["MONTH"]

In [18]:
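# Caution: pandas aligns this subtraction on the row index, not on MONTH.
# The two frames have different lengths (790 vs. 687 rows), so each result
# pairs rows with the same index label rather than the same year and month.
Temp_Diff["diff"] = monthly_data_Helsinki["diff"] - monthly_data_Lokka["diff"]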

In [19]:
Temp_Diff.head()

Unnamed: 0,MONTH,diff
0,195201,
1,195301,3.368439
2,195401,-6.170629
3,195501,-0.332278
4,195601,-0.308264
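
Because the Helsinki and Lokka tables cover different periods, a comparison by year and month needs the rows matched on the `MONTH` key rather than on row position. The sketch below shows one way to do that with an inner merge, followed by the summer (June, July, August) mean and standard deviation asked for in the task. All values and the frame names `helsinki`, `lokka`, and `both` are made up for illustration and are not results from the real stations.

```python
import pandas as pd

# Invented monthly means for two stations; the row order differs on purpose
helsinki = pd.DataFrame({
    "MONTH": [201706, 201707, 201708, 201709],
    "temp_celsius": [14.0, 17.0, 16.0, 12.0],
})
lokka = pd.DataFrame({
    "MONTH": [201707, 201708, 201709, 201706],
    "temp_celsius": [13.0, 11.0, 6.0, 10.0],
})

# Match rows on the YYYYMM key; only months present in both tables are kept
both = helsinki.merge(lokka, on="MONTH", how="inner", suffixes=("_helsinki", "_lokka"))
both["temp_diff"] = both["temp_celsius_helsinki"] - both["temp_celsius_lokka"]

# Summer months are those whose calendar month is 6, 7, or 8
summer = both[(both["MONTH"] % 100).isin([6, 7, 8])]
stats = summer[["temp_celsius_helsinki", "temp_celsius_lokka"]].agg(["mean", "std"])

print(both[["MONTH", "temp_diff"]])
print(stats)
```

The resulting `both` frame can be written out with `to_csv` in the same way as the anomaly table.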


In [21]:
# define output filename
Temp_Difference = "Helsinki_Lokka_temperature_anomaly.csv"

# Save dataframe to csv (call to_csv on the DataFrame and pass the filename)
Temp_Diff.to_csv(Temp_Difference, sep=",", index=False, float_format="%.2f")