In [1]:
import xarray as xr
import pandas as pd
from tqdm import tqdm
from functools import reduce



## Temperature Data

We use the [Berkeley Earth Global-Warming Dataset](http://berkeleyearth.org/data/).
The Berkeley Earth averaging process generates several kinds of output data, including gridded temperature fields, regional averages, and bias-corrected station data. The source data consists of the raw temperature reports that form the foundation of the averaging system; these observations are provided as originally reported and contain many quality control and redundancy issues. Intermediate data is constructed from the source data by merging redundant records, identifying quality control problems, and creating monthly averages from daily reports when necessary.

The dataset expresses temperatures as anomalies. The climatology is the average temperature over a fixed baseline period (1951 to 1980 in Berkeley Earth's gridded products), and the anomaly is the difference between the temperature on a given date and that climatology.
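
To make the relationship concrete, here is a minimal sketch of recovering an actual temperature from an anomaly and a climatology value. The numbers and the column names `anomaly` and `climatology` are made up for illustration and do not come from the dataset.

```python
import pandas as pd

# Hypothetical values for a single grid cell, in degrees Celsius.
daily = pd.DataFrame({
    "day_of_year": [1, 2, 3],
    "anomaly": [1.5, -0.5, 0.25],      # departure from the baseline
})
climatology = pd.DataFrame({
    "day_of_year": [1, 2, 3],
    "climatology": [4.0, 4.5, 5.0],    # baseline temperature for that day
})

# Actual temperature = climatology + anomaly, matched by day of year.
merged = daily.merge(climatology, on="day_of_year", how="left")
merged["temperature"] = merged["climatology"] + merged["anomaly"]
print(merged)
```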


## Defining the Target Zone
Define the latitude and longitude ranges of your target country or zone. Each range is a `(min, max)` pair in degrees. The example values form a bounding box around Turkey.

In [2]:
LAT_RANGE = (36, 42)
LON_RANGE = (26, 45)
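
As a rough illustration of how these ranges act as a bounding box, the sketch below filters a few hypothetical grid-cell centres with pandas. The points are invented for the example; the notebook itself applies the same bounds to the xarray dataset.

```python
import pandas as pd

LAT_RANGE = (36, 42)
LON_RANGE = (26, 45)

# Hypothetical grid-cell centres (not taken from the dataset).
points = pd.DataFrame({
    "latitude": [39.5, 37.5, 50.5],
    "longitude": [32.5, 23.5, 30.5],
})

# Series.between is inclusive on both ends by default.
mask = points["latitude"].between(*LAT_RANGE) & points["longitude"].between(*LON_RANGE)
selected = points[mask]
print(selected)
```

Only the first point passes: the second lies west of the longitude range and the third lies north of the latitude range.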

We list the file paths for the daily minimum, maximum, and average temperature data, and define a suffix for each variable's column name.

In [3]:
data_files_min = [
    '../data/raw_data/Complete_TMIN_Daily_LatLong1_2010.nc',
    '../data/raw_data/Complete_TMIN_Daily_LatLong1_2020.nc',
]

data_files_max = [
    '../data/raw_data/Complete_TMAX_Daily_LatLong1_2010.nc',
    '../data/raw_data/Complete_TMAX_Daily_LatLong1_2020.nc',
]

data_files_avg = [
    '../data/raw_data/Complete_TAVG_Daily_LatLong1_2010.nc',
    '../data/raw_data/Complete_TAVG_Daily_LatLong1_2020.nc',
]

suffixes = ["_min", "_max", "_avg"]

We iterate over the temperature datasets, keeping only the grid cells that fall within the target latitude and longitude ranges.

The data is daily. We take monthly means to create monthly features, and add the temperature anomaly to the climatology mean temperature to recover the actual temperature.

Finally, for each of the min, max and avg variables, we concatenate the results from the individual files into one dataframe.
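
The daily-to-monthly step can be sketched on its own with a toy table. This is an illustration with invented values and a single hypothetical grid cell, not the notebook's exact pipeline.

```python
import pandas as pd

# Four hypothetical daily readings spanning a month boundary.
dates = pd.date_range("2020-01-30", "2020-02-02", freq="D")
daily = pd.DataFrame({
    "latitude": 39.5,
    "longitude": 32.5,
    "year": dates.year,
    "month": dates.month,
    "temperature": [2.0, 4.0, 6.0, 8.0],
})

# One mean temperature per grid cell, year and month.
monthly = daily.groupby(
    ["latitude", "longitude", "year", "month"], as_index=False
)["temperature"].mean()
print(monthly)
```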

In [4]:
temperature_dfs = []


for data_i, data_files in enumerate(tqdm([data_files_min, data_files_max, data_files_avg])):
    datasets = []

    for data_file in data_files:
        ds = xr.open_dataset(data_file)

        # Climatology restricted to the target box, flattened to a dataframe.
        climatology_df = ds[["climatology"]]
        climatology_df = (
            climatology_df.where(
                (climatology_df.latitude >= LAT_RANGE[0])
                & (climatology_df.latitude <= LAT_RANGE[1])
                & (climatology_df.longitude >= LON_RANGE[0])
                & (climatology_df.longitude <= LON_RANGE[1]),
                drop=True,
            )
            .to_dataframe()
            .reset_index()
            .dropna()
            .rename(columns={"day_number": "day_of_year"})
        )
        climatology_df = climatology_df.groupby(["climatology", "longitude", "latitude"]).mean().reset_index()

        # Daily anomalies restricted to the same box.
        ds = ds[["temperature", "month", "year", "day_of_year"]]
        ds = (
            ds.where(
                (ds.latitude >= LAT_RANGE[0])
                & (ds.latitude <= LAT_RANGE[1])
                & (ds.longitude >= LON_RANGE[0])
                & (ds.longitude <= LON_RANGE[1]),
                drop=True,
            )
            .to_dataframe()
            .reset_index()
            .dropna()
            .drop(labels=["time"], axis=1)
        )
        # Monthly mean per grid cell (day_of_year is averaged along with temperature).
        ds = ds.groupby(["latitude", "longitude", "month", "year"]).mean().reset_index()

        # Join monthly values onto the climatology rows by day_of_year and grid cell.
        climatology_df = climatology_df.merge(
            ds, how="left", on=["day_of_year", "latitude", "longitude"]
        ).drop(labels=["day_of_year"], axis=1)

        # Actual temperature = anomaly + climatology.
        climatology_df["temperature"] += climatology_df["climatology"]
        climatology_df.drop(labels=["climatology"], axis=1, inplace=True)
        climatology_df.rename(columns={"temperature": "temperature" + suffixes[data_i]}, inplace=True)

        # Rows without a match in the join have NaN temperatures and are dropped here.
        datasets.append(climatology_df.dropna())

    # Stack the per-file results for this variable.
    datasets = pd.concat(datasets, ignore_index=True)
    temperature_dfs.append(datasets)

Cannot find the ecCodes library
100%|█████████████████████████████████████████████| 3/3 [00:10<00:00,  3.47s/it]


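We combine the min, max and avg temperature data as separate columns in a single dataframe, joining on latitude, longitude, month and year. Because each merge is a left join, the rows of the min table determine which rows are kept.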

In [5]:
full_temp_df = reduce(
    lambda df_left, df_right: pd.merge(
        df_left,
        df_right,
        on=["latitude", "longitude", "month", "year"],
        how='left',
    ),
    temperature_dfs,
)
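
The chained left merge can be seen on a toy example. The frames and values below are hypothetical; the point is that `reduce` folds the list pairwise, and that a key missing from a right-hand table yields `NaN` rather than dropping the row.

```python
from functools import reduce

import pandas as pd

keys = ["latitude", "longitude", "month", "year"]
base = {"latitude": [39.5, 39.5], "longitude": [32.5, 32.5],
        "month": [1, 2], "year": [2020, 2020]}

df_min = pd.DataFrame({**base, "temperature_min": [-3.0, -1.0]})
df_max = pd.DataFrame({**base, "temperature_max": [5.0, 8.0]})
# The avg table is deliberately missing February.
df_avg = pd.DataFrame({"latitude": [39.5], "longitude": [32.5],
                       "month": [1], "year": [2020],
                       "temperature_avg": [1.0]})

combined = reduce(
    lambda left, right: pd.merge(left, right, on=keys, how="left"),
    [df_min, df_max, df_avg],
)
print(combined)
```

Both rows from `df_min` survive, and the February row has `NaN` in `temperature_avg`.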

## Export
We export the combined temperature data to a CSV file.

In [6]:
full_temp_df.to_csv("../data/processed_data/temperatures.csv", index=False)