# Download timeseries climate data for cities

This script downloads, cleans and saves timeseries data from a GCM (global climate model) and from the ERA-Interim reanalysis.

**Requires:**
* Baspy module https://github.com/scott-hosking/baspy
* Run `python setup.py install` to use modules

**Notes:**
* Data is saved at `/data_directory/filetype/variable_label/City_rcpX.nc` (GCM) or `/data_directory/filetype/variable_label/City_ERAI.nc` (ERA-Interim). You can set the parent `data_directory` and the `variable_label` in settings.
* Extracting one city for one GCM takes around 3-5 minutes on JASMIN; ERA-Interim is quicker.
* Downloading GCM data (e.g. from 1980 to 2050) concatenates the historical run and the future RCP run into one timeseries.
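
The file layout described in the notes can be sketched as below. This is only an illustration: `expected_output_path` is a hypothetical helper, not part of the repository, and it assumes that the `rcpX` part of the filename is the RCP label as written in settings (e.g. `rcp85`).

```python
import posixpath


def expected_output_path(data_directory, filetype, variable_label, city, rcp=None):
    """Hypothetical helper: build the path described in the notes.

    Assumes GCM files are named City_<rcp label>.nc and ERA-Interim files
    City_ERAI.nc; the actual naming is decided by the repository's code.
    """
    filename = f"{city}_{rcp}.nc" if rcp else f"{city}_ERAI.nc"
    return posixpath.join(data_directory, filetype, variable_label, filename)


# Values taken from the settings used in this notebook
gcm_path = expected_output_path(
    "./data/riskindex/", "netcdf", "max_temperature", "London", rcp="rcp85"
)
era_path = expected_output_path(
    "./data/riskindex/", "netcdf", "max_temperature", "London"
)
print(gcm_path)
print(era_path)
```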


In [1]:
settings = {
    # --------------------------- #
    #  Location / variable        #
    # --------------------------- #
    # 'cities': ['London', 'NYC', 'Beijing', 'Tokyo', 'Madrid'],  # Cities to download
    'cities': ['London', 'NYC'],

    'variable_name_gcm': 'tasmax',  # GCM official variable name e.g. 'tas' (mean T), 'tasmax' (max T), 'tasmin' (min T)
    'variable_name_era': 't2max',   # ERAI official variable name e.g. 'T2' (mean T), 't2max' (max T), 't2min' (min T)
    'variable_label': 'max_temperature',  # Downloaded data will go in this folder; ensure the description is accurate

    'no_leap_years': True,   # Remove every 29th Feb to get a consistent 365-day calendar

    # --------------------------- #
    #  GCM                        #
    # --------------------------- #
    'model': 'HadGEM2-CC',  # Climate model
    'future_rcp': ['rcp85', 'rcp45'],  # Model RCP
    'model_start': 1980,  # Model start year (inclusive)
    'model_end': 2050,  # Model end year (inclusive)

    # --------------------------- #
    #  ERA Interim                #
    # --------------------------- #
    'observed_start': 1980,  # ERA Interim start year (inclusive)
    'observed_end': 2017,  # ERA Interim end year (inclusive)

    # --------------------------- #
    #  Saving files               #
    # --------------------------- #
    'save_coords': True,  # Optional: save loaded coordinates from cities
    'filetype': 'netcdf',  # File type to save: 'netcdf' or 'df'
    'data_directory': './data/riskindex/'  # Directory to save created data
}

In [2]:
from scripts.get1D import ClimateDataProcessing
cd = ClimateDataProcessing(settings)

## Download clean data for all cities in one line:

### GCM

In [7]:
cd.save_clean_gcm_data()

Geopy found the following locations:
* 'London': London, Greater London, England, SW1A 2DX, United Kingdom
* 'NYC': New York, United States of America
* 'Beijing': 北京市, 东城区, 北京市, 100010, China 中国
* 'Tokyo': 東京都, 関東地方, 日本 (Japan)
* 'Madrid': Madrid, Área metropolitana de Madrid y Corredor del Henares, Comunidad de Madrid, 28001, España
Saved at ../../data/riskindex/coords
You can load this later with self.load_coords(filename=coords)
GCM settings: model = ['HadGEM2-CC'],               rcp = ['rcp85', 'rcp45'],               start = 1980,               end = 2050
Updating cached catalogue...
catalogue memory usage (MB): 28.786050999999997
>> Current cached values (can be extended by specifying additional values or by setting read_everything=True) <<
{'Experiment': ['rcp85', 'rcp26', 'historical', 'piControl', 'rcp45'], 'Frequency': ['mon', 'day']}

  Centre       Model  Experiment Frequency SubModel CMOR   RunID    Version  \
0   MOHC  HadGEM2-CC  historical       day    atmos  day  r1i1

KeyboardInterrupt: 

### ERA-Interim

In [None]:
cd.save_clean_era_data()

### If you want to manually specify coordinates:

In [None]:
coords = {}
coords['Some_city1'] = {'longitude': -10, 'latitude': 20}
coords['Some_city2'] = {'longitude': 0, 'latitude': 50}  # latitude must lie within [-90, 90]

cd.save_clean_gcm_data(coords=coords)
cd.save_clean_era_data(coords=coords)
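
Before passing hand-written coordinates, it may be worth checking that they are physically valid. The sketch below is not part of `ClimateDataProcessing`; `validate_coords` is a hypothetical helper, and the accepted longitude range (-180 to 360, covering both common conventions) is an assumption.

```python
def validate_coords(coords):
    """Hypothetical helper: return a list of problems found in a coords dict.

    Expects the same structure as above:
    {'City': {'longitude': <number>, 'latitude': <number>}}
    """
    problems = []
    for city, point in coords.items():
        lat = point.get('latitude')
        lon = point.get('longitude')
        # Latitude is only defined between the poles
        if lat is None or not -90 <= lat <= 90:
            problems.append(f"{city}: latitude {lat} is outside [-90, 90]")
        # Assumption: accept both the -180..180 and the 0..360 convention
        if lon is None or not -180 <= lon <= 360:
            problems.append(f"{city}: longitude {lon} is outside [-180, 360]")
    return problems


good = {'Some_city1': {'longitude': -10, 'latitude': 20}}
bad = {'Bad_city': {'longitude': 0, 'latitude': 120}}

good_problems = validate_coords(good)
bad_problems = validate_coords(bad)
print(good_problems)
print(bad_problems)
```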

### Load saved data

In [8]:
da_gcm = cd.load_data_at_location(city='London', rcp=45)
da_era = cd.load_data_at_location(city='London', model='era')

In [9]:
da_gcm

<xarray.DataArray (time: 25915)>
array([283.577637, 280.271443, 279.143461, ..., 279.336669, 278.550096,
       283.045349])
Coordinates:
  * time     (time) datetime64[ns] 1980-01-01 1980-01-02 ... 2050-12-31
    lon      float64 ...
    lat      float64 ...
Attributes:
    standard_name:  air_temperature
    long_name:      Daily Maximum Near-Surface Air Temperature
    units:          K
    original_name:  mo: m01s03i236
    cell_methods:   time: maximum
    cell_measures:  area: areacella
    city:           London
    rcp:            rcp45

In [10]:
da_era

<xarray.DataArray (time: 13870)>
array([273.09656, 273.35147, 283.5008 , ..., 280.2819 , 284.54987, 285.18195],
      dtype=float32)
Coordinates:
    surface      float32 ...
  * time         (time) datetime64[ns] 1980-01-01 1980-01-02 ... 2017-12-31
    day_of_year  (time) int64 ...
    year         (time) int64 ...
    lon          float64 ...
    lat          float64 ...
Attributes:
    long_name:     Maximum 2 metre temperature since previous post-processing
    units:         K
    cell_methods:  day_of_year: year: maximum
    city:          London
    rcp:           None
    model_type:    era
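
The lengths of the two time axes above (25915 for the GCM, 13870 for ERA-Interim) are consistent with the `no_leap_years` setting: both equal the number of years multiplied by 365. The sketch below is only an arithmetic check using the standard library; it does not describe how the script itself builds its calendar, and `count_days_without_leap_days` is a hypothetical name.

```python
from datetime import date, timedelta


def count_days_without_leap_days(start_year, end_year):
    """Count calendar days from 1 Jan start_year to 31 Dec end_year,
    skipping every 29 February (both years inclusive)."""
    day = date(start_year, 1, 1)
    last = date(end_year, 12, 31)
    count = 0
    while day <= last:
        if not (day.month == 2 and day.day == 29):
            count += 1
        day += timedelta(days=1)
    return count


gcm_days = count_days_without_leap_days(1980, 2050)  # model_start .. model_end
era_days = count_days_without_leap_days(1980, 2017)  # observed_start .. observed_end
print(gcm_days, era_days)
```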

In [11]:
# To convert to pandas dataframe:
from scripts import dataprocessing
df_gcm = dataprocessing.da_to_df(da_gcm)
df_gcm.head()

                  data
time
1980-01-01  283.577637
1980-01-02  280.271443
1980-01-03  279.143461
1980-01-04  285.184903
1980-01-05  281.610492


## Behind ClimateDataProcessing