# Download timeseries climate data for cities
### (Nearest gridpoint to city location / 1D)

This script downloads, cleans and saves timeseries GCM and ERA-Interim data from JASMIN.

**Requires:**
* Access to `/bas_climate` group workspace
* Baspy module https://github.com/scott-hosking/baspy
* Run `pip install -e .` to install this repository as a package (with its dependencies); creating a new conda environment (a Python virtual environment) first is recommended
* Tested with Python 3.6

**Notes:**
* Data is saved at `/data_directory/filetype/variable_label/City_rcpX.nc` (GCM) or `City_ERAI.nc` in the same folder (ERA-Interim). You can set the parent `data_directory` and the `variable_label` in `settings`.
* Extracting one city for one GCM takes around 3-5 minutes on JASMIN; ERA-Interim is quicker.
* Downloading GCM data (e.g. from 1980 to 2050) concatenates the historical run and the future RCP run into one timeseries.
* Requires cftime v1.0.1. `setup.py` can sometimes override this version and break the code; if this happens, please (re)install the correct version of cftime.
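
The save-path convention described in the notes can be sketched as a small helper. This is only an illustration of the folder layout: `expected_output_path` is a hypothetical function, not part of the `downloader` package, and it assumes the `settings` keys used in this notebook.

```python
from pathlib import PurePosixPath


def expected_output_path(settings, city, rcp=None):
    """Build the save path described in the notes (hypothetical helper).

    rcp is a string such as 'rcp85' for GCM data, or None for ERA-Interim.
    """
    suffix = rcp if rcp is not None else "ERAI"
    return str(
        PurePosixPath(settings["data_directory"])
        / settings["filetype"]
        / settings["variable_label"]
        / f"{city}_{suffix}.nc"
    )


# Only the keys that affect the path are needed here
example_settings = {
    "data_directory": "../data/riskindex/",
    "filetype": "netcdf",
    "variable_label": "max_temperature",
}
print(expected_output_path(example_settings, "London", "rcp85"))
print(expected_output_path(example_settings, "NYC"))
```

The resulting strings follow the same pattern as the `Saved at` lines in the cell output further down.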

In [2]:
settings = {
    # --------------------------- #
    #  Location / variable        #
    # --------------------------- #
    # 'cities': ['London', 'NYC', 'Beijing', 'Tokyo', 'Madrid'],  # Cities to download
    'cities': ['London', 'NYC'],

    'variable_name_gcm': 'tasmax',  # GCM official variable name e.g. 'tas' (mean T), 'tasmax' (max T), 'tasmin' (min T)
    'variable_name_era': 't2max',   # ERAI official variable name e.g. 'T2' (mean T), 't2max' (max T), 't2min' (min T)
    'variable_label': 'max_temperature',  # Downloaded data will go in this folder; ensure the description is accurate

    'no_leap_years': True,   # Remove 29 February to get a consistent 365-day calendar

    # --------------------------- #
    #  GCM                        #
    # --------------------------- #
    'model': 'HadGEM2-CC',  # Climate model
    'future_rcp': ['rcp85', 'rcp45'],  # Model RCP
    'model_start': 1980,  # Model start year (inclusive)
    'model_end': 2050,  # Model end year (inclusive)

    # --------------------------- #
    #  ERA Interim                #
    # --------------------------- #
    'observed_start': 1980,  # ERA Interim start year (inclusive)
    'observed_end': 2017,  # ERA Interim end year (inclusive)

    # --------------------------- #
    #  Saving files               #
    # --------------------------- #
    'save_coords': True,  # Optional: save loaded coordinates from cities
    'filetype': 'netcdf',  # Climate data type to save 'netcdf' or 'df'
    'data_directory': '../data/riskindex/'  # Directory to save created data
}
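
To make the effect of `'no_leap_years': True` concrete, here is a minimal sketch of dropping 29 February from a daily pandas series. It is an assumption about what the option achieves, not the package's actual implementation; `drop_feb_29` is a hypothetical name.

```python
import pandas as pd


def drop_feb_29(series):
    """Return the series without any 29 February entries."""
    idx = series.index
    is_leap_day = (idx.month == 2) & (idx.day == 29)
    return series[~is_leap_day]


# 1980 is a leap year, so this daily range contains 366 days
days = pd.date_range("1980-01-01", "1980-12-31", freq="D")
daily = pd.Series(range(len(days)), index=days)

no_leap = drop_feb_29(daily)
print(len(daily), len(no_leap))
```

Every year then has the same number of daily values, which makes day-of-year comparisons across years simpler.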

In [3]:
from downloader.get1D import ClimateDataProcessing
cd = ClimateDataProcessing(settings)

## Download clean data for all cities in one line:

### GCM

In [3]:
cd.save_clean_gcm_data()

Geopy found the following locations:
* 'London': London, Greater London, England, SW1A 2DX, United Kingdom
* 'NYC': New York, United States of America
Saved at ../data/riskindex/coords
You can load this later with self.load_coords(filename=coords)
GCM settings: model = ['HadGEM2-CC'],               rcp = ['rcp85', 'rcp45'],               start = 1980,               end = 2050
Updating cached catalogue...
catalogue memory usage (MB): 28.786099
>> Current cached values (can be extended by specifying additional values or by setting read_everything=True) <<
{'Experiment': ['piControl', 'rcp26', 'historical', 'rcp45', 'rcp85'], 'Frequency': ['mon', 'day']}

  Centre       Model  Experiment Frequency SubModel CMOR   RunID    Version  \
0   MOHC  HadGEM2-CC  historical       day    atmos  day  r1i1p1  v20110930   
1   MOHC  HadGEM2-CC       rcp45       day    atmos  day  r1i1p1  v20120531   
2   MOHC  HadGEM2-CC       rcp85       day    atmos  day  r1i1p1  v20120531   

      Var  StartDate  

will change. To retain the existing behavior, pass
combine='nested'. To use future default behavior, pass
combine='by_coords'. See
http://xarray.pydata.org/en/stable/combining.html#combining-multi

  ds = xr.open_mfdataset(files)
to use the new `combine_by_coords` function (or the
`combine='by_coords'` option to `open_mfdataset`) to order the datasets
before concatenation. Alternatively, to continue concatenating based
on the order the datasets are supplied in future, please use the new
`combine_nested` function (or the `combine='nested'` option to
open_mfdataset).
  from_openmfds=True,


Loading rcp85 data array...(2/3)




Loading rcp45 data array...(3/3)




Data arrays loaded.
Saving data for all specified cities and RCP.
Standardising coordinates
Rolling longitude 0/360 --> -180/180
Coordinate extracted
Converting to datetime64... (retrieving all values and interpolating if necessary, may take a few mins)
Saved at ../data/riskindex/netcdf/max_temperature/London_rcp85.nc
Standardising coordinates
Rolling longitude 0/360 --> -180/180
Coordinate extracted
Converting to datetime64... (retrieving all values and interpolating if necessary, may take a few mins)
Saved at ../data/riskindex/netcdf/max_temperature/London_rcp45.nc
Standardising coordinates
Rolling longitude 0/360 --> -180/180
Coordinate extracted
Converting to datetime64... (retrieving all values and interpolating if necessary, may take a few mins)
Saved at ../data/riskindex/netcdf/max_temperature/NYC_rcp85.nc
Standardising coordinates
Rolling longitude 0/360 --> -180/180
Coordinate extracted
Converting to datetime64... (retrieving all values and interpolating if necessary, may take

### ERA-Interim

In [4]:
cd.save_clean_era_data()

Geopy found the following locations:
* 'London': London, Greater London, England, SW1A 2DX, United Kingdom
* 'NYC': New York, United States of America
Saved at ../data/riskindex/coords
You can load this later with self.load_coords(filename=coords)




Standardising coordinates
Elapsed time: 57.006964 seconds.

ERA-Interim data loaded
Saving data for all specified cities and RCP.
Standardising coordinates
Rolling longitude 0/360 --> -180/180
Coordinate extracted
Saved at ../data/riskindex/netcdf/max_temperature/London_ERAI.nc
Standardising coordinates
Rolling longitude 0/360 --> -180/180
Coordinate extracted
Saved at ../data/riskindex/netcdf/max_temperature/NYC_ERAI.nc
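
The log messages `Rolling longitude 0/360 --> -180/180` and `Coordinate extracted` correspond to two simple steps that can be sketched with NumPy. This is an illustration under stated assumptions, not the code in `downloader`: the grid spacing, the city coordinates and the function names `roll_longitude` and `nearest_index` are all hypothetical.

```python
import numpy as np


def roll_longitude(lon):
    """Map longitudes from the 0..360 convention to -180..180."""
    return ((np.asarray(lon, dtype=float) + 180.0) % 360.0) - 180.0


def nearest_index(grid_values, target):
    """Index of the grid value closest to target.

    Uses a plain absolute difference, so it ignores wrap-around at the dateline.
    """
    diffs = np.abs(np.asarray(grid_values, dtype=float) - target)
    return int(np.argmin(diffs))


# Hypothetical regular grid, for illustration only
grid_lon = roll_longitude(np.arange(0.0, 360.0, 1.875))
grid_lat = np.arange(-90.0, 90.1, 1.25)

# Approximate coordinates for London (assumed values, not taken from Geopy)
city_lon, city_lat = -0.13, 51.51

i_lon = nearest_index(grid_lon, city_lon)
i_lat = nearest_index(grid_lat, city_lat)
print(grid_lon[i_lon], grid_lat[i_lat])
```

Selecting the data at `(i_lat, i_lon)` for every time step gives the 1D timeseries at the grid point nearest to the city.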


### If you want to manually specify coordinates:

In [None]:
coords = {}
coords['Some_city1'] = {'longitude': -10, 'latitude': 20}
coords['Some_city2'] = {'longitude': 0, 'latitude': 10}  # latitude must lie within [-90, 90]

cd.save_clean_gcm_data(coords=coords)
cd.save_clean_era_data(coords=coords)
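
When coordinates are typed by hand, it may be worth checking them before starting a download that takes several minutes. The sketch below is a hypothetical pre-check, not part of the package; it assumes longitudes are given in the -180/180 convention and uses the same dictionary layout as the cell above.

```python
def validate_coords(coords):
    """Raise ValueError if any entry has out-of-range coordinates (hypothetical check)."""
    for name, point in coords.items():
        lon, lat = point["longitude"], point["latitude"]
        if not -90 <= lat <= 90:
            raise ValueError(f"{name}: latitude {lat} is outside [-90, 90]")
        if not -180 <= lon <= 180:
            raise ValueError(f"{name}: longitude {lon} is outside [-180, 180]")
    return True


good = {"Some_city1": {"longitude": -10, "latitude": 20}}
bad = {"Some_city3": {"longitude": 0, "latitude": 100}}

print(validate_coords(good))
try:
    validate_coords(bad)
except ValueError as err:
    print("Rejected:", err)
```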

### Load saved data

In [5]:
da_gcm = cd.load_data_at_location(city='London', rcp=45)
da_era = cd.load_data_at_location(city='London', model='era')

In [6]:
da_gcm

In [7]:
da_era

In [8]:
# To convert to pandas dataframe:
from downloader import dataprocessing
df_gcm = dataprocessing.da_to_df(da_gcm)
df_gcm.head()

| time | data |
| --- | --- |
| 1980-01-01 | 283.577637 |
| 1980-01-02 | 280.271443 |
| 1980-01-03 | 279.143461 |
| 1980-01-04 | 285.184903 |
| 1980-01-05 | 281.610492 |
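
The values above are consistent with temperatures in kelvin, the conventional unit for CMIP5 `tasmax`. Assuming that is the case, a conversion to degrees Celsius could look like the sketch below. The frame is rebuilt by hand from the first rows shown, and the column name `data_celsius` is a hypothetical choice.

```python
import pandas as pd

# Rebuild the first rows of the frame shown above, for illustration only
df_example = pd.DataFrame(
    {"data": [283.577637, 280.271443, 279.143461]},
    index=pd.to_datetime(["1980-01-01", "1980-01-02", "1980-01-03"]),
)
df_example.index.name = "time"

# Assumes the 'data' column is in kelvin
df_example["data_celsius"] = df_example["data"] - 273.15
print(df_example)
```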
