**Aim:** Regridding prior model data to a common grid: both downsampling to the ECHAM grid and upsampling to the CCSM grid (the highest resolution).
**Date:** 09.09.22
**Resource:** xesmf package (apparently it doesn't work on Windows). There is a simple example in the docs, https://xesmf.readthedocs.io/en/latest/notebooks/Dataset.html, and some explanations in Ryan Abernathey's great Jupyter book: https://earth-env-data-science.github.io/lectures/working_with_gcm_data.html#part-1-model-validation-comparing-a-state-estimate-to-observations.
https://xesmf.readthedocs.io/en/latest/notebooks/Reuse_regridder.html explains how to reuse a regridder to save time.
**Method:** I start with bilinear interpolation; Ryan's book contains some ideas about what to think about when choosing between methods. At first, bilinear interpolation is probably good enough. I am also thinking of upsampling to the highest resolution, because it could be useful for keeping more spatial degrees of freedom in the covariance patterns (I will have to justify that, of course).
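To make the method concrete, here is a minimal NumPy sketch of bilinear interpolation between two regular lat-lon grids with periodic longitude (the analogue of `periodic=True` in the `xe.Regridder` calls below). It is a simplification under stated assumptions: xESMF/ESMF compute bilinear weights on the sphere, whereas this sketch works in index space, assumes uniform longitude spacing covering 360°, and clamps target latitudes beyond the outermost source row (e.g. near the poles). The function name `bilinear_regrid` is hypothetical.

```python
import numpy as np


def bilinear_regrid(field, lat_in, lon_in, lat_out, lon_out):
    """Bilinear interpolation of a (lat, lon) field between regular grids.

    Assumptions (hypothetical helper, not xESMF's algorithm):
    - lat_in is monotonic (increasing or decreasing, e.g. Gaussian latitudes),
    - lon_in is increasing with uniform spacing that covers 360 degrees,
    - longitude is periodic; target latitudes outside lat_in are clamped.
    """
    field = np.asarray(field, dtype=float)
    lat_in = np.asarray(lat_in, dtype=float)
    lon_in = np.asarray(lon_in, dtype=float)
    # make latitude increasing so that np.searchsorted can be used
    if lat_in[0] > lat_in[-1]:
        lat_in = lat_in[::-1]
        field = field[::-1, :]

    # latitude: bracketing rows j0 < j1 and fractional weight wy
    lat_q = np.clip(np.asarray(lat_out, dtype=float), lat_in[0], lat_in[-1])
    j1 = np.clip(np.searchsorted(lat_in, lat_q), 1, len(lat_in) - 1)
    j0 = j1 - 1
    wy = (lat_q - lat_in[j0]) / (lat_in[j1] - lat_in[j0])

    # longitude: fractional column index, wrapped around 360 degrees
    dlon = lon_in[1] - lon_in[0]
    x = ((np.asarray(lon_out, dtype=float) - lon_in[0]) % 360.0) / dlon
    i0 = np.floor(x).astype(int) % len(lon_in)
    i1 = (i0 + 1) % len(lon_in)
    wx = x - np.floor(x)

    # expand to the 2-D target grid and combine the four neighbours
    J0, I0 = np.meshgrid(j0, i0, indexing="ij")
    J1, I1 = np.meshgrid(j1, i1, indexing="ij")
    WY, WX = np.meshgrid(wy, wx, indexing="ij")
    return ((1 - WY) * (1 - WX) * field[J0, I0]
            + (1 - WY) * WX * field[J0, I1]
            + WY * (1 - WX) * field[J1, I0]
            + WY * WX * field[J1, I1])


# example grids shaped like ECHAM (48 x 96) and CCSM (94 x 192); values are synthetic
lat_src = np.linspace(87.5, -87.5, 48)
lon_src = np.arange(96) * 3.75
lat_dst = np.linspace(-88.0, 88.0, 94)
lon_dst = np.arange(192) * 1.875
up = bilinear_regrid(np.full((48, 96), 5.0), lat_src, lon_src, lat_dst, lon_dst)
```

Each target value is a weighted average of four source neighbours with weights summing to 1, so a constant field stays constant, and an exact zero cannot arise from neighbours that all share the same sign.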

Update: the 'bilinear' method seems to work without introducing zeros.
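One cheap way to back up this observation is to compare the share of exact zeros before and after regridding; comparing shares (rather than counting zeros) also copes with variables such as precipitation where zeros can be legitimate. This is a sketch; the helper name `new_zero_fraction` is hypothetical.

```python
import numpy as np


def new_zero_fraction(src, dst):
    """Share of exact zeros in the regridded array minus the share in the source.

    A clearly positive value suggests that target cells were filled with 0
    instead of being interpolated (hypothetical diagnostic helper).
    """
    src = np.asarray(src)
    dst = np.asarray(dst)
    return float(np.mean(dst == 0) - np.mean(src == 0))
```

In the notebook this could be called as, e.g., `new_zero_fraction(file[var].values, regridded[var].values)` inside the regridding loop.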


**Data:** Cleaned model simulations for each variable, plus the orographies.

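**Computational effort:** Small. Regridding all five models took about 4 min for upsampling and about 1 min for downsampling (see the progress bars of the two regridding cells).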
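Disk space is the other cost. A back-of-the-envelope estimate per variable, using the array shapes printed further down and assuming float32 storage (the CCSM d18O file is float32); if the regridded output is written as float64 instead, the numbers double. The helper name is hypothetical.

```python
def field_size_mb(n_time, n_lat, n_lon, bytes_per_value=4):
    """Size of one (time, lat, lon) field in megabytes (10**6 bytes)."""
    return n_time * n_lat * n_lon * bytes_per_value / 1e6


# shapes from the printouts in this notebook; float32 (4 bytes) assumed
upsampled = field_size_mb(12000, 94, 192)    # 1000 years of monthly data on the CCSM grid
downsampled = field_size_mb(12000, 48, 96)   # the same on the ECHAM grid
ihad_up = field_size_mb(13778, 94, 192)      # iHadCM3 covers 801-1952
print(round(upsampled), round(downsampled), round(ihad_up))  # 866 221 995
```

So each upsampled variable is a bit under 1 GB, roughly four times its downsampled counterpart.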

----
Collection of thoughts about the superprior:
- Which variables should be reproduced?
- The superprior gives us more ensemble members to choose from, so I could use 5*100 instead of 100 from one simulation. Note that the Parsons 2021 paper used the whole 850-1849 range for the prior; I cannot do this because of the block approach.

- The distribution of values will of course be subclustered into the different priors.
- I need to compute the prior estimates for each superprior member, so it is better to use a regridded orography.

- Main changes to the code:
    - load all models and take average states
    - run psm through all models
    - create a values/names/--- vector for each model
    - create a prior-block for each model and then concatenate these in a smart way.
    - saving the prior estimates is probably a bit different (need to check that)
    
    - when using pseudoproxies:
        - there is no way to compute pseudoproxies from the superprior itself; I just get a posterior.
        - still, I can set the model source to external and select one model and do the 
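The "5*100 instead of 100" idea and the per-model prior blocks can be sketched as follows. Assumptions: each model's states are already flattened into a (time, state) array on the common grid, members are drawn at random without replacement, and the block approach itself is not reproduced. The helper names `sample_members` and `build_superprior` are hypothetical, and the data are random placeholders.

```python
import numpy as np


def sample_members(states, n_members, rng):
    """Draw n_members distinct rows (time steps) from a (time, state) array."""
    idx = rng.choice(states.shape[0], size=n_members, replace=False)
    return states[idx]


def build_superprior(blocks_by_model):
    """Stack per-model prior blocks into one ensemble.

    Returns the stacked (members, state) array and one model label per member,
    so that the sub-clusters of the distribution can be traced back to their model.
    """
    names = sorted(blocks_by_model)
    prior = np.concatenate([blocks_by_model[m] for m in names], axis=0)
    labels = np.concatenate([[m] * len(blocks_by_model[m]) for m in names])
    return prior, labels


# placeholder data: 1000 states with 10 state-vector entries per model
rng = np.random.default_rng(0)
models = ["ccsm", "cesm", "echam", "giss", "ihad"]
states = {m: rng.normal(size=(1000, 10)) for m in models}
blocks = {m: sample_members(s, 100, rng) for m, s in states.items()}
prior, labels = build_superprior(blocks)
```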

In [1]:
import os
import glob

import tqdm
import xarray as xr
import xesmf as xe

import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns

import cartopy.crs as ccrs
import cartopy.feature as cfeature
from cartopy.util import add_cyclic_point
from scipy.stats import linregress

In [2]:
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

%load_ext autoreload
%autoreload 2

In [46]:
ccsm='/home/ldap-server/draco/cleaned_data/final/CCSM_d18O_851_1850.nc'
cesm='/home/ldap-server/draco/cleaned_data/final/CESM_d18O_850_1850.nc'
echam='/home/ldap-server/draco/cleaned_data/final/ECHAM5_d18O_850_1849.nc'
giss='/home/ldap-server/draco/cleaned_data/final/GISS_d18O_850_1849.nc'
ihad='/home/ldap-server/draco/cleaned_data/final/iHADCM3_d18O_801_1952.nc'

In [6]:
print('ccsm:',xr.open_dataset(ccsm)['d18O'].shape)
print('cesm:',xr.open_dataset(cesm)['d18O'].shape)
print('echam:',xr.open_dataset(echam)['d18O'].shape)
print('giss:',xr.open_dataset(giss)['d18O'].shape)
print('ihad:',xr.open_dataset(ihad)['d18O'].shape)

ccsm: (12000, 94, 192)
cesm: (12000, 96, 144)
echam: (12000, 48, 96)
giss: (12000, 90, 144)
ihad: (13778, 73, 96)
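The printed shapes determine the two target grids: counting horizontal cells shows that CCSM is the finest grid (the upsampling target) and ECHAM the coarsest (the downsampling target). Note that CESM has more latitude rows than CCSM but fewer cells overall. The shapes below are copied from the output above.

```python
shapes = {  # (time, lat, lon) as printed above
    "ccsm": (12000, 94, 192),
    "cesm": (12000, 96, 144),
    "echam": (12000, 48, 96),
    "giss": (12000, 90, 144),
    "ihad": (13778, 73, 96),
}

# number of horizontal grid cells per model
n_cells = {m: s[1] * s[2] for m, s in shapes.items()}
finest = max(n_cells, key=n_cells.get)
coarsest = min(n_cells, key=n_cells.get)
print(finest, n_cells[finest], coarsest, n_cells[coarsest])  # ccsm 18048 echam 4608
```

iHadCM3 also differs in time length (13778 months for 801-1952), which matters later when aligning the models in time.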


In [12]:
paths={
'cesm':{'d18O':'/home/ldap-server/draco/cleaned_data/final/CESM_d18O_850_1850.nc',
'evap':'/home/ldap-server/draco/cleaned_data/final/CESM_evap_850_1850.nc',
'prec':'/home/ldap-server/draco/cleaned_data/final/CESM_prec_850_1850.nc',
'tsurf':'/home/ldap-server/draco/cleaned_data/final/CESM_tsurf_850_1850.nc',
'oro':'/home/ldap-server/draco/orographies/final/icesm_oro.nc'
       },
'ccsm':{'d18O':'/home/ldap-server/draco/cleaned_data/final/CCSM_d18O_851_1850.nc',
'evap':'/home/ldap-server/draco/cleaned_data/final/CCSM_evap_851_1850.nc',
'prec':'/home/ldap-server/draco/cleaned_data/final/CCSM_prec_851_1850.nc',
'tsurf':'/home/ldap-server/draco/cleaned_data/final/CCSM_tsurf_851_1850.nc',
'oro':'/home/ldap-server/draco/orographies/final/ccsm_oro.nc'
       },    
'echam':{'d18O':'/home/ldap-server/draco/cleaned_data/final/ECHAM5_d18O_850_1849.nc',
'evap':'/home/ldap-server/draco/cleaned_data/final/ECHAM5_evap_850_1849.nc',
'prec':'/home/ldap-server/draco/cleaned_data/final/ECHAM5_prec_850_1849.nc',
'tsurf':'/home/ldap-server/draco/cleaned_data/final/ECHAM5_tsurf_850_1849.nc',
'oro':'/home/ldap-server/draco/orographies/final/echam_oro.nc'
       },
'giss':{'d18O':'/home/ldap-server/draco/cleaned_data/final/GISS_d18O_850_1849.nc',
'evap':'/home/ldap-server/draco/cleaned_data/final/GISS_evap_850_1849.nc',
'prec':'/home/ldap-server/draco/cleaned_data/final/GISS_prec_850_1849.nc',
'tsurf':'/home/ldap-server/draco/cleaned_data/final/GISS_tsurf_850_1849.nc',
'slp': '/home/ldap-server/draco/cleaned_data/final/GISS_slp_850_1849.nc',
'oro':'/home/ldap-server/draco/orographies/final/giss_oro.nc'
       },
'ihad':{'d18O':'/home/ldap-server/draco/cleaned_data/final/iHADCM3_d18O_801_1952.nc',
'evap':'/home/ldap-server/draco/cleaned_data/final/iHADCM3_evap_801_1952.nc',
'prec':'/home/ldap-server/draco/cleaned_data/final/iHADCM3_prec_801_1952.nc',
'tsurf':'/home/ldap-server/draco/cleaned_data/final/iHADCM3_tsurf_801_1952.nc',
'slp':'/home/ldap-server/draco/cleaned_data/final/iHADCM3_slp_801_1952.nc',
'oro':'/home/ldap-server/draco/orographies/final/hadcm3_oro.nc'
       }
}   
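Since the regridding loops take minutes and stop at the first unreadable file, a quick existence check on the `paths` dictionary before starting can save a rerun. A minimal sketch; the helper name `missing_files` is hypothetical.

```python
import os


def missing_files(paths):
    """Return (model, variable, path) for every entry that is not a file on disk."""
    return [(model, var, p)
            for model, files in paths.items()
            for var, p in files.items()
            if not os.path.isfile(p)]
```

Calling `missing_files(paths)` should return an empty list before the regridding cells are run.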
    

In [3]:
for i,p in paths['ccsm'].items():
    print(i)
    print(xr.open_dataset(p))
    print('')

d18O
<xarray.Dataset>
Dimensions:      (time: 12000, lon: 192, lat: 94)
Coordinates:
  * time         (time) object 0851-01-31 00:00:00 ... 1850-12-31 00:00:00
  * lon          (lon) float64 0.0 1.875 3.75 5.625 ... 352.5 354.4 356.2 358.1
  * lat          (lat) float64 88.54 86.65 84.75 82.85 ... -84.75 -86.65 -88.54
Data variables:
    d18O         (time, lat, lon) float32 ...
    spatial_ref  int64 ...
Attributes:
    CDI:          Climate Data Interface version ?? (http://mpimet.mpg.de/cdi)
    Conventions:  COARDS
    history:      Mon Nov 15 18:42:31 2021: cdo setname,dO18 CCSM_d18O_851_18...
    calendar:     standard
    comments:     file created by grads using lats4d available from http://da...
    model:        geos/das
    center:       gsfc
    CDO:          Climate Data Operators version 1.9.3 (http://mpimet.mpg.de/...

evap
<xarray.Dataset>
Dimensions:  (time: 12000, lon: 192, lat: 94)
Coordinates:
  * time     (time) object 0851-01-31 00:00:00 ... 1850-12-31 00:00:00
  

In [None]:
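### METHOD
#upsampling (target grid: CCSM):
#for each model except ccsm:
#    create the regridder once, from that model's d18O grid to the CCSM grid
#    loop over all files of that model (each variable and the orography file)
#    regrid, save, and put the result into a dictionary such that it can be inspected

#downsampling: repeat the same thing with the ECHAM grid as target (skipping echam)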
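The file bookkeeping of this plan (which files to regrid for a given target model, and where each result goes) can be separated from the regridding itself. A sketch, assuming the `paths` layout used in this notebook; the helper name `plan_jobs` is hypothetical.

```python
import os


def plan_jobs(paths, target_model, new_folder, new_folder_oro):
    """List (model, var, src, dst) for every file to be regridded to target_model's grid.

    Files of the target model itself are skipped (they are copied instead).
    Orographies go to new_folder_oro, all other variables to new_folder,
    keeping the original file name.
    """
    jobs = []
    for model, files in paths.items():
        if model == target_model:
            continue
        for var, src in files.items():
            folder = new_folder_oro if var == "oro" else new_folder
            jobs.append((model, var, src, os.path.join(folder, os.path.basename(src))))
    return jobs
```

Upsampling then corresponds to `plan_jobs(paths, 'ccsm', ...)` with the `upsampled` folders and downsampling to `plan_jobs(paths, 'echam', ...)` with the `downsampled` folders.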


In [10]:
xr.open_dataset('/home/ldap-server/draco/cleaned_data/final/CCSM_d18O_851_1850.nc')['d18O'].attrs

{'long_name': 'd18O in precipitation',
 'comments': 'Unknown1 variable comment',
 'grid_name': 'grid-1',
 'level_description': 'Earth surface',
 'time_statistic': 'instantaneous',
 'units': 'per mil',
 'grid_mapping': 'spatial_ref'}

In [13]:
#upsampling!

ccsm=xr.open_dataset('/home/ldap-server/draco/cleaned_data/final/CCSM_d18O_851_1850.nc')

new_folder='/home/ldap-server/draco/cleaned_data/final/upsampled/'
new_folder_oro='/home/ldap-server/draco/orographies/final/upsampled/'

#the try/except is only there because I had to rerun this several times and
#overwriting the old files failed; print the error instead of silently hiding it

for model, p in tqdm.tqdm(paths.items()):
    if model!='ccsm':
        current_mod=xr.open_dataset(p['d18O'])
        #one regridder per model (ds_in, ds_out), built from the d18O grid and reused for all files of the model
        regridder=xe.Regridder(current_mod,ccsm,'bilinear',periodic=True)
        for var,path in p.items():
            file=xr.open_dataset(path)
            try:
                regridded=regridder(file)
                if var!='oro':
                    #useful to carry the original units
                    regridded[var].attrs=file[var].attrs
                    new_path=new_folder+os.path.split(path)[1]
                else:
                    new_path=new_folder_oro+os.path.split(path)[1]
                regridded.to_netcdf(new_path)
            except Exception as err:
                print('Check model: ',model, var, err)

#copy ccsm files into new folder directly


 20%|██        | 1/5 [01:12<04:48, 72.15s/it]

Check model:  cesm oro


100%|██████████| 5/5 [03:54<00:00, 46.86s/it]
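The last step of the cell ("copy ccsm files into new folder directly") is only a comment. A sketch of it with the standard library, assuming the same folder convention as the loop (orography separate, original file names kept); the helper name `copy_target_files` is hypothetical.

```python
import os
import shutil


def copy_target_files(files, new_folder, new_folder_oro):
    """Copy the target model's files (already on the target grid) into the output folders.

    files: {variable: path}, e.g. paths['ccsm']. shutil.copy2 overwrites an
    existing destination file and keeps its timestamps. Returns the destinations.
    """
    written = []
    for var, src in files.items():
        folder = new_folder_oro if var == "oro" else new_folder
        os.makedirs(folder, exist_ok=True)
        dst = os.path.join(folder, os.path.basename(src))
        shutil.copy2(src, dst)
        written.append(dst)
    return written
```

For upsampling this would be `copy_target_files(paths['ccsm'], new_folder, new_folder_oro)`; for downsampling the same with `paths['echam']` and the `downsampled` folders.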


In [22]:

#print dimensions
for p in glob.glob('/home/ldap-server/draco/cleaned_data/final/upsampled/*'):
    print(p)
    print(xr.open_dataset(p).dims)
    print('')

/home/ldap-server/draco/cleaned_data/final/upsampled/iHADCM3_slp_801_1952.nc
Frozen({'time': 13778, 'lat': 94, 'lon': 192})

/home/ldap-server/draco/cleaned_data/final/upsampled/iHADCM3_d18O_801_1952.nc
Frozen({'time': 13778, 'lat': 94, 'lon': 192})

/home/ldap-server/draco/cleaned_data/final/upsampled/CESM_prec_850_1850.nc
Frozen({'time': 12000, 'lat': 94, 'lon': 192})

/home/ldap-server/draco/cleaned_data/final/upsampled/ECHAM5_d18O_850_1849.nc
Frozen({'time': 12000, 'lat': 94, 'lon': 192})

/home/ldap-server/draco/cleaned_data/final/upsampled/CESM_evap_850_1850.nc
Frozen({'time': 12000, 'lat': 94, 'lon': 192})

/home/ldap-server/draco/cleaned_data/final/upsampled/CESM_d18O_850_1850.nc
Frozen({'time': 12000, 'lat': 94, 'lon': 192})

/home/ldap-server/draco/cleaned_data/final/upsampled/GISS_slp_850_1849.nc
Frozen({'time': 12000, 'lat': 94, 'lon': 192})

/home/ldap-server/draco/cleaned_data/final/upsampled/CCSM_d18O_851_1850.nc
Frozen({'time': 12000, 'lon': 192, 'lat': 94})

/home/ldap

In [19]:
#print dimensions
for p in glob.glob("/home/ldap-server/draco/orographies/final/upsampled/*"):
    print(p)
    print(xr.open_dataset(p).dims)
    print('')    

/home/ldap-server/draco/orographies/final/upsampled/giss_oro.nc
Frozen({'lat': 94, 'lon': 192})

/home/ldap-server/draco/orographies/final/upsampled/ccsm_oro.nc
Frozen({'lon': 192, 'lat': 94})

/home/ldap-server/draco/orographies/final/upsampled/echam_oro.nc
Frozen({'lat': 94, 'lon': 192})

/home/ldap-server/draco/orographies/final/upsampled/icesm_oro.nc
Frozen({'lat': 48, 'lon': 96})

/home/ldap-server/draco/orographies/final/upsampled/hadcm3_oro.nc
Frozen({'lat': 94, 'lon': 192})
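The listing above shows `icesm_oro.nc` with (48, 96) instead of (94, 192), consistent with the "Check model: cesm oro" message from the upsampling cell: the CESM orography in the `upsampled` folder is not on the CCSM grid. A small check makes such cases explicit instead of relying on reading the printout. The helper name `wrong_shape` is hypothetical; the dimensions are copied from the output above.

```python
def wrong_shape(dims_by_file, target):
    """Return the files whose grid sizes differ from the target, e.g. {'lat': 94, 'lon': 192}."""
    return sorted(f for f, dims in dims_by_file.items()
                  if any(dims.get(k) != n for k, n in target.items()))


# dims as printed for the upsampled orographies
oro_up = {
    "giss_oro.nc": {"lat": 94, "lon": 192},
    "ccsm_oro.nc": {"lon": 192, "lat": 94},
    "echam_oro.nc": {"lat": 94, "lon": 192},
    "icesm_oro.nc": {"lat": 48, "lon": 96},
    "hadcm3_oro.nc": {"lat": 94, "lon": 192},
}
print(wrong_shape(oro_up, {"lat": 94, "lon": 192}))  # ['icesm_oro.nc']
```

In the notebook, the input could be built directly from the files, e.g. `{p: dict(xr.open_dataset(p).sizes) for p in glob.glob(...)}`.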



In [14]:
#downsampling!
echam=xr.open_dataset('/home/ldap-server/draco/cleaned_data/final/ECHAM5_d18O_850_1849.nc')

new_folder='/home/ldap-server/draco/cleaned_data/final/downsampled/'
new_folder_oro='/home/ldap-server/draco/orographies/final/downsampled/'

for model, p in tqdm.tqdm(paths.items()):
    if model!='echam':
        current_mod=xr.open_dataset(p['d18O'])
        #one regridder per model (ds_in, ds_out), reused for all variables and the orography
        regridder=xe.Regridder(current_mod,echam,'bilinear',periodic=True)
        for var,path in p.items():
            file=xr.open_dataset(path)
            regridded=regridder(file)
            if var!='oro':
                #carry over the original attributes (e.g. units)
                regridded[var].attrs=file[var].attrs
                new_path=new_folder+os.path.split(path)[1]
            else:
                new_path=new_folder_oro+os.path.split(path)[1]
            regridded.to_netcdf(new_path)
#copy echam files into new folder directly

100%|██████████| 5/5 [00:59<00:00, 11.92s/it]
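Bilinear interpolation is not conservative, so besides the shapes it is worth checking that regridding leaves large-scale statistics roughly unchanged, for instance the area-weighted global mean of the time-mean d18O before and after. A minimal sketch, assuming regular lat-lon grids where cos(latitude) is an adequate area weight; the helper name `global_mean` is hypothetical.

```python
import numpy as np


def global_mean(field, lat):
    """Area-weighted mean over the last two axes (lat, lon) using cos(lat) weights."""
    field = np.asarray(field, dtype=float)
    w = np.cos(np.deg2rad(np.asarray(lat, dtype=float)))
    w2d = np.broadcast_to(w[:, None], field.shape[-2:])
    return (field * w2d).sum(axis=(-2, -1)) / w2d.sum()
```

For example, `global_mean(file['d18O'].mean('time').values, file['lat'].values)` can be compared with the same quantity computed from the regridded dataset; leading axes such as time are kept, so full time series can be compared as well.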


In [24]:
#print dimensions
for p in glob.glob('/home/ldap-server/draco/cleaned_data/final/downsampled/*'):
    print(p)
    print(xr.open_dataset(p).dims)
    print('')

/home/ldap-server/draco/cleaned_data/final/downsampled/iHADCM3_slp_801_1952.nc
Frozen({'time': 13778, 'lat': 48, 'lon': 96})

/home/ldap-server/draco/cleaned_data/final/downsampled/iHADCM3_d18O_801_1952.nc
Frozen({'time': 13778, 'lat': 48, 'lon': 96})

/home/ldap-server/draco/cleaned_data/final/downsampled/CESM_prec_850_1850.nc
Frozen({'time': 12000, 'lat': 48, 'lon': 96})

/home/ldap-server/draco/cleaned_data/final/downsampled/ECHAM5_d18O_850_1849.nc
Frozen({'time': 12000, 'lon': 96, 'lat': 48})

/home/ldap-server/draco/cleaned_data/final/downsampled/CESM_evap_850_1850.nc
Frozen({'time': 12000, 'lat': 48, 'lon': 96})

/home/ldap-server/draco/cleaned_data/final/downsampled/CESM_d18O_850_1850.nc
Frozen({'time': 12000, 'lat': 48, 'lon': 96})

/home/ldap-server/draco/cleaned_data/final/downsampled/GISS_slp_850_1849.nc
Frozen({'time': 12000, 'lat': 48, 'lon': 96})

/home/ldap-server/draco/cleaned_data/final/downsampled/CCSM_d18O_851_1850.nc
Frozen({'time': 12000, 'lat': 48, 'lon': 96})

/h

In [21]:
#print dimensions
for p in glob.glob("/home/ldap-server/draco/orographies/final/downsampled/*"):
    print(p)
    print(xr.open_dataset(p).dims)
    print('')    

/home/ldap-server/draco/orographies/final/downsampled/giss_oro.nc
Frozen({'lat': 48, 'lon': 96})

/home/ldap-server/draco/orographies/final/downsampled/ccsm_oro.nc
Frozen({'lat': 48, 'lon': 96})

/home/ldap-server/draco/orographies/final/downsampled/echam_oro.nc
Frozen({'lat': 48, 'lon': 96})

/home/ldap-server/draco/orographies/final/downsampled/icesm_oro.nc
Frozen({'lat': 48, 'lon': 96})

/home/ldap-server/draco/orographies/final/downsampled/hadcm3_oro.nc
Frozen({'lat': 48, 'lon': 96})

