# Example with ERA5 high-resolution (~0.25deg) monthly means


# Table of Contents
<ul>
<li><a href="#introduction">1. Introduction</a></li>
<li><a href="#data_wrangling">2. Data Wrangling</a></li>
<li><a href="#exploratory">3. Exploratory Data Analysis</a></li>
<li><a href="#conclusion">4. Conclusion</a></li>
<li><a href="#references">5. References</a></li>
</ul>

# 1. Introduction <a id='introduction'></a>
Cloud feedbacks are a major contributor to the spread of climate sensitivity in global climate models (GCMs) [Zelinka et al. (2020)](https://doi-org.ezproxy.uio.no/10.1029/2019GL085782). Among the most poorly understood cloud feedbacks is the one associated with the cloud phase, which is expected to be modified with climate change [Bjordal et al. (2020)](https://doi-org.ezproxy.uio.no/10.1038/s41561-020-00649-1). Cloud phase bias, in addition, has significant implications for the simulation of radiative properties and glacier and ice sheet mass balances in climate models. 

In this context, this work aims to expand our knowledge of how the representation of the cloud phase affects snow formation in GCMs. A better understanding of this aspect is necessary to develop climate models further and improve future climate predictions.

* Load ERA5 data
* Regrid the ERA5 variables to the same horizontal resolution as NorESM2-MM with [`xesmf`](https://xesmf.readthedocs.io/en/latest/)
* Calculate and plot the seasonal mean of the variable

# 2. Data Wrangling <a id='data_wrangling'></a>

This study compares surface snowfall, ice, and liquid water content from the Coupled Model Intercomparison Project Phase 6 ([CMIP6](https://esgf-node.llnl.gov/projects/cmip6/)) climate models (accessed through [Pangeo](https://pangeo.io/)) to the European Centre for Medium-Range Weather Forecasts Reanalysis 5 ([ERA5](https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era5)) data from **1985 to 2014**. We conduct statistical analysis at the annual and seasonal timescales to determine the biases in cloud phase and precipitation (liquid and solid) in the CMIP6 models and the potential connection between them. The CMIP6 data analysis can be found in the [Jupyter Notebook for CMIP6](../cmip/CMIP6_hr_1985-2014.ipynb).

- Time period: 1985 to 2014
- Horizontal resolution: ~0.25deg
- Time resolution: monthly atmospheric data
- Variables:
  
| Short name | Long name                            | Units           | Level type      |
| ---------- |:------------------------------------:| ---------------:| ---------------:|
| sf         | snowfall                             | [m of water eq] | surface         |
| msr        | mean_snowfall_rate                   | [kg m-2 s-1]    | surface         |
| cswc       | specific_snow_water_content          | [kg kg-1]       | pressure levels |
| clwc       | specific_cloud_liquid_water_content  | [kg kg-1]       | pressure levels |
| clic       | specific_cloud_ice_water_content     | [kg kg-1]       | pressure levels |
| t          | temperature                          | [K]             | pressure levels |
| 2t         | 2 metre temperature                  | [K]             | surface         |
| tclw       | Total column cloud liquid water      | [kg m-2]        | single          |
| tciw       | Total column cloud ice water         | [kg m-2]        | single          |
| tp         | Total precipitation                  | [m]             | surface         |


## Import python packages
- `Python` environment requirements: file [globalsnow.yml](../globalsnow.yml) 
- load `python` packages from [imports.py](../utils/imports.py)
- load `functions` from [functions.py](../utils/functions.py)

In [18]:
# suppress warnings
import warnings
warnings.filterwarnings('ignore') # don't output warnings

# import packages
import sys
sys.path.append('/uio/kant/geo-metos-u1/franzihe/Documents/Python/globalsnow/eosc-nordic-climate-demonstrator/work/utils')

from imports import(xr, intake, ccrs, cy, plt, glob, cm, fct)
xr.set_options(display_style='html')

<xarray.core.options.set_options at 0x7f060b6a1100>

In [19]:
# reload imports
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Open ERA5 variables
Get the data required for the analysis. Beforehand, we downloaded the monthly averaged data on single levels and pressure levels via the Climate Data Store (CDS) infrastructure. The GitHub repository [Download ERA5](https://github.com/franzihe/download_ERA5) gives examples of how to download the data from the CDS. For the comparison between CMIP6 models and ERA5 values we used the Jupyter Notebooks [download_Amon_single_level](https://github.com/franzihe/download_ERA5/blob/main/download_Amon_single_level.ipynb) and [download_Amon_pressure_level](https://github.com/franzihe/download_ERA5/blob/main/download_Amon_pressure_level.ipynb). Both download the monthly means for the variables mentioned above between 1985 and 2014.

> **_NOTE:_** Downloading from the CDS requires a CDS user account. If you do not have one, please create an account [here](https://cds.climate.copernicus.eu/user/register).


The ERA5 0.25deg data is located in the folder `input/ERA5/monthly_means/0.25deg`.



In [20]:
input_data = '/scratch/franzihe/input'
output_data = '/scratch/franzihe/output'
era_in = '{}/ERA5/monthly_means/0.25deg'.format(input_data)
era_out = '{}/ERA5/monthly_means/1deg'.format(output_data)

In [21]:
variable_id=[
            '2t',
            'clic',
            'clwc',
            'cswc',
            'msr',
            'sf',
            't', 
            'tciw',
            'tclw',
            'tp'
             ]

## Find liquid only, ice only, and mixed-phase clouds

In [48]:
# Years to analyse (also defined in the regridding cell further down)
year_range = range(1985, 2014 + 1)

# Input data from ERA5 with a resolution of 0.25x0.25deg
era_file_in = glob('{}/*_Amon_ERA5_*12.nc'.format(era_in))
ds_era = xr.open_mfdataset(era_file_in)
ds_era = ds_era.sel(time=ds_era.time.dt.year.isin(year_range)).squeeze()


In [50]:
ds_era['tp'].attrs

{'units': 'm', 'long_name': 'Total precipitation'}

In [51]:
# The hydrological parameters have effective units of "m of water per day", so multiply by 1000 to convert to kg m-2 day-1 (equivalently mm day-1).
ds_era['tp'] = ds_era['tp']*1000
ds_era['tp'].attrs = {'units': 'mm day-1', 'long_name': 'Total precipitation'}

In [52]:
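# True where precipitation reaches the threshold (named precip_mask to avoid shadowing the builtin `filter`)
precip_mask = ds_era['tp'] >= 0.01  # pidx

In [76]:
ds_era = ds_era.where(precip_mask)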
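
The masking step above can be tried on a small synthetic dataset. This is a minimal sketch, assuming precipitation is already in mm day-1; the variable names `ds`, `precip_mask`, and `ds_wet` and all numbers are made up for illustration and are not ERA5 values.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for ds_era: two variables on a tiny (time, latitude) grid.
tp = xr.DataArray(
    np.array([[0.0, 0.5], [0.02, 0.005]]),  # precipitation, assumed mm day-1
    dims=("time", "latitude"),
)
clic = xr.DataArray(
    np.array([[1e-6, 2e-6], [3e-6, 4e-6]]),  # cloud ice, kg kg-1
    dims=("time", "latitude"),
)
ds = xr.Dataset({"tp": tp, "clic": clic})

# Boolean mask: True where precipitation reaches the 0.01 threshold.
precip_mask = ds["tp"] >= 0.01

# where() keeps values where the mask is True and sets the rest to NaN,
# for every variable in the dataset.
ds_wet = ds.where(precip_mask)

n_wet = int(precip_mask.sum())
print(n_wet)
print(ds_wet["clic"].values)
```

Because the mask is applied to the whole dataset, the cloud variables are also set to NaN wherever precipitation is below the threshold, so later sums and counts only see precipitating samples.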

In [None]:
for lat in ds_era.latitude.values:
    for lon in ds_era.longitude.values:
        # ds_era['clic'].sel
        pass  # per-grid-point selection not implemented yet

In [None]:
lat = None  # placeholder: value not yet specified

In [None]:
ds_era['clic'].sum(keep_attrs=True)

In [None]:
# loop through each latitude

# loop through each longitude

# find time, where precipitation is >= 0.01 mm
ds_era

# find precipitation from liquid only clouds 
    # LWC amount of liquid-only clouds
    # number of liquid-only events
    
# remove water only from data

# find precipitation from ice only clouds
    # IWC amount of ice-only clouds
    # number of ice-only events
    
# determine homogeneous or heterogeneous freezing for ice-only clouds
    # get temperature where only IWC exists
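
The outline above could be sketched without explicit latitude/longitude loops by classifying every sample at once. This is only an illustration under stated assumptions: the function `classify_cloud_phase`, the zero threshold, the integer phase codes, and the 235 K limit used to flag candidates for homogeneous freezing are all choices made here, not values taken from the notebook.

```python
import numpy as np

def classify_cloud_phase(lwc, iwc, threshold=0.0):
    """Label each sample: 0 = clear, 1 = liquid only, 2 = ice only, 3 = mixed phase."""
    lwc = np.asarray(lwc, dtype=float)
    iwc = np.asarray(iwc, dtype=float)
    has_liquid = lwc > threshold
    has_ice = iwc > threshold
    phase = np.zeros(lwc.shape, dtype=int)
    phase[has_liquid & ~has_ice] = 1
    phase[~has_liquid & has_ice] = 2
    phase[has_liquid & has_ice] = 3
    return phase

# Made-up samples of liquid water content, ice water content (kg kg-1) and temperature (K).
lwc = np.array([0.0, 2e-5, 0.0, 1e-5, 0.0])
iwc = np.array([0.0, 0.0, 3e-5, 1e-5, 4e-5])
temperature = np.array([280.0, 275.0, 250.0, 260.0, 225.0])

phase = classify_cloud_phase(lwc, iwc)
n_liquid_only = int((phase == 1).sum())
n_ice_only = int((phase == 2).sum())

# Assumption: ice-only samples colder than about 235 K are flagged as
# candidates for homogeneous freezing; warmer ones as heterogeneous.
HOMOGENEOUS_LIMIT_K = 235.0
homogeneous = (phase == 2) & (temperature < HOMOGENEOUS_LIMIT_K)

print(phase, n_liquid_only, n_ice_only, homogeneous)
```

The same boolean logic works on `xarray` objects, so it could be applied to `clwc`, `clic`, and `t` after the precipitation mask, but the thresholds would need to be chosen for the actual data.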

## Regrid ERA5 data to common NorESM2-MM grid <a id='regrid_hz'></a>

We want to conduct statistical analysis at the annual and seasonal timescales to determine the biases in cloud phase and precipitation (liquid and solid) for the CMIP6 models in comparison to ERA5. 

The ERA5 data has a nominal horizontal resolution of 0.25$^{\circ}$ and has to be regridded to the same horizontal resolution as NorESM2-MM. We therefore use the Python package `xesmf`; see its notes on [decreasing resolution](https://xesmf.readthedocs.io/en/latest/notebooks/Compare_algorithms.html#Decreasing-resolution) and on [limitations and warnings](https://xesmf.readthedocs.io/en/latest/notebooks/Masking.html?highlight=conservative#Limitations-and-warnings).

$\rightarrow$ Define NorESM2-MM as the reference grid `ds_out`.

Save each regridded variable locally as a `netcdf` file covering 1985 to 2014.
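
To give a feeling for what decreasing the resolution does, the sketch below coarsens a regular latitude/longitude field by averaging blocks of cells, weighted by the cosine of latitude. It is not `xesmf` and not necessarily what `fct.regrid_data` does; it is a plain NumPy stand-in, and the function name `coarsen_area_weighted` and the test grid are assumptions made for this example.

```python
import numpy as np

def coarsen_area_weighted(field, lat, factor):
    """Average factor x factor blocks of a (lat, lon) field, weighted by cos(latitude)."""
    nlat, nlon = field.shape
    if nlat % factor or nlon % factor:
        raise ValueError("grid size must be divisible by the coarsening factor")
    # cos(lat) is proportional to the cell area on a regular lat/lon grid
    weights = np.cos(np.deg2rad(lat))[:, np.newaxis] * np.ones((1, nlon))
    shape = (nlat // factor, factor, nlon // factor, factor)
    weighted_sum = (field * weights).reshape(shape).sum(axis=(1, 3))
    weight_sum = weights.reshape(shape).sum(axis=(1, 3))
    return weighted_sum / weight_sum

# Cell centres of a 0.25 degree grid covering 2 x 2 degrees (8 x 8 cells).
lat = np.arange(0.125, 2.0, 0.25)
lon = np.arange(0.125, 2.0, 0.25)
field = np.full((lat.size, lon.size), 3.0)

# A factor of 4 turns 0.25 degree cells into 1 degree cells.
coarse = coarsen_area_weighted(field, lat, factor=4)
print(coarse.shape)
```

An area-weighted average of this kind preserves the mean of the field, which is the reason conservative methods are usually preferred over interpolation when going from a fine to a coarse grid for quantities such as precipitation.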

In [27]:
starty = 1985; endy = 2014
year_range = range(starty, endy+1)

# Read in the output grid from NorESM
cmip_file = '/scratch/franzihe/input/cmip6_hist/1deg/grid_NorESM2-MM.nc'
ds_out = xr.open_dataset(cmip_file)

counter = 0
ds_all = xr.Dataset()  # collects the regridded variables merged at the end of each iteration
for var_id in variable_id:
    # select where data should be saved
    filename = '{}_Amon_1deg_{}01_{}12.nc'.format(var_id, starty, endy)
    era_file_out = era_out + '/Amon/' + filename
    files = glob(era_file_out)
            
    
    
    # Input data from ERA5 with a resolution of 0.25x0.25deg to be regridded
    era_file_in = glob('{}/{}_Amon_ERA5_*12.nc'.format(era_in, var_id,))        
    ds_in = xr.open_mfdataset(era_file_in)
    ds_in = ds_in.sel(time = ds_in.time.dt.year.isin(year_range)).squeeze()
            
    # Regrid data
    ds_in_regrid = fct.regrid_data(ds_in, ds_out)
                
    # Shift the longitude from 0-->360 to -180-->180 and sort by longitude and time
    ds_in_regrid = ds_in_regrid.assign_coords(lon=(((ds_in_regrid.lon + 180) % 360) - 180)).sortby('lon').sortby('time')
    
    if var_id == '2t':
        ds_in_regrid = ds_in_regrid.rename_vars({'t2m':var_id}, )
    if var_id == 'clic':
        ds_in_regrid = ds_in_regrid.rename_vars({'ciwc':var_id})
                
    if era_file_out in files:
        print('{} is downloaded'.format(era_file_out))
        counter += 1
        print('Have regridded in total: {:} files'.format(str(counter))) 
    else:           
        # Save to netcdf file
        ds_in_regrid.to_netcdf(era_file_out)
        ds_in.close(); ds_out.close()
        print('file written: {}'.format(era_file_out))
        
    # merge all variables
    ds_all = xr.merge([ds_all, ds_in_regrid[var_id]])

t2m True
/scratch/franzihe/output/ERA5/monthly_means/1deg/Amon/2t_Amon_1deg_198501_201412.nc is downloaded
Have regridded in total: 1 files
ciwc True
/scratch/franzihe/output/ERA5/monthly_means/1deg/Amon/clic_Amon_1deg_198501_201412.nc is downloaded
Have regridded in total: 2 files
clwc True
/scratch/franzihe/output/ERA5/monthly_means/1deg/Amon/clwc_Amon_1deg_198501_201412.nc is downloaded
Have regridded in total: 3 files
cswc True
/scratch/franzihe/output/ERA5/monthly_means/1deg/Amon/cswc_Amon_1deg_198501_201412.nc is downloaded
Have regridded in total: 4 files
msr True
/scratch/franzihe/output/ERA5/monthly_means/1deg/Amon/msr_Amon_1deg_198501_201412.nc is downloaded
Have regridded in total: 5 files
sf True
/scratch/franzihe/output/ERA5/monthly_means/1deg/Amon/sf_Amon_1deg_198501_201412.nc is downloaded
Have regridded in total: 6 files
t True
/scratch/franzihe/output/ERA5/monthly_means/1deg/Amon/t_Amon_1deg_198501_201412.nc is downloaded
Have regridded in total: 7 files
tciw True
/scr
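
The longitude shift used in the regridding loop (from 0 to 360 towards -180 to 180, followed by sorting) can be checked on a few values. A minimal NumPy sketch with made-up longitudes and data; the array names are chosen for this example only.

```python
import numpy as np

lon = np.array([0.0, 90.0, 180.0, 270.0, 359.0])
values = np.array([10, 20, 30, 40, 50])  # one data value per longitude

# Map longitudes from [0, 360) to [-180, 180).
lon_shifted = ((lon + 180.0) % 360.0) - 180.0

# Sorting the coordinate must reorder the data in the same way,
# which is what sortby('lon') does for an xarray object.
order = np.argsort(lon_shifted)
lon_sorted = lon_shifted[order]
values_sorted = values[order]

print(lon_sorted)
print(values_sorted)
```

Note that 180 degrees maps to -180 with this formula, so the western edge of the shifted grid is -180 rather than +180.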

## Assign attributes matching the CMIP6 data

In [33]:
for var_id in variable_id:
    if var_id == 'sf':
        ds_all[var_id] = ds_all[var_id]*1000
        ds_all[var_id].attrs = {'units': 'mm day-1', 'long_name': 'Snowfall',}
        
    if var_id == 'tciw':
        ds_all[var_id] = ds_all[var_id]*1000
        ds_all[var_id].attrs = {'units': 'g m-2', 'long_name': 'Total column cloud ice water'}
        
    if var_id == 'tclw':
        ds_all[var_id] = ds_all[var_id]*1000
        ds_all[var_id].attrs = {'units': 'g m-2', 'long_name': 'Total column cloud liquid water'}
        
    if var_id == 'tp':
        # The hydrological parameters have effective units of "m of water per day", so multiply by 1000 to convert to kg m-2 day-1 (equivalently mm day-1).
        ds_all[var_id] = ds_all[var_id]*1000
        ds_all[var_id].attrs = {'units': 'mm day-1', 'long_name': 'Total precipitation'}
    # if var_id == 'clwc':
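
The factors of 1000 above follow from two facts: 1 m equals 1000 mm, and a 1 mm layer of water over 1 m2 has a mass of about 1 kg (assuming a water density of 1000 kg m-3). The sketch below writes the conversions as small helper functions; the function names are hypothetical and the input numbers are made up.

```python
SECONDS_PER_DAY = 86400.0

def m_per_day_to_mm_per_day(value):
    """Convert metres of water per day to mm day-1 (numerically equal to kg m-2 day-1)."""
    return value * 1000.0

def kg_m2_to_g_m2(value):
    """Convert a column amount from kg m-2 to g m-2."""
    return value * 1000.0

def mm_per_day_to_kg_m2_s(value):
    """Convert mm day-1 to kg m-2 s-1, the units used for a rate such as msr."""
    return value / SECONDS_PER_DAY

tp_mm = m_per_day_to_mm_per_day(0.0025)  # 0.0025 m/day of precipitation
iwp_g = kg_m2_to_g_m2(0.05)              # 0.05 kg m-2 of column cloud ice
rate = mm_per_day_to_kg_m2_s(86.4)       # 86.4 mm/day expressed as a rate per second

print(tp_mm, iwp_g, rate)
```

The last helper gives a possible consistency check: snowfall converted to mm day-1 and divided by 86400 should be comparable to the mean snowfall rate in kg m-2 s-1.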

# Create seasonal mean/spread of all regridded ERA5 variables

...and plot the seasonal mean of each variable

In [None]:
for var_id in variable_id:
    ds_all[var_id+'_season_mean'] = ds_all[var_id].groupby('time.season').mean('time', keep_attrs = True)

    ds_all[var_id+'_season_std'] = ds_all[var_id].groupby('time.season').std('time', keep_attrs = True)
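
The seasonal grouping can be verified on a synthetic monthly series where the answer is known in advance. In this sketch each monthly value is simply the month number, so the DJF mean must be (12 + 1 + 2) / 3 = 5 and the JJA mean must be 7. The series and the name `demo` are made up for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of monthly time stamps; the data value is the month number.
time = pd.date_range("2000-01-01", periods=24, freq="MS")
da = xr.DataArray(
    time.month.values.astype(float),
    coords={"time": time},
    dims="time",
    name="demo",
    attrs={"units": "1"},
)

# Group all months by meteorological season (DJF, MAM, JJA, SON)
# and reduce over time, keeping the variable attributes.
season_mean = da.groupby("time.season").mean("time", keep_attrs=True)
season_std = da.groupby("time.season").std("time", keep_attrs=True)

djf_mean = season_mean.sel(season="DJF").item()
jja_mean = season_mean.sel(season="JJA").item()
print(djf_mean, jja_mean)
```

Two properties of this approach are worth keeping in mind: every month gets the same weight regardless of its number of days, and the standard deviation is taken over all months of a season, so it includes the variation within a season as well as between years.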

In [None]:
for var_id in variable_id:
    if var_id == 'sf':
        label=r'Snowfall (mm$\,$day$^{-1}$)'
        vmin = 0
        vmax = 2.5
        levels = 25
        add_colorbar=False
        vmin_std = vmin
        vmax_std= 0.6
    if var_id == 'tp':  
        label=r'Total precipitation (mm$\,$day$^{-1}$)'
        vmin = 0
        vmax=9
        levels = 90
        add_colorbar=False
        vmin_std =vmin
        vmax_std = 2.4  
    if var_id == 'tciw':
        label=r'Ice Water Path (g$\,$m$^{-2}$)'
        vmin = 0
        vmax=100
        levels = 25
        add_colorbar = False
        vmin_std =vmin
        vmax_std = 20
    if var_id == 'tclw':
        label=r'Liquid Water Path (g$\,$m$^{-2}$)'
        vmin = 0
        vmax=100
        levels = 25
        add_colorbar = False
        vmin_std =vmin
        vmax_std = 20
    if var_id == '2t':
        label='2-m temperature (K)'
        vmin = 246
        vmax=300
        levels = 40
        add_colorbar = False
        vmin_std = 0
        vmax_std=6

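    # Skip variables without plot settings above; otherwise the settings of the previous variable would be reused
    if var_id not in ('sf', 'tp', 'tciw', 'tclw', '2t'):
        continue

    # Plot seasonal mean
    fig, axs, im = fct.plt_spatial_seasonal_mean(ds_all[var_id+'_season_mean'], vmin, vmax, levels, add_colorbar=False, title='ERA5 - high resolution (1985 - 2014)')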

    fig.subplots_adjust(right=0.8)
    cbar_ax = fig.add_axes([1, 0.15, 0.025, 0.7])
    cb = fig.colorbar(im, cax=cbar_ax, orientation="vertical", fraction=0.046, pad=0.04)
    cb.set_label(label='MEAN - {}'.format(label), weight='bold')

    plt.tight_layout()

    # save figure to png
    figdir = '/uio/kant/geo-metos-u1/franzihe/Documents/Figures/ERA5/'
    figname = '{}_season_1deg_{}_{}.png'.format(var_id, starty, endy)
    plt.savefig(figdir + figname, format = 'png', bbox_inches = 'tight', transparent = False)

    # Plot seasonal mean and std
    fig, axs, im = fct.plt_spatial_seasonal_mean(ds_all[var_id+'_season_mean'], vmin, vmax, levels, add_colorbar=False, title='ERA5 - high resolution (1985 - 2014)')

    fig.subplots_adjust(right=0.8)
    cbar_ax = fig.add_axes([1, 0.15, 0.025, 0.7])
    cb = fig.colorbar(im, cax=cbar_ax, orientation="vertical", fraction=0.046, pad=0.04)
    cb.set_label(label='MEAN - {}'.format(label), weight='bold')

    for ax, i in zip(axs, ds_all[var_id+'_season_std'].season):
        sm = ds_all[var_id+'_season_std'].sel(season=i).plot.contour(ax=ax, transform=ccrs.PlateCarree(), 
                                                                        robust=True,
                                                                        vmin = vmin_std, vmax = vmax_std,
                                                                        levels = 6,
                                                                        cmap=cm.lajolla,
                                                                        add_colorbar=False)
        
    cbar_ax = fig.add_axes([1.10, 0.15, 0.025, 0.7])
    sb = fig.colorbar(sm, cax=cbar_ax, orientation="vertical", fraction=0.046, pad=0.04)
    sb.set_label(label='STD - {}'.format(label), weight='bold')


    plt.tight_layout()


In [None]:
# save to netcdf
filename = 'all_1deg_{}01_{}12.nc'.format(starty, endy)
nc_out = era_out + '/' + filename
files = glob(nc_out)


# Save to netcdf file unless it already exists
if nc_out in files:
    print('{} already exists'.format(nc_out))
else:
    ds_all.to_netcdf(nc_out)
    print('file written: {}'.format(nc_out))
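
The "write only if the file is missing" pattern used here and in the regridding loop could be wrapped in a small helper. This is a sketch with a hypothetical function name, `write_if_missing`, demonstrated on a temporary text file instead of a real NetCDF file.

```python
import os
import tempfile

def write_if_missing(path, write_func):
    """Call write_func(path) only when path does not exist; return True if it was written."""
    if os.path.exists(path):
        return False
    os.makedirs(os.path.dirname(path), exist_ok=True)
    write_func(path)
    return True

def _write_demo(path):
    # Stand-in for a real writer such as a dataset's to_netcdf method.
    with open(path, "w") as f:
        f.write("demo")

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "ERA5", "all_1deg_198501_201412.nc")
    first = write_if_missing(target, _write_demo)   # file is created
    second = write_if_missing(target, _write_demo)  # file exists, nothing is written

print(first, second)
```

In the notebook this could be called as `write_if_missing(nc_out, ds_all.to_netcdf)`. Compared with testing `nc_out in glob(nc_out)`, `os.path.exists` states the intent directly and the helper also creates the output directory when needed.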


Find liquid-only, ice-only, and mixed-phase clouds. Plot a histogram of the observed snowfall amounts.