# Copyright Netherlands eScience Center <br>
** Function     : Packing netCDF for weekly sea ice concentration fields from NOAA/NSIDC (passive microwave)** <br>
** Author       : Yang Liu ** <br>
** First Built  : 2020.07.23 ** <br>
** Last Update  : 2020.07.23 ** <br>
Description     : This notebook aims to pack the sea ice concentration fields from NOAA/NSIDC.<br>
Return Values   : netCDF4 <br>
Caveat          : The data have been further processed with algorithms from the NASA Goddard Space Flight Center (GSFC). The dataset consists of multiple variables, but we only use the Merged GSFC NASA Team/Bootstrap daily sea ice concentrations from 1978 through the most recent processing (variable name: "goddard_merged_seaice_conc"). It combines the results of two NASA algorithms. Details about these two algorithms are available at:<br>
https://nsidc.org/support/faq/nasa-team-vs-bootstrap-algorithm<br>

!!! Note that the data before 1987 are stored every other day, and daily afterwards. We process the two periods separately, and this script only works with the regularly structured data after 1987. For missing dates, we simply use the historical mean.<br>
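
The gap filling is not part of this script. As a minimal sketch of what filling with a historical mean could look like, the example below assumes (this is an assumption, not something stated here) that "historical mean" means the average of the same calendar day over all available years, and that missing days are marked with NaN. The function name `fill_with_historical_mean` and the toy array are hypothetical.

```python
import numpy as np

def fill_with_historical_mean(sic):
    """Fill NaN entries of sic (year, day, y, x) with the mean over the year axis."""
    # assumption: the historical mean is the multi-year mean of the same calendar day
    climatology = np.nanmean(sic, axis=0, keepdims=True)
    return np.where(np.isnan(sic), climatology, sic)

# toy example: three years, one day, one grid point; the third year is missing
sic = np.array([0.2, 0.4, np.nan]).reshape(3, 1, 1, 1)
filled = fill_with_historical_mean(sic)
print(filled.ravel())
```

A day that is missing in every year would stay NaN in this sketch and would need separate handling.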

More information (incl. description of variables) is available through:<br>
https://nsidc.org/data/g02202<br>

The projection center coordinates (reference for ygrid and xgrid) are included in the netCDF file:<br>
Latitude: North pole (90 deg)<br>
Longitude: -45 deg w.r.t. 0 deg (Greenwich)<br>

The coordinate values (latitude & longitude) are the exact values.<br>

We also interpolate the fields onto the ERA-Interim grid with nearest-neighbour interpolation, using the iris module ("iris.analysis.UnstructuredNearest"). An alternative is the scipy class "scipy.interpolate.NearestNDInterpolator".<br>
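
As a rough illustration of the scipy alternative, the sketch below regrids a field defined on 2D latitude/longitude arrays onto a regular grid with `scipy.interpolate.NearestNDInterpolator`. The function name `nearest_regrid` and the synthetic arrays are assumptions for illustration and are not part of this notebook. The neighbour search is done in plain latitude/longitude space, which ignores the convergence of meridians and the wrap at ±180 deg, so it is only an approximation near the pole.

```python
import numpy as np
from scipy.interpolate import NearestNDInterpolator

def nearest_regrid(field, lat2d, lon2d, lat_new, lon_new):
    """Nearest-neighbour regridding from 2D lat/lon arrays to a regular grid."""
    # flatten the source grid into a list of (lat, lon) points
    points = np.column_stack((lat2d.ravel(), lon2d.ravel()))
    interpolator = NearestNDInterpolator(points, field.ravel())
    # build the regular target grid and evaluate the interpolator on it
    lat_mesh, lon_mesh = np.meshgrid(lat_new, lon_new, indexing='ij')
    return interpolator(lat_mesh, lon_mesh)

# synthetic source grid (hypothetical values, not NSIDC data)
lat2d, lon2d = np.meshgrid(np.linspace(60.0, 80.0, 5),
                           np.linspace(-20.0, 20.0, 9), indexing='ij')
field = lat2d + 0.01 * lon2d
lat_new = np.array([60.0, 70.0, 80.0])
lon_new = np.array([-20.0, 0.0, 20.0])
regridded = nearest_regrid(field, lat2d, lon2d, lat_new, lon_new)
print(regridded.shape)
```

For data close to the pole, converting latitude/longitude to 3D Cartesian coordinates before the neighbour search would likely give more faithful results.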

Reference
Meier, W. N., F. Fetterer, M. Savoie, S. Mallory, R. Duerr, and J. Stroeve. 2017. NOAA/NSIDC Climate Data Record of Passive Microwave Sea Ice Concentration, Version 3. Boulder, Colorado USA. NSIDC: National Snow and Ice Data Center. doi: https://doi.org/10.7265/N59P2ZTG.<br>
Peng, G., W. N. Meier, D. Scott, and M. Savoie. 2013. A long-term and reproducible passive microwave sea ice concentration data record for climate studies and monitoring, Earth Syst. Sci. Data. 5. 311-318. https://doi.org/10.5194/essd-5-311-2013<br>

In [1]:
import numpy as np
import scipy
from netCDF4 import Dataset
import os
import glob
import iris
import cartopy
import cartopy.crs as ccrs

In [9]:
################################   Input zone  ######################################
# specify starting and ending time
start_year = 1989
end_year = 2018
# specify data path
# SICpm fields
datapath = '/home/ESLT0068/WorkFlow/Core_Database_AMET_OMET_reanalysis/SIC_passive_microwave_NASA/daily'
datapath_coordinate = '/home/ESLT0068/WorkFlow/Core_Database_DeepLearn/ERA-Interim'
# sample
benchmark_key = Dataset(os.path.join(datapath,"2004","seaice_conc_daily_nh_f13_20040112_v03r01.nc"))
coordinate_key = Dataset(os.path.join(datapath_coordinate,"sic_weekly_erai_1979_2017.nc"))
# specify output path for the netCDF file
output_path = '/home/ESLT0068/WorkFlow/Core_Database_AMET_OMET_reanalysis/SIC_passive_microwave_NASA'
####################################################################################

In [3]:
#########################   Basic dimensions of NSIDC sic  #########################
latitude = benchmark_key.variables['latitude'][:]
longitude = benchmark_key.variables['longitude'][:]
ygrid = benchmark_key.variables['ygrid'][:]
xgrid = benchmark_key.variables['xgrid'][:]
sic_sample = benchmark_key.variables['goddard_merged_seaice_conc'][0,:,:]
print(sic_sample.shape)
print(latitude.shape)
print(longitude.shape)
print(ygrid.shape)
print(xgrid.shape)
print(np.amax(latitude))
print(np.amin(latitude))
print(np.amax(longitude))
print(np.amin(longitude))
print(np.amax(sic_sample))
print(np.amin(sic_sample))
####################################################################################

(448, 304)
(448, 304)
(448, 304)
(448,)
(304,)
89.8368159996152
31.1026717524309
179.813975395493
-180.0
1.0
0.0


In [4]:
#######################   Target coordinate for interpolation   ######################
latitude_ERAI = coordinate_key.variables['latitude'][:]
longitude_ERAI = coordinate_key.variables['longitude'][:]
#value_sample_coordinate = coordinate_key.variables['sic'][0,0,:,:]
#print(lat_coordinate)
#print(lon_coordinate)
#print(value_sample_coordinate[:3,:])
#reshape_test = np.reshape(value_sample_coordinate,[len(lat_coordinate)*len(lon_coordinate)], order='F')
#print(reshape_test[:100])
#reshape_back = np.reshape(reshape_test,[len(lat_coordinate),len(lon_coordinate)], order='F')
#print(reshape_back[:3,:])
######################################################################################

In [5]:
def var_key_retrieve(datapath, year, month, day):
    # get the path to the dataset of the given day
    print ("Start retrieving datasets {} (y) - {} (m) - {}(d)".format(year,namelist_month[month-1],namelist_day[day]))
    # Sea Ice Concentration
    datapath_key = glob.glob(os.path.join(datapath,"{}".format(year),"seaice_conc_daily_nh_*_{0}{1}{2}_v03r01.nc".format(year,namelist_month[month-1],namelist_day[day])))[0]
    # get the variable keys
    var_key = Dataset(datapath_key)

    print ("Retrieving datasets successfully and return the variable key!")
    return var_key
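
The lookup tables `namelist_month` and `namelist_day` used above are only defined later, in the main cell. A possible alternative, sketched here under the assumption that the file naming stays `seaice_conc_daily_nh_*_YYYYMMDD_v03r01.nc`, is to build the glob pattern directly from a `datetime.date`. The helper name `daily_file_pattern` and the relative path `"daily"` are hypothetical.

```python
import os
from datetime import date

def daily_file_pattern(datapath, day):
    """Return the glob pattern for one daily file; `day` is a datetime.date."""
    # the wildcard stands for the satellite identifier (e.g. f13 in the sample file)
    filename = "seaice_conc_daily_nh_*_{:%Y%m%d}_v03r01.nc".format(day)
    return os.path.join(datapath, str(day.year), filename)

pattern = daily_file_pattern("daily", date(2004, 1, 12))
print(pattern)
```

The resulting pattern can be passed to `glob.glob` in the same way as in `var_key_retrieve`.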

In [6]:
def interpolation(field, latitude, longitude, latitude_new, longitude_new):
    '''
    The input field should be a 2D array defined on an unstructured (curvilinear) grid,
    with latitude and longitude given as 2D arrays of the same shape as the field.
    The target grid must be structured and is described by the 1D arrays
    latitude_new and longitude_new.
    '''
    # basic dimensions for cube in iris
    lat_iris = iris.coords.AuxCoord(latitude, standard_name='latitude', long_name='latitude',
                                    var_name='lat', units='degrees')
    lon_iris = iris.coords.AuxCoord(longitude, standard_name='longitude', long_name='longitude',
                                    var_name='lon', units='degrees')
    # assemble the cube
    cube_iris = iris.cube.Cube(field, long_name='unstructured field', var_name='field', 
                                units='1', aux_coords_and_dims=[(lat_iris, (0,1)), (lon_iris, (0,1))])
    coord_sys = iris.coord_systems.GeogCS(iris.fileformats.pp.EARTH_RADIUS)
    cube_iris.coord('latitude').coord_system = coord_sys
    cube_iris.coord('longitude').coord_system = coord_sys
    projection = ccrs.PlateCarree()
    lat_grid = latitude_new
    lon_grid = longitude_new
    # pass the coordinate system object itself rather than a string placeholder
    lat_aux = iris.coords.DimCoord(lat_grid, standard_name='latitude',
                                    units='degrees_north', coord_system=coord_sys)
    lon_aux = iris.coords.DimCoord(lon_grid, standard_name='longitude',
                                    units='degrees_east', coord_system=coord_sys)
    dummy_data = np.zeros((len(lat_grid), len(lon_grid)))
    cube_tar = iris.cube.Cube(dummy_data,dim_coords_and_dims=[(lat_aux, 0), (lon_aux, 1)])
    # create the coordinate system for the target cube
    cube_tar.coord('latitude').guess_bounds()
    cube_tar.coord('longitude').guess_bounds()
    cube_tar.coord('latitude').coord_system = coord_sys
    cube_tar.coord('longitude').coord_system = coord_sys
    # create a weight matrix for regridding
    weights = np.ones(cube_iris.shape)
    # get regridder from given cubes
    base = iris.analysis.UnstructuredNearest()
    regridder = base.regridder(cube_iris,cube_tar)
    # Transform cube to target projection
    cube_regrid = regridder(cube_iris)
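    regrid_sic = cube_regrid.data
    # values above 1 are not valid concentrations (presumably flag values
    # such as land or missing data); set them to 0
    regrid_sic[regrid_sic>1.000001] = 0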
    
    return regrid_sic

In [7]:
def create_netcdf_point(pool_sic, output_path):
    print ('*******************************************************************')
    print ('*********************** create netcdf file*************************')
    print ('*******************************************************************')
    # wrap the datasets into netcdf file
    # 'NETCDF3_CLASSIC', 'NETCDF3_64BIT', 'NETCDF4_CLASSIC', and 'NETCDF4'
    data_wrap = Dataset(os.path.join(output_path, 'noaa_nsidc_weekly_regress_1989_2018_sic_passive_microwave.nc'),'w',format = 'NETCDF4')
    # create dimensions for netcdf data
    year_wrap_dim = data_wrap.createDimension('year',Dim_year)
    week_wrap_dim = data_wrap.createDimension('week', Dim_week)
    lat_wrap_dim = data_wrap.createDimension('latitude',Dim_latitude)
    lon_wrap_dim = data_wrap.createDimension('longitude',Dim_longitude)
    # create coordinate variables for the 4 dimensions
    year_wrap_var = data_wrap.createVariable('year',np.int32,('year',))
    week_wrap_var = data_wrap.createVariable('week',np.int32,('week',))
    lat_wrap_var = data_wrap.createVariable('latitude',np.float32,('latitude',))
    lon_wrap_var = data_wrap.createVariable('longitude',np.float32,('longitude',))
    # create the actual 4-d variable
    sic_wrap_var = data_wrap.createVariable('sic',np.float32,('year','week','latitude','longitude'),zlib=True)
    # global attributes
    data_wrap.description = 'Weekly mean sea ice concentration with passive microwave by NOAA/NSIDC'
    # variable attributes
    lat_wrap_var.units = 'degree_north'
    lon_wrap_var.units = 'degree_east'

    sic_wrap_var.units = '1'

    sic_wrap_var.long_name = 'sea ice concentration with passive microwave'

    # writing data
    lat_wrap_var[:] = latitude_ERAI
    lon_wrap_var[:] = longitude_ERAI
    week_wrap_var[:] = index_week
    year_wrap_var[:] = period

    sic_wrap_var[:] = pool_sic

    # close the file
    data_wrap.close()
    print ("Create netcdf file successfully")

In [10]:
if __name__=="__main__":
    ####################################################################
    ######  Create time namelist matrix for variable extraction  #######
    ####################################################################
    # date and time arrangement
    # namelist of month and days for file manipulation
    namelist_month = ['01','02','03','04','05','06','07','08','09','10','11','12']
    namelist_day = ['01','02','03','04','05','06','07','08','09','10',
                    '11','12','13','14','15','16','17','18','19','20',
                    '21','22','23','24','25','26','27','28','29','30',
                    '31']
    # index of days for each type of month
    index_days_long = np.arange(31)
    index_days_short = np.arange(30)
    index_days_Feb_short = np.arange(28)
    index_days_Feb_long = np.arange(29)
    long_month_list = np.array([1,3,5,7,8,10,12])
    leap_year_list = np.array([1976,1980,1984,1988,1992,1996,2000,2004,2008,2012,2016,2020])
    # time axes: years, months and weeks (4 weeks per month, 48 per year)
    period = np.arange(start_year,end_year+1,1)
    index_month = np.arange(1,13,1)
    index_week = np.arange(1,49,1)
    ####################################################################
    ######       Extract invariant and calculate constants       #######
    ####################################################################
    # get invariant from benchmark file
    Dim_year = len(period)
    Dim_month = len(index_month)
    Dim_week = len(index_week)
    Dim_latitude = len(latitude_ERAI)
    Dim_longitude = len(longitude_ERAI)
    #############################################
    #####   Create space for storing data   #####
    #############################################
    # data pool for the weekly sea ice concentration on the ERA-Interim grid
    pool_sic_weekly_interpolate = np.zeros((Dim_year,Dim_week,Dim_latitude,Dim_longitude),dtype = float)
    # loop for calculation
    for i in period:
        for j in index_month:
            ###################################################################
            ######                   begin the month loop                ######
            ###################################################################
            # determine how many days there are in the month
            if j in long_month_list:
                days = index_days_long
            elif j == 2:
                if i in leap_year_list:
                    days = index_days_Feb_long
                else:
                    days = index_days_Feb_short
            else:
                days = index_days_short 
            pool_sic_daily = np.zeros((len(days),len(ygrid),len(xgrid)),dtype = float)
            for k in days:
                print(i,j,k)
                var_key = var_key_retrieve(datapath, i, j, k)
                pool_sic_daily[k,:,:] = var_key.variables['goddard_merged_seaice_conc'][0,:,:]
            # For the calculation of weekly fields, we assume each month consists of 4 weeks.
            # The first 3 weeks contain 7 days each. The 4th week contains the remaining days of that month.
            pool_sic_weekly = np.zeros((4, len(ygrid),len(xgrid)),dtype=float)
            for w in np.arange(4):
                if w < 3:
                    pool_sic_weekly[w,:,:] = np.mean(pool_sic_daily[w*7:w*7+7,:,:],axis=0)
                else:
                    pool_sic_weekly[w,:,:] = np.mean(pool_sic_daily[w*7:,:,:],axis=0)
            # interpolation on the erai grid
            for w in np.arange(4):
                pool_sic_weekly_interpolate[i-start_year,j*4-4+w,:,:] = interpolation(pool_sic_weekly[w,:,:], latitude,
                                                                                      longitude, latitude_ERAI, longitude_ERAI)
    ####################################################################
    ######                 Data Wrapping (NetCDF)                #######
    ####################################################################
    create_netcdf_point(pool_sic_weekly_interpolate, output_path)
    print ('Packing 2D fields of NOAA is complete!!!')
    print ('The output is in sleep, safe and sound!!!')
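
The day-count and weekly-mean logic in the loop above can be condensed as in the following sketch. It assumes the same convention (three 7-day weeks plus a fourth week holding the remaining days) and uses `calendar.monthrange` from the standard library instead of a hand-written leap-year list. The function name `weekly_means` and the synthetic daily field are illustrative assumptions.

```python
import calendar
import numpy as np

def weekly_means(daily, n_weeks=4, week_length=7):
    """Average daily fields (day, y, x) into n_weeks pseudo-weeks per month."""
    weekly = np.zeros((n_weeks,) + daily.shape[1:], dtype=float)
    for w in range(n_weeks):
        start = w * week_length
        # the last week takes all remaining days of the month
        stop = start + week_length if w < n_weeks - 1 else None
        weekly[w] = daily[start:stop].mean(axis=0)
    return weekly

# number of days in February of a leap year
n_days = calendar.monthrange(1992, 2)[1]
# synthetic daily field whose value equals the zero-based day index
daily = np.arange(n_days, dtype=float)[:, None, None] * np.ones((1, 2, 3))
weekly = weekly_means(daily)
print(weekly[:, 0, 0])
```

With this convention the fourth week averages 7 to 10 days depending on the month, so the weekly means are not weighted equally in time.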

1989 1 0
Start retrieving datasets 1989 (y) - 01 (m) - 01(d)
Retrieving datasets successfully and return the variable key!
1989 1 1
Start retrieving datasets 1989 (y) - 01 (m) - 02(d)
Retrieving datasets successfully and return the variable key!
1989 1 2
Start retrieving datasets 1989 (y) - 01 (m) - 03(d)
Retrieving datasets successfully and return the variable key!
1989 1 3
Start retrieving datasets 1989 (y) - 01 (m) - 04(d)
Retrieving datasets successfully and return the variable key!
1989 1 4
Start retrieving datasets 1989 (y) - 01 (m) - 05(d)
Retrieving datasets successfully and return the variable key!
1989 1 5
Start retrieving datasets 1989 (y) - 01 (m) - 06(d)
Retrieving datasets successfully and return the variable key!
1989 1 6
Start retrieving datasets 1989 (y) - 01 (m) - 07(d)
Retrieving datasets successfully and return the variable key!
1989 1 7
Start retrieving datasets 1989 (y) - 01 (m) - 08(d)
Retrieving datasets successfully and return the variable key!
1989 1 8
Start r