# Convert Polar Stereographic Files to Lat/Lon Files

This notebook provides a workflow for converting polar stereographic NetCDF files to lat/lon coordinates. To get the data into the proper format, we needed to do the following:
 - Use a template file, which includes all the proper GIS attributes, as the base for modifying the NetCDF files of the existing mask arrays.
 - Run a Python script that reads the GeoTIFF file and creates a NetCDF file with the x, y, lat, lon coordinate dimensions and variables.

Scripts were originally created by John Truesdale and updated by Teagan King.
 
To Do:
------
 - Use the standard ESMF mapping procedure to go from the projected polar stereographic mask grids to our regularly gridded data from CESM, using Steve's Python code and John's mapping file commands.

Some notes from John on methods for regridding:
-----------------------------------------------
 - Used QGIS to assign a standard stereographic coordinate system to one of the polar JPEG plots and to create a GeoTIFF version of it.
 - Giving a standard location to each JPEG pixel basically consisted of finding features that can be matched at the pixel level (the tip of an island, the most inset part of a prominent bay) between our polar JPEGs and a map that is already georeferenced and has known locations for those pixels. If you choose 3-10 pixels in common, you can fit a linear regression that maps all the remaining pixels of your JPEG.
 - Once the GIS application was able to calculate the transform from pixel to a standard coordinate system, I saved all that information in a GeoTIFF file.
 - The polar projection JPEGs on which the climatenet masks are drawn were created with Python matplotlib, and you can grab the coordinate information from matplotlib; this was checked with QGIS.
 - Because the LLNL polar JPEGs are in a projected coordinate system, the underlying unit is meters, as in any stereographic projection. The x and y variables on the GeoTIFF and the converted NetCDF file contain the meter offsets of every pixel (row,col) of the ar_mask array with respect to one of the standard south pole stereographic coordinate systems.
 - When you look at the square projected polar image, you see that the longitude lines converge at the pole and the latitudes form a set of nested circles. Described in lat/lon coordinates, this is known as a curvilinear grid: each pixel (array location) requires its own lat/lon pair to specify its position on a regular lat/lon grid. A straight line along any row or column of the JPEG raster (or ar_mask array) crosses different lat and lon values at every pixel. The coordinate information for our rectangular ar_mask array therefore contains lat and lon variables that are two-dimensional and describe the entire grid of 1152x1152 points, with unique lat/lon values for each pixel (array location) of ar_mask. The standard NetCDF way of denoting a curvilinear coordinate is to create the dimensions that define the size of the ar_mask array (x,y), add lat/lon variables that are each dimensioned (x,y) and contain the lat/lon coordinates of each point, and finally add metadata to the ar_mask array noting that the coordinates for this variable are not the dimension variables x,y but the lat/lon variables.
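
The control-point fit described in the notes above can be sketched as an ordinary least-squares affine transform from pixel position to projected coordinates. This is only an illustration of the idea and not what QGIS does internally. The control points are made-up values, and `fit_affine` and `pixel_to_projected` are hypothetical helper names.

```python
import numpy as np

def fit_affine(pixels, projected):
    """Least-squares fit of x = a*col + b*row + c and y = d*col + e*row + f."""
    pixels = np.asarray(pixels, dtype=float)
    projected = np.asarray(projected, dtype=float)
    # design matrix: one row [col, row, 1] per control point
    design = np.column_stack([pixels, np.ones(len(pixels))])
    coeffs, *_ = np.linalg.lstsq(design, projected, rcond=None)
    return coeffs  # shape (3, 2): one column of coefficients for x, one for y

def pixel_to_projected(coeffs, col, row):
    """Apply the fitted transform to a single pixel position."""
    x, y = np.array([col, row, 1.0]) @ coeffs
    return x, y

# made-up control points: (col, row) pixel positions and matching (x, y) in meters
control_pixels = [(0, 0), (1151, 0), (0, 1151), (1151, 1151)]
control_xy = [(-3000000.0, 3000000.0), (2755000.0, 3000000.0),
              (-3000000.0, -2755000.0), (2755000.0, -2755000.0)]

coeffs = fit_affine(control_pixels, control_xy)
x_mid, y_mid = pixel_to_projected(coeffs, 576, 576)
```

With more than three control points the fit is overdetermined, so small errors in picking individual pixels average out rather than being reproduced exactly.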

In [245]:
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
import numpy as np
import os
from rasterio.warp import transform
import urllib.request
import xarray as xr
import glob
from netCDF4 import date2num
import datetime as dt
import cftime

### Generate template file (only needs to be done once)

In [2]:
# # Read the data
# input_path = '/glade/work/tking/cgnet/polar_regridding/data-2003-04-29-02-0-copy-sav1.tif'

# da = xr.open_rasterio(input_path)
# yval=da['y']
# ryval=np.flip(yval)
# # Compute the lon/lat coordinates with rasterio.warp.transform
# ny, nx = len(da['y']), len(da['x'])
# x, y = np.meshgrid(da['x'], ryval)
# # Rasterio works with 1D arrays
# lon, lat = transform(da.crs, {'init': 'EPSG:4326'},
#                      x.flatten(), y.flatten())
# lon = np.asarray(lon).reshape((ny, nx))
# lat = np.asarray(lat).reshape((ny, nx))
# da.coords['lon'] = (('y', 'x'), lon)
# da.coords['lat'] = (('y', 'x'), lat)

In [3]:
# da.to_netcdf(path='/glade/work/tking/cgnet/polar_regridding/data-2003-04-29-02-0-sav1-rev-latlon.nc')
# # use for just antarctic

## Regrid TMQ/IVT/PSL/PR data from lat/lon to polar

In [3]:
# bilinear interp for later, nearest neighbor ok for now
# submit scripts below in batch scripts, see example at /glade/scratch/tking/cgnet/high_lat_QC/prw

In [314]:
# module load gnu/9.1.0
# module load esmf_libs/8.0.0
# module load esmf-8.0.0-ncdfio-mpi-O
# module load nco/4.7.9

In [315]:
# set srcgrid=f09
# set dstgrid=sp_stereo
# set srcgridfile=/glade/p/cesmdata/inputdata/share/scripgrids/fv0.23x0.31_141008.nc
# set dstgridfile=/glade/u/home/tking/sp_stereographic_SCRIP.nc
# set srcinitfile=prw_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200001010000-200012312100.nc
# set dstinitfile=polar_tmq/prw_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200001010000-200012312100_polar_nearest.nc

# #create the map file
# ESMF_RegridWeightGen --ignore_unmapped --src_regional -m neareststod -w map_${srcgrid}_to_${dstgrid}_near.nc -s ${srcgridfile} -d ${dstgridfile}

# #use the mapfile to remap srcinitfile to dstinitfile
# ncremap -m ./map_${srcgrid}_to_${dstgrid}_near.nc -i ${srcinitfile} -o ${dstinitfile}


# currently running batch scripts on Thursday March 16th ~1:20pm!
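
Conceptually, the `neareststod` method requested above gives each destination point the value of its nearest source point. The following brute-force sketch shows that idea in plain Python. It is not ESMF's implementation, and the function names and sample points are made up for illustration. The 1-based `row`/`col`/`S` layout mirrors the fields that the `remap_camse` function later in this notebook reads from a map file.

```python
import math

def great_circle_angle(lat1, lon1, lat2, lon2):
    """Central angle in radians between two points given in degrees (haversine)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2.0 * math.asin(min(1.0, math.sqrt(a)))

def nearest_source_weights(src_points, dst_points):
    """Return (row, col, S) lists with 1-based indices, one entry per destination point."""
    row, col, S = [], [], []
    for i_dst, (dst_lat, dst_lon) in enumerate(dst_points):
        # index of the source point closest to this destination point
        i_src = min(range(len(src_points)),
                    key=lambda i: great_circle_angle(dst_lat, dst_lon, *src_points[i]))
        row.append(i_dst + 1)
        col.append(i_src + 1)
        S.append(1.0)  # nearest neighbor copies the source value unchanged
    return row, col, S

# made-up (lat, lon) points near the south pole
src_points = [(-89.0, 0.0), (-89.0, 90.0), (-89.0, 180.0), (-89.0, 270.0)]
dst_points = [(-88.5, 10.0), (-89.5, 200.0), (-88.0, 350.0)]
row, col, S = nearest_source_weights(src_points, dst_points)
```

Because every weight is 1, nearest-neighbor remapping never blends values, which suits categorical fields such as masks; bilinear weights would instead spread each destination value over several source points.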

## Loop through mask files to generate converted mask files

In [29]:
tmq_dict = {2000: "prw_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200001010000-200012312100.nc",
            2001: "prw_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200101010000-200112312100.nc",
            2002: "prw_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200201010000-200212312100.nc",
            2003: "prw_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200301010000-200312312100.nc",
            2004: "prw_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200401010000-200412312100.nc"}

ivt_dict = {2000: "windhusavi_3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200001-200012.nc",
            2001: "windhusavi_3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200101-200112.nc",
            2002: "windhusavi_3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200201-200212.nc",
            2003: "windhusavi_3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200301-200312.nc",
            2004: "windhusavi_3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200401-200412.nc"}

psl_dict = {2000: "psl_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200001010000-200012312100.nc",
            2001: "psl_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200101010000-200112312100.nc",
            2002: "psl_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200201010000-200212312100.nc",
            2003: "psl_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200301010000-200312312100.nc",
            2004: "psl_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200401010000-200412312100.nc"}

pr_dict = {2000: "pr_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200001010000-200012312359.nc",
           2001: "pr_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200101010000-200112312359.nc",
           2002: "pr_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200201010000-200212312359.nc",
           2003: "pr_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200301010000-200312312359.nc",
           2004: "pr_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200401010000-200412312359.nc",
           2005: "pr_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200501010000-200512052359.nc"}
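
Since the filenames above differ only by year within each variable, the dictionaries could also be generated from a pattern. The sketch below is one possible way to do that; `yearly_filenames` is a hypothetical helper, and the 2005 `pr` entry would still need to be added by hand because its end date differs.

```python
def yearly_filenames(prefix, start_suffix, end_suffix, years):
    """Build {year: filename} for files named <prefix>_..._<year><start>-<year><end>.nc."""
    template = "{}_CAM5-1-025degree_All-Hist_est1_v1-0_run002_{}{}-{}{}.nc"
    return {year: template.format(prefix, year, start_suffix, year, end_suffix)
            for year in years}

tmq_names = yearly_filenames("prw_A3hr", "01010000", "12312100", range(2000, 2005))
ivt_names = yearly_filenames("windhusavi_3hr", "01", "12", range(2000, 2005))
```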

In [312]:
for year in [2001]:  # , 2002, 2003]:
    qa_antarctic = sorted(glob.glob('/glade/u/home/tking/work/cgnet/QA_xml/h5/qa*/antarctic/netcdfs/data-{}*'.format(year)))

    tmq = xr.open_dataset('/glade/scratch/tking/cgnet/high_lat_QC/prw/{}'.format(tmq_dict[year]))
    # lat: 768 lon: 1152 time: 2919 nbnd: 2
    psl = xr.open_dataset('/glade/scratch/tking/cgnet/high_lat_QC/psl/{}'.format(psl_dict[year]))
    # lat: 768 lon: 1152 time: 2919 nbnd: 2
    ivt = xr.open_dataset('/glade/scratch/tking/cgnet/high_lat_QC/ivt/{}'.format(ivt_dict[year]))
    # time: 2920 bound: 2 lat: 768 lon: 1152
    pr = xr.open_dataset('/glade/scratch/tking/cgnet/high_lat_QC/pr/{}'.format(pr_dict[year]))
    # height: 1 lat: 768 lon: 1152 time: 2920 nb2: 2
    # these have different coords (although tmq and psl are same, ivt just seems to be reordered/renamed,
    # pr has an extra height dimension (that is constant) and renamed other dims)
    time_index = 0
    for index in range(len(qa_antarctic[:3])):

        # read in mask file
        qa_aa_ds = xr.open_dataset(qa_antarctic[index])

        # read in temp file for use in notebook
        temp = xr.open_dataset('/glade/work/tking/cgnet/polar_regridding/data-2003-04-29-02-0-sav1-rev-latlon.nc')

        # rename the dimensions of the original ar_masks variable from phony_0/1 to y and x respectively
        qa_aa_ds = qa_aa_ds.rename({'phony_dim_0':'y','phony_dim_1':'x'})

        time = qa_antarctic[index].split('/')[-1].split('data-')[1].split('.nc')[0].split('-00-2')[0]
        time_year = int(time.split('-')[0])  # or = year
        time_month = int(time.split('-')[1])
        time_day = int(time.split('-')[2])
        time_hour = 0  # all files were 00

        # determine hours past a base date

        file_for_date = qa_antarctic[index].split('/')[-1].split('data-')[1].split('.nc')[0].split('-00-2')[-1]
        if file_for_date == '':
            sample_id = 0
        else:
            sample_id = int(file_for_date[1]) # format would be _#, want #
        if sample_id == 0:
            time_index += 1

        # todo: make this temp.time[time_index]...
        # note: date2num defaults to the standard calendar here, while the attrs below declare 'noleap'
        date_number = date2num(dt.datetime(time_year, time_month, time_day, time_hour, 0), 'hours since 1970-01-01')
        temp['time'] = date_number
        temp['time'] = temp['time'].assign_attrs({'long_name': 'time',
                                                  'units': 'hours since 1970-01-01',
                                                  'calendar': 'noleap'})


        # append the new x,y,lat,lon variables to mask using the GeoTiff temp file
        temp['ar_masks'] = qa_aa_ds.ar_masks

        temp['ar_masks'] = temp['ar_masks'].assign_coords({'sample_id': sample_id})
        # temp['ar_masks'] = temp['ar_masks'].assign_coords({'time': date_number})
        
        # add all the coordinate attributes to ar_mask
        temp['ar_masks'] = temp['ar_masks'].assign_attrs({'transform' : temp.__xarray_dataarray_variable__.transform,
                                                          'crs': temp.__xarray_dataarray_variable__.crs,
                                                          'res': temp.__xarray_dataarray_variable__.res,
                                                          'is_tiled': temp.__xarray_dataarray_variable__.is_tiled,
                                                          'nodatavals': temp.__xarray_dataarray_variable__.nodatavals,
                                                          'scales': temp.__xarray_dataarray_variable__.scales,
                                                          'offsets': temp.__xarray_dataarray_variable__.offsets,
                                                          'AREA_OR_POINT': temp.__xarray_dataarray_variable__.AREA_OR_POINT})
        
        # zero-pad month and day for the YYYYMMDD date string
        time_m_formatted = '{:02d}'.format(time_month)
        time_d_formatted = '{:02d}'.format(time_day)
            
        temp['date'] = str(time_year)+time_m_formatted+time_d_formatted
        temp['date'] = temp['date'].assign_attrs({'long_name':'current date (YYYYMMDD)'})
        temp['datesec'] = str(sample_id)
        temp['datesec'] = temp['datesec'].assign_attrs({'long_name':'current seconds of current date'})

        # put in all the variable data that we received from Sol (TMQ, IVT, PSL, etc.)
        # These variables will be dimensioned (time,y,x) in your single output file and
        # have the same lat lon coordinate attribute as the ar_mask array.
        # construct the filenames for TMQ, IVT etc from the date info that you have
        # then open, read and close those files inside your processing loop.

        # open tmq file for the correct date and read tmq 
        tmq_subset = tmq.sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, 0, 0, 0, 0, has_year_zero=True), method='nearest')
        psl_subset = psl.sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, 0, 0, 0, 0, has_year_zero=True), method='nearest')
        ivt_subset = ivt.sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, 0, 0, 0, 0, has_year_zero=True), method='nearest')
        pr_subset = pr.sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, 0, 0, 0, 0, has_year_zero=True), method='nearest')
    
        temp['tmq'] = tmq_subset.drop('time_bnds').drop('time').to_array()[0]
        temp['psl'] = psl_subset.drop('time_bnds').drop('time').to_array()[0]
        temp['ivt'] = ivt_subset.drop('time').drop('time_bnds').drop('bounds_lat').drop_dims('bound').to_array()[0]
        temp['pr'] = pr_subset.drop('time_bnds').drop('time').to_array()[0][0]
        
        temp['tmq'] = temp['tmq'].assign_coords({'sample_id': sample_id})
        temp['psl'] = temp['psl'].assign_coords({'sample_id': sample_id})
        temp['ivt'] = temp['ivt'].assign_coords({'sample_id': sample_id})
        temp['pr'] = temp['pr'].assign_coords({'sample_id': sample_id})
        
        # todo: resolve issue with repeated dimension
        # temp['tmq'] = temp['tmq'].assign_coords({'time': date_number})
        # temp['psl'] = temp['psl'].assign_coords({'time': date_number})
        # temp['ivt'] = temp['ivt'].assign_coords({'time': date_number})
        # temp['pr'] = temp['pr'].assign_coords({'time': date_number})
        
        #tmq values seem to be present, other vars are missing values
        # copy the GIS attributes of the template variable onto each state variable
        template_var = temp['__xarray_dataarray_variable__']
        gis_attrs = {name: template_var.attrs[name]
                     for name in ['crs', 'transform', 'res', 'is_tiled',
                                  'nodatavals', 'scales', 'offsets', 'AREA_OR_POINT']}
        for var_name in ['tmq', 'psl', 'pr', 'ivt']:
            temp[var_name] = temp[var_name].assign_attrs(gis_attrs)
        temp = temp.drop_vars('__xarray_dataarray_variable__')
        temp = temp.drop_dims('band')
        temp = temp.drop_vars('variable')
        # temp = temp.drop('band') # drop index, as well
        filename = qa_antarctic[index].split('/')[-1]
        temp.to_netcdf(path='/glade/u/home/tking/work/cgnet/QA_xml/all_antarctic_converted_masks/'+filename)
        print('made netcdf {}'.format(filename))

        qa_aa_ds.close()

    tmq.close()
    ivt.close()
    psl.close()
    pr.close()

made netcdf data-2001-01-01-00-2_0.nc
made netcdf data-2001-01-24-00-2_0.nc
made netcdf data-2001-01-24-00-2_1.nc
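
The chained `split` calls used in the loop above to pull the date and sample number out of each mask filename can also be written as a single regular expression. This is a sketch under the assumption that the mask files are named `data-YYYY-MM-DD-HH-<field>[_<sample>].nc`, as in the output above; `parse_mask_filename` is a hypothetical helper, and the meaning of the field before the underscore is not known here, so it is matched but ignored.

```python
import re
from datetime import date

# assumed layout: data-YYYY-MM-DD-HH-<field>[_<sample>].nc
MASK_NAME = re.compile(r"data-(\d{4})-(\d{2})-(\d{2})-(\d{2})-(\d+)(?:_(\d+))?\.nc$")

def parse_mask_filename(path):
    """Return (date, hour, sample_id) parsed from a mask file path."""
    match = MASK_NAME.search(path)
    if match is None:
        raise ValueError("unexpected mask filename: {}".format(path))
    year, month, day, hour = (int(g) for g in match.groups()[:4])
    sample = match.group(6)
    # a missing _<sample> suffix is treated as sample 0, as in the loop above
    sample_id = int(sample) if sample is not None else 0
    return date(year, month, day), hour, sample_id

stamp, hour, sample_id = parse_mask_filename("netcdfs/data-2001-01-24-00-2_1.nc")
date_string = stamp.strftime("%Y%m%d")  # zero-padded YYYYMMDD
```

Failing loudly on an unexpected name is a design choice here: a silent mis-parse would attach the wrong date to a mask.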


In [311]:
# this seems to work, but turns to all nan's when set temp['ivt'] equal to it...
# ivt_temp = ivt.sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, 0, 0, 0, 0, has_year_zero=True), method='nearest').drop('time').drop('time_bnds').drop('bounds_lat').drop('bounds_lon').to_array()[0]
# temp['ivt'] = ivt_temp

# print(temp['ivt'])
# print(ivt_temp)

## Try using a different setup: create the NetCDF file from scratch

In [316]:
from netCDF4 import Dataset
import numpy as np

In [351]:
temp_file = '/glade/work/tking/cgnet/polar_regridding/data-2003-04-29-02-0-sav1-rev-latlon.nc'
time_index = 0
for year in [2001]: # 2002, 2003
    mask_file_list = sorted(glob.glob('/glade/u/home/tking/work/cgnet/QA_xml/h5/qa*/antarctic/netcdfs/data-{}*'.format(year)))
    tmq_ds = xr.open_dataset('/glade/scratch/tking/cgnet/high_lat_QC/prw/{}'.format(tmq_dict[year]))
    # lat: 768 lon: 1152 time: 2919 nbnd: 2
    psl_ds = xr.open_dataset('/glade/scratch/tking/cgnet/high_lat_QC/psl/{}'.format(psl_dict[year]))
    # lat: 768 lon: 1152 time: 2919 nbnd: 2
    ivt_ds = xr.open_dataset('/glade/scratch/tking/cgnet/high_lat_QC/ivt/{}'.format(ivt_dict[year]))
    # time: 2920 bound: 2 lat: 768 lon: 1152
    pr_ds = xr.open_dataset('/glade/scratch/tking/cgnet/high_lat_QC/pr/{}'.format(pr_dict[year]))
    # height: 1 lat: 768 lon: 1152 time: 2920 nb2: 2
    # todo: use polar subdirectories to get regridded data!!!
    
    for antarctic_mask_file in mask_file_list[:1]:
        try:
            ncfile.close()  # just to be safe, make sure dataset is not already open.
        except (NameError, RuntimeError):
            pass  # ncfile is not defined yet, or it is already closed

        time = antarctic_mask_file.split('/')[-1].split('data-')[1].split('.nc')[0].split('-00-2')[0]
        time_year = int(time.split('-')[0])  # or = year
        time_month = int(time.split('-')[1])
        time_day = int(time.split('-')[2])
        time_hour = 0  # all files were 00
        date_number = date2num(dt.datetime(time_year, time_month, time_day, time_hour, 0), 'hours since 1970-01-01')
        
        # zero-pad month and day for the YYYYMMDD date string
        time_m_formatted = '{:02d}'.format(time_month)
        time_d_formatted = '{:02d}'.format(time_day)
            
        sample_id = int(antarctic_mask_file[-4:-3])  # single digit just before '.nc'
        if sample_id == 0:
            time_index+=1
        
        qa_aa_ds = xr.open_dataset(antarctic_mask_file)

        ncfile = Dataset('/glade/u/home/tking/work/cgnet/QA_xml/all_antarctic_converted_masks/temp.nc',mode='w',format='NETCDF4_CLASSIC') 

        lat_dim = ncfile.createDimension('lat', 768)     # latitude axis
        lon_dim = ncfile.createDimension('lon', 1152)    # longitude axis
        y_dim = ncfile.createDimension('y', 1152)        # projected y axis (rows of the polar grid)
        x_dim = ncfile.createDimension('x', 1152)        # projected x axis (columns of the polar grid)
        time_dim = ncfile.createDimension('time', None)  # unlimited axis (can be appended to)
        time_index_dim = ncfile.createDimension('time_index', 350)
        sample_id_dim = ncfile.createDimension('sample_id', 4)

        # include time variable and relevant attributes
        time = ncfile.createVariable('time', np.float64, ('time',))
        time.units = 'hours since 1970-01-01'
        time.calendar = 'noleap'
        time.long_name = 'time'
        
        date = ncfile.createVariable('date', np.float64, ('time',))
        date.long_name = 'current date (YYYYMMDD)'
        datesec = ncfile.createVariable('datesec', np.float64, ('time',))
        datesec.long_name = 'current seconds of current date'
        
        date[:] = str(time_year)+time_m_formatted+time_d_formatted
        datesec[:] = '00'

        # include ar_mask variable and relevant attributes
        ar_mask = ncfile.createVariable('ar_mask', np.float64, ('time','sample_id','y','x'))
        ar_mask.long_name = 'ar_masks array read from the QA mask file'
        ar_mask[time_index,sample_id,:,:]=qa_aa_ds.ar_masks
        
        # include tmq
        tmq = ncfile.createVariable('tmq',np.float64,('time','lat','lon','time_index','sample_id'))

        # include ivt
        ivt = ncfile.createVariable('ivt',np.float64,('time','lat','lon','time_index','sample_id'))

        # include psl
        psl = ncfile.createVariable('psl',np.float64,('time','lat','lon','time_index','sample_id'))

        # include pr
        pr = ncfile.createVariable('pr',np.float64,('time','lat','lon','time_index','sample_id'))

        # open tmq file for the correct date and read tmq 
        tmq_subset = tmq_ds.sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, 0, 0, 0, 0, has_year_zero=True), method='nearest')
        psl_subset = psl_ds.sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, 0, 0, 0, 0, has_year_zero=True), method='nearest')
        ivt_subset = ivt_ds.sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, 0, 0, 0, 0, has_year_zero=True), method='nearest')
        pr_subset = pr_ds.sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, 0, 0, 0, 0, has_year_zero=True), method='nearest')

        # tmq[time_index,sample_id,:,:]=tmq_subset
        # psl[time_index,sample_id,:,:]=psl_subset
        # ivt[time_index,sample_id,:,:]=ivt_subset
        # pr[time_index,sample_id,:,:]=pr_subset

        # tmq_subset.drop('time_bnds').drop('time').to_array()[0]
        # psl_subset.drop('time_bnds').drop('time').to_array()[0]
        # ivt_subset.drop('time').drop('time_bnds').drop('bounds_lat').drop_dims('bound').to_array()[0]
        # pr_subset.drop('time_bnds').drop('time').to_array()[0][0]
        
        # todo: include tmq, psl, ivt, pr data

        print(ncfile)


<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4_CLASSIC data model, file format HDF5):
    dimensions(sizes): lat(768), lon(1152), y(1152), x(1152), time(2), time_index(350), sample_id(4)
    variables(dimensions): float64 time(time), float64 date(time), float64 datesec(time), float64 ar_mask(time, sample_id, y, x), float64 tmq(time, lat, lon, time_index, sample_id), float64 ivt(time, lat, lon, time_index, sample_id), float64 psl(time, lat, lon, time_index, sample_id), float64 pr(time, lat, lon, time_index, sample_id)
    groups: 
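
The `time` variable above declares `calendar = 'noleap'`, while `date_number` is computed from a `datetime.datetime`, which follows the Gregorian calendar. The sketch below shows how far apart the two encodings are for the same date. Both helper functions are hypothetical and written only for this comparison; they are not part of netCDF4 or cftime.

```python
from datetime import datetime

# cumulative days before each month in a 365-day (noleap) year
DAYS_BEFORE_MONTH = [0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334]

def hours_since_1970_noleap(year, month, day, hour=0):
    """Hours since 1970-01-01 when every year has exactly 365 days."""
    days = (year - 1970) * 365 + DAYS_BEFORE_MONTH[month - 1] + (day - 1)
    return days * 24 + hour

def hours_since_1970_gregorian(year, month, day, hour=0):
    """Hours since 1970-01-01 in the calendar used by datetime.datetime."""
    delta = datetime(year, month, day, hour) - datetime(1970, 1, 1)
    return delta.days * 24 + delta.seconds // 3600

noleap_hours = hours_since_1970_noleap(2001, 1, 1)
gregorian_hours = hours_since_1970_gregorian(2001, 1, 1)
print(noleap_hours, gregorian_hours, gregorian_hours - noleap_hours)
```

The gap is one day for every leap day between 1970 and the target date, so a reader that honors the `noleap` attribute would decode a Gregorian-encoded value to a date several days later than intended.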


In [325]:
temp

# PART 2:

### Use the standard ESMF mapping procedure to go from the projected stereographic polar mask grids to our regular gridded data from CESM. Use Steve's Python code and John's mapping file commands to do this

In [35]:
# We want to do this to get masks and state information on two different grids
# need new mapping file to do this, then run through Steve's routine

In [None]:
# Steve Yeager has a utility function for remapping CAM-SE output (see remap_camse function below):
#     https://github.com/sgyeager/mypyutils/blob/main/mypyutils/regrid_utils.py

import xarray as xr
import numpy as np
import scipy.sparse as sps
import cf_xarray

def remap_camse(ds, dsw, varlst=None):
    # copy the list so neither a shared default nor the caller's list is mutated
    varlst = list(varlst) if varlst else []
    #dso = xr.full_like(ds.drop_dims('ncol'), np.nan)
    dso = ds.drop_dims('ncol').copy()
    lonb = dsw.xc_b.values.reshape([dsw.dst_grid_dims[1].values, dsw.dst_grid_dims[0].values])
    latb = dsw.yc_b.values.reshape([dsw.dst_grid_dims[1].values, dsw.dst_grid_dims[0].values])
    weights = sps.coo_matrix((dsw.S, (dsw.row-1, dsw.col-1)), shape=[dsw.dims['n_b'], dsw.dims['n_a']])
    if not varlst:
        for varname in list(ds):
            if 'ncol' in(ds[varname].dims):
                varlst.append(varname)
        if 'lon' in varlst: varlst.remove('lon')
        if 'lat' in varlst: varlst.remove('lat')
        if 'area' in varlst: varlst.remove('area')
    for varname in varlst:
        shape = ds[varname].shape
        invar_flat = ds[varname].values.reshape(-1, shape[-1])
        remapped_flat = weights.dot(invar_flat.T).T
        remapped = remapped_flat.reshape([*shape[0:-1], dsw.dst_grid_dims[1].values,
                                          dsw.dst_grid_dims[0].values])
        dimlst = list(ds[varname].dims[0:-1])
        dims={}
        coords={}
        for it in dimlst:
            dims[it] = dso.dims[it]
            coords[it] = dso.coords[it]
        dims['lat'] = int(dsw.dst_grid_dims[1])
        dims['lon'] = int(dsw.dst_grid_dims[0])
        coords['lat'] = latb[:,0]
        coords['lon'] = lonb[0,:]
        remapped = xr.DataArray(remapped, coords=coords, dims=dims, attrs=ds[varname].attrs)
        dso = xr.merge([dso, remapped.to_dataset(name=varname)])
    return dso
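
The core of the function above is the product of a sparse weight matrix with the flattened source field, where `row` and `col` are 1-based indices into the destination and source grids. A minimal plain-Python sketch of that step, with made-up weights and a hypothetical function name, looks like this:

```python
def apply_map_weights(S, row, col, src_values, n_dst):
    """Compute dst[row - 1] += S * src[col - 1] for every weight entry."""
    dst = [0.0] * n_dst
    for weight, r, c in zip(S, row, col):
        dst[r - 1] += weight * src_values[c - 1]  # map files use 1-based indices
    return dst

# made-up example: 3 source cells, 2 destination cells
S = [1.0, 0.25, 0.75]
row = [1, 2, 2]   # destination index of each weight
col = [3, 1, 2]   # source index of each weight
src_values = [10.0, 20.0, 30.0]

dst_values = apply_map_weights(S, row, col, src_values, n_dst=2)
```

The first destination cell copies source cell 3 unchanged, as a nearest-neighbor map would, and the second blends source cells 1 and 2. The sparse matrix in `remap_camse` does the same arithmetic for all time steps at once.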

In [None]:
# Here is a notebook demonstrating how this is used:
#     /glade/u/home/yeager/analysis/python/toshare/CLM_field_regrid.ipynb