# Convert Lat/Lon Files to Polar Stereographic Files

This notebook provides a workflow for converting lat/lon NetCDF files to polar stereographic coordinates. We then combine the polar stereographic underlying data files with the mask files so that they can be used as input to cgnet. To get the data into the proper format, we needed to do the following:
 - Use a template file as a base for modifying the NetCDF files of the existing mask arrays. The template file includes all of the required GIS attributes.
 - Use a Python script to read the GeoTIFF file and create a NetCDF file with the x, y, lat, and lon coordinate dimensions and variables.

Authors:
--------
 - John Truesdale
 - Teagan King

Prerequisites:
--------------
* Create GeoTIFF file (see instructions below)
* Create SCRIP file
* Generate remapped IVT/TMQ/etc. underlying data (see instructions below)
* Generate template files for NH & SH (see instructions below)
* Regrid TMQ/IVT/PSL/PR data from lat/lon to polar (see instructions below)
* Rename regridded files (see instructions below)

Using QGIS to generate a GeoTIFF file for our polar jpgs:
---------------------------------------------------------
 - Download [QGIS](https://qgis.org/)
 - Download a background North Polar Stereographic GeoTIFF file, e.g., from the Polar Geospatial Center (PGC) Map Catalog
 - Open QGIS, drag the background North Polar Stereographic GeoTIFF file into a new project, and ensure that the project coordinates match the imported image's coordinates (and are the same coordinates that you want to use in the georeferencing)
 - Assign a standard stereographic coordinate system by using the Georeferencer tool in QGIS. Add the polar jpg image used in climate contours into the georeferencer frame, and mark clear points (e.g., islands, peninsulas, or other landmarks) on the jpg and their corresponding points on the background North Polar Stereographic GeoTIFF.
 - When georeferencing, be sure to choose a transformation type that allows warping rather than only linear adjustments.
 - Save the information in a GeoTIFF file by exporting the new layer.

A few other notes from John:
----------------------------
 - The polar projection JPEGs on which the ClimateNet masks are drawn were created with Python's matplotlib, so the coordinate information can be taken from matplotlib; this was checked with QGIS.
 - Because the LLNL polar JPEGs are in a projected coordinate system, the underlying unit of the stereographic projection is meters. The x and y variables in the GeoTIFF and the converted NetCDF file contain the offsets in meters of every pixel (row, col) of the ar_mask array with respect to one of the standard south pole stereographic coordinate systems.
 - When you look at the square projected polar image, the longitude lines converge at the pole and the latitude lines form a set of nested circles. Described in lat/lon coordinates, this is a curvilinear grid: each pixel (array location) needs its own lat/lon pair, because a straight line along any row or column of the JPEG raster (or ar_mask array) crosses different lat/lon values at every pixel. The coordinate information for our rectangular ar_mask array therefore contains two-dimensional lat and lon variables that describe the entire grid of 1152x1152 points, with a unique lat/lon value for each pixel of ar_mask. The standard NetCDF way of denoting a curvilinear coordinate is to create the dimensions that define the size of the ar_mask array (x,y), add lat/lon variables that are each dimensioned (x,y) and contain the lat/lon coordinates of each point, and finally add metadata to the ar_mask array noting that the coordinates for this variable are the lat/lon variables rather than the dimension variables x,y.
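
The curvilinear-grid idea can be illustrated with a small sketch. The snippet below is not the workflow's actual projection code: it assumes a spherical Earth of radius 6371 km and a north polar stereographic projection that is true to scale at the pole, and the helper name `polar_stereo_to_latlon` and the 5x5 grid are purely illustrative. It shows that a single row of a projected grid crosses a different lat/lon pair at every pixel, which is why `lat` and `lon` have to be stored as two-dimensional variables.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # assumption: spherical Earth, not the GeoTIFF's real ellipsoid


def polar_stereo_to_latlon(x, y, radius=EARTH_RADIUS_M):
    """Invert a spherical north polar stereographic projection (true scale at the pole)."""
    rho = np.hypot(x, y)  # distance from the pole in meters
    lat = 90.0 - np.degrees(2.0 * np.arctan(rho / (2.0 * radius)))
    lon = np.degrees(np.arctan2(x, -y))  # assumption: central meridian at 0 degrees
    return lat, lon


# Hypothetical 5x5 grid of meter offsets centered on the pole
offsets = np.linspace(-2_000_000.0, 2_000_000.0, 5)
x2d, y2d = np.meshgrid(offsets, offsets)  # both arrays have dims (y, x)
lat2d, lon2d = polar_stereo_to_latlon(x2d, y2d)

print(lat2d.shape)  # one latitude per pixel
print(lat2d[0])     # latitude changes along a single row
print(lon2d[0])     # and so does longitude
```

In the real files the same role is played by the 1152x1152 `lat` and `lon` variables, which come from the GeoTIFF's own coordinate reference system rather than from this simplified formula.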

In [3]:
import datetime as dt
import glob
import os
import pdb
import urllib.request

import cartopy.crs as ccrs
import cftime
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
from netCDF4 import Dataset, date2num
from rasterio.warp import transform

## Generate template file for SH & NH (only needs to be done once)

In [1]:
# # TEMPLATE FILE FOR SH

# # Read the data
# input_path = '/glade/work/tking/cgnet/polar_regridding/data-2003-04-29-02-0-copy-sav1.tif'

# da = xr.open_rasterio(input_path)
# yval=da['y']
# ryval=np.flip(yval)
# # Compute the lon/lat coordinates with rasterio.warp.transform
# ny, nx = len(da['y']), len(da['x'])
# x, y = np.meshgrid(da['x'], ryval)
# # Rasterio works with 1D arrays
# lon, lat = transform(da.crs, {'init': 'EPSG:4326'},
#                      x.flatten(), y.flatten())
# lon = np.asarray(lon).reshape((ny, nx))
# lat = np.asarray(lat).reshape((ny, nx))
# da.coords['lon'] = (('y', 'x'), lon)
# da.coords['lat'] = (('y', 'x'), lat)

# da.to_netcdf(path='/glade/work/tking/cgnet/polar_regridding/data-2003-04-29-02-0-sav1-rev-latlon.nc')

In [2]:
# # TEMPLATE FILE FOR NH

# # Read the data
# # input_path = '/glade/work/tking/cgnet/polar_regridding/NH.tif' #nh-rev-latlon.nc
# input_path = '/glade/derecho/scratch/tking/cgnet/NPS/rendered.tif'

# da = xr.open_rasterio(input_path) # may need to change to rioxarray.open_rasterio soon
# yval=da['y']
# ryval=np.flip(yval)
# # Compute the lon/lat coordinates with rasterio.warp.transform
# ny, nx = len(da['y']), len(da['x'])
# x, y = np.meshgrid(da['x'], ryval)
# # Rasterio works with 1D arrays
# lon, lat = transform(da.crs, {'init': 'EPSG:4326'},
#                      x.flatten(), y.flatten())
# lon = np.asarray(lon).reshape((ny, nx))
# lat = np.asarray(lat).reshape((ny, nx))
# da.coords['lon'] = (('y', 'x'), lon)
# da.coords['lat'] = (('y', 'x'), lat)

# da.to_netcdf(path='/glade/work/tking/cgnet/polar_regridding/nh-rev-latlon_2.nc')

## Set up Dask

In [3]:
# Import dask
import dask

# Use dask jobqueue
from dask_jobqueue import PBSCluster

# Import a client
from dask.distributed import Client

# Setup your PBSCluster
nmem='25GB' # specify memory here so it duplicates below
cluster = PBSCluster(
    cores=1, # The number of cores you want
    memory=nmem, # Amount of memory
    processes=1, # How many processes
    queue='casper', # The type of queue to utilize (/glade/u/apps/dav/opt/usr/bin/execcasper)
    local_directory='/glade/derecho/scratch/$USER/local_dask', # Use your local directory
    resource_spec='select=1:ncpus=1:mem='+nmem, # Specify resources
    account='P93300313', # Input your project ID here, previously this was known as 'project', now is 'account'
    walltime='08:00:00', # Amount of wall time
    # interface='ib0', # Interface to use
)

# Scale up
cluster.scale(10)

# Change your url to the dask dashboard so you can see it
dask.config.set({'distributed.dashboard.link':'https://jupyterhub.hpc.ucar.edu/stable/user/{USER}/proxy/{port}/status'})

# Setup your client
client = Client(cluster)

In [4]:
client

## Define dictionaries of file names

In [4]:
tmq_dict = {2000: "prw_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200001010000-200012312100.nc",
            2001: "prw_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200101010000-200112312100.nc",
            2002: "prw_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200201010000-200212312100.nc",
            2003: "prw_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200301010000-200312312100.nc",
            2004: "prw_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200401010000-200412312100.nc"}

ivt_dict = {2000: "windhusavi_3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200001-200012.nc",
            2001: "windhusavi_3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200101-200112.nc",
            2002: "windhusavi_3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200201-200212.nc",
            2003: "windhusavi_3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200301-200312.nc",
            2004: "windhusavi_3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200401-200412.nc"}

psl_dict = {2000: "psl_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200001010000-200012312100.nc",
            2001: "psl_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200101010000-200112312100.nc",
            2002: "psl_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200201010000-200212312100.nc",
            2003: "psl_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200301010000-200312312100.nc",
            2004: "psl_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200401010000-200412312100.nc"}

pr_dict = {2000: "pr_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200001010000-200012312359.nc",
           2001: "pr_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200101010000-200112312359.nc",
           2002: "pr_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200201010000-200212312359.nc",
           2003: "pr_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200301010000-200312312359.nc",
           2004: "pr_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200401010000-200412312359.nc",
           2005: "pr_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200501010000-200512052359.nc"}
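
The file names above follow a regular pattern, so the dictionaries could also be generated rather than typed out. The helper below is only a sketch: `build_file_dict`, `FILENAME_TEMPLATE`, and the `generated_*` names are hypothetical and not part of the original workflow, and the irregular 2005 `pr` file still has to be added by hand.

```python
FILENAME_TEMPLATE = "{variable}_{table}_CAM5-1-025degree_All-Hist_est1_v1-0_run002_{start}-{end}.nc"


def build_file_dict(variable, years, table="A3hr", start="01010000", end="12312100"):
    """Build a {year: filename} dictionary from the naming pattern used above."""
    return {
        year: FILENAME_TEMPLATE.format(
            variable=variable, table=table, start=f"{year}{start}", end=f"{year}{end}"
        )
        for year in years
    }


years = range(2000, 2005)
generated_tmq = build_file_dict("prw", years)
generated_psl = build_file_dict("psl", years)
generated_ivt = build_file_dict("windhusavi", years, table="3hr", start="01", end="12")
generated_pr = build_file_dict("pr", years, end="12312359")
# The 2005 pr file ends on 5 December, so it does not fit the pattern
generated_pr[2005] = "pr_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200501010000-200512052359.nc"

print(generated_tmq[2000])
```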

## Change to 2-dimensional lat/lon before regridding, then regrid using this section below:

In [None]:
# # only do once-- and I think we used the 2dlatlon directory in the end?

# for var in ['ivt']: # pr , 'psl', 'tmq'
#     if var=='pr':
#         dictionary = pr_dict
#     elif var=='psl':
#         dictionary = psl_dict
#     elif var=='ivt':
#         dictionary = ivt_dict
#     elif var=='tmq':
#         dictionary = tmq_dict
#     for year in [2000]: #2001,2002,2003,2004]:
#         ds_before_regrid = xr.open_dataset('/glade/derecho/scratch/tking/cgnet/high_lat_QC/from_nersc/2dlatlon/{}'.format(dictionary[year]))
#         ds_before_regrid
#         # we want this to be two dimensional lat/lon instead of 1

#         # mesh is one useful tool but could do by hand
#         # duplicate lat/lon array for lon/lat number of times
#         # in order to have lat (y,x) and lon (y,x)

#         # y is lat
#         # x is lon
#         # dimensions should be time, x, y, 
#         # follow example here: https://xesmf.readthedocs.io/en/latest/notebooks/Curvilinear_grid.html
#         y_len = ds_before_regrid.lon.shape[0]
#         x_len = ds_before_regrid.lat.shape[0]

#         ds_before_regrid['lat_val'] = (('y','x'), np.tile(ds_before_regrid.lat, (y_len,1)))
#         ds_before_regrid['lon_val'] = (('y','x'), np.transpose(np.tile(ds_before_regrid.lon, (x_len,1))))

#         if var == 'ivt':
#             ds_before_regrid[var] = ds_before_regrid.windhusavi.swap_dims({'lat':'x','lon':'y'})
#         elif var == 'psl':
#             ds_before_regrid[var] = ds_before_regrid.psl.swap_dims({'lat':'x','lon':'y'})
#         elif var == 'tmq':
#             ds_before_regrid[var] = ds_before_regrid.prw.swap_dims({'lat':'x','lon':'y'})
#         elif var == 'pr':
#             ds_before_regrid[var] = ds_before_regrid.pr.swap_dims({'lat':'x','lon':'y'})

#         ds_before_regrid = ds_before_regrid.drop_dims('lat')
#         ds_before_regrid = ds_before_regrid.drop_dims('lon')

#         ds_before_regrid.to_netcdf('/glade/derecho/scratch/tking/cgnet/high_lat_QC/from_nersc/updated_latlon/{}'.format(dictionary[year]))
#         print("done with {} year".format(var), str(year))
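
As a minimal sketch of the 1D-to-2D step described in the comments above, `np.meshgrid` is one way to build the duplicated arrays. The coordinates here are made up and much smaller than the CAM5 grid, and `y` is taken to follow latitude and `x` longitude, as in the comments.

```python
import numpy as np

# Made-up 1D coordinates standing in for the lat/lon axes of a regular grid
lat_1d = np.array([60.0, 70.0, 80.0])         # 3 latitudes
lon_1d = np.array([0.0, 90.0, 180.0, 270.0])  # 4 longitudes

# meshgrid repeats each axis along the other, giving arrays with dims (y, x)
lon_2d, lat_2d = np.meshgrid(lon_1d, lat_1d)

print(lat_2d.shape, lon_2d.shape)  # (3, 4) (3, 4)
print(lat_2d[:, 0])                # latitude varies along y
print(lon_2d[0, :])                # longitude varies along x
```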

## Create SCRIP file

See [details on using curvilinear_to_SCRIP](https://www.ncl.ucar.edu/Document/Functions/ESMF/curvilinear_to_SCRIP.shtml) (example script pasted below)

OR 

[Use ESMF_regrid](https://www.ncl.ucar.edu/Document/Functions/ESMF/ESMF_regrid.shtml)

In [1]:
# curvilinear_to_SCRIP.ncl
# ------
# load "$NCARG_ROOT/lib/ncarg/nclscripts/esmf/ESMF_regridding.ncl"

# begin
# ;---Interpolation methods
#     methods      = (/"bilinear","patch","conserve"/)

# ;---Input file
#     srcFileName  = "nh-rev-latlon.nc"

# ;---Output (and input) files
#     srcGridName  = "np_stereographic_SCRIP.nc"
#     dstGridName  = "nh_dst_SCRIP.nc"

# ;---Get data and lat/lon grid from CMIP5 Grid
#     sfile       = addfile(srcFileName,"r")
#     lat2d = sfile->lat
#     lon2d = sfile->lon
#     latlon_dims = dimsizes(lon2d)
#     Opt         = True
#     curvilinear_to_SCRIP(srcGridName,lat2d,lon2d, Opt)
# end

## Regrid TMQ/IVT/PSL/PR data from lat/lon to polar

In [None]:
# SH SCRIPT

# module load gnu/9.1.0
# module load esmf_libs/8.0.0
# module load esmf-8.0.0-ncdfio-mpi-O
# module load nco/4.7.9

# set srcgrid=f09
# set dstgrid=sp_stereo
# set srcgridfile=/glade/p/cesmdata/inputdata/share/scripgrids/fv0.23x0.31_141008.nc
# set dstgridfile=/glade/u/home/jet/sp_stereographic_SCRIP.nc
# set srcinitfile=prw_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200001010000-200012312100.nc
# set dstinitfile=polar_tmq/prw_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200001010000-200012312100_polar_nearest.nc

# create the map file
# ESMF_RegridWeightGen --ignore_unmapped --src_regional -m neareststod -w map_${srcgrid}_to_${dstgrid}_near.nc -s ${srcgridfile} -d ${dstgridfile}

# submit ncremap scripts below in batch scripts, see example at
#     /glade/derecho/scratch/tking/cgnet/high_lat_QC/from_nersc/2dlatlon/sh_scripts/remap_script and batch_remap.sh

# the above batch script uses the mapfile to remap srcinitfile to dstinitfile as shown below:
# ncremap -m ./map_${srcgrid}_to_${dstgrid}_near.nc -i ${srcinitfile} -o ${dstinitfile}

In [None]:
# # # NH SCRIPT (tcsh)

# module load gnu/9.1.0
# module load esmf_libs/8.0.0
# module load esmf-8.0.0-ncdfio-mpi-O
# module load nco/4.7.9

# set srcgrid=f09
# set dstgrid=np_stereo
# set srcgridfile=/glade/p/cesmdata/inputdata/share/scripgrids/fv0.23x0.31_141008.nc
# set dstgridfile=/glade/derecho/scratch/tking/cgnet/high_lat_QC/from_nersc/2dlatlon/nh_scripts/np_stereographic_SCRIP_2.nc
# set srcinitfile=prw.nc
# set dstinitfile=/glade/derecho/scratch/tking/cgnet/high_lat_QC/from_nersc/2dlatlon/nh_polar/tmq/prw_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200001010000-200012312100_polar_nearest.nc

# # create the map file
# # ESMF_RegridWeightGen --ignore_unmapped --src_regional -m neareststod -w map_${srcgrid}_to_${dstgrid}_near.nc -s ${srcgridfile} -d ${dstgridfile}
# ESMF_RegridWeightGen --ignore_unmapped --src_regional -m neareststod -w map_fv0.23x0.31_to_${dstgrid}_near_2.nc -s ${srcgridfile} -d ${dstgridfile}


# use the mapfile to remap srcinitfile to dstinitfile using batch scripts like those below which include commands such as "ncremap -m ./map_${srcgrid}_to_${dstgrid}_near.nc -i ${srcinitfile} -o ${dstinitfile}"
# submit ncremap scripts in batch scripts with qsub, see example:
#     /glade/derecho/scratch/tking/cgnet/high_lat_QC/from_nersc/2dlatlon/nh_scripts/remap_script and batch_remap.sh
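
The `neareststod` method used above gives each destination cell the value of its nearest source cell. The sketch below illustrates that idea with a brute-force NumPy search on a made-up grid. It is not how ESMF computes or stores its weights, the function names are hypothetical, and the approach is only practical for small grids.

```python
import numpy as np


def to_unit_vectors(lat_deg, lon_deg):
    """Convert lat/lon in degrees to 3D unit vectors on the sphere."""
    lat = np.radians(np.asarray(lat_deg, dtype=float)).ravel()
    lon = np.radians(np.asarray(lon_deg, dtype=float)).ravel()
    return np.column_stack(
        [np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)]
    )


def nearest_source_index(src_lat, src_lon, dst_lat, dst_lon):
    """Return, for each destination point, the flat index of the closest source point."""
    src = to_unit_vectors(src_lat, src_lon)
    dst = to_unit_vectors(dst_lat, dst_lon)
    # The largest dot product corresponds to the smallest great-circle distance
    return np.argmax(dst @ src.T, axis=1)


# Made-up regular source grid and two destination points
src_lon, src_lat = np.meshgrid([0.0, 90.0, 180.0, 270.0], [60.0, 70.0, 80.0])
src_field = np.arange(12.0).reshape(3, 4)
dst_lat = np.array([79.0, 61.0])
dst_lon = np.array([1.0, 269.0])

index = nearest_source_index(src_lat, src_lon, dst_lat, dst_lon)
remapped = src_field.ravel()[index]  # copy the nearest source value to each destination
print(index, remapped)
```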


In [28]:
# # Make North Polar Stereographic Blue Marble Image (I actually did this in QGIS in the end...)

# import numpy as np
# import matplotlib.pyplot as plt
# import cartopy.crs as ccrs
# from PIL import Image
# from cartopy import img_transform

# # Load the image
# img = Image.open("/glade/work/tking/cgnet/ClimateNet/climatenet/bluemarble_fake/BM_for_NP.jpeg")

# # Define projections
# src_proj = ccrs.PlateCarree()  # Equirectangular input
# dst_proj = ccrs.NorthPolarStereo()  # North Polar Stereographic

# # Create figure
# fig = plt.figure(figsize=(8, 8))
# ax = plt.axes(projection=dst_proj)

# # Set extent for the North Polar Stereographic projection
# ax.set_extent([-180, 180, 45, 90], crs=src_proj)  # Crop to Northern Hemisphere

# # Reproject the image to the new projection using imshow
# ax.imshow(img, origin="upper", transform=src_proj, extent=[-180, 180, -90, 90])

# # Save and show output
# plt.savefig("/glade/work/tking/cgnet/ClimateNet/climatenet/bluemarble_fake/BM_NP.jpeg", dpi=300, bbox_inches="tight")
# plt.show()

In [1]:
# import numpy as np
# import matplotlib.pyplot as plt
# import cartopy.crs as ccrs
# from PIL import Image
# import cartopy.feature as cfeature

# # Load the image
# img = Image.open("/glade/work/tking/cgnet/ClimateNet/climatenet/bluemarble_fake/BM_for_NP.jpeg")

# # Define projections
# src_proj = ccrs.PlateCarree()  # Equirectangular input
# dst_proj = ccrs.NorthPolarStereo()  # North Polar Stereographic

# # Create figure and axis with the dst_proj (polar projection)
# fig = plt.figure(figsize=(8, 8))
# ax = plt.axes(projection=dst_proj)

# # Add geographic features to help visualize the projection
# ax.add_feature(cfeature.LAND)
# ax.add_feature(cfeature.OCEAN)

# # Set extent for the North Polar Stereographic projection
# # Adjust the extent to match the Northern Hemisphere
# ax.set_extent([-180, 180, 45, 90], crs=src_proj)  # Crop to Northern Hemisphere

# # Ensure that the image is transformed correctly
# # Here we need to properly set the extent and use the projection of the image (src_proj)
# ax.imshow(img, origin="upper", transform=src_proj, extent=[-180, 180, -90, 90])

# # Save and show output
# plt.savefig("/glade/work/tking/cgnet/ClimateNet/climatenet/bluemarble_fake/BM_NP.jpeg", dpi=300, bbox_inches="tight")
# plt.show()

## Rename and format regridded files for training and inference

For instance:

1) Rename the lat/lon dimensions so that they are not overwritten: `ncrename -d lon,x -d lat,y VAR/$FILENAME renamed/VAR/$FILENAME`
2) For any files with variables that need renaming:

    `ncrename -v prw,tmq $FILENAME`

    `ncrename -v windhusavi,ivt $FILENAME`
3) `ncks -6 $FILENAME`? Note that `-6` selects the 64-bit offset format, whereas `-7` selects netCDF4 classic. This step may not be necessary; I think it was only done for one variable, and it was skipped in the NH polar process.

Steps to include a file in new_latlon dir:

4) Rename the lon variable to longitude. Ideally this would be done with `ncrename -v lon,longitude $FILENAME`, but that fails, so use the following steps instead:

    `ncap2 -s 'longitude=lon' $FILENAME ${FILENAME}_tmp.nc`

    `ncks -C -x -v lon ${FILENAME}_tmp.nc ${FILENAME}_renamed.nc`
5) Rename the lat variable to latitude: `ncrename -v lat,latitude $FILENAME`
6) Rename the dimensions back to lat/lon once there is no longer a risk of overwriting: `ncrename -d x,lon -d y,lat $FILENAME` (this may have been skipped for psl)

7) Flip horizontally with: `python /glade/derecho/scratch/tking/cgnet/high_lat_QC/from_nersc/2dlatlon/nh_polar/renamed/ivt/flip_image.py <INPUT FILE> <OUTPUT_FILE>`

8) Apply any necessary cropping with: /glade/derecho/scratch/tking/cgnet/high_lat_QC/from_nersc/2dlatlon/nh_polar/renamed/ivt/zoom.py (use 450)

9) Split files:
   `cdo splitsel,1 $FILENAME YEAR_split_`
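
`flip_image.py` is not reproduced in this notebook. As a rough illustration of what a horizontal flip of a `(time, y, x)` field involves, and assuming the flip is simply a reversal of the `x` axis, a NumPy sketch might look like this (the array is made up):

```python
import numpy as np

# Made-up field with dims (time, y, x)
field = np.arange(2 * 2 * 3).reshape(2, 2, 3)

# Reverse the last (x) axis only; time and y are left untouched
flipped = np.flip(field, axis=-1)

print(field[0, 0])    # [0 1 2]
print(flipped[0, 0])  # [2 1 0]
```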

In [8]:
# Check a file:
pr_ds = xr.open_dataset('/glade/derecho/scratch/tking/cgnet/high_lat_QC/from_nersc/2dlatlon/nh_polar/renamed/pr/new_latlon/pr_A3hr_CAM5-1-025degree_All-Hist_est1_v1-0_run002_200001010000-200012312359.nc')
pr_ds

## Generate combined mask/underlying data files by creating netcdf from scratch

### Run the next cell if temp.nc (the temporary file to fill) already exists -- SPECIFY ANTARCTIC OR ARCTIC!

In [36]:
rm /glade/u/home/tking/work/cgnet/QA_xml/all_arctic_converted_masks/temp.nc

### Note on time adjustment
The bug fix included in the cells below is needed for the first two rounds of QC'd data; the issue has been fixed in the masks of subsequent datasets.

### Create temp file with correct attributes

In [27]:
# SET THESE THINGS:

# is temp file for inference or not?
inference=False
region = 'arctic' # 'antarctic'

round_val = 2  # change this value to indicate whether or not bug fix from round 1 is used
year_val = 2004

In [28]:
if region == 'antarctic':
    temp_file = '/glade/work/tking/cgnet/polar_regridding/data-2003-04-29-02-0-sav1-rev-latlon.nc'
    if not inference:
        ncfile = Dataset(f'/glade/u/home/tking/work/cgnet/QA_xml/all_antarctic_converted_masks/{year_val}_round{round_val}.nc',mode='w',format='NETCDF4_CLASSIC') 

if region == 'arctic':
    temp_file = '/glade/work/tking/cgnet/polar_regridding/nh.nc'
    if not inference:
        ncfile = Dataset(f'/glade/u/home/tking/work/cgnet/QA_xml/all_arctic_converted_masks/{year_val}_round{round_val}.nc',mode='w',format='NETCDF4_CLASSIC') 
    if inference:
        ncfile = Dataset(f'/glade/u/home/tking/work/cgnet/QA_xml/all_arctic_converted_masks/inference.nc',mode='w',format='NETCDF4_CLASSIC') 

temp = xr.open_dataset(temp_file)

# Create dimensions
y_dim = ncfile.createDimension('y', 1152)        # vertical displacement axis
x_dim = ncfile.createDimension('x', 1152)        # horizontal displacement axis
time_dim = ncfile.createDimension('time', None)  # unlimited axis (can be appended to)
sample_id_dim = ncfile.createDimension('sample_id', 6)

# Include time variable and relevant attributes
time = ncfile.createVariable('time', np.float64, ('time', ))
time.units = 'hours since 1970-01-01'
time.calendar = 'noleap'
time.long_name = 'time'

# Include date information
date = ncfile.createVariable('date', np.float64, ('time', ))
date.long_name = 'current date'
datesec = ncfile.createVariable('datesec', np.float64, ('time', ))
datesec.long_name = 'current seconds of current date'

if not inference:
    # Include ar_mask variable and relevant attributes
    ar_mask = ncfile.createVariable('ar_mask', np.float64, ('time', 'sample_id', 'y', 'x'))
    ar_mask.long_name = "Atmospheric River Mask"
    ar_mask.standard_name = "AR flag"
    ar_mask.flag_values = 0, 1
    ar_mask.flag_meanings = "Background Atmospheric_River"

# Include underlying data
tmq = ncfile.createVariable('tmq', np.float64,('time', 'y', 'x'))
ivt = ncfile.createVariable('ivt', np.float64,('time', 'y', 'x'))
psl = ncfile.createVariable('psl', np.float64,('time', 'y', 'x'))
pr = ncfile.createVariable('pr', np.float64,('time', 'y', 'x'))

# include y, x, lat, & lon from temp file
y = ncfile.createVariable('y', np.float64,('y'))
y.long_name = 'vertical offset from pole'
y.units = 'meters'

x = ncfile.createVariable('x',np.float64,('x'))
x.long_name = 'horizontal offset from pole'
x.units = 'meters'

lat = ncfile.createVariable('lat', np.float64,('y','x'))
lat.units = 'degrees_north'
lat.long_name = 'latitude'

lon = ncfile.createVariable('lon', np.float64,('y','x'))
lon.units = 'degrees_east'
lon.long_name = 'longitude'

# Add the y, x, lat, & lon data values to the netcdf file
ncfile['y'][:] = temp.y
ncfile['x'][:] = temp.x
ncfile['lat'][:,:] = temp.lat
ncfile['lon'][:,:] = temp.lon

# Copy temp file metadata onto each underlying data variable
template_var = temp.__xarray_dataarray_variable__
for data_var in (ivt, tmq, psl, pr):
    data_var.transform = template_var.transform
    data_var.crs = template_var.crs
    data_var.coordinates = 'lat lon'

# Copy global metadata
ncfile.transform = template_var.transform
ncfile.crs = template_var.crs
ncfile.res = template_var.res
ncfile.nodatavals = template_var.nodatavals
ncfile.scales = template_var.scales
ncfile.offsets = template_var.offsets
ncfile.AREA_OR_POINT = template_var.AREA_OR_POINT
ncfile.coordinates = "lat lon"

### Loop through mask files and underlying data files; add both to temp file

My process has been to adjust the year in the for loop, move temp.nc to a new name, check the results, and then rerun for a different year. Alternatively, temp.nc could be named after the year when it is created above, which avoids renaming the files afterwards.

In [29]:
if not inference:  # ADD MASKS AND UNDERLYING DATA TO NCFILE FOR TRAINING AND TESTING
    print('starting at {}'.format(dt.datetime.now()))
    if region == 'antarctic':
        directory_of_underlying_data = "/glade/derecho/scratch/tking/cgnet/high_lat_QC/from_nersc/2dlatlon/sh_polar/renamed/"
    elif region == 'arctic':
        directory_of_underlying_data = "/glade/derecho/scratch/tking/cgnet/high_lat_QC/from_nersc/2dlatlon/nh_polar/renamed/"
    time_index = -1
    

    # The years below correspond to the mask's listed years (ie, incorrect years from round 1)
    # For processing data, I recommend running one year at a time and then renaming temp.nc to match whatever that year is
    for year in [year_val]:
        # gather data from a particular round of QC
        # some of these required different data processing steps which are adjusted based on the round_val below
        # the mask paths differ only by round number and region
        mask_file_list = sorted(glob.glob(
            '/glade/work/tking/cgnet/QA_xml/round_{}/h5/qa*/{}/netcdfs/data-{}-*'.format(round_val, region, year)))

        if region == 'arctic' and round_val == 2:
            round_val = 1  # Treat arctic round 2 as round 1 for years to line up!
        
        if round_val == 1:
            shifted_year = year - 1
        else:
            shifted_year = year

        # Bug fix from 2000 data being pulled in previously:
        if round_val == 1:
            shifted_year = 2000

        # get underlying data
        tmq_ds = xr.open_dataset(directory_of_underlying_data+'tmq/{}'.format(tmq_dict[shifted_year]))
        psl_ds = xr.open_dataset(directory_of_underlying_data+'psl/{}'.format(psl_dict[shifted_year]))
        ivt_ds = xr.open_dataset(directory_of_underlying_data+'ivt/{}'.format(ivt_dict[shifted_year]))
        pr_ds = xr.open_dataset(directory_of_underlying_data+'pr/{}'.format(pr_dict[shifted_year]))

        # loop through mask files, get corresponding underlying data, and add to temporary file
        for mask_file in mask_file_list[:]:
            time = mask_file.split('/')[-1].split('data-')[1].split('.nc')[0].split('-00-2')[0].split('_')[0]
            time_year = int(time.split('-')[0])  # or = year

            # For round 1, assume that the nc file is ~named~ 2001, but underlying data is 2000
            if round_val == 1:
                time_year = 2001

            time_month = int(time.split('-')[1])
            time_day = int(time.split('-')[2])
            if round_val == 1 or round_val == 2:
                time_hour = 22  # all files were 00
                time_mins = 30
            if round_val == 3:
                time_hour = int(time.split('-')[4])*3-2  # 1.5 hours off of time provided... so subtract 2 hr and add a half hour
                time_mins = 30
                if time_year == 2000:
                    if time_month >= 3:  # Account for leap day in march and later months for year 2000
                        # if time_day < 31 or 30:
                        time_day = time_day+1
                        # if time_day==31 or 30:
                        #     time_day = 1
                        #     time_month = time_month +1
                        # we would need a slightly more robust way of doing this if there were masks incorrectly labelled from the last day of the month,
                        #     but this works for now
            try:
                date_number = date2num(dt.datetime(time_year, time_month, time_day, time_hour, time_mins), 'hours since 1970-01-01')
            except ValueError:
                # dt.datetime raises ValueError for impossible dates; skip masks labelled June 31
                if time_month == 6 and time_day == 31:
                    continue
            # ---------------------------------------------------------------------------------
            # Fix for indexing bugs is now in all_chey_arctic.ipynb and all_chey_antarctic.ipynb,
            # So this fix will not be needed after round 1 data is processed.
            # In this fix, if data is from 2000, adjust days by 4, otherwise, adjust days by 5 (to account for leap days)
            if round_val == 1:
                if time_year == 2000:
                    leap_year_adjustment = 4
                if time_year >= 2001:
                    leap_year_adjustment = 5
                if time_month in [1, 3, 5, 7, 8, 10, 12]:  # months with 31 days
                    days_in_month = 31
                    # adjust year for indexing issue unless last few days in file (because of leap year, these got included in correct file)
                    if time_month == 12 and time_day < (days_in_month - leap_year_adjustment):
                        time_year = time_year - 1
                    else:
                        time_year = time_year - 1
                    if time_day < (days_in_month - leap_year_adjustment):
                        time_day = time_day + leap_year_adjustment
                    else:
                        time_day = ((time_day + leap_year_adjustment) - days_in_month) + 1  # add 4 days for leap year, subtract days in month, add 1 because month starts at 1 not 0.
                        time_month += 1  # use one of the first few days in the next month
                        if time_month == 13:  # no 13th month, so loop back to January
                            time_month = 1
                elif time_month in [4, 6, 9, 11]:  # months with 30 days
                    time_year = time_year - 1
                    days_in_month = 30
                    if time_day < (days_in_month - leap_year_adjustment):
                        time_day = time_day + leap_year_adjustment
                    else:
                        time_day = ((time_day + leap_year_adjustment) - days_in_month) + 1
                        time_month += 1  # use one of the first few days in the next month
                elif time_month == 2:
                    time_year = time_year - 1
                    days_in_month = 28
                    if time_day < (days_in_month - leap_year_adjustment):
                        time_day = time_day + leap_year_adjustment
                    else:
                        time_day = ((time_day + leap_year_adjustment) - days_in_month) + 1
                        time_month += 1  # use one of the first few days in the next month
            # ---------------------------------------------------------------------------------

            # format strings for use in netcdf date attribute
            time_m_formatted = f'{time_month:02d}'  # zero-pad to two digits
            time_d_formatted = f'{time_day:02d}'

            # Some files (mostly 2002) start at _0 and sometimes just end with -2 and if repeat date, include _1.nc
            # only increase time index if moving to a new time; otherwise additional masks can be in a new sample_id of the same time index
            if mask_file[-5:-4] == '_':
                sample_id = int(mask_file[-4:-3])
            else:
                sample_id = 0

            if sample_id == 0:
                time_index += 1

            qa_aa_ds = xr.open_dataset(mask_file)

            # reorientation needed for viewing with matplotlib
            qa_aa_ds.ar_masks['phony_dim_0'] = qa_aa_ds.ar_masks['phony_dim_0'][::-1]
            qa_aa_ds.reindex(phony_dim_0=list(reversed(qa_aa_ds.phony_dim_0)))

            # fill in date and datesec
            date[time_index] = str(time_year)+time_m_formatted+time_d_formatted
            if round_val == 1 or round_val == 2:
                datesec[time_index] = '81000'  # corresponds to 22:30
            if round_val == 3:
                datesec[time_index] = str((3600*time_hour)+(60*time_mins))
            # fill in ar_mask
            ar_mask[time_index, sample_id, :, :] = qa_aa_ds.ar_masks
            # get data arrays for underlying data
            pr = xr.DataArray(pr)
            psl = xr.DataArray(psl)
            tmq = xr.DataArray(tmq)
            ivt = xr.DataArray(ivt)

            # fill in underlying data parts of temporary ncfile
            if region == 'antarctic':
                if round_val == 1 or round_val == 2:
                    ncfile['pr'][time_index, :, :] = pr_ds.sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, time_hour, time_mins, 0, 0, has_year_zero=True), method='nearest').isel(height=0).to_array().dropna('variable').dropna('nvertices').dropna('nb2')[5,:,:,1,1]
                    ncfile['psl'][time_index, :, :] = psl_ds.sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, time_hour, time_mins, 0, 0, has_year_zero=True), method='nearest').to_array().dropna('variable').dropna('nvertices').dropna('nbnd')[5,:,:,1,1]
                    ncfile['tmq'][time_index, :, :] = tmq_ds.sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, time_hour, time_mins, 0, 0, has_year_zero=True), method='nearest').to_array().dropna('variable').dropna('nvertices').dropna('nbnd')[5,:,:,1,1]
                    ncfile['ivt'][time_index, :, :] = ivt_ds.sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, time_hour, time_mins, 0, 0, has_year_zero=True), method='nearest').to_array().dropna('variable').dropna('nvertices').dropna('bound')[7,:,:,1,1]
                if round_val == 3:
                    ncfile['pr'][time_index, :, :] = pr_ds.sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, time_hour, time_mins, 0, 0, has_year_zero=True), method='nearest').isel(height=0).to_array().dropna('variable').dropna('nvertices').dropna('nb2')[3,:,:,1,1]
                    ncfile['psl'][time_index, :, :] = psl_ds.sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, time_hour, time_mins, 0, 0, has_year_zero=True), method='nearest').to_array().dropna('variable').dropna('nvertices').dropna('nbnd')[3,:,:,1,1]
                    ncfile['tmq'][time_index, :, :] = tmq_ds.sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, time_hour, time_mins, 0, 0, has_year_zero=True), method='nearest').to_array().dropna('variable').dropna('nvertices').dropna('nbnd')[3,:,:,1,1]
                    ncfile['ivt'][time_index, :, :] = ivt_ds.sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, time_hour, time_mins, 0, 0, has_year_zero=True), method='nearest').to_array().dropna('variable').dropna('nvertices').dropna('bound')[6,:,:,1,1]
            if region == 'arctic':
                try:
                    ncfile['pr'][time_index, :, :] = pr_ds['pr'].sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, time_hour, time_mins, 0, 0, has_year_zero=True)).values
                except Exception:  # exact-time lookup failed; drop duplicate times and fall back to the nearest time
                    _, index = np.unique(pr_ds['time'], return_index=True)
                    pr_ds = pr_ds.isel(time=index)
                    ncfile['pr'][time_index, :, :] = pr_ds['pr'].sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, time_hour, time_mins, 0, 0, has_year_zero=True),method='nearest').values
                try:
                    ncfile['psl'][time_index, :, :] = psl_ds['psl'].sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, time_hour, time_mins, 0, 0, has_year_zero=True)).values
                except Exception:  # exact-time lookup failed; drop duplicate times and fall back to the nearest time
                    _, index = np.unique(psl_ds['time'], return_index=True)
                    psl_ds = psl_ds.isel(time=index)
                    ncfile['psl'][time_index, :, :] = psl_ds['psl'].sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, time_hour, time_mins, 0, 0, has_year_zero=True), method='nearest').values
                try:
                    ncfile['tmq'][time_index, :, :] = tmq_ds['tmq'].sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, time_hour, time_mins, 0, 0, has_year_zero=True)).values
                except Exception:  # exact-time lookup failed; drop duplicate times and fall back to the nearest time
                    _, index = np.unique(tmq_ds['time'], return_index=True)
                    tmq_ds = tmq_ds.isel(time=index)
                    ncfile['tmq'][time_index, :, :] = tmq_ds['tmq'].sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, time_hour, time_mins, 0, 0, has_year_zero=True), method='nearest').values
                try:
                    ncfile['ivt'][time_index, :, :] = ivt_ds['ivt'].sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, time_hour, time_mins, 0, 0, has_year_zero=True)).values    
                except Exception:  # exact-time lookup failed; drop duplicate times and fall back to the nearest time
                    _, index = np.unique(ivt_ds['time'], return_index=True)
                    ivt_ds = ivt_ds.isel(time=index)
                    ncfile['ivt'][time_index, :, :] = ivt_ds['ivt'].sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, time_hour, time_mins, 0, 0, has_year_zero=True), method='nearest').values
            print('all data added for {}'.format(str(time_year)+' '+time_m_formatted+' '+time_d_formatted))

            # time reported on netcdf frame:
            time_val = cftime.DatetimeNoLeap(time_year, time_month, time_day, time_hour, time_mins, 0, 0, has_year_zero=True) - cftime.DatetimeNoLeap(1970, 1, 1, 0, 0, 0, 0, has_year_zero=True)
            ncfile['time'][time_index] = (time_val.days * 24) + (time_val.seconds / 3600)

            # print(time_m_formatted)
            # if time_m_formatted == '04':  # TODO: BREAK FOR TESTING PURPOSES on 4th layer- REMOVE ME WHEN ACTUALLY RUNNING!!!
            #     # ncfile.close()
            #     break

        # write netcdf
        ncfile.close()
    print('wrote netcdf at {}'.format(dt.datetime.now()))

starting at 2025-05-08 10:55:26.640462
wrote netcdf at 2025-05-08 10:55:26.697333
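
The `time` and `datesec` values written by the cell above are plain arithmetic on a 365-day (no-leap) calendar. The sketch below reproduces that arithmetic without `cftime` so the numbers can be checked by hand. The function names and the `DAYS_PER_MONTH` table are illustrative assumptions and are not part of the notebook.

```python
# Days in each month of a 365-day "noleap" calendar (February always has 28 days).
DAYS_PER_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def noleap_hours_since_1970(year, month, day, hour=0, minute=0):
    """Hours elapsed since 1970-01-01 00:00 in a calendar with no leap years."""
    days = (year - 1970) * 365 + sum(DAYS_PER_MONTH[:month - 1]) + (day - 1)
    return days * 24 + hour + minute / 60


def seconds_into_day(hour, minute):
    """Seconds since midnight, as stored in the datesec variable."""
    return 3600 * hour + 60 * minute


print(noleap_hours_since_1970(2000, 1, 1, 1, 30))  # 262801.5
print(seconds_into_day(22, 30))                    # 81000
```

The second value matches the `'81000'` used for 22:30 in rounds 1 and 2.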


Once all of these files are created, run split_script via batch_split.sh. This creates one file per timestep.

Then run the NCO commands below on every split file (where `$i` is the filename):

```
foreach i (*.nc)
    echo $i
    # Run the command(s) below here, in chunks as shown (one or a few commands at a time)
end

ncrename -O -v ar_mask,LABELS $i
ncatted -O -a coordinates,ivt,d,, $i
ncatted -O -a coordinates,tmq,d,, $i
ncatted -O -a coordinates,pr,d,, $i
ncatted -O -a coordinates,psl,d,, $i

ncap2 -s "LABELS=int(LABELS)" $i labels/$i
    # Use straight double quotes ("), not curly quotes

cd labels

ncrename -v lon,longitude -v lat,latitude $i

ncrename -v x,lon -v y,lat $i
ncrename -d x,lon -d y,lat $i

ncatted -O -a flag_meanings,LABELS,d,, $i
ncatted -O -a flag_values,LABELS,d,, $i
ncatted -O -a long_name,LABELS,d,, $i
ncatted -O -a standard_name,LABELS,d,, $i
ncatted -a description,LABELS,c,c,"0: Background, 1: Atmospheric_River" $i
    # Use straight double quotes ("), not curly quotes

ncks -d sample_id,0 $i sample_0/$i
    # Repeat for sample_id 1 through 5 (sample_1/ ... sample_5/)
    # Determine which files are all zeroes and remove them (these masks were not written)
    #     E.g., for sample_id 1: ls ../../../../../../round_3/h5/qa1/antarctic/*_1.h5
    #     ls ../../../../../../round_3/h5/qa1/antarctic/*-?.h5
    #     Find which dates correspond with which file number
    #     Move those to a keep dir
    #     Remove everything else and then remove keep dir
ncap2 -s "tmq=float(tmq)" -s "latitude=float(latitude)" -s "longitude=float(longitude)" -s "pr=float(pr)" -s "psl=float(psl)" -s "ivt=float(ivt)" -s "datesec=float(datesec)" -s "time=float(time)" -s "date=float(date)" $i ncap/$i
ncwa -O -a sample_id $i $i

ncpdq -M dbl_flt "$i" "../$i"

```
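
Because the same commands run on every split file, it can help to generate them from Python rather than retype them. The sketch below only builds the argument lists for the first chunk of commands (rename `ar_mask`, delete the `coordinates` attributes, cast `LABELS` to int) and prints them. The helper name `first_pass_commands` and the `UNDERLYING_VARS` list are assumptions for illustration. To actually run a command, each list could be passed to `subprocess.run(cmd, check=True)` on a system where NCO is installed and the `labels/` directory exists.

```python
import shlex

# Underlying-data variables whose "coordinates" attribute is deleted.
UNDERLYING_VARS = ["ivt", "tmq", "pr", "psl"]


def first_pass_commands(filename):
    """Return the first chunk of NCO commands for one file as argument lists."""
    commands = [["ncrename", "-O", "-v", "ar_mask,LABELS", filename]]
    for var in UNDERLYING_VARS:
        commands.append(["ncatted", "-O", "-a", f"coordinates,{var},d,,", filename])
    # Passing arguments as a list avoids the shell-quoting problem entirely.
    commands.append(["ncap2", "-s", "LABELS=int(LABELS)", filename, f"labels/{filename}"])
    return commands


for cmd in first_pass_commands("example.nc"):
    print(shlex.join(cmd))
```

Building argument lists instead of shell strings sidesteps the curly-quote issue noted in the block above.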

Finally, randomly split the files into test and train directories (20% test, 80% train).
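
One way to do the random split is sketched below. It is a minimal example and not the procedure actually used for this dataset: the function name, the `test`/`train` subdirectory names, and the fixed seed are all assumptions.

```python
import random
import shutil
from pathlib import Path


def split_train_test(source_dir, test_fraction=0.2, seed=0):
    """Move *.nc files in source_dir into test/ and train/ subdirectories.

    Returns (number of test files, number of train files).
    """
    source = Path(source_dir)
    files = sorted(source.glob("*.nc"))
    # A seeded shuffle makes the split reproducible.
    random.Random(seed).shuffle(files)
    n_test = round(len(files) * test_fraction)

    test_dir = source / "test"
    train_dir = source / "train"
    test_dir.mkdir(exist_ok=True)
    train_dir.mkdir(exist_ok=True)

    for index, path in enumerate(files):
        target = test_dir if index < n_test else train_dir
        shutil.move(str(path), str(target / path.name))
    return n_test, len(files) - n_test


# Example (hypothetical directory name):
# split_train_test("ncap")
```

Fixing the seed keeps the split stable if the step has to be rerun. Pass a different `seed` to draw a different split.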

## Make INFERENCE dataset (unlabeled; no masks included)
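
The cell in this section selects the timestamps of a year that have no labeled mask. A minimal, standalone sketch of that selection is shown here. It assumes labeled timestamps are strings of the form `YYYY-MM-DD-H0`, with hour codes `10` through `80` and days 1 through 28, as in the cell. The function name `unlabeled_timestamps` is hypothetical.

```python
def unlabeled_timestamps(year, labeled):
    """Return every candidate timestamp of the year that is not already labeled."""
    labeled = set(labeled)  # set membership is fast and ignores duplicates
    candidates = [
        f"{year}-{month:02d}-{day:02d}-{hour}0"
        for month in range(1, 13)
        for day in range(1, 29)   # days 1-28 exist in every month
        for hour in range(1, 9)   # hour codes '10' ... '80'
    ]
    return [stamp for stamp in candidates if stamp not in labeled]


remaining = unlabeled_timestamps(2000, ["2000-01-01-10", "2000-01-02-30"])
print(len(remaining))  # 12 * 28 * 8 - 2 = 2686
```

Each candidate is tested individually against the labeled set, so only timestamps without a mask remain.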

In [None]:
if inference:
    print('starting at {}'.format(dt.datetime.now()))
    directory_of_underlying_data = "/glade/derecho/scratch/tking/cgnet/high_lat_QC/from_nersc/2dlatlon/sh_polar/renamed/"
    time_index = -1
    round_val = 3 # change this value to indicate whether or not bug fix from round 1 is used

    # The years below correspond to the mask's listed years (ie, incorrect years from round 1)
    # For processing data, I recommend running one year at a time and then renaming temp.nc to match whatever that year is
    for year in [2000]:
        if round_val == 3:
            mask_file_list = sorted(glob.glob('/glade/work/tking/cgnet/QA_xml/round_3/h5/qa*/antarctic/netcdfs/data-{}-*'.format(year)))
            train_or_test_dates = []
            for mask in mask_file_list:
                train_or_test_dates.append(mask.split('/')[-1].split('data-')[1].split('.nc')[0].split('-00-2')[0].split('_')[0])
            shifted_year = year

        # determine list of all possible dates and then remove dates that were previously used
        possible_dates = []
        for month in range(1, 13):
            for day in range(1, 29):  # days 1-28 only, which exist in every month
                for hour in range(1, 9):
                    # hour codes are '10', '20', ..., '80'; month and day are zero-padded
                    possible_dates.append(f'data-{year}-{month:02d}-{day:02d}-{hour}0')
        not_in_labeled_dates = []
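        for specific_time in possible_dates:
            # test each candidate on its own; train_or_test_dates entries are assumed to lack the 'data-' prefix
            if specific_time.split('data-')[1] not in train_or_test_dates:
                not_in_labeled_dates.append(specific_time)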

        # get underlying data
        tmq_ds = xr.open_dataset(directory_of_underlying_data+'tmq/{}'.format(tmq_dict[shifted_year]))
        psl_ds = xr.open_dataset(directory_of_underlying_data+'psl/{}'.format(psl_dict[shifted_year]))
        ivt_ds = xr.open_dataset(directory_of_underlying_data+'ivt/{}'.format(ivt_dict[shifted_year]))
        pr_ds = xr.open_dataset(directory_of_underlying_data+'pr/{}'.format(pr_dict[shifted_year]))

        # loop through times, get corresponding underlying data, and add to temporary file
        for time in not_in_labeled_dates:
            time = time.split('-')[1:]
            print(time)
            time_year = int(time[0])

            time_month = int(time[1])
            time_day = int(time[2])
            if round_val==3:
                time_hour = int(time[3][0])*3-2  # 1.5 hours off of time provided... so subtract 2 hr and add a half hour
                time_mins = 30
                if time_year == 2000:
                    if time_month >= 3:  # Account for leap day in march and later months for year 2000
                        # if time_day < 31 or 30:
                        time_day = time_day+1

            date_number = date2num(dt.datetime(time_year, time_month, time_day, time_hour, time_mins), 'hours since 1970-01-01')

            # ---------------------------------------------------------------------------------

            # format strings for use in netcdf date attribute
            time_m_formatted = f'{time_month:02d}'  # zero-pad to two digits
            time_d_formatted = f'{time_day:02d}'

            # the inference set has no masks, so every timestamp gets its own time index
            sample_id = 0
            time_index += 1

            # fill in date and datesec
            date[time_index] = str(time_year)+time_m_formatted+time_d_formatted
            if round_val == 1 or round_val == 2:
                datesec[time_index] = '81000'  # corresponds to 22:30
            if round_val == 3:
                datesec[time_index] = str((3600*time_hour)+(60*time_mins))
            # get data arrays for underlying data
            pr = xr.DataArray(pr)
            psl = xr.DataArray(psl)
            tmq = xr.DataArray(tmq)
            ivt = xr.DataArray(ivt)

            # fill in underlying data parts of temporary ncfile
            if round_val == 1 or round_val == 2:
                ncfile['pr'][time_index, :, :] = pr_ds.sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, time_hour, time_mins, 0, 0, has_year_zero=True), method='nearest').isel(height=0).to_array().dropna('variable').dropna('nvertices').dropna('nb2')[5,:,:,1,1]
                ncfile['psl'][time_index, :, :] = psl_ds.sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, time_hour, time_mins, 0, 0, has_year_zero=True), method='nearest').to_array().dropna('variable').dropna('nvertices').dropna('nbnd')[5,:,:,1,1]
                ncfile['tmq'][time_index, :, :] = tmq_ds.sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, time_hour, time_mins, 0, 0, has_year_zero=True), method='nearest').to_array().dropna('variable').dropna('nvertices').dropna('nbnd')[5,:,:,1,1]
                ncfile['ivt'][time_index, :, :] = ivt_ds.sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, time_hour, time_mins, 0, 0, has_year_zero=True), method='nearest').to_array().dropna('variable').dropna('nvertices').dropna('bound')[7,:,:,1,1]
            if round_val==3:
                ncfile['pr'][time_index, :, :] = pr_ds.sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, time_hour, time_mins, 0, 0, has_year_zero=True), method='nearest').isel(height=0).to_array().dropna('variable').dropna('nvertices').dropna('nb2')[3,:,:,1,1]
                ncfile['psl'][time_index, :, :] = psl_ds.sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, time_hour, time_mins, 0, 0, has_year_zero=True), method='nearest').to_array().dropna('variable').dropna('nvertices').dropna('nbnd')[3,:,:,1,1]
                ncfile['tmq'][time_index, :, :] = tmq_ds.sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, time_hour, time_mins, 0, 0, has_year_zero=True), method='nearest').to_array().dropna('variable').dropna('nvertices').dropna('nbnd')[3,:,:,1,1]
                ncfile['ivt'][time_index, :, :] = ivt_ds.sel(time=cftime.DatetimeNoLeap(time_year, time_month, time_day, time_hour, time_mins, 0, 0, has_year_zero=True), method='nearest').to_array().dropna('variable').dropna('nvertices').dropna('bound')[6,:,:,1,1]

            print('all data added for {}'.format(str(time_year)+time_m_formatted+time_d_formatted))

            # time reported on netcdf frame:
            time_val = cftime.DatetimeNoLeap(time_year, time_month, time_day, time_hour, time_mins, 0, 0, has_year_zero=True) - cftime.DatetimeNoLeap(1970, 1, 1, 0, 0, 0, 0, has_year_zero=True)
            ncfile['time'][time_index] = (time_val.days * 24) + (time_val.seconds / 3600)
        # write netcdf
        ncfile.close()
    print('wrote netcdf at {}'.format(dt.datetime.now()))

starting at 2025-05-07 16:58:13.060281
['2000', '01', '01', '10']
all data added for 20000101
['2000', '01', '01', '20']
all data added for 20000101
['2000', '01', '01', '30']
all data added for 20000101
['2000', '01', '01', '40']
all data added for 20000101
['2000', '01', '01', '50']
all data added for 20000101
['2000', '01', '01', '60']
all data added for 20000101
['2000', '01', '01', '70']
all data added for 20000101
['2000', '01', '01', '80']
all data added for 20000101
['2000', '01', '02', '10']
all data added for 20000102
['2000', '01', '02', '20']
all data added for 20000102
['2000', '01', '02', '30']
all data added for 20000102
['2000', '01', '02', '40']
all data added for 20000102
['2000', '01', '02', '50']
all data added for 20000102
['2000', '01', '02', '60']
all data added for 20000102
['2000', '01', '02', '70']
all data added for 20000102
['2000', '01', '02', '80']
all data added for 20000102
['2000', '01', '03', '10']
all data added for 20000103
['2000', '01', '03', '20']