Converting .mat files to NetCDF following NSIDC requirements

Required attributes:
* Global NetCDF Attributes
  * "title"
  * "summary"
  * "id" (DOI)
  * "license" = `Access Constraint: These data are freely, openly, and fully accessible, provided that you are logged into your NASA Earthdata profile (https://urs.earthdata.nasa.gov/). Use Constraint: These data are freely, openly, and fully available to use without restrictions, provided that you cite the data according to the recommended citation at https://nsidc.org/about/use_copyright.html. For more information on the NASA EOSDIS Data Use Policy, see https://earthdata.nasa.gov/earth-observation-data/data-use-policy.`
  * "acknowledgment" - funding source, program, grant number e.g. `These data are produced and supported by the NASA National Snow and Ice Data Center Distributed Active Archive Center.`
  * "product_version" e.g. `v1.0`
  * "source" = `Where the data is from etc`
  * "instrument" = `The Name of the Instrument etc` use [Global Change Master Directory](https://earthdata.nasa.gov/earth-observation-data/find-data/idn/gcmd-keywords) standards?
  * "platform" use [Global Change Master Directory](https://earthdata.nasa.gov/earth-observation-data/find-data/idn/gcmd-keywords) standards?
  * "keywords" use [Global Change Master Directory (GCMD) nomenclature](https://wiki.earthdata.nasa.gov/display/CMR/GCMD+Keyword+Access)
  * "references" 
  * "history"
  * "metadata_link" DOI for data product, same as "id"
  * "date_created" date that this NetCDF was created
  * "cdm_data_type" (grid, image, or swath)
  * "processing_level" e.g. Level 3
  * "comment" more info/metadata
* CRS (Coordinate Reference System) Variable - proj4 string, or another format readable by GDAL and other GIS software
  * "crs" - here we are using EPSG:32612, which is a CRS used by other SnowEx2020 datasets collected at Grand Mesa, CO
* Dimension Variables (time, x, y, ...)
  * "standard_name" - use `time`, `projection_x_coordinate`, and `projection_y_coordinate` from [cf conventions](https://cfconventions.org/Data/cf-standard-names/79/build/cf-standard-name-table.html)
  * "coverage_content_type" - use `image`, `coordinate`, `coordinate`
  * "long_name" - e.g. `ANSI date`, `x`, `y`
  * "units" - e.g. `days since 1970-01-01 00:00:00`, `meters`, `meters`
  * "valid_range" - `No valid range specified for time`, `-3850000.0, 3750000.0`, `-5350000.0, 5850000.0`
* Data Variable NetCDF Attributes
  * "_FillValue" e.g. -9999
  * "units" e.g. K
  * "long_name" e.g. Brightness Temperature
  * "standard_name" e.g. brightness_temperature
  * "grid_mapping" e.g. "crs" (name of the crs variable for this data variable)
  * "valid_range" e.g. 0, 9999
  * "flag_values" (leave blank, we don't have flag values)
  * "flag_meanings" (leave blank, we don't have flag values)
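
The checklist above can be turned into a small pre-flight check that runs before a file is written. The sketch below is only an illustration: `REQUIRED_GLOBAL_ATTRS`, `REQUIRED_DATA_VAR_ATTRS`, and `missing_attrs` are hypothetical names, the attribute lists are copied from the checklist rather than from an official NSIDC schema, and the example values are placeholders.

```python
# Attribute names copied from the checklist above (not an official schema).
REQUIRED_GLOBAL_ATTRS = [
    "title", "summary", "id",
    "license",  # spelled "license" in the ACDD attribute conventions
    "acknowledgment", "product_version", "source", "instrument", "platform",
    "keywords", "references", "history", "metadata_link", "date_created",
    "cdm_data_type", "processing_level", "comment",
]
REQUIRED_DATA_VAR_ATTRS = [
    "_FillValue", "units", "long_name", "standard_name", "grid_mapping",
    "valid_range",
]


def missing_attrs(attrs, required):
    """Return the required attribute names that are absent or blank in attrs."""
    missing = []
    for name in required:
        value = attrs.get(name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing


# placeholder global attributes: deliberately incomplete
global_attrs = {
    "title": "placeholder title",
    "summary": "placeholder summary",
    "product_version": "v1.0",
    "cdm_data_type": "grid",
    "processing_level": "Level 3",
}

# data variable attributes using the example values from the checklist
data_var_attrs = {
    "_FillValue": -9999,
    "units": "K",
    "long_name": "Brightness Temperature",
    "standard_name": "brightness_temperature",
    "grid_mapping": "crs",
    "valid_range": [0, 9999],
}

print("missing global attributes:", missing_attrs(global_attrs, REQUIRED_GLOBAL_ATTRS))
print("missing data variable attributes:", missing_attrs(data_var_attrs, REQUIRED_DATA_VAR_ATTRS))
```

Dictionaries like these can be passed as `attrs` when building the `xr.Dataset` and `xr.DataArray` objects later in this notebook, so the check can run on exactly what will be written.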

In [1]:
#!pip install mat73

In [2]:
import numpy as np
from matplotlib import pyplot as plt
import xarray as xr
import rioxarray as rxr
import scipy.io as sio
import datetime
import mat73

---
**Thermal infrared images**

Filepath list of .mat files

In [3]:
ir_files = [r'C:\Users\steve\OneDrive\Documents\School Stuff\UW\Mountain Hydrology Research Group\IR_PLANE_PROCESSED\mat\SNOWEX2020_IR_PLANE_2020Feb08_mosaicked.mat',
            r'C:\Users\steve\OneDrive\Documents\School Stuff\UW\Mountain Hydrology Research Group\IR_PLANE_PROCESSED\mat\SNOWEX2020_IR_PLANE_2020Feb10_mosaicked.mat',
            r'C:\Users\steve\OneDrive\Documents\School Stuff\UW\Mountain Hydrology Research Group\IR_PLANE_PROCESSED\mat\SNOWEX2020_IR_PLANE_2020Feb11_mosaicked.mat',
            r'C:\Users\steve\OneDrive\Documents\School Stuff\UW\Mountain Hydrology Research Group\IR_PLANE_PROCESSED\mat\SNOWEX2020_IR_PLANE_2020Feb12_mosaicked.mat'
            ]

Load a .mat file from the filepath list.

In [4]:
aircraft_ir_data_mat = sio.loadmat(ir_files[0])  

Display the contents of what we have loaded. This is a dictionary containing metadata and numpy arrays of the data variables.

In [5]:
aircraft_ir_data_mat

{'__header__': b'MATLAB 5.0 MAT-file, Platform: MACI64, Created on: Wed Aug 19 18:54:43 2020',
 '__version__': '1.0',
 '__globals__': [],
 'Eutm': array([[218150, 218155, 218160, ..., 238990, 238995, 239000]]),
 'Nutm': array([[4314100],
        [4314105],
        [4314110],
        ...,
        [4331990],
        [4331995],
        [4332000]]),
 'STCtemp': array([[[      nan,       nan,       nan, ...,       nan,       nan,
                nan],
         [      nan,       nan,       nan, ...,       nan,       nan,
                nan],
         [      nan,       nan,       nan, ...,       nan,       nan,
                nan],
         ...,
         [      nan,       nan,       nan, ...,       nan,       nan,
                nan],
         [      nan,       nan,       nan, ...,       nan,       nan,
                nan],
         [      nan,       nan,       nan, ...,       nan,       nan,
                nan]],
 
        [[      nan,       nan,       nan, ...,       nan,       nan,
  

Create an xarray dataset with this data

In [6]:
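# convert time in epoch to isoformat (naive UTC)
# fromtimestamp with an explicit UTC timezone replaces utcfromtimestamp, which is deprecated since Python 3.12
time = [datetime.datetime.fromtimestamp(float(this_time), tz=datetime.timezone.utc).replace(tzinfo=None).isoformat() for this_time in aircraft_ir_data_mat['time'].squeeze()]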


# create dataset
ds = xr.Dataset({
    'STCtemp': xr.DataArray(
                data   = aircraft_ir_data_mat['STCtemp'], # scaled IR temperature data
                dims   = ['y', 'x', 'time'],
                coords = {'x': aircraft_ir_data_mat['Eutm'].squeeze(), # easting coordinates (MGRS-UTM, m)
                          'y': aircraft_ir_data_mat['Nutm'].squeeze(), # northing grid coordinates (MGRS-UTM, m)
                          'time': time}, # time in isoformat

                attrs  = {
                    '_FillValue'  : np.nan,
                    'units'       : 'Celsius',
                    'description' : 'land and snow surface brightness temperature - no emissivity correction',
                    'timezone'    : 'time in UTC'
                    }
                ),
    #'zDEM': xr.DataArray(
    #            data   = aircraft_data_mat['zDEM'], # elevation (meters)
    #            dims   = ['y', 'x'],
    #            coords = {'x': aircraft_data_mat['Eutm'].squeeze(),
    #                      'y': aircraft_data_mat['Nutm'].squeeze()},
    #            attrs  = {
    #                '_FillValue'  : np.nan,
    #                'units'       : 'meters',
    #                'description' : 'digital elevation map used for georectification',
    #                'source'      : 'https://doi.org/10.5066/F7PR7TFT',
    #                'horizontal datum' : 'WGS84',
    #                'vertical datum' : 'EGM96',
    #                'original resolution' : '30 m / 1 arc-second'
    #                }
    #        )
},
        attrs = {'description': 'Airborne thermal infrared imagery from University of Washington Applied Physics Lab Compact Airborne System for Imaging the Environment instrument. Imagery of Grand Mesa, Colorado from SnowEx 2020 field campaign.'}
    )



In [7]:
time

['2020-02-08T15:07:17.946189',
 '2020-02-08T15:16:44.712175',
 '2020-02-08T15:28:32.337137',
 '2020-02-08T15:43:02.133228',
 '2020-02-08T15:55:59.022164',
 '2020-02-08T16:07:54.639167',
 '2020-02-08T18:07:37.808115',
 '2020-02-08T18:19:15.110315',
 '2020-02-08T18:29:16.175215',
 '2020-02-08T18:40:56.807211',
 '2020-02-08T18:50:20.909175',
 '2020-02-08T19:01:09.260000',
 '2020-02-08T19:06:22.280167',
 '2020-02-08T19:18:49.199265',
 '2020-02-08T19:31:35.765195',
 '2020-02-08T19:44:28.325184',
 '2020-02-08T19:56:16.283170']
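
The attribute checklist at the top suggests `days since 1970-01-01 00:00:00` as the units of the time dimension, while the list above holds ISO 8601 strings. As a rough sketch (the helper name `iso_to_days_since_epoch` is hypothetical, and the timestamps are assumed to be naive UTC, as in this notebook), the conversion could look like this:

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)  # naive datetime, interpreted as UTC


def iso_to_days_since_epoch(iso_string):
    """Convert a naive-UTC ISO 8601 string to fractional days since 1970-01-01."""
    moment = datetime.datetime.fromisoformat(iso_string)
    return (moment - EPOCH) / datetime.timedelta(days=1)


# first timestamp from the list above
first = '2020-02-08T15:07:17.946189'
days = iso_to_days_since_epoch(first)
print(days)

# round trip back to a datetime as a sanity check
round_trip = EPOCH + datetime.timedelta(days=days)
print(round_trip.isoformat())
```

In this notebook the `time` coordinate is stored as strings, so a conversion of this kind would need to be applied before a `days since ...` units attribute describes the values correctly.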

Specify our spatial dimensions and coordinate reference system

In [8]:
ds.rio.set_spatial_dims('y', 'x', inplace=True)
ds.rio.write_crs("epsg:26913", inplace=True) # UTM Zone 13N / NAD83
ds = ds.transpose('time', 'y', 'x').rio.reproject("epsg:32612") # epsg:32612 is used throughout the other SnowEx datasets
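
Both EPSG codes in this cell are UTM projections: EPSG:26913 is NAD83 / UTM zone 13N and EPSG:32612 is WGS 84 / UTM zone 12N. UTM zones are 6 degrees wide and the boundary between zones 12 and 13 lies at 108 degrees W, so a site close to that meridian can plausibly be delivered in either zone. The sketch below shows the zone arithmetic only; `utm_zone` and `wgs84_utm_north_epsg` are hypothetical helper names, the longitudes are illustrative values rather than coordinates read from the data, and the special-case zones around Norway and Svalbard are ignored.

```python
import math


def utm_zone(longitude_deg):
    """Return the standard UTM zone number (1-60) for a longitude in degrees east."""
    return int(math.floor((longitude_deg + 180.0) / 6.0)) % 60 + 1


def wgs84_utm_north_epsg(zone):
    """EPSG code for WGS 84 / UTM in the northern hemisphere (326xx series)."""
    return 32600 + zone


# illustrative longitudes on either side of the 108 degrees W zone boundary
for lon in (-108.1, -107.9):
    zone = utm_zone(lon)
    print(lon, zone, wgs84_utm_north_epsg(zone))
```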

For each timestamp, save a separate NetCDF and GeoTIFF file.

In [9]:
for t, this_time in enumerate(time):
    # drop colons and fractional seconds so the timestamp is safe to use in a file name
    time_str = this_time.replace(':','').split('.')[0]
    ds.isel(time=t).to_netcdf('SNOWEX2020_IR_PLANE_2020Feb08_mosaicked_{}.nc'.format(time_str))
    ds.isel(time=t).rio.to_raster('SNOWEX2020_IR_PLANE_2020Feb08_mosaicked_{}.tif'.format(time_str))
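
The loop above hard-codes the `2020Feb08` file name, even though `ir_files` lists four flight days. One way to derive the output name from the input path and the timestamp is sketched below; `output_stem` is a hypothetical helper, and the directory in the example path is a shortened placeholder rather than the real location.

```python
from pathlib import PureWindowsPath


def output_stem(mat_path, iso_time):
    """Build an output file stem from a .mat path and an ISO 8601 timestamp."""
    # PureWindowsPath parses backslash-separated paths on any operating system
    source_stem = PureWindowsPath(mat_path).stem
    # drop colons (not allowed in Windows file names) and fractional seconds
    time_str = iso_time.replace(':', '').split('.')[0]
    return '{}_{}'.format(source_stem, time_str)


# placeholder directory, real file name and last timestamp from this notebook
stem = output_stem(r'C:\data\mat\SNOWEX2020_IR_PLANE_2020Feb08_mosaicked.mat',
                   '2020-02-08T19:56:16.283170')
print(stem + '.nc')
print(stem + '.tif')
```

With a helper like this, the same loop body could be reused for every entry of `ir_files` without editing the file name by hand.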

In [10]:
# spatial ref shouldn't have coords of time??? why is spatial_ref a coordinate??

In [11]:
newds = xr.open_dataset('SNOWEX2020_IR_PLANE_2020Feb08_mosaicked_2020-02-08T195616.nc')

In [12]:
newds

---

**Visible images**

Filepath list of .mat files

In [13]:
vis_files = [r'C:\Users\steve\OneDrive\Documents\School Stuff\UW\Mountain Hydrology Research Group\IR_PLANE_PROCESSED\mat\SNOWEX2020_EO_PLANE_2020Feb08_mosaicked.mat']

Load a .mat file from the filepath list. This file is in a newer MATLAB format that `scipy.io.loadmat` cannot read, so we need to use the [mat73](https://github.com/skjerns/mat7.3) library instead.

In [14]:
aircraft_vis_data_mat = mat73.loadmat(vis_files[0])
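
`scipy.io.loadmat` reads MAT-files up to version 7.2, while version 7.3 files are HDF5-based and need a different reader such as `mat73`. The text header at the start of a file says which kind it is (the `__header__` entry printed for the infrared file begins with `MATLAB 5.0 MAT-file`). The sketch below peeks at that header to pick a loader; `mat_file_version` is a hypothetical helper, and the assumption that 7.3 files begin with `MATLAB 7.3 MAT-file` should be checked against your own files.

```python
import os
import tempfile


def mat_file_version(path):
    """Guess the MAT-file flavour from the descriptive text at the start of the file."""
    with open(path, 'rb') as f:
        header = f.read(128)  # the descriptive text sits within the first 128 bytes
    if header.startswith(b'MATLAB 7.3 MAT-file'):
        return '7.3'  # HDF5-based: read with mat73.loadmat
    if header.startswith(b'MATLAB 5.0 MAT-file'):
        return '5'    # read with scipy.io.loadmat
    return 'unknown'


# demonstrate on two small stand-in files that contain only a header line
with tempfile.TemporaryDirectory() as tmp:
    old_style = os.path.join(tmp, 'v5.mat')
    new_style = os.path.join(tmp, 'v73.mat')
    with open(old_style, 'wb') as f:
        f.write(b'MATLAB 5.0 MAT-file, Platform: MACI64')
    with open(new_style, 'wb') as f:
        f.write(b'MATLAB 7.3 MAT-file, Platform: MACI64')
    detected = [mat_file_version(old_style), mat_file_version(new_style)]

print(detected)
```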

Display the contents of what we have loaded. This is a dictionary containing metadata and numpy arrays of the data variables.

In [15]:
aircraft_vis_data_mat

{'SRGB': array([[[[nan, nan, nan, ..., nan, nan, nan],
          [nan, nan, nan, ..., nan, nan, nan],
          [nan, nan, nan, ..., nan, nan, nan]],
 
         [[nan, nan, nan, ..., nan, nan, nan],
          [nan, nan, nan, ..., nan, nan, nan],
          [nan, nan, nan, ..., nan, nan, nan]],
 
         [[nan, nan, nan, ..., nan, nan, nan],
          [nan, nan, nan, ..., nan, nan, nan],
          [nan, nan, nan, ..., nan, nan, nan]],
 
         ...,
 
         [[nan, nan, nan, ..., nan, nan, nan],
          [nan, nan, nan, ..., nan, nan, nan],
          [nan, nan, nan, ..., nan, nan, nan]],
 
         [[nan, nan, nan, ..., nan, nan, nan],
          [nan, nan, nan, ..., nan, nan, nan],
          [nan, nan, nan, ..., nan, nan, nan]],
 
         [[nan, nan, nan, ..., nan, nan, nan],
          [nan, nan, nan, ..., nan, nan, nan],
          [nan, nan, nan, ..., nan, nan, nan]]],
 
 
        [[[nan, nan, nan, ..., nan, nan, nan],
          [nan, nan, nan, ..., nan, nan, nan],
          [nan,

Create an xarray dataset with this data

In [16]:
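# convert time in epoch to isoformat (naive UTC)
# fromtimestamp with an explicit UTC timezone replaces utcfromtimestamp, which is deprecated since Python 3.12
time = [datetime.datetime.fromtimestamp(float(this_time), tz=datetime.timezone.utc).replace(tzinfo=None).isoformat() for this_time in aircraft_vis_data_mat['time'].squeeze()]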

# scale image color data to integers between 0 and 255 (currently disabled)
#original_max = np.nanmax(aircraft_vis_data_mat['SRGB'])
#original_min = np.nanmin(aircraft_vis_data_mat['SRGB'])
#scaled_RGB_data = (aircraft_vis_data_mat['SRGB'] - original_min) / (original_max - original_min) * 255
#scaled_RGB_data = scaled_RGB_data.astype('int')

# create dataset
ds = xr.Dataset({
    'SRGB': xr.DataArray(
                data   = aircraft_vis_data_mat['SRGB'], # Red, Green, Blue image bands
                dims   = ['y', 'x', 'band', 'time'],
                coords = {'x': aircraft_vis_data_mat['Xs'][0,:], # easting coordinates (MGRS-UTM, m)
                          'y': aircraft_vis_data_mat['Ys'][:,0], # northing grid coordinates (MGRS-UTM, m)
                          'band': ['r', 'g', 'b'], # labels for red, green, blue image bands
                          'time': time}, # time in isoformat

                attrs  = {
                    '_FillValue'  : np.nan,
                    'units'       : 'digital numbers',
                    'description' : 'visible imagery',
                    'timezone'    : 'time in UTC'
                    }
                ),
    #'zDEM': xr.DataArray(
    #            data   = aircraft_data_mat['zDEM'], # elevation (meters)
    #            dims   = ['y', 'x'],
    #            coords = {'x': aircraft_data_mat['Eutm'].squeeze(),
    #                      'y': aircraft_data_mat['Nutm'].squeeze()},
    #            attrs  = {
    #                '_FillValue'  : np.nan,
    #                'units'       : 'meters',
    #                'description' : 'digital elevation map used for georectification',
    #                'source'      : 'https://doi.org/10.5066/F7PR7TFT',
    #                'horizontal datum' : 'WGS84',
    #                'vertical datum' : 'EGM96',
    #                'original resolution' : '30 m / 1 arc-second'
    #                }
    #        )
},
        attrs = {'description': 'Airborne visible imagery from University of Washington Applied Physics Lab Compact Airborne System for Imaging the Environment instrument. Imagery of Grand Mesa, Colorado from SnowEx 2020 field campaign.'}
    )



Specify our spatial dimensions and coordinate reference system

In [17]:
ds.rio.set_spatial_dims('y', 'x', inplace=True)
ds.rio.write_crs("epsg:26913", inplace=True) # UTM Zone 13N / NAD83

For each timestamp, save a separate NetCDF and GeoTIFF file.

In [18]:
for t, this_time in enumerate(time):
    # drop colons and fractional seconds so the timestamp is safe to use in a file name
    time_str = this_time.replace(':','').split('.')[0]
    # note: band=2 selects only the third band ('b', blue), so each output file holds a single band
    ds.isel(time=t,band=2).transpose('y', 'x').rio.reproject("epsg:32612").to_netcdf('SNOWEX2020_EO_PLANE_2020Feb08_mosaicked_{}.nc'.format(time_str))
    _ds = ds.isel(time=t,band=2).SRGB.transpose('y', 'x').rio.reproject("epsg:32612")
    _ds.rio.to_raster('SNOWEX2020_EO_PLANE_2020Feb08_mosaicked_{}.tif'.format(time_str))