The aop_h5refl2array function that NEON provides for opening its h5 files does more than I need, so here I'm going to write a simplified version. I want the function to:
1. Access the file's reflectance data
2. Access the file's metadata
3. Pull out the file's no data value
4. Pull out the file's scale factor
5. Find and replace no data values with NaN
6. Apply the scale factor to the data
7. Match the reflectance arrays (bands) with the corresponding wavelength in a pandas dataframe.

I'm going to build the function step by step and then put it all together.

In [2]:
# Import required packages
import os

import pandas as pd
import earthpy as et
import numpy as np
import h5py
from shapely.geometry import box


# Set working directory
directory_path = os.path.join(et.io.HOME, "earth-analytics")
os.chdir(directory_path)

In [3]:
# Create path to hdf5 file
h5_path = os.path.join("data", "earthpy-downloads", "NEON_D13_NIWO_DP3_454000_4432000_reflectance.h5")

In [4]:
# Read in file
h5 = h5py.File(h5_path, "r")

In [5]:
# Access the reflectance "folder"
niwo_refl = h5["NIWO"]["Reflectance"]

# Print result
print(niwo_refl)

<HDF5 group "/NIWO/Reflectance" (2 members)>


The two members of the HDF5 group /NIWO/Reflectance are Metadata and Reflectance_Data. Let's save the Reflectance_Data dataset as the variable niwo_refl_array, and read its values into refl_raw:

In [6]:
# Assign reflectance array to a variable
niwo_refl_array = niwo_refl["Reflectance_Data"]

# Assign reflectance values to a variable
refl_raw = niwo_refl["Reflectance_Data"][:]

NEON hyperspectral data contain around 426 spectral bands, and when working with tiled data, the spatial dimensions are 1000 x 1000, where each pixel represents 1 meter. Now let's take a look at the wavelength values. First, we will extract wavelength information from the niwo_refl variable that we created:

In [7]:
# Define the wavelengths variable
wavelengths = niwo_refl["Metadata"]["Spectral_Data"]["Wavelength"]

In [8]:
# Extract no data value & scale factor
scale_factor = niwo_refl_array.attrs["Scale_Factor"]
no_data_value = niwo_refl_array.attrs["Data_Ignore_Value"]
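
To try the attribute lookup without the full NEON tile, here is a minimal sketch that writes a tiny synthetic h5 file with the same group layout and reads the two attributes back. The site name "DEMO", the file name, and the attribute values are made up for illustration; only the group and attribute names come from the cells above. It also opens the file in a with block, which closes the file automatically.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a tiny synthetic file that mimics the layout used above.
# "DEMO" and the attribute values are hypothetical, not taken from NEON data.
tmp_path = os.path.join(tempfile.mkdtemp(), "demo_reflectance.h5")
with h5py.File(tmp_path, "w") as f:
    refl_group = f.create_group("DEMO/Reflectance")
    dset = refl_group.create_dataset(
        "Reflectance_Data", data=np.zeros((2, 2, 3), dtype=np.int16)
    )
    dset.attrs["Scale_Factor"] = 10000.0
    dset.attrs["Data_Ignore_Value"] = -9999.0

# Reopen read-only; the with block closes the file when it ends
with h5py.File(tmp_path, "r") as f:
    demo_array = f["DEMO"]["Reflectance"]["Reflectance_Data"]
    # The shape is available without reading the values into memory
    demo_shape = demo_array.shape
    demo_scale_factor = demo_array.attrs["Scale_Factor"]
    demo_no_data_value = demo_array.attrs["Data_Ignore_Value"]

print(demo_shape, demo_scale_factor, demo_no_data_value)
```

The attribute values are copied out before the file closes, so they stay usable afterwards.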

In [9]:
# Apply no data value
refl_clean = refl_raw.astype(float)
# Boolean mask that is True wherever the array holds the no data value
nodata_mask = refl_clean == no_data_value
if nodata_mask.any():
    # The mean of a boolean mask is the fraction of True values
    print("% No Data: ", np.round(nodata_mask.mean() * 100, 1))
    refl_clean[nodata_mask] = np.nan

In [10]:
# Apply scale factor
refl_array = refl_clean/scale_factor
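
The no data and scale factor steps are easier to check on an array small enough to read by eye. This is a sketch on a toy 2 x 2 x 3 cube; the values -9999 and 10000 are assumed stand-ins for whatever the file's attributes actually hold.

```python
import numpy as np

# Hypothetical stand-ins for the values read from the file's attributes
no_data_value = -9999
scale_factor = 10000.0

# Toy 2 x 2 x 3 cube (rows, columns, bands) of integer reflectance values
toy_raw = np.array(
    [[[120, 340, -9999], [500, 610, 720]],
     [[-9999, 80, 95], [1000, 2000, 3000]]],
    dtype=np.int16,
)

# Convert to float so the array can hold NaN
toy_clean = toy_raw.astype(float)

# Boolean mask of no data values; its mean is the fraction of no data values
toy_mask = toy_clean == no_data_value
pct_nodata = np.round(toy_mask.mean() * 100, 1)

# Replace no data values with NaN, then apply the scale factor
toy_clean[toy_mask] = np.nan
toy_refl = toy_clean / scale_factor

print("% No Data:", pct_nodata)
print(toy_refl[0, 0])
```

The mask has to be applied before the division. Otherwise the no data value gets scaled along with the real data and no longer matches the value we are searching for.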

In [11]:
refl_array

array([[[0.    , 0.0099, 0.0076, ..., 0.    , 0.    , 0.    ],
        [0.    , 0.0026, 0.0032, ..., 0.    , 0.    , 0.    ],
        [0.0777, 0.0037, 0.0018, ..., 0.    , 0.    , 0.    ],
        ...,
        [0.0168, 0.0213, 0.0138, ..., 0.0195, 0.0128, 0.0143],
        [0.0205, 0.0338, 0.0167, ..., 0.0498, 0.0406, 0.0348],
        [0.0137, 0.0281, 0.0124, ..., 0.0339, 0.0311, 0.0285]],

       [[0.    , 0.0036, 0.0002, ..., 0.    , 0.    , 0.    ],
        [0.    , 0.0055, 0.0086, ..., 0.0009, 0.    , 0.    ],
        [0.0706, 0.0276, 0.0025, ..., 0.0017, 0.0038, 0.    ],
        ...,
        [0.0171, 0.0195, 0.0088, ..., 0.0072, 0.0087, 0.0035],
        [0.0046, 0.0163, 0.0099, ..., 0.0099, 0.0091, 0.0065],
        [0.0944, 0.0221, 0.021 , ..., 0.0125, 0.0133, 0.0078]],

       [[0.0061, 0.0033, 0.0051, ..., 0.    , 0.    , 0.    ],
        [0.0098, 0.0002, 0.0102, ..., 0.    , 0.    , 0.    ],
        [0.0714, 0.012 , 0.0154, ..., 0.001 , 0.    , 0.0002],
        ...,
        [0.0

In [12]:
refl_array.shape

(1000, 1000, 426)

Now we'll match the wavelength values to the reflectance arrays in a pandas dataframe.

In [13]:
# Loop through the reflectance array to grab the whole 2D array for each band
# Create empty list
full_refl_array = []
for band in range(refl_array.shape[2]):
    refl_band = refl_array[:, :, band]
    full_refl_array.append(refl_band)
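
As a side note, the same list of per-band arrays can be built without an explicit loop. The sketch below uses a toy cube (2 rows, 3 columns, 4 bands) to check that moving the band axis to the front and splitting along it gives the same arrays as the loop. The variable names are only for illustration.

```python
import numpy as np

# Toy cube with 2 rows, 3 columns, and 4 bands
toy_cube = np.arange(24, dtype=float).reshape(2, 3, 4)

# Explicit loop, as in the cell above
looped = []
for band in range(toy_cube.shape[2]):
    looped.append(toy_cube[:, :, band])

# Same result: move the band axis to the front, then split along it
moved = list(np.moveaxis(toy_cube, 2, 0))

print(len(moved), moved[0].shape)
```

Either way, each list element is a 2D (rows x columns) array for one band.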

In [14]:
# Make a dataframe with wavelength and full reflectance array
refl_array_df = pd.DataFrame()
refl_array_df["wavelength"] = wavelengths
refl_array_df["reflectance"] = full_refl_array
refl_array_df

      wavelength  reflectance
0     381.666992  [[0.0, 0.0, 0.0777, 0.0011, 0.0, 0.0, 0.0538, ...
1     386.674988  [[0.0099, 0.0026, 0.0037, 0.0016, 0.0003, 0.01...
2     391.683014  [[0.0076, 0.0032, 0.0018, 0.0058, 0.0094, 0.00...
3     396.691101  [[0.0073, 0.006, 0.0014, 0.0059, 0.0042, 0.010...
4     401.699097  [[0.0062, 0.006, 0.0056, 0.0064, 0.0075, 0.012...
...          ...  ...
421  2490.045410  [[0.0, 0.0, 0.0, 0.0071, 0.0, 0.0, 0.0097, 0.0...
422  2495.053223  [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0049, 0.0122, 0.0...
423  2500.061279  [[0.0, 0.0, 0.0, 0.0, 0.0025, 0.0054, 0.0129, ...
424  2505.069336  [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0043, 0.0192...


In [None]:
# Expand dataframe
df_explode = (refl_array_df.explode("reflectance")).explode("reflectance")
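
For a full tile, the exploded dataframe has 1000 x 1000 x 426 = 426,000,000 rows, so this step can be slow and memory hungry. One possible alternative, sketched here on a toy cube with made-up wavelengths, is to flatten the array with numpy and repeat each wavelength once per pixel. This is not the approach used in this notebook, just an equivalent way to get the same long layout.

```python
import numpy as np
import pandas as pd

# Toy cube: 2 rows x 3 columns x 4 bands, with made-up wavelengths (nm)
toy_refl_array = np.arange(24, dtype=float).reshape(2, 3, 4) / 100
toy_wavelengths = np.array([400.0, 500.0, 600.0, 700.0])

n_rows, n_cols, n_bands = toy_refl_array.shape

# Put the band axis first, then flatten: all pixels of band 0, then band 1, ...
values = np.moveaxis(toy_refl_array, 2, 0).reshape(-1)

# Repeat each wavelength once per pixel so it lines up with `values`
df_long = pd.DataFrame({
    "wavelength": np.repeat(toy_wavelengths, n_rows * n_cols),
    "reflectance": values,
})

print(df_long.head(6))
```

The row order matches what the double explode produces: every pixel of the first band (row by row), then every pixel of the second band, and so on.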

We'll use this as a stopping point and create the function using the above steps.

In [None]:
def clean_h5_refl_df(file_path):
    """Reads in a NEON AOP reflectance h5 file and returns a pandas dataframe
    containing one cleaned reflectance value (no data value and scale factor
    applied) per row, matched to the corresponding wavelength, along with a
    site column.

    Parameters
    ----------
    file_path : string
        Full or relative path and name of reflectance h5 file

    Returns
    -------
    df_explode : pandas DataFrame
        DataFrame containing clean reflectance array values matched to their
        corresponding wavelengths and a site column.
    """
    
    # Read in file
    h5 = h5py.File(file_path, "r")
    
    # Get the site name (the first top-level group in the file)
    sitename = list(h5.keys())[0]

    # Access the reflectance "folder" for this site
    site_refl = h5[sitename]["Reflectance"]
    
    # Assign reflectance array to a variable
    site_refl_array = site_refl["Reflectance_Data"]

    # Assign reflectance values to a variable
    refl_raw = site_refl["Reflectance_Data"][:]
    
    # Read the wavelength values into memory
    wavelengths = site_refl["Metadata"]["Spectral_Data"]["Wavelength"][:]
    
    # Extract no data value & scale factor
    scale_factor = site_refl_array.attrs["Scale_Factor"]
    no_data_value = site_refl_array.attrs["Data_Ignore_Value"]
    
    # Everything needed is now in memory, so close the file
    h5.close()
    
    # Apply no data value
    refl_clean = refl_raw.astype(float)
    # Boolean mask that is True wherever the array holds the no data value
    nodata_mask = refl_clean == no_data_value
    if nodata_mask.any():
        # The mean of a boolean mask is the fraction of True values
        print("% No Data: ", np.round(nodata_mask.mean() * 100, 1))
        refl_clean[nodata_mask] = np.nan
    
    # Apply scale factor
    refl_array = refl_clean/scale_factor
    
    # Loop through the reflectance array to grab the whole 2D array for each band
    # Create empty list
    full_refl_array = []
    for band in np.arange(refl_array.shape[2]):
        refl_band = refl_array[:,:,band]
        full_refl_array.append(refl_band)
        
    # Make dataframe with wavelength, full reflectance array, and site column
    refl_array_df = pd.DataFrame()
    refl_array_df["wavelength"] = wavelengths
    refl_array_df["reflectance"] = full_refl_array
    refl_array_df["site"] = sitename
    
    # Expand reflectance arrays so that there is one reflectance value from 
    # the array per row
    df_explode = (refl_array_df.explode("reflectance")).explode("reflectance")
    
    return df_explode

In [None]:
# Test the function
func_test = clean_h5_refl_df(h5_path)

In [None]:
# Look at test output
func_test