# CALIPSO HDF4 to HDF5 Conversion Demo

This notebook demonstrates the conversion of CALIPSO HDF4 files to HDF5 format using the `calipso_tool` package.

It also prints the variables contained in the file so they can be selected later.

[DATA SOURCE](https://asdc.larc.nasa.gov/project/CALIPSO/CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20_V4-20)

In [1]:
# Import required libraries
from pathlib import Path
import h5py
import numpy as np
from calipso_tool.converter import h4_to_h5

In [2]:
# Define input and output paths
input_file = Path("CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20.2018-12D.hdf")
output_file = input_file.with_suffix(".h5")

# Check if input file exists
if input_file.exists():
    print(f"Input file found: {input_file}")
else:
    print(f"Input file not found: {input_file}")
    print("Please ensure the HDF4 file is in the current directory")

Input file found: CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20.2018-12D.hdf


In [3]:
# Perform the HDF4 to HDF5 conversion
try:
    print(f"Converting {input_file} to HDF5 format...")
    result = h4_to_h5(input_file, output_file)
    print(f"Conversion successful! Output file: {result}")
except Exception as e:
    print(f"Conversion failed: {e}")

Converting CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20.2018-12D.hdf to HDF5 format...
Conversion successful! Output file: CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20.2018-12D.h5


In [4]:
# Verify the conversion by reading the HDF5 file
if output_file.exists():
    with h5py.File(output_file, 'r') as f:
        print("HDF5 file structure:")
        print("="*50)
        
        def print_structure(name, obj):
            indent = name.count('/') * '  '
            if isinstance(obj, h5py.Dataset):
                print(f"{indent}{name.split('/')[-1]} - Dataset {obj.shape} {obj.dtype}")
            elif isinstance(obj, h5py.Group):
                print(f"{indent}{name.split('/')[-1]}/")
        
        f.visititems(print_structure)
else:
    print("Output file not found. Conversion may have failed.")

HDF5 file structure:
AOD_63_Percent_Below - Dataset (85, 72) >f4
AOD_90_Percent_Below - Dataset (85, 72) >f4
AOD_Mean - Dataset (85, 72) >f4
AOD_Mean_Dust - Dataset (85, 72) >f4
AOD_Mean_Elevated_Smoke - Dataset (85, 72) >f4
AOD_Mean_Polluted_Dust - Dataset (85, 72) >f4
Aerosol_Type - Dataset (85, 72, 208, 7) >i2
Altitude_Midpoint - Dataset (1, 208) >f4
Days_Of_Month_Observed - Dataset (85, 72) >u4
Extinction_Coefficient_532_Mean - Dataset (85, 72, 208) >f4
Extinction_Coefficient_532_Mean_Dust - Dataset (85, 72, 208) >f4
Extinction_Coefficient_532_Mean_Elevated_Smoke - Dataset (85, 72, 208) >f4
Extinction_Coefficient_532_Mean_Polluted_Dust - Dataset (85, 72, 208) >f4
Extinction_Coefficient_532_Percentiles - Dataset (85, 72, 208, 11) >f4
Extinction_Coefficient_532_Standard_Deviation - Dataset (85, 72, 208) >f4
Extinction_Coefficient_532_Standard_Deviation_Dust - Dataset (85, 72, 208) >f4
Extinction_Coefficient_532_Standard_Deviation_Elevated_Smoke - Dataset (85, 72, 208) >f4
Extinction_

In [5]:
# Explore specific datasets in the HDF5 file
if output_file.exists():
    with h5py.File(output_file, 'r') as f:
        # List all top-level members (f.keys() returns datasets as well as groups)
        print("Top-level groups:")
        for key in f.keys():
            print(f"  - {key}")
        
        # Example: access a specific dataset by name (adjust as needed), e.g.
        #   data = f["Extinction_Coefficient_532_Mean"][...]
        # Names depend on the actual structure of your CALIPSO file.
        # Variables present in this Level 3 file include:
        # - Extinction_Coefficient_532_Mean
        # - Temperature_Mean
        # - Pressure_Mean
        # - Latitude_Midpoint
        # - Longitude_Midpoint

Top-level groups:
  - metadata_t
  - metadata
  - Longitude_Midpoint
  - fakeDim0
  - fakeDim1
  - Latitude_Midpoint
  - fakeDim3
  - Altitude_Midpoint
  - fakeDim5
  - Pressure_Mean
  - Pressure_Standard_Deviation
  - Temperature_Mean
  - Temperature_Standard_Deviation
  - Relative_Humidity_Mean
  - Relative_Humidity_Standard_Deviation
  - Tropopause_Height_Minimum
  - Tropopause_Height_Maximum
  - Tropopause_Height_Median
  - Tropopause_Height_Mean
  - Tropopause_Height_Standard_Deviation
  - Meteorological_Profiles_Averaged
  - Surface_Elevation_Minimum
  - Surface_Elevation_Maximum
  - Surface_Elevation_Median
  - Land_Samples
  - Water_Samples
  - Days_Of_Month_Observed
  - Initial_Aerosol_Lidar_Ratio_532
  - fakeDim49
  - Initial_Aerosol_Lidar_Ratio_Uncertainty_532
  - Extinction_Coefficient_532_Mean
  - Extinction_Coefficient_532_Standard_Deviation
  - Extinction_Coefficient_532_Percentiles
  - fakeDim61
  - Samples_Searched
  - Samples_Aerosol_Detected_Accepted
  - Samples_Aero
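
The listing above mixes data variables with entries such as `fakeDim0` and `metadata`. For later selection it can help to filter those out. The following is a minimal sketch, not part of `calipso_tool`: the helper name `select_variables` is hypothetical, and it assumes that `fakeDim<N>` entries are placeholder dimensions carried over from HDF4 and that `metadata` and `metadata_t` do not hold gridded data.

```python
import re

# Assumption: names matching fakeDim<N> are placeholder dimensions carried over
# from HDF4, and "metadata"/"metadata_t" hold file-level records, not gridded data.
_SKIP = re.compile(r"^(fakeDim\d+|metadata(_t)?)$")


def select_variables(names, prefix=""):
    """Return data-variable names, optionally restricted to a name prefix."""
    return [n for n in names if not _SKIP.match(n) and n.startswith(prefix)]


# A few names copied from the listing above.
listing = [
    "metadata_t",
    "metadata",
    "Longitude_Midpoint",
    "fakeDim0",
    "Latitude_Midpoint",
    "Pressure_Mean",
    "Temperature_Mean",
    "Extinction_Coefficient_532_Mean",
    "fakeDim61",
]

print(select_variables(listing))
print(select_variables(listing, prefix="Extinction_"))
```

With an open file, the same helper could be called as `select_variables(f.keys())`, since `f.keys()` yields plain strings.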

In [6]:
# Example: Read and display metadata
if output_file.exists():
    with h5py.File(output_file, 'r') as f:
        print("File attributes:")
        for attr in f.attrs:
            print(f"  {attr}: {f.attrs[attr]}")

File attributes:
  coremetadata: b'\nGROUP                  = INVENTORYMETADATA\n  GROUPTYPE            = MASTERGROUP\n\n  GROUP                  = GRANULE\n\n    OBJECT                 = GRANULEID\n      NUM_VAL              = 1\n      VALUE                = "CAL_LID_L3_Tropospheric_APro_AllSky"\n    END_OBJECT             = GRANULEID\n\n    OBJECT                 = GRANULENAME\n      NUM_VAL              = 1\n      VALUE                = "CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20.2018-12D.hdf"\n    END_OBJECT             = GRANULENAME\n\n    OBJECT                 = GRANULEVERSION\n      NUM_VAL              = 1\n      VALUE                = "V4-20"\n    END_OBJECT             = GRANULEVERSION\n\n    OBJECT                 = DAYNIGHT\n      NUM_VAL              = 1\n      VALUE                = "D"\n    END_OBJECT             = DAYNIGHT\n\n    OBJECT                 = BROWSE\n      NUM_VAL              = 1\n      VALUE                = "N"\n    END_OBJECT             = BROWS
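
The `coremetadata` attribute prints as one long bytes string of nested `GROUP` and `OBJECT` blocks. A minimal sketch of turning it into a dictionary follows. The function name `parse_coremetadata` is hypothetical, the sample is an excerpt of the output above, and the regular expression only handles `OBJECT` blocks with a single quoted `VALUE`, so treat it as a starting point rather than a full parser.

```python
import re

# Matches: OBJECT = NAME ... VALUE = "text" ... END_OBJECT = NAME
# The lookbehind keeps the pattern from starting inside "END_OBJECT".
_OBJECT = re.compile(
    r'(?<!END_)OBJECT\s*=\s*(\w+)\s.*?VALUE\s*=\s*"([^"]*)".*?END_OBJECT\s*=\s*\1\b',
    re.DOTALL,
)


def parse_coremetadata(raw):
    """Return {object_name: value} for OBJECT blocks with one quoted VALUE."""
    text = raw.decode("ascii", errors="replace") if isinstance(raw, bytes) else raw
    return {m.group(1): m.group(2) for m in _OBJECT.finditer(text)}


# Excerpt copied from the attribute printed above.
sample = (
    b'\nGROUP                  = INVENTORYMETADATA\n'
    b'  GROUPTYPE            = MASTERGROUP\n\n'
    b'  GROUP                  = GRANULE\n\n'
    b'    OBJECT                 = GRANULEID\n'
    b'      NUM_VAL              = 1\n'
    b'      VALUE                = "CAL_LID_L3_Tropospheric_APro_AllSky"\n'
    b'    END_OBJECT             = GRANULEID\n\n'
    b'    OBJECT                 = GRANULEVERSION\n'
    b'      NUM_VAL              = 1\n'
    b'      VALUE                = "V4-20"\n'
    b'    END_OBJECT             = GRANULEVERSION\n\n'
    b'    OBJECT                 = DAYNIGHT\n'
    b'      NUM_VAL              = 1\n'
    b'      VALUE                = "D"\n'
    b'    END_OBJECT             = DAYNIGHT\n'
)

for name, value in parse_coremetadata(sample).items():
    print(f"{name}: {value}")
```

Inside the notebook, the input would be `f.attrs["coremetadata"]` read from the open HDF5 file.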

## Next Steps

After converting to HDF5, you can:

1. Export specific variables to ASCII format
2. Convert to LAS format using PDAL
3. Create Cloud-Optimized Point Clouds (COPC)
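
As an illustration of the first step, the sketch below flattens a 2-D grid into `latitude,longitude,value` rows of comma-separated text. It uses only the standard library and a tiny synthetic grid. The helper names `grid_to_rows` and `write_ascii` are hypothetical, and two points are assumptions to check against the product documentation: that the first axis of a variable such as `AOD_Mean` is latitude and the second longitude, and that `-9999` marks missing data.

```python
import csv
import io

FILL_VALUE = -9999.0  # assumption: placeholder for missing data; check the product docs


def grid_to_rows(lats, lons, grid, fill_value=FILL_VALUE):
    """Yield (lat, lon, value) for every non-fill cell of a lat x lon grid."""
    for i, lat in enumerate(lats):
        for j, lon in enumerate(lons):
            value = grid[i][j]
            if value != fill_value:
                yield lat, lon, value


def write_ascii(rows, handle, header=("latitude", "longitude", "value")):
    """Write rows as comma-separated text to an open text handle."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


# Tiny synthetic grid standing in for a 2-D variable such as AOD_Mean.
lats = [-1.0, 1.0]
lons = [-2.5, 2.5]
grid = [[0.12, -9999.0], [0.30, 0.45]]

buffer = io.StringIO()
write_ascii(grid_to_rows(lats, lons, grid), buffer)
print(buffer.getvalue())
```

For real data, `grid` could be an array read with `f["AOD_Mean"][...]`, with `lats` and `lons` taken from flattened copies of `Latitude_Midpoint` and `Longitude_Midpoint`, and `handle` an open text file instead of the in-memory buffer.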
