# HDF5 to Text Conversion Demo

This notebook demonstrates how to convert CALIPSO HDF5 files to space-delimited text format using the `h5_to_txt` function.

In [1]:
# Import required libraries
from pathlib import Path
import h5py
import pandas as pd
from calipso_tool.h5_to_txt import h5_to_txt

In [6]:
# Define the HDF5 file path
h5_file = Path("CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-21.2020-12D.h5")

# Check if the file exists
if h5_file.exists():
    print(f"HDF5 file found: {h5_file}")
else:
    print(f"HDF5 file not found: {h5_file}")
    print("Please run the HDF4 to HDF5 conversion first.")

HDF5 file found: CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-21.2020-12D.h5


In [7]:
# Explore the HDF5 file structure to find available variables
if h5_file.exists():
    with h5py.File(h5_file, 'r') as f:
        print("Available datasets in the HDF5 file:")
        print("=" * 50)
        
        def list_datasets(name, obj):
            if isinstance(obj, h5py.Dataset):
                print(f"{name}: shape={obj.shape}, dtype={obj.dtype}")
        
        f.visititems(list_datasets)

Available datasets in the HDF5 file:
AOD_63_Percent_Below: shape=(85, 72), dtype=>f4
AOD_90_Percent_Below: shape=(85, 72), dtype=>f4
AOD_Mean: shape=(85, 72), dtype=>f4
AOD_Mean_Dust: shape=(85, 72), dtype=>f4
AOD_Mean_Elevated_Smoke: shape=(85, 72), dtype=>f4
AOD_Mean_Polluted_Dust: shape=(85, 72), dtype=>f4
Aerosol_Type: shape=(85, 72, 208, 7), dtype=>i2
Altitude_Midpoint: shape=(1, 208), dtype=>f4
Days_Of_Month_Observed: shape=(85, 72), dtype=>u4
Extinction_Coefficient_532_Mean: shape=(85, 72, 208), dtype=>f4
Extinction_Coefficient_532_Mean_Dust: shape=(85, 72, 208), dtype=>f4
Extinction_Coefficient_532_Mean_Elevated_Smoke: shape=(85, 72, 208), dtype=>f4
Extinction_Coefficient_532_Mean_Polluted_Dust: shape=(85, 72, 208), dtype=>f4
Extinction_Coefficient_532_Percentiles: shape=(85, 72, 208, 11), dtype=>f4
Extinction_Coefficient_532_Standard_Deviation: shape=(85, 72, 208), dtype=>f4
Extinction_Coefficient_532_Standard_Deviation_Dust: shape=(85, 72, 208), dtype=>f4
Extinction_Coefficie

In [8]:
# Check for required coordinate datasets
if h5_file.exists():
    with h5py.File(h5_file, 'r') as f:
        print("Checking for coordinate datasets:")
        required_coords = ["Latitude_Midpoint", "Longitude_Midpoint", "Altitude_Midpoint"]
        
        for coord in required_coords:
            if coord in f:
                print(f"✓ {coord} found: shape={f[coord].shape}")
            else:
                print(f"✗ {coord} NOT found")

Checking for coordinate datasets:
✓ Latitude_Midpoint found: shape=(1, 85)
✓ Longitude_Midpoint found: shape=(1, 72)
✓ Altitude_Midpoint found: shape=(1, 208)


In [9]:
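# Convert HDF5 to text format
# Set `variable_name` below to a dataset that exists in your HDF5 file.
# The name must match an entry in the dataset listing above exactly.
# Datasets in this Level 3 file include, for example:
# - "Extinction_Coefficient_532_Mean"
# - "Extinction_Coefficient_532_Standard_Deviation"
# - "Samples_Averaged_Polluted_Dust"
# Other CALIPSO products may use different names.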

variable_name = "Samples_Averaged_Polluted_Dust"  # UPDATE THIS with your actual variable name
output_txt = h5_file.with_suffix('.txt')

try:
    print(f"Converting {h5_file} to text format...")
    print(f"Extracting variable: {variable_name}")
    
    result = h5_to_txt(
        input_h5=h5_file,
        output_txt=output_txt,
        variable_name=variable_name,
        altitude_units="km"  # Altitudes are stored in km; they are converted to meters on output
    )
    
    print("\nConversion successful!")
    print(f"Output file: {result}")
    
except KeyError as e:
    print(f"\nError: {e}")
    print("\nPlease update the 'variable_name' with one of the available variables from the HDF5 file.")
except Exception as e:
    print(f"\nConversion failed: {e}")

Converting CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-21.2020-12D.h5 to text format...
Extracting variable: Samples_Averaged_Polluted_Dust
Converted CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-21.2020-12D.h5 to CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-21.2020-12D.txt
Output contains 1272960 points

Conversion successful!
Output file: CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-21.2020-12D.txt
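
The reported point count matches the grid size: 85 latitudes × 72 longitudes × 208 altitudes = 1,272,960 points, one per grid cell. The following is a minimal sketch of how a gridded variable can be flattened into `X Y Z value` rows. It is not taken from `h5_to_txt`; the function name `grid_to_points` and the synthetic inputs are assumptions for illustration only.

```python
import numpy as np

def grid_to_points(lat, lon, alt_km, values):
    """Flatten a (lat, lon, alt) grid into rows of X, Y, Z, value.

    Hypothetical helper for illustration; h5_to_txt may work differently.
    """
    # Coordinate datasets are stored with shape (1, N), so flatten them first.
    lat = np.asarray(lat, dtype=float).ravel()
    lon = np.asarray(lon, dtype=float).ravel()
    alt_m = np.asarray(alt_km, dtype=float).ravel() * 1000.0  # km -> m

    values = np.asarray(values)
    expected = (lat.size, lon.size, alt_m.size)
    if values.shape != expected:
        raise ValueError(f"expected shape {expected}, got {values.shape}")

    # indexing="ij" keeps the (lat, lon, alt) axis order of the data array,
    # so ravel() walks coordinates and values in the same C order.
    lat_g, lon_g, alt_g = np.meshgrid(lat, lon, alt_m, indexing="ij")
    return np.column_stack(
        [lon_g.ravel(), lat_g.ravel(), alt_g.ravel(), values.ravel()]
    )

# Tiny synthetic grid: 2 latitudes x 3 longitudes x 4 altitudes
lat = np.array([[-84.0, -82.0]])
lon = np.array([[-177.5, -172.5, -167.5]])
alt_km = np.array([[-0.38, -0.32, -0.26, -0.20]])
values = np.arange(24).reshape(2, 3, 4)

points = grid_to_points(lat, lon, alt_km, values)
print(points.shape)  # one row per grid cell, four columns
print(points[:4])    # altitude varies fastest
```

With this ordering, altitude varies fastest while longitude and latitude stay fixed, which is the same pattern as a text file whose first rows share one X and Y and step through Z.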


In [10]:
# Read the text file once, then display the first rows and summary statistics
if output_txt.exists():
    print(f"Reading {output_txt}...\n")
    
    # The header row of the text file supplies the column names
    df_full = pd.read_csv(output_txt, sep=' ')
    print("First 10 rows:")
    print(df_full.head(10))
    
    # Get file statistics
    print("\nFile statistics:")
    print(f"Total points: {len(df_full):,}")
    print(f"Columns: {list(df_full.columns)}")
    print("\nData ranges:")
    for col in df_full.columns:
        print(f"  {col}: [{df_full[col].min():.3f}, {df_full[col].max():.3f}]")

Reading CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-21.2020-12D.txt...

First 10 rows:
       X     Y           Z  Samples_Averaged_Polluted_Dust
0 -177.5 -84.0 -380.048070                             0.0
1 -177.5 -84.0 -320.144230                             0.0
2 -177.5 -84.0 -260.240400                             0.0
3 -177.5 -84.0 -200.336550                             0.0
4 -177.5 -84.0 -140.432680                             0.0
5 -177.5 -84.0  -80.528830                             0.0
6 -177.5 -84.0  -20.625002                             0.0
7 -177.5 -84.0   39.278828                             0.0
8 -177.5 -84.0   99.182686                             0.0
9 -177.5 -84.0  159.086550                             0.0

File statistics:
Total points: 1,272,960
Columns: ['X', 'Y', 'Z', 'Samples_Averaged_Polluted_Dust']

Data ranges:
  X: [-177.500, 177.500]
  Y: [-84.000, 84.000]
  Z: [-380.048, 12020.048]
  Samples_Averaged_Polluted_Dust: [0.000, 2018.000]
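
For larger outputs, the same per-column ranges can be computed without holding the whole file in memory. The sketch below streams a space-delimited file in chunks using the `chunksize` parameter of `pd.read_csv`. The helper name `column_ranges` and the in-memory sample data are illustrative assumptions, not part of `calipso_tool`.

```python
import io
import pandas as pd

def column_ranges(source, chunksize=100_000):
    """Return {column: (min, max)} by streaming a space-delimited file.

    Hypothetical helper; assumes every chunk holds at least one
    non-missing value per column.
    """
    ranges = {}
    for chunk in pd.read_csv(source, sep=" ", chunksize=chunksize):
        for col in chunk.columns:
            lo, hi = chunk[col].min(), chunk[col].max()
            if col in ranges:
                # Merge this chunk's range with the running range
                lo = min(lo, ranges[col][0])
                hi = max(hi, ranges[col][1])
            ranges[col] = (lo, hi)
    return ranges

# Synthetic sample in the same layout as the converted file
sample = io.StringIO(
    "X Y Z Samples_Averaged_Polluted_Dust\n"
    "-177.5 -84.0 -380.0 0.0\n"
    "-177.5 -84.0 -320.0 0.0\n"
    "-177.5 -84.0 -260.0 3.0\n"
    "-172.5 -84.0 39.0 12.0\n"
    "-172.5 -82.0 159.0 7.0\n"
)

ranges = column_ranges(sample, chunksize=2)
for col, (lo, hi) in ranges.items():
    print(f"  {col}: [{lo:.3f}, {hi:.3f}]")
```

To apply it to the converted file, pass `output_txt` in place of the in-memory sample.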


## Using the Command Line Interface

You can also use the `h5_to_txt` module directly from the command line:

```bash
# Basic usage
python -m calipso_tool.h5_to_txt input.h5

# Specify output file
python -m calipso_tool.h5_to_txt input.h5 -o output.txt

# Specify variable name (must match a dataset in the file)
python -m calipso_tool.h5_to_txt input.h5 -v Extinction_Coefficient_532_Mean

# Specify altitude units (if already in meters)
python -m calipso_tool.h5_to_txt input.h5 --alt-units m
```
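
When the conversion needs to be scripted, for example over many files, the command line can be assembled from Python. This is a minimal sketch that uses only the flags shown above (`-o`, `-v`, `--alt-units`). The helper names are hypothetical, and it assumes the flags can be combined in a single invocation, which the examples above do not confirm.

```python
import subprocess
import sys

def build_h5_to_txt_command(input_h5, output_txt=None, variable=None, alt_units=None):
    """Build the argument list for `python -m calipso_tool.h5_to_txt`.

    Hypothetical helper; assumes the CLI flags can be combined.
    """
    cmd = [sys.executable, "-m", "calipso_tool.h5_to_txt", str(input_h5)]
    if output_txt is not None:
        cmd += ["-o", str(output_txt)]
    if variable is not None:
        cmd += ["-v", variable]
    if alt_units is not None:
        cmd += ["--alt-units", alt_units]
    return cmd

def run_h5_to_txt(*args, **kwargs):
    """Run the conversion; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_h5_to_txt_command(*args, **kwargs), check=True)

cmd = build_h5_to_txt_command(
    "input.h5",
    output_txt="output.txt",
    variable="Extinction_Coefficient_532_Mean",
)
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.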

## Next Steps

The generated text file can be used with PDAL pipelines to create LAS files:

```bash
pdal pipeline h5tolas.json --readers.text.filename=points.txt --writers.las.filename=output.las
```
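
The contents of `h5tolas.json` are not shown here. As a rough guide, a text-to-LAS pipeline typically pairs a `readers.text` stage with a `writers.las` stage. The sketch below generates one such pipeline from Python. The stage options chosen (`separator`, `extra_dims`) are assumptions about what a suitable pipeline might contain, not the actual file used by this project.

```python
import json

def make_text_to_las_pipeline(txt_path, las_path):
    """Return a PDAL pipeline definition as a dict (illustrative only)."""
    return {
        "pipeline": [
            {
                # Dimension names (X, Y, Z, ...) are read from the header line
                "type": "readers.text",
                "filename": str(txt_path),
                "separator": " ",
            },
            {
                "type": "writers.las",
                "filename": str(las_path),
                # Keep non-standard columns (the data variable) as extra bytes
                "extra_dims": "all",
            },
        ]
    }

pipeline_json = json.dumps(
    make_text_to_las_pipeline("points.txt", "output.las"), indent=2
)
print(pipeline_json)
```

The printed JSON can be saved to a file and passed to `pdal pipeline`. Without `extra_dims`, a column such as `Samples_Averaged_Polluted_Dust` would not be written, because it is not part of the standard LAS point record.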

The LAS file can then be converted to COPC:

```bash
pdal pipeline las2copc.json --readers.las.filename=output.las --writers.copc.filename=output.copc.las
```