# Chained HDF4 → HDF5 → Text Conversion Demo

This notebook demonstrates how to chain the HDF4-to-HDF5 and HDF5-to-text conversions into a single operation.

In [2]:
# Import required libraries
from pathlib import Path
import pandas as pd
from calipso_tool.converter import h4_to_h5, h5_to_txt, h4_to_txt

## Method 1: Using the Chained Function

The easiest way is to use the `h4_to_txt` function, which handles both conversions automatically.

In [3]:
# Define input HDF4 file
hdf4_file = Path("CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20.2018-12D.hdf")

# Check if file exists
if hdf4_file.exists():
    print(f"Input file found: {hdf4_file}")
else:
    print(f"Input file not found: {hdf4_file}")
    print("Please ensure the HDF4 file is in the current directory")

Input file found: CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20.2018-12D.hdf


In [4]:
# Perform chained conversion
# Note: set `variable_to_extract` below to a variable that exists in your HDF file
variable_to_extract = "Extinction_Coefficient_532_Mean"  # UPDATE THIS!

try:
    # Convert HDF4 → HDF5 → Text in one call
    txt_file, h5_file = h4_to_txt(
        input_h4=hdf4_file,
        variable_name=variable_to_extract,
        altitude_units="km",  # Will convert to meters
        keep_h5=True  # Keep intermediate HDF5 file
    )
    
    print("\nConversion complete!")
    print(f"Text file: {txt_file}")
    print(f"HDF5 file: {h5_file}")
    
except Exception as e:
    print(f"\nConversion failed: {e}")
    print("Make sure to update 'variable_to_extract' with the correct variable name")

Step 1: Converting HDF4 to HDF5...
  ✓ Created: CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20.2018-12D.h5

Step 2: Converting HDF5 to text...
Converted CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20.2018-12D.h5 to CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20.2018-12D.txt
Output contains 1272960 points
  ✓ Created: CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20.2018-12D.txt

Conversion complete!
Text file: CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20.2018-12D.txt
HDF5 file: CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20.2018-12D.h5


## Method 2: Manual Step-by-Step Conversion

You can also perform the conversions separately for more control.

In [5]:
# Step 1: Convert HDF4 to HDF5
h5_output = hdf4_file.with_suffix('.h5')

try:
    print("Step 1: Converting HDF4 to HDF5...")
    h4_to_h5(hdf4_file, h5_output)
    print(f"✓ Created: {h5_output}")
except Exception as e:
    print(f"✗ HDF4 to HDF5 conversion failed: {e}")

Step 1: Converting HDF4 to HDF5...
✓ Created: CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20.2018-12D.h5


In [6]:
# Step 2: Convert HDF5 to Text
txt_output = hdf4_file.with_suffix('.txt')

try:
    print("Step 2: Converting HDF5 to text...")
    h5_to_txt(
        input_h5=h5_output,
        output_txt=txt_output,
        variable_name=variable_to_extract,
        altitude_units="km"
    )
    print(f"✓ Created: {txt_output}")
except Exception as e:
    print(f"✗ HDF5 to text conversion failed: {e}")

Step 2: Converting HDF5 to text...
Converted CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20.2018-12D.h5 to CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20.2018-12D.txt
Output contains 1272960 points
✓ Created: CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20.2018-12D.txt
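Both manual steps derive their output names with `Path.with_suffix`, which replaces only the final extension. That matters here because CALIPSO file names contain several dots (`V4-20.2018-12D.hdf`). The naming logic can be sketched as follows; `derive_outputs` is a hypothetical helper for illustration, not part of `calipso_tool`:

```python
from pathlib import Path

def derive_outputs(hdf4_path):
    """Return the (.h5, .txt) paths that sit next to an HDF4 file."""
    hdf4_path = Path(hdf4_path)
    # with_suffix swaps only the last extension (".hdf"); earlier dots
    # in the stem, such as "V4-20.2018-12D", are left untouched.
    return hdf4_path.with_suffix(".h5"), hdf4_path.with_suffix(".txt")

h5_path, txt_path = derive_outputs(
    "CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20.2018-12D.hdf"
)
print(h5_path.name)
print(txt_path.name)
```

The resulting names match the files reported in the cell outputs above.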


## Method 3: Using a Pipeline Function

You can create your own pipeline function for custom processing.

In [8]:
def convert_calipso_pipeline(hdf4_path, variable_name, cleanup=False):
    """
    Complete pipeline: HDF4 → HDF5 → Text → Analysis
    """
    hdf4_path = Path(hdf4_path)
    
    # Define output paths
    h5_path = hdf4_path.with_suffix('.h5')
    txt_path = hdf4_path.with_suffix('.txt')
    
    # Step 1: HDF4 to HDF5
    print("Converting HDF4 to HDF5...")
    h4_to_h5(hdf4_path, h5_path)
    
    # Step 2: HDF5 to Text
    print("Converting HDF5 to text...")
    h5_to_txt(h5_path, txt_path, variable_name)
    
    # Step 3: Load and analyze
    print("\nLoading text data...")
    df = pd.read_csv(txt_path, sep=' ')
    
    print("\nData summary:")
    print(f"- Total points: {len(df):,}")
    print(f"- Columns: {list(df.columns)}")
    print("\nStatistics:")
    print(df.describe())
    
    # Cleanup if requested
    if cleanup:
        h5_path.unlink()
        print(f"\nCleaned up intermediate file: {h5_path}")
    
    return df

# Example usage
df = convert_calipso_pipeline(hdf4_file, variable_to_extract, cleanup=False)

Converting HDF4 to HDF5...
Converting HDF5 to text...
Converted CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20.2018-12D.h5 to CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20.2018-12D.txt
Output contains 1272960 points

Loading text data...

Data summary:
- Total points: 1,272,960
- Columns: ['X', 'Y', 'Z', 'Extinction_Coefficient_532_Mean']

Statistics:
                  X             Y             Z  \
count  1.272960e+06  1.272960e+06  1.272960e+06   
mean   0.000000e+00  0.000000e+00  5.820000e+03   
std    1.039131e+02  4.907140e+01  3.596852e+03   
min   -1.775000e+02 -8.400000e+01 -3.800481e+02   
25%   -8.875000e+01 -4.200000e+01  2.719976e+03   
50%    0.000000e+00  0.000000e+00  5.820000e+03   
75%    8.875000e+01  4.200000e+01  8.920025e+03   
max    1.775000e+02  8.400000e+01  1.202005e+04   

       Extinction_Coefficient_532_Mean  
count                     1.272960e+06  
mean                     -1.901435e+03  
std                       3.923909e+03  
min          
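The negative mean and large standard deviation of the extinction column in the summary above are consistent with a sentinel fill value being included in the statistics. A minimal, dependency-free sketch of excluding such values follows. It assumes the sentinel is -9999, which should be verified against the file's metadata; `mean_ignoring_fill` is a hypothetical helper, and the sample rows are invented for illustration only:

```python
import csv
import io
import math

FILL_VALUE = -9999.0  # assumption: confirm against the product's metadata

def mean_ignoring_fill(lines, column, fill_value=FILL_VALUE):
    """Stream space-separated rows and average `column`, skipping fill values."""
    reader = csv.DictReader(lines, delimiter=" ")
    total, valid, skipped = 0.0, 0, 0
    for row in reader:
        value = float(row[column])
        if math.isnan(value) or math.isclose(value, fill_value):
            skipped += 1
        else:
            total += value
            valid += 1
    mean = total / valid if valid else float("nan")
    return mean, valid, skipped

# Invented sample rows using the same header as the converted text file.
sample = io.StringIO(
    "X Y Z Extinction_Coefficient_532_Mean\n"
    "-177.5 -84.0 -380.0 -9999.0\n"
    "-177.5 -84.0 -320.0 0.02\n"
    "-177.5 -84.0 -260.0 0.04\n"
)
mean, valid, skipped = mean_ignoring_fill(sample, "Extinction_Coefficient_532_Mean")
print(f"mean={mean:.3f} valid={valid} skipped={skipped}")
```

To apply it to a real file, pass an open file handle instead of the `StringIO` sample. With the DataFrame already loaded, replacing the sentinel with NaN via `Series.replace` before calling `describe()` should have the same effect, since `describe()` skips NaN.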

## Advanced: Batch Processing

Process multiple CALIPSO files in a directory.

In [9]:
from concurrent.futures import ProcessPoolExecutor, as_completed

def batch_convert_calipso(directory, variable_name, pattern="*.hdf", max_workers=4):
    """
    Convert all HDF4 files in a directory to text format.
    """
    directory = Path(directory)
    hdf4_files = list(directory.glob(pattern))
    
    print(f"Found {len(hdf4_files)} HDF4 files to process")
    
    results = []
    failed = []
    
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        # Submit all conversions
        future_to_file = {
            executor.submit(h4_to_txt, f, variable_name=variable_name): f 
            for f in hdf4_files
        }
        
        # Process completed conversions
        for future in as_completed(future_to_file):
            file = future_to_file[future]
            try:
                txt_file, h5_file = future.result()
                results.append((file, txt_file))
                print(f"✓ Converted: {file.name}")
            except Exception as e:
                failed.append((file, str(e)))
                print(f"✗ Failed: {file.name} - {e}")
    
    print("\nSummary:")
    print(f"- Successful: {len(results)}")
    print(f"- Failed: {len(failed)}")
    
    return results, failed

# Example usage
results, failed = batch_convert_calipso(".", "Extinction_Coefficient_532_Mean")

Found 9 HDF4 files to process
Step 1: Converting HDF4 to HDF5...Step 1: Converting HDF4 to HDF5...Step 1: Converting HDF4 to HDF5...

Step 1: Converting HDF4 to HDF5...

  ✓ Created: CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20.2018-12D.h5

Step 2: Converting HDF5 to text...
  ✓ Created: CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20.2019-03D.h5

Step 2: Converting HDF5 to text...
  ✓ Created: CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20.2019-02D.h5

Step 2: Converting HDF5 to text...
  ✓ Created: CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20.2019-01N.h5

Step 2: Converting HDF5 to text...
Converted CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20.2018-12D.h5 to CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20.2018-12D.txt
Output contains 1272960 points
  ✓ Created: CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20.2018-12D.txt
Step 1: Converting HDF4 to HDF5...
✓ Converted: CAL_LID_L3_Tropospheric_APro_AllSky-Standard-V4-20.2018-12D.hdf
Converted CAL_LID_L3_T
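The interleaved log lines above are expected when several worker processes print at the same time. Because the batch function converts every matching file on each run, rerunning it repeats work that is already done. One possible way to avoid that is to filter the file list first. This sketch assumes the text output sits next to the input with a `.txt` suffix, as in the outputs above; `pending_files` is a hypothetical helper, not part of `calipso_tool`:

```python
from pathlib import Path
import tempfile

def pending_files(directory, pattern="*.hdf"):
    """Return HDF4 files in `directory` that have no sibling .txt output yet."""
    directory = Path(directory)
    return sorted(
        f for f in directory.glob(pattern)
        if not f.with_suffix(".txt").exists()
    )

# Demonstration on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    for name in ("a.hdf", "b.hdf", "b.txt"):
        (tmp / name).touch()
    todo = [f.name for f in pending_files(tmp)]
    print(todo)  # ['a.hdf']
```

The returned list could replace `hdf4_files` inside `batch_convert_calipso`. Note that an existing `.txt` file only indicates that a conversion was started, not that it finished cleanly.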

## Next Steps: PDAL Pipeline

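After creating the text files, you can use PDAL to turn them into point clouds. The commands below assume that the pipeline definitions `h5tolas.json` and `las2copc.json` exist in the working directory: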

```bash
# Text → LAS
pdal pipeline h5tolas.json --readers.text.filename=output.txt --writers.las.filename=output.las

# LAS → COPC
pdal pipeline las2copc.json --readers.las.filename=output.las --writers.copc.filename=output.copc.las
```
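The contents of `h5tolas.json` are not shown in this notebook. As a rough guess at what a minimal definition might look like, the sketch below builds a two-stage PDAL pipeline (`readers.text` followed by `writers.las`) and prints it as JSON. The helper name `text_to_las_pipeline` and the choice of `extra_dims: "all"` (to carry the extinction column through as an extra dimension) are assumptions, not the project's actual configuration:

```python
import json

def text_to_las_pipeline(txt_file, las_file):
    """Build a PDAL pipeline dict that reads a text file and writes LAS."""
    return {
        "pipeline": [
            # readers.text takes dimension names from the header line.
            {"type": "readers.text", "filename": txt_file},
            {
                "type": "writers.las",
                "filename": las_file,
                # Assumption: keep non-standard columns as extra dimensions.
                "extra_dims": "all",
            },
        ]
    }

pipeline = text_to_las_pipeline("output.txt", "output.las")
print(json.dumps(pipeline, indent=2))
```

Saving the printed JSON to a file gives a pipeline whose file names can still be overridden on the command line, as in the commands above.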