# 04 - Generic Data I/O: HDF5 and MATLAB .MAT Files

This notebook demonstrates the use of generic Input/Output (I/O) utilities provided by the `diffusemri` library for handling HDF5 (`.h5`) and MATLAB (`.mat`) files.

These utilities are useful for:
*   **HDF5:** Storing multiple datasets, including large NumPy arrays, in a single, structured, hierarchical file. HDF5 is efficient for storage and supports partial I/O, so part of a dataset can be read or written without loading the whole file.
*   **MATLAB .MAT Files:** Exchanging data with MATLAB environments.

The library provides functions that save a Python dictionary of NumPy arrays to either format and load it back. For .MAT files, the dictionary may also contain some basic Python types, such as the numbers, strings, and booleans used in Part 2.

In [None]:
import os
import shutil
import numpy as np

# diffusemri library imports
from data_io.generic_utils import (
    save_dict_to_hdf5, load_dict_from_hdf5,
    save_dict_to_mat, load_dict_from_mat
)

# Setup a temporary directory for example files
TEMP_DIR = "temp_generic_io_example"
if os.path.exists(TEMP_DIR):
    shutil.rmtree(TEMP_DIR)
os.makedirs(TEMP_DIR)

print(f"Temporary directory for examples: {os.path.abspath(TEMP_DIR)}")

## Part 1: HDF5 I/O (`.h5`)

HDF5 (Hierarchical Data Format version 5) is a versatile file format designed to store and organize large amounts of scientific data.
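
To make the dictionary-to-file mapping concrete, here is a minimal sketch of how such a round trip could be written directly with `h5py`. The helper names `sketch_save_dict` and `sketch_load_dict` are hypothetical and this is not the `diffusemri` implementation; it only assumes that each dictionary key becomes one top-level dataset.

```python
import os
import tempfile

import h5py
import numpy as np


def sketch_save_dict(data, path):
    """Hypothetical helper: write each dict entry as one top-level HDF5 dataset."""
    with h5py.File(path, "w") as f:
        for name, value in data.items():
            f.create_dataset(name, data=np.asarray(value))


def sketch_load_dict(path):
    """Hypothetical helper: read every top-level dataset back into memory."""
    with h5py.File(path, "r") as f:
        # ds[()] reads the whole dataset and also works for 0-dim (scalar) datasets.
        return {
            name: ds[()]
            for name, ds in f.items()
            if isinstance(ds, h5py.Dataset)
        }


with tempfile.TemporaryDirectory() as tmp:
    sketch_path = os.path.join(tmp, "sketch.h5")
    sketch_original = {
        "grid": np.arange(6, dtype=np.int32).reshape(2, 3),
        "pi": np.array(3.14159),
        "mask": np.array([True, False]),
    }
    sketch_save_dict(sketch_original, sketch_path)
    sketch_restored = sketch_load_dict(sketch_path)

for name, value in sketch_restored.items():
    print(name, value.shape, value.dtype)
```

In this sketch the 0-dim entry comes back as a NumPy scalar rather than a 0-dim array; the library functions used below may handle that case differently.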

### Creating Sample Data for HDF5

In [None]:
# Sample data: a dictionary where keys are dataset names and values are NumPy arrays.
data_to_save_h5 = {
    'array_integers': np.arange(20, dtype=np.int32).reshape(4, 5),
    'array_floats': np.random.rand(2, 3, 2).astype(np.float64), # A 3D array
    'array_booleans': np.array([[True, False, True], [False, True, False]]),
    'scalar_value': np.array(3.14159) # A single scalar value stored as a 0-dim array
}

print("Original data dictionary for HDF5:")
for key, value_array in data_to_save_h5.items():
    print(f"  '{key}': shape={value_array.shape}, dtype={value_array.dtype}")

### Saving to HDF5

In [None]:
hdf5_filepath = os.path.join(TEMP_DIR, "my_data_archive.h5")
try:
    save_dict_to_hdf5(data_to_save_h5, hdf5_filepath)
    print(f"Data successfully saved to HDF5 file: {hdf5_filepath}")
except Exception as e:
    print(f"An error occurred while saving to HDF5: {e}")

### Loading from HDF5

In [None]:
if not os.path.exists(hdf5_filepath):
    print(f"HDF5 file {hdf5_filepath} not found. Skipping loading example.")
else:
    try:
        loaded_data_h5 = load_dict_from_hdf5(hdf5_filepath)
        print("\nData loaded from HDF5 file:")
        for key, value_array in loaded_data_h5.items():
            print(f"  '{key}': shape={value_array.shape}, dtype={value_array.dtype}")
        
        # Verification step (good practice for notebooks)
        print("\nVerifying HDF5 data integrity...")
        all_match = True
        for key in data_to_save_h5:
            if key not in loaded_data_h5:
                print(f"  ERROR: Key '{key}' missing in loaded data.")
                all_match = False
                continue
            if not np.array_equal(data_to_save_h5[key], loaded_data_h5[key]):
                print(f"  ERROR: Data mismatch for key '{key}'.")
                all_match = False
            if data_to_save_h5[key].dtype != loaded_data_h5[key].dtype:
                print(f"  ERROR: Dtype mismatch for key '{key}'. Original: {data_to_save_h5[key].dtype}, Loaded: {loaded_data_h5[key].dtype}")
                all_match = False
        
        if all_match:
            print("HDF5 data integrity verified successfully.")
        else:
            print("HDF5 data integrity verification failed for some items.")
            
    except Exception as e:
        print(f"An error occurred while loading or verifying HDF5 data: {e}")
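
The loading function above returns every array in full. When only part of a large dataset is needed, HDF5's partial I/O can be used through `h5py` directly. The following is a small sketch under that assumption; the file name `partial.h5` and the dataset name `volume` are made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    partial_path = os.path.join(tmp, "partial.h5")
    volume = np.arange(4 * 5 * 6, dtype=np.float32).reshape(4, 5, 6)

    with h5py.File(partial_path, "w") as f:
        f.create_dataset("volume", data=volume)

    with h5py.File(partial_path, "r") as f:
        dset = f["volume"]          # a handle to the dataset; no data is read yet
        full_shape = dset.shape     # shape metadata is available without loading the data
        # Slicing the dataset object reads only the requested region from disk.
        one_slice = dset[2, :, :]

print(f"Dataset shape on disk: {full_shape}, slice read into memory: {one_slice.shape}")
```

The slice is an ordinary NumPy array and stays valid after the file is closed.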

## Part 2: MATLAB .MAT File I/O (`.mat`)

MAT files are MATLAB's native data format, so saving to `.mat` makes data produced in Python easy to load into MATLAB. The `scipy.io.savemat` and `scipy.io.loadmat` functions are used under the hood.

### Creating Sample Data for .MAT

In [None]:
# Sample data for .MAT files. Can include NumPy arrays and other Python types.
data_to_save_mat = {
    'matrix_A': np.array([[10.5, 20.2, 30.0], [40.0, 50.8, 60.1]], dtype=np.double),
    'vector_B': np.array([11, 22, 33, 44], dtype=np.int16),
    'scalar_int': 100,
    'scalar_float': 75.25,
    'string_C': "MATLAB_Example_Data",
    'boolean_D': True
}

print("Original data dictionary for .MAT:")
for key, value in data_to_save_mat.items():
    if isinstance(value, np.ndarray):
        print(f"  '{key}': shape={value.shape}, dtype={value.dtype}")
    else:
        print(f"  '{key}': type={type(value)}, value='{value}'")

### Saving to .MAT

In [None]:
mat_filepath = os.path.join(TEMP_DIR, "my_matlab_data.mat")
try:
    save_dict_to_mat(data_to_save_mat, mat_filepath)
    print(f"Data successfully saved to .MAT file: {mat_filepath}")
except Exception as e:
    print(f"An error occurred while saving to .MAT: {e}")

### Loading from .MAT

Note: `load_dict_from_mat` filters out the metadata entries that `scipy.io.loadmat` adds to its result (`__header__`, `__version__`, `__globals__`), so only the saved variables are returned.
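
To see what is being filtered, here is a minimal sketch that calls `scipy.io.savemat` and `scipy.io.loadmat` directly. It also shows how 1D arrays and scalars come back as 2D arrays. The file name `raw.mat` and the variable names are illustrative only, and the filtering shown is an assumption about what the library does, not its actual code.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "raw.mat")
    savemat(raw_path, {"vector": np.array([1, 2, 3], dtype=np.int16), "count": 7})
    raw = loadmat(raw_path)

# loadmat adds metadata entries whose names start with a double underscore.
metadata_keys = sorted(k for k in raw if k.startswith("__"))
# Keeping everything else leaves only the variables that were saved.
user_vars = {k: v for k, v in raw.items() if not k.startswith("__")}

print("Metadata keys:", metadata_keys)
for name, value in user_vars.items():
    print(f"  '{name}': shape={value.shape}, dtype={value.dtype}")
```

With SciPy's default settings, the 1D vector is stored as a row vector of shape `(1, 3)` and the Python integer as a `(1, 1)` array, which is why the verification code below flattens vectors and calls `.item()` on scalars.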

In [None]:
if not os.path.exists(mat_filepath):
    print(f".MAT file {mat_filepath} not found. Skipping loading example.")
else:
    try:
        loaded_data_mat = load_dict_from_mat(mat_filepath)
        print("\nData loaded from .MAT file (filtered):")
        for key, value in loaded_data_mat.items():
            if isinstance(value, np.ndarray):
                # MAT arrays are at least 2D, so 1D vectors come back as row/column vectors; flatten them for display
                print(f"  '{key}': shape={value.shape}, dtype={value.dtype}, value={value.flatten() if value.ndim > 1 else value}")
            else:
                print(f"  '{key}': type={type(value)}, value='{value}'")

        # Verification (handle potential type/shape differences from MATLAB format)
        print("\nVerifying .MAT data integrity...")
        assert np.array_equal(data_to_save_mat['matrix_A'], loaded_data_mat['matrix_A']), "matrix_A mismatch"
        # .mat files might save 1D arrays as 2D (e.g., (1,4) instead of (4,)). Flatten for comparison.
        assert np.array_equal(data_to_save_mat['vector_B'], loaded_data_mat['vector_B'].flatten()), "vector_B mismatch"
        # Depending on the loader, strings and scalars may come back as NumPy arrays or as plain Python values;
        # np.asarray(...).item() handles both cases.
        assert data_to_save_mat['string_C'] == np.asarray(loaded_data_mat['string_C']).item(), "string_C mismatch"
        assert data_to_save_mat['scalar_int'] == np.asarray(loaded_data_mat['scalar_int']).item(), "scalar_int mismatch"
        assert np.isclose(data_to_save_mat['scalar_float'], np.asarray(loaded_data_mat['scalar_float']).item()), "scalar_float mismatch"
        assert data_to_save_mat['boolean_D'] == np.asarray(loaded_data_mat['boolean_D']).item(), "boolean_D mismatch"
        print(".MAT data integrity verified successfully.")

    except Exception as e:
        print(f"An error occurred while loading or verifying .MAT data: {e}")
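
The flattening and `.item()` calls used in the verification can be collected into one small helper. This is a sketch only: `unwrap_mat_value` is a hypothetical name, not part of `diffusemri`, and it assumes that single-element arrays should become Python scalars and that row or column vectors should become 1D arrays.

```python
import numpy as np


def unwrap_mat_value(value):
    """Hypothetical helper: undo the 2D wrapping introduced by the MAT format."""
    arr = np.asarray(value)
    if arr.size == 1:
        # (1, 1) numeric arrays and one-element string arrays become Python scalars.
        return arr.item()
    if arr.ndim == 2 and 1 in arr.shape:
        # (1, N) row vectors and (N, 1) column vectors become 1D arrays of shape (N,).
        return arr.ravel()
    return arr


unwrapped_scalar = unwrap_mat_value(np.array([[100]]))
unwrapped_vector = unwrap_mat_value(np.array([[11, 22, 33, 44]], dtype=np.int16))
unwrapped_string = unwrap_mat_value(np.array(["MATLAB_Example_Data"]))
unwrapped_matrix = unwrap_mat_value(np.zeros((2, 3)))

print(unwrapped_scalar, unwrapped_vector.shape, unwrapped_string, unwrapped_matrix.shape)
```

One caveat of this design: a matrix that genuinely has a single row or column is flattened as well, so the helper suits data whose intended shapes are known in advance.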

## Cleanup

Remove the temporary directory and its contents.

In [None]:
if os.path.exists(TEMP_DIR):
    shutil.rmtree(TEMP_DIR)
    print(f"Cleaned up temporary directory: {TEMP_DIR}")