# Inspecting .datx / HDF5 Files
This notebook demonstrates how to open Zygo NX2 data files in `.datx` format with the Python HDF5 library `h5py`. The basic outline is:
1. Open an HDF5-based `.datx` file.
2. Recursively traverse the groups and datasets.
3. Collect and display for each object:
   - Name and full path
   - Type (group or dataset), plus shape and dtype for datasets
   - Attributes
4. Present the results as a Python dict and as a DataFrame, serializing attributes where needed.


In [1]:
# Dependencies
import h5py
import json
import pandas as pd
import numpy as np

## 1. Define inspection function
Traverse the file to collect paths, types, shapes, dtypes, and raw attrs.

In [6]:
def inspect_hdf5(file_path):
    """
    Inspect an HDF5 file (.h5, .datx) and return a dict describing
    all groups and datasets, including paths, types, shapes, dtypes, and attrs.
    """
    descriptors = {}

    def visitor(name, obj):
        entry = { 'path': name }
        if isinstance(obj, h5py.Group):
            entry['type'] = 'Group'
        else:
            entry['type'] = 'Dataset'
            entry['shape'] = obj.shape
            entry['dtype'] = str(obj.dtype)
        # copy attributes as a raw dict
        entry['attrs'] = dict(obj.attrs)
        descriptors[name] = entry

    with h5py.File(file_path, 'r') as f:
        f.visititems(visitor)

    return descriptors
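
The traversal can be tried without a `.datx` file at hand. The sketch below builds a tiny in-memory HDF5 file and walks it with the same `visititems` pattern; the group and dataset names (`Data/Surface`, `heights`), the `Unit` attribute, and the helper `list_objects` are made up for illustration and do not reflect the real `.datx` layout.

```python
import h5py
import numpy as np

def list_objects(h5file):
    """Return {path: kind} for every group and dataset below the root."""
    found = {}

    def visitor(name, obj):
        # `name` is the path relative to the root, without a leading slash
        found[name] = 'Group' if isinstance(obj, h5py.Group) else 'Dataset'

    h5file.visititems(visitor)
    return found

# Build a small in-memory file (nothing is written to disk); the layout is made up
with h5py.File('demo.h5', 'w', driver='core', backing_store=False) as f:
    surface = f.create_group('Data/Surface')   # intermediate group 'Data' is created too
    dset = surface.create_dataset('heights', data=np.zeros((4, 3)))
    dset.attrs['Unit'] = 'nm'
    objects = list_objects(f)

for path in sorted(objects):
    print(path, objects[path])
```

Note that `visititems` does not call the visitor for the root group itself, so any attributes stored on the root would have to be read separately, for example from `f.attrs`.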


## 2. Example usage
Update the path and run to see a concise list of all objects in the file.

In [3]:
file_name = '/Users/elbert/prysm_play/FS cuts/Middle.datx'  # an example file; update this for any file of interest

metadata = inspect_hdf5(file_name)

print(f"Found {len(metadata)} objects in '{file_name}':")
for p in sorted(metadata):
    print(" -", p)

Found 16 objects in '/Users/elbert/prysm_play/FS cuts/Middle.datx':
 - Attributes
 - Attributes/System
 - Attributes/{5CB51FA7-9361-4A66-AAB3-EE9EE1D96588}
 - Data
 - Data/Intensity
 - Data/Intensity/{26EB0B6C-1F64-4AB9-BE85-C007079144B5}
 - Data/Quality
 - Data/Quality/{D7DB3063-CC29-45D6-AE08-38703E184D38}
 - Data/Saturation Counts
 - Data/Saturation Counts/{0F6F4648-2693-46FA-8BB2-894616AEBA82}
 - Data/Surface
 - Data/Surface/{86FCB133-E4C2-4B29-8094-A086AA5A3DEF}
 - Data/{3AD5CF24-EE4A-49B2-B550-DAEF948C76A6}
 - Data/{CF16554A-2933-469B-A8BF-CEB3B19FF82C}
 - Measurement
 - MetaData
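
The flat listing can be easier to scan as an indented tree. The following is a minimal sketch that works on the path strings alone; `build_tree` and `format_tree` are hypothetical helpers, and the `paths` list is a hand-copied subset of the output above.

```python
def build_tree(paths):
    """Nest slash-separated paths into a dict of dicts."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.split('/'):
            node = node.setdefault(part, {})
    return tree

def format_tree(tree, indent=0):
    """Return one indented line per node, children sorted by name."""
    lines = []
    for name in sorted(tree):
        lines.append('  ' * indent + name)
        lines.extend(format_tree(tree[name], indent + 1))
    return lines

# A few of the paths printed above
paths = [
    'Attributes',
    'Attributes/System',
    'Data',
    'Data/Intensity',
    'Data/Surface',
    'Measurement',
]
print('\n'.join(format_tree(build_tree(paths))))
```

In the notebook, passing `metadata` itself (its keys are the paths) to `build_tree` should give the tree for the whole file.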


## 3. Tabular View
Serialize attributes (NumPy → list, bytes → str) and display in a DataFrame.
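
The conversion is needed because `json.dumps` does not accept NumPy arrays or `bytes` values. A small illustration, using made-up attribute values of the kinds h5py can return (the keys `version` and `unit` are placeholders, not real `.datx` attribute names):

```python
import json
import numpy as np

# Made-up values; real attribute names and contents will differ
raw = {'version': np.array([1], dtype=np.int32), 'unit': b'nm'}

try:
    json.dumps(raw)
    failed = False
except TypeError:
    # raised because ndarray and bytes are not JSON serializable
    failed = True

# Converting by hand: ndarray -> list of Python ints, bytes -> str
clean = {'version': raw['version'].tolist(), 'unit': raw['unit'].decode()}
encoded = json.dumps(clean)
print(failed, encoded)
```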

In [4]:
def serialize_attr(v):
    if isinstance(v, np.ndarray):
        return serialize_attr(v.tolist())
    if isinstance(v, (list, tuple)):
        return [serialize_attr(x) for x in v]
    if isinstance(v, np.generic):
        return v.item()
    if isinstance(v, bytes):
        return v.decode(errors='ignore')
    return v

rows = []
for p, info in metadata.items():
    attrs = {k: serialize_attr(val) for k, val in info['attrs'].items()}
    rows.append({
        'Path': p,
        'Type': info['type'],
        'Shape': info.get('shape', ''),
        'Dtype': info.get('dtype', ''),
        'Attrs': json.dumps(attrs, ensure_ascii=False)
    })

df = pd.DataFrame(rows)
df


| | Path | Type | Shape | Dtype | Attrs |
|---|---|---|---|---|---|
| 0 | Attributes | Group | | | {"File Layout Version": [1]} |
| 1 | Attributes/System | Group | | | {} |
| 2 | Attributes/{5CB51FA7-9361-4A66-AAB3-EE9EE1D96588} | Group | | | {"Data Context.Data Attributes.AGC": [0], "Dat... |
| 3 | Data | Group | | | {} |
| 4 | Data/Intensity | Group | | | {} |
| 5 | Data/Intensity/{26EB0B6C-1F64-4AB9-BE85-C00707... | Dataset | (1000, 1000) | int32 | {"Coordinates": [[0, 0, 1000, 1000]], "Group N... |
| 6 | Data/Quality | Group | | | {} |
| 7 | Data/Quality/{D7DB3063-CC29-45D6-AE08-38703E18... | Dataset | (1000, 1000) | float64 | {"Coordinates": [[0, 0, 1000, 1000]], "Group N... |
| 8 | Data/Saturation Counts | Group | | | {} |
| 9 | Data/Saturation Counts/{0F6F4648-2693-46FA-8BB... | Dataset | (1000, 1000) | int32 | {"Coordinates": [[0, 0, 1000, 1000]], "Group N... |

## 4. Next Steps
- Use `metadata` and the DataFrame `df` to decide which datasets you want in your final dict.
- For each chosen path, load data via:
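  ```python
  # `path` is one of the keys of `metadata` that refers to a Dataset
  with h5py.File(file_name, 'r') as f:
      data = f[path][()]               # read the whole dataset into memory
      attrs = metadata[path]['attrs']  # raw attributes collected earlier
  ```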