# MTH5 File Structure Documentation

MTH5 (Magnetotelluric HDF5) is a standardized file format for storing magnetotelluric time series data, metadata, and derived products. This notebook documents the hierarchical structure of MTH5 files for versions 0.1.0 and 0.2.0.

## Overview

MTH5 uses HDF5 (Hierarchical Data Format 5) as the underlying storage format, organizing data into **groups** (similar to folders) and **datasets** (arrays of data). Each group contains metadata that describes the data and follows the MT metadata standards.

## Key Concepts

- **Groups**: Hierarchical containers that can hold other groups or datasets
- **Datasets**: Arrays of time series data or tabular information
- **Metadata**: Structured information about the data, following mt_metadata standards
- **Summary Tables**: Each group contains a summary table with HDF5 references for quick access

---

## MTH5 Version 0.2.0 Structure

Version 0.2.0 introduces **ExperimentGroup** as the top-level container, allowing multiple surveys within a single file. This is the current and recommended version.

### Hierarchical Structure

```
MTH5 File (/)
└── ExperimentGroup (root)
    ├── ReportsGroup
    ├── StandardsGroup
    │   └── summary (Dataset)
    └── SurveysGroup
        └── SurveyGroup (e.g., "my_survey")
            ├── FiltersGroup
            │   ├── ZPKGroup (Pole-Zero filters)
            │   ├── FAPGroup (Frequency lookup table filters)
            │   ├── FIRGroup (Finite Impulse Response filters)
            │   ├── CoefficientGroup (Coefficient filters)
            │   └── TimeDelayGroup (Time delay filters)
            ├── ReportsGroup
            └── StationsGroup
                └── StationGroup (e.g., "mt001")
                    ├── FeaturesGroup (Data quality features)
                    ├── FourierCoefficientsGroup (Frequency domain data)
                    ├── TransferFunctionsGroup (MT responses)
                    └── RunGroup (e.g., "001")
                        ├── ChannelDataset (ex, ey, hx, hy, hz, etc.)
                        └── ... (multiple channels)
```
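
Because every level of the tree is an HDF5 group, a channel can be addressed by a single slash-separated path. The sketch below builds such a path from the four identifiers. The helper name `channel_path_v020` is hypothetical, and the group names `Experiment`, `Surveys`, and `Stations` are assumed from the structure printed by `print(m)` later in this notebook; treat the result as illustrative rather than as a guaranteed on-disk layout.

```python
from posixpath import join


def channel_path_v020(survey: str, station: str, run: str, channel: str) -> str:
    """Build the assumed HDF5 path of a channel dataset in a v0.2.0 file."""
    # Assumed layout: /Experiment/Surveys/<survey>/Stations/<station>/<run>/<channel>
    return join("/Experiment", "Surveys", survey, "Stations", station, run, channel)


path = channel_path_v020("my_survey", "mt001", "001", "ex")
print(path)  # /Experiment/Surveys/my_survey/Stations/mt001/001/ex
```

In practice the MTH5 object resolves these locations itself, so a path like this is mainly useful when inspecting a file with a generic HDF5 viewer.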

### Group Descriptions (v0.2.0)

#### **ExperimentGroup** (Root)
The top-level group representing an entire magnetotelluric experiment, which can contain multiple surveys.

**Metadata**: Basic experiment information including:
- Experiment ID and name
- Time period
- Principal investigators
- Archive information

**Contains**: ReportsGroup, StandardsGroup, SurveysGroup

---

#### **ReportsGroup**
Stores reports and documentation related to the experiment.

**Metadata**: Report metadata following mt_metadata standards
- Title and author
- Date
- Summary

**Contains**: Report datasets or subgroups

---

#### **StandardsGroup**
Documents the metadata standards used in the file.

**Metadata**: Standards information
- MT metadata version
- Schema versions
- Citations

**Contains**: 
- `summary` dataset with standards information

---

#### **SurveysGroup** 
Container for all surveys in the experiment.

**Metadata**: Summary of all surveys

**Contains**: One or more SurveyGroup objects

---

#### **SurveyGroup**
Represents a single MT survey, typically defined by time period or geographic location.

**Metadata**: Survey-level information including:
- Survey ID and name
- Geographic bounding box
- Time period (start/end)
- Project information
- Country, state, county
- Datum and coordinate system
- Citation and references
- Acquired by (person/organization)
- Release license

**Contains**: FiltersGroup, ReportsGroup, StationsGroup

---

#### **FiltersGroup**
Contains all filters used in data acquisition and processing.

**Metadata**: Filter collection metadata

**Contains**: Multiple filter type groups
- **ZPKGroup**: Zero-pole-gain (ZPK) analog filters
- **FAPGroup**: Frequency-Amplitude-Phase lookup tables
- **FIRGroup**: Finite Impulse Response (digital) filters  
- **CoefficientGroup**: General coefficient filters
- **TimeDelayGroup**: Time delay/shift filters

Each filter has comprehensive metadata:
- Filter name and type
- Units (input/output)
- Calibration date
- Comments
- Filter-specific parameters (poles, zeros, gain, etc.)
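
To make the "poles, zeros, gain" parameters concrete, here is a minimal sketch of how a zero-pole-gain filter maps to a complex frequency response. It uses only the standard library and does not call mth5 or mt_metadata. The function name `zpk_response` is hypothetical, and the poles and zeros are assumed to be in angular-frequency units (rad/s); a real filter's stored convention should be checked against its metadata.

```python
import cmath
import math


def zpk_response(frequency_hz, zeros, poles, gain):
    """Evaluate H(s) = gain * prod(s - z) / prod(s - p) at s = j*2*pi*f."""
    s = 2j * math.pi * frequency_hz
    numerator = complex(gain)
    for z in zeros:
        numerator *= s - z
    denominator = complex(1.0)
    for p in poles:
        denominator *= s - p
    return numerator / denominator


# Single-pole low-pass with a 10 Hz corner, normalized to unit gain at DC.
corner_hz = 10.0
omega_c = 2 * math.pi * corner_hz
h_dc = zpk_response(0.0, zeros=[], poles=[-omega_c], gain=omega_c)
h_corner = zpk_response(corner_hz, zeros=[], poles=[-omega_c], gain=omega_c)

print(abs(h_dc))                            # amplitude at DC
print(abs(h_corner))                        # amplitude at the corner frequency
print(math.degrees(cmath.phase(h_corner)))  # phase at the corner, in degrees
```

For this one-pole example the amplitude falls to 1/sqrt(2) at the corner frequency and the phase is -45 degrees, which is a quick sanity check on any implementation of the formula.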

---

#### **StationsGroup**
Container for all stations in the survey.

**Metadata**: Station collection summary

**Contains**: One or more StationGroup objects

---

#### **StationGroup**
Represents a single MT station (measurement location).

**Metadata**: Station information including:
- Station ID and name
- Geographic location (latitude, longitude, elevation)
- Orientation (declination, coordinate system)
- Time period
- Data quality notes
- Provenance information
- Archive ID

**Contains**: 
- **FeaturesGroup**: Data quality metrics and features
- **FourierCoefficientsGroup**: Frequency domain representations
- **TransferFunctionsGroup**: MT transfer functions (impedance, tipper)
- **RunGroup(s)**: Time series data runs

---

#### **RunGroup**
A continuous block of time series data collection at a station.

**Metadata**: Run information including:
- Run ID
- Time period (start/end)
- Sample rate
- Data logger information
- Channels recorded
- Data quality ratings
- Comments

**Contains**: 
- **ChannelDataset(s)**: Individual channel time series
- Each dataset stores the actual time series data as an HDF5 array

---

#### **ChannelDataset**
A single channel of time series data (e.g., Ex, Ey, Hx, Hy, Hz).

**Metadata**: Channel information including:
- Component (ex, ey, hx, hy, hz, temperature, etc.)
- Type (electric, magnetic, auxiliary)
- Measurement azimuth and tilt
- Units
- Sample rate
- Time period
- Sensor information (type, ID, manufacturer)
- Dipole length (for electric channels)
- Filter applied
- Data quality

**Data**: 1D array of time series values

---

## Example: Working with MTH5 v0.2.0

Let's create an MTH5 file and explore its structure with executable code.

In [15]:
# Import necessary modules
from mth5.mth5 import MTH5
from mt_metadata.timeseries import Survey, Station, Run, Electric, Magnetic, Auxiliary
import numpy as np
from pathlib import Path

# Create a temporary MTH5 file
h5_path = Path("example_v020.h5")
if h5_path.exists():
    h5_path.unlink()

print("Creating MTH5 v0.2.0 file...")

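
On Windows, deleting a file that another handle still holds open raises `PermissionError` (WinError 32). This can happen when an earlier run of the notebook left the MTH5 file open. The sketch below is one forgiving way to handle the cleanup step using only the standard library; `remove_stale_file` is a hypothetical helper, not part of mth5.

```python
import tempfile
from pathlib import Path


def remove_stale_file(path: Path) -> bool:
    """Delete path if possible; return False when the file is locked."""
    try:
        # missing_ok=True makes this a no-op when the file does not exist.
        path.unlink(missing_ok=True)
    except PermissionError:
        # Another handle still has the file open (typical on Windows).
        return False
    return True


# Demonstration on a throwaway file so nothing real is deleted.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example_v020.h5"
    target.write_bytes(b"")
    removed = remove_stale_file(target)
    still_there = target.exists()

print(removed, still_there)  # True False
```

If the helper returns `False`, closing the earlier MTH5 object with `close_mth5()` (or restarting the kernel) releases the handle so the file can be removed.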

In [2]:
# Initialize MTH5 object and open file in write mode
m = MTH5(file_version='0.2.0')
m.open_mth5(h5_path, mode="w")

# View the initial empty structure
print("Initial MTH5 structure:")
print(m)

2026-01-01T09:57:10.904063-0800 | INFO | mth5.mth5 | _initialize_file | line: 678 | Initialized MTH5 0.2.0 file example_v020.h5 in mode w
Initial MTH5 structure:
/:
    |- Group: Experiment
    --------------------
        |- Group: Reports
        -----------------
        |- Group: Standards
        -------------------
            --> Dataset: summary
            ......................
        |- Group: Surveys
        -----------------
        --> Dataset: channel_summary
        ..............................
        --> Dataset: fc_summary
        .........................
        --> Dataset: tf_summary
        .........................


### Add Survey

In v0.2.0, we must add a survey before adding stations.

In [6]:
# Survey level metadata
print(Survey().to_json(required=False))

{
    "survey": {
        "acquired_by.author": "",
        "acquired_by.comments.author": null,
        "acquired_by.comments.time_stamp": "1980-01-01T00:00:00+00:00",
        "acquired_by.comments.value": null,
        "acquired_by.email": null,
        "acquired_by.organization": null,
        "acquired_by.url": null,
        "citation_dataset.authors": null,
        "citation_dataset.doi": null,
        "citation_dataset.journal": null,
        "citation_dataset.pages": null,
        "citation_dataset.title": null,
        "citation_dataset.volume": null,
        "citation_dataset.year": null,
        "citation_journal.authors": null,
        "citation_journal.doi": null,
        "citation_journal.journal": null,
        "citation_journal.pages": null,
        "citation_journal.title": null,
        "citation_journal.volume": null,
        "citation_journal.year": null,
        "comments.author": null,
        "comments.time_stamp": "1980-01-01T00:00:00+00:00",
        "comments.va

In [8]:
# Add a survey
survey_group = m.add_survey("MT_Survey_2024")

# Update survey metadata
survey_group.metadata.id = "MT_Survey_2024"
survey_group.metadata.name = "Example MT Survey"
survey_group.metadata.project = "MTH5 Documentation"
survey_group.metadata.country = "USA"
survey_group.metadata.release_license = "CC-BY-4.0"
survey_group.write_metadata()

print("Survey added:")
print(f"  Survey ID: {survey_group.metadata.id}")
print(f"  Project: {survey_group.metadata.project}")

Survey added:
  Survey ID: MT_Survey_2024
  Project: MTH5 Documentation


### Add Station

Add a station to the survey with location metadata.

In [7]:
# station level metadata
print(Station().to_json(required=False))

{
    "station": {
        "acquired_by.author": "",
        "acquired_by.comments.author": null,
        "acquired_by.comments.time_stamp": "1980-01-01T00:00:00+00:00",
        "acquired_by.comments.value": null,
        "acquired_by.email": null,
        "acquired_by.organization": null,
        "acquired_by.url": null,
        "channel_layout": "X",
        "channels_recorded": [],
        "comments.author": null,
        "comments.time_stamp": "1980-01-01T00:00:00+00:00",
        "comments.value": null,
        "data_type": "BBMT",
        "fdsn.alternate_code": null,
        "fdsn.alternate_network_code": null,
        "fdsn.channel_code": null,
        "fdsn.id": null,
        "fdsn.network": null,
        "fdsn.new_epoch": null,
        "geographic_name": "",
        "id": "",
        "location.datum": "WGS 84",
        "location.declination.comments.author": null,
        "location.declination.comments.time_stamp": "1980-01-01T00:00:00+00:00",
        "location.declination.comm

In [9]:
# Add a station
station_group = m.add_station("MT001", survey="MT_Survey_2024")

# Update station metadata
station_group.metadata.location.latitude = 40.123
station_group.metadata.location.longitude = -112.456
station_group.metadata.location.elevation = 1234.5
station_group.metadata.location.datum = "WGS84"
station_group.metadata.time_period.start = "2024-01-01T00:00:00+00:00"
station_group.metadata.time_period.end = "2024-01-02T00:00:00+00:00"
station_group.write_metadata()

print("Station added:")
print(f"  Station ID: {station_group.metadata.id}")
print(f"  Location: {station_group.metadata.location.latitude}°N, "
      f"{station_group.metadata.location.longitude}°E")

Station added:
  Station ID: MT001
  Location: 40.123°N, -112.456°E


### Add Run

Add a run (continuous time series recording) to the station.

In [10]:
# Run level metadata
print(Run().to_json(required=False))

{
    "run": {
        "acquired_by.author": "",
        "acquired_by.comments.author": null,
        "acquired_by.comments.time_stamp": "1980-01-01T00:00:00+00:00",
        "acquired_by.comments.value": null,
        "acquired_by.email": null,
        "acquired_by.organization": null,
        "acquired_by.url": null,
        "channels": null,
        "channels_recorded_auxiliary": [],
        "channels_recorded_electric": [],
        "channels_recorded_magnetic": [],
        "comments.author": null,
        "comments.time_stamp": "1980-01-01T00:00:00+00:00",
        "comments.value": null,
        "data_logger.data_storage.id": "",
        "data_logger.data_storage.manufacturer": "",
        "data_logger.data_storage.model": null,
        "data_logger.data_storage.name": null,
        "data_logger.data_storage.type": "",
        "data_logger.firmware.author": "",
        "data_logger.firmware.last_updated": "1980-01-01T00:00:00+00:00",
        "data_logger.firmware.name": "",
        

In [11]:
# Add a run
run_group = m.add_run("MT001", "001", survey="MT_Survey_2024")

# Update run metadata
run_group.metadata.sample_rate = 256.0
run_group.metadata.time_period.start = "2024-01-01T00:00:00+00:00"
run_group.metadata.time_period.end = "2024-01-01T12:00:00+00:00"
run_group.metadata.data_logger.id = "DL001"
run_group.metadata.data_logger.manufacturer = "Example Corp"
run_group.write_metadata()

print("Run added:")
print(f"  Run ID: {run_group.metadata.id}")
print(f"  Sample Rate: {run_group.metadata.sample_rate} Hz")

Run added:
  Run ID: 001
  Sample Rate: 256.0 Hz


### Add Channels

Add electric and magnetic channels with sample data.

In [12]:
# Electric channel metadata
print(Electric().to_json(required=False))

{
    "electric": {
        "ac.end": 0.0,
        "ac.start": 0.0,
        "channel_id": null,
        "channel_number": 0,
        "comments.author": null,
        "comments.time_stamp": "1980-01-01T00:00:00+00:00",
        "comments.value": null,
        "component": "e_default",
        "contact_resistance.end": 0.0,
        "contact_resistance.start": 0.0,
        "data_quality.comments.author": null,
        "data_quality.comments.time_stamp": "1980-01-01T00:00:00+00:00",
        "data_quality.comments.value": null,
        "data_quality.flag": null,
        "data_quality.good_from_period": null,
        "data_quality.good_to_period": null,
        "data_quality.rating.author": null,
        "data_quality.rating.method": null,
        "data_quality.rating.value": null,
        "dc.end": 0.0,
        "dc.start": 0.0,
        "dipole_length": 0.0,
        "fdsn.alternate_code": null,
        "fdsn.alternate_network_code": null,
        "fdsn.channel_code": null,
        "fdsn.id": 

In [None]:
# Magnetic channel metadata
print(Magnetic().to_json(required=False))

{
    "magnetic": {
        "channel_id": null,
        "channel_number": 0,
        "comments.author": null,
        "comments.time_stamp": "1980-01-01T00:00:00+00:00",
        "comments.value": null,
        "component": "h_default",
        "data_quality.comments.author": null,
        "data_quality.comments.time_stamp": "1980-01-01T00:00:00+00:00",
        "data_quality.comments.value": null,
        "data_quality.flag": null,
        "data_quality.good_from_period": null,
        "data_quality.good_to_period": null,
        "data_quality.rating.author": null,
        "data_quality.rating.method": null,
        "data_quality.rating.value": null,
        "fdsn.alternate_code": null,
        "fdsn.alternate_network_code": null,
        "fdsn.channel_code": null,
        "fdsn.id": null,
        "fdsn.network": null,
        "fdsn.new_epoch": null,
        "filters": [],
        "h_field_max.end": 0.0,
        "h_field_max.start": 0.0,
        "h_field_min.end": 0.0,
        "h_field_

In [16]:
# Auxiliary channel metadata
print(Auxiliary().to_json(required=False))

{
    "auxiliary": {
        "channel_id": null,
        "channel_number": 0,
        "comments.author": null,
        "comments.time_stamp": "1980-01-01T00:00:00+00:00",
        "comments.value": null,
        "component": "auxiliary_default",
        "data_quality.comments.author": null,
        "data_quality.comments.time_stamp": "1980-01-01T00:00:00+00:00",
        "data_quality.comments.value": null,
        "data_quality.flag": null,
        "data_quality.good_from_period": null,
        "data_quality.good_to_period": null,
        "data_quality.rating.author": null,
        "data_quality.rating.method": null,
        "data_quality.rating.value": null,
        "fdsn.alternate_code": null,
        "fdsn.alternate_network_code": null,
        "fdsn.channel_code": null,
        "fdsn.id": null,
        "fdsn.network": null,
        "fdsn.new_epoch": null,
        "filters": [],
        "location.datum": "WGS 84",
        "location.elevation": 0.0,
        "location.latitude": 0.0,
 

In [17]:
# Create sample time series data (1 hour at 256 Hz = 921,600 samples)
n_samples = 256 * 3600  # 1 hour of data
time_series_data = np.random.randn(n_samples) * 0.1

# Add electric channel Ex
ex_channel = m.add_channel("MT001", "001", "Ex", "electric", 
                           time_series_data, survey="MT_Survey_2024")
ex_channel.metadata.component = "ex"
ex_channel.metadata.measurement_azimuth = 0.0
ex_channel.metadata.dipole_length = 100.0
ex_channel.metadata.units = "mV/km"
ex_channel.write_metadata()

# Add magnetic channel Hx
hx_channel = m.add_channel("MT001", "001", "Hx", "magnetic", 
                           time_series_data, survey="MT_Survey_2024")
hx_channel.metadata.component = "hx"
hx_channel.metadata.measurement_azimuth = 0.0
hx_channel.metadata.sensor.type = "Induction coil"
hx_channel.metadata.units = "nT"
hx_channel.write_metadata()

print("Channels added:")
print(f"  Ex: {len(ex_channel.hdf5_dataset)} samples")
print(f"  Hx: {len(hx_channel.hdf5_dataset)} samples")

Channels added:
  Ex: 921600 samples
  Hx: 921600 samples
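
The sample count follows directly from duration times sample rate, and the same arithmetic gives the time of the last sample. The sketch below works this through with the standard library only. It assumes the convention that the end time is the timestamp of the final sample, `start + (n_samples - 1) / sample_rate`; the helper name `last_sample_time` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def last_sample_time(start: datetime, n_samples: int, sample_rate: float) -> datetime:
    """Timestamp of the final sample, assuming evenly spaced samples."""
    return start + timedelta(seconds=(n_samples - 1) / sample_rate)


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
sample_rate = 256.0
n_samples = int(sample_rate * 3600)  # one hour of data

end = last_sample_time(start, n_samples, sample_rate)
print(n_samples)  # 921600
print(end.isoformat())
```

The last sample falls one sample interval (1/256 s) before the full hour, which is why the end time is slightly earlier than `start + 1 hour`.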


### View Complete Structure

Now let's view the complete MTH5 file structure.

In [18]:
# Display the complete MTH5 structure
print("Complete MTH5 v0.2.0 structure:")
print(m)

Complete MTH5 v0.2.0 structure:
/:
    |- Group: Experiment
    --------------------
        |- Group: Reports
        -----------------
        |- Group: Standards
        -------------------
            --> Dataset: summary
            ......................
        |- Group: Surveys
        -----------------
            |- Group: MT_Survey_2024
            ------------------------
                |- Group: Filters
                -----------------
                    |- Group: coefficient
                    ---------------------
                    |- Group: fap
                    -------------
                    |- Group: fir
                    -------------
                    |- Group: time_delay
                    --------------------
                    |- Group: zpk
                    -------------
                |- Group: Reports
                -----------------
                |- Group: Standards
                -------------------
                    --> Dataset: su

### Access Summary Tables

Each group has a summary table that provides quick access to all items within that group.

In [21]:
# View channel summary table
channel_summary = m.channel_summary
channel_summary.summarize()
print("Channel Summary Table:")
channel_summary.to_dataframe()

Channel Summary Table:


| | survey | station | run | latitude | longitude | elevation | component | start | end | n_samples | sample_rate | measurement_type | azimuth | tilt | units | has_data | hdf5_reference | run_hdf5_reference | station_hdf5_reference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | MT_Survey_2024 | MT001 | 1 | 40.123 | -112.456 | 1234.5 | ex | 1980-01-01 00:00:00+00:00 | 1980-01-01 00:00:00+00:00 | 921600 | 0.0 | electric | 0.0 | 0.0 | milliVolt per kilometer | True | `<HDF5 object reference>` | `<HDF5 object reference>` | `<HDF5 object reference>` |
| 1 | MT_Survey_2024 | MT001 | 1 | 40.123 | -112.456 | 1234.5 | hx | 1980-01-01 00:00:00+00:00 | 1980-01-01 00:00:00+00:00 | 921600 | 0.0 | magnetic | 0.0 | 0.0 | nanoTesla | True | `<HDF5 object reference>` | `<HDF5 object reference>` | `<HDF5 object reference>` |
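
Both rows report a `sample_rate` of 0.0 and a start time of 1980-01-01, which match the default values shown in the metadata templates printed earlier. The likely cause is that the example set `sample_rate` and `time_period` on the run but never on the individual channels. The sketch below shows one way to flag such rows. It works on plain dictionaries copied from the table rather than on the real summary object, and `rows_with_default_timing` is a hypothetical helper.

```python
DEFAULT_START = "1980-01-01 00:00:00+00:00"

# A few columns copied from the channel summary output above.
rows = [
    {"station": "MT001", "component": "ex", "start": DEFAULT_START,
     "n_samples": 921600, "sample_rate": 0.0},
    {"station": "MT001", "component": "hx", "start": DEFAULT_START,
     "n_samples": 921600, "sample_rate": 0.0},
]


def rows_with_default_timing(summary_rows):
    """Return rows whose sample rate or start time still holds a default value."""
    return [
        row for row in summary_rows
        if row["sample_rate"] == 0.0 or row["start"] == DEFAULT_START
    ]


flagged = rows_with_default_timing(rows)
for row in flagged:
    print(f"{row['station']}/{row['component']}: timing metadata looks unset")
```

Setting `sample_rate` and `time_period.start` on each channel's metadata before calling `write_metadata()` should make these rows report the real values.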


### Access Data and Metadata

Demonstrate how to access data and metadata from the file.

In [22]:
# Access a specific channel
ex = m.get_channel("MT001", "001", "Ex", survey="MT_Survey_2024")

# View channel metadata
print(f"Channel: {ex.metadata.component}")
print(f"Type: {ex.metadata.type}")
print(f"Units: {ex.metadata.units}")
print(f"Sample rate: {ex.metadata.sample_rate} Hz")
print(f"Azimuth: {ex.metadata.measurement_azimuth}°")
print(f"Data shape: {ex.hdf5_dataset.shape}")
print(f"\nFirst 10 samples:")
print(ex.hdf5_dataset[:10])

Channel: ex
Type: electric
Units: milliVolt per kilometer
Sample rate: 0.0 Hz
Azimuth: 0.0°
Data shape: (921600,)

First 10 samples:
[ 1.20116727e-01 -4.09096353e-02 -1.87617644e-01  4.31178388e-02
 -4.30365672e-02  6.48503363e-02 -1.64450955e-04 -1.64007524e-01
 -5.39261003e-02 -8.48382376e-02]


In [23]:
# Close the file
m.close_mth5()
print("MTH5 file closed successfully.")

2026-01-01T10:05:34.924362-0800 | INFO | mth5.mth5 | close_mth5 | line: 772 | Flushing and closing example_v020.h5
MTH5 file closed successfully.


---

## MTH5 Version 0.1.0 Structure

Version 0.1.0 uses **SurveyGroup** as the top-level container. This version is archived but maintained for legacy compatibility.

### Hierarchical Structure

```
MTH5 File (/)
└── SurveyGroup (root)
    ├── FiltersGroup
    │   ├── ZPKGroup (Pole-Zero filters)
    │   ├── FAPGroup (Frequency lookup table filters)
    │   ├── FIRGroup (Finite Impulse Response filters)
    │   ├── CoefficientGroup (Coefficient filters)
    │   └── TimeDelayGroup (Time delay filters)
    ├── ReportsGroup
    ├── StandardsGroup
    │   └── summary (Dataset)
    └── StationsGroup
        └── StationGroup (e.g., "mt001")
            ├── FeaturesGroup
            ├── FourierCoefficientsGroup
            ├── TransferFunctionsGroup
            └── RunGroup (e.g., "001")
                ├── ChannelDataset (ex, ey, hx, hy, hz, etc.)
                └── ... (multiple channels)
```

### Key Differences from v0.2.0

1. **Top Level**: SurveyGroup is root (no ExperimentGroup)
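2. **Single Survey**: Each file holds exactly one survey
3. **FiltersGroup Location**: FiltersGroup sits directly under the root SurveyGroup; in v0.2.0 each SurveyGroup nested under SurveysGroup carries its own FiltersGroup
4. **No SurveysGroup Layer**: The SurveysGroup container that collects multiple surveys in v0.2.0 does not exist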
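
One way to see the difference is to map a v0.1.0 group path onto its v0.2.0 counterpart: the root survey group is replaced by an experiment root plus a named survey. The sketch below does this with plain string handling. The root names `/Survey` and `/Experiment/Surveys` are assumptions about the on-disk group names, `v010_to_v020_path` is a hypothetical helper, and only paths under the survey's own subtree (such as `Stations` and `Filters`) are considered.

```python
def v010_to_v020_path(path_v010: str, survey_id: str) -> str:
    """Rewrite an assumed v0.1.0 path so it sits under a named v0.2.0 survey."""
    prefix = "/Survey"
    if path_v010 != prefix and not path_v010.startswith(prefix + "/"):
        raise ValueError(f"not a v0.1.0 survey path: {path_v010!r}")
    remainder = path_v010[len(prefix):]
    return f"/Experiment/Surveys/{survey_id}{remainder}"


print(v010_to_v020_path("/Survey/Stations/mt001/001/ex", "my_survey"))
# /Experiment/Surveys/my_survey/Stations/mt001/001/ex
print(v010_to_v020_path("/Survey/Filters/zpk", "my_survey"))
# /Experiment/Surveys/my_survey/Filters/zpk
```

The station, run, and channel portion of the path is unchanged, which reflects that only the levels above StationsGroup differ between the two versions.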

### Group Descriptions (v0.1.0)

The groups in v0.1.0 are similar to v0.2.0 but with the structural differences noted above:

- **SurveyGroup** (root): Single MT survey container
- **FiltersGroup**: All filter types (at survey level)
- **ReportsGroup**: Survey reports
- **StandardsGroup**: Metadata standards
- **StationsGroup**: Container for all stations
- **StationGroup**: Individual station data
- **RunGroup**: Continuous recording block
- **ChannelDataset**: Individual channel time series

All metadata structures within these groups remain compatible between versions.

---

## Summary

This notebook documented the MTH5 file structure for both versions:

### Version 0.2.0 (Current - Recommended)
- **Top Level**: ExperimentGroup
- **Multi-Survey Support**: Can contain multiple surveys
- **Hierarchy**: Experiment → Surveys → Stations → Runs → Channels
- **Use Case**: Large experiments spanning multiple surveys, projects, or campaigns

### Version 0.1.0 (Legacy)
- **Top Level**: SurveyGroup  
- **Single Survey**: One survey per file
- **Hierarchy**: Survey → Stations → Runs → Channels
- **Use Case**: Individual surveys or backward compatibility

### Common Features
- **HDF5 Format**: Efficient, hierarchical data storage
- **MT Metadata Standards**: All metadata follows standardized schemas
- **Summary Tables**: Quick access via HDF5 references
- **Flexible I/O**: Read/write time series, Fourier coefficients, transfer functions
- **Data Types**: Time series, frequency domain, transfer functions, features

### Best Practices
1. Use version 0.2.0 for new projects
2. Always call `write_metadata()` after updating metadata
3. Use summary tables for efficient data discovery
4. Close files properly with `close_mth5()`
5. Leverage HDF5 references for direct access to groups/datasets
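
For practice 4, wrapping the file handle in a context manager guarantees that `close_mth5()` runs even when an exception interrupts the work. The sketch below is a generic wrapper that works with any object exposing `close_mth5()`. `closing_mth5` is a hypothetical helper and `FakeMTH5` is a stand-in used only so the example runs without the mth5 package. mth5 may provide its own context-manager support; this sketch does not rely on it.

```python
from contextlib import contextmanager


@contextmanager
def closing_mth5(mth5_obj):
    """Yield the object and always call close_mth5() on exit."""
    try:
        yield mth5_obj
    finally:
        mth5_obj.close_mth5()


class FakeMTH5:
    """Stand-in for an open MTH5 object (illustration only)."""

    def __init__(self):
        self.closed = False

    def close_mth5(self):
        self.closed = True


fake = FakeMTH5()
try:
    with closing_mth5(fake) as m:
        raise RuntimeError("simulated failure while writing")
except RuntimeError:
    pass

print(fake.closed)  # True
```

With a real file, the same pattern would wrap the object returned after `open_mth5(...)`, so a failed cell does not leave the file locked.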