# Export and Validate Metadata

This tutorial shows how to **export and validate metadata** using the `paidiverpy` package.

## What You'll Learn

- 📂 Load metadata from JSON and CSV formats
- ✅ Validate metadata structure and contents
- 💾 Export metadata in standardized formats (iFDO)

---

## Overview

To use this package effectively, you need a metadata file that describes the images being processed.
This can be either a `.json` file (following the iFDO standard) or a `.csv` file.

Metadata provides essential context such as filenames, timestamps, and geospatial coordinates, which are critical for accurate image analysis.

---

### iFDO Standard JSON File

The **iFDO** (image FAIR Digital Object) format is a standardized way to structure metadata for image datasets. It supports a rich set of attributes including:

- **filename**: Name of the image file
- **datetime**: Timestamp of image capture
- **lat / lon**: Geospatial coordinates of the image location

To ensure iFDO compliance, structure the JSON file according to the standard.
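
As a rough illustration of that structure, the sketch below builds a minimal iFDO-style dictionary in Python and writes it to a JSON file. The top-level `image-set-header` and `image-set-items` keys, and the convention of keying items by filename, match what appears in the validation output later in this tutorial. The `image-datetime`, `image-latitude`, and `image-longitude` names are recalled from the iFDO core fields and should be checked against the specification. All values, the version string, and the `example_ifdo.json` filename are placeholders chosen for illustration.

```python
import json
import tempfile
from pathlib import Path

# Placeholder values throughout: replace them with your dataset's real values.
ifdo = {
    "image-set-header": {
        "image-set-ifdo-version": "v2.1.0",  # assumed version string
        "image-set-handle": "https://example.org/handle/placeholder",
    },
    "image-set-items": {
        # One entry per image, keyed by the image filename.
        "example_image_0001.jpg": {
            "image-datetime": "2020-01-01 12:00:00.000000",
            "image-latitude": 0.0,
            "image-longitude": 0.0,
        },
    },
}

# Write the dictionary to a JSON file in the system temp directory.
output_file = Path(tempfile.gettempdir()) / "example_ifdo.json"
output_file.write_text(json.dumps(ifdo, indent=2), encoding="utf-8")

# Read it back to confirm the file is well-formed JSON.
reloaded = json.loads(output_file.read_text(encoding="utf-8"))
print(sorted(reloaded))
```

A file this small is not expected to pass validation; it only shows the overall shape that the validator inspects.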


## Validate iFDO Metadata

You can validate your metadata using the `validate_ifdo` function included in the `paidiverpy` package.

### Import dependencies

In [1]:
%load_ext autoreload
%autoreload 2
from paidiverpy.metadata_parser.ifdo_tools import validate_ifdo

### Opening the metadata

The validation function accepts either a path to a JSON file or the contents of a JSON file already loaded in Python as a dictionary.

The function returns a list of validation errors.

In [2]:
# metadata file path
metadata_file = "../metadata/metadata_ifdo_hf.json"
errors = validate_ifdo(metadata_file)

# You can also validate metadata that is already loaded as a Python dict:
# import json
# with open(metadata_file, "r") as file:
#     metadata_data = json.load(file)
#
# errors = validate_ifdo(metadata_data)

In [3]:
errors

[{'path': ['image-set-header'],
  'message': "'image-set-handle' is a required property"},
 {'path': ['image-set-header'],
  'message': "'image-altitude-meters' is a required property"},
 {'path': ['image-set-header'],
  'message': "'image-coordinate-uncertainty-meters' is a required property"},
 {'path': ['image-set-items', 'M58_10441297_12987744807147.jpg'],
  'message': "Missing fields: ['image-hash-sha256', 'image-handle']. Additional fields (this will not make the metadata invalid): ['image-area-square-meter', 'image-media-type']."},
 {'path': ['image-set-items', 'M58_10441297_12987744811443.jpg'],
  'message': "Missing fields: ['image-hash-sha256', 'image-handle']. Additional fields (this will not make the metadata invalid): ['image-area-square-meter', 'image-media-type']."},
 {'path': ['image-set-items', 'M58_10441297_12987744853552.jpg'],
  'message': "Missing fields: ['image-hash-sha256', 'image-handle']. Additional fields (this will not make the metadata invalid): ['image-are

As you can see above, the iFDO metadata has several validation errors. `paidiverpy` still works with metadata that has these problems. Note, however, that some fields are mandatory per the iFDO specification, while others are required by `paidiverpy` for full pipeline functionality. The following fields are mandatory:

- `image-set-header` >> `image-set-ifdo-version`: Version of the iFDO schema used.
- `image-set-items` >> `image-filename`: Name of the image file.

Note that this validation also runs automatically when you run a pipeline or instantiate the `MetadataParser` class with iFDO metadata.
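
When many images share the same problem, the error list becomes long and repetitive. The sketch below condenses it by counting how often each message occurs per top-level section. It assumes only the `path` and `message` keys visible in the output above; `summarise_errors` is a hypothetical helper written for this tutorial, not part of `paidiverpy`, and the sample entries are shortened copies of the errors shown earlier.

```python
from collections import Counter

# Sample entries copied (and shortened) from the validation output above.
errors = [
    {"path": ["image-set-header"],
     "message": "'image-set-handle' is a required property"},
    {"path": ["image-set-header"],
     "message": "'image-altitude-meters' is a required property"},
    {"path": ["image-set-items", "M58_10441297_12987744807147.jpg"],
     "message": "Missing fields: ['image-hash-sha256', 'image-handle']."},
    {"path": ["image-set-items", "M58_10441297_12987744811443.jpg"],
     "message": "Missing fields: ['image-hash-sha256', 'image-handle']."},
]


def summarise_errors(errors):
    """Count how often each (section, message) pair occurs."""
    counts = Counter()
    for error in errors:
        # The first path element is the top-level iFDO section.
        section = error["path"][0] if error["path"] else "<root>"
        counts[(section, error["message"])] += 1
    return counts


summary = summarise_errors(errors)
for (section, message), count in summary.most_common():
    print(f"{count:>3} x [{section}] {message}")
```

Grouping by section makes it easier to tell dataset-level problems in `image-set-header` apart from per-image problems in `image-set-items`.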

## Export Metadata

You can export metadata to several formats using the `export_metadata` function inside the `MetadataParser` class.
This is useful for interoperability, archiving, or sharing.

### Import dependencies

In [4]:
from paidiverpy.pipeline import Pipeline

### Parsing the existing metadata

You can parse and open the metadata using either the `Pipeline` class or the `MetadataParser` class.

The only difference is that, with the `Pipeline` class, running the pipeline can add metadata derived from the outputs of the pipeline steps to the output metadata.

In [5]:
# Use the Pipeline class and get the metadata after running the pipeline
pipeline = Pipeline(config_file_path="../config_files/config_benthic.yml")
pipeline.run()

# Using the MetadataParser class
# config = Config(config_file_path="/path/to/your/config.yaml")
# metadata_parser = MetadataParser(config=config)

☁ paidiverpy ☁  |       INFO | 2025-09-11 16:44:58 | Processing images using 8 cores
☁ paidiverpy ☁  |       INFO | 2025-09-11 16:44:58 | Running step 0: raw - OpenLayer
[########################################] | 100% Completed | 219.05 ms
☁ paidiverpy ☁  |       INFO | 2025-09-11 16:44:58 | Step 0 completed
☁ paidiverpy ☁  |       INFO | 2025-09-11 16:44:58 | Running step 1: colour_correction - ColourLayer
☁ paidiverpy ☁  |       INFO | 2025-09-11 16:44:59 | Step 1 completed
☁ paidiverpy ☁  |       INFO | 2025-09-11 16:44:59 | Running step 2: datetime - SamplingLayer
☁ paidiverpy ☁  |       INFO | 2025-09-11 16:44:59 | Number of images before the sampling step: 20. Total number of images after: 14
☁ paidiverpy ☁  |       INFO | 2025-09-11 16:44:59 | Step 2 completed
☁ paidiverpy ☁  |       INFO | 2025-09-11 16:44:59 | Running step 3: overlapping - SamplingLayer
☁ paidiverpy ☁  |       INFO | 2025-09-11 16:44:59 | Number of images before the sampling step: 14. Total number of images after: 9
☁ paidiverpy ☁  |       INFO | 2025-09-11 16:44:59 | Step 3 completed
☁ paidiverpy ☁  |       INFO | 2025-09-11 16:44:59 | Running step 4: colour_correction - ColourLayer
☁ paidiverpy ☁  |       INFO | 2025-09-11 16:44:59 | Step 4 completed
☁ paidiverpy ☁  |       INFO | 2025-09-11 16:45:00 | Running step 5: sharpen - ColourLayer
☁ paidiverpy ☁  |       INFO | 2025-09-11 16:45:00 | Step 5 completed
☁ paidiverpy ☁  |       INFO | 2025-09-11 16:45:00 | Running step 6: contrast - ColourLayer
☁ paidiverpy ☁  |       INFO | 2025-09-11 16:45:00 | Step 6 completed
[########################################] | 100% Completed | 1.01 s


### Exporting the metadata


**1. CSV**

- Useful for compatibility with spreadsheets or data analysis tools.
- Dataset metadata is added as columns.
- Column names follow iFDO naming conventions where possible.

**2. JSON**

- A flexible, widely supported format.
- Each image’s metadata includes dataset-level metadata as additional keys.

**3. iFDO**

- Exports metadata in the native iFDO JSON format.
- `image-set-header`: Built from `dataset_metadata`
- `image-set-items`: Built from the image-level metadata
- Automatically converts EXIF and pipeline-derived metadata to iFDO fields
- Fills missing required fields with default descriptions from the iFDO schema for easier review and editing

After exporting to iFDO, the file is validated and any schema errors will be printed to the console.
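
To make the difference between these layouts concrete, the sketch below reproduces them with plain dictionaries and the standard library. It illustrates the output shapes described above and is not `paidiverpy`'s implementation: the input values and the version string are made up, and dropping `image-filename` from each iFDO item is a simplification made here. Keying items by filename follows the paths seen in the validation output earlier.

```python
import csv
import io
import json

# Hypothetical inputs: one dataset-level dict and a list of per-image dicts.
dataset_metadata = {"image-set-ifdo-version": "v2.1.0"}  # assumed version string
image_metadata = [
    {"image-filename": "example_0001.jpg", "image-datetime": "2020-01-01 12:00:00"},
    {"image-filename": "example_0002.jpg", "image-datetime": "2020-01-01 12:00:05"},
]

# CSV / JSON layout: dataset-level values are repeated on every image record.
flat_records = [{**image, **dataset_metadata} for image in image_metadata]

# Serialise the flat records as CSV text (one column per key).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(flat_records[0]))
writer.writeheader()
writer.writerows(flat_records)
csv_text = buffer.getvalue()

# iFDO layout: header and items stay separate, with items keyed by filename.
ifdo_like = {
    "image-set-header": dict(dataset_metadata),
    "image-set-items": {
        image["image-filename"]: {
            key: value for key, value in image.items() if key != "image-filename"
        }
        for image in image_metadata
    },
}

print(csv_text)
print(json.dumps(ifdo_like, indent=2))
```

The flat layouts are convenient for spreadsheets and data frames, while the iFDO layout stores dataset-level values once in the header.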


In [6]:
# export to csv
pipeline.metadata.export_metadata(output_format="csv", metadata=pipeline.get_metadata(flag="all"))

In [7]:
# export to json
pipeline.metadata.export_metadata("json", metadata=pipeline.get_metadata(flag="all"))

In [8]:
# export to iFDO Standard metadata
pipeline.metadata.export_metadata("ifdo", output_path="metadata_ifdo", metadata=pipeline.get_metadata(flag="all"))



---

As you can see, there are several validation errors in the **iFDO metadata**. Read them carefully to decide whether you need to update any information in the `dataset_metadata` and `metadata` input arguments of the `export_metadata` function.
Note that the exported metadata fills in missing required fields with default descriptions from the iFDO schema, which makes them easy to review and edit.