Read backend configuration from .toml files #1146

Merged
merged 11 commits into from
Jan 11, 2022
11 changes: 8 additions & 3 deletions CMakeLists.txt
@@ -582,9 +582,11 @@ if(openPMD_HAVE_ADIOS1)
endif()

target_include_directories(openPMD.ADIOS1.Serial SYSTEM PRIVATE
$<TARGET_PROPERTY:openPMD::thirdparty::nlohmann_json,INTERFACE_INCLUDE_DIRECTORIES>)
$<TARGET_PROPERTY:openPMD::thirdparty::nlohmann_json,INTERFACE_INCLUDE_DIRECTORIES>
$<TARGET_PROPERTY:openPMD::thirdparty::toml11,INTERFACE_INCLUDE_DIRECTORIES>)
target_include_directories(openPMD.ADIOS1.Parallel SYSTEM PRIVATE
$<TARGET_PROPERTY:openPMD::thirdparty::nlohmann_json,INTERFACE_INCLUDE_DIRECTORIES>)
$<TARGET_PROPERTY:openPMD::thirdparty::nlohmann_json,INTERFACE_INCLUDE_DIRECTORIES>
$<TARGET_PROPERTY:openPMD::thirdparty::toml11,INTERFACE_INCLUDE_DIRECTORIES>)

set_target_properties(openPMD.ADIOS1.Serial PROPERTIES
POSITION_INDEPENDENT_CODE ON
@@ -805,6 +807,7 @@ set(openPMD_EXAMPLE_NAMES
10_streaming_write
10_streaming_read
12_span_write
13_write_dynamic_configuration
)
set(openPMD_PYTHON_EXAMPLE_NAMES
2_read_serial
@@ -820,6 +823,7 @@ set(openPMD_PYTHON_EXAMPLE_NAMES
10_streaming_read
11_particle_dataframe
12_span_write
13_write_dynamic_configuration
)

if(openPMD_USE_INVASIVE_TESTS)
@@ -866,7 +870,8 @@ if(openPMD_BUILD_TESTING)

if(${testname} STREQUAL JSON)
target_include_directories(${testname}Tests SYSTEM PRIVATE
$<TARGET_PROPERTY:openPMD::thirdparty::nlohmann_json,INTERFACE_INCLUDE_DIRECTORIES>)
$<TARGET_PROPERTY:openPMD::thirdparty::nlohmann_json,INTERFACE_INCLUDE_DIRECTORIES>
$<TARGET_PROPERTY:openPMD::thirdparty::toml11,INTERFACE_INCLUDE_DIRECTORIES>)
endif()
endforeach()
endif()
2 changes: 2 additions & 0 deletions docs/source/details/adios1.toml
@@ -0,0 +1,2 @@
[adios.dataset]
transform = "blosc:compressor=zlib,shuffle=bit,lvl=1;nometa"
24 changes: 24 additions & 0 deletions docs/source/details/adios2.toml
@@ -0,0 +1,24 @@
[adios2.engine]
type = "sst"

[adios2.engine.parameters]
BufferGrowthFactor = "2.0"
QueueLimit = "2"

# use double brackets to indicate lists
[[adios2.dataset.operators]]
type = "blosc"

# specify parameters for the current operator
[adios2.dataset.operators.parameters]
clevel = "1"
doshuffle = "BLOSC_BITSHUFFLE"

# use double brackets a second time to indicate a further entry
[[adios2.dataset.operators]]
# specify a second operator here
type = "some other operator"

# the parameters dictionary can also be specified in-line
parameters.clevel = "1"
parameters.doshuffle = "BLOSC_BITSHUFFLE"
58 changes: 45 additions & 13 deletions docs/source/details/backendconfig.rst
@@ -1,17 +1,22 @@
.. _backendconfig:

JSON configuration
==================
JSON/TOML configuration
=======================

While the openPMD API intends to be a backend-*independent* implementation of the openPMD standard, it is sometimes useful to pass configuration parameters to the specific backend in use.
:ref:`For each backend <backends-overview>`, configuration options can be passed via a JSON-formatted string or via environment variables.
A JSON option always takes precedence over an environment variable.
:ref:`For each backend <backends-overview>`, configuration options can be passed via a JSON- or TOML-formatted string or via environment variables.
A JSON/TOML option always takes precedence over an environment variable.

The fundamental structure of this JSON configuration string is given as follows:

.. literalinclude:: config_layout.json
   :language: json

Every JSON configuration can alternatively be given by its TOML equivalent:

.. literalinclude:: config_layout.toml
   :language: toml

This structure allows keeping one configuration string for several backends at once; the section that applies is selected once the backend itself is chosen.

Options that can be configured via JSON are often also accessible via other means, e.g. environment variables.
@@ -20,41 +25,62 @@ The following list specifies the priority of these means, beginning with the low
1. Default values
2. Automatically detected options, e.g. the backend being detected by inspection of the file extension
3. Environment variables
4. JSON configuration. For JSON, a dataset-specific configuration overwrites a global, Series-wide configuration.
4. JSON/TOML configuration. For JSON/TOML, a dataset-specific configuration overwrites a global, Series-wide configuration.
5. Explicit API calls such as ``setIterationEncoding()``

The configuration is read in a case-insensitive manner, keys as well as values.
An exception to this are string values which are forwarded to other libraries such as ADIOS1 and ADIOS2.
Those are read "as-is" and interpreted by the backend library.
Generally, keys of the configuration are *lower case*.
Parameters that are directly passed through to an external library and not interpreted within openPMD API (e.g. ``adios2.engine.parameters``) are unaffected by this and follow the respective library's conventions.

The configuration string may refer to the complete ``openPMD::Series`` or may additionally be specified per ``openPMD::Dataset``, passed in the respective constructors.
This reflects the fact that certain backend-specific parameters may refer to the whole Series (such as storage engines and their parameters) and others refer to actual datasets (such as compression).

A JSON configuration may either be specified as a regular string that can be parsed as a JSON object, or in the constructor of ``openPMD::Series`` alternatively as a path to a JSON-formatted text file.
In the latter case, the file path must be prepended by an at-sign ``@``.
A JSON/TOML configuration may either be specified as an inline string that can be parsed as a JSON/TOML object, or alternatively as a path to a JSON/TOML-formatted text file (only in the constructor of ``openPMD::Series``):

* File paths are distinguished by prepending them with an at-sign ``@``.
  JSON and TOML are then distinguished by the filename extension ``.json`` or ``.toml``.
  If no extension can be uniquely identified, JSON is assumed as default.
* If no at-sign ``@`` is given, an inline string is assumed.
  If the first non-blank character of the string is a ``{``, it will be parsed as a JSON value.
  Otherwise, it is parsed as a TOML value.

For a consistent user interface, backends shall follow the following rules:

* The configuration structures for the Series and for each dataset should be defined equivalently.
* Any setting referring to single datasets should also be applicable globally, affecting all datasets.
* Any setting referring to single datasets should also be applicable globally, affecting all datasets (specifying a default).
* If a setting is defined globally, but also for a concrete dataset, the dataset-specific setting should override the global one.
* If a setting is passed to a dataset that only makes sense globally (such as the storage engine), the setting should be ignored except for printing a warning.
  Backends should define clearly which keys are applicable to datasets and which are not.
* All dataset-specific options should be passed inside the ``dataset`` object, e.g.:

  .. code-block:: json

     {
       "adios2": {
         "dataset": {
           "put dataset options": "here"
         }
       }
     }

  .. code-block:: toml

     [adios2.dataset]
     # put dataset options here


Backend-independent JSON configuration
--------------------------------------

The openPMD backend can be chosen via the JSON key ``backend`` which recognizes the alternatives ``["hdf5", "adios1", "adios2", "json"]``.
The openPMD backend can be chosen via the JSON/TOML key ``backend`` which recognizes the alternatives ``["hdf5", "adios1", "adios2", "json"]``.

The iteration encoding can be chosen via the JSON key ``iteration_encoding`` which recognizes the alternatives ``["file_based", "group_based", "variable_based"]``.
The iteration encoding can be chosen via the JSON/TOML key ``iteration_encoding`` which recognizes the alternatives ``["file_based", "group_based", "variable_based"]``.
Note that for file-based iteration encoding, specification of the expansion pattern in the file name (e.g. ``data_%T.json``) remains mandatory.

The key ``defer_iteration_parsing`` can be used to optimize the process of opening an openPMD Series (deferred/lazy parsing).
By default, a Series is parsed eagerly, i.e. opening a Series implies reading all available iterations.
Especially when a Series has many iterations, this can be a costly operation and users may wish to defer parsing of iterations to a later point adding ``{"defer_iteration_parsing": true}`` to their JSON configuration.
Especially when a Series has many iterations, this can be a costly operation and users may wish to defer parsing of iterations to a later point by adding ``{"defer_iteration_parsing": true}`` to their JSON/TOML configuration.

When parsing non-eagerly, each iteration needs to be explicitly opened with ``Iteration::open()`` before accessing.
(Notice that ``Iteration::open()`` is generally recommended to be used in parallel contexts to avoid parallel file accessing hazards).
@@ -80,6 +106,9 @@ A full configuration of the ADIOS2 backend:
.. literalinclude:: adios2.json
   :language: json

.. literalinclude:: adios2.toml
   :language: toml

All keys found under ``adios2.dataset`` are applicable globally as well as per dataset, keys found under ``adios2.engine`` only globally.
Explanation of the single keys:

@@ -124,11 +153,14 @@ Explanation of the single keys:
ADIOS1
^^^^^^

ADIOS1 allows configuring custom dataset transforms via JSON:
ADIOS1 allows configuring custom dataset transforms via JSON/TOML:

.. literalinclude:: adios1.json
   :language: json

.. literalinclude:: adios1.toml
   :language: toml

This configuration can be passed globally (i.e. for the ``Series`` object) to apply for all datasets.
Alternatively, it can also be passed for single ``Dataset`` objects to only apply for single datasets.

2 changes: 1 addition & 1 deletion docs/source/details/config_layout.json
@@ -1,5 +1,5 @@
{
"adios": "put ADIOS config here",
"adios1": "put ADIOS config here",
"adios2": "put ADIOS2 config here",
"hdf5": "put HDF5 config here",
"json": "put JSON config here"
11 changes: 11 additions & 0 deletions docs/source/details/config_layout.toml
@@ -0,0 +1,11 @@
[adios1]
# put ADIOS config here

[adios2]
# put ADIOS2 config here

[hdf5]
# put HDF5 config here

[json]
# put JSON config here
134 changes: 134 additions & 0 deletions examples/13_write_dynamic_configuration.cpp
@@ -0,0 +1,134 @@
#include <openPMD/openPMD.hpp>

#include <algorithm>
#include <iostream>
#include <memory>
#include <numeric> // std::iota
#include <string>
#include <vector>

using std::cout;
using namespace openPMD;


int main()
{
    if( !getVariants()["adios2"] )
    {
        // Example configuration below selects the ADIOS2 backend
        return 0;
    }

    using position_t = double;
    /*
     * This example demonstrates how to use JSON/TOML-based dynamic
     * configuration for openPMD.
     * The following configuration is passed to the constructor of the Series
     * class and specifies the defaults to be used for that Series.
     * This configuration can later be overridden as needed on a per-dataset
     * level.
     */
    std::string const defaults = R"END(
# This configuration is TOML-based
# JSON can be used alternatively; the openPMD-api will automatically detect
# the language being used
#
# Alternatively, the location of a JSON/TOML-file on the filesystem can
# be passed by adding an at-sign `@` in front of the path
# The format will then be recognized by filename extension, i.e. .json or .toml

backend = "adios2"
iteration_encoding = "group_based"
# The following is only relevant in read mode
defer_iteration_parsing = true

[adios1.dataset]
transform = "blosc:compressor=zlib,shuffle=bit,lvl=5;nometa"

[adios2.engine]
type = "bp4"

# ADIOS2 allows adding several operators
# Lists are given in TOML by using double brackets
[[adios2.dataset.operators]]
type = "zlib"

parameters.clevel = 5
# Alternatively:
# [adios2.dataset.operators.parameters]
# clevel = 5

# For adding a further operator:
# [[adios2.dataset.operators]]
# type = "some other operator"
# # ...

[hdf5.dataset]
chunks = "auto"
)END";

    // open file for writing
    Series series =
        Series( "../samples/dynamicConfig.bp", Access::CREATE, defaults );

    Datatype datatype = determineDatatype< position_t >();
    constexpr unsigned long length = 10ul;
    Extent global_extent = { length };
    Dataset dataset = Dataset( datatype, global_extent );
    std::shared_ptr< position_t > local_data(
        new position_t[ length ],
        []( position_t const * ptr ) { delete[] ptr; } );

    WriteIterations iterations = series.writeIterations();
    for( size_t i = 0; i < 100; ++i )
    {
        Iteration iteration = iterations[ i ];
        Record electronPositions = iteration.particles[ "e" ][ "position" ];

        std::iota( local_data.get(), local_data.get() + length, i * length );
        for( auto const & dim : { "x", "y", "z" } )
        {
            RecordComponent pos = electronPositions[ dim ];
            pos.resetDataset( dataset );
            pos.storeChunk( local_data, Offset{ 0 }, global_extent );
        }

        /*
         * We want different compression settings for this dataset, so we pass
         * a dataset-specific configuration.
         * Also showcase how to define a resizable dataset.
         * This time in JSON.
         */
        std::string const differentCompressionSettings = R"END(
{
  "resizable": true,
  "adios1": {
    "dataset": {
      "transform": "blosc:compressor=zlib,shuffle=bit,lvl=1;nometa"
    }
  },
  "adios2": {
    "dataset": {
      "operators": [
        {
          "type": "zlib",
          "parameters": {
            "clevel": 9
          }
        }
      ]
    }
  }
})END";
        Dataset differentlyCompressedDataset{ Datatype::INT, { 10 } };
        differentlyCompressedDataset.options = differentCompressionSettings;

        auto someMesh = iteration.meshes[ "differentCompressionSettings" ]
                                        [ RecordComponent::SCALAR ];
        someMesh.resetDataset( differentlyCompressedDataset );
        std::vector< int > dataVec( 10, static_cast< int >( i ) );
        someMesh.storeChunk( dataVec, { 0 }, { 10 } );

        iteration.close();
    }

    return 0;
}