Merge pull request #3939 from spyridon97/add-docs-from-examples
Move useful docs from ADIOS2-examples to ADIOS2
vicentebolea committed Dec 5, 2023
2 parents 08f8995 + bf7b229 commit 7c95e3f
Showing 3 changed files with 222 additions and 11 deletions.
136 changes: 136 additions & 0 deletions docs/user_guide/source/components/anatomy.rst
@@ -0,0 +1,136 @@
.. _sec:basics_interface_components_anatomy:

***************************
Anatomy of an ADIOS Program
***************************

Anatomy of an ADIOS Output
--------------------------

.. code:: C++

    ADIOS adios("config.xml", MPI_COMM_WORLD);
    |
    |   IO io = adios.DeclareIO(...);
    |   |
    |   |   Variable<...> var = io.DefineVariable<...>(...)
    |   |   Attribute<...> attr = io.DefineAttribute<...>(...)
    |   |   Engine e = io.Open("OutputFileName.bp", adios2::Mode::Write);
    |   |   |
    |   |   |   e.BeginStep()
    |   |   |   |
    |   |   |   |   e.Put(var, datapointer);
    |   |   |   |
    |   |   |   e.EndStep()
    |   |   |
    |   |   e.Close();
    |   |
    |   |--> IO goes out of scope
    |
    |--> ADIOS goes out of scope or adios2_finalize()

The pseudo code above depicts the basic structure of performing output. The ``ADIOS`` object is necessary to hold all
other objects. It is initialized with an MPI communicator in a parallel program or without in a serial program.
Additionally, a config file (XML or YAML format) can be specified here to load runtime configuration. Only one ADIOS
object is needed throughout the entire application but you can create as many as you want (e.g. if you need to separate
IO objects using the same name in a program that reads similar input from an ensemble of multiple applications).

The ``IO`` object is required to hold the variable and attribute definitions, and runtime options for a particular input
or output stream. The IO object has a name, which is used only to refer to runtime options in the configuration file.
One IO object can only be used in one output or input stream. The only exception where an IO object can be used twice is
one input stream plus one output stream where the output is reusing the variable definitions loaded during input.

``Variable`` and ``Attribute`` definitions belong to one IO object, which means they can only be used in one output.
You need to define new ones for other outputs. A defined Variable does not appear in the output unless an associated
Put() call provides the content.

A stream is opened and closed once. The ``Engine`` object implements the data movement for the stream. The runtime
options of the IO object determine what type of engine is created in the Open() call. One output step is delimited by
a BeginStep/EndStep pair.

An output step consists of variables and attributes. Variables are just definitions without content, so one must call a
Put() function to provide the application data pointer that contains the data content one wants to write out. Attributes
have their content in their definitions so there is no need for an extra call.

Some rules:

* Variables can be defined any time, before the corresponding Put() call
* Attributes can be defined any time before EndStep
* The following functions must be treated as Collective operations

* ADIOS
* Open
* BeginStep
* EndStep
* Close

.. note::

    If there is only one output step, and we only want to write it to a file on disk and never stream it to another
    application, then BeginStep and EndStep are not required, but calling them makes no difference.

Anatomy of an ADIOS Input
-------------------------

.. code:: C++

    ADIOS adios("config.xml", MPI_COMM_WORLD);
    |
    |   IO io = adios.DeclareIO(...);
    |   |
    |   |   Engine e = io.Open("InputFileName.bp", adios2::Mode::Read);
    |   |   |
    |   |   |   e.BeginStep()
    |   |   |   |
    |   |   |   |   varlist = io.AvailableVariables(...)
    |   |   |   |   Variable var = io.InquireVariable(...)
    |   |   |   |   Attribute attr = io.InquireAttribute(...)
    |   |   |   |   |
    |   |   |   |   |   e.Get(var, datapointer);
    |   |   |   |   |
    |   |   |   |
    |   |   |   e.EndStep()
    |   |   |
    |   |   e.Close();
    |   |
    |   |--> IO goes out of scope
    |
    |--> ADIOS goes out of scope or adios2_finalize()

The difference between input and output is that while we have to define the variables and attributes for an output, we
have to retrieve the available variables in an input first as definitions (Variable and Attribute objects).

If we know the particular variable (name and type) in the input stream, we can get the definition using
InquireVariable(). Generic tools that process any input must use other functions to retrieve the list of variable names
and their types first and then get the individual Variable objects. The same is true for Attributes.

Anatomy of an ADIOS File-only Input
-----------------------------------

The previous section read data using the input mode ``adios2::Mode::Read``. ADIOS has a second input mode,
``adios2::Mode::ReadRandomAccess``. ``adios2::Mode::Read`` allows data access only timestep by timestep using
``BeginStep/EndStep``, but it is generally more memory efficient because ADIOS only needs to load the metadata of the
current timestep. ``ReadRandomAccess`` can only be used with file engines and loads all of the file metadata at
once. It can therefore be more memory intensive than ``adios2::Mode::Read``, but it allows reading data from any
timestep using ``SetStepSelection()``. If you use ``adios2::Mode::ReadRandomAccess``, be sure to allocate enough memory
to hold all the steps of the variable content that you select.
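The memory advice above can be made concrete with a little arithmetic. The following is a minimal sketch, not ADIOS2 code: ``ElementsForSelection`` is a hypothetical helper that assumes the destination buffer must hold the per-step selection once for every selected step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the ADIOS2 API): number of elements a
// destination buffer must hold when a selection of `count` elements per
// step is read for `nsteps` consecutive steps.
std::size_t ElementsForSelection(const std::vector<std::size_t> &count,
                                 std::size_t nsteps)
{
    assert(nsteps > 0);
    std::size_t perStep = 1; // an empty `count` is treated as a single value
    for (std::size_t extent : count)
    {
        perStep *= extent;
    }
    return perStep * nsteps;
}
```

For instance, a 10 x 20 selection read over 5 steps needs room for 1000 elements, five times what a single step needs.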

.. code:: C++

    ADIOS adios("config.xml", MPI_COMM_WORLD);
    |
    |   IO io = adios.DeclareIO(...);
    |   |
    |   |   Engine e = io.Open("InputFileName.bp", adios2::Mode::ReadRandomAccess);
    |   |   |
    |   |   |   Variable var = io.InquireVariable(...)
    |   |   |   |   var.SetStepSelection()
    |   |   |   |   e.Get(var, datapointer);
    |   |   |   |
    |   |   |
    |   |   e.Close();
    |   |
    |   |--> IO goes out of scope
    |
    |--> ADIOS goes out of scope or adios2_finalize()
1 change: 1 addition & 0 deletions docs/user_guide/source/components/components.rst
@@ -10,3 +10,4 @@ Interface Components
.. include:: engine.rst
.. include:: operator.rst
.. include:: runtime.rst
.. include:: anatomy.rst
96 changes: 85 additions & 11 deletions docs/user_guide/source/components/variable.rst
@@ -6,15 +6,17 @@ An ``adios2::Variable`` is the link between a piece of data coming from an appli
This component handles all application variables classified by data type and shape.

Each ``IO`` holds a set of Variables, and each ``Variable`` is identified with a unique name.
They are created using the reference from ``IO::DefineVariable<T>`` or retrieved using the pointer from
``IO::InquireVariable<T>`` functions in :ref:`IO`.

Data Types
----------

Only primitive types are supported in ADIOS2.
Fixed-width types from `<cinttypes> and <cstdint> <https://en.cppreference.com/w/cpp/types/integer>`_ should be
preferred when writing portable code. ADIOS2 maps primitive types to equivalent fixed-width types
(e.g. ``int`` -> ``int32_t``). In C++, acceptable types ``T`` in ``Variable<T>`` along with their preferred fixed-width
equivalent in 64-bit platforms are given below:

.. code-block:: c++

@@ -52,19 +54,19 @@ In C++, acceptable types ``T`` in ``Variable<T>`` along with their preferred fix
Python APIs: use the equivalent fixed-width types from numpy.
If ``dtype`` is not specified, ADIOS2 handles numpy defaults just fine as long as primitive types are passed.
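To see which fixed-width type a primitive integer type corresponds to on a given platform, its size and signedness can be checked directly. This is a minimal sketch of that idea only: ``FixedWidthName`` is a hypothetical helper, it assumes 8-bit bytes, and it does not reproduce how ADIOS2 performs the mapping internally.

```cpp
#include <string>
#include <type_traits>

// Hypothetical helper (not part of the ADIOS2 API): name of the fixed-width
// integer type that has the same size and signedness as T on this platform.
// Assumes 8-bit bytes.
template <class T>
std::string FixedWidthName()
{
    static_assert(std::is_integral<T>::value, "integer types only");
    const std::string prefix = std::is_signed<T>::value ? "int" : "uint";
    return prefix + std::to_string(sizeof(T) * 8) + "_t";
}
```

On common 64-bit platforms ``FixedWidthName<int>()`` yields ``int32_t``, while the result for ``long`` differs between platforms, which is why the fixed-width names are the safer choice in portable code.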


Shapes
------

ADIOS2 is designed for MPI applications.
Thus different application data shapes must be supported depending on their scope within a particular MPI communicator.
The shape is defined at creation from the ``IO`` object by providing the dimensions (shape, start, count) in the
``IO::DefineVariable<T>`` call. The supported shapes are described below.


1. **Global Single Value**:
Only a name is required for their definition.
These variables are helpful for storing global information, preferably managed by only one MPI process, that may or may
not change over steps: *e.g.* total number of particles, collective norm, number of nodes/cells, etc.

.. code-block:: c++

@@ -157,8 +159,80 @@ be applicable to it.

JoinedArrays are currently only supported by BP4 and BP5 engines,
as well as the SST engine with BP5 marshalling.


Global Array Capabilities and Limitations
-----------------------------------------

ADIOS2 focuses on writing and reading N-dimensional, distributed, global arrays of primitive types. The basic idea
is that a simulation usually has such a data structure in memory (distributed across multiple processes) and wants to
dump its content regularly as it progresses. ADIOS2 was designed to:

1. do this writing and reading as fast as possible
2. enable reading any subsection of the array

.. image:: https://imgur.com/6nX67yq.png
:width: 400

The figure above shows a parallel application of 12 processes producing a 2D array. Each process has a 2D array locally
and the output is created by placing them into a 4x3 pattern. A reading application's individual process then can read
any subsection of the entire global array. In the figure, a 6 process application decomposes the array in a 3x2 pattern
and each process reads a 2D array whose content comes from multiple producer processes.
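The decomposition in the figure can be sketched as plain index arithmetic. The code below is an illustration under stated assumptions, not ADIOS2 code: ``Block2D`` and ``BlockForRank`` are hypothetical names, ranks are assumed to be numbered row-major over the process grid, and the global extents are assumed to divide evenly. The resulting ``start`` and ``count`` are the kind of values a process would provide as its block offset and size.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// One block of a 2D global array: offset (start) and size (count) per dimension.
struct Block2D
{
    std::array<std::size_t, 2> start;
    std::array<std::size_t, 2> count;
};

// Hypothetical helper (not part of the ADIOS2 API): block owned by `rank`
// when a global array of `shape` is split evenly over a
// procs[0] x procs[1] process grid with row-major rank numbering.
Block2D BlockForRank(std::size_t rank, std::array<std::size_t, 2> procs,
                     std::array<std::size_t, 2> shape)
{
    assert(rank < procs[0] * procs[1]);
    assert(shape[0] % procs[0] == 0 && shape[1] % procs[1] == 0);
    // Position of this rank inside the process grid.
    const std::array<std::size_t, 2> pos = {rank / procs[1], rank % procs[1]};
    Block2D b;
    for (std::size_t d = 0; d < 2; ++d)
    {
        b.count[d] = shape[d] / procs[d];
        b.start[d] = pos[d] * b.count[d];
    }
    return b;
}
```

With a 12 x 12 global array, writer rank 5 in the 4x3 pattern owns a 3 x 4 block at offset (3, 8), while reader rank 5 in the 3x2 pattern requests a 4 x 6 block at offset (8, 6). The reader's block spans several writer blocks, which is the situation the figure depicts.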

The figure illustrates the basic concept, but it can be misleading if it suggests limitations that are not there.
A Global Array is simply a boundary in N-dimensional space where processes can place their blocks of data.
In the global space:

1. one process can place multiple blocks

.. image:: https://imgur.com/Pb1s03h.png
:width: 400

2. the space does NOT need to be fully covered by the blocks

.. image:: https://imgur.com/qJBXYcQ.png
:width: 400

* at reading, unfilled positions will not change the allocated memory

3. blocks can overlap

.. image:: https://imgur.com/GA59lZ2.png
:width: 300

* the reader will get the value at an overlapping position from one of the blocks, but there is no control over which
block

4. each process can put a different size of block, or put multiple blocks of different sizes

5. some processes may not contribute anything to the global array
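The coverage rules in the list above can be checked for a concrete set of blocks by counting how many blocks touch each position. This is a minimal sketch, not ADIOS2 code: ``Block2D`` and ``CoverageMap`` are hypothetical names, and every block is assumed to lie inside the global bounds.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// One block of a 2D global array: offset (start) and size (count) per dimension.
struct Block2D
{
    std::array<std::size_t, 2> start;
    std::array<std::size_t, 2> count;
};

// Hypothetical helper (not part of the ADIOS2 API): for each cell of a
// shape[0] x shape[1] global array (row-major), count the blocks covering it.
// 0 marks an unfilled position, values above 1 mark overlapping blocks.
std::vector<int> CoverageMap(std::array<std::size_t, 2> shape,
                             const std::vector<Block2D> &blocks)
{
    std::vector<int> cover(shape[0] * shape[1], 0);
    for (const Block2D &b : blocks)
    {
        assert(b.start[0] + b.count[0] <= shape[0]);
        assert(b.start[1] + b.count[1] <= shape[1]);
        for (std::size_t i = b.start[0]; i < b.start[0] + b.count[0]; ++i)
        {
            for (std::size_t j = b.start[1]; j < b.start[1] + b.count[1]; ++j)
            {
                ++cover[i * shape[1] + j];
            }
        }
    }
    return cover;
}
```

A map like this makes it easy to see, before writing, which positions a reader would find unfilled and where the value it receives depends on which overlapping block is used.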

Over multiple output steps

1. the processes CAN change the size (and number) of blocks in the array

* E.g. atom table: global size is fixed but atoms wander around processes, so their block size is changing

.. image:: https://imgur.com/DorjG2q.png
:width: 400

2. the global dimensions CAN change over output steps

* but then you cannot read multiple steps at once
* E.g. particle table size changes due to particles disappearing or appearing

.. image:: https://imgur.com/nkuHeVX.png
:width: 400


Limitations of the ADIOS global array concept

1. Indexing starts from 0
2. Cyclic data patterns are not supported; only blocks can be written or read
3. If some blocks fall fully or partially outside of the global boundary, the reader will not be able to read those
parts

.. note::

Technically, the content of the individual blocks is kept in the BP format (but not in HDF5 format) and in staging.
If you really, really want to retrieve all the blocks, you need to handle this array as a Local Array and read the
blocks one by one.
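The third limitation can be illustrated by clipping a block against the global boundary. The sketch below is an assumption-laden illustration, not ADIOS2 code: ``Block2D`` and ``ClipToGlobal`` are hypothetical names, and the clipped block stands for the part that remains readable through the global array view.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// One block of a 2D global array: offset (start) and size (count) per dimension.
struct Block2D
{
    std::array<std::size_t, 2> start;
    std::array<std::size_t, 2> count;
};

// Hypothetical helper (not part of the ADIOS2 API): the part of `b` that
// lies inside a global array of `shape`. The count is zero in a dimension
// where the block falls entirely outside the boundary.
Block2D ClipToGlobal(const Block2D &b, std::array<std::size_t, 2> shape)
{
    Block2D inside;
    for (std::size_t d = 0; d < 2; ++d)
    {
        inside.start[d] = std::min(b.start[d], shape[d]);
        const std::size_t end = std::min(b.start[d] + b.count[d], shape[d]);
        inside.count[d] = end - inside.start[d];
    }
    return inside;
}
```

For a 10 x 5 global array, a block of size 4 x 5 placed at offset (8, 0) keeps only its first two rows inside the boundary, and the remaining rows are the parts a reader of the global array cannot reach.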
