ADIOS1 backend #126

Merged
merged 33 commits into from
Apr 27, 2018

Conversation

Member

@C0nsultant C0nsultant commented Apr 11, 2018

This adds both a serial and parallel ADIOS1 backend.

One potential drawback of this backend is that it can only be built as either a non-MPI or a parallel version, not both at once. This might impose constraints on other backends as well, depending on how they are built.

The implementation in this PR does the bare minimum to use ADIOS1 effectively:

  • It considers the typical ADIOS workflow of defining variables before opening a file and tries to reduce the number of times a file has to be (re-)opened. To do this, file opening is deferred until the first write must happen. Two IO queues are used in the backend to separate the two phases of operations (before / after file open). One condition under which re-opening a file cannot be avoided is when a manual Series::flush() is called before a new variable is defined (there may be more complex ones that we have not recognized yet).
  • It batches reads for every file to schedule a combined read of multiple chunks.
  • It keeps file handles open as long as possible and reopens them when required.
  • It allows writing dynamically sized chunks per dataset. Typically, ADIOS recommends defining a unique variable for every different-sized/-localized chunk of every dataset. We solve this by creating scalar variables for every dimension of the extent and offset (in "/tmp", so it does not conflict with the openPMD standard). After being defined once, these variables are re-used as the extent and offset for any chunk of a dataset. To do this, the scalar variables just have to be overwritten with the desired extent and offset values before performing the chunk write.
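The scalar-variable approach from the last bullet could look roughly like the sketch below. It only builds the comma-separated dimension string that refers to per-dimension scalar variables and does not call ADIOS itself. The helper names (`scalarNames`, `dimensionString`) and the naming scheme after the "/tmp" prefix are assumptions made for illustration, not the names used in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: one scalar variable name per dimension, placed under
// "/tmp" as described above. The suffix scheme is an assumption.
std::vector<std::string>
scalarNames(std::string const& dataset, std::string const& kind, std::size_t ndims)
{
    assert(!dataset.empty()); // a dataset name is required to build the path
    std::vector<std::string> names;
    names.reserve(ndims);
    for (std::size_t d = 0; d < ndims; ++d)
        names.push_back("/tmp/" + dataset + "_" + kind + std::to_string(d));
    return names;
}

// Hypothetical helper: join the scalar names into one comma-separated
// dimension string, without a trailing comma.
std::string dimensionString(std::vector<std::string> const& names)
{
    std::string result;
    for (std::size_t i = 0; i < names.size(); ++i)
    {
        if (i != 0)
            result += ",";
        result += names[i];
    }
    return result;
}
```

With such strings defined once per dataset, only the values of the scalar variables would need to change between chunk writes.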

@@ -41,7 +41,7 @@ endfunction()

openpmd_option(MPI "Enable MPI support" AUTO)
openpmd_option(HDF5 "Enable HDF5 support" AUTO)
-openpmd_option(ADIOS1 "Enable ADIOS1 support" OFF)
+openpmd_option(ADIOS1 "Enable ADIOS1 support" AUTO)
Member

also occurs in README.md, docs/source/dev/buildoptions.rst and docs/source/dev/dependencies.rst

@ax3l ax3l mentioned this pull request Apr 12, 2018
Determine and check the parallel/serial status of a found ADIOS1 lib
CMakeLists.txt Outdated
#endif()
# TODO we could support more combinations than MPI+pADIOS and noMPI+sADIOS
if(openPMD_HAVE_MPI AND openPMD_HAVE_ADIOS1 AND ADIOS_HAVE_SEQUENTIAL)
message(FATAL_ERROR "Found MPI but requested ADIOS1 is serial. "
Member

@ax3l ax3l Apr 12, 2018

are you sure an MPI-parallel ADIOS does not still ship a "sequential" component as well?

Member Author

The library itself might provide both, but with the used FindAdios.cmake only mutually exclusive versions get linked. Since ADIOS has a unified interface (with dummy MPI if serial), I doubt you can link both simultaneously.

@@ -178,6 +179,9 @@ if(ADIOS_FOUND)
endforeach()
# we could append ${CMAKE_PREFIX_PATH} now but that is not really necessary

# determine whether found library is serial only
Member

@ax3l ax3l Apr 12, 2018

I think we can only determine:

  • is serial and parallel
  • is serial only

Member Author

Alright, the vocabulary in this comment is not sufficient. This determines if the library links as serial only with the specified components.
The library itself might be serial + parallel.

Member

@ax3l ax3l Apr 13, 2018

ok cool.

what I wanted to say is just: if you build serial, the lib/ dir will contain a regular and a _nompi version, as far as I remember.

If you check that only the noMPI version is linked, this is fine, but you might want to prefer it in some way during the search and validate it with two ADIOS installations (w/ and w/o MPI), printing all vars to make sure. Also, the includes (and their #defines) should match.

Bump required ADIOS1 version to 1.13.0.
Enable serial ADIOS1 through -D_NOMPI.
Enable tests according to used backends.
Move ADIOS1 logic from parallel to serial (dummy MPI allows for mostly
the same code).
@ax3l ax3l self-assigned this Apr 15, 2018
@ax3l ax3l added this to To do in First Stable Release via automation Apr 15, 2018
@@ -16,6 +16,9 @@ packages:
openmpi@1.6.5%gcc@7.2.0 arch=linux-ubuntu14-x86_64: /usr
openmpi@1.6.5%clang@5.0.0 arch=linux-ubuntu14-x86_64: /usr
buildable: False
packages:
Member

@ax3l ax3l Apr 15, 2018

oh oh, pls remove this single line

@C0nsultant C0nsultant moved this from To do to In progress in First Stable Release Apr 15, 2018
@ax3l
Member

ax3l commented Apr 15, 2018

I think there is more to do for the CI than we can do in the next few days.
Just leave it off for now (remove the last three commits) and we will add travis for ADIOS in a follow-up together end of the week.

@C0nsultant
Member Author

This is probably not a problem with our build process. The errors are caused by timeouts during download of our dependencies (e.g. https://zlib.net was down yesterday). Now, Boost & HDF5 are horribly slow, but making progress.

@ax3l
Member

ax3l commented Apr 16, 2018 via email

@@ -83,6 +99,9 @@ allocatePtr(Datatype dtype, size_t numPoints)
data = new bool[numPoints];
del = [](void* p){ delete[] static_cast< bool* >(p); p = nullptr; };
break;
case DT::STRING:
/* user assings c_str pointer */
Member

typo: assigns

dims.c_str(),
global_dims.c_str(),
local_offsets.c_str());
ASSERT(id >= 0 /* ??? */, "Internal error: Failed to define ADIOS variable during Dataset writing");
Member

good candidate for a VERIFY macro - I have seen this a lot lately with startup/env issues.

Member Author

Very much agreed. We should not exclude these assertions in production code, unless the user really wants to.
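A check that stays active in production builds could be sketched as follows. The macro name `VERIFY` comes from the comment above, but its definition here is an assumption for illustration (the PR does not show one), and `checkVariableId` is a hypothetical stand-in for the variable-definition call site.

```cpp
#include <cassert>   // assert() is compiled out under NDEBUG; VERIFY below is not
#include <stdexcept>
#include <string>

// Hypothetical always-on check: throws instead of aborting, and is not
// affected by NDEBUG.
#define VERIFY(CONDITION, TEXT)                          \
    do                                                   \
    {                                                    \
        if (!(CONDITION))                                \
            throw std::runtime_error(std::string(TEXT)); \
    } while (false)

// Hypothetical call site: validates an id as returned by a variable definition.
inline void checkVariableId(long id)
{
    VERIFY(id >= 0,
           "Internal error: Failed to define ADIOS variable during Dataset writing");
}
```

Throwing rather than aborting would let a caller report startup or environment problems with a readable message; whether that is the desired behaviour here is a design choice left open by the discussion.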

break;
}
case DT::BOOL:
throw std::runtime_error("No workaround for ADIOS1 bool implemented");
Member

Current PIConGPU representation: ComputationalRadiationPhysics/picongpu#1756

also just as ints, unfortunately.

Member Author

Currently tinkering with this. Adopting the behaviour of using unsigned bytes for bools means we can store them but not read them back in a type-safe way.
Reinterpreting every uint8_t of value 0 as a bool false and every uint8_t of value 1 as a bool true is undesirable when reading a file.
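The ambiguity can be illustrated with a small sketch. `toBytes` and `looksLikeBool` are hypothetical helpers, not part of the backend: once bools are stored as unsigned bytes, a genuine uint8_t dataset that happens to contain only 0 and 1 cannot be told apart from stored bools.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Writing side (hypothetical): each bool becomes one unsigned byte, 0 or 1.
std::vector<std::uint8_t> toBytes(std::vector<bool> const& flags)
{
    std::vector<std::uint8_t> bytes;
    bytes.reserve(flags.size());
    for (bool f : flags)
        bytes.push_back(static_cast<std::uint8_t>(f));
    return bytes;
}

// Reading side (hypothetical): the only available heuristic is "all values
// are 0 or 1", which is also true for many ordinary uint8_t datasets.
bool looksLikeBool(std::vector<std::uint8_t> const& bytes)
{
    for (std::uint8_t b : bytes)
        if (b > 1)
            return false;
    return true;
}
```

Because the heuristic accepts both cases, a reader has no type-safe way to recover bool, which is why the values are handed back as uint8_t.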

Dataset opening, path & dataset listing.
Change frontend logic to rely less on HDF5 backend implementation
quirks.
Working ADIOS test case.
Working ADIOS read test case.
ADIOS attribute datatype test case.
To fit ADIOS's workflow better and to avoid redundant file open/file
closing, separate the IO operations in ADIOS1 backend into two queues:

Defines and file context preparation (m_setup) that gets drained before
a file is opened for writing.

Actual data transfers that require an open file but do not re-open it (
m_work).
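A minimal sketch of the two-queue idea, assuming tasks are plain callables. Only the queue names m_setup and m_work come from the commit message; the class `TwoPhaseQueue`, its methods, and the re-open rule are illustrative assumptions, and the "file" is just a flag and a counter.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>

class TwoPhaseQueue
{
public:
    using Task = std::function<void()>;

    void enqueueSetup(Task t) { assert(static_cast<bool>(t)); m_setup.push(std::move(t)); }
    void enqueueWork(Task t) { assert(static_cast<bool>(t)); m_work.push(std::move(t)); }

    // Drain definitions first, open the stand-in file only when there is
    // actual work, then drain the data transfers.
    void flush()
    {
        // Assumption: a new definition after the file was opened forces a re-open.
        if (!m_setup.empty() && m_isOpen)
            m_isOpen = false;
        drain(m_setup);
        if (!m_work.empty() && !m_isOpen)
        {
            m_isOpen = true;
            ++m_openCount;
        }
        drain(m_work);
    }

    int openCount() const { return m_openCount; }

private:
    static void drain(std::queue<Task>& q)
    {
        while (!q.empty())
        {
            q.front()();
            q.pop();
        }
    }

    std::queue<Task> m_setup; // defines and file context preparation
    std::queue<Task> m_work;  // data transfers that need an open file
    bool m_isOpen = false;
    int m_openCount = 0;
};
```

In this sketch a flush followed by further writes keeps the file open, while a flush followed by a new definition triggers a second open, which mirrors the unavoidable re-open case mentioned in the PR description.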
1.13.1 is very much desired.
@@ -126,8 +126,9 @@ join(std::vector< std::string > const& vs, std::string const& delimiter)
default:
std::ostringstream ss;
std::copy(vs.begin(),
-vs.end(),
+vs.end() - 1,
Member

o.0

this change looks a bit unsafe, e.g. for zero-lengths :)

Member

offline discussion: add comment // do not append separator after last element

Member Author

The empty and size 1 vector cases are handled above ;)

Member

yep, saw this too late :)
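For reference, a self-contained sketch of the pattern discussed in this thread. The overall shape (a switch on the vector size with the empty and single-element cases handled first) is inferred from the visible diff lines and the replies; the exact function body in the PR may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

std::string join(std::vector<std::string> const& vs, std::string const& delimiter)
{
    switch (vs.size())
    {
        case 0:
            return "";
        case 1:
            return vs[0];
        default:
        {
            assert(vs.size() >= 2); // vs.end() - 1 is valid here
            std::ostringstream ss;
            // do not append separator after last element
            std::copy(vs.begin(),
                      vs.end() - 1,
                      std::ostream_iterator< std::string >(ss, delimiter.c_str()));
            ss << vs.back();
            return ss.str();
        }
    }
}
```

Since std::ostream_iterator writes the delimiter after every element it copies, stopping one element early and appending the last one by hand is what avoids the trailing separator.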

@ax3l
Member

ax3l commented Apr 23, 2018

@C0nsultant can you pls add a little to-do list in the PR description what is still missing in this PR? :) Just curious :)

@C0nsultant
Member Author

I added the two major things that still keep me from merging.
Nothing major, at least the testing can be done in a few hours (should have time for that tomorrow).

Adapt the different ADIOS workflow to work with fileBased output. This
requires closing and flushing open handles as ADIOS handles are only
one-directional (in- or out-put).
@ax3l
Member

ax3l commented Apr 23, 2018

sounds good.
leave the CI out for now, it still looks buggy and I had no time to progress on it.

Treat booleans as unsigned bytes. This allows us to write them, but not
read them back as bool (they are uint8_t).
@C0nsultant C0nsultant changed the title [WIP] ADIOS1 backend ADIOS1 backend Apr 26, 2018
C0nsultant and others added 5 commits April 26, 2018 11:40
Include new matcher functionality in tutorial.
ADIOS1 does not offer deletion inside files, so handle those requests
non-gracefully.
@ax3l
Member

ax3l commented Apr 26, 2018

"can only mutually exclusively build a serial/parallel version"

@C0nsultant but if I build the parallel version and satisfy the MPI dependency e.g. when running in mpiexec - can I still create a serial Series from a single rank? This should work, no?

CMakeLists.txt Outdated
@@ -301,6 +303,10 @@ if(openPMD_HAVE_ADIOS1)
target_link_libraries(openPMD PUBLIC ${ADIOS_LIBRARIES})
target_include_directories(openPMD SYSTEM PUBLIC ${ADIOS_INCLUDE_DIRS})
target_compile_definitions(openPMD PUBLIC "-DopenPMD_HAVE_ADIOS1=1")
if(ADIOS_HAVE_SEQUENTIAL)
# TODO might be smarter to get ALL definitions from adios-config -s & parse it
Member

Yes, ideally in FindADIOS.cmake we should populate ADIOS_DEFINITIONS. We then use that var here.

CMakeLists.txt Outdated
if(${examplename} MATCHES ".+parallel$")
if(openPMD_HAVE_MPI)
# Current examples all use HDF5, elaborate if other backends are used
if(openPMD_HAVE_HDF5)
Member

don't exclude the build, just the test below is fine.
the examples should still be built and only throw errors at runtime if they try to use a backend that does not exist.

Member Author

I included the examples as tests in CI. With this approach, CI tests without HDF5 will always fail.

Member

@ax3l ax3l Apr 27, 2018

no, the add_tests are in the block below and properly guarded. Those run at runtime and require HDF5 files and the backend, but the compile should still be done.

case DT::UNDEFINED:
throw std::runtime_error("Unknown Attribute datatype");
default:
throw std::runtime_error("Datatype not implemented in HDF5 IO");
Member

ADIOS :)

/** Create a functor to determine if a file can be of a format given the filename on disk.
*
* @param name String containing desired filename without filename extension.
* @param f File format to check plausibility for.
Member

Maybe write "backend applicability" instead of "plausibility"?

break;
case DT::UNDEFINED:
case DT::DATATYPE:
throw std::runtime_error("Unknown Attribute datatype");
Member

I am wondering if one has still to break after a throw...?
probably, no?

Member Author

Execution at that point aborts until you reach the next matching catch (if it reaches the top level, the process aborts). And even inside that catch, you cannot possibly recover to the point of failure (unless you use goto-fuckery).

So no, there is no break required after a throw just like after a return.

Member

ok, I was more wondering if the default can be reached without the break.
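A small sketch that answers the question directly: because a throw leaves the switch immediately, the cases above it never fall through into default. The reduced enum `DT`, the function `describe`, and the wording of the default message are hypothetical; only the first message is taken from the diff.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical reduced datatype enum for illustration.
enum class DT
{
    INT,
    UNDEFINED,
    DATATYPE,
    OTHER
};

std::string describe(DT dt)
{
    switch (dt)
    {
        case DT::INT:
            return "int";
        case DT::UNDEFINED:
        case DT::DATATYPE:
            // No break needed: control never continues past a throw.
            throw std::runtime_error("Unknown Attribute datatype");
        default:
            // Reached only by values without their own case label.
            throw std::runtime_error("Datatype not implemented");
    }
}
```

Calling describe(DT::UNDEFINED) therefore reports the first message, and only an unlisted value such as DT::OTHER reaches the default branch.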

for( int i = 0; i < size; ++i )
{
vs[i] = auxiliary::strip(std::string(c[i], std::strlen(c[i])), {'\0'});
/* TODO pointer should be freed, but this causes memory curruption */
Member

make this a doxygen todo please: /** @todo ...

char const* c = params.str().c_str();

int status;
/* TODO ADIOS_READ_METHOD_BP_AGGREGATE */
Member

//! @todo ...

REQUIRE(s.getAttribute("vecDouble").get< std::vector< double > >() == std::vector< double >({0., 1.79769e+308}));
REQUIRE(s.getAttribute("vecLongdouble").get< std::vector< long double > >() == std::vector< long double >({0.L, 1.18973e+4932L}));
REQUIRE(s.getAttribute("vecString").get< std::vector< std::string > >() == std::vector< std::string >({"vector", "of", "strings"}));
REQUIRE(s.getAttribute("bool").get< uint8_t >() == static_cast< uint8_t >(true));
Member

can you add a comment on bool reads please?
maybe we also want to write this in the manual in a section on "backends: features & limitations" or so.


TEST_CASE( "hzdr_adios1_sample_content_test", "[serial][adios1]" )
{
// since this file might not be publicly available, gracefully handle errors
Member

please add /** @todo add bp example files to https://github.com/openPMD/openPMD-example-datasets */

@C0nsultant
Member Author

"but if I build the parallel version and satisfy the MPI dependency e.g. when running in mpiexec - can I still create a serial Series from a single rank? This should work, no?"

Yes. Yes, that always works with single-size MPI communicators.
Let's rephrase that to non-MPI and parallel versions.

@C0nsultant C0nsultant merged commit 0b17f15 into openPMD:dev Apr 27, 2018
First Stable Release automation moved this from In progress to Done Apr 27, 2018
@C0nsultant C0nsultant deleted the topic/adios branch April 27, 2018 13:12
@ax3l
Member

ax3l commented May 8, 2018

Funny, I just saw this: https://github.com/ornladios/ADIOS/blob/v1.13.1/KNOWN_BUGS

That means writing a var for each chunk's dimensions is indeed the right way to go in ADIOS1 ;)
