Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ParticleSpecies & RecordComponent Serialize #963

Merged
merged 5 commits into from Jul 3, 2021
Merged

Conversation

ax3l
Copy link
Member

@ax3l ax3l commented Apr 9, 2021

Preparation to derive information necessary to construct a new series that can access the an equivalent ParticleSpecies or RecordComponent (see #952).

src/RecordComponent.cpp Outdated Show resolved Hide resolved
src/RecordComponent.cpp Outdated Show resolved Hide resolved
examples/2_read_serial.cpp Outdated Show resolved Hide resolved
src/RecordComponent.cpp Outdated Show resolved Hide resolved
src/RecordComponent.cpp Outdated Show resolved Hide resolved
@franzpoeschel
Copy link
Contributor

I have implemented some of the suggestions above on a branch building on this one https://github.com/franzpoeschel/openPMD-api/tree/topic-serialize-suggestion

@ax3l
Copy link
Member Author

ax3l commented Jun 30, 2021

Thank you for the suggestion, I like the openPMDGroup touch, that makes it way more general :)
I didn't realize we have a myPath member on Attributable already, hah!

ax3l and others added 4 commits July 2, 2021 15:35
Preparation to derive information necessary to construct a
new series that can access the same RecordComponent.
- pickle Record Components
- it's aliiive
@ax3l ax3l force-pushed the topic-serialize branch 4 times, most recently from 7a1fdda to 8a28920 Compare July 3, 2021 01:18
@ax3l ax3l changed the title [WIP] ParticleSpecies & RecordComponent Serialize ParticleSpecies & RecordComponent Serialize Jul 3, 2021
// This is a big hack for now, but it works for our use
// case, which is spinning up remote serial read series
// for DASK.
static auto series = openPMD::Series(
Copy link
Member Author

@ax3l ax3l Jul 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how much of a hack this static member is to fix up lifetimes, but it seems to work pretty well :)
We only have little control when it gets destroyed, which is likely as late as process end/module unload.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when a user wants to unpickle two different Series within the same application? The static auto series will not be constructed a second time and then be stuck with the old file path.
Do you do this for avoiding the Series from going out of scope?
Would something like return seriesAccessor( std::move( series ), group) help? This would ensure that the seriesAccessor takes no reference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think I see the problem.

    add_pickle(
        cl,
        [](openPMD::Series & series, std::vector< std::string > const & group ) {
            uint64_t const n_it = std::stoull(group.at(1));
            return series.iterations[n_it].meshes[group.at(3)][group.at(4)];
        }
    )

This lets the Series go out of scope and returns only the record component..

I think that this is a fundamental design issue with our Python bindings: Python has garbage collection, but we don't use it. Realistically, calling del series should not be sufficient in Python to destroy the Series, since the series should stay active as long as there is an active dependent member like a contained RecordComponent.
So, moving away from a C++-centered memory model for our Python API should probably do the trick.

Three quick and dirty fixes, each quicker and dirtier than the last:

  1. Is it somehow possible in PyBind to quickly tell the garbage collection that in the above code the series and the returned MeshRecordComponent depend on each other? This should at least implement the above solution locally.
  2. Make a subclass PickledMeshRecordComponent that privately stores the series.
  3. Just do:
    static openPMD::Series series;
    series = openPMD::Series(...); // this is executed every time
    (This solution will only allow one deserialized object to be used at a time though)

Copy link
Member Author

@ax3l ax3l Jul 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fundamental design issue with our Python bindings: Python has garbage collection, but we don't use it.

Offline discussed: What we mean is we don't use the possibility to tell the garbage collector dependencies between our object hierarchy yet (in both directions) yet.

Is it somehow possible in PyBind to quickly tell the garbage collection that in the above code the series and the returned MeshRecordComponent depend on each other? This should at least implement the above solution locally.

Gotta investigate that, would be a nice and clean solution 👍

Currently not tested and likely to fail in this situation:

  • unpickle of multiple objects of the same class (I test different classes to be pickled/unpickled)

good follow-up to at least have the last object (of the same kind) always be in usable state.

@ax3l
Copy link
Member Author

ax3l commented Jul 3, 2021

And lift-off 🚀

openpmd_dask.mp4

DistributedDaskOpenPMD.zip

@ax3l ax3l merged commit 2fd519a into openPMD:dev Jul 3, 2021
@ax3l ax3l deleted the topic-serialize branch July 3, 2021 06:16
@ax3l ax3l mentioned this pull request Mar 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants