Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/gsd with queue #1543

Merged
merged 39 commits into from Jun 5, 2023
Merged

Feat/gsd with queue #1543

merged 39 commits into from Jun 5, 2023

Conversation

b-butler
Copy link
Member

@b-butler b-butler commented May 2, 2023

Description

Add a hoomd.write.Deque class which stores a std::deque of logger and trajectory data up to a specified maximum size N. The deque pushes data to the front and pops old data off the back of the deque. The analyze method only collects and removes data as necessary. All writes are triggered by the dump method exposed in Python. The purpose of this class is to allow selective high frequency data storing, by allowing users to determine when to write data.

One foreseen pattern of use would be to create a custom action which composes a Deque object calling dump when the specified conditions have been met.

Motivation and context

To provide a means for high-performance, high-frequency writes in HOOMD.

How has this been tested?

Tests based on the GSD tests were added.

Change log

Added: `hoomd.write.Deque` class for enabling selective high-frequency frame writing.

Checklist:

b-butler and others added 22 commits April 27, 2023 13:29
This will more easily allow for derived classes of GSDDumpWriter since
they can choose when to write or not.
- Move initFileIO to protected
- Add method for querying file initialization
This class provided a triggerable multi frame write.
Also move initFileIO to the constructor. This prevents the need for
double caching of the buffer size prior to opening the file.
Ensure that we do not rely on Python's garbage collector to ensure
that the file is written after the call to write() ends.
Also check for errors in new GSD API calls.
While this is unexpected, it is required to keep performance high as
until the first write for a file occurs, all data is recorded and sorted
in a GSDDumpWriter::GSDFrame object. By writing the first frame, we
prevent this. However, we then don't know if the first frame should be
included in the actual analysis.
All tests including logging tests now pass.
@b-butler b-butler force-pushed the feat/gsd-with-queue branch 2 times, most recently from 46c68d8 to f141d6e Compare May 8, 2023 20:24
@b-butler b-butler marked this pull request as ready for review May 8, 2023 21:36
@b-butler b-butler requested a review from a team as a code owner May 8, 2023 21:36
Copy link
Contributor

@AlainKadar AlainKadar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just had one suggestion



@pytest.fixture(scope='function')
def hoomd_snapshot(lattice_snapshot_factory):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to call this frame instead of snapshot? Or better to keep consistent with the name of the module? Are there plans to rename this module to frame in the future?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, these are hoomd.Snapshot objects. I renamed gsd.Snapshot to gsd.Frame to avoid confusion with hoomd.Snapshot.

@b-butler
Copy link
Member Author

b-butler commented May 9, 2023

@joaander The test errored due to timeout which is currently set at 3000 second or 50 minutes but MPI tests finish in 30 seconds on hodges. Also, MPI tests for the neighbor list are erroring for so reason on hodges. Will look into that.

@joaander
Copy link
Member

joaander commented May 9, 2023

@joaander The test errored due to timeout which is currently set at 3000 second or 50 minutes but MPI tests finish in 30 seconds on hodges. Also, MPI tests for the neighbor list are erroring for so reason on hodges. Will look into that.

I can't look at it right now. Maybe later this week. If you can find a configuration that reproduces the deadlock, I usually debug them by 1) Running the tests with mpirun and wait for the deadlock. 2) Attach a debugger to the processes and see where each rank is waiting.

@b-butler
Copy link
Member Author

pre-commit.ci autofix

@b-butler
Copy link
Member Author

The deadlocking is solved. I forgot to not call gsd functions on all ranks.

Copy link
Member

@joaander joaander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The implementation is very clean. I have a few suggestions to improve the documentation.

frame.dihedral_data,
frame.improper_data,
frame.constraint_data,
frame.pair_data);
}
}

// emit on all ranks, the slot needs to handle the mpi logic.
m_write_signal.emit(m_handle);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this code was used, it would be a problem for the burst writer. However, this has now been replaced by Logger. I opened #1554 to remove these dead code paths. No need to make any changes in this pull request.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By this, I refer to m_write_signal.emit.



@pytest.fixture(scope='function')
def hoomd_snapshot(lattice_snapshot_factory):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, these are hoomd.Snapshot objects. I renamed gsd.Snapshot to gsd.Frame to avoid confusion with hoomd.Snapshot.

hoomd/write/gsd_burst.py Outdated Show resolved Hide resolved
hoomd/write/gsd_burst.py Show resolved Hide resolved
hoomd/write/gsd_burst.py Outdated Show resolved Hide resolved
b-butler and others added 2 commits May 19, 2023 15:59
Co-authored-by: Joshua A. Anderson <joaander@umich.edu>
This behavior is not intuitive but required for good performance.
@b-butler
Copy link
Member Author

@joaander I just thought of a potential mitigation of the initial write from the burst writer. We could either

  • provide a method to write similar to GSD.write
  • or show how to get this to run at the very beginning of a simulation with an And trigger.

What do you think?

@joaander
Copy link
Member

Yes, we could encourage users to use GSD.write to write the initial file and then open it in append mode. We could also enforce this behavior by making it an error for BurstWriter to create files or append to files with 0 frames.

Adds option to run immediately upon attaching for those still wishing to
use this.
This improve the ability for writers to immediately upon attaching write
log data depending on one or more computes or tuners.
Copy link
Member

@joaander joaander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed some documentation revisions directly to the branch. Please review and make any further edits as needed. It looks like you need to merge in conflicts from trunk-major.

I have one comment on parameter naming to think about. Other than that, this is good to merge.

hoomd/write/gsd_burst.py Show resolved Hide resolved
@b-butler b-butler added the validate Execute long running validation tests on pull requests label Jun 2, 2023
@b-butler b-butler requested a review from joaander June 2, 2023 20:46
Copy link
Member

@joaander joaander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@joaander joaander merged commit fa388ef into trunk-major Jun 5, 2023
72 of 73 checks passed
@joaander joaander deleted the feat/gsd-with-queue branch June 5, 2023 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
validate Execute long running validation tests on pull requests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants