Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed bugs with kernel (re)compiling when the global device got clear and reinitialized #1752

Merged
merged 2 commits into from Jan 8, 2020

Conversation

@ndtrung81
Copy link
Contributor

ndtrung81 commented Oct 31, 2019

Summary

This PR is to fix the bugs caused by the last PR #1735 when the global device is cleared (e.g., triggered by the clear command in the input script) and then initialized for a new run, as reported by @rbberger in #1748.

Related Issues

fixes #1748

Author(s)

Trung Nguyen (Northwestern)

Licensing

By submitting this pull request, I agree, that my contribution will be included in LAMMPS and redistributed under either the GNU General Public License version 2 (GPL v2) or the GNU Lesser General Public License version 2.1 (LGPL v2.1).

Backward Compatibility

Yes

Implementation Notes

Initialize ucl_device with NULL in the constructor of the base classes and then check if it is changed from the last value in init_atomic() after the device is (re)initialied: if so, recompile the kernels, otherwise skip the kernel recompilation.

Note: the OpenCL build still show a gradual increase in memory allocation if multiple consecutive runs are called. massif shows this is associated with libopencl.so, which is not so helpful. The recommended practice with this case (OpenCL builds and multiple consecutive runs) is to add "pre no" to the run command.

Note: the base classes (BaseAtomic, BaseCharge, etc.) need to be refactored to reduce duplicated codes.

Post Submission Checklist

Please check the fields below as they are completed after the pull request has been submitted. Delete lines that don't apply

  • The feature or features in this pull request is complete
  • Licensing information is complete
  • Corresponding author information is complete
  • The source code follows the LAMMPS formatting guidelines
  • The feature has been verified to work with the conventional build system
  • The feature has been verified to work with the CMake based build system
…ed and then reinitialized
Copy link
Member

rbberger left a comment

example still crashes

@ndtrung81

This comment has been minimized.

Copy link
Contributor Author

ndtrung81 commented Nov 4, 2019

@rbberger The example does not crash on my machine.

quadro-gp100.txt

gtx-1050.txt

Is there a way to check that we are compiling and running the same version of the source code?

@ndtrung81

This comment has been minimized.

Copy link
Contributor Author

ndtrung81 commented Nov 13, 2019

@rbberger sorry I still don't understand why the example crashed with the latest changes for the jenkins regression. I have tried the changes on my machines, or applied them manually to the master branch, and it runs through in either cases. Is there anything I can do to resolve the issue? @akohlmey do you have any suggestions to move things forward? Thanks!

@ndtrung81 ndtrung81 requested a review from akohlmey Nov 26, 2019
@ndtrung81

This comment has been minimized.

Copy link
Contributor Author

ndtrung81 commented Nov 26, 2019

@akohlmey @rbberger Please advise me how to proceed with this PR. Thanks!

@akohlmey

This comment has been minimized.

Copy link
Member

akohlmey commented Nov 26, 2019

@akohlmey @rbberger Please advise me how to proceed with this PR. Thanks!

don't have time and access to suitable resources to look into this myself over the next few weeks.

@akohlmey akohlmey dismissed rbberger’s stale review Jan 6, 2020

I cannot reproduce the problem either

@akohlmey akohlmey requested a review from rbberger Jan 8, 2020
@akohlmey akohlmey assigned akohlmey and unassigned ndtrung81 Jan 8, 2020
@akohlmey akohlmey requested a review from sjplimp Jan 8, 2020
@akohlmey akohlmey mentioned this pull request Jan 8, 2020
9 of 10 tasks complete
@akohlmey

This comment has been minimized.

Copy link
Member

akohlmey commented Jan 8, 2020

@sjplimp @rbberger @ndtrung81 since i cannot reproduce the reported remaining crashes on my machine, we are out of options to track this down by ourselves and have effectively the choice between two not perfect version. So it seems to me it would be best to merge this anyway and hope for getting some reports from other users or not(!) and then track things down with that input.
To stick with our poilicies, though, I need some approval for merging it.

@sjplimp
sjplimp approved these changes Jan 8, 2020
@akohlmey akohlmey merged commit d3eed9d into lammps:master Jan 8, 2020
6 checks passed
6 checks passed
lammps/pull-requests/cmake/cmake-kokkos-omp-pr head run ended
Details
lammps/pull-requests/cmake/cmake-serial-pr head run ended
Details
lammps/pull-requests/kokkos-omp-pr head run ended
Details
lammps/pull-requests/openmpi-pr head run ended
Details
lammps/pull-requests/serial-pr head run ended
Details
lammps/pull-requests/shlib-pr head run ended
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.