Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor and benchmark Particle #2296

Closed
wants to merge 8 commits into from

Conversation

jngrad
Copy link
Member

@jngrad jngrad commented Oct 7, 2018

This PR is not meant to be merged (yet).

Following our exchange on #2239, I've developed a benchmarking suite to measure the integration time of pre-equilibrated particle simulations. The suite currently provides scripts for a LJ gas, a LJ liquid, a P3M saline solution and a P3M ionic gas, and runs them using 1/2/4/8/16 cores, 1k/10k particles per core, and 3 versions of myconfig.hpp.

The benchmarks are activated in CMake with -DWITH_BENCHMARKS=ON. To add your own benchmarks, use /src/maintainer/benchmarks/lj.py as a template and add a new line in /src/maintainer/benchmarks/CMakeLists.txt. Pass option --visualize to lj.py to start OpenGL and visualize the simulation.

Execute /maintainer/benchmarks/suite.sh to run the benchmarks. The script loops over various myconfig.hpp files and measures the impact of including more features in the struct Particle. The script also loops over commits in the git history to measure the performance of various implementations of struct Particle. The results are stored in benchmarks.csv, where the relevant fields are:

  • commit: git commit (currently testing the original struct Particle implementation, a raw pointer implementation and a unique_ptr implementation)
  • config: version of myconfig.hpp (currently testing minimal, default and maxset)
  • script: benchmark file (currently running lj.py and p3m.py)
  • arguments: space-separated list of arguments passed to the benchmark
  • cores: MPI cores (1 if mpiexec was not used)
  • MPI: True if mpiexec was used, False otherwise
  • mean: mean execution time of a single integration step (seconds)
  • ci: 95% confidence interval for the mean

Debug fields:

  • steps_per_tick: sample size of the mean
  • duration: duration of the simulation (seconds), should be 1-2 min long
  • E1, E2, E3: energy values from the final state of the simulations (to check the benchmark is reproducible)

The benchmark takes 20 hours on my workstation, you can get the raw results here (I'll run it on bee next week). The pointer implementation of struct Particle runs 23% slower than the original implementation, for both raw and smart pointers. Using the original implementation and default features as a baseline, the increase in integration time can be broken down as follows:

implementation minimal default maxset
original 91% 100% 175%
raw pointers 111% 130% 217%
unique pointers 110% 128% 215%

The pointer implementation reduces the struct size from 928 to 208 bytes. Adding more features increases the execution time in the same way no matter the implementation. We need to change that by re-shuffling the members of Particle substructs to minimize the slow-down.

Apply the C++11 Rule of Five: default constructor, copy constructor,
move constructor, copy assignment operator, move assignment operator,
destructor.
@codecov
Copy link

codecov bot commented Oct 7, 2018

Codecov Report

Merging #2296 into python will increase coverage by <1%.
The diff coverage is 71%.

Impacted file tree graph

@@           Coverage Diff           @@
##           python   #2296    +/-   ##
=======================================
+ Coverage      71%     71%   +<1%     
=======================================
  Files         380     380            
  Lines       18938   19039   +101     
=======================================
+ Hits        13588   13661    +73     
- Misses       5350    5378    +28
Impacted Files Coverage Δ
src/core/bonded_interactions/harmonic.hpp 85% <ø> (ø) ⬆️
src/core/nonbonded_interactions/ljcos.hpp 96% <ø> (ø) ⬆️
src/core/cells.hpp 100% <ø> (ø) ⬆️
src/core/nonbonded_interactions/buckingham.hpp 100% <ø> (ø) ⬆️
src/core/bonded_interactions/quartic.hpp 0% <ø> (ø) ⬆️
src/core/nonbonded_interactions/morse.hpp 100% <ø> (ø) ⬆️
src/core/nonbonded_interactions/soft_sphere.hpp 100% <ø> (ø) ⬆️
src/core/nonbonded_interactions/ljgen.hpp 100% <ø> (ø) ⬆️
src/core/nonbonded_interactions/ljcos2.hpp 100% <ø> (ø) ⬆️
src/core/bonded_interactions/harmonic_dumbbell.hpp 0% <ø> (ø) ⬆️
... and 85 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 76aefca...56860fe. Read the comment docs.

Resources in struct Particle are now owned by pointers, excepted
struct ParticlePosition which has to be contiguous in memory.
Resources in struct Particle are now owned by unique_ptr's, excepted
struct ParticlePosition which has to be contiguous in memory.
Benchmarking suite to measure the integrator execution time under
various conditions: build features, struct Particle implementation,
number of cores, number of particles, interaction types.
Revert resource management edits for the ParticleForce member
of struct Particle.
@jngrad
Copy link
Member Author

jngrad commented Oct 8, 2018

As suggested by @RudolfWeeber and @fweik, the ParticleForce member is now contiguous:

struct Particle {
  ParticlePosition r;
  ParticleForce f;
  std::unique_ptr<ParticleProperties> p;
  std::unique_ptr<ParticleLocal> l;
  std::unique_ptr<ParticleMomentum> m;
  /* followed by feature-enabled members */
};

Using a subset of the tests (LJ gas and P3M saline solution), the slow down is now 16% instead of 23%. Adding more features still affects performance:

implementation minimal default maxset
original 91% 100% 182%
raw pointers 114% 132% 228%
unique pointers 113% 131% 226%
ParticleForce 106% 119% 209%

@RudolfWeeber RudolfWeeber added this to the Espresso 4.1 milestone Oct 15, 2018
Create a struct ParticleExtended to group together all unique_ptr
defined in struct Particle.
The short range loop has to access charges and types frequently.
@jngrad
Copy link
Member Author

jngrad commented Nov 7, 2018

Moved charge and type to the main Particle struct to make the short range loop evaluate faster. LJ liquid is still 10% slower, LJ gas is 20% slower. Electrostatics is now slightly faster. The benchmarks run without GHOSTS_HAVE_BONDS.

LJ gas (dens=0.02) LJ liquid (dens=0.50)
implementation minimal default maxset minimal default maxset
original 91% 100% 169% 87% 100% 182%
current 111% 118% 188% 100% 111% 191%

P3M gas P3M solvated
implementation minimal default maxset minimal default maxset
original 97% 100% 133% 89% 100% 159%
current 97% 101% 129% 86% 95% 151%

@fweik
Copy link
Contributor

fweik commented Nov 21, 2018

@jngrad could you please put the benchmarks into a separate pull request? I think we can merge those quickly and there is already interest to extend them.

@jngrad jngrad mentioned this pull request Nov 22, 2018
@RudolfWeeber
Copy link
Contributor

@jngrad, could you please run the corrected benchmarks on the various particle implementations again?

@jngrad
Copy link
Member Author

jngrad commented Dec 6, 2018

LJ (vol. fraction=0.02) LJ (vol. fraction=0.50) P3M (vol. fraction=0.25)
implement. min def. max min def. max min def. max
original 91% 100% 175% 87% 100% 195% 93% 100% 152%
current 108% 118% 195% 94% 107% 196% 91% 94% 133%
pr2394 73% 83% 127% 80% 99% 157% 92% 105% 136%

Using commits c1d850e, 6fb0487, 345fd4b, discarding LJ 10k 0.02 on 8 cores.

@fweik fweik added the Core label Jan 28, 2019
@KaiSzuttor KaiSzuttor removed this from the Espresso 4.1 milestone Jul 15, 2019
bors bot added a commit that referenced this pull request Aug 9, 2019
3061: Avoid unneeded reallocations of ghost data r=jngrad a=fweik

So I looked at profiles of the particle exchange code, and noticed
some things... In some situations the can improve the performance
considerably, so I think this should go into the release.
This has some overlap with #2296, but the approach is similar, 
and this is slightly cleaner and can be merged.
It's slightly hackish, but I think this is the best we can do with
the current `Cell` data structure.

Description of changes:
 - Realloc ghost particles only if needed
 - Fixed memleak in resort code
 - Fixed corner-case of Utils::List::resize
 - Removed exclusions from ghosts


Co-authored-by: Florian Weik <fweik@icp.uni-stuttgart.de>
bors bot added a commit that referenced this pull request Aug 9, 2019
3061: Avoid unneeded reallocations of ghost data r=jngrad a=fweik

So I looked at profiles of the particle exchange code, and noticed
some things... In some situations the can improve the performance
considerably, so I think this should go into the release.
This has some overlap with #2296, but the approach is similar, 
and this is slightly cleaner and can be merged.
It's slightly hackish, but I think this is the best we can do with
the current `Cell` data structure.

Description of changes:
 - Realloc ghost particles only if needed
 - Fixed memleak in resort code
 - Fixed corner-case of Utils::List::resize
 - Removed exclusions from ghosts


3067: Document bonded potentials r=KaiSzuttor a=jngrad

Fixes #3049

Description of changes:
- bonded IAs listed in #3049 are now documented in both Sphinx and Doxygen
- cleaned up outdated docstrings
- re-shuffled paragraphs in [7. Bonded interactions](http://espressomd.org/html/doc/inter_bonded.html)

Co-authored-by: Florian Weik <fweik@icp.uni-stuttgart.de>
Co-authored-by: Jean-Noël Grad <jgrad@icp.uni-stuttgart.de>
@KaiSzuttor
Copy link
Member

@jngrad can this be closed for now?

@jngrad
Copy link
Member Author

jngrad commented Oct 24, 2019

Yes, this has derived too much. It will be easier to restart directly from python.

@jngrad jngrad closed this Oct 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants