Optimised pair_lj_charmm_coul* code #124

Merged
akohlmey merged 3 commits into lammps:lammps-icms on Jul 27, 2016

@ibethune (Collaborator) commented Jul 27, 2016

Some small optimisations to the pair_lj_charmm_coul* compute() functions, principally storing reciprocal values so that repeated floating-point divisions become multiplications. In my tests (for a system dominated by these pair interactions), this gave a speedup of roughly 4%.
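To illustrate the kind of change described above, here is a minimal sketch and not the code from this PR. It uses the standard CHARMM-style switching function, whose denominator depends only on the cutoffs and is therefore loop-invariant. The struct, the function names, and the `denom_inv` member are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical parameter bundle; names are illustrative, not taken from the PR.
struct SwitchParams {
    double cut_inner_sq;  // inner cutoff, squared
    double cut_outer_sq;  // outer cutoff, squared
    double denom;         // (cut_outer_sq - cut_inner_sq)^3
    double denom_inv;     // 1 / denom, computed once at setup time
};

SwitchParams make_params(double cut_inner, double cut_outer) {
    SwitchParams p;
    p.cut_inner_sq = cut_inner * cut_inner;
    p.cut_outer_sq = cut_outer * cut_outer;
    const double d = p.cut_outer_sq - p.cut_inner_sq;
    p.denom = d * d * d;
    p.denom_inv = 1.0 / p.denom;  // the only division, paid once
    return p;
}

// Baseline: one floating-point division per pair inside the inner loop.
double switch_div(const SwitchParams& p, double rsq) {
    const double a = p.cut_outer_sq - rsq;
    return a * a * (p.cut_outer_sq + 2.0 * rsq - 3.0 * p.cut_inner_sq) / p.denom;
}

// Same value up to rounding, but the division is replaced by a multiply
// with the stored reciprocal.
double switch_mul(const SwitchParams& p, double rsq) {
    const double a = p.cut_outer_sq - rsq;
    return a * a * (p.cut_outer_sq + 2.0 * rsq - 3.0 * p.cut_inner_sq) * p.denom_inv;
}
```

Both functions return 1 at the inner cutoff and 0 at the outer cutoff. The results can differ in the last bits because multiplying by a rounded reciprocal is not bit-identical to dividing.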

Before:

Loop time of 120.903 on 1 procs for 1000 steps with 7018 atoms

Performance: 5.717 ns/day, 4.198 hours/ns, 8.271 timesteps/s
99.3% CPU use with 1 MPI tasks x no OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 103.35     | 103.35     | 103.35     |   0.0 | 85.48
Bond    | 1.095      | 1.095      | 1.095      |   0.0 |  0.91
Kspace  | 4.3415     | 4.3415     | 4.3415     |   0.0 |  3.59
Neigh   | 7.2393     | 7.2393     | 7.2393     |   0.0 |  5.99
Comm    | 1.1778     | 1.1778     | 1.1778     |   0.0 |  0.97
Output  | 0.0020359  | 0.0020359  | 0.0020359  |   0.0 |  0.00
Modify  | 2.9979     | 2.9979     | 2.9979     |   0.0 |  2.48
Other   |            | 0.6963     |            |       |  0.58

After:

Loop time of 116.697 on 1 procs for 1000 steps with 7018 atoms

Performance: 5.923 ns/day, 4.052 hours/ns, 8.569 timesteps/s
99.4% CPU use with 1 MPI tasks x no OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 99.448     | 99.448     | 99.448     |   nan | 85.22
Bond    | 1.0763     | 1.0763     | 1.0763     |   0.0 |  0.92
Kspace  | 4.0912     | 4.0912     | 4.0912     |   0.0 |  3.51
Neigh   | 7.3013     | 7.3013     | 7.3013     |   0.0 |  6.26
Comm    | 1.1464     | 1.1464     | 1.1464     |   0.0 |  0.98
Output  | 0.00031209 | 0.00031209 | 0.00031209 |   0.0 |  0.00
Modify  | 2.9559     | 2.9559     | 2.9559     |   0.0 |  2.53
Other   |            | 0.6776     |            |       |  0.58
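As a cross-check on the two timing tables, the quoted speedup can be recomputed from the reported numbers. Loop time goes from 120.903 to 116.697, a ratio of about 1.036 (about 3.5% less wall time). Pair time goes from 103.35 to 99.448, a ratio of about 1.039 (about 3.8% less). Most of the saving therefore comes from the Pair section. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Ratio of old to new wall time; values above 1 mean the new run is faster.
double speedup(double before, double after) {
    return before / after;
}

// Percentage of the old wall time that was saved.
double percent_saved(double before, double after) {
    return 100.0 * (before - after) / before;
}
```

For example, `speedup(120.903, 116.697)` gives the overall ratio and `percent_saved(103.35, 99.448)` gives the reduction in the Pair section.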

@akohlmey (Member) commented

FYI, quite a bit more speedup is possible by optimizing data access and by making invariant if statements compile-time constants via templates. See the modifications done in the OPT and USER-OMP packages (when compiling USER-OMP without OpenMP enabled, you should get pretty much what is in OPT, or better).
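The template technique mentioned above can be sketched roughly as follows. This is an illustrative toy kernel in reduced Lennard-Jones units, not code copied from the OPT or USER-OMP packages. The `Result` struct and the function names are hypothetical. The idea is that a flag which never changes during the loop becomes a template parameter, so the compiler can drop the branch from the loop body.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical result type for the toy kernel.
struct Result {
    double force;
    double energy;
};

// Runtime flag: the branch on eflag sits inside the inner loop.
Result kernel_runtime(const std::vector<double>& rsq, bool eflag) {
    Result out{0.0, 0.0};
    for (double r2 : rsq) {
        const double r2inv = 1.0 / r2;
        const double r6inv = r2inv * r2inv * r2inv;
        out.force += 48.0 * r6inv * (r6inv - 0.5) * r2inv;
        if (eflag) out.energy += 4.0 * r6inv * (r6inv - 1.0);
    }
    return out;
}

// Compile-time flag: EFLAG is a constant in each instantiation, so the
// compiler can remove the energy code entirely when EFLAG is false.
template <bool EFLAG>
Result kernel_templated(const std::vector<double>& rsq) {
    Result out{0.0, 0.0};
    for (double r2 : rsq) {
        const double r2inv = 1.0 / r2;
        const double r6inv = r2inv * r2inv * r2inv;
        out.force += 48.0 * r6inv * (r6inv - 0.5) * r2inv;
        if (EFLAG) out.energy += 4.0 * r6inv * (r6inv - 1.0);
    }
    return out;
}

// The runtime decision is made once, outside the loop.
Result kernel_dispatch(const std::vector<double>& rsq, bool eflag) {
    return eflag ? kernel_templated<true>(rsq) : kernel_templated<false>(rsq);
}
```

The cost is one instantiation per flag combination, which increases code size. Whether the gain is measurable depends on the compiler and on how well the runtime branch is already predicted.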

@akohlmey akohlmey merged commit 46f034d into lammps:lammps-icms Jul 27, 2016
@ibethune ibethune deleted the optimised_pair_lj branch July 27, 2016 12:44
jtclemm pushed a commit to jtclemm/lammps that referenced this pull request Nov 23, 2020