Hi,
I am running a 1D laser-plasma simulation with both field and collisional ionization, using 2048 patches on 1024 cores. The run starts at a reasonable speed but slows down dramatically over time. My general question is how to diagnose this kind of performance problem and how to fix it.
So far I have looked at the size of each processor's restart dump and at the timings in profil.txt. One processor writes a much larger restart dump than the others, which suggests a load-balance issue. However, I am not sure how to get more information about how the load changes over time, or how to adjust the load-balancing settings accordingly.
Currently I have the following for load balancing, where Nt is the number of time steps per laser cycle:
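To put a number on the dump-size observation, here is a minimal sketch of the check I have in mind. It assumes that the size of a restart dump is a rough proxy for the number of particles held by that process. The function names and the glob pattern in the usage note are hypothetical and would need to match the actual checkpoint file names.

```python
import glob
import os
from statistics import mean


def dump_imbalance(sizes):
    """Return (max/mean size ratio, index of the largest entry).

    `sizes` is a list of file sizes in bytes, one per process.
    A ratio close to 1 means the dumps are evenly sized.
    """
    if not sizes:
        raise ValueError("no sizes given")
    largest = max(range(len(sizes)), key=sizes.__getitem__)
    return sizes[largest] / mean(sizes), largest


def sizes_from_pattern(pattern):
    """Collect (paths, sizes) for all files matching a glob pattern."""
    paths = sorted(glob.glob(pattern))
    return paths, [os.path.getsize(p) for p in paths]
```

For example, `paths, sizes = sizes_from_pattern("checkpoints/dump-*")` (hypothetical pattern) followed by `dump_imbalance(sizes)` gives the ratio and the position of the heaviest dump in `paths`. Repeating this for dumps written at different times would show whether the imbalance grows.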
LoadBalancing(
    initial_balance = True,
    every = 100*Nt,
    cell_load = 1.,
    frozen_particle_load = 0.1,
)
The other thing I noticed in the profil.txt file is that the beginning of each restart run seems to be much faster,
--- Timestep = 0 x Main.print_every = ---
Time              Min         Avg         Max         SD
Particles         1.637e-04   1.733e-01   4.040e+00   2.088e-01
Collisions        1.664e-04   4.148e-01   6.812e+00   4.791e-01
Sync Particles    2.407e-04   7.793e-02   1.080e+01   4.609e-01
Sync Densities    2.300e-04   6.779e-02   1.091e+01   4.902e-01
--- Timestep = 1 x Main.print_every = ---
Time              Min         Avg         Max         SD
Particles         3.161e-04   3.433e-01   8.079e+00   4.149e-01
Collisions        2.512e-04   8.292e-01   1.362e+01   9.577e-01
Sync Particles    4.485e-04   1.532e-01   2.160e+01   9.350e-01
Sync Densities    3.696e-04   1.349e-01   2.182e+01   9.801e-01
--- Timestep = 2 x Main.print_every = ---
Time              Min         Avg         Max         SD
Particles         4.639e-04   5.130e-01   1.212e+01   6.209e-01
Collisions        3.340e-04   1.243e+00   2.043e+01   1.437e+00
Sync Particles    6.336e-04   2.346e-01   3.239e+01   1.460e+00
Sync Densities    5.157e-04   2.020e-01   3.272e+01   1.470e+00
compared to the end of each restart run:
--- Timestep = 4868 x Main.print_every = ---
Time              Min         Avg         Max         SD
Particles         8.328e-01   8.206e+02   1.737e+04   9.543e+02
Diagnostics       2.349e+01   1.547e+02   3.697e+02   2.185e+01
Collisions        3.930e-01   2.014e+03   2.907e+04   2.267e+03
Sync Particles    1.016e+00   3.785e+02   4.599e+04   2.039e+03
Sync Densities    8.644e-01   3.581e+02   4.666e+04   2.123e+03
--- Timestep = 4869 x Main.print_every = ---
Time              Min         Avg         Max         SD
Particles         8.329e-01   8.208e+02   1.737e+04   9.544e+02
Diagnostics       2.349e+01   1.547e+02   3.698e+02   2.185e+01
Collisions        3.931e-01   2.014e+03   2.907e+04   2.267e+03
Sync Particles    1.016e+00   3.786e+02   4.599e+04   2.039e+03
Sync Densities    8.645e-01   3.581e+02   4.666e+04   2.124e+03
--- Timestep = 4870 x Main.print_every = ---
Time              Min         Avg         Max         SD
Particles         8.331e-01   8.209e+02   1.737e+04   9.546e+02
Diagnostics       2.349e+01   1.547e+02   3.698e+02   2.185e+01
Collisions        3.932e-01   2.014e+03   2.907e+04   2.268e+03
Sync Particles    1.016e+00   3.787e+02   4.599e+04   2.039e+03
Sync Densities    8.647e-01   3.582e+02   4.666e+04   2.124e+03
This occurs for each restart run. Is there an explanation for this behavior?
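To compare the two ends of a run more carefully, the blocks above can be parsed and reduced to two numbers per timer: the Max/Avg ratio and the change in Avg between consecutive blocks. This is only a sketch under assumptions I have not confirmed: that Min/Avg/Max/SD are statistics across processes, and that the values accumulate from the start of each run, which the roughly linear growth over the first three blocks suggests. The function names and the embedded sample are for illustration only.

```python
import re

# Two blocks copied from the profil.txt excerpt above (shortened).
SAMPLE = """\
--- Timestep = 0 x Main.print_every = ---
Time Min Avg Max SD
Particles 1.637e-04 1.733e-01 4.040e+00 2.088e-01
Collisions 1.664e-04 4.148e-01 6.812e+00 4.791e-01
--- Timestep = 1 x Main.print_every = ---
Time Min Avg Max SD
Particles 3.161e-04 3.433e-01 8.079e+00 4.149e-01
Collisions 2.512e-04 8.292e-01 1.362e+01 9.577e-01
"""


def parse_profile(text):
    """Return {block: {timer: (min, avg, max, sd)}} from profil.txt-style text."""
    blocks, step = {}, None
    for line in text.splitlines():
        header = re.search(r"Timestep\s*=\s*(\d+)", line)
        if header:
            step = int(header.group(1))
            blocks[step] = {}
            continue
        # Timer names may contain spaces, so split off the last 4 columns.
        parts = line.rsplit(None, 4)
        if step is None or len(parts) != 5:
            continue
        try:
            values = tuple(float(p) for p in parts[1:])
        except ValueError:
            continue  # the "Time Min Avg Max SD" column-header line
        blocks[step][parts[0]] = values
    return blocks


def imbalance(blocks, timer):
    """Max/Avg ratio of one timer for every block that reports it."""
    return {s: v[timer][2] / v[timer][1]
            for s, v in sorted(blocks.items()) if timer in v}


def per_interval_avg(blocks, timer):
    """Change of Avg between consecutive blocks (assumes cumulative timers)."""
    steps = [s for s in sorted(blocks) if timer in blocks[s]]
    return {b: blocks[b][timer][1] - blocks[a][timer][1]
            for a, b in zip(steps, steps[1:])}


if __name__ == "__main__":
    blocks = parse_profile(SAMPLE)
    print(imbalance(blocks, "Collisions"))
    print(per_interval_avg(blocks, "Particles"))
```

If the timers are indeed cumulative, the Avg of Particles grows by 3.433e-01 - 1.733e-01 = 0.170 between the first two blocks and by 8.208e+02 - 8.206e+02 = 0.2 between blocks 4868 and 4869 (the four printed digits limit the precision of the second figure). In that reading the large values at the end of a run partly reflect accumulation rather than a slower step.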
Thanks, I will appreciate any recommendation you may have!