Simulation performance steadily declines #62
This is likely the same issue as openmm/openmm#4277. I'll try your script and see what I can tell.
When I first tried to run your script it died, I think from running out of memory. I reduced the box size from 30 to 20 Å and tried again. It's now past 100,000 steps and hasn't shown the slightest change in speed. It's completely constant at 3.27 ns/day. What kind of GPU are you running on? How much memory do you have (both main memory and GPU memory)?
Your script doesn't specify which implementation to use, which means it defaults to NNPOps. I tried switching to TorchANI. The results are different in a few ways.
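For reference, selecting the backend explicitly might look like the following sketch (this assumes openmm-ml's `MLPotential.createSystem` accepts an `implementation` keyword; the wrapper function and its import guard are illustrative, not part of the original script):

```python
def build_ani_system(topology, implementation="torchani"):
    """Create an ANI-2x System with an explicit implementation choice.

    When `implementation` is omitted, openmm-ml defaults to NNPOps if it
    is available; passing "torchani" forces the TorchANI code path.
    Returns None if openmm-ml is not installed (sketch only).
    """
    try:
        from openmmml import MLPotential
    except ImportError:
        return None
    potential = MLPotential('ani2x')
    return potential.createSystem(topology, implementation=implementation)
```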
I assume this doesn't match the behavior you see?
Oh, sorry, this was my mistake. I just reran the calculation with a 20 Å waterbox and TorchANI specified. NPT simulation:
NVT simulation:
I am running this on an NVIDIA GeForce RTX 3080 Ti with CUDA version 12.2. Nothing else is running on this machine, CPU utilization seems to be at 100%, and nvidia-smi doesn't show anything suspicious.
Your script throws an exception at the line `barostate = MonteCarloBarostat(1, unit.atmosphere, temperature)`. I assume it's supposed to be `barostat = MonteCarloBarostat(1*unit.atmosphere, temperature)`.
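The corrected construction can be sketched as below: the pressure has to be a single Quantity, not a bare number followed by a unit, otherwise OpenMM misreads the positional arguments (the helper function and its import guard are illustrative additions so the sketch stands alone):

```python
def make_barostat(pressure_atm: float, temperature_kelvin: float):
    """Build a MonteCarloBarostat with the pressure as one Quantity.

    The broken line passed (1, unit.atmosphere, temperature), which OpenMM
    would interpret as (pressure=1, temperature=atmosphere, frequency=...).
    Returns None if OpenMM is not installed (sketch only).
    """
    try:
        from openmm import MonteCarloBarostat, unit
    except ImportError:
        return None
    return MonteCarloBarostat(pressure_atm * unit.atmosphere,
                              temperature_kelvin * unit.kelvin)
```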
Ah, sorry, that's it:
Here is what I get on an RTX 4080.
It's the same as before: there's an initial slowdown, but then it gradually speeds up again. I'm not sure what the warning means, or whether it could be related.
@peastman, this warning is usual with PyTorch JIT script models. During the first runs through the model, PyTorch measures performance and looks for optimization opportunities. Just for completeness, here is a plot of the performance you posted, @peastman, which is consistent with what I see on a 4090.
Thank you for looking into this. I have rebuilt my conda environment and applied the patch described here: #50 (comment). Now I see similar performance and performance behavior as reported by @peastman and @RaulPPelaez for a waterbox with a 20 Å edge size — no noticeable performance degradation over the simulated time (~50 ps). However, increasing the box size to 30 Å, as in the initially provided simulation script, results in performance degradation as the simulation progresses.
Is this running with NVFuser enabled or disabled? Can you try with NVFuser disabled?
NVFuser is now NOT the default in newer PyTorch: https://discuss.pytorch.org/t/nvfuser-with-torch-compile/187829/2
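One way to turn NVFuser off from a script can be sketched as follows. This relies on an internal, version-dependent PyTorch hook (`torch._C._jit_set_nvfuser_enabled`), so treat the exact name as an assumption about the PyTorch builds in use at the time; the guard makes the sketch degrade gracefully on other versions:

```python
def disable_nvfuser() -> bool:
    """Try to disable NVFuser for TorchScript models.

    Returns True if the internal PyTorch hook was found and called,
    False if PyTorch is missing or the hook no longer exists.
    Sketch only -- the hook is private and version-dependent.
    """
    try:
        import torch
        torch._C._jit_set_nvfuser_enabled(False)
        return True
    except (ImportError, AttributeError):
        return False
```

Call `disable_nvfuser()` once, before the first evaluation of the TorchScript model, since the fuser choice affects how the model is compiled on its first runs.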
Disabling NVFuser solved the issue! That's great, thank you!
I ran a pure waterbox simulation within a 25 Å box with ANI-2x and the torchani implementation. The simulation performance decreased from ~2 ns/day to 0.3 ns/day within the first 1 ns of simulation time. I've attached a script to reproduce this behavior and the output of the StateDataReporter. Is there anything that I need to do differently?
waterbox_simulation_1ns.zip