
Better precision and Performance with Runge-Kutta-4 integrator #121

Merged

fat-lobyte merged 11 commits into neuoy:master from the integrator branch on Jan 4, 2018

Conversation

fat-lobyte (Collaborator)

Up until now, Trajectories used the Verlet integration scheme. This worked reasonably well, but it required a rather small step size, which led to a hefty performance penalty.
To alleviate this, the force cache was introduced. It sped up the force calculation, but came with its own drawback of decreased precision.

Using a more advanced integration scheme like Runge-Kutta 4 allows a much larger integration step (improving performance) while maintaining precision.

There are now 4 invocations of the force function instead of one, but this allowed us to increase the step size from 0.1 s to 2.0 s, resulting in a (2.0 / 0.1) / 4 = 5-fold performance increase while maintaining about the same (cache-less) precision.
This allows users to disable the cache at little or no performance cost, in exchange for a large increase in precision.
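
For readers unfamiliar with the scheme, here is a minimal sketch of one classic RK4 step for a (position, velocity) state. This is an illustration, not the actual Trajectories code; it assumes KSP's Vector3d type, and the accelerationFunc shape mirrors the delegate discussion further down:

```csharp
using System;

static class RK4Sketch
{
    // One classic RK4 step for the state (position, velocity).
    // Illustrative only; Vector3d is KSP's double-precision vector type.
    public static void RK4Step(Func<Vector3d, Vector3d, Vector3d> accelerationFunc, // (pos, vel) -> accel
                               ref Vector3d pos, ref Vector3d vel, double dt)
    {
        // Four evaluations of the force/acceleration function per step,
        // instead of Verlet's one - but the step can be 20x larger (0.1 s -> 2.0 s).
        Vector3d k1v = accelerationFunc(pos, vel);
        Vector3d k1x = vel;

        Vector3d k2v = accelerationFunc(pos + k1x * (dt / 2), vel + k1v * (dt / 2));
        Vector3d k2x = vel + k1v * (dt / 2);

        Vector3d k3v = accelerationFunc(pos + k2x * (dt / 2), vel + k2v * (dt / 2));
        Vector3d k3x = vel + k2v * (dt / 2);

        Vector3d k4v = accelerationFunc(pos + k3x * dt, vel + k3v * dt);
        Vector3d k4x = vel + k3v * dt;

        // Weighted average of the four sampled slopes.
        pos = pos + (k1x + k2x * 2.0 + k3x * 2.0 + k4x) * (dt / 6);
        vel = vel + (k1v + k2v * 2.0 + k3v * 2.0 + k4v) * (dt / 6);
    }
}
```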

The step size is now variable: it can be changed in code as Trajectories.Settings.IntegrationStepSize, or in the settings file by adding <float name="IntegrationStepSize">2.0</float> to /GameData/Trajectories/Plugin/PluginData/Trajectories/config.xml.
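
For reference, a sketch of how that entry might sit in the config file, assuming the usual KSP PluginConfiguration XML layout (only the IntegrationStepSize entry is from this PR):

```xml
<?xml version="1.0" encoding="utf-8"?>
<config>
    <!-- Integration step size in seconds; 2.0 is the value used in the tests below.
         The surrounding layout is an assumption; only this entry is from the PR. -->
    <float name="IntegrationStepSize">2.0</float>
</config>
```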

What I think should be done before merging this PR:

  • Testing! Please download and test in various situations.
  • Determining the actual performance cost compared to the Verlet (old, in master) integrator. From my testing, using RK4 without cache is about 1.5x slower than using Verlet with cache.

What I think should be done after merging this PR:

  • Add a slider for the IntegrationStepSize in the Settings UI
  • Add settings and switching code to choose between the old (Verlet) integrator and the RK-4 integrator, with the option to add more integrators
  • In the near future, I would like to try a little optimization pass. Additionally, I would like to try limiting the refresh rate of the trajectory calculation.
    We really need the result at most 0.5 times per second (once every two seconds), unless you're doing kOS things.
  • I would like to try implementing an adaptive method like Dormand-Prince 4/5, which raises the step size when the solution changes little and lowers it when it changes a lot. This could increase performance even further, which would allow us to ditch the cache completely (see the sketch after this list).
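
To illustrate the adaptive idea: an embedded pair like Dormand-Prince 4/5 produces a 4th- and a 5th-order solution in each step, and their difference estimates the local error, which then drives the step size. A minimal sketch of such a controller, with entirely hypothetical names and bounds (not existing Trajectories code):

```csharp
using System;

static class AdaptiveStep
{
    // Hypothetical step-size controller in the Dormand-Prince 4/5 spirit.
    // errorEstimate would be the difference between the embedded 4th- and
    // 5th-order solutions of the step just taken.
    public static double NextStepSize(double dt, double errorEstimate, double tolerance,
                                      double minDt = 0.1, double maxDt = 8.0)
    {
        // Grow the step when the local error is well below tolerance,
        // shrink it when the error is too large. The 1/5 exponent matches
        // the order of the embedded pair.
        double scale = 0.9 * Math.Pow(tolerance / Math.Max(errorEstimate, 1e-12), 1.0 / 5.0);
        scale = Math.Min(Math.Max(scale, 0.2), 5.0); // damp abrupt changes
        return Math.Min(Math.Max(dt * scale, minDt), maxDt);
    }
}
```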

A compiled version of this branch can be found here:

https://github.com/fat-lobyte/KSPTrajectories/releases/download/v1.7.2-pre_integrator1/Trajectories.dll

@fat-lobyte (Collaborator, Author) commented Dec 28, 2017

Here is the precision comparison.

The columns correspond to the situations that were tested (represented by the .sfs quicksave files in the master branch).

The rows correspond to the integration methods that were tested. Verlet is the old integrator, RK-4 is the new integrator. I tried out 3 different integration step sizes: 3.0 s, 2.0 s, and 0.5 s.

The cells show the deviation in kilometers, where:

  • Overshooting is positive (further downrange than predicted)
  • Falling short is negative (less far downrange than predicted)

| Deviation [km] | kerbin-half | kerbin-flat | eve-quarter | eve-flat | duna-half |
| --- | --- | --- | --- | --- | --- |
| Verlet [dt=0.1] | 2.5 | 47 | -2.1 | 15 | 4.4 |
| Verlet [dt=0.1] w/ cache | -1.5 | -403 | -5 | -415 | 4.9 |
| RK-4 [dt=2] | 0.4 | 16.7 | -4.2 | -2.6 | 3.7 |
| RK-4 [dt=2] w/ cache | -3.7 | -513 | -7.2 | -520 | 4.1 |
| RK-4 [dt=3] | 0.2 | 14.9 | -4.2 | 3 | 3.9 |
| RK-4 [dt=0.5] | 0.3 | 18.2 | -4.2 | -2.8 | 3.7 |

As you can see, RK-4 with an IntegrationStepSize of 2.0 yields results about the same as or better than Verlet with a 0.1 step size. Decreasing the step size further did not help much with precision.

@fat-lobyte (Collaborator, Author)

Here is the performance comparison.

The columns correspond to the situations that were tested (represented by the .sfs quicksave files in the master branch).

The rows correspond to the integration methods that were tested. Verlet is the old integrator, RK-4 is the new integrator. I tried out 3 different integration step sizes: 3.0 s, 2.0 s, and 0.5 s.

The cells show the performance impact as shown in the Settings tab of the Trajectory window, in milliseconds [ms].

On an Intel Core i5 4690K with a boost clock of 3.9 GHz (4.74 GFLOPS per core), these are the results:

| Time [ms] | kerbin-half | kerbin-flat | eve-quarter | eve-flat | duna-half |
| --- | --- | --- | --- | --- | --- |
| Verlet [dt=0.1] | 20.8 | 19.7 | 20.1 | 20.3 | 19.9 |
| Verlet [dt=0.1] w/ cache | 2.9 | 2.9 | 2.8 | 2.8 | 2.9 |
| RK-4 [dt=2] | 4 | 3.9 | 3.9 | 3.7 | 3.9 |
| RK-4 [dt=2] w/ cache | 2.4 | 2.6 | 2.2 | 2.4 | 2.6 |
| RK-4 [dt=3] | 2.9 | 2.9 | 2.7 | 2.9 | 2.9 |
| RK-4 [dt=3] w/ cache | 2.5 | 2.8 | 1.5 | 2.1 | 2.6 |
| RK-4 [dt=0.5] | 16.3 | 15.8 | 15.8 | 15.5 | 15.9 |
| RK-4 [dt=0.5] w/ cache | 3 | 3 | 2.6 | 2.9 | 2.9 |

On an Intel Core i5 3320M with a maximum clock of 3.1 GHz, these are the results:

| Time [ms] | kerbin-half | kerbin-flat | eve-quarter | eve-flat | duna-half |
| --- | --- | --- | --- | --- | --- |
| Verlet [dt=0.1] | 36.2 | 34 | 32 | 33.5 | 36.2 |
| Verlet [dt=0.1] w/ cache | 3.7 | 3.2 | 3 | 3.3 | 3.2 |
| RK-4 [dt=2] | 7.8 | 7.5 | 7 | 7.5 | 7.8 |
| RK-4 [dt=2] w/ cache | 2.5 | 2.6 | 2.3 | 2.7 | 2.7 |
| RK-4 [dt=3] | 5.4 | 5 | 4.4 | 4.8 | 4.9 |
| RK-4 [dt=3] w/ cache | 2.6 | 2.7 | 2.5 | 2.3 | 3.4 |
| RK-4 [dt=0.5] | 31.5 | 25.5 | 28.5 | 29.5 | 30.8 |
| RK-4 [dt=0.5] w/ cache | 4.1 | 3.7 | 3.3 | 3.1 | 3.6 |

The performance benefit without the cache, at approximately the same precision, is between 4.4x and 5.2x.

Unfortunately, the difference between Verlet with cache and RK-4 without cache is still a bit too much for comfort: on slower systems, cache-less RK-4 can be about 2x slower than Verlet with cache.

Whether or not this is acceptable is a more difficult question, but personally I would always prefer the much higher precision of running without the cache over a small performance hit. What are your opinions?

With this PR merged, I will probably start suggesting that users turn off the cache most of the time.

fat-lobyte merged commit bd1dcaa into neuoy:master on Jan 4, 2018
@fat-lobyte (Collaborator, Author)

Merged due to lack of negative feedback, a.k.a. "works for me".

fat-lobyte deleted the integrator branch on January 4, 2018, at 14:58
@PiezPiedPy (Contributor)

I've had no problems with it and got a performance boost on my system.
I meant to post earlier, but with the new year and all I completely forgot. Sorry :(

@fat-lobyte (Collaborator, Author)

No worries, glad that you could look at it :)

@PiezPiedPy (Contributor)

Got a question regarding the accelerationFunc lambda:
would it not be better to change this to a local function (new in C# 7)? Calling a local function is cheaper, possibly increasing performance even further.

@fat-lobyte (Collaborator, Author)

> Got a question regarding the accelerationFunc lambda:
> would it not be better to change this to a local function (new in C# 7)? Calling a local function is cheaper, possibly increasing performance even further.

Thanks for bringing this feature to my attention! I checked it out, and it seems the performance is actually a tiny bit worse.

This is probably because the main advantage of a local function is avoiding the allocation of a delegate object, but we do need to create a delegate object: the RK4Step() function takes a function delegate and uses it to evaluate the acceleration at different positions.

Of course, it would theoretically be possible to manually inline everything into the AddPatch() function to save any kind of function-call overhead. But TBH, I would actually prefer to go the opposite way, namely modularizing the code more and more, to break all of the monster-sized functions into more manageable chunks.
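
To make the delegate point concrete, the call shape looks roughly like this (simplified; ComputeAccel is a hypothetical stand-in for the actual force evaluation):

```csharp
// Because RK4Step() receives the acceleration routine as a delegate
// parameter, a delegate object is allocated whether the routine is
// written as a lambda or as a C# 7 local function.
Func<Vector3d, Vector3d, Vector3d> accelerationFunc =
    (pos, vel) => ComputeAccel(pos, vel); // hypothetical helper

RK4Step(accelerationFunc, ref position, ref velocity, dt);
```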

@PiezPiedPy (Contributor)

> I checked it out, and it seems the performance is actually a tiny bit worse.

Thanks for checking.

> we do need to create a delegate object: the RK4Step() function takes a function delegate and uses it to evaluate the acceleration at different positions.

Thanks for letting me know, I didn't look too deeply into how the calculations were performed.

> I would actually prefer to go the opposite way, namely modularizing the code more and more, to break all of the monster-sized functions into more manageable chunks.

I totally agree with you.
