Improve performance of trace rays #105

Closed
szapp opened this issue Feb 24, 2017 · 3 comments

Comments

@szapp
Owner

szapp commented Feb 24, 2017

Trace rays for focus and distance/obstacle collection are fired continuously every frame. Benchmarking showed that the freeAimRay function takes 100+ ms in the worst case on a weak machine. Doing this every frame is a crime against performance.
Returning the previous results of the trace ray/focus collection and recalculating only every nth frame failed, because Gothic's internal focus collection does not happen every frame either. This causes a discrepancy: trace ray recalculation and the internal focus update do not coincide, so a new focus can never be captured by the trace ray (it is overwritten by the previous frame's results).
Task:
Either (1) find where exactly Gothic's internal focus collection happens and hook at that address directly, or (2) force a recalculation when the focus has changed.
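
Option (2) could look roughly like the following. This is a minimal Python model of the idea, not the project's actual script code; `TraceRayCache`, `trace_fn` and `engine_focus` are hypothetical names introduced for illustration. The cached result is reused for up to n frames, but a recalculation is forced as soon as the focus reported by the engine differs from the one seen in the previous frame.

```python
class TraceRayCache:
    """Reuse a trace-ray result for several frames, but recompute early
    whenever the engine reports a different focus (hypothetical model)."""

    def __init__(self, trace_fn, every_n_frames):
        self.trace_fn = trace_fn              # stands in for the expensive ray cast
        self.every_n = max(1, every_n_frames)
        self.frames_since = None              # None: nothing cached yet
        self.last_engine_focus = None
        self.result = None
        self.recalculations = 0

    def update(self, engine_focus):
        """Call once per frame with the focus the engine currently holds."""
        stale = self.frames_since is None or self.frames_since >= self.every_n
        focus_changed = engine_focus != self.last_engine_focus
        if stale or focus_changed:
            self.result = self.trace_fn(engine_focus)
            self.recalculations += 1
            self.frames_since = 0
        self.frames_since += 1
        self.last_engine_focus = engine_focus
        return self.result
```

With `every_n_frames=4` and a focus that switches on the fourth frame, the model recalculates on frame 1 and again on frame 4, instead of handing out the stale result until the interval has elapsed.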

@szapp changed the title from "Improve performance" to "Improve performance of trace rays" on Feb 24, 2017
@szapp
Owner Author

szapp commented Feb 24, 2017

If implementing (1), which should be the most performance-friendly option, define the trace ray results as global variables so that they are accessible from different functions without calling the function again.
Approach:

  1. Hook void __thiscall oCNpc::CollectFocusVob(int) 0x733A10 and other candidates and print to zSpy every time one of them is called, while also printing to zSpy from freeAimRay whenever Gothic has updated the focus. This will help to find the function that is really responsible for collecting the focus without having to read much of the engine functions themselves.
  2. Once the focus collection function is identified, find a suitable address within it at which to overwrite the collected focus. Test the validity of the function and the hook by overwriting the focus with zero and checking that it stays zero (otherwise another function is setting the focus vob).
  3. Call a modified version of freeAimRay from that hooked address, storing the results in global variables that can be accessed from anywhere at any point in time.
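
The logging idea in step 1 can be modelled as follows. This is only a Python sketch of the correlation technique, not the actual hook code; `call_log` stands in for the zSpy output, and `logged` and `candidates_before_change` are hypothetical helpers. Each candidate records its name when called, and the log is then scanned for the candidate that ran directly before each "focus changed" message.

```python
import functools

call_log = []  # stands in for the zSpy output


def logged(name):
    """Wrap a candidate function so that every call is recorded."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            call_log.append(name)
            return fn(*args, **kwargs)
        return wrapper
    return decorate


def candidates_before_change(log, marker="focus changed"):
    """Return the candidate names logged directly before each focus change."""
    return [log[i - 1] for i, entry in enumerate(log)
            if entry == marker and i > 0 and log[i - 1] != marker]
```

A candidate that shows up before every focus change is the likely collector; one that never does can be dropped from the list.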

@szapp
Owner Author

szapp commented Feb 24, 2017

Investigation of engine functions:
void __thiscall oCAIHuman::CheckFocusVob(int) 0x69B7A0 is called every frame (even in the menu, but not during animations like weapon switching). If there is no focus, it calls void __thiscall oCNpc::CollectFocusVob(int) 0x733A10. If there is a focus, however, it stays the focus until it dies or the angles become too large(?).
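
The observed behaviour can be summarised in a small Python model. This is a sketch based only on the observation above, not on the engine code; `check_focus`, `max_angle` and `collect` are hypothetical names, and both the angle criterion and the assumption that a new collection happens in the same frame in which the focus is dropped are unconfirmed.

```python
def check_focus(current, is_dead, angle, max_angle, collect):
    """Model of the observed behaviour: an existing focus sticks while valid."""
    # Assumption: the focus is dropped when it dies or the angle gets too large.
    if current is not None and (is_dead or angle > max_angle):
        current = None
    # Only without a focus is a new one collected
    # (collect stands in for oCNpc::CollectFocusVob).
    if current is None:
        return collect()
    return current
```

The relevant consequence for the trace ray is that the collection does not run while a valid focus exists, so it cannot be relied on to fire every frame.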

@szapp
Owner Author

szapp commented Feb 25, 2017

Approach (1) did give some insights, but was not the way to go because it would not resolve the performance problem. The real issue turned out to be the function freeAimAnimation, which hooked the engine function int __thiscall oCAniCtrl_Human::InterpolateCombineAni(float, float, int) 0x6B6170 awkwardly, causing freeAimAnimation to be called multiple times per frame.
The engine function is used for other purposes as well. The solution is to instead hook int __thiscall oCAIHuman::BowMode(int) 0x695F00 at the specific address (0x696296) where oCAniCtrl_Human::InterpolateCombineAni is called.
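
The difference between the two hook locations can be illustrated with a toy Python model. This is not the real hooking code: the `Engine` class and its attributes are hypothetical, and the number of other callers per frame (9) is an assumption chosen for illustration. A hook on the shared function fires for every caller, whereas a hook at the single call site inside the bow mode fires once per frame.

```python
class Engine:
    """Toy engine: one shared function that is called from several places."""

    def __init__(self):
        self.function_hook = None   # fires on every call of the shared function
        self.call_site_hook = None  # fires only at one specific call site

    def interpolate_combine_ani(self):
        if self.function_hook:
            self.function_hook()

    def bow_mode(self):
        # Hooking here corresponds to hooking the call site, not the callee.
        if self.call_site_hook:
            self.call_site_hook()
        self.interpolate_combine_ani()

    def frame(self, other_callers=9):
        self.bow_mode()
        for _ in range(other_callers):  # unrelated uses of the same function
            self.interpolate_combine_ani()
```

In this model one frame triggers the function hook ten times but the call-site hook only once, so any expensive work attached to the hook runs once per frame.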

szapp added a commit that referenced this issue Feb 25, 2017
The cause for the performance drops was an awkwardly hooked function
(see #105) which caused the trace ray machinery to be run ten times per
frame during aiming in ranged combat. This is now resolved (tenfold
speed-up).
Performance can be further boosted by a new entry in the ini file which
introduces an adjustable recalculation frequency in milliseconds.
By default the trace ray/focus collection is recalculated at each frame.
When increasing the interval (to up to 500 ms) the focus collection
becomes less precise, but might boost the performance slightly on weak
machines (powerful machines will not benefit from this setting and
should keep the value close to zero).
Caution: Increasing this value beyond 45 ms will introduce a stutter
in spells like blink that continually visualize the aim vob.
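
The adjustable recalculation interval can be pictured with a short Python sketch. It is a model under stated assumptions rather than the committed implementation: `TimedTraceRay` and `now_ms` are hypothetical names, and clamping the configured value to the range 0 to 500 ms is an assumption derived from the upper bound mentioned above.

```python
MAX_INTERVAL_MS = 500  # upper bound mentioned in the commit message


class TimedTraceRay:
    """Recompute the trace ray only if the configured interval has elapsed."""

    def __init__(self, trace_fn, interval_ms):
        self.trace_fn = trace_fn  # stands in for the expensive ray cast
        # Assumption: out-of-range settings are clamped to [0, 500] ms.
        self.interval_ms = min(max(int(interval_ms), 0), MAX_INTERVAL_MS)
        self.last_ms = None
        self.result = None
        self.recalculations = 0

    def get(self, now_ms):
        """Return the cached result, refreshing it when it is too old."""
        if self.last_ms is None or now_ms - self.last_ms >= self.interval_ms:
            self.result = self.trace_fn()
            self.last_ms = now_ms
            self.recalculations += 1
        return self.result
```

An interval of 0 reproduces the default of recalculating every frame, because the elapsed time is always at least zero.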
@szapp szapp closed this as completed Feb 25, 2017
@szapp szapp added this to the v1.0.0 milestone Jun 20, 2017