Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRAFT: C#-Performance: Use a custom static lookup to handle InvokeGodotClassMethod and HasGodotClassMethod #89826

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

roookeee
Copy link

@roookeee roookeee commented Mar 23, 2024

Preface

This is in an early draft state and by no means complete. I am presenting my working proof of concept to discuss if and how to progress with the main idea of this PR. There are lots of open TODOs, questions and polishing to be done. Please focus on the idea instead of the rough sketch shown here (there is also some debug output which will obviously be removed la<zter on).

Be warned: this is a long one to give readers enough context! If you want a TL;DR scroll to the bottom for the performance numbers.

Introduction

Everytime Godot wants to call a C# script method from C++, the auto-generated InvokeGodotClassMethod is invoked.
The current C# bridge implementation of InvokeGodotClassMethod approximately works like this for a C# script which overrides _Ready and _Process:

protected override bool InvokeGodotClassMethod(in godot_string_name method, NativeVariantPtrArgs args, out godot_variant ret)
{
    if (method == MethodName._Ready && args.Count == 0) {
        _Ready();
        ret = default;
        return true;
    }
    if (method == MethodName._Process && args.Count == 1) {
        _Process(global::Godot.NativeInterop.VariantUtils.ConvertTo<double>(args[0]));
        ret = default;
        return true;
    }
    return base.InvokeGodotClassMethod(method, args, out ret);
}

While this seems pretty o.k. there is a small detail which makes InvokeGodotClassMethod more complicated than it seems: the engine is passing the string "_process" as the method name while the user script InvokeGodotClassMethod checks for "_Process". This case is handled by the C# glue of the builtin Godot classes (e.g. Node2D or Node3D) which all user scripts are inherting from (either directly or indirectly via other user scripts):

// excerpt from the generated Node.cs
// MethodProxyName__* are the PascalCase variants, MethodName._Process is "_process" here
if ((method == MethodProxyName__process || method == MethodName._Process) && args.Count == 1 && HasGodotClassMethod((godot_string_name)MethodProxyName__process.NativeValue))
{
    _Process(VariantUtils.ConvertTo<double>(args[0]));
    ret = default;
    return true;
}
if ((method == MethodProxyName__ready || method == MethodName._Ready) && args.Count == 0 && HasGodotClassMethod((godot_string_name)MethodProxyName__ready.NativeValue))
{
    _Ready();
    ret = default;
    return true;
}

And then there is HasGodotClassMethod here - what does it do? It checks if a method is actually implemented as e.g. Node has to supply an empty stub implementation for _Process(double delta) to make this code compile. But why not just call the stub implementation? Because the C++ -> C# function parameter marshalling done via VariantUtils could be costly, especially for more complex types like arrays - so why marshall if an empty stub is called? This also has to be implemented this way because what happens if a user actually implements both _Process(double) and _process(double) in their script? We can't just auto-proxy "_process" to "_Process" because of this.

At this point we traversed the class hierarchy for a bit and jump back up to the original C# class to call HasGodotClassMethod.
This really isn't cheap and there is another part to this: the engine calls our C# script with "_process", "_enter_tree" etc. no matter if we implement it or not and calling HasGodotClassMethod with an "unimplemented" method name traverses the whole class hierarchy and depending on what class the current C# script is based on, this can be quite expensive - especially for _PhysicsProcess for Nodes which don't override it in C#.
Additionally, every* public C# script method gets added into the if-chain of InvokeClassMethod and HasGodotClassMethod, which indirectly makes all Godot base class (Node etc) method calls slower because of how the snake_case to PascalCase handling is implemented (see above) - this hurts, especially for _Process and _PhysicsProcess.

This makes the current implementation:

  • somewhat slow (see performance numbers later)
  • hard to reason about as a human
  • hard to reason about for the branch predictor

An alternative implementation

I will try to keep this section high level:

Instead of a bunch of ifs we will use a Dictionary<(ParameterCount, MethodName), Delegate> to resolve functions. Adding all the C# bridge details to this yields a wrapper class (called ScriptMethodRegistry for now) and the following snippet for a user script called MainScene which implements _Ready and _Process in C#:

public new static readonly ScriptMethodRegistry<MainScene> MethodRegistry = new ScriptMethodRegistry<MainScene>()
    .Register(MethodName._Ready, 0, (MainScene scriptInstance, NativeVariantPtrArgs args, out godot_variant ret) => 
    {
        scriptInstance._Ready();
        ret = default;
    })
    .Register(MethodName._Process, 1, (MainScene scriptInstance, NativeVariantPtrArgs args, out godot_variant ret) => 
    {
        scriptInstance._Process(global::Godot.NativeInterop.VariantUtils.ConvertTo<double>(args[0]));
        ret = default;
    })
    .Compile();

This still misses the "_process" -> "_Process" mapping for the engine calls which is done via aliases, here is an excerpt from Node.cs:

 public static readonly ScriptMethodRegistry<Node> MethodRegistry = new ScriptMethodRegistry<Node>()
    // remember MethodProxyName__process is the PascalCaseName here
    .AddAlias(Node.MethodProxyName__process, 0, Node.MethodName._Process)

These aliases are then imported which gives us:

public new static readonly MethodRegistry<MainScene> MethodRegistry = new MethodRegistry<MainScene>()
    // this is new, we register our parents registry (aliases and function registrations)
    // our parent registry did the same with Node, which did the same with GodotObject ...
    .Register(global::Godot.Node3D.MethodRegistry)
    .Register(MethodName._Ready, 0, (MainScene scriptInstance, NativeVariantPtrArgs args, out godot_variant ret) => 
    {
        scriptInstance._Ready();
        ret = default;
    })
    .Register(MethodName._Process, 1, (MainScene scriptInstance, NativeVariantPtrArgs args, out godot_variant ret) => 
    {
        scriptInstance._Process(global::Godot.NativeInterop.VariantUtils.ConvertTo<double>(args[0]));
        ret = default;
    })
    .Compile();

This concept fixes all* evil issues:

  • if a script overrides a function that its parent already implemented, it gets overriden because the registry of the parent is added first and then potentially overriden by local .Register definitions
  • if Godot users actually implement _process(double) and _Process(double), the correct method is invoked when the engine passes "_process" as the method name

The .Compile() step is needed because we need to apply the alias logic somewhere.

And thats about it, invoking a function is easy and fast now, a negative response is just as fast which is a big boon (see introduction):

protected override bool InvokeGodotClassMethod(in godot_string_name method, NativeVariantPtrArgs args, out godot_variant ret)
{
    if (MethodRegistry.TryGetMethod(in method, args.Count, out var scriptMethod))
    {
         scriptMethod(this, args, out ret);
         return true;
    }

    ret = new godot_variant();
    return false;
}

HasGodotClassMethod

I went ahead and changed the implementation for HasGodotClassMethod as well because it kind of uses the same pattern and the ScriptMethodRegistry could fulfill its requirements with minor adjustments. We will have to think about other places like InvokeGodotClassStaticMethod, should we apply this pattern there too?

Downsides and caveats

Here is an unordered list of downsides / caveats I can think of:

  • I pointed out that the current implementation is hard to reason about, the proposed changes are not improving this in a significant way and to some people may even seem more complex, which is understandable
  • We have to be careful not to do too much work in a static context as it would negatively impact users (stutters, hanging because of load time initialization); this is especially annoying for Godot users because its hard to manually control static initialization (I think RuntimeHelpers.RunClassConstructor etc. should be avoided in Godot user code). Speaking from personal experience, we would have to do significantly more work for this to become noticable
  • We need one static dictionary per C# script type (no matter if its generated C++ glue classes or user scripts), this increases Godot C#'s memory footprint which we should measure at least once

Known issues

  • Example: A user-script which inherits from Node which provides _Process(int) will erroneously be called as a substitute for _Process(double) because the current aliasing implementation does no type checks. This will be difficult to fix / address. The master version of InvokeGodotClassMethod would only "erroneously" call the stub _Process(double) in Node (because HasGodotClassMethod does no type checks and thus returns true). How would this work in GDScript, what is the expected or acceptable behaviour?
    • This is already complicated in current master if a user defines _Process(int) before defining _Process(double) in their C# script. Overloads are deduplicated in ScriptMethodGenerator by name and argument count and only one of them is emitted into InvokeGodotClassMerthod - _Process(int) in this case. The correct method would still be called when the engine passes "_process" to InvokeGodotMethod though
    • We need a clear stance on method overloading with equal argument counts here, shouldn't we forbid it in general? Right now it feels like undefined behaviour (e.g. it dependent on method declaration order in C#)
  • master handles overloading for methods with the same parameter count by using the first occurence, this PR uses the last occurence

Personal opinion on downsides, caveats, issues and how to deal with them

This is just my personal opinion on how to deal with certain issues, feel free to skip it.

I think we need a clear definition of what the C# integration supports and what is outright illegal, leading to compilation issues (the same as missing the partial identifier in user scripts).
I propose the following new constraints:

  1. same-parameter method overloading for public methods is disallowed
  2. public methods are disallowed to start with _[a-z] (in words: underscore with a lowercase character) as this heavily conflicts with the snake_case to PascalCase resolution which even breaks master at the moment (_process(int) would be called by the engine "_process" call)

Users affected by 1. are already somewhat affected, as master only registers the first overload for signal bindings etc.
Users affected by 2. are on the wrong path anyways, using _ as a prefix to non-engine supplied functions is suspicious on its own, going for lowercase characters afterwards is even more weird.

I think very few users will be affected by these constraints. Right now its hard to gauge what input (user scripts) the C# integration has to accomodate, which is also shown by the lack of extensive tests or documentation on what edge-cases are legal or illegal (the C# integration even silently discards all but one same parameter method overload).
Godots C# integration has good error reporting in place, we should use it more.

Method overloading was also requested in GDScript, but declined for now because of performance concerns.

Performance measurements

I created a test project which spawn 100.000 Nodes with an attached C# script. The attached C# script overrides _Ready and _Process and increments a global variable which is printed on the screen to avoid dead code elimination, which pretty much nullified my previous benchmark results. Additionally the minimum and average process time is printed, which is used in the table below. If you do your own testing with this project, please do multiple runs as the run to run variance (especially in release builds) is quite high, 5 - 10 runs should suffice. As the project is a CPU intensive synthetic scenario you should aim to reduce the noise on your machine (e.g. IDEs indexing stuff, streaming 8k videos of Godot showcases).
HighNodeCountNoDeadCode.zip

On my machine (Win11, NET.Core 7.0.2, CPU: 5800x3D, 120hz display - might matter because of vsync), I get the following results:

Implementation Run type Process Time (ms)
master @ Godot 4.3 Dev 5 Editor "Run project" min: 234,20ms - avg: 239,01ms
This PR Editor "Run project" min: 39,39ms - avg: 42,15ms
master @ Godot 4.3 Dev 5 Official Windows export template min: 16,58ms - avg: 17,67ms
This PR Windows LTO=full export template min: 13,35ms - avg: 15,35ms

(Remember: I took the best run for all variants out of ~4-6 runs after letting the project run ~ 10-15 seconds as the run to run variance is high enough to make a difference here)

The gains for in editor running are significant (>5x).
The gains for release builds are smaller by comparison, around 14-20%.

Having the editor be this much closer to the release template is a goodie for the developer experience, especially for large projects.

I attached a .NET profiler to the release version of the test project and there is still some room to improve the lookup by using a custom hash code, that shaved off at most 1-2ms of process time locally. I will get to these micro-optimizations when we commit to this PR.

TODOs

  • Make the PR build in CI (code style, code gen tests, spell checking)
  • Rename FunctionRegistry to ScriptMethodRegistryand its accompanying classes to *Method* to be in line with other Godot wordings
  • Cleanup TODOs meant for myself
  • Talk to C# binding code owners about open questions, issues and how to solve them (see remaining code TODOs)
  • Check with the C# binding code owners if its ok to make godot_string_name._data visible with internal as everything else breaks apart if this change is declined
  • Discuss if same argument count method overloads in C# scripts should be allowed, they don't work correctly anyways. Emulating the master overload handling (discarding all but one overload, by definition order) would be complicated in this PR
  • Clarify why the HasGodotClassMethod is used in CSharpInstanceBridge._Get without parameter count checks, a getter with parameters makes no sense to me

TODOs if we proceed with this PRs idea

  • What about InvokeGodotClassStaticMethod?
  • Check the assembly size, it must have shrunk a bit
  • Add unit tests for ScriptMethodRegistry as the whole "inheritance" and aliasing part of it is pretty easily testable

Thank you to @paulloz and @raulsntos for helping me in Rocket.Chat / Discord on how to go on with my idea!

@roookeee roookeee requested a review from a team as a code owner March 23, 2024 21:20
@roookeee roookeee marked this pull request as draft March 23, 2024 21:23
@roookeee roookeee force-pushed the function-registry-csharp-bindings-poc branch 3 times, most recently from 8ccbde2 to b3b098a Compare March 23, 2024 21:37
@YeldhamDev YeldhamDev added this to the 4.x milestone Mar 23, 2024
@roookeee
Copy link
Author

I could use some help with fixing the pipeline because the diff isn't really helping me:
image
I checked for LF / CLRF issues but the whole file is LF like other places.

@paulloz
Copy link
Member

paulloz commented Mar 24, 2024

I expect this is a bom issue, double-check the file encoding between utf8 and utf8-bom.
ICYMI, you can run the checks locally using pre-commit if it's easier.

@roookeee
Copy link
Author

roookeee commented Mar 24, 2024

Thanks, I was too lazy to setup the pre-commit hooks it seems (my apologies) - I fixed the BOM issue and will move on to the other broken pipeline steps. May I get a workflow approval for future pipeline runs? Is this approval manual for every push I do for draft PRs?

EDIT: Scratch that, the GH action is running for my forked repo on its own!

@roookeee roookeee force-pushed the function-registry-csharp-bindings-poc branch 4 times, most recently from 0ae33cc to 07a6bdb Compare March 26, 2024 17:10
@roookeee
Copy link
Author

roookeee commented Mar 26, 2024

I gave this one more round of polish, names are more in line with standard Godot wording and I refactored the ScriptMethod delegate to not return bool as that should be handled by InvokeGodotClassMethod - that seemed to have gained me 2-3ms (~5% in relative performance) locally, but I won't edit the main post because I feel like that could be noise from less stuff running in parallel on my machine. The remaining TODOs are about discussions to be had. The main post has been adjusted as well (wording, my opinion on how to tackle the open issues, better examples etc.)

If someone with some spare time and an actual Godot C# project stumbles upon this PR, I would love to hear if this PR breaks or improves stuff for you!

I am adding this comment as I am force pushing at the moment, so the history is kind of lost

…InvokeGodotClassMethod and HasGodotClassMethod
@roookeee roookeee force-pushed the function-registry-csharp-bindings-poc branch from 07a6bdb to f0cf4fc Compare March 27, 2024 00:50
@roookeee
Copy link
Author

roookeee commented Mar 28, 2024

It seems like I miscalculated how much the release template performance would differ from "Editor -> Run Project".
The gains for release builds are much lower than the over 5x gains in the editor, but its still noticable.

Furthermore I had to redo my test project as the benchmark was most likely affected by dead code elimnation (empty _Process method etc), this is fixed now and the main post has been updated with new numbers and a new test project.

Sorry for the back and forth, still need to get the hang of how to do this properly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants