Randomise/scramble class bindings in production export templates #7063

Open
SysError99 opened this issue Jun 11, 2023 · 40 comments

Comments

@SysError99

SysError99 commented Jun 11, 2023

Describe the project you are working on

Several online games that target the browser, plus a few games built on request for clients (I'm a freelancer).

Describe the problem or limitation you are having in your project

There have been multiple discussions on how to protect source code in Godot's production exports from malicious actions. Ultimately, there are few known solutions, since Godot is pretty much an open project. There will always be hackers/researchers who try to find a way to breach it anyway.

From a technical standpoint, the idea in godotengine/godot#59241 (comment) gives a substantial amount of source protection and is quite comparable to what ProGuard does in minification mode. However, most Godot code consists of calls to internal functions and native implementations, since they give better performance, and those calls still need named references into the main executable. Hence the result still gives clues about a large part of the code. Take this code for example (snippet from https://kidscancode.org/godot_recipes/3.x/2d/platform_character/index.html):

extends KinematicBody2D

export (int) var speed = 1200
export (int) var jump_speed = -1800
export (int) var gravity = 4000

var velocity = Vector2.ZERO

func get_input():
    velocity.x = 0
    if Input.is_action_pressed("walk_right"):
        velocity.x += speed
    if Input.is_action_pressed("walk_left"):
        velocity.x -= speed

func _physics_process(delta):
    get_input()
    velocity.y += gravity * delta
    velocity = move_and_slide(velocity, Vector2.UP)
    if Input.is_action_just_pressed("jump"):
        if is_on_floor():
            velocity.y = jump_speed

If the idea in godotengine/godot#59241 (comment) is implemented, then at compile time only speed, jump_speed, gravity, velocity, and get_input will be minified, since they are 'user-defined' labels. This means that after decompilation, assuming the decompiler can determine internal labels, the code still contains many internal references, as shown below.

extends KinematicBody2D

export (int) var _a = 1200
export (int) var _b = -1800
export (int) var _c = 4000

var _d = Vector2.ZERO

func _e():
    _d.x = 0
    if Input.is_action_pressed("walk_right"):
        _d.x += _a
    if Input.is_action_pressed("walk_left"):
        _d.x -= _a

func _physics_process(delta):
    _e()
    _d.y += _c * delta
    _d = move_and_slide(_d, Vector2.UP)
    if Input.is_action_just_pressed("jump"):
        if is_on_floor():
            _d.y = _b

With the help of ChatGPT, the labels can be recovered fairly easily (I know it's not that difficult to guess them for this type of snippet, but it serves as a showcase):
image

You can see that restoring the source code is easy, if not trivial, although this could be considered outside the engine's control unless project owners implement specific measures themselves. I know that the intent of that idea isn't obfuscation but portability of the source code, and that minification is just a side effect that happens to be an advantage.

Describe the feature / enhancement and how it helps to overcome the problem or limitation

Given that Godot identifies internal calls using strings, we could take advantage of this to make the code incomprehensible: randomise all name bindings in the Godot export template binaries, then remap all references in user-defined scripts to the randomised internal names matching that binary.

The snippet would then become something like this:

extends _a

export (int) var _aa = 1200
export (int) var _ab = -1800
export (int) var _ac = 4000

var _ad = _i._j

func _ae():
    _ad._g = 0
    if _b._c("walk_right"):
        _ad._g += _aa
    if _b._c("walk_left"):
        _ad._g -= _aa

func _d(delta):
    _ae()
    _ad._h += _ac * delta
    _ad = _e(_ad, _i._k)
    if _b._c("jump"):
        if _f():
            _ad._h = _ab

Now ChatGPT has a hard time guessing the labels, and its results are much more random:
image

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

First and foremost, the codebase must cooperate with this technique by indicating which labels are safe and suitable to alter: basically, all labels that are accessible in Godot's docs. After this, we need an automated tool that relabels all existing bindings inside the engine source code and stores all relabelled names in a 'remap table', as in the example below:

    ...
    "x_label": "_Ew",
    "get_x_label": "_Fw",
    "set_x_label": "_Gw",
    "y_label": "_Hw",
    "get_y_label": "_Iw",
    "set_y_label": "_Jw",
    "triangles_updated": "_Kw",
    "BLEND_MODE_INTERPOLATED": "_Nw",
    "BLEND_MODE_DISCRETE": "_Qw",
    "BLEND_MODE_DISCRETE_CARRY": "_Tw",
    "AnimationNodeBlendSpace2D": "_mX",
    "add_node": "_1Tz",
    "connect_node": "_Zw",
    "disconnect_node": "_1x",
    "get_node": "_1I3",
    ...

This table is used to remap all names during the GDScript export process. Assume the GDScript bytecode sequence looks something like this:

[OBJ_CALL, OBJ_REF, LOCAL_STACK, 0, "get_node", "MyNode/SubNode"]

Here get_node should be renamed to _1I3, as defined in the remap table:

[OBJ_CALL, OBJ_REF, LOCAL_STACK, 0, "_1I3", "MyNode/SubNode"]

This not only saves a small amount of space (because string bindings are now shorter), but also provides a generic project obfuscator that everyone could use. The only downside from the user's standpoint is that they now need to compile the export template manually in order to obtain the binary along with its remap table for use with the project.
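The two steps described above (building a remap table, then rewriting binding names in an instruction) can be sketched as follows. This is only an illustrative Python sketch: the `short_names` generator, the deterministic ordering, and the list-based instruction layout with the name in slot 4 are assumptions taken from the simplified example above, not Godot's actual bytecode format or binding code. A real tool would presumably shuffle the assignment per build.

```python
import itertools
import string

ALPHABET = string.ascii_letters + string.digits


def short_names():
    """Yield "_a", "_b", ..., then two-character suffixes, and so on."""
    for length in itertools.count(1):
        for chars in itertools.product(ALPHABET, repeat=length):
            yield "_" + "".join(chars)


def build_remap_table(labels):
    """Assign one unique short name per label (sorted here for repeatability)."""
    names = short_names()
    return {label: next(names) for label in sorted(set(labels))}


def remap_instruction(instruction, table, name_slot=4):
    """Rewrite only the slot assumed to hold a binding name.

    Other operands, such as the "MyNode/SubNode" argument, are left untouched.
    """
    out = list(instruction)
    name = out[name_slot]
    if isinstance(name, str) and name in table:
        out[name_slot] = table[name]
    return out


table = build_remap_table(["get_node", "add_node", "connect_node"])
before = ["OBJ_CALL", "OBJ_REF", "LOCAL_STACK", 0, "get_node", "MyNode/SubNode"]
after = remap_instruction(before, table)
print(after)
```

The table itself would stay in the development environment; only the rewritten instructions would ship.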

If this enhancement will not be used often, can it be worked around with a few lines of script?

This isn't possible with a few lines of script; it needs significant rework of Godot's source code itself.

Is there a reason why this should be core and not an add-on in the asset library?

For the same reason that there is no short script workaround: it requires changes to the engine source code itself.

@atirut-w

The only downside from the user's standpoint is that they now need to compile the export template manually in order to obtain the binary along with its remap table for use with the project.

It can probably be done automatically at export time, but then the remap table would be somewhere inside the executable, and someone with enough time could rip it out of a game.

@SysError99
Author

SysError99 commented Jun 11, 2023

The only downside from the user's standpoint is that they now need to compile the export template manually in order to obtain the binary along with its remap table for use with the project.

It can probably be done automatically at export time, but then the remap table would be somewhere inside the executable, and someone with enough time could rip it out of a game.

Nope, the remap table isn't included in the executable. In fact, the entire process is just about making the end result unreadable. The remap table exists solely as a separate file used at project export time to remap labels to match what the export template expects; it doesn't need to exist anywhere but in the development environment.

In the production package there will essentially be two parts: the executable, and a PCK (or similar) whose scripts have been remapped to be compatible with the export template. The remap table is no longer needed in the production environment.

@SysError99
Author

SysError99 commented Jun 11, 2023

In my test prototype, the process involves hard-coding the names in the engine source code itself. The executable has no concept of internal remapping whatsoever; it's all done at compile time and all hard-coded. In the process, it also generates a remap table file to be used for remapping the Godot project source.

Then, at export time, internal labels in GDScript files are translated to what the executable expects (or rather, understands) and saved. Now you just need to pack two files, without the remap table, because the project files are already translated.
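A rough sketch of what such an export-time translation pass could look like at source level, written in Python for illustration. The regular expression and the `remap_source` helper are assumptions for this sketch, not part of the prototype described here; in particular, this variant deliberately leaves string literals and comments untouched, which is the conservative choice.

```python
import re

TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'          # double-quoted string literal
    r"|'(?:\\.|[^'\\])*'"         # single-quoted string literal
    r"|#[^\n]*"                   # comment up to the end of the line
    r"|[A-Za-z_][A-Za-z0-9_]*"    # identifier or keyword
)


def remap_source(source, table):
    """Replace identifiers found in `table`; keep strings and comments as-is."""

    def swap(match):
        token = match.group(0)
        if token[0] in "\"'#":
            return token  # string literal or comment: do not touch
        return table.get(token, token)

    return TOKEN.sub(swap, source)


# Hypothetical excerpt of a remap table, reusing names from the example above.
table = {"Input": "_b", "is_action_pressed": "_c",
         "velocity": "_ad", "x": "_g", "speed": "_aa"}
source = 'if Input.is_action_pressed("walk_right"):\n    velocity.x += speed\n'
print(remap_source(source, table))
```

Because the table is only read by this pass, it never needs to be packed alongside the executable or the PCK.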

@Zireael07

How do you propose this be done? If remap code is in Godot repo, then it falls victim to being easy to peek at just like any other open source project...

@fire
Member

fire commented Jun 11, 2023

👎 Protecting your code in Godot projects is hard because it's an open-source engine. Even if you change names of things, smart people can still figure out your code. Changing the engine to make this better would be a big job and might cause other problems.

Scientists found a way for AI to understand what unnamed parts (neurons) do in a computer brain (neural network) 1. This could help AI understand code with random names too, making it easier to figure out hidden code.

So, just hiding your code isn't enough. You should also protect your game's online parts and make sure only the right people can use them. And remember, making fun games is important too!

This comment was assisted with AI technologies.

Footnotes

  1. "Understanding the Role of Individual Units in a Deep Neural Network." OpenReview. https://openreview.net/forum?id=NpsVSN6o4ul

@fire
Member

fire commented Jun 11, 2023


Crafted by Genie


You

Rename the function follow godot engine's style guide of readable names and clarity.

extends _a

export (int) var _aa = 1200
export (int) var _ab = -1800
export (int) var _ac = 4000

var _ad = _i._j

func _ae():
    _ad._g = 0
    if _b._c("walk_right"):
        _ad._g += _aa
    if _b._c("walk_left"):
        _ad._g -= _aa

func _d(delta):
    _ae()
    _ad._h += _ac * delta
    _ad = _e(_ad, _i._k)
    if _b._c("jump"):
        if _f():
            _ad._h = _ab

Aria

extends Node

export (int) var move_speed = 1200
export (int) var jump_force = -1800
export (int) var gravity = 4000

var velocity = Vector2.ZERO

func update_input():
    velocity.x = 0
    if Input.is_action_pressed("walk_right"):
        velocity.x += move_speed
    if Input.is_action_pressed("walk_left"):
        velocity.x -= move_speed

func _physics_process(delta):
    update_input()
    velocity.y += gravity * delta
    velocity = move_and_slide(velocity, Vector2.UP)
    if Input.is_action_pressed("jump"):
        if is_on_floor():
            velocity.y = jump_force

I have renamed the function and variables according to Godot Engine's style guide for readability and clarity.

@SysError99
Author

How do you propose this be done? If remap code is in Godot repo, then it falls victim to being easy to peek at just like any other open source project...

The remap table needs to be generated manually in the user's local repo; it doesn't get pushed to the main repo. It's entirely the user's responsibility to use an automated script to compile a fresh copy of the binary. If they choose the official export template, this doesn't provide any protection whatsoever.

@SysError99
Author

SysError99 commented Jun 12, 2023

-1 Protecting your code in Godot projects is hard because it's an open-source engine. Even if you change names of things, smart people can still figure out your code. Changing the engine to make this better would be a big job and might cause other problems.

Scientists found a way for AI to understand what unnamed parts (neurons) do in a computer brain (neural network) 1. This could help AI understand code with random names too, making it easier to figure out hidden code.

So, just hiding your code isn't enough. You should also protect your game's online parts and make sure only the right people can use them. And remember, making fun games is important too!

This comment was assisted with AI technologies.

Footnotes

  1. "Understanding the Role of Individual Units in a Deep Neural Network." OpenReview. https://openreview.net/forum?id=NpsVSN6o4ul

As I said earlier, this isn't foolproof protection (it depends on the threat model). Attackers with sufficient resources don't even need an AI, and the snippet is just a demonstration of how an AI gets confused by simple all-random strings. It becomes much more effective if the user relies on default bindings and exposes only a few guessable strings in the code itself.

I'm actually doing this right now, but it's out of this proposal's scope because it involves randomising everything in the user's code and the engine source code itself, except GUI strings.

@SysError99
Author

@fire Also, this proposal isn't limited to the bindings themselves; it applies to any strings that can be randomised, e.g., keywords, logical statements, operator symbols, etc. When these are combined, the results are even more random:

_xi _a

_mb /_ex< _cq _aa } 1200
_mb /_ex< _cq _ab } -1800
_mb /_ex< _cq _ac } 4000

_cq _ad } _i*_j

_qr _ae/<*
    _ad*_g } 0
    _mm _b*_c/>walk_right><*
        _ad*_g {} _aa
    _mm _b*_c/>walk_left><*
        _ad*_g -} _aa

_qr _d/_xx<*
    _ae/<
    _ad*_h {} _ac * _xx
    _ad } _e/_ad? _i*_k<
    _mm _b*_c/>jump><*
        _mm _f/<*
            _ad*_h } _ab

This is what ChatGPT said. It's surprisingly accurate, sure, but it completely misses the vector context, and the function name is now lost as well.
image

Better yet, if \t and \n are randomised too, ChatGPT even refuses to restore the code:

_xi _a^_mb /_ex< _cq _aa } 1200^_mb /_ex< _cq _ab } -1800^_mb /_ex< _cq _ac } 4000^^_cq _ad } _i*_j^^_qr _ae/<*^\\_ad*_g } 0^\\_mm _b*_c/>walk_right><*^\\\\_ad*_g {} _aa^\\_mm _b*_c/>walk_left><*^\\\\_ad*_g -} _aa^^_qr _d/_xx<*^\\_ae/<^\\_ad*_h {} _ac * _xx^\\_ad } _e/_ad? _i*_k<^\\_mm _b*_c/>jump><*^\\\\_mm _f/<*^\\\\\\_ad*_h } _ab

@SysError99
Author

SysError99 commented Jun 12, 2023

I should also repeat that it all depends on the threat model. If an attacker is willing to guide the AI on how to guess the strings, the scrambling can be broken rather easily.

But again, leaving it open just because someone is going to hack it anyway is still a bad attitude, since almost every other game engine ships this type of protection out of the box, and doing so is not considered bad practice.

@fire
Member

fire commented Jun 12, 2023

As a developer on the engine I believe any such code is bloat and I do not support adding randomizing / scrambling class bindings into Godot Engine.

I am not the only developer, so maybe there are good reasons.

@AThousandShips
Member

AThousandShips commented Jun 12, 2023

One pretty important thing to consider here, which I haven't seen mentioned: you will have to choose between either:

  1. Use static typing for all variables
  2. Use the same abbreviated name for all method and property access on non-exact typed variables

This is because, for foo.bar(), the binding for bar depends entirely on the type of foo, which cannot be known at compile time except in specific contexts and with a lot of analysis.

This can seriously weaken the efficacy of this method, or force even more complexity onto attempts to reduce the issue (like making individual renamings for each source file or even each method). I'd also say that it can become extremely weak against improvements to AI, as pointed out above, especially with type analysis: for example, if a certain class has only one export of a certain type, or given the massive reduction in candidate built-in properties or methods once the type is known, especially as GDScript doesn't allow functions with different signatures.
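The ambiguity for untyped receivers can be illustrated with a small Python sketch. The class names `TypeA` and `TypeB`, the scrambled names, and both helpers are hypothetical; the point is only that per-class renaming cannot resolve `foo.bar()` when the type of `foo` is unknown, while a single shared name per method always can.

```python
# Hypothetical per-class tables: the same method name is scrambled
# differently depending on which class owns it.
PER_CLASS = {
    "TypeA": {"bar": "_a"},
    "TypeB": {"bar": "_q"},
}


def remap_call(method, receiver_type, per_class):
    """Return the scrambled name, or None if the call site is ambiguous."""
    if receiver_type is not None:
        return per_class[receiver_type][method]
    # Untyped receiver: only safe if every class agrees on the new name.
    candidates = {names[method] for names in per_class.values() if method in names}
    return candidates.pop() if len(candidates) == 1 else None


def global_table(per_class):
    """Option 2 from the list above: one scrambled name per original name."""
    merged = {}
    for names in per_class.values():
        for original, scrambled in names.items():
            merged.setdefault(original, scrambled)
    return merged


typed = remap_call("bar", "TypeA", PER_CLASS)   # statically typed call site
untyped = remap_call("bar", None, PER_CLASS)    # type unknown at export time
shared = global_table(PER_CLASS)
print(typed, untyped, shared)
```

The shared table resolves every call site, at the cost of giving an attacker one consistent name per method across the whole project.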

Given the rate of growth of AI in recent years I'm pretty certain this kind of method is going to be completely obsolete in a few years, and even for low end setups it'll be trivial to break within a decade

Now you can make it a bit more powerful if you do a complete scrambling, replacing not just every copy of a word with the same one, but each individual position maps to a unique name, as well as scrambling constants, but that's just gonna result in a mapping file that's easily 10%-20% of the source which feels pretty ridiculous.

All that being said, the usefulness of scrambling meaningful names to prevent people from working out what the code does is limited. Both humans and AI can work it out by reverse engineering; scrambling only makes that a bit more difficult, and it doesn't really stop people from reusing the code without fully understanding the intent, as long as they understand the function.

@AThousandShips
Member

AThousandShips commented Jun 12, 2023

Now for my personal views, take them for what they are, my personal views:
I do not believe, or support, any dedicated DRM methods in core, save for encryption, or methods that have other (primary) purposes, such as minification, compilation and optimization, transpiling, etc, that helps or enables DRM, but isn't aimed directly at it

I believe that anyone who wants DRM is best helped by a plugin, partially because its efficacy can be improved by having it be closed source, at least partially, and in-house. While security through obscurity is generally a bad idea, in this case it improves security, because the DRM algorithms are then not readily available for training on.

Further, even currently it's pretty simple to handle the scrambling as a plugin, not even using a compiled module, given export and loading options, and writing a compiled module would reduce any overhead

So both from a personal, philosophical perspective I think this is best served as not in core, but also from a practical perspective, and I don't think it's something we should spend effort on putting in core, especially as I feel that a large part of the userbase are likely to use it, and arguably shouldn't as it'd involve people using it with relatively limited knowledge and using pretty standard setups which could weaken security for everyone using it

Edit: I didn't call this proposal obfuscation, my wording was an oversight in that case, I reused the word since I had talked about obfuscation before, my bad, I was talking about the obfuscation of not having the source that does the scrambling open, and I realised I used the wrong phrasing, should have been obscurity

Also, another consideration:
How hard is it to simply extract the de-scrambled source code from the program itself, as in by memory sniffing etc.? If that is simple enough it kind of negates the whole usefulness of this

@SysError99
Author

SysError99 commented Jun 12, 2023

Now for my personal views, take them for what they are, my personal views: I do not believe, or support, any dedicated DRM methods in core, save for encryption, or methods that have other (primary) purposes, such as minification, compilation and optimization, transpiling, etc, that helps or enables DRM, but isn't aimed directly at it

I believe that anyone that wants DRM is best helped by a plugin, partially because it can be improved in efficacy by having it be closed source, at least partially, and in-house. While security through obfuscation is generally a bad idea, in this case it improves security compared to having the algorithms for DRM not be readily available for training on.

Further, even currently it's pretty simple to handle the obfuscation as a plugin, not even using a compiled module, given export and loading options, and writing a compiled module would reduce any overhead

So both from a personal, philosophical perspective I think this is best served as not in core, but also from a practical perspective, and I don't think it's something we should spend effort on putting in core, especially as I feel that a large part of the userbase are likely to use it, and arguably shouldn't as it'd involve people using it with relatively limited knowledge and using pretty standard setups which could weaken security for everyone using it

Put simply, all this technique does is make Godot talk in a new, completely random language, and the remap table is like a vocabulary table that exists only to help the vanilla Godot editor translate the user's code into what the custom export template expects. It doesn't solve the DRM issue, but rather assists with it.

There are already reports of Godot games being cracked by simply swapping the DRM-protected executable for a vanilla executable, rendering the DRM completely useless, and even of games being ripped wholesale into another project and padded with more assets, stages, altered node positions, more injected code, etc., which is too severe to let pass. This technique helps by making every custom export template incompatible with all others by default. It addresses the asset swap and useless DRM problems: no matter how sophisticated the DRM is, if the user's Godot code is still compatible with vanilla Godot, it's still trivial to crack even for script kiddies, which isn't possible with this technique without completely reformatting the project.

And my personal take: I don't believe the majority of Godot users are even aware that their Godot projects are completely unprotected and trivially breached, unlike projects in other game engines with a similar build process. This could also be due to the assumption that Godot 'compiles' the code and makes it not easily readable (which is true to a certain extent, as with Godot 3's gdc, but that is still easily decompiled with all labels preserved). Some of those who were aware of it were even enraged that the Godot team doesn't take this factor into account and leaves their projects vulnerable, since other game engines do this by default. However, I won't blame the Godot team as they do, since this is still largely outside the team's control.

Also, I don't claim or believe that obfuscation provides security (at least in this particular case). Nothing stops users from reading the code and trying to understand and alter it, but it becomes another story if there's a way to make doing so much more difficult and time-consuming, especially in online and constantly updated scenarios. There's definitely a difference between editing plain-text, easy-to-read code (which is what Godot ships right now) and editing hard-to-read, scrambled code, which skilled coders/hackers would do, and that is what this proposal is trying to address.

As a footnote, this proposal actually provides more than just 'obfuscation'. Since hacking techniques need to interface with certain function calls/addresses in the project, this technique causes binary offsets to change, especially if we also shuffle properties and method calls in the binding phase. That makes reconstructing the remap table much more difficult, since it's no longer predictable, and makes hacking tools much harder to maintain across rolling releases, since they need to keep track of entry points that change with every release.

@Calinou
Member

Calinou commented Jun 12, 2023

This would require recompiling export templates for each and every project you wish to export, which makes it a no-no from a usability perspective. We cannot expect users to be compiling export templates in most cases (even if they use custom C++ modules, as they may be using prebuilt export templates provided by the module developer).

PCK encryption already requires this and it's a big usability hurdle.

@SysError99
Author

SysError99 commented Jun 12, 2023

This would require recompiling export templates for each and every project you wish to export, which makes it a no-no from a usability perspective. We cannot expect users to be compiling export templates in most cases (even if they use custom C++ modules, as they may be using prebuilt export templates provided by the module developer).

PCK encryption already requires this and it's a big usability hurdle.

It's the same reason why PCK encryption is optional in the first place. It's not going to be something everyone must use, but PCK encryption still made the cut for the feature list despite providing weak protection.

@AThousandShips
Member

AThousandShips commented Jun 13, 2023

I'm not convinced any kind of source protection like this short of actively scrubbing the source code from the running project will give any real protection, and I don't even mean that it might be simple to reverse it with the right tools. Can you confirm that it isn't trivial to just dump the process memory and extract the de-scrambled code from the dump? Because AFAIK the script instance retains the source code.

I'm not suggesting it is trivial, I have no idea how hard it is, but as the old comparison goes:
If the door is weaker than the lock, why pick the lock

If the process of dumping the memory is easier than un-scrambling, then people will just do that instead

@SysError99
Author

SysError99 commented Jun 14, 2023

I'm not convinced any kind of source protection like this short of actively scrubbing the source code from the running project will give any real protection, and I don't even mean that it might be simple to reverse it with the right tools. Can you confirm that it isn't trivial to just dump the process memory and extract the de-scrambled code from the dump? Because AFAIK the script instance retains the source code.

I'm not suggesting it is trivial, I have no idea how hard it is, but as the old comparison goes: If the door is weaker than the lock, why pick the lock

If the process of dumping the memory is easier than un-scrambling, then people will just do that instead

If I understand your statement correctly: in the final product, the unscrambled labels are nowhere to be found in the export template. Even a memory dump won't reveal much, since all labels are altered in a hard-coded way. In other words, nothing gets translated or remapped into readable form inside the binary. The remap table ONLY EXISTS DURING THE EXPORT PROCESS, where vanilla Godot uses it to translate all function calls to what the export template expects, since the export template only understands scrambled labels (the export template doesn't even include readable labels because they're no longer required; it reads scrambled labels directly). Even with a memory dump, attackers will see the same scrambled labels that appear in the binary itself. Simply put, it mimics what a native code compiler does: named labels aren't required for the program to run, so it's best not to have them in the first place.

@SysError99
Author

SysError99 commented Jun 14, 2023

I've already built this type of scrambler before, though not all labels are altered. I could get it done in a few months as a showcase and open-source it. The first prototype is pretty much the scrambler doing its best to randomise labels while avoiding crucial labels whose renaming may cause unexpected behaviour (e.g., third-party file parsers/importers no longer working).

I probably won't make it work on all Godot versions, since working alone is difficult. But version 3.5 and the upcoming 3.6 could be good candidates, because they're the only versions that are mature on the HTML5 platform, so scrambled and shortened labels would give more noticeable size changes in the export template itself.

@AThousandShips
Member

AThousandShips commented Jun 14, 2023

So you are suggesting something that operates between the source and the compiler, so that the source stored in a script will still be scrambled? I'm sorry, I thought from your description that this was to be used when loading scripts and would depend on changes to core that I didn't see mentioned. My bad.

However, a number of calls internally are not using compile time labels and would not be possible to hard map, as said above with ambiguous cases, so you'd be forced to use static typing, AFAIK

@YuriSizov
Contributor

YuriSizov commented Jun 14, 2023

The export template itself must retain original labels because a lot of Godot functionality is inherently string-based. Same goes for any meta level algorithms operating on the names of classes, methods, or members. You would not be able to convert them in the source scripts, so your export template needs to support both scrambled and original labels for an arbitrary Godot project to work.

@Zireael07

... which kinda defeats the point here @YuriSizov

I was kinda hoping this might work as many JS libraries do that sort of obfuscation and it does indeed stop script kiddies

@YuriSizov
Contributor

YuriSizov commented Jun 14, 2023

Well, JS would have the same problems, since you can do something like MyObject["method_name"](), which a static analyzer won't be able to recognize (especially if the name is in a variable, set elsewhere, and not a string literal). So what we can do to user scripts is on the same level as those JS libraries that you mention.

Obfuscation libraries also cannot modify keywords which is proposed here. They have other techniques to make code unreadable but runnable, but the essentials of the language remain, because they cannot modify their runtime. It's in the browser/node/whatever. @SysError99 suggests that we do modify the runtime in Godot by recompiling the export template. And for some projects it might work, but given the level of reflection Godot API allows or even requires, making it work for arbitrary project would be impossible.

So in other words, if you want to use it, you'd need to avoid any bit of logic that relies on strings or on class/method/member names. If you do, then the export template has to keep the original names to be able to run your code.
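The reflection problem can be demonstrated with a short Python sketch. The `Scrambled` class stands in for an engine object whose method was renamed at build time, and `dynamic_call` stands in for any string-based call API; both are hypothetical and only mirror the get_node/_1I3 example from the proposal.

```python
class Scrambled:
    """Stand-in for an engine object whose method was renamed at build time."""

    def _1I3(self, path):  # was "get_node" before scrambling
        return f"node at {path}"


def dynamic_call(obj, method_name, *args):
    """String-based dispatch, similar in spirit to calling a method by name."""
    return getattr(obj, method_name)(*args)


table = {"get_node": "_1I3"}
obj = Scrambled()

# A literal name at a call site can be rewritten by the exporter.
direct = dynamic_call(obj, table["get_node"], "MyNode/SubNode")

# A name assembled at runtime never appears as a whole literal in the
# script, so the exporter has nothing to rewrite and the lookup fails.
assembled = "get_" + "node"
try:
    dynamic_call(obj, assembled, "MyNode/SubNode")
    failed = False
except AttributeError:
    failed = True

print(direct, failed)
```

Keeping such code working would require the runtime to know the original names as well, which is the trade-off described above.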

@SysError99
Author

SysError99 commented Jun 14, 2023

Well, JS would have the same problems, since you can do something like MyObject["method_name"](), which a static analyzer won't be able to recognize (especially if the name is in a variable, set elsewhere, and not a string literal). So what we can do to user scripts is on the same level as those JS libraries that you mention.

Obfuscation libraries also cannot modify keywords which is proposed here. They have other techniques to make code unreadable but runnable, but the essentials of the language remain, because they cannot modify their runtime. It's in the browser/node/whatever. @SysError99 suggests that we do modify the runtime in Godot by recompiling the export template. And for some projects it might work, but given the level of reflection Godot API allows or even requires, making it work for arbitrary project would be impossible.

So in other words, if you want to use it, you'd need to avoid any bit of logic that relies on strings or on class/method/member names. If you do, then the export template has to keep the original names to be able to run your code.

This is what I envisioned as well. My first prototype requires a string analyser to find any strings that could possibly be binders, and uses a global remap table shared between internal bindings and user-defined labels to eliminate the object-field access problems of a dynamically typed language. At runtime it works flawlessly, but there is a great risk of it altering unwanted strings. E.g., OK is a valid field in GDScript, but is also a common word. There were some occasions where such words got replaced with randomised strings.

Fortunately, in function calls Godot actually takes a NodePath but also accepts plain strings and converts them internally. On top of that, we could make the analyser a little smarter by checking certain rules that help distinguish common strings from NodePath-style strings: checking the NodePath syntax directly (the best method), checking for whitespace, common NodePath syntaxes, 'too darn common' keywords, etc. This helps a lot to minimise the chance of strings getting wrongly replaced. Plus, in the custom template, we could also warn users to explicitly use NodePath bindings for much safer typing and, moreover, much safer string alteration.
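The rules above can be sketched roughly as follows. This is only an illustration in Python, not the prototype itself: the remap table contents, the `TOO_COMMON` set, and the regular expression (a rough approximation of NodePath syntax, not Godot's actual parser) are all assumptions.

```python
import re

# Hypothetical remap table: original identifier -> scrambled identifier.
REMAP = {"OK": "x0", "velocity": "x1", "get_input": "x2", "Player": "x3"}

# Valid identifiers that are too common in ordinary text to replace safely.
TOO_COMMON = {"OK", "name", "text"}

# Rough approximation of NodePath-like strings: optional "./", "../" or "/"
# prefix, slash-separated identifier segments, optional ":property" parts.
NODE_PATH_RE = re.compile(
    r"^(\.\.?/|/)?[A-Za-z_]\w*(/[A-Za-z_]\w*)*(:[A-Za-z_]\w*)*$"
)


def looks_like_binding(s):
    """Heuristic: could this string literal be a binding or a NodePath?"""
    if not s or any(c.isspace() for c in s):
        return False          # empty strings and sentences are left alone
    if s in TOO_COMMON:
        return False          # e.g. "OK" shown on a button
    return NODE_PATH_RE.match(s) is not None


def remap_string(s):
    """Replace known identifiers inside a binding-like string."""
    if not looks_like_binding(s):
        return s
    parts = re.split(r"([/:])", s)   # keep the separators in the result
    return "".join(REMAP.get(part, part) for part in parts)
```

With this table, `remap_string("Player/velocity")` gives `"x3/x1"`, while `"OK"` and `"Press OK to continue"` are returned unchanged. The heuristic is not foolproof: a lone word such as `"velocity"` displayed in the UI would still be replaced.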

@YuriSizov
Contributor

This helps a lot to minimise the chance of strings getting wrongly replaced.

If you start replacing strings, you will mess up user projects. An example off the top of my head: the learning app from GDQuest uses method names from tester classes to generate names for tests which are displayed in the user interface. You just can't replace that.

Of course that is a bit beside the point as it doesn't involve built-in methods, but you can easily imagine a scenario where you want to display/print/log the built-in method name or use it for something that needs to be readable.

@SysError99
Author

SysError99 commented Jun 15, 2023

This helps a lot to minimise the chance of strings getting wrongly replaced.

If you start replacing strings, you will mess up user projects. An example off the top of my head: the learning app from GDQuest uses method names from tester classes to generate names for tests which are displayed in the user interface. You just can't replace that.

Of course that is a bit beside the point as it doesn't involve built-in methods, but you can easily imagine a scenario where you want to display/print/log the built-in method name or use it for something that needs to be readable.

I'm not so concerned about applications that need internal string bindings when the scenario is clear enough. Since this only happens in custom export templates, the default/official ones wouldn't be affected anyway.

I should also elaborate that in the production phase, there are already scenarios where debug logs can't be easily understood because internal names are randomised/obfuscated. Application users already need to explicitly submit the log to the developer to troubleshoot problems. If the use case is clear enough, the default export template should work.

@SysError99
Author

Note that this proposal may or may not work, and it also highly depends on the team's 'target audiences'. I have seen numerous complaints that Godot projects are highly vulnerable to being altered and re-published, especially from people who have started working on bigger projects. Other individuals, I assume, may not have been aware of these problems before, or were suddenly struck by them (there were posts on Reddit complaining that Godot games got stolen with a few lines of code and assets altered). We could run a poll on whether the general audience actually sees it as a problem or not, but the ones that do lean towards aggression all the time. I'm not putting this forward as a 'must solve' problem; I just mention those cases as the gist of it.

I give this proposal zero hope of getting the green light, since the problem is much easier said than done, and there have already been debates on whether source 'obfuscation' actually works, while in practice it's done all the time in most production scenarios.

@phil-hudson

Note that this proposal may or may not work, and it also highly depends on the team's 'target audiences'. I have seen numerous complaints that Godot projects are highly vulnerable to being altered and re-published, especially from people who have started working on bigger projects. Other individuals, I assume, may not have been aware of these problems before, or were suddenly struck by them (there were posts on Reddit complaining that Godot games got stolen with a few lines of code and assets altered). We could run a poll on whether the general audience actually sees it as a problem or not, but the ones that do lean towards aggression all the time. I'm not putting this forward as a 'must solve' problem; I just mention those cases as the gist of it.

I give this proposal zero hope of getting the green light, since the problem is much easier said than done, and there have already been debates on whether source 'obfuscation' actually works, while in practice it's done all the time in most production scenarios.

+1 to this. It is indeed disconcerting for those who have turned to Godot and discovered that the source code is essentially bundled in plaintext with APKs. This concern holds particular weight for games heavily reliant on offline functionality, where the majority of features and data are directly stored in GDScript. It becomes discouraging to invest significant effort when the source code can be easily extracted from the APK and the app subsequently resold.

This drawback significantly hampers the practicality of using Godot for commercial purposes. Users generally anticipate a basic level of code protection as a minimum requirement. Otherwise, it feels as if our private GitHub repositories are wide open for anyone to access. While it is true that obfuscated code can still be reconstructed, it adds an additional layer of effort and serves as a deterrent for some attempts. I believe this level of protection is what most Godot users would expect and appreciate.

@AThousandShips
Member

The appropriate way of solving this should be to not export GDScript as text, or offer the option not to, as in compiling or transpiling. This not only complicates reworking and reverse-engineering, or just plain extracting the source, but also reduces the size of the exported project by quite a bit, and generally improves load times and performance.

As noted before here this is fully possible to do as a plugin already, and while I understand the concern and inability to sue people for infringement, the idea that this is a major issue because people can resell your app is kinda unwarranted, just as how people can't just extract the images or meshes or sound from your app and resell it.

@SysError99
Author

@AThousandShips see #4220 (comment)

@AThousandShips
Member

Thank you, that is a relevant argument, but also covered by the same reply, and I was specifically responding to the claim of people reselling an app

@RIKDXHQ11

This comment was marked as off-topic.

@AThousandShips
Member

AThousandShips commented Jun 26, 2023

In other conversations, the administrator kept saying that I was venting my emotions

TBH you are, you aren't providing any new information, you're just pointing out issues that have already been raised. Please listen to feedback from maintainers and stop making these kinds of comments, you have already been warned #4220 (comment)

@RIKDXHQ11

This comment was marked as spam.

@SysError99
Author

SysError99 commented Jun 27, 2023

@RIKDXHQ11 I understand your concern and frustration, but this issue is not about criticism and is NOT related to the GD/CPP transpiler issues (see #3069 for the related topic). In this issue we discuss a solution that mimics what a compiler does.

As a footnote, Godot isn't made by a company and doesn't owe anyone anything. As the contributor said earlier, if the tool isn't working for you, please consider something else. There are tools out there that might serve you better. If the tool works great for you and only lacks this option, forking it or making third-party tools for it is always an option.

@RIKDXHQ11

This comment was marked as spam.

@AThousandShips
Member

AThousandShips commented Jun 29, 2023

But I never think I'm wrong.

No one is trying to

You need to stop spamming this issue though, be reasonable and don't be immature please

@SysError99
Author

SysError99 commented Jun 29, 2023

This could be off-topic, and I would very much appreciate it if this gets marked as such.

I'm also not a westerner. In fact, I'm also in quite an urban area in Southeast Asia, but let's put that aside.

You might be surprised that most of the time, 'obfuscated' code doesn't even come in native binaries. Many commercial games written in Unity don't even mainly use IL2CPP for the purpose of hiding code, since decompilers can easily obtain labels from IL calls in C# assemblies (it doesn't all run natively, it still needs a C# interpreter), and only the game logic is considerably 'obfuscated', or rather 'optimised'. Certain types of software can also easily reconstruct native binaries back into somewhat readable code. Ghidra, the tool from the NSA, actually targets all-native code, so native code isn't exactly safe either. Take Hoyoverse games for example: they don't use C# or even IL2CPP for their game logic; they pretty much made a custom version of a Lua interpreter to hide code, and write most of the game logic there, because using machine code to hide logic isn't really viable.

GD scripts, like Python, are terrible things. Their existence may seem convenient, but it actually adds a lot of work and is meaningless.

It's not. Machine code binaries are very difficult to debug for people who aren't CS-savvy, and require a lot of understanding of how a computer works internally. It's not for everyone. Even if GD/CPP magically happens tomorrow, and it actually is easy to write code with and use, users would still need to know how the transpiler works, since the transpiler's output might not be easily comprehensible because of how native code works. In contrast, interpreted languages can pretty much solve all of the issues above. They make the software much more portable and easier to apply to other work (since it's pretty much readable). Better yet, if the intention is to hide code, you can also modify the interpreter to specifically read the 'scrambled' code (which is what this proposal is aiming for). This helps even more than native code itself, because hackers need to deal with two separate parts in order to reverse the code back into readable form, and it is also the option that Hoyoverse games officially use.

As of this writing, I personally and strongly believe that Godot 4.x is still in a pretty immature state, since 80 percent of the core features that I wanted from this version are still very broken. Contributors already have their hands full getting the software to work the way most users want. Plenty of 'regressions' have happened and are yet to be fixed. There are hundreds, even thousands, of pull requests that need to be reviewed. I'm even glad that some contributors pay attention to godot-proposals at all, considering how busy Godot 4.x has made them. I even stated myself that my own proposal isn't a 'must-solve' issue. I know it's unfair for small devs who want code protection, but at the moment the Godot team doesn't have enough manpower to look into the issue.

Last note. Please, at least, understand how software development as a team works.

@heppocogne

In my opinion:

  1. It may make it hard to solve bugs that appear only in release builds.
  2. Godot API does not need to be protected. It's overkill to randomize API bindings.
  3. It would have a negative impact on performance.

Anyway, I strongly hope the encryption procedure will become somehow easier in the future.

@SysError99
Author

SysError99 commented Jul 3, 2023

  1. It may make it hard to solve bugs that appear only in release builds.
  2. Godot API does not need to be protected. It's overkill to randomize API bindings.

This is an intentional side effect: we want to deliberately break it so that the official build is no longer compatible with the custom export, and only the main developer is able to help troubleshoot the problem (since they own the remap table in the development environment). There was a problem that Godot games are 'too easy' to crack by just swapping the DRM-protected executable with the official build, after which the game works just fine. This would help solve the issue for much longer than just trying to encrypt games (a single weak point). And this is really not a way to protect Godot APIs; it's just trying to break compatibility between the official build and custom builds.
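As a rough sketch of how a developer-owned remap table could be used for troubleshooting, here is an illustration in Python. Nothing in it is an existing Godot tool: `build_remap`, `deobfuscate`, the `_xxxx` naming scheme, and the sample log line are all hypothetical.

```python
import random
import re


def build_remap(names, seed):
    """Build a per-project table: original name -> scrambled name.

    The seed stays with the developer; random.Random is used only to keep
    this sketch reproducible and is not a cryptographic generator.
    """
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    remap = {}
    used = set()
    for name in sorted(names):          # sorted for a stable result
        while True:
            candidate = "_" + "".join(rng.choice(alphabet) for _ in range(4))
            if candidate not in used:   # avoid two names sharing one alias
                break
        used.add(candidate)
        remap[name] = candidate
    return remap


def deobfuscate(log_line, remap):
    """Translate scrambled names in a user-submitted log back to originals."""
    inverse = {scrambled: original for original, scrambled in remap.items()}
    return re.sub(
        r"\b_[a-z]{4}\b",
        lambda m: inverse.get(m.group(0), m.group(0)),
        log_line,
    )


# Only someone holding the same table (or seed) can read the log.
table = build_remap(["move_and_slide", "is_on_floor"], seed=42)
scrambled_log = "Invalid call to " + table["move_and_slide"] + " in base"
readable_log = deobfuscate(scrambled_log, table)
```

An executable built without the same table would not resolve the scrambled names, which is the compatibility break described above.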

  3. It would have a negative impact on performance.

I don't think so, unless the Godot team wrote the code with hard-coded string sizes, in which case it might still leave a lot of room for null padding. All this proposal does is randomise names, and they also get shorter. It should have zero impact on performance, either positive or negative.
