GDScript: Reintroduce binary tokenization on export #87634

Merged
merged 2 commits into from Feb 9, 2024

@vnen
Member

vnen commented Jan 26, 2024

This adds back a function available in 3.x: exporting the GDScript files in a binary form by converting the tokens recognized by the tokenizer into a data format.

It is enabled by default on export but can be manually disabled. The format helps with loading times, since the tokens are easily reconstructed, and with hiding the source code, since recovering it would require a specialized tool. Code comments are not stored in this format.

The --test command can also include a --use-binary-tokens flag which will run the GDScript tests with the binary format instead of the regular source code by converting them in-memory before the test runs.

Besides the regular option to export GDScript as binary tokens, this also includes a compression option on top of it. The binary format needs to encode some information which generally makes it bigger than the source text. This option reduces that difference by using Zstandard compression on the buffer.
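
To illustrate the idea described above, here is a minimal C++ sketch of turning a token stream into a binary buffer and back. It is not the format used by this PR: `TokenKind`, `Token`, `encode_tokens`, `decode_tokens`, and the byte layout are all hypothetical and chosen only for illustration. It shows two points from the description: comments are dropped, and the fixed-size records can make the buffer larger than the source text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical token kinds; a real tokenizer distinguishes many more.
enum class TokenKind : std::uint8_t { Identifier, Keyword, Symbol, Comment };

struct Token {
    TokenKind kind;
    std::string text;
};

// Append a 32-bit value in little-endian byte order.
static void write_u32(std::vector<std::uint8_t> &buf, std::uint32_t v) {
    for (int i = 0; i < 4; i++) {
        buf.push_back(static_cast<std::uint8_t>((v >> (8 * i)) & 0xFF));
    }
}

// Read a little-endian 32-bit value and advance the cursor.
static std::uint32_t read_u32(const std::vector<std::uint8_t> &buf, std::size_t &pos) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        v |= static_cast<std::uint32_t>(buf[pos++]) << (8 * i);
    }
    return v;
}

// Assumed layout: [string count][length + bytes per string]
//                 [token count][kind byte + string index per token]
std::vector<std::uint8_t> encode_tokens(const std::vector<Token> &tokens) {
    std::vector<std::string> table;
    std::unordered_map<std::string, std::uint32_t> index_of;
    std::vector<std::pair<TokenKind, std::uint32_t>> records;
    for (const Token &t : tokens) {
        if (t.kind == TokenKind::Comment) {
            continue; // Comments are not stored.
        }
        auto it = index_of.find(t.text);
        if (it == index_of.end()) {
            it = index_of.emplace(t.text, static_cast<std::uint32_t>(table.size())).first;
            table.push_back(t.text);
        }
        records.emplace_back(t.kind, it->second);
    }
    std::vector<std::uint8_t> buf;
    write_u32(buf, static_cast<std::uint32_t>(table.size()));
    for (const std::string &s : table) {
        write_u32(buf, static_cast<std::uint32_t>(s.size()));
        buf.insert(buf.end(), s.begin(), s.end());
    }
    write_u32(buf, static_cast<std::uint32_t>(records.size()));
    for (const auto &r : records) {
        buf.push_back(static_cast<std::uint8_t>(r.first));
        write_u32(buf, r.second);
    }
    return buf;
}

// Rebuild the token list; only minimal sanity checks, a real loader must validate more.
std::vector<Token> decode_tokens(const std::vector<std::uint8_t> &buf) {
    std::size_t pos = 0;
    std::vector<std::string> table(read_u32(buf, pos));
    for (std::string &s : table) {
        const std::uint32_t len = read_u32(buf, pos);
        assert(pos + len <= buf.size());
        s.assign(reinterpret_cast<const char *>(buf.data()) + pos, len);
        pos += len;
    }
    const std::uint32_t count = read_u32(buf, pos);
    std::vector<Token> tokens;
    tokens.reserve(count);
    for (std::uint32_t i = 0; i < count; i++) {
        const TokenKind kind = static_cast<TokenKind>(buf[pos++]);
        const std::uint32_t index = read_u32(buf, pos);
        assert(index < table.size());
        tokens.push_back(Token{kind, table[index]});
    }
    return tokens;
}
```

For the token stream of `var x = x # note`, this toy layout produces a 45-byte buffer from 16 bytes of source, which is the kind of overhead the compression option is meant to offset. Decoding returns the four non-comment tokens without any lexing, which is where the loading-time benefit would come from.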

@MichaelWengren

But what happened with the intermediate representation proposal? Has any work been done there? Why bring back this useless tokenization if it gives zero protection to the source code, since gdsdecomp can recover everything in a second?

@arkology
Contributor

Besides the regular option to export GDScript as binary tokens, this
also includes a compression option on top of it. The binary format
needs to encode some information which generally makes it bigger than
the source text. This option reduces that difference by using Zstandard
compression on the buffer.

It would be great to see a size comparison of exported projects.

@vnen
Member Author

vnen commented Jan 27, 2024

To be clear, this is unrelated to the IR proposal. The proposal itself will take some time to solidify because it needs approval from the maintainers and not everyone thinks it's feasible.

This PR is a palliative to avoid having plain source code export while we wait for that or figure out something else, because a lot of people have been complaining about this issue.

@MichaelWengren

This PR is a palliative to avoid having plain source code export while we wait for that or figure out something else, because a lot of people have been complaining about this issue.

But bringing back tokenization does not solve this problem, because scripts can still be easily opened with the gdsdecomp tool.

@AThousandShips
Member

But bringing back tokenization does not solve this problem, because scripts can still be easily opened with the gdsdecomp tool.

That won't change in any real way with any solution: any compiled/parsed/processed format will be vulnerable to that. It's just a matter of degree. Against the vast majority of "attacks", the biggest improvement is going from plain source to this.

@MichaelWengren

any compiled/parsed/processed format

You can't run decompiled code because it's gibberish, but in the case of GDScript tokenization you absolutely can revert it back to its original form and run it without any issues. This is why this tokenization is useless.
I had high hopes for the intermediate representation format, but after almost two months it turned out that no work had even begun.

@AThousandShips
Member

AThousandShips commented Jan 27, 2024

I'd suggest reading up on this to get a realistic perspective. If you genuinely believe that decompilation isn't a realistic way of reverse engineering code, then you are not going to have realistic expectations of the outcomes of these things.

but after almost two months it turned out that no work had even begun

Please be realistic about the time it takes to undertake such a major project, and the explanations for the expectations on it above by vnen... Just because it hasn't happened yet doesn't mean it won't, especially with just a few months

If you'd rather wait longer with no changes and no improvements for performance then that's your opinion, but I don't think it's reasonable to wait for a feature that doesn't clash with this just to get a bit better obfuscation (not that obfuscation is the main goal here anyway)

(Also, if you're going to block me, please don't react to my comments. That's just petty, and I don't want to have to block you back...)

@fire
Member

fire commented Jan 29, 2024

Reducing loading times and applying zstd compression are good reasons to reintroduce binary tokenization on its own merits.

I don't know if I'll have much success reviewing the code.

@@ -84,6 +90,7 @@ class EditorExportPreset : public RefCounted {
bool enc_directory = false;

String script_key;
int script_mode = MODE_SCRIPT_BINARY_TOKENS;

I think compressed is probably better default.

Member

@reduz reduz left a comment

Looks good on my end.

@reduz
Member

reduz commented Feb 6, 2024

@MichaelWengren you can decompile and modify anything, be it .NET or anything else.
Eventually, instead of an IR, what will most likely happen is that GDScript files are optionally compiled to binary (C) per platform. That alone will make it pretty much impossible to see or modify the code.

@vnen
Member Author

vnen commented Feb 8, 2024

Updated to set the compressed mode as the default.

Comment on lines +1392 to +1404
// Script export parameters.

VBoxContainer *script_vb = memnew(VBoxContainer);
script_vb->set_name(TTR("Scripts"));

script_mode = memnew(OptionButton);
script_vb->add_margin_child(TTR("GDScript Export Mode:"), script_mode);
script_mode->add_item(TTR("Text (easier debugging)"), (int)EditorExportPreset::MODE_SCRIPT_TEXT);
script_mode->add_item(TTR("Binary tokens (faster loading)"), (int)EditorExportPreset::MODE_SCRIPT_BINARY_TOKENS);
script_mode->add_item(TTR("Compressed binary tokens (smaller files)"), (int)EditorExportPreset::MODE_SCRIPT_BINARY_TOKENS_COMPRESSED);
script_mode->connect("item_selected", callable_mp(this, &ProjectExportDialog::_script_export_mode_changed));

sections->add_child(script_vb);
Member

@akien-mga akien-mga Feb 8, 2024

Technically this is GDScript-specific code spilling into the main editor, which wouldn't be relevant if the GDScript module is disabled. So it's slightly breaking encapsulation.

But it's not a first, and I'm not sure it's worth adding a complex abstraction for this so it can be handled in the module for each script language individually. Your call if you think it's worth looking into.

For now I think this doesn't need to prevent merging.

Member Author

I thought about this, but I'm not sure what the best solution would be. This is the same way it is handled in 3.x.

Member

@AThousandShips AThousandShips left a comment

Some accidental reverts, see above

@vnen
Member Author

vnen commented Feb 8, 2024

Some accidental reverts, see above

I thought git push --force-with-lease would prevent this. I'll have to be more careful from now on.

@akien-mga akien-mga merged commit 77af6ca into godotengine:master Feb 9, 2024
16 checks passed
@akien-mga
Member

Thanks!

7 participants