Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Add typed array support for binary serialization #78219

Merged

Conversation

dalexeev
Copy link
Member

@dalexeev dalexeev commented Jun 14, 2023

@tool
extends EditorScript

func print_buffer(buffer: PackedByteArray) -> void:
    var hex := buffer.hex_encode()
    var res: String
    for i in range(0, mini(hex.length(), 64), 8):
        if i > 0:
            res += " "
        res += hex.substr(i, 8)
    print(res)

func _run() -> void:
    var a: Array = []
    var b: Array = [256]
    var c: Array[int] = [128]
    var d: Array[RefCounted] = [RefCounted.new()]
    var e: Array[MyClass] = [MyClass.new(111)]
    
    print_buffer(var_to_bytes(a))
    print_buffer(var_to_bytes(b))
    print_buffer(var_to_bytes(c))
    print_buffer(var_to_bytes_with_objects(d))
    print_buffer(var_to_bytes_with_objects(e))
    
    print(var_to_bytes_with_objects(e).hex_encode())
    
    print(bytes_to_var(var_to_bytes(a)))
    print(bytes_to_var(var_to_bytes(b)))
    print(bytes_to_var(var_to_bytes(c)))
    print(bytes_to_var_with_objects(var_to_bytes_with_objects(d)))
    print(bytes_to_var_with_objects(var_to_bytes_with_objects(e))[0].x)
    
    print(bytes_to_var_with_objects(var_to_bytes_with_objects(MyClass.new(222))).x)

For builtins, it is possible to optimize the size by removing the 4 bytes of type that are repeated before each element. And the element type can be encoded in the array type instead of separate 4 bytes.

I'm not sure if OBJECT_AS_ID can be used for arrays.

For scripts see:

if (script.is_valid()) {
String resource_text = String();
if (p_encode_res_func) {
resource_text = p_encode_res_func(p_encode_res_ud, script);
}
if (resource_text.is_empty() && script->get_path().is_resource_file()) {
resource_text = "Resource(\"" + script->get_path() + "\")";
}
if (!resource_text.is_empty()) {
p_store_string_func(p_store_string_ud, resource_text);
} else {
ERR_PRINT("Failed to encode a path to a custom script for an array type.");
p_store_string_func(p_store_string_ud, class_name);
}
} else if (class_name != StringName()) {

Production edit: closes godotengine/godot-roadmap#68

@dalexeev dalexeev requested a review from a team as a code owner June 14, 2023 10:47
@akien-mga akien-mga added this to the 4.1 milestone Jun 15, 2023
@akien-mga
Copy link
Member

Needs rebase to fix CI.

@dalexeev dalexeev force-pushed the core-typed-arrays-bin-serialization branch from 8dac994 to 28b6390 Compare June 17, 2023 11:07
@akien-mga akien-mga modified the milestones: 4.1, 4.2 Jun 23, 2023
@akien-mga
Copy link
Member

Moving to 4.2 as this seems a bit risky to merge just before RC1, even if it was approved today. Could be considered for a cherry-pick if it's safe enough and not breaking.

@akien-mga akien-mga added the cherrypick:4.1 Considered for cherry-picking into a future 4.1.x release label Jun 23, 2023
@dalexeev
Copy link
Member Author

I think this shouldn't be cherry-picked, since the array flags are not checked in master, and the type data will be recognized as the size (and elements) of the array.

@akien-mga akien-mga removed the cherrypick:4.1 Considered for cherry-picking into a future 4.1.x release label Jun 23, 2023
@akien-mga
Copy link
Member

Thanks for the input. How should we handle the data compatibility between 4.1 and 4.2 with this PR merged?

@dalexeev
Copy link
Member Author

I don't know. 4.2 will be able to read 4.1 data (which stores typed arrays as untyped), but 4.1 will not be able to read 4.2 data correctly. Backward compatibility is respected, but forward compatibility is broken. This is OK in the case of save files, but can be a problem in the case of multiplayer games if the users' clients are using different versions of Godot. On the other hand, due to bug #69215 this should not be very common.

I think it's acceptable for a minor release if we add a warning to the 4.2 release notes? We could mitigate the impact by adding checks for array flags now or in later 4.1.x patch releases (without implementing support for typed arrays). But the problem will remain for all previous releases (e.g. 4.0.3) and non-upgraded clients.

Preserving forward compatibility (avoiding array type being recognized as size or elements) looks like an unrealistic or very hacky solution to me.

core/io/marshalls.cpp Outdated Show resolved Hide resolved
core/io/marshalls.cpp Outdated Show resolved Hide resolved
@dalexeev dalexeev force-pushed the core-typed-arrays-bin-serialization branch from 28b6390 to 86c2ddf Compare August 22, 2023 18:54
core/io/marshalls.cpp Outdated Show resolved Hide resolved
ERR_FAIL_COND_V_MSG(!path.is_resource_file(), ERR_INVALID_DATA, "Invalid script path: '" + path + "'.");
Ref<Script> script = ResourceLoader::load(path);
ERR_FAIL_COND_V_MSG(script.is_null(), ERR_INVALID_DATA, "Can't load script at path: '" + path + "'.");
obj->set_script(script);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tool EditorScript doesn't work, but normal Node works.

bool GDScript::can_instantiate() const {
#ifdef TOOLS_ENABLED
return valid && (tool || ScriptServer::is_scripting_enabled());
#else

@dalexeev dalexeev force-pushed the core-typed-arrays-bin-serialization branch from 86c2ddf to 7116158 Compare August 23, 2023 18:03
core/io/marshalls.cpp Outdated Show resolved Hide resolved
@dalexeev dalexeev force-pushed the core-typed-arrays-bin-serialization branch from 7116158 to c9db0ea Compare August 24, 2023 11:34
@dalexeev
Copy link
Member Author

Should we add OBJECT_AS_ID support for typed arrays? I'm not sure how it works and what it's for. I tried to do this 2 months ago, but I didn't succeed.

Copy link
Collaborator

@Faless Faless left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems mostly okay, but the PR breaks encoding and decoding of built-in scripts (see my comment below).

core/io/marshalls.cpp Outdated Show resolved Hide resolved
core/io/marshalls.cpp Outdated Show resolved Hide resolved
core/io/marshalls.cpp Outdated Show resolved Hide resolved
@Faless
Copy link
Collaborator

Faless commented Feb 28, 2024

See this MRP for the issue:

broken_builtin.zip

@dalexeev dalexeev force-pushed the core-typed-arrays-bin-serialization branch from fde8a51 to 7cfaa97 Compare February 29, 2024 09:34
@dalexeev dalexeev requested a review from Faless February 29, 2024 09:35
@Faless
Copy link
Collaborator

Faless commented Mar 1, 2024

It seems there is another case which worked before this patch, and it not currently working anymore.

That's the case where you setup the GDScript using the source_code property:

func _ready() -> void:
	# Manual script
	var node = Node.new()
	var script = GDScript.new()
	script.source_code = """
extends Node

func _init():
	print("okay")
"""
	node.set_script(script)
	bytes = var_to_bytes_with_objects(node)
	print(bytes_to_var_with_objects(bytes))

I don't think it's a super important case, but it can be fixed preserving the old behavior with this patch as far as I can tell:

diff --git a/core/io/marshalls.cpp b/core/io/marshalls.cpp
index 00652b9d6b..29d313de7d 100644
--- a/core/io/marshalls.cpp
+++ b/core/io/marshalls.cpp
@@ -1624,8 +1624,11 @@ Error encode_variant(const Variant &p_variant, uint8_t *r_buffer, int &r_len, bo
 							Ref<Script> script = obj->get_script();
 							if (script.is_valid()) {
 								String path = script->get_path();
-								ERR_FAIL_COND_V_MSG(path.is_empty() || !path.begins_with("res://"), ERR_UNAVAILABLE, "Failed to encode a path to a custom script.");
-								value = path;
+								if (path.is_empty() || !path.begins_with("res://")) {
+									value = obj->get(E.name);
+								} else {
+									value = path;
+								}
 							}
 						} else {
 							value = obj->get(E.name);

@dalexeev
Copy link
Member Author

dalexeev commented Mar 1, 2024

@Faless I did this on purpose. When you deserialize a script, you get another instance of the script, a different class (albeit with the same source code). This didn't work properly before (typed variable assignments, type compatibility checking). Now it works where possible and explicitly fails where it doesn't.

@Faless
Copy link
Collaborator

Faless commented Mar 1, 2024

I did this on purpose. When you deserialize a script, you get another instance of the script, a different class

Well, that's the same for any other object. If I serialize a Node, then deserialize it, doing node == deserialized_node will return false. They are 2 different instances.

If we want to prevent this completely we should also change the deserialization part to always require a valid script path (right now, it can deserialize such scripts, but not serialize them).

We should also add the breaks compat label since if anyone is relying on that behavior, their project will break upon update (they'll have to manually handle serialization of those scripts).

@Cammymoop
Copy link
Contributor

This is a pretty major change to script serialization behavior.

GDScript is currently serialized as a resource, keeping the source code as it was whenever an object with a script was serialized (var_to_bytes_with_objects etc)
With this, that is no longer the case and the resource path is saved, so that, when it's deserialized, the script is instead loaded from the bundled resources or loader cache. Which can be good for a number of reasons, including not ending up with some stale code in a serialized object that you aren't aware of and being able to compare the type of the script/the objects script class properly.

This is backwards compatible though, so deserializing is no less dangerous than before.

Shouldn't this part of the change have it's own proposal or issue though?

@Cammymoop
Copy link
Contributor

Also if this implementation is going in it looks like it could fix stuff like #65393 and #68666

@raulsntos
Copy link
Member

The changes to script serialization are needed to fix var_to_bytes_with_objects/bytes_to_var_with_objects for Objects with C# scripts. Without these changes, deserializing Objects with C# scripts end up with an invalid C# script attached.

C# doesn't support embedded scripts, so they must be loaded from a path. IMO, serializing a script that was loaded from a path shouldn't lose its path. So I consider this a bug fix.

@Faless diff in #78219 (comment) still fixes the bug for C# scripts, so I consider that also an acceptable solution.

@Cammymoop
Copy link
Contributor

Cammymoop commented Mar 4, 2024

The usage of the script path to serialize the typed array's "script type" is completely separable from the change to how object's attached script is serialized.

@raulsntos The latter change is what the C# serialization issues rely on as far as I can tell. Either could be done without the other it looks like. This PR does both but the description it's mentioned issues are exclusively about the former.

(I don't mind that it's doing both at once but please make it more clear in the main description and title what this is actually doing)

@Delsin-Yu
Copy link
Contributor

Any chance we can have this for 4.3, or what's the blocker on it?

@dalexeev
Copy link
Member Author

This is a pretty major change to script serialization behavior.

Without the change, binary deserialization of scripted objects and typed arrays is mostly broken at the moment. Especially in the case of typed arrays, where the same script is encoded several times, and we can end up with a situation where each element has a different type, incompatible with the type of the array itself.

I discovered this as a result of testing and did not even consider it as a compatibility break, just as a bug fix. As far as I remember, this was noted in the first comment and further in the discussion.

Also, the approach is used in text serialization, this change eliminates the inconsistency between them. See also #69248 for context.

If we want to prevent this completely we should also change the deserialization part to always require a valid script path (right now, it can deserialize such scripts, but not serialize them).

I would prefer this option so as not to create inconsistency between scripts with and without paths.

We should also add the breaks compat label since if anyone is relying on that behavior, their project will break upon update (they'll have to manually handle serialization of those scripts).

Done.

@dalexeev dalexeev force-pushed the core-typed-arrays-bin-serialization branch from 7cfaa97 to c30bec7 Compare March 20, 2024 08:37
@dalexeev
Copy link
Member Author

If we want to prevent this completely we should also change the deserialization part to always require a valid script path (right now, it can deserialize such scripts, but not serialize them).

I would prefer this option so as not to create inconsistency between scripts with and without paths.

Done.

@@ -687,7 +687,8 @@ Error decode_variant(Variant &r_variant, const uint8_t *p_buffer, int p_len, int
 							(*r_len) += used;
 						}
 
-						if (str == "script" && value.get_type() == Variant::STRING) {
+						if (str == "script") {
+							ERR_FAIL_COND_V_MSG(value.get_type() != Variant::STRING, ERR_INVALID_DATA, "Invalid value for \"script\" property, expected script path as String.");
 							String path = value;
 							ERR_FAIL_COND_V_MSG(path.is_empty() || !path.begins_with("res://") || !ResourceLoader::exists(path, "Script"), ERR_INVALID_DATA, "Invalid script path: '" + path + "'.");
 							Ref<Script> script = ResourceLoader::load(path, "Script");

Copy link
Collaborator

@Faless Faless left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work 🏆 !

@akien-mga akien-mga merged commit ec5cae3 into godotengine:master Apr 4, 2024
16 checks passed
@dalexeev dalexeev deleted the core-typed-arrays-bin-serialization branch April 4, 2024 12:48
@akien-mga
Copy link
Member

Thanks!

@Saadies
Copy link

Saadies commented Apr 14, 2024

Also if this implementation is going in it looks like it could fix stuff like #65393 and #68666

I've tried with the recent Master d007340

And unfortunately neither saving and loading resources with a class name, nor saving and loading nested Resources using ResourceSaver.FLAG_BUNDLE_RESOURCES is working. (#74918 and #65393)

minimumProjektKalado.zip

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet