Add structs in GDScript #7329

Open
reduz opened this issue Jul 19, 2023 · 183 comments

@reduz
Member

reduz commented Jul 19, 2023

Describe the project you are working on

Godot

Describe the problem or limitation you are having in your project

There are some cases where users run into limitations using Godot with GDScript that are not entirely obvious how to work around. I will proceed to list them.

  • Classes are fine in most cases, but if you need to allocate a lot of them, they can be inefficient. Classes are not lightweight in Godot.
  • Sometimes you want something between a class and a dictionary, with code completion, just to hold game data written by hand.
  • When exposing some APIs to script (not many, but sometimes) we use a dictionary, which has neither good documentation nor code completion (for example, get_property_list), because creating a class just for that is overkill (classes use a lot more memory).
  • Sometimes, when working with tens of thousands of objects (think bullet hell, simulation, etc.), however much we optimize the interpreter, the fact that memory is not flat makes this very inefficient in GDScript because of the high number of cache misses.

Again, save for the third point, these are mostly performance issues related to GDScript that can't be just fixed with a faster interpreter. It needs a more efficient way to pack structures.

Describe the feature / enhancement and how it helps to overcome the problem or limitation

The idea of this proposal is to solve the following problems:

  • Add struct support to GDScript (this has some user demand, given that classes are quite heavy resource-wise).
  • Structs are useful instead of dictionaries when you just want to group arbitrary data together, because they do the same job but with code completion.
  • Add limited struct support to the Godot API, so we avoid exposing things as dictionaries in the API, which are unfriendly (no code completion, no documentation).
  • As a plus, have a way to get optimized, flat memory arrays of these structs. This allows much higher performance in GDScript when dealing with thousands of elements (as in bullet hell, simulation, etc.). We know we can use MultiMesh to accelerate drawing (or the new, more optimized drawing APIs in Godot 4), but storing the logical data for processing all of these is still cache-inefficient in GDScript.

How does it work?
In GDScript, structs should look very simple:

struct MyStruct:
	var a : String
	var b : int
	var c # untyped should work too

That's it. Use it like:

var a : MyStruct
a.b = 819
# or
var a := MyStruct(
   a = "hello",
   b = 22)

You can probably also use them to hold manually written data in a way that is handier than a dictionary, because you get types and code completion:

var gamedata : struct:
      var a: int = 22
      var enemy_settings : struct:
             var speed : String

And you should also get nice code completion.

But we know we want to use this in C++ too, as an example:

STRUCT_LAYOUT(Object, PropertyInfoLayout,
	STRUCT_MEMBER("name", Variant::STRING, String()),
	STRUCT_MEMBER("type", Variant::INT),
	STRUCT_MEMBER("hint", Variant::INT, 0),
	STRUCT_MEMBER("hint_string", Variant::STRING),
	STRUCT_MEMBER("class_name", Variant::STRING, String()));

// .. //

class Object {

//..//

// script bindings:
// Instead of
TypedArray<Dictionary> _get_property_list();
// which we have now, we want to use:
TypedArray<Struct<PropertyInfoLayout>> _get_property_list();
// This gives us nice documentation and code completion.
//..//
}

In some cases, this makes it easier to expose data to GDScript with code completion.

Note on performance

If you are wondering how this relates to your game's performance, the following should make it clearer:

Q: Will this make your game faster vs dictionaries?
A: Probably not, but they can be nice data stores that can be typed.

Q: Will this make your game faster vs classes?
A: Structs are far more lightweight than classes, so if you use classes excessively, they will provide for a nice replacement that uses less memory, but performance won't change.

Struct arrays

Imagine you are working on a game that has 10,000 enemies, bullets, scripted particles, etc. Managing the drawing part efficiently in Godot is not that hard: you can use MultiMesh or, if you are working in 2D, the new Godot 4 CanvasItem::draw_* functions, which are superbly efficient (even more so if you use them via RenderingServer).

So your enemy has the following information:

class Enemy:
   var position: Vector2
   var attacking : bool
   var anim_frame : int

var enemies : Array[Enemy]
enemies.resize(10000)

for e in enemies:
  e.position = something
  # enemy logic

This is very inefficient in GDScript, even if you use typed code, for two reasons:

  1. Classes are big, maybe 16kb as a base, so using this many can be costly memory-wise.
  2. When you start storing and processing this many elements, memory is all over the place, so there is not much cache locality. This causes a memory bottleneck.

You can change the above to this:

struct Enemy:
   var position: Vector2
   var attacking : bool
   var anim_frame : int

var enemies : Array[Enemy]
enemies.resize(10000)

for e in enemies:
  e.position = something
  # enemy logic

This will use a lot less memory, but performance will be about the same. To go faster, you want the memory of all these enemies to be contiguous.

NOTE: Again, keep in mind that this is a special case when you have tens of thousands of structs and you need to process them every frame. If your game does not use nearly as many entities (not even 1/10th) you will see no performance increase. So your game does not need this optimization.
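
The locality argument above can be illustrated outside of Godot. The following is a minimal C++ sketch, not engine code: the `Enemy` struct and every function name are hypothetical. It contrasts one heap allocation per enemy (roughly what an array of class instances gives you) with a single contiguous allocation (what a flat array aims for).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical plain-data enemy, mirroring the fields used in the proposal.
struct Enemy {
	float x = 0.0f;
	float y = 0.0f;
	bool attacking = false;
	int anim_frame = 0;
};

// One heap allocation per element: neighbours may end up anywhere in memory.
std::vector<std::unique_ptr<Enemy>> make_scattered(std::size_t count) {
	std::vector<std::unique_ptr<Enemy>> enemies;
	enemies.reserve(count);
	for (std::size_t i = 0; i < count; i++) {
		enemies.push_back(std::make_unique<Enemy>());
	}
	return enemies;
}

// One allocation for all elements: element i + 1 directly follows element i.
std::vector<Enemy> make_flat(std::size_t count) {
	return std::vector<Enemy>(count);
}

// Byte distance between the first two elements; requires at least two elements.
std::size_t stride_bytes(const std::vector<Enemy> &enemies) {
	assert(enemies.size() >= 2);
	const auto first = reinterpret_cast<std::uintptr_t>(&enemies[0]);
	const auto second = reinterpret_cast<std::uintptr_t>(&enemies[1]);
	return static_cast<std::size_t>(second - first);
}

// Per-frame update on the scattered layout: every step follows a pointer.
void advance_scattered(std::vector<std::unique_ptr<Enemy>> &enemies, float dx) {
	for (auto &e : enemies) {
		e->x += dx;
	}
}

// Per-frame update on the flat layout: a linear walk over one memory block.
void advance_flat(std::vector<Enemy> &enemies, float dx) {
	for (Enemy &e : enemies) {
		e.x += dx;
	}
}
```

Both update loops do the same work; only the memory access pattern differs, which is the part a faster interpreter cannot fix.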

Flattened Arrays

Again, this is a very special case. To get the performance benefits, you will need to use a special array type.

struct Enemy:
   var position: Vector2
   var attacking : bool
   var anim_frame : int

var enemies : FlatArray[Enemy]
enemies.resize(10000)

for e in enemies:
  e.position = something
  # enemy logic

FlatArrays are a special case of struct array that allocates everything contiguously in memory; they are meant for performance-only scenarios. How they work is described later on, but when this is used together with typed code, performance should increase very significantly. In fact, if GDScript gets a JIT/AOT VM at some point, this should be near C performance.

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

// Implementation in Array

// ContainerTypeValidate needs to be changed:

struct ContainerTypeValidate {
	Variant::Type type = Variant::NIL;
	StringName class_name;
	Ref<Script> script;
	LocalVector<ContainerTypeValidate> struct_members; // Added for structs. Assignment from structs with the same layout but different member names should be allowed (because it is likely too difficult to prevent).
	const char *where = "container";
};

// ArrayPrivate needs to be changed:

class ArrayPrivate {
public:
	SafeRefCount refcount;
	Vector<Variant> array;
	Variant *read_only = nullptr; // If enabled, a pointer is used to a temporary value that is used to return read-only values.
	ContainerTypeValidate typed;
	
	// Added struct stuff:
	uint32_t struct_size = 0; 
	StringName * struct_member_names = nullptr;
	bool struct_array = false;
	
	_FORCE_INLINE_ bool is_struct() const {
		return struct_size > 0;
	}
	
	_FORCE_INLINE_ bool is_struct_array() const {
		return struct_array;
	}

	_FORCE_INLINE_ int32_t find_member_index(const StringName& p_member) const {
		for(uint32_t i = 0 ; i<struct_size ; i++) {
			if (p_member == struct_member_names[i]) {
				return (int32_t)i;
			}
		}
		
		return -1;
	}
	
	_FORCE_INLINE_ bool validate_member(uint32_t p_index, const Variant &p_value) {
		// Needs to check with ContainerTypeValidate; returns true if valid.
		return true; // Placeholder until the validation is implemented.
	}

};

// Not using LocalVector and resorting to manual memory allocation to improve resource usage and performance.

// Then, besides all the type comparison and checking (leave this to someone else to do)
// Array needs to implement set and get named functions:


Variant Array::get_named(const StringName &p_member) const {
	ERR_FAIL_COND_V(!_p->is_struct(), Variant());
	int32_t offset = _p->find_member_index(p_member);
	ERR_FAIL_INDEX_V(offset, _p->array.size(), Variant());
	return _p->array[offset];
}

void Array::set_named(const StringName &p_member, const Variant &p_value) {
	ERR_FAIL_COND(!_p->is_struct());
	int32_t offset = _p->find_member_index(p_member);
	ERR_FAIL_INDEX(offset, _p->array.size());
	ERR_FAIL_COND(!_p->validate_member(offset, p_value));
	_p->array.write[offset] = p_value;
}

// These can be exposed in Variant binder so they support named indexing
// Keep in mind some extra versions with validation that return invalid set/get will need to be added for GDScript to properly throw errors

// Additionally, the Array::set needs to also perform validation if this is a struct.


// FLATTENED ARRAYS
// We may also want to have a flattened array. As described before, when users need to store data for a huge number of elements (like lots of bullets), doing
// so in a flat memory fashion is a lot more efficient cache-wise. Keep in mind that because Variants are always 24 bytes in size, there will always be some
// memory wasted, especially if you use many floats. Additionally, larger types like Transform3D are allocated separately because they don't
// fit in a Variant, but they have their own memory pools where they will most likely be allocated contiguously too.
// To sum up, this is not as fast as using C structs memory-wise, but still orders of magnitude faster and more efficient than using regular arrays.

var a : FlatArray[SomeStruct]
a.resize(55)
print(a.size()) // 1 for single structs
a[5].member = 819

// So how does this last part work?
// The idea is to add a member to the Array class (not ArrayPrivate):

class Array {
	mutable ArrayPrivate *_p;
	void _unref() const;
	uint32_t struct_offset = 0; // Add this
public:

// And the functions described above actually are implemented like this:

Variant Array::get_named(const StringName &p_member) const {
	ERR_FAIL_COND_V(!_p->is_struct(), Variant());
	int32_t offset = _p->find_member_index(p_member);
	ERR_FAIL_COND_V(offset < 0, Variant()); // Unknown member; check before applying the struct offset.
	offset += struct_offset * _p->struct_size;
	ERR_FAIL_INDEX_V(offset, _p->array.size(), Variant());
	return _p->array[offset];
}

void Array::set_named(const StringName &p_member, const Variant &p_value) {
	ERR_FAIL_COND(!_p->is_struct());
	int32_t offset = _p->find_member_index(p_member);
	ERR_FAIL_COND(offset < 0); // Unknown member; check before applying the struct offset.
	ERR_FAIL_COND(!_p->validate_member(offset, p_value));
	offset += struct_offset * _p->struct_size;
	ERR_FAIL_INDEX(offset, _p->array.size());
	_p->array.write[offset] = p_value;
}

Array Array::struct_at(int p_index) const {
	ERR_FAIL_COND_V(!_p->is_struct(), Array());
	ERR_FAIL_INDEX_V(p_index, _p->array.size() / _p->struct_size, Array());
	Array copy = *this;
	copy.struct_offset = p_index;
	return copy;
}

// Of course, functions such as size, resize, push_back, etc. in the array should not be modified in Array itself, as this makes serialization of arrays
// impossible at the low level.
// These functions should be special-cased with special versions in Variant::call, including the operator[] to return struct_at internally if in flattened array mode.
// Iteration of flattened arrays (when type information is known) could be done extremely efficiently by the GDScript VM by simply increasing the offset variable in each loop. Additionally, the GDScript VM, being typed, could be simply instructed to get members by offset, and hence it could use functions like this:

Variant Array::get_struct_member_by_offset(uint32_t p_offset) const {
	ERR_FAIL_COND_V(!_p->is_struct(), Variant());
	int32_t offset = p_offset;
	offset += struct_offset * _p->struct_size;
	ERR_FAIL_INDEX_V(offset, _p->array.size(), Variant());
	return _p->array[offset];
}

void Array::set_struct_member_by_offset(uint32_t p_offset, const Variant &p_value) {
	ERR_FAIL_COND(!_p->is_struct());
	int32_t offset = p_offset;
	offset += struct_offset * _p->struct_size;
	ERR_FAIL_INDEX(offset, _p->array.size());
	_p->array.write[offset] = p_value;
}


// TYPE DESCRIPTIONS in C++

// Another problem we will face with this approach is that there are many cases where we will want to actually describe the type.
// If we had a function that returned a dictionary and now we want to change it to a struct because it's easier for the user to use (description in the docs, autocomplete in GDScript, etc.), we must find a way. As an example, for typed arrays we have:

TypedArray<Type> get_something() const;

// And the binder takes care of it. Ideally we want to be able to do something like:

Struct<StructLayout> get_something() const;

// We know we eventually want to expose things like this to the binder:

TypedArray<Struct<PropertyInfoLayout>> get_property_list();

// So what are Struct and StructLayout?

// We would like to define PropertyInfoLayout like this:


STRUCT_LAYOUT(Object, PropertyInfoLayout,
	STRUCT_MEMBER("name", Variant::STRING),
	STRUCT_MEMBER("type", Variant::INT),
	STRUCT_MEMBER("hint", Variant::INT),
	STRUCT_MEMBER("hint_string", Variant::STRING),
	STRUCT_MEMBER("class_name", Variant::STRING));

// How does this convert to C?

// Here is a rough sketch
struct StructMember {
	StringName name;
	Variant::Type type;
	StringName class_name;
	Variant default_value;

	StructMember(const StringName &p_name, const Variant::Type p_type, const Variant &p_default_value = Variant(), const StringName &p_class_name = StringName()) {
		name = p_name;
		type = p_type;
		default_value = p_default_value;
		class_name = p_class_name;
	}
};

// Important: we force SNAME here, otherwise this will leak memory.
#define STRUCT_MEMBER(m_name,m_type,m_default_value) StructMember(SNAME(m_name),m_type,m_default_value)
#define STRUCT_CLASS_MEMBER(m_name,m_class) StructMember(SNAME(m_name),Variant::OBJECT,Variant(),m_class)


// StructLayout should ideally be something that we can define like


#define STRUCT_LAYOUT(m_class, m_name, ...) \
struct m_name { \
	_FORCE_INLINE_ static StringName get_class() { return SNAME(#m_class); } \
	_FORCE_INLINE_ static StringName get_name() { return SNAME(#m_name); } \
	static constexpr uint32_t member_count = GET_ARGUMENT_COUNT(__VA_ARGS__); \
	_FORCE_INLINE_ static const StructMember &get_member(uint32_t p_index) { \
		CRASH_BAD_INDEX(p_index, member_count); \
		static StructMember members[member_count] = { __VA_ARGS__ }; \
		return members[p_index]; \
	} \
};
		 
// Note GET_ARGUMENT_COUNT is a macro that we probably need to add to typedefs.h, see:
// https://stackoverflow.com/questions/2124339/c-preprocessor-va-args-number-of-arguments

// Okay, so what is Struct<> ?

// It's a similar class to TypedArray


template <class T>
class Struct : public Array {
public:
	typedef T Type;

	_FORCE_INLINE_ void operator=(const Array &p_array) {
		ERR_FAIL_COND_MSG(!is_same_typed(p_array), "Cannot assign a Struct from array with a different format.");
		_ref(p_array);
	}
	_FORCE_INLINE_ Struct(const Variant &p_variant) :
			Array(T::member_count, T::get_member, Array(p_variant)) {
	}
	_FORCE_INLINE_ Struct(const Array &p_array) :
			Array(T::member_count, T::get_member, p_array) {
	}
	_FORCE_INLINE_ Struct() :
			Array(T::member_count, T::get_member) {
	}
};

// You likely saw correctly: we pass a pointer to T::get_member. This is because we can't pass a structure and we want to initialize ArrayPrivate efficiently without allocating more memory than needed, plus we want to keep this function around for validation:

Array::Array(uint32_t p_member_count, const StructMember& (*p_get_member)(uint32_t));
Array::Array(uint32_t p_member_count, const StructMember& (*p_get_member)(uint32_t),const Array &p_from); // separate one is best for performance since Array() does internal memory allocation when constructed.

// Keep in mind also that the GDScript VM is not able to pass a function pointer since this is dynamic, so it will need a separate constructor to initialize the array format. This is also why the function pointer should not be kept inside Array.
// Likewise, GDScript may also need to pass a Script for the class name, which is what ContainerTypeValidate needs.

// Registering the struct to Class DB
// call this inside _bind_methods of the relevant class

// goes in object.h
#define BIND_STRUCT(m_name) ClassDB::register_struct(m_name::get_class(), m_name::get_name(), m_name::member_count, m_name::get_member);

Then you will also have to add this function `Array ClassDB::instantiate_struct(const StringName &p_class, const StringName& p_struct);` in order to construct them on demand.

// Optimizations:

// The idea here is that if GDScript code is typed, it should be able to access everything without any kind of validation or even copies. I will add this in the GDScript optimization proposal I have soon (pointer addressing mode).

// That said, I think we should consider changing ArrayPrivate::Array from Vector to LocalVector, this should enormously improve performance when accessing untyped (And eventually typed) arrays in GDScript. Arrays are shared, so there is not much of a need to use Vector<> here.
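
The `struct_offset` mechanism described above can be tried in isolation. Below is a minimal standalone C++ sketch, not Godot code: `Value` stands in for Variant, `FlatStorage` for ArrayPrivate, and `StructView` for Array in struct mode. All three names are hypothetical, and `std::string` replaces StringName. It only illustrates how one shared block of slots plus a per-handle offset yields named access to the N-th struct.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Stand-in for Variant; the real type holds many more alternatives.
using Value = std::variant<std::monostate, bool, long long, double, std::string>;

// Stand-in for ArrayPrivate: shared storage plus the struct layout.
struct FlatStorage {
	std::vector<std::string> member_names; // One entry per struct member.
	std::vector<Value> slots; // struct_count * member_names.size() values, back to back.
};

// Stand-in for Array in struct mode: a shared handle plus the index of the viewed struct.
class StructView {
	std::shared_ptr<FlatStorage> storage;
	std::size_t struct_offset = 0;

	// Linear search, as in the proposal's find_member_index(); returns -1 when missing.
	int find_member_index(const std::string &name) const {
		for (std::size_t i = 0; i < storage->member_names.size(); i++) {
			if (storage->member_names[i] == name) {
				return static_cast<int>(i);
			}
		}
		return -1;
	}

	// Slot index of a member inside the struct this handle points at.
	std::size_t slot_index(int member) const {
		return struct_offset * storage->member_names.size() + static_cast<std::size_t>(member);
	}

public:
	StructView(std::vector<std::string> names, std::size_t count) :
			storage(std::make_shared<FlatStorage>()) {
		assert(!names.empty());
		storage->slots.resize(names.size() * count);
		storage->member_names = std::move(names);
	}

	std::size_t struct_count() const {
		return storage->slots.size() / storage->member_names.size();
	}

	// Returns a handle to the struct at index; the storage is shared, not copied.
	StructView struct_at(std::size_t index) const {
		assert(index < struct_count());
		StructView view = *this;
		view.struct_offset = index;
		return view;
	}

	// Returns false instead of writing when the member name is unknown.
	bool set_named(const std::string &name, Value value) {
		const int member = find_member_index(name);
		if (member < 0) {
			return false;
		}
		storage->slots[slot_index(member)] = std::move(value);
		return true;
	}

	// Returns an empty Value (std::monostate) when the member name is unknown.
	Value get_named(const std::string &name) const {
		const int member = find_member_index(name);
		if (member < 0) {
			return Value();
		}
		return storage->slots[slot_index(member)];
	}
};
```

Because `struct_at` only copies the handle, iterating a flat array amounts to incrementing one offset, which is what the proposal suggests the typed VM could do.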

If this enhancement will not be used often, can it be worked around with a few lines of script?

N/A

Is there a reason why this should be core and not an add-on in the asset library?

N/A

@dalexeev
Member

dalexeev commented Jul 19, 2023

@nlupugla

Based on my understanding, it looks like users will be able to add and remove fields from their struct at runtime. Is that a "feature" or a "bug" of this implementation?

@reduz
Member Author

reduz commented Jul 19, 2023

@nlupugla No, I think they should not be able to add and remove fields from structs. If you want this kind of flexibility, Dictionary should be used. That would make it also super hard for the typed compiler to optimize.

@Calinou
Member

Calinou commented Jul 19, 2023

Can structs have default values for their properties?

@danilopolani

Can structs have default values for their properties?

IMO structs should not have default properties; as Go structs, they are just interfaces basically

@nlupugla

@nlupugla No, I think they should not be able to add and remove fields from structs. If you want this kind of flexibility, Dictionary should be used. That would make it also super hard for the typed compiler to optimize.

To be clear, are you saying that users can't add/remove fields with the current proposal, or that they shouldn't be able to add/remove fields? My reasoning was that since these structs are basically just fancy arrays, you can arbitrarily add and delete elements. Maybe there is something wrong in that reasoning though.

@reduz
Member Author

reduz commented Jul 19, 2023

@Calinou I am more inclined to think they should not, at least on the C++ side.

On the GDScript side, maybe. It would have to be handled internally in the parser, which would emit the initializer code as instructions when it sees a struct being initialized.

@reduz
Member Author

reduz commented Jul 19, 2023

@nlupugla Adding and removing elements would break the ability of the typed optimizer to deal with it, because it will assume a property will always be at the same array index when accessing it. The resize() and similar functions in Array, for example, should not work in struct mode.

@nlupugla

@nlupugla Adding and removing elements would break the ability of the typed optimizer to deal with it, because it will assume a property will always be at the same array index when accessing it. The resize() and similar functions in Array, for example, should not work in struct mode.

This has implications for many of the "normal" array functions, as they all now have to check whether they are a struct or not to determine whether adding/deleting is okay, right? To be clear, I agree that users should not be able to add or remove fields from structs at runtime, I just want to be sure the proposal is clear about how this restriction will be implemented.
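
One way such a restriction could look is sketched below in standalone C++. This is only an illustration under assumptions, not the proposed engine change: `GuardedArray` is a hypothetical wrapper in which every size-changing operation checks a struct flag first and reports failure, while writes to existing slots stay allowed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical array wrapper; struct_size > 0 marks it as a struct with a fixed layout.
class GuardedArray {
	std::vector<int> values;
	std::size_t struct_size = 0;

public:
	GuardedArray() = default;

	// Creates an array in struct mode with a fixed number of members.
	explicit GuardedArray(std::size_t member_count) :
			values(member_count), struct_size(member_count) {
		assert(member_count > 0);
	}

	bool is_struct() const { return struct_size > 0; }
	std::size_t size() const { return values.size(); }

	// Size-changing operations report failure instead of altering a struct's layout.
	bool resize(std::size_t new_size) {
		if (is_struct()) {
			return false;
		}
		values.resize(new_size);
		return true;
	}

	bool push_back(int value) {
		if (is_struct()) {
			return false;
		}
		values.push_back(value);
		return true;
	}

	// Writing to an existing slot stays legal in both modes.
	bool set(std::size_t index, int value) {
		if (index >= values.size()) {
			return false;
		}
		values[index] = value;
		return true;
	}
};
```

The cost is one branch per size-changing call, which is the check the question above is asking about.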

@GsLogiMaker

How would memory be managed for structs? Would they be reference counted, passed by value, manual, or other?

@michaldev

Great idea. I believe it would be worth expanding it with defaults - something similar to Pydantic.

@nlupugla

How would memory be managed for structs? Would they be reference counted, passed by value, manual, or other?

They are basically fancy arrays, so I think the memory will be managed exactly as arrays are. As a result, I believe they will be ref counted and passed by reference. Feel free to correct me if I'm wrong reduz :)

@reduz
Member Author

reduz commented Jul 19, 2023

@nlupugla that is correct!
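
For readers less familiar with these semantics, here is a small standalone C++ sketch of "ref counted and passed by reference". It uses `std::shared_ptr` as a stand-in for the engine's reference counting; `Data` and `StructRef` are hypothetical names and none of this is Godot code.

```cpp
#include <memory>
#include <string>

// Hypothetical struct payload.
struct Data {
	std::string a;
	int b = 0;
};

// Handle with array-like semantics: copying the handle shares the payload.
class StructRef {
	std::shared_ptr<Data> data = std::make_shared<Data>();

public:
	Data &operator*() const { return *data; }
	Data *operator->() const { return data.get(); }

	// Number of handles currently sharing this payload.
	long use_count() const { return data.use_count(); }

	// An explicit deep copy, comparable to calling duplicate() on an Array in GDScript.
	StructRef duplicate() const {
		StructRef copy;
		*copy.data = *data;
		return copy;
	}
};
```

Assigning one `StructRef` to another makes both names refer to the same payload, so a write through either is visible through the other; only `duplicate()` produces independent data.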

@reduz
Member Author

reduz commented Jul 19, 2023

The main idea of this proposal is to implement this with very minimal code changes to the engine, making most of the work on the side of the GDScript parser pretty much. Not even the VM needs to be modified.

@nlupugla

Two follow up questions.

  1. Is it possible to have a const struct? In other words, in GDScript, will MyStruct(a = "hello", b = 22) be considered a literal in the same way that {a : "hello", b : 22} would be?
  2. If there are no user-defined defaults, how are the fields of a "partially" constructed struct filled? In other words, what will MyStruct(a = "hello", c = 3.14).b return? Will it be null, a runtime error, an analyzer error, or the default of whatever type b is?

@adamscott
Member

adamscott commented Jul 19, 2023

I think we should create a separate "entity" for flattened arrays of structs, even if it's pure syntactic sugar for GDScript.

struct Enemy:
   var position: Vector2
   var attacking : bool
   var anim_frame : int

var enemies := StructArray(Enemy) # single struct
enemies.structs_resize(10000) # Make it ten thousand, flattened in memory
# or
var enemies := StructArray(Enemy, 10000)

@sairam4123

Can methods be supported like in Structs in C++?

struct Tile:
  var pos: Vector2
  var bitmask: Bitmask
  
  func mark_flag():
     bitmask |= Mask.FLAG
     
  func more_func():
    pass

it will be similar to classes.

@reduz
Member Author

reduz commented Jul 19, 2023

@sairam4123 nope, if you want methods you will have to use classes.

@reduz
Member Author

reduz commented Jul 19, 2023

@adamscott I thought about this, but ultimately these are methods of Array and will appear in the documentation, so IMO there is no point in hiding this.

The usage of the flattened version is already not very nice, but that's fine because it's very much a corner case, so I would not go to the extra length of adding a new way to construct it.

Not even the syntax sugar is worth it to me due to how rarely you will make use of it.

@nlupugla

nlupugla commented Jul 19, 2023

@adamscott I thought about this, but ultimately these are methods of Array and will appear in the documentation, so IMO there is no point in hiding this.

The usage of the flattened version is already not very nice, but that's fine because it's very much a corner case, so I would not go to the extra length of adding a new way to construct it.

What about having StructArray inherit from Array? That way the methods for StructArray won't show up in the documentation for Array.

Another bonus would be that StructArray can have different operator overloads for math operations. For example my_array + 5 appends 5, but my_struct_array + 5 adds 5 element-wise.

@sairam4123

sairam4123 commented Jul 19, 2023

@sairam4123 nope, if you want methods you will have to use classes.

@reduz
But C++ supports struct methods, right? I don't think there is any use for structs (for me at least) if they do not support methods.

Methods help us change the state of the structs without having to resort to methods in the parent class.

Here's an example:

struct Tile:
   var grid_pos: Vector2
   var mask: BitMask

func flag_tile(tile: Tile):
   tile.mask |= Mask.FLAG

As you can see the methods are separated from the struct which is of no use. If you want the structs to be immutable, then provide us MutableStructs which can support methods that mutate it. Using classes is not a great idea for us.

@JuanFdS

JuanFdS commented Jul 19, 2023

Could there be a special match case for structs? I think it could be useful in cases where there are, for example, 2 different structs that represent bullets that behave in different ways, so one function with one match could concisely handle both cases (similar to how it's handled in languages with functions and pattern matching).

@nlupugla

@sairam4123 C++ structs are literally classes but with different defaults (public by default instead of private) so I don't think we should be looking to C++ as an example here. Ensuring that structs are as close to Plain Old Data as possible also makes them easy to serialize.
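
The point about C++ can be checked with a few lines of standard C++17. The type names below are made up for illustration; the traits come from `<type_traits>`.

```cpp
#include <type_traits>

// Members of a struct are public unless stated otherwise.
struct TileStruct {
	int x = 0;
	int y = 0;
};

// Members of a class are private unless stated otherwise, so an accessor is needed here.
class TileClass {
	int x = 0;
	int y = 0;

public:
	int get_x() const { return x; }
};

// Plain data without default member initializers, as a C-style struct would be.
struct PlainTile {
	int x;
	int y;
};

// Both keywords declare class types as far as the language is concerned.
static_assert(std::is_class_v<TileStruct>);
static_assert(std::is_class_v<TileClass>);

// "Plain Old Data" roughly means trivial and standard-layout, which is what makes
// byte-wise serialization straightforward.
static_assert(std::is_trivial_v<PlainTile> && std::is_standard_layout_v<PlainTile>);

// Default member initializers make a type non-trivial, but it can still be copied byte-wise.
static_assert(!std::is_trivial_v<TileStruct> && std::is_trivially_copyable_v<TileStruct>);
```

So the `struct` keyword in C++ says nothing about methods or storage; the "plain data" property comes from what the type contains.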

@KoBeWi
Member

KoBeWi commented Jul 19, 2023

You could sort of use methods if you make a Callable field.

@nlupugla

You could sort of use methods if you make a Callable field.

I thought about that too, although it would be super annoying without defaults as you'd have to assign every instance of the struct the same callables manually.

@tkgalk

tkgalk commented Mar 22, 2024

I 100% get and agree about static typing and developer experience, especially in GDScript.

But this won't solve issues like a simple raycast requiring a PhysicsRayQueryParameters2D class allocation on the heap (it would be less than a class, at least, which is an improvement!) nor the result being a heap-allocated object (a Dictionary) too. Currently, class allocations in the hot path for things like this are a big problem. The fact that the result could potentially be typed and wouldn't require a cast is awesome. But that's not a "struct". The proposal's name is very misleading. And that problem propagates to C# too, as the underlying API is the same.

var rayParams = PhysicsRayQueryParameters2D.Create(Position, Position + Target.Position);
var rayResult = _spaceRid.IntersectRay(rayParams);
var hitPosition = (Vector3)rayResult["position"];

This also impacts C# as there is no separate C# API that would go around that issue.

@AThousandShips
Member

How could that be solved though? You'd have to use something else than a system usable by GDScript to define types

@tkgalk

tkgalk commented Mar 22, 2024

Yeah, I think it still hits the wall of GDScript being the lowest common denominator, unfortunately.

I won't say "don't do it", because it's still a super-valuable proposal, of course. But I'd like to throw in my vote for not calling this a "struct", but, hell, a Record or something. Structs have a fairly well-understood meaning in computer science and people will assume they are on the stack.

@AThousandShips
Member

AThousandShips commented Mar 22, 2024

Then class is a misnomer too. I disagree with the interpretation that a struct is on the stack; it generally isn't in C++ unless it's specifically declared in the local scope and not in dynamic contexts. There's nothing about the name that implies this; that's just an individual assumption.

And again, this is off topic: you're not going to have anything on the stack in an interpreted language like this, because it's by its very nature dynamic, so it isn't relevant to this specific topic.

What you're looking for is something like:

@tkgalk

tkgalk commented Mar 22, 2024

Fair enough. If anything the fact that C#'s API is constrained by GDScript is a separate discussion and not on topic here. 👍🏻
Thanks for civil discourse!

@AThousandShips
Member

AThousandShips commented Mar 22, 2024

It's not constrained by GDScript, it's constrained by Variant and its own serialisation system, which still has to convert data, so it can't easily send raw data directly either

If it's on the stack you also would be constrained to COW because it can't be copied or shared trivially, so then you lose a major performance benefit anyway

@lemonJumps

To add my 2 cents.
I don't understand why structs must be on the heap or the stack specifically. AFAIK, where data is stored generally comes down to the capabilities of the language and the discretion of the programmer.

But struct has always been "structured data". And you'd use structs to share data that is structured and compatible across libraries, APIs and sockets.

The biggest reason why I want this is dealing with compute shaders and passing buffers of various data to them, which is currently very painful in GDScript.
Having these would also be good for storage buffers in visual shaders once there is official support for them as well.

@AThousandShips
Member

AThousandShips commented Mar 22, 2024

The fact that it can't store it on the stack is because only known size data can be stored there and we're working with dynamic data here, but stack isn't always better and it depends on the context

But agreed (as I said some months ago) the name represents what it is, "structured data" (the name comes from just shortening "structure")

@zhehangd

As we will certainly have a function to convert a struct object to PackedByteArray, I think we should be allowed to specify different layouts, in order to be compatible for different targets such as C/C++ structs, glsl uniform block. It will make the life a lot easier to use with GDExtension and compute shaders.

@Norrox

Norrox commented Apr 3, 2024

It's great seeing the discussion here. I only have one question.
Is anyone working on this?

@ltecheroffical

As we will certainly have a function to convert a struct object to PackedByteArray, I think we should be allowed to specify different layouts, in order to be compatible for different targets such as C/C++ structs, glsl uniform block. It will make the life a lot easier to use with GDExtension and compute shaders.

Like this?

# Pseudo code
var transfer: PackedByteArray # pretend anything here is transferred to GLSL/C/C++ under the same variable name

struct A:
    var integer: int
    var integer2: int

var A_instance = A(69, 420)
transfer = some_function_to_serialize_it(A_instance, same_layout_as_defined)
// Pseudo code
struct A
{
    int integer;
    int integer2;
};

A A_instance = {};

A_instance = *reinterpret_cast<A *>(&transfer);
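
On the C++ side, a byte-wise copy with `std::memcpy` is a common alternative to the cast in the pseudo-code, since it avoids alignment and aliasing problems. The sketch below is standalone and makes several assumptions: `std::vector<std::uint8_t>` stands in for PackedByteArray, the fields are assumed to be 32-bit integers (GDScript's own int is 64-bit, so a real layout option would have to specify the width), and both sides are assumed to share the same endianness. `to_bytes` and `from_bytes` are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical C++ counterpart of struct A from the pseudo-code above.
struct A {
	std::int32_t integer;
	std::int32_t integer2;
};

// Byte-wise copies are only well-defined for trivially copyable types.
static_assert(std::is_trivially_copyable_v<A>);
// The agreed layout has no padding between or after the two fields.
static_assert(sizeof(A) == 2 * sizeof(std::int32_t), "unexpected padding");

// Packs the struct into a byte buffer.
std::vector<std::uint8_t> to_bytes(const A &value) {
	std::vector<std::uint8_t> bytes(sizeof(A));
	std::memcpy(bytes.data(), &value, sizeof(A));
	return bytes;
}

// Unpacks a struct from the buffer; returns false when the buffer is too small.
bool from_bytes(const std::vector<std::uint8_t> &bytes, A &out) {
	if (bytes.size() < sizeof(A)) {
		return false;
	}
	std::memcpy(&out, bytes.data(), sizeof(A));
	return true;
}
```

The two `static_assert` lines are where a layout mismatch would surface at compile time, which is the kind of guarantee an explicit layout option in GDScript would need to match.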

@nlupugla

nlupugla commented Apr 3, 2024

It's great seeing the discussion here. I only have one question. Is anyone working on this?

Yup! See godotengine/godot#82198. I haven't made much progress in the past two months or so, but I should have time to jump back in to it soon :)

@FireCatMagic

How would structs work in GDExtension from methods like raycasting? Would they automatically get converted to C++ structs (would be nice) or would they still basically be Arrays?

@geekley

geekley commented Apr 24, 2024

IMHO ideally structs in GDScript should have semantics similar to C# structs as much as possible.
I think C# has the best struct semantics and implementation, and this should avoid potential problems in C# interop.

C# structs:

  • don't have their own intrinsic identity like objects (i.e. a pointer); their identity is defined by their contents
    • so they use value equality
  • are passed by value / copy
  • have their contents embedded in their definition; e.g. they are on the stack if defined in a local variable, or embedded in their container if defined in a class or in another struct (i.e. they can be on the heap)
    • this means declaring a struct variable/field is basically like "unwrapping" its field declarations in the same place
  • in C#, generics (arrays, dictionaries) of structs are automatically "packed"; that would be what is called FlatArray here (though I'd suggest PackedArray[MyStruct] for consistency if we're going that route)
  • however, structs can also be boxed/unboxed in an object, where they are passed by reference (think Variant in GDScript)
  • in C#, struct fields are initialized to zero/false/null whenever you use the default allocation (e.g. when making/resizing an array with more entries)

This is what I'd expect of a good C#-compatible struct implementation in core. It would be awesome if as many of these as possible could work the same in GDScript.

Moreover, these would also be very useful:

  • Allow methods, even if it's just static methods
  • Allow static fields too
  • At least a constructor setting all fields (could be automatically defined) and a way to allocate setting no fields (i.e. init all to zero/false/null), think default(MyStruct) in C#
  • An automatic (default?) string representation in GDScript, e.g.: MyStruct(field1: "value1", field2: 3.14)
  • A way to pass a mutable struct to method arguments as a reference (think Span<T> or ref parameters in C#); this could be implemented by a Boxed[MyStruct] class or similar

Honestly, if there's any intention of ever implementing generics in GDScript, I think it'd be better to do it together with this, or at least have it in mind while defining this. Structs are a very important concept in performance-oriented languages/scenarios, and it's best to forward-think really carefully on how to do this right IMHO.
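
As a rough illustration of the value semantics listed above (value equality, no intrinsic identity, changes producing a new value), here is a small Python sketch. It uses a frozen dataclass as a stand-in; the `Placement` type and its fields are hypothetical, and Python objects are still heap references under the hood, so this only mimics the observable behaviour, not the memory layout.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Placement:
    # Hypothetical struct-like value type
    x: int
    y: int
    rotation: float

a = Placement(1, 2, 45.0)
b = Placement(1, 2, 45.0)

# Equality is decided by contents, not by which object it is.
print(a == b, a is b)

# "Changing" a field yields a new value; the original is untouched.
c = replace(a, rotation=10.0)
print(a.rotation, c.rotation)
```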

@nlupugla

Hi @geekley, thanks for the thoughts!

One of the issues you brought up seems to boil down to nominal typing vs structural typing. This has more to do with the semantics of structs, rather than their implementation, and there has been great discussion in that area on this thread: #7903. The consensus so far favors nominal typing (struct name matters) over structural typing (structure of struct matters). If after skimming through the discussion you're still in favor of structural typing, feel free to lay out your arguments there!

Regarding pass by value vs reference, I think there is much less room for debate here. In GDScript, there is no mechanism for turning a value into a reference, but you can always duplicate a reference. Making structs reference types lets users choose when and where they want to make copies, but having them be value types forces users to make copies even when they wouldn't want to.

Admittedly, I don't fully understand your points about being embedded and the box/unboxing. Maybe you can give some examples in pseudo GDScript to demonstrate?

Regarding defaults, users will be able to provide default values like they can with classes. The only difference is that the default values will be required to be known at compile time for structs. For example, you could set a default via preload, but not load. My current plan is to be able to initialize structs with a syntax like MyStruct(field1 = val1, field3 = val3), with any fields not specified being set to their default.

Regarding methods and static stuff, structs must be defined within a class (like GDScript enums). So static stuff can always go in the class that defines them. If GDScript operator overloading is ever introduced on the other hand, I could see a benefit in being allowed to do something like my_struct1 + my_struct2.

I've already implemented the stringification and it works pretty much like you described :)
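
For readers who want to picture the keyword-style initialization and stringification described here, this is a loose Python analogy using a dataclass. It is not the syntax or implementation from the linked PR; `MyStruct` and its fields are placeholders taken from the example above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MyStruct:
    # Defaults are fixed when the type is defined, loosely like compile-time defaults.
    field1: str = "default"
    field2: float = 0.0
    field3: Optional[str] = None

# Only field1 and field3 are given; field2 falls back to its default.
s = MyStruct(field1="value1", field3="value3")

# The generated string form lists the type name and every field.
print(s)
```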

@thimenesup

I find it concerning that in this sizeable discourse all I have seen are suggestions that are likely to add more troubles than solutions.

The core issue is that many of the exposed engine bindings are terrible and the root of the problem is that GDScript doesn't have structs.

Now, the thing everyone seems to be oblivious to is the fact that GDScript actually does have structs, also known as the core types like Vector2, Vector3, Vector4, Plane, AABB, Color etc... but that's pretty much all it supports.

Their behaviour is identical to structs in other languages and already well understood by users (which I assume that may be a concern) and interfaces easily with the engine. As such, I genuinely fail to understand what is the point of any of the implementation proposals.

Now, their usability could still be improved; being able to add a "ref" keyword like C# in function parameters to mutate these would help. But what I find the most egregious (and obvious tech debt) is the existence of the packed arrays: their behaviour is what you would expect from an array of structs, but for some reason they are only implemented for a few of these core types. You want an array of AABB? Too bad, use a Typed Array instead (which is actually backed by Variants).

Why do Packed Arrays even need to be a thing, instead of simply having Typed Arrays work exactly like them? Even the regular Array() could simply be a TypedArray of Variant internally...

These are the only actual considerations for having a struct type implemented (which they already are, just poorly).

Everything was designed on the assumption that the only structs that would ever be needed are the few the engine provides; to be able to define custom ones, the engine must get rid of these band-aids.

@Calinou
Member

Calinou commented May 2, 2024

Why do Packed Arrays even need to be a thing, instead of simply having Typed Arrays work exactly like them? Even the regular Array() could simply be a TypedArray of Variant internally...

PackedArrays are significantly faster than typed arrays thanks to their hardcoded nature, among other reasons. It's all about tradeoffs 🙂

@thimenesup

PackedArrays are significantly faster than typed arrays thanks to their hardcoded nature, among other reasons. It's all about tradeoffs 🙂

You didn't understand the point. The point is that typed arrays should be packed arrays.

@WeaverSong

You didn't understand the point. The point is that typed arrays should be packed arrays.

That would make typed arrays of custom classes impossible, as the devs can't hardcode every possible custom class.

@geekley

geekley commented May 2, 2024

That would make typed arrays of custom classes impossible, as the devs can't hardcode every possible custom class.

I believe what thimenesup meant is that it should work like how it does in C#.

In C#, structs are ALWAYS packed (except for boxing), while classes are NEVER packed.
So when you do Array<MyStruct> it knows that because it's a struct, it should store its contents contiguously in the array, packed. But if you do Array<MyClass>, then only the references are contiguous, as they point to the object.

This is very different from how structs work in e.g. C/C++, where the language makes no assumption about whether you allocate a struct packed or by reference on the heap, etc. (it depends on how you create each object, e.g. with new).
In C# it does - structs are always packed (in stack, or embedded in its container class/struct or even collection types like arrays) except when boxing occurs, of course.

So I believe the point is that there should be some handling in the language that's able to statically determine whether an Array[T] should be packed or not, based on whether T is a class or a struct / basic type.
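
The packed-versus-references distinction can be observed directly with Python's ctypes, used here purely as a sketch; the `Placement` struct and its fields are hypothetical. An array of structs stores the elements back to back in one buffer, whereas a plain list only stores references to separately allocated objects.

```python
import ctypes

class Placement(ctypes.Structure):
    # Hypothetical struct: two 32-bit ints and one 32-bit float = 12 bytes
    _fields_ = [
        ("x", ctypes.c_int32),
        ("y", ctypes.c_int32),
        ("rotation", ctypes.c_float),
    ]

# "Packed" array: three structs stored contiguously in a single buffer.
packed = (Placement * 3)()
packed[1].rotation = 45.0

# Array of references: each element is its own separately allocated object.
refs = [Placement() for _ in range(3)]

# Distance in bytes between neighbouring elements of the packed array.
stride = ctypes.addressof(packed[1]) - ctypes.addressof(packed[0])
print(ctypes.sizeof(Placement), ctypes.sizeof(packed), stride)
```

With the packed form, the stride equals the struct size, which is the cache-friendly layout the thread is asking for.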

@geekley

geekley commented May 3, 2024

@nlupugla

nominal typing (struct name matters) vs structural typing (structure of struct matters)

Not sure I understand, but if we were to go the C#-like route, everything matters. Type name, member name and member order. Structs have no implied equivalence with other struct types, even if their structure and member names match.
And you need to know what their type is statically to be able to handle them without boxing (e.g. into a Boxed[T] Variant, see below). If you need to box a value, it works similarly to a class (handled by reference) until it's unboxed again.

Admittedly, I don't fully understand your points about being embedded and the box/unboxing. Maybe you can give some examples in pseudo GDScript to demonstrate?

Being embedded means it doesn't cause pointers/references in memory. Say we have this:

struct Placement:
  var position: Vector2i
  var rotation: float

class TargetObj:
  var name: String
  var placement: Placement

The memory layout would be as if the declaration really means this:

class TargetObj:
  var name: String
  var placement.position: Vector2i
  var placement.rotation: float

Meaning it's not a pointer to its contents elsewhere, the (member's) contents are literally THERE.
Same applies if it's defined within another struct, or in a local variable.
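
The embedding described above can be checked with a small ctypes sketch in Python. This is only an analogy under assumptions: the String `name` is replaced by a hypothetical 32-bit `name_id` so the layout arithmetic stays simple, and the field types are guesses at what a Vector2i and a float might occupy.

```python
import ctypes

class Placement(ctypes.Structure):
    _fields_ = [
        ("x", ctypes.c_int32),
        ("y", ctypes.c_int32),
        ("rotation", ctypes.c_float),
    ]

class TargetObj(ctypes.Structure):
    # The Placement field is stored inline, not behind a pointer.
    _fields_ = [
        ("name_id", ctypes.c_int32),
        ("placement", Placement),
    ]

# Size is the id plus the full contents of Placement: 4 + 12 bytes.
print(ctypes.sizeof(TargetObj))
# The embedded struct starts right after name_id.
print(TargetObj.placement.offset, TargetObj.placement.size)

t = TargetObj()
t.placement.rotation = 90.0  # writes straight into t's own memory
print(t.placement.rotation)
```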

This is appropriate, because structs don't have intrinsic identity even conceptually (if one does, you're probably using it wrong and it should be a class instead). So you deal with structs without "referencing" them. Only if you absolutely need to deal with them as references (e.g. because you need to use dynamic typing) do they become references, through boxing.

Regarding boxing, if GDScript had struct semantics ideally similar to C#, we would be able to do something along these lines. I'm using Boxed[T] mostly for clarity, but it could be implemented as a new variant type, which would bridge static/struct/value semantics and dynamic/class/reference semantics.

func change_placement(target: TargetObj, placement: Placement) -> void:
  target.placement = placement

func run() -> void:
  # target object used by the change_placement() calls below
  var target: TargetObj = TargetObj.new()
  # allocated on the stack
  var value_1: Placement = Placement(Vector2i(1, 2), 45)
  # passed by value, so contents are copied
  var value_2: Placement = value_1
  # packed|flat array without pointers, contents are contiguous
  var packed_structs: PackedArray[Placement] = [value_1, value_2]
  # 2 struct values which are equal in content
  prints(packed_structs)
  
  # boxing operation (copies contents to a reference on the heap, which is a Variant)
  var boxed_1a: Boxed[Placement] = value_1 # `as Boxed[Placement]` can be implied
  # passed by reference, so only pointer is copied
  var boxed_1b: Boxed[Placement] = boxed_1a
  # array of pointers
  var references: Array = [boxed_1a, boxed_1b]
  # 2 pointers to the same boxed object
  prints(references)
  # when object is changed...
  boxed_1a.rotation = 10
  # since it's passed by reference, both pointers to the object reflect the above change
  prints(references)
  
  # auto-boxing happens when struct value needs to work as a Variant
  var mixed: Array = ['a', 0, value_1]
  # mixed[2] is Boxed[Placement]
  prints(mixed)
  
  # unboxing operation (copy contents from heap) happens automatically
  var unboxed_a: Placement = boxed_1a # `as Placement` is implied
  # changes only local copy on stack
  unboxed_a.rotation = 0
  # 1st has contents on the stack, 2nd is a reference; rotation is different
  prints(unboxed_a, boxed_1a)
  
  # passed by value (contents are copied)
  change_placement(target, packed_structs[0])
  # auto-unboxing operation: Boxed[Placement] -> Placement parameter
  change_placement(target, references[0])
  # auto-boxed (because it needs to become a Variant) then auto-unboxed inside the function
  change_placement.call(target, value_1)

I used PackedArray conceptually for clarity. Ideally we wouldn't even need a separate array type because the compiler would already interpret Array[MyStruct] as being packed. If you don't want that for some reason, you could use Array[Boxed[MyStruct]]; or just Array[Variant] or Array if you need to mix types.
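
Since the GDScript above is hypothetical, here is a minimal runnable sketch of the same boxing idea in Python. The `Boxed` class and its `unbox` method are invented for illustration: boxing copies the value into a shared container, and unboxing copies it back out.

```python
import copy
from dataclasses import dataclass

@dataclass
class Placement:
    x: int
    y: int
    rotation: float

class Boxed:
    """Hypothetical box: owns a copy of a value and is passed around by reference."""
    def __init__(self, value):
        self.value = copy.copy(value)   # boxing copies the contents into the box

    def unbox(self):
        return copy.copy(self.value)    # unboxing copies the contents back out

value_1 = Placement(1, 2, 45.0)
boxed_1a = Boxed(value_1)
boxed_1b = boxed_1a                     # same box; only the reference is copied
boxed_1a.value.rotation = 10.0          # visible through both references

unboxed = boxed_1a.unbox()
unboxed.rotation = 0.0                  # changes only the local copy

print(value_1.rotation, boxed_1b.value.rotation, unboxed.rotation)
```

The original value, the boxed value and the unboxed copy end up with three different rotations, matching the comments in the pseudo GDScript.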

@tektrip-biggles

Being able to define small, lightweight, pass-by-value struct-like data in GDScript would be so so helpful.

I can sympathise with the argument that without a dereference operator, pass-by-reference gives users the most flexibility but even if they end up being called something else, I still feel pretty strongly that there needs to be some kind of user-definable pass-by-value type...

Agree 100% that things like Vector2 & Color etc are examples of how useful such things can be, though they only cover a small subset of use cases and there's currently no way to define one's own.

Can you imagine how tiresome it would be to have to manually call duplication functions every time you wanted to use assignment with one of those and the number of bugs people would be reporting whenever they forgot to do so?

@tektrip-biggles

tektrip-biggles commented Jun 3, 2024

It would also (imho) help to simplify a lot of workflows that are currently very clunky in the editor due to having to use "Resource" types for so many things where you really just want some simple & nicely structured basic data types. Currently always having a reference to some data resource means you end up accidentally sharing those references all over the place because you forgot to mark each one as "local to scene"...

If you've spent much time in the various help forums etc, you'll see this tripping people up over & over again.

Then there are the issues when you do want to duplicate a Resource which has other nested Resources & there's a whole lot of boilerplate required to write your own "copy" functions just to make sure the new copy has its own sub-resources & doesn't still reference the same things as the original. And that's just on the code side. When copying & pasting values in the editor, the potential for errors goes up dramatically!

Overall it just means you have to jump through way too many hoops to achieve things which should really be quite trivial imho, unless I'm totally missing something?

@Lucrecious

@tektrip-biggles

Currently always having a reference to some data resource means you end up accidentally sharing those references all over the place

Then there are the issues when you do want to duplicate a Resource which has other nested Resources & there's a whole lot of boilerplate required to write your own "copy" functions

These are really just general issues when deciding how you want to copy and/or move data. For these issues you're having, I think better editor and code options would be better.

For copying resources, I'd be surprised if there's not already a "duplicate with subresources recursively" function you can use, but if there isn't, you only need to write this function yourself once. The reflection in gdscript is good enough that you can just do this.

For accidentally sharing resources, I agree that there's friction there but I think this is still an editor issue. I think a good way to mark resources as "unique" or a "duplicate recursively" option would go a long way if they aren't already present.

@lemonJumps

@tektrip-biggles

For accidentally sharing resources, I agree that there's friction there but I think this is still an editor issue. I think a good way to mark resources as "unique" or a "duplicate recursively" option would go a long way if they aren't already present.

This is also a common problem in Python. While the option for a recursive copy is one way to fix it, I'm thinking it might also be worth somehow showing the user which part of the data is used where during debugging.
Since referencing the same data is often used to speed up things, it might be worth it for the user to see where everything is shared, so they can make better decisions.
I'm thinking something like a hierarchy graph that just shows users/contents, or a node graph would serve well for this purpose. :D
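
For anyone unfamiliar with the Python version of this problem, a short sketch of shallow versus recursive copying follows. The `Enemy` and `Stats` classes are made up; `Stats` plays the role of a nested sub-resource.

```python
import copy

class Stats:
    def __init__(self, hp):
        self.hp = hp

class Enemy:
    def __init__(self, name, stats):
        self.name = name
        self.stats = stats  # nested object, analogous to a sub-resource

original = Enemy("slime", Stats(10))
shallow = copy.copy(original)      # new Enemy, but it shares the same Stats
deep = copy.deepcopy(original)     # new Enemy with its own copy of Stats

original.stats.hp = 1

# The shallow copy sees the change through the shared Stats; the deep copy does not.
print(shallow.stats.hp, deep.stats.hp)
```

The accidental-sharing bugs discussed above correspond to the shallow case, where two owners silently point at the same nested data.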

@tektrip-biggles

@Lucrecious Sure, in my case I've already created a subclass of Resource which I use for my nested data structures that I want to be able to easily duplicate & a function that does exactly what you suggest via reflection. It works for now, but also feels very much like a duct-tape workaround / boilerplate-y way to do things that seems to be going "against the grain" for how the engine likes to do things. That always gives me pause since it can quickly spiral into increasingly convoluted workarounds & more duct-tape-y code being required down the line.

Like it feels like I'm using Resources wrong by pretty much never wanting to pass them by reference or save them as files or whatnot, I just want a properly typed container for an enum value and a couple of integers (or maybe a container of other pass-by-value struct-like objects etc) that I can return from a function or pass around between functions or build more complex data structures out of etc. Something that would probably only be a handful of bytes (rather than around the 13kb mentioned in the original post which could easily be 100-1000x bigger than necessary?!?). Tbh the memory thing is probably the least important aspect for my use case, but when something's 2-3 orders of magnitude off that does again give me pause.

At one point I was seriously considering just serialising and deserialising everything to/from json strings mainly to ensure that what's being passed around is just values and not references to values...

I guess the alternative would be to lean heavily on nested Dictionaries (as mentioned in the original post as well) but you lose any kind of type checking completely by going that route, right?

@tektrip-biggles

Say I wanted to implement my own Vector5 that behaved like the existing Vector4 objects etc, what would be the best way currently to do that?

@azur-wolf

azur-wolf commented Jun 6, 2024

Say I wanted to implement my own Vector5 that behaved like the existing Vector4 objects etc, what would be the best way currently to do that?

do it with C++ | GDExtension
