Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Implement structs (WIP) #13800

Draft
wants to merge 30 commits into
base: master
Choose a base branch
from
Draft

Conversation

iluuu1994
Copy link
Member

@iluuu1994 iluuu1994 commented Mar 25, 2024

This is some early experimentation on structs, which are objects with value semantics (the same as arrays and strings). The main motivation is making data structures modeled with classes more ergonomic. For example, there's a desire for a faster Vector implementation (like the one from php-ds). However, reference types can lead to defensive copying to avoid accidental changes at a distance to an owned value, or bugs if copying is mistakenly omitted. readonly solves the latter by completely disallowing mutation. However, this essentially just forces a copy, which is bad for performance, especially for big data structures.

Structs instead automatically "separate" (clone) themselves from any other reference only once a modification is made to the object. If it is only referenced from a single place, no copy is needed. These are also called CoW (copy-on-write) semantics.

Method calls pose an interesting problem: When there is a chain of property accesses, ending in a method call, the entire chain must be separated to avoid the change from leaking to other references. However, with the standard method call syntax, it is not clear whether a call will refer to a self-mutating method call (imagine a Vector::push() method, requiring separation), or an immutable method (not requiring separation) in advance. To signal to the engine that the chain needs to be separated, as it would for arrays on $a['b']['c] = 'c';, we use the $parent->children->push!($child); syntax instead. If $parent is referenced from multiple places, it will be cloned and the clone will be stored in $parent. The same happens for the children property. This ensures that any other reference to the same instance as $parent remains unmodified.

Side note: We're looking for a different syntax for method calls, because Bob would like to use it for macros at some point.

TODOs:

  • mutating at decl-site support
  • Disallow SplObjectStorage/WeakMap. Hashing structs is complex and should be solved in a separate RFC.
    • References are problematic for hashes. If a data object contains references, the reference may implicitly change the hash for objects that are already stored in some bucket. Any future lookup will be impossible. So, using data objects containing references as keys must necessarily clone the object and unref any references.
    • SplObjectStorage internally adds objects to an array with the handle as the key. This key is guaranteed to be unique. Using a hash for structs will not be straight forward, because we cannot avoid hash collisions. Thus, we will either need support for structs in arrays themselves, or we'll need to add a higher-level bucket to the underlying array so that we can handle collisions ourselves.
    • WeakMap is a weird use-case for data values, because their RCs change unpredictably. It might be best to disallow them as WeakMap keys. Moreover, WeakMap will not add a refcount for its object key, making it possible to change the hash by modifying the object in-place, leaving the object in the wrong bucket.
  • Disallow ArrayObject to avoid uncontrolled changes and integer keys
  • Opcache/Optimizer
    • RC inference
    • JIT
  • Disallow ReflectionProperty::setValue() and ReflectionMethod::invoke(). They require @prefer-ref in userland for overrides of these methods.

Benchmark:

Valgrind shows a small performance regression. The real-time showed a small slowdown of +0.07% for mean and +0.10% for fastest of 20 runs with -T10,300 of the Symfony Demo benchmark.

Details

Before:
2.995101 + 2.997169 + 2.998048 + 3.000360 + 3.000639 + 3.003076 + 3.003258 + 3.003537 + 3.003724 + 3.003811 + 3.004047 + 3.004114 + 3.004636 + 3.004637 + 3.004900 + 3.005175 + 3.005566 + 3.006546 + 3.007003 + 3.007804
After:
2.997989 + 3.000772 + 3.003236 + 3.003272 + 3.003358 + 3.003549 + 3.004557 + 3.004694 + 3.004716 + 3.004723 + 3.004893 + 3.005222 + 3.005734 + 3.007125 + 3.007395 + 3.007596 + 3.008235 + 3.008681 + 3.009244 + 3.010537

Mean: 100 / 3.00315755 * 3.0052764 = +0.07%
Fastest: 100 / 2.995101 * 2.997989 = +0.10%

@iluuu1994 iluuu1994 added the RFC label Apr 2, 2024
Zend/zend_compile.h Outdated Show resolved Hide resolved
Zend/zend_compile.c Show resolved Hide resolved
Zend/zend_types.h Show resolved Hide resolved
Zend/zend_operators.c Show resolved Hide resolved
Zend/zend_vm_def.h Outdated Show resolved Hide resolved
Zend/zend_vm_execute.h Show resolved Hide resolved
ext/zend_test/test.stub.php Show resolved Hide resolved
@@ -0,0 +1,27 @@
--TEST--
Copy link
Contributor

@mvorisek mvorisek Apr 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see a test with spl_object_id() confirming the expected CoW semantic like:

$o = new data class;
$origId = spl_object_id($o);

$fx = function ($o) use ($origId) {
    var_dump(spl_object_id($o) - $origId);

    $o->x = 2; // mutate the data class instance to enforce object copy
    var_dump(spl_object_id($o) - $origId);
    $o->x = 3;
    var_dump(spl_object_id($o) - $origId);

    $o2 = $o;
    var_dump(spl_object_id($o2) - $origId);
    $o2->x = 3;
    var_dump(spl_object_id($o2) - $origId);
    $o2->x = 12;
    var_dump(spl_object_id($o2) - $origId);
    $o2->x = 13;
    var_dump(spl_object_id($o2) - $origId);
};
$fx();

var_dump(spl_object_id($o) - $origId);

@iluuu1994 iluuu1994 changed the title [RFC] Implement data classes (WIP) [RFC] Implement structs (WIP) May 8, 2024
Comment on lines +9112 to +9114
zend_object_clone_obj_t clone_call = Z_OBJ(EX(This))->handlers->clone_obj;
ZEND_ASSERT(clone_call);
ZVAL_OBJ(result, clone_call(Z_OBJ(EX(This))));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you separate the object at this point? Why not to delay this for CoW?

Comment on lines +1042 to +1044
if (OP1_TYPE & (IS_VAR|IS_CV)) {
SEPARATE_DATA_OBJ(object);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why it is not necessary to separate data objects for $this?

Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I don't like the "!" syntax at call-site. I understand that it mainly need for the efficient implementation. It reflects that we call the "mutating" method (and need to separate arryas and structs before INIT_METHOD_CALL). I would prefer to use regular call syntax and not to think about "!" at each call.

  2. Now it's allowed to modify $this "struct" in non-"mutating" methods. This should be prohibited or "mutating" flag should be add explicitly or it should be removed at all. There is some design problem...

  3. Now it's possible to have "struct" with dynamic properties - "#[AllowDynamicProperties]". Really I would expect "struct" to be non-extensible.

  4. I don't see why "struct" can't be inherited by another "struct". In C# this is done to keep structs small and allocate them on stack. We use regular heap-allocated "zend_object"s anyway.

  5. There is something wrong with comparison

<?php
struct S1 {
	function __construct(public int $a, public int &$b) {}
}
$c = $d = 2;
$a = new S1(1, $c);
$b = new S1(1, $d);
var_dump($a === $b);
var_dump($a, $b);
var_dump($a === $b);
?>
--OUTPUT--
bool(false)
object(S1)#1 (2) {
  ["a"]=>
  int(1)
  ["b"]=>
  &int(2)
}
object(S1)#2 (2) {
  ["a"]=>
  int(1)
  ["b"]=>
  &int(2)
}
bool(true)
  1. The behavior becomes too complex... Can you predict the output before running this? Did you guess and how many time did you think?
<?php
struct S1 {
	function __construct(public int $a) {}
	public mutating function foo() {
		echo __FUNCTION__,"\n";
		$this->a++;
	}
}
struct S2 {
	function __construct(public S1 $s) {}
	public mutating function bar() {
		$this->s->foo!();
		return $this->s;
	}
}
$s = new S2(new S1(1));
$copy = $s;
$s->bar!()->foo!();
var_dump($s, $copy);
?>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants