Implementation of references in Zephir #609

Open
andresgutierrez opened this issue Nov 4, 2014 · 29 comments

Comments

@andresgutierrez
Contributor

andresgutierrez commented Nov 4, 2014

Regarding references, I've been mulling it over. While PHP provides references to simulate or implement an idea similar to pointers in C, Zephir was designed around a computational model like JavaScript's, where references do not exist in the same manner as pointers.

JavaScript always passes by value (call-by-sharing), but when a variable refers to an object, all its properties are treated as references. Zephir currently has this behavior: modifying the properties of an object does not perform separation of the underlying object.

As Zephir is intended to complement PHP, and many current libraries/frameworks depend on references, there is a need to implement references in Zephir too.

Zephir currently maps dynamic/variant polymorphic values to zval structures, this structure looks like:

typedef union _zvalue_value {
    long lval;                  /* long value */
    double dval;                /* double value */
    struct {
        char *val;
        int len;
    } str;
    HashTable *ht;              /* hash table value */
    zend_object_value obj;
    zend_ast *ast;
} zvalue_value;

struct _zval_struct {
    /* Variable information */
    zvalue_value value;     /* value */
    zend_uint refcount__gc;
    zend_uchar type;    /* active type */
    zend_uchar is_ref__gc;
};

typedef struct _zval_struct zval;

The relevant part is zend_uchar is_ref__gc, which marks a variable as a reference (1) or a non-reference (0).

However, marking a variable as a reference by setting is_ref__gc to 1 does not always work: the zvalue_value union that stores the zval's value is not a pointer, so primitive values like null, long, double and bool aren't tied to the zval the reference points to. The following example illustrates this situation:

zval *a, *b;

// Assign $a = 10;
MAKE_STD_ZVAL(a);
ZVAL_LONG(a, 10);

// Create reference $b pointing to value $a
MAKE_STD_ZVAL(b);
Z_TYPE_P(b) = Z_TYPE_P(a);
b->value = a->value;
Z_SET_REFCOUNT_P(b, 1);
Z_SET_ISREF_P(b);

//Change value of $b
ZVAL_LONG(b, 20); // This changes the value of $b but not $a
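The same effect can be reproduced without the Zend headers. The following is only a minimal sketch: mini_zval, make_flagged_copy and write_through_flagged_copy are made-up names standing in for the real structures, assuming (as described above) that the value is stored inline in the struct.

```c
#include <assert.h>

/* Hypothetical stand-in for the PHP 5 zval: the value is stored inline. */
typedef struct {
    long lval;              /* inline value, like zvalue_value.lval */
    unsigned int refcount;  /* like refcount__gc */
    unsigned char is_ref;   /* like is_ref__gc */
} mini_zval;

/* Builds a "reference" by copying the value and setting the flag. */
static mini_zval make_flagged_copy(const mini_zval *src)
{
    mini_zval copy = *src;
    copy.refcount = 1;
    copy.is_ref = 1;
    return copy;
}

/* Returns the value of a after writing through the flagged copy. */
static long write_through_flagged_copy(void)
{
    mini_zval a = { 10, 1, 0 };
    mini_zval b = make_flagged_copy(&a);

    b.lval = 20;            /* only b's inline storage changes */
    assert(b.is_ref == 1);  /* the flag is set, but it does not help */
    return a.lval;          /* a still holds its original value */
}
```

Calling write_through_flagged_copy() returns 10: the flag marks b as a reference, but nothing ties b's storage to a.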

Currently, PHP handles references internally in its core/extensions by using a pointer to a zval pointer (zval**). With this, the implementation changes but it works as expected:

zval *a, **b;

// Assign $a = 10;
MAKE_STD_ZVAL(a);
ZVAL_LONG(a, 10);

// Create reference $b pointing to value $a
b = &a;

//Change value of $b
ZVAL_LONG(*b, 20); // This changes the value of $a and $b
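Again as a sketch outside the Zend API: boxed_long, assign_through and write_through_double_pointer are hypothetical names, and the only assumption is that the reference is a pointer to the variable's own pointer slot.

```c
#include <assert.h>

/* Hypothetical stand-in for a heap value reached through a pointer. */
typedef struct {
    long lval;
} boxed_long;

/* Writes through a pointer-to-pointer, the way a zval** reference would. */
static void assign_through(boxed_long **ref, long value)
{
    (*ref)->lval = value;
}

/* Returns the value seen through a after writing through b. */
static long write_through_double_pointer(void)
{
    boxed_long storage = { 10 };
    boxed_long *a = &storage;   /* $a */
    boxed_long **b = &a;        /* $b = &$a: b points at a's slot */

    assign_through(b, 20);
    assert((*b)->lval == 20);
    return a->lval;             /* same storage, so a sees the write */
}
```

Here write_through_double_pointer() returns 20, because both names resolve to the same storage.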

Implementing this way of handling references won't be easy: a variable can start out as a non-reference and later be converted to a reference:

$b = 100; // $b is not a reference (zval *)
$a = "hello";
$b = &$a;  // $b is now a reference (zval **)

Zephir generates C code for every symbol in a method and it ties a variable to a specific structure that must not change across execution, so a variable must remain zval* or zval**. In short, a dynamic/polymorphic variable declared with 'var' cannot be both a reference and a dynamic value at the same time.

Now that I have explained the problem, you can understand why implementing references the same way as PHP would not be possible.

We have two options here:

  • Not implement references at all, keeping the current behavior as is
  • Or, introduce a new type 'ref' to have monomorphic references in Zephir (variables that are always references)

The usage of these variables will be:

// Receive a parameter as a reference, this replaces &a
public static function someMethod(ref a)
{
    //...
}

public static function someMethod(var b)
{
    ref a;
    // Create a reference to b
    let a = b;
}

public static function someMethod()
{
    // Invalid default value
    ref a = 0;

    // Invalid assignment
    let a = 0;
}

//This method returns a reference
public static function someMethod() -> ref
{
    ref x;  var a;

    let a = 100, x = a;
    return x;
}

This implementation removes the & operator for defining/creating references, since the intention to create a reference is implicitly inferred from the assignment to a ref variable.

The disadvantage of this implementation is the need to implement all valid operations for this type 'ref' right through the whole language.

Thoughts?

@phalcon

phalcon commented Nov 4, 2014

Related to #203

@carvajaldiazeduar
Contributor

I agree with your approach, but there is one problem to solve. Using ref eliminates the ability to filter, validate or use a specific static data type for the parameter. If I want a variable to be a reference to an integer, there would be no way to do that.

I propose the following syntax to at least allow variables to be filtered or validated according to certain type:

function someMethod(ref:int a)
{
    let a = intval(a); //implicit type conversion by Zephir
}

function someMethod(ref:int! a)
{
    if (typeof a != "int") //throw exception if the variable does not match the type
}

$a = 7;
someMethod($a); //  pass addr of $a

This syntax could be also used to validate regular dynamic variables:

function someMethod(var:int! a) // variable is dynamic but type passed must be int
{
}

function someMethod(var:array a) // variable is dynamic but type passed must be converted to array
{
}

@andresgutierrez
Contributor Author

@carvajaldiazeduar Good catch, hadn't thought about it.

Also, we need to think about how memory will be managed for references: if the original memory a reference points to is freed, we will probably get a segfault, so we need to increase the ref-count of the original variables and track references in the memory frame to release them properly.
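A rough sketch of that idea, not Zephir's actual memory manager: counted_val, val_new, val_addref, val_release and live_values are hypothetical names. The reference takes its own count on the value, so releasing the original variable first does not free the memory the reference still uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted value. */
typedef struct {
    long lval;
    unsigned int refcount;
} counted_val;

static int live_values = 0;     /* allocations that are not yet freed */

static counted_val *val_new(long v)
{
    counted_val *p = malloc(sizeof *p);
    if (p == NULL) abort();
    p->lval = v;
    p->refcount = 1;
    live_values++;
    return p;
}

static void val_addref(counted_val *p) { p->refcount++; }

/* Drops one count and frees the value when nobody holds it anymore. */
static void val_release(counted_val *p)
{
    if (--p->refcount == 0) {
        free(p);
        live_values--;
    }
}

/* Returns the value read through the reference after the owner is gone. */
static long reference_outlives_owner(void)
{
    counted_val *a = val_new(100);
    counted_val *ref = a;       /* reference to a's value */
    val_addref(ref);            /* the reference holds its own count */

    val_release(a);             /* the original variable is released first */
    assert(live_values == 1);   /* the value is still alive */

    long seen = ref->lval;      /* safe: the count kept the memory valid */
    val_release(ref);           /* last holder frees it */
    assert(live_values == 0);
    return seen;
}
```

Without the val_addref call, the first val_release would free the value and the later read through ref would touch freed memory, which is the segfault scenario described above.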

@igorgolovanov
Contributor

👍

@ovr
Contributor

ovr commented Nov 21, 2014

Guys, maybe someone knows about PHP-CPP?

@andresgutierrez
I think a C++ layer could help us track variables (memory allocation/deallocation),

like in PHP-CPP:
https://github.com/CopernicaMarketingSoftware/PHP-CPP/blob/master/zend/value.cpp#L298

It provides an easier way to allocate/deallocate memory.

About 5 months ago, @sjinks wrote a PHP C++ extension with an
allocation strategy for zvals.

ping @sjinks

@andresgutierrez
Contributor Author

@ovr Maybe my explanation was not clear, but:

  • Implementing a C++ layer is not viable now anyway, and I'm not sure why it would fix the problem or which problem it fixes; Zephir already has a memory manager to track and release variables
  • The problem is solved using a double pointer to a zval instead of a simple zval pointer

What problem are you trying to solve?

@ovr
Contributor

ovr commented Nov 21, 2014

@andresgutierrez
I was trying to say
that if we use C++, the code will be clearer (not only for this task, but for all Zephir kernel components)

@wicaksono

I vote B,

I think it would be cool if we were allowed to write some raw C code inside Zephir, just like C allows programmers to write inline asm

@ghost

ghost commented Dec 18, 2014

I vote B,

@kse300489

Any progress?

@baszczewski
Contributor

Hello. I would like to say that references are the most needed functionality to implement in Zephir. Is there any progress?

@fezfez
Contributor

fezfez commented Apr 28, 2015

@baszczewski : 👍

@valVk

valVk commented May 6, 2015

I want to believe that this issue will be solved
B option is nice

@fezfez
Contributor

fezfez commented Jul 12, 2015

Any news ?

@fezfez
Contributor

fezfez commented Jul 17, 2015

With B option :

Php

$myArray = array(1, 2, 3, 4);

foreach ($myArray as &$value) {
    $value = $value * 2;
}

// $myArray is now array(2, 4, 6, 8)

Zephir

var myArray;

let myArray = [1, 2, 3, 4];

for ref value in myArray {
    let value = value * 2;
}

// myArray is now [2, 4, 6, 8]

@andresgutierrez, @phalcon: Would this be a correct translation?

@tyurderi

Something new? You know, that we all want this problem solved. @andresgutierrez

@lucups

lucups commented Aug 31, 2015

How is everything going ?

@joeyhub

joeyhub commented Oct 2, 2015

I'm not sure the problem can be solved well with a basic syntactical design. There are some core elements of the framework that could benefit review. It might not be a good idea to bolt this kind of feature on while the codebase is still unstable not only in terms of being an early version but also in terms of specification and design.

Phalcon was created to provide a fast framework for PHP. It gave performance at the cost of some flexibility and it only removed overhead around applications. Many people have particular bottlenecks in their own plain PHP code that are slow. Otherwise even plain PHP might be fast enough especially as there are many strategies for scaling it and making performance less important as long as latency is below a certain amount. The key here is that performance was the objective for phalcon. This is its selling point.

I assume that the problem at that time is that people who knew C and what they were doing wanted to make something that would let them be more productive but without significantly compromising the performance gains they set out to make. We already have a slow but safe wrapper for C - PHP. With this tool creating a richer framework that could be more quickly altered is one benefit but it doesn't get around the fact that it is still optimising around others' code and if your code is slow it won't help. The drift starts here because the tool becomes targeted at others. Here you have two things that conflict and you have to make some decisions what to stick to. I still think that performance over foolproofing comes first and foremost as this was as far as I can see always the original driving force for Phalcon/Zephir.

I've written a few PHP extensions and I can understand the original idea behind zephir somewhat intimately. The PHP internals tend to change and are particularly complex (owing a lot to a lack of thorough documentation). A lot of us straight convert our PHP into C and optimise when we find a bottleneck or a hotspot. This is extremely tedious and if you do it a lot sooner or later you will end up building your own tools for it.

I've always wanted something that would convert PHP straight to its equivalent C because basically that's what PHP does as a C wrapper and a language based in C at runtime. There is phpc but this is unmaintained and in a woeful condition. I also could not make heads or tails of the C it produced. The PHP syntax is a bit over expressive for many purposes and not really properly designed so a new syntax and parser as we see in zephir makes sense. However this should be a full, well-crafted parser that builds an AST and can handle recursion automatically so any expression just works (like with Javascript), otherwise it doesn't make much sense.

The issue I have with Zephir on this front is that it deviates from the basics of memory management in an attempt to provide perfect safety and turns away from being a tool for experts. At least this is my opinion; I have not investigated deeply to confirm this. In my opinion it might be better to leave more aspects of memory management to the user, and zephir should not be reinventing the wheel. Instead it might provide warnings or static analysis to help the users. This would make zephir something that stands out from the crowd and does things differently. Also from this position it will be easier to gain metrics and information to make parsing, compilation and static analysis better able to detect potential issues, making it easier to progressively improve the memory management situation without impacting performance. This would simplify zephir greatly and keep it on its original path. I can create extensions easily with zephir. I can't create good high-performing extensions with it by a long shot. For me this comes down largely to an attempt to automatically manage memory and basically try to do too much of the programmer's work for them, at least in what I have seen thus far and from my perspective as an advanced programmer.

Versus allowing any PHP developer to create extensions or and any C/PHP developer that knows what they are doing to create extensions I would prefer the latter. We already have PHP-CPP, PHP 5.7 and Hack. v8 extensions suffer a similar fate where they are so high level with a lack of internals access that code is often slower in C++ extensions than in native JS (both are still sometimes 10 times slower or more than PHP-C). Not to mention other languages people could migrate to. None of these give the consistent speed improvements you get with C. If I wanted an "easy" way to make a small performance improvement I would use one of the more stable options (hiphop). I see the potential in zephir to make huge performance improvements at a low cost. Sometimes you might get lucky but time after time I implement an extension in C and it is faster than the other solutions from two times to over ten times. zephir is close to offering close to the same performance gains in principle that PHP-C can achieve but if it continues down the current path I have doubts that it will achieve such performance.

Perhaps it could be possible to make a memory manager that works and allows zephir to offer comparable performance to handwritten C but I am skeptical and know that complex automatic memory management is a lot harder to implement. Personally I would rather sacrifice some safety for massive performance gains. This is also effort that could be better spent in other areas.

Zephir currently maps dynamic/variant polymorphic values to zval structures, this structure looks like

If this changes, I suspect it will make zephir slower in certain cases unless phalcon manages to create something amazing. The same problem already exists in PHP-CPP. This will turn it into an inner platform with layers of wrapping and overhead that might prove difficult to optimise.

I believe that zephir would best fill a strong use case if it stuck to producing as raw PHP C as possible for extensions while avoiding adding any kind of auxiliary automatic functionality to wrapper/helper functions and macros. It is alpha and early days so the best time to consider these things and the direction to be taken. Currently it is very far from improving performance. I can understand stability problems with an alpha release but if this were sticking to producing equivalent C I strongly doubt it would have the severe performance issues it currently has so easily. I find that it performs exponentially slower and slower so a task that takes PHP 15 seconds takes zephir two hours.

The choice is between making something that anyone including your uncle can use or making a top performance framework. If the library does not offer worthwhile performance gains against competitors if asked about zephir and what it is good for my answer will be that it isn't worth it. All that wrapping for management is making it much harder for me to fix bugs or improve performance as well as to find possible bugs in the output that I might report. This is adding hundreds or thousands of lines of code to solve a problem for everyone and every conceivable case that a programmer can solve themselves for each their own cases with a few lines of code here or there and a basic knowledge of memory leak related issues. PHP lets you make memory leaks in many ways. It is a problem that might be better solved with education. I really doubt that much can easily beat native PHP-C without a huge effort. On that front, it would be nice to know what zephir plans or ideas are for adding custom C, integrating with other hand written C extensions, etc.

Back onto the subject of references...

I usually don't have a problem using a zval pointer every time. Occasionally I pass a pointer to a pointer internally when I want to do something such as initialize a zval on demand. If something goes wrong later I will get rid of it if it isn't NULL.

If I declare a zval on the stack this is usually an optimisation for a temporary variable (business as usual). It is rare that I do that.

PHP appears to be trying to do away with pass by reference but this is still a bit of a problem for passing variables. To move on with this the PHP way of doing things should be studied. For example, does changing a zval (value or type) create an entirely new _zval_struct or only change the value? I don't know this intimately.

I think everything should be "by reference" unless deliberately copied/separated or implicitly by operation where it makes sense to do so and that only the value should change rather than adding references on top of something that internally already uses a referential system. If some rules are laid out this becomes easier.
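On that last question, as far as the PHP 5 engine goes: a write to a variable whose zval is shared (refcount above 1) but not flagged is_ref separates first, so a new struct is created; if the zval is flagged is_ref, or is not shared at all, the value is changed in place. A simplified sketch of that rule, with made-up names (cow_val, cow_new, cow_assign, separation_demo) rather than the real macros:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a PHP 5 zval reached through a pointer. */
typedef struct {
    long lval;
    unsigned int refcount;
    unsigned char is_ref;
} cow_val;

static cow_val *cow_new(long v)
{
    cow_val *p = malloc(sizeof *p);
    if (p == NULL) abort();
    p->lval = v;
    p->refcount = 1;
    p->is_ref = 0;
    return p;
}

/* Assigns v to the variable whose pointer is stored in *slot. */
static void cow_assign(cow_val **slot, long v)
{
    cow_val *cur = *slot;
    if (cur->refcount > 1 && !cur->is_ref) {
        cur->refcount--;        /* shared copy: separate into a new struct */
        *slot = cow_new(v);
    } else {
        cur->lval = v;          /* unshared or reference: change in place */
    }
}

/* Returns 1 if both cases behave as described. */
static int separation_demo(void)
{
    /* $a = 1; $b = $a; $b = 2;  (shared, not a reference) */
    cow_val *a = cow_new(1);
    cow_val *b = a;
    a->refcount++;
    cow_assign(&b, 2);
    int copy_separated = (a != b) && a->lval == 1 && b->lval == 2;

    /* $c = 1; $d = &$c; $d = 2;  (shared, flagged as a reference) */
    cow_val *c = cow_new(1);
    cow_val *d = c;
    c->refcount++;
    c->is_ref = 1;
    cow_assign(&d, 2);
    int ref_shared = (c == d) && c->lval == 2;

    free(a);
    free(b);
    free(c);                    /* c and d are the same struct */
    return copy_separated && ref_shared;
}
```

separation_demo() returns 1 under this model: the plain copy ends up in its own struct, the reference keeps sharing one.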

I'm sure that if it worked like that I wouldn't have this:

image

I'm not sure that there is a real value in passing by reference in PHP to a C function or method. I don't see a case in that scope where an uninitialized zval would be passed. All it will allow is something such as if you pass the reference to the zval in $a[0] you can change it to point to the value in $a[1] which is some potentially bizarre behaviour and might lead to even more questionable or exotic code than just zval pointers. ** seems largely redundant.

I don't know enough yet about what zephir is trying to do internally and exactly what is needed to move in this direction but hope this will generate a new look at it. With PHP7 coming out, which will later lead on to providing JIT and other features I think zephir will have to provide a serious performance edge to compete and remain relevant in the performance arena.

@dahweng

dahweng commented Jul 19, 2016

(I'm posting my response from the other issue #203)
Hi everyone, I have a simple solution for this problem: use a PHP function for the task of returning a reference.
I have been searching for a solution to this problem for the past couple of days, and managed to get the function that requires references to work by doing the following;

<?php

// In your PHP

function refValue( $val )
{
    $ref = &$val;
    return $ref;
}

?>

After you have created the refValue function in your PHP file, you can then use it in Zephir as follows;

// In your Zephir source code
var avar = "passed by value";
var bvar = "passed by reference";

some_function( avar, refValue( bvar ) );

I have successfully used this method to get a function that requires references ($mysqli->bind_param()) to work in Zephir.

Goodluck people! HTH

@mervick
Contributor

mervick commented Oct 14, 2016

@dahweng your function does nothing

function refValue($val) {
    $ref = &$val;
    return $ref;
}
function inc($x) {
    $x++;
}
$y = 0;
inc(refValue($y));
echo $y; // still 0

@Jurigag
Contributor

Jurigag commented Mar 8, 2017

Exactly. If references are not added directly to Zephir yet, I would at least like to see support for something like this code, maybe earlier if it is less of a problem:

class MyValidation extends \Phalcon\Validation
{
    public function beforeValidation(&$data, $entity, $messages)
    {
        $data['test'] = 'asdasdasd';
    }
}

$validation = new MyValidation();
$validation->add('test', new \Phalcon\Validation\Validator\PresenceOf());
var_dump($validation->validate(['asd' => 'xyz']));

The validate method executes beforeValidation and then checks the data variable; with this code, data just becomes an unknown type.

@fezfez
Contributor

fezfez commented Apr 27, 2017

any news ?

@mervick
Contributor

mervick commented Apr 27, 2017

@fezfez I use my own function for this (it works on php5.6):
create ext/zval_ref.c

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include <php.h>
#include "php.h"
#include "php_ext.h"
#include "php_main.h"
#include "ext.h"
#include "kernel/main.h"
#include "kernel/exit.h"
#include <Zend/zend.h>
#include <Zend/zend_API.h>
#include "zval_ref.h"

void ZVAL_REF(zval* dst, zval* src)
{
    int refcount__gc = dst->refcount__gc;
    dst->refcount__gc = 1;

    zval_ptr_dtor(&dst);
    MAKE_STD_ZVAL(dst);

    ZVAL_ZVAL(dst, src, 1, 0);
    dst->refcount__gc = refcount__gc;
}

ext/zval_ref.h

#ifndef ZVAL_REF_H
#define ZVAL_REF_H

#include <Zend/zend.h>
#include <Zend/zend_API.h>

void ZVAL_REF(zval* dst, zval* src);

#endif

ext/optimizers/ZvalRefOptimizer.php

<?php
namespace Zephir\Optimizers\FunctionCall;

use Zephir\Call;
use Zephir\CompilationContext;
use Zephir\CompiledExpression;
use Zephir\CompilerException;
use Zephir\Optimizers\OptimizerAbstract;

class ZvalRefOptimizer extends OptimizerAbstract
{
    public function optimize(array $expression, Call $call, CompilationContext $context)
    {
        $context->headersManager->add('utils/zval_ref');
        if (count($expression['parameters']) != 2) {
            throw new CompilerException("'zval_ref' requires two parameters", $expression);
        }
        $resolvedParams = $call->getReadOnlyResolvedParams($expression['parameters'], $context, $expression);
        $context->codePrinter->output(
            sprintf('ZVAL_REF(%s, %s);', $resolvedParams[0], $resolvedParams[1])
        );
        return new CompiledExpression('null', null, $expression);
    }
}

in your config.json add lines:

    "optimizer-dirs": [
        "ext/optimizers"
    ],
    "extra-sources": [
        "zval_ref.c"
    ]

After that you can use references in your Zephir code,
but unlike PHP you must make sure that a reference variable is passed to the method.
Example:

public function awesomeMethod(/* reference, without type! */ ref = "undefined")
{
    var hello;
    let hello = "Hello World";
    
    if ref !== "undefined" {
        zval_ref(ref, hello);
    }
}

usage

   var ref;
   let ref = null; /* important to set some value */
   obj->awesomeMethod(ref);
   echo ref; /* output: "Hello World" */

@fezfez
Contributor

fezfez commented May 15, 2017

@mervick Why don't you make a PR?

@mervick
Contributor

mervick commented May 15, 2017

@fezfez cause it works only under certain conditions (indicated above)

@JellyBrick
Contributor

JellyBrick commented Feb 14, 2020

Are there any updates about 'implementing references'?

@sergeyklay
Contributor

sergeyklay commented Feb 14, 2020

No, there is no update. This was just a discussion, not a plan.

@Ilhampasya

any news? 5 years have passed

@benalf

benalf commented May 30, 2020

Considering zval internals have substantially changed (especially refcounting) from PHP5 through PHP7, is this not somewhat viable again?

I needed a method which receives an arg via reference in my project which I did by doing minor C code changes directly:
ZEND_ARG_INFO(0, arg) -> ZEND_ARG_INFO(1, arg) (pass by ref)
and
ZVAL_DEREF(arg); SEPARATE_ZVAL_NOREF(arg);
instead of
ZEPHIR_SEPARATE_PARAM(arg).

Worked fine for my specific needs (php7.3) and all tests passed without any segfaults. However, there might be some edge cases and I would love some input from somebody who actually knows what he is doing.
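For readers unfamiliar with the PHP 7 layout: a by-reference argument arrives as a zval of type IS_REFERENCE that points to a separate, refcounted box holding the real value, and ZVAL_DEREF follows that pointer. The following is only a simplified model of that shape, with hypothetical names (slot, ref_box, slot_deref, increment_arg, by_ref_call_demo) instead of the real zval and zend_reference:

```c
#include <assert.h>

enum slot_type { SLOT_LONG, SLOT_REFERENCE };

struct ref_box;

/* Hypothetical stand-in for a PHP 7 zval. */
typedef struct {
    enum slot_type type;
    union {
        long lval;
        struct ref_box *ref;    /* used when type is SLOT_REFERENCE */
    } value;
} slot;

/* Hypothetical stand-in for zend_reference: a counted box with the value. */
struct ref_box {
    unsigned int refcount;
    slot val;
};

/* Follows one level of reference, in the spirit of ZVAL_DEREF. */
static slot *slot_deref(slot *s)
{
    return s->type == SLOT_REFERENCE ? &s->value.ref->val : s;
}

/* A callee that modifies its by-reference argument. */
static void increment_arg(slot *arg)
{
    slot *target = slot_deref(arg);
    target->value.lval++;
}

/* Returns the caller's value after the by-reference call. */
static long by_ref_call_demo(void)
{
    struct ref_box box = { 2, { SLOT_LONG, { 0 } } };
    box.val.value.lval = 41;

    slot caller = { SLOT_REFERENCE, { 0 } };
    caller.value.ref = &box;

    slot callee_arg = caller;   /* the slot is copied, the box is shared */
    assert(callee_arg.type == SLOT_REFERENCE);
    increment_arg(&callee_arg);

    return slot_deref(&caller)->value.lval;
}
```

by_ref_call_demo() returns 42 in this model: the callee only receives a copy of the slot, but the write lands in the shared box, so the caller sees it.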

@Jeckerson Jeckerson added this to the Backlog milestone Apr 12, 2021
@Jeckerson Jeckerson removed this from the Backlog milestone Sep 13, 2023