Skip to content

Conversation

TysonAndre
Copy link
Contributor

@TysonAndre TysonAndre commented Jan 18, 2021

RFC: https://wiki.php.net/rfc/readable_var_representation

Prior RFCs have proposed adding to the option list of var_export and been
rejected or had development stop:

  • A lot of code (php-src tests, PECL tests, and applications written in PHP) is already using var_export, and may be relying on the old
    behaviors for unit tests (asserting exact representation), etc.

    (Changing var_export would result in thousands of failures in php-src alone) (haven't counted yet)

  • Code compatibility with PHP 5 has also been brought up as a reason to keep
    var_export output the same.

  • See var_export - use short array syntax for readability #2699 and linked PRs.

  • Extending var_export by adding a third argument is impossible to polyfill in
    older php versions, and would result in a ArgumentCountError in php 8.0

  • See https://externals.io/message/109415

This has the following differences from var_export:

  1. var_representation() unconditionally returns a string

  2. Use null instead of NULL - the former is recommended by more coding
    guidelines (https://www.php-fig.org/psr/psr-2/).

  3. Change the way indentation is done for arrays/objects.
    See ext/standard/tests/general_functions/short_var_export1.phpt
    (e.g. always add 2 spaces, never 3 in objects, and put the array start on the
    same line as the key)

  4. Render lists as "[\n 'item1',\n]" (or "['item1']" in single-line mode) rather than "array(\n 0 => 'item1',\n)"

    Always render empty lists on a single line, render multiline by default when there are 1 or more elements

  5. Prepend \ to class names so that generated code snippets can be used in
    namespaces without any issues.

  6. Support VAR_REPRESENTATION_SINGLE_LINE in $flags.
    This will use a single-line representation for arrays/objects.

  7. Escape control characters("\x00"-"\x1f" and "\x7f"(backspace)) inside of double quoted strings
    instead of single quoted strings with unescaped control characters mixed with . "\0" ..

Having this available in php-src is useful for the following reasons:

  1. It's convenient to be able to dump a snippet
    and use that in your source code with less reformatting, updating
    namespaces, etc. (or to dump values to see in utilities)
  2. This often saves a few bytes (except for some binary strings) if the output of var_export() for an object/array
    is saved to disk, used as an array key, etc.
  3. This is possibly more readable for list-like arrays
  4. This can start being used in short, self-contained scripts if available.
  5. The checks for infinite recursion are possible to polyfill with
    ReflectionReference or other approaches,
    but error prone to implement from scratch.

Example short_var_export output (see ext/standard/tests/general_functions/var_representation1.phpt for more examples):

\ArrayObject::__set_state([
  'a',
  [
    "x\r\ny",
  ],
  [],
])
oneline: \ArrayObject::__set_state(['a', ["x\r\ny"], []])

Double quoted strings are used if the string contains \x00-\x1f or \x7f (control characters including \r, \n, \t, and backspace)

The ascii range 0x00-0x7f is encoded as follows if double quoted strings are used (\r, \n, \t, \$, \\, \", in addition to other control characters and backspace encoded as hex escaped characters)

// 0x00-0x1f
"\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
// 0x20-0x7f
" !\"#\$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f"

@stollr
Copy link

stollr commented Jan 19, 2021

As @nikic has already pointed out in the mailing list: Please remove the $return parameter and let it always return. This improves the usage and readability and I assume that this function will most of the times being used with $return = true. For the other rare cases, prepending echo is simple ;-)

@TysonAndre
Copy link
Contributor Author

TysonAndre commented Jan 21, 2021

As nikic has already pointed out in the mailing list: Please remove the $return parameter and let it always return. This improves the usage and readability and I assume that this function will most of the times being used with $return = true. For the other rare cases, prepending echo is simple ;-)

If I do that, then it'd make sense to go with an entirely different name such as var_representation() or something. It'd be incredibly inconsistent to have two functions called var_export() where one prints to stdout by default and the other can't print to stdout.

I'm still undecided on the approach - var_export_extended(), var_export_pretty(), etc. may be better names than var_export if I choose bool $return = false

@TysonAndre TysonAndre force-pushed the short_var_export branch 3 times, most recently from 06fe0a1 to 07a80ee Compare January 22, 2021 23:12
@TysonAndre TysonAndre changed the title Proposal: short_var_export($value, bool $return=false, int $flags=0) Proposal: var_representation($value, int $flags=0): string as alternative for var_export Jan 23, 2021
@Krinkle
Copy link
Contributor

Krinkle commented Jan 23, 2021

I like the end result of this, and am curious if it was considered to add this behaviour to var_export instead?

I understand that re-purposing var_export's $returns parameter is not feasible since older implementations likely cast unexpected integers to boolean true which would be a breaking change. However, could $flags not be added as third parameter? Especially now that PHP supports named arguments it would not require an awkward filler argument in callers. E.g. instead of var_export($val, false, VAR_EXPORT_PSR2) one could use var_export($val, flags: VAR_EXPORT_PSR2).

@TysonAndre
Copy link
Contributor Author

TysonAndre commented Jan 23, 2021

I understand that re-purposing var_export's $returns parameter is not feasible since older implementations likely cast unexpected integers to boolean true which would be a breaking change. However, could $flags not be added as third parameter? Especially now that PHP supports named arguments it would not require an awkward filler argument in callers. E.g. instead of var_export($val, false, VAR_EXPORT_PSR2) one could use var_export($val, flags: VAR_EXPORT_PSR2).

It was considered and has been proposed before and is definitely doable, I just prefer adding a separate function instead.

In PHP 8.0, passing too many arguments to an internal function or an unrecognized argument name becomes a fatal error instead of just a warning in php 7, see https://3v4l.org/X6jPG

Fatal error: Uncaught ArgumentCountError: var_export() expects at most 2 arguments, 3 given in ...

It's also less convenient to need to pass flags: VAR_EXPORT_PSR2 in every call, and is inconvenient to get third party libraries that start using that parameter with a third argument to work in older php versions.

  • With a brand new function, you could install a polyfill for php 8.0 and older and use code using var_representation in php 8.0 and even older functions (aside: even just return var_export($value, true) and tolerate the different output in the worst case, but I think that token_get_all() makes more sophisticated polyfills doable)
  • For modifying var_export(), you'd have to wait until php 8.0 support was dropped to unconditionally pass in the extra flag such as VAR_EXPORT_PSR2, which means a few extra years to actually start using it without a wrapper function
  • EDIT: And the default of var_export()'s output may never change to VAR_EXPORT_PSR2 with that approach.

@Krinkle
Copy link
Contributor

Krinkle commented Jan 23, 2021

Those are good points. I do worry, however, that those same reasons (passing an option, harder to polyfill) generally hold true for all options. In which case, does this mean we should expect opposition when proposing to add new options via the $flags parameter in the future? Or do we expect that most options would be "minor" and that the change proposed here "big enough" to warrant a new top-level function?

Fatal error: Uncaught ArgumentCountError

Aye, that's a bummer. I was indeed thinking it would degrade fine on older versions. Which is true for 7.x and for 8.1+ for future flags, but for 8.0 specifically we'd indeed require a wrapper.

@TysonAndre
Copy link
Contributor Author

TysonAndre commented Jan 23, 2021

Those are good points. I do worry, however, that those same reasons (passing an option, harder to polyfill) generally hold true for all options. In which case, does this mean we should expect opposition when proposing to add new options via the $flags parameter in the future? Or do we expect that most options would be "minor" and that the change proposed here "big enough" to warrant a new top-level function?

The feature freeze for 8.1 isn't until August or so based on the timeline for last year's php 8.0 release https://wiki.php.net/todo/php80 - there's 7 months to propose additional amendments or new features before the feature freeze with little objection

New parameters have been added before, e.g. https://www.php.net/manual/en/function.unserialize.php#refsect1-function.unserialize-changelog added allowed_classes to unserialize in 7.0.0

  • That case was something I'd consider the most appropriate as an additional parameter - the serialization format didn't change at all and there was no (good) way to polyfill allowed_classes
  • I want the new behaviors of var_serialize to be the default, and I don't think that switch would be made for a long, long time if adding a flag to var_export

But adding a lot of features adds the problems of making it much more time consuming to implement, discuss, for people to review the rfc, etc, and adding more flags that seem like a narrow use case would be a reason to vote against the rfc.

The other thing is that var_representation() may diverge a lot more from var_export before the feature freeze in theory, e.g. hypothetically we may want to support individual classes opting into new Point(x: 123, y: 2) instead of Point::__set_state(['x' => 123, 'y' => 2]) with some form of attribute, interface, or magic method, which may or may not be desirable for old code using
var_export but would be more desirable for var_representation
Unrecognized integer flags are deliberately ignored by the implementation, so define('VAR_REPRESENTATION_SOME_OTHER_FLAG', 2); if undefined would mostly work in a polyfill

@@ -641,6 +938,26 @@ PHP_FUNCTION(var_export)
}
/* }}} */

/* {{{ Outputs or returns a short string representation of a variable */
Copy link
Contributor

@guilliamxavier guilliamxavier Feb 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/* {{{ Outputs or returns a short string representation of a variable */
/* {{{ Returns a short string representation of a variable */

(also not sure about "short" since the rename)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 803 to 808
/*
if (level > 1) {
smart_str_appendc(buf, '\n');
buffer_append_spaces(buf, level - 1);
}
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is that? This seems to need a test that would fail if uncommenting it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's left over from var_export which printed keys and values on separate lines. It affected rendering of arrays nested within other objects/arrays, and was fixed.

'inner' => [
1.0,
],
],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bug here: one space too many (also on lines 135 and 147)

(BTW, you mention PSR-2, but it mandates 4 spaces for indentation, not 2... but I guess that would kind of defeat the "byte saving"?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, I misread the indentation when cleaning up the indentation for the start of objects in var_export. Updated in https://github.com/php/php-src/pull/6619/files#diff-3a60c54a30e199e1dbbeed2151af8909171a5a6a7944cdca0f52411ba302a78c

[
  (object) [
    'key' => (object) [
      'inner' => [
        1.0,
      ],
    ],
    'other' => 'ref shown as value like var_export',
  ],
]
oneline: [(object) ['key' => (object) ['inner' => [1.0]], 'other' => 'ref shown as value like var_export']]
\Example::__set_state([
  'untyped2' => null,
  'untyped3' => 3,
  'typed2' => [
    \Example::__set_state([
      'untyped1' => null,
      'untyped2' => null,
      'untyped3' => 3,
      'typed2' => [],
    ]),
  ],
])
oneline: \Example::__set_state(['untyped2' => null, 'untyped3' => 3, 'typed2' => [\Example::__set_state(['untyped1' => null, 'untyped2' => null, 'untyped3' => 3, 'typed2' => []])]])

The fact that embedded newlines are now no longer emitted as parts of strings makes it easier to efficiently convert the indentation to spaces or tabs using preg_replace or preg_replace_callback,

// convert 2 space indent to 4 space
php > echo preg_replace('/^((  )+)/m', '\1\1', var_representation([[['key' => 'value  with  space']]]));
[
    [
        [
            'key' => 'value  with  space',
        ],
    ],
]

…tive for var_export

RFC: https://wiki.php.net/rfc/readable_var_representation

Prior RFCs have proposed adding to the option list of var_export and been
rejected or had development stop:

- A lot of code (php-src tests, PECL tests, and applications written in PHP) is already using var_export, and may be relying on the old
  behaviors for unit tests (asserting exact representation), etc.

  ~~(Changing var_export would result in thousands of failures in php-src alone)~~ (haven't counted yet)
- Code compatibility with PHP 5 has also been brought up as a reason to keep
  var_export output the same.
- See phpGH-2699 and linked PRs.
- Extending `var_export` by adding a third argument is impossible to polyfill in
  older php versions, and would result in a ArgumentCountError in php 8.0
- See https://externals.io/message/109415

This has the following differences from var_export:

1. var_representation() unconditionally returns a string
2. Use `null` instead of `NULL` - the former is recommended by more coding
   guidelines (https://www.php-fig.org/psr/psr-2/).
3. Change the way indentation is done for arrays/objects.
   See ext/standard/tests/general_functions/short_var_export1.phpt
   (e.g. always add 2 spaces, never 3 in objects, and put the array start on the
   same line as the key)
4. Render lists as `"['item1']"` rather than `"array(\n  0 => 'item1',\n)"`

   Always render empty lists on a single line, render multiline by default when there are 1 or more elements
5. Prepend `\` to class names so that generated code snippets can be used in
   namespaces without any issues.
6. Support `VAR_REPRESENTATION_SINGLE_LINE` in $flags.
   This will use a single-line representation for arrays/objects, though
   strings with embedded newlines will still cause newlines in the output.
7. Escape control characters("\x00"-"\x1f" and "\x7f"(backspace)) inside of double quoted strings
   instead of single quoted strings with unescaped control characters mixed with ` . "\0" . `.

Having this available in php-src is useful for the following reasons:
1. It's convenient to be able to dump a snippet
   and use that in your source code with less reformatting, updating
   namespaces, etc. (or to dump values to see in utilities)
2. This often saves a few bytes (except for some binary strings) if the output of var_export() for an object/array
   is saved to disk, used as an array key, etc.
3. This is possibly more readable for list-like arrays
4. This can start being used in short, self-contained scripts if available.
5. The checks for infinite recursion are possible to polyfill with
   ReflectionReference or other approaches,
   but error prone to implement from scratch.

Example short_var_export output (see ext/standard/tests/general_functions/var_representation1.phpt for more examples):

```php
\ArrayObject::__set_state([
  'a',
  [
    "x\r\ny",
  ],
  [],
])
oneline: \ArrayObject::__set_state(['a', ["x\r\ny"], []])
```

Double quoted strings are used if the string contains \x00-\x1f or \x7f (control characters including `\r`, `\n`, `\t`, and backspace)

The ascii range 0x00-0x7f is encoded as follows if double quoted strings are used (`\r`, `\n`, `\t`, `\$`, `\\`, `\"`, in addition to other control characters and backspace encoded as hex escaped characters)

```php
// 0x00-0x1f
"\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
// 0x20-0x7f
" !\"#\$%&'()*+,-./0123456789:;<=>?@abcdefghijklmnopqrstuvwxyz[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f"
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants