Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Only unserialize Phar metadata when getMetadata() is called #5855

Closed
wants to merge 1 commit into from

Conversation

TysonAndre
Copy link
Contributor

@TysonAndre TysonAndre commented Jul 15, 2020

In other words, don't automatically unserialize when the magic
phar:// stream wrappers are used.
RFC: https://wiki.php.net/rfc/phar_stop_autoloading_metadata

See https://externals.io/message/110856 and
https://bugs.php.net/bug.php?id=76774

This was refactored to add a phar_metadata_tracker for the following reasons:

  • The way to properly copy a zval was previously implicit and undocumented
    (e.g. is it a pointer to a raw string or an actual value)
  • Avoid unnecessary serialization and unserialization in the most common case
  • If a metadata value is serialized once while saving a new/modified phar file,
    this allows reusing the same serialized string.
  • Have as few ways to copy/clone/lazily parse metadata (etc.) as possible,
    so that code changes can be limited to only a few places in the future.
  • Performance is hopefully not a concern - copying a string should be faster
    than unserializing a value, and metadata should be rare in most cases.

@TysonAndre
Copy link
Contributor Author

@bishopb @smalyshev - this took longer than I thought. Tests were passing locally.

  • I was focusing on getting tests to pass (locally) first. I apparently didn't enable zlib locally for the minimal debug build, and will work on that next. Given that 8.0's feature freeze is near, I'd still like early feedback. From phar_split_cache_list, It looks like PHAR_G(persist) and is_persistent are only set for Phars that were in phar.cache_list (https://www.php.net/manual/en/phar.configuration.php#ini.phar.cache-list), but I'm not 100% sure about that.

    Is it possible for Phar->getMetadata() or Phar->setMetadata() to end up getting called on a phar where is_persistent is true?

Phar: test that creation of tar-based phar generates valid tar with all bells/whistles [ext/phar/tests/tar/all.phpt]
Phar::convertToPhar() with global metadata [ext/phar/tests/tar/phar_convert_phar4.phpt]
Phar: test that creation of zip-based phar generates valid zip with all bells/whistles [ext/phar/tests/zip/all.phpt]
SKIP Phar: test that creation of zip-based phar generates valid zip with all bells/whistles [ext/phar/tests/zip/all.phpt] reason: zlib not available (locally)

@TysonAndre TysonAndre changed the title [RFC] [WIP] Only unserialize Phar metadata when getMetadata() is called [RFC] Only unserialize Phar metadata when getMetadata() is called Jul 21, 2020
@nikic nikic added this to the PHP 8.0 milestone Jul 22, 2020
@nikic
Copy link
Member

nikic commented Jul 22, 2020

Is this one ready for review? It looks like the new parameter for getMetadata from the RFC is not implemented.

@TysonAndre
Copy link
Contributor Author

Is this one ready for review? It looks like the new parameter for getMetadata from the RFC is not implemented.

I'd gotten distracted when investigating how Phars track metadata internally, and the best way to refactor it to implement the main part of this RFC. This should be implemented by Friday or earlier (implementation details can be copied from unserialize()).

@TysonAndre
Copy link
Contributor Author

TysonAndre commented Jul 22, 2020

To clarify what I plan to implement:

function getMetadata() would get changed to `function getMetadata(array|bool $allowed_classes = true)

The handling and checking of $allowed_classes would be similar to the handling of $options['allowed_classes'] in https://www.php.net/manual/en/function.unserialize.php#refsect1-function.unserialize-parameters

Either an array of class names which should be accepted, FALSE to accept no classes, or TRUE to accept all classes. If this option is defined and unserialize() encounters an object of a class that isn't to be accepted, then the object will be instantiated as __PHP_Incomplete_Class instead. Omitting this option is the same as defining it as TRUE: PHP will attempt to instantiate objects of any class.

If there is metadata, then for anything other than $allowed_classes = true, the returned metadata will not be stored or loaded from the cached value in metadata_tracker.

On second thought, https://externals.io/message/111093#111121 sound like a much better idea if future options are changed in unserialize() in the future (e.g. the 'max_depth' option already added in php 8.0, or future options such as 'forbid_references')

@TysonAndre
Copy link
Contributor Author

https://wiki.php.net/rfc/phar_stop_autoloading_metadata#proposal has been amended as version 0.4 with the intent to change the signature to getMetadata(array $unserialize_options = []). Again, I plan to implement this by Friday.

ext/standard/php_var.h Outdated Show resolved Hide resolved
@TysonAndre
Copy link
Contributor Author

https://wiki.php.net/rfc/phar_stop_autoloading_metadata#proposal has been amended as version 0.4 with the intent to change the signature to getMetadata(array $unserialize_options = []). Again, I plan to implement this by Friday.

That's implemented now, and tests passed locally.

Copy link
Member

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial notes.

ext/phar/phar.c Outdated Show resolved Hide resolved
ext/phar/phar.c Outdated Show resolved Hide resolved
ext/phar/phar.c Outdated Show resolved Hide resolved
ext/phar/phar.c Outdated Show resolved Hide resolved
ext/phar/phar.c Outdated Show resolved Hide resolved
ext/phar/phar.c Outdated
/* Assert it should not be possible to create raw data in a persistent phar (i.e. from cache_list) */

ZEND_ASSERT(has_unserialize_options || Z_ISUNDEF(tracker->val));
if (has_unserialize_options && tracker->str == NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, is the idea here that we can call getMetadata multiple times with different unserialize options and because of that have to go through an artificial serialize + unserialize cycle here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this is expending unnecessary effort and complexity for something nobody is going to care about. I would just spec the function to say "returns the metadata if already unserialized, or unserializes it with the given options if not".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, is the idea here that we can call getMetadata multiple times with different unserialize options and because of that have to go through an artificial serialize + unserialize cycle here?

Only if there wasn't serialized data. But in the latest commit, I realize this is unnecessary - both setMetaData and reading a phar file ensure there is serialized data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that a few days ago but forgot to push that.

I also added another commit

    Improve robustness of `Phar*->setMetadata()`
    
    Add sanity checks for edge cases freeing metadata, when destructors
    or serializers modify the phar recursively.
    
    Typical use cases of php have read-only phars (from the ini setting) and would not be affected.

ext/phar/phar.c Show resolved Hide resolved
ext/standard/php_var.h Outdated Show resolved Hide resolved
ext/standard/php_var.h Outdated Show resolved Hide resolved
@nikic
Copy link
Member

nikic commented Jul 28, 2020

Apart from the one unresolved comment, this all looks okay-ish, but I also have pretty much zero familiarity with phar and the issues that may be involved here.

Maybe @bishopb wants to take a look at this as well.

In other words, don't automatically unserialize when the magic
phar:// stream wrappers are used.
RFC: https://wiki.php.net/rfc/phar_stop_autoloading_metadata

Also, change the signature from `getMetadata()`
to `getMetadata(array $unserialize_options = [])`.
Start throwing earlier if setMetadata() is called and serialization threw.

See https://externals.io/message/110856 and
https://bugs.php.net/bug.php?id=76774

This was refactored to add a phar_metadata_tracker for the following reasons:
- The way to properly copy a zval was previously implicit and undocumented
  (e.g. is it a pointer to a raw string or an actual value)
- Avoid unnecessary serialization and unserialization in the most common case
- If a metadata value is serialized once while saving a new/modified phar file,
  this allows reusing the same serialized string.
- Have as few ways to copy/clone/lazily parse metadata (etc.) as possible,
  so that code changes can be limited to only a few places in the future.
- Performance is hopefully not a concern - copying a string should be faster
  than unserializing a value, and metadata should be rare in most cases.

Remove unnecessary skip in a test(Compression's unused)

Add additional assertions about usage of persistent phars

Improve robustness of `Phar*->setMetadata()`

- Add sanity checks for edge cases freeing metadata, when destructors
  or serializers modify the phar recursively.
- Typical use cases of php have phar.readonly=1 and would not be affected.
@TysonAndre
Copy link
Contributor Author

I updated the UPGRADING documentation and squashed the commits and rebased with no other changes.

At the direction of the release manager for php 8.0, I'll be merging this PR this evening in the absence of major issues. https://externals.io/message/111286 (there are currently 24 yes votes and 0 no votes)

Tyson, please get this merged Monday evening to avoid timing issues
with the FF branch. It's okay that the vote is technically open till
Tuesday. In the extremely unlikely scenario that enough people change their
vote at the last minute, we can deal with that.

To avoid potential merge conflicts, the update to NEWS was left out of this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants