Skip to content

Conversation

bishopb
Copy link
Contributor

@bishopb bishopb commented Jan 6, 2018

https://bugs.php.net/bug.php?id=54289

A Phar object's manifest is a hash of file leaves: the directories are not held as distinct references. So, when the user gives a request for a directory, we need to find the list of file leaves having that directory as a prefix. If you're familiar with AWS, this is like how S3 works -- there aren't "directories" in the sense of a directed node graph: rather, there are just files, which optionally have a prefix we interpret semantically as a directory.

Ideally, we'd rather want an index of files by their directories, because with rather large .phar this can be an expensive operation. Might address that in a future refactoring, but this was the most obvious implementation: the cost being that extracting directories from large phar may be slow.

@@ -4417,19 +4417,40 @@ PHP_METHOD(Phar, extractTo)
phar_archive_data *phar;
all_files:
phar = phar_obj->archive;
/* Extract all files */
/* Extract all files, or if we have jumped here from a potential
directory match, just those manifest entries matching the dir */
if (!zend_hash_num_elements(&(phar->manifest))) {
Copy link
Member

@nikic nikic Jan 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check may prevent an exception from being thrown if the manifest is empty and a directory was specified. It should be fine to just drop this check.

zend_throw_exception_ex(phar_ce_PharException, 0,
"Phar Error: attempted to extract non-existent file \"%s\" from phar \"%s\"", Z_STRVAL_P(zval_file), phar_obj->archive->fname);
goto all_files; /* potential directory match */
dropstop:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd much prefer to have the logic for extracting a directory factored out into a function than doing these goto acrobatics :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree: the logic should be factored out. I thought to go with this gymnastic (-:) approach because (a) existing code already set the goto precedent, (b) the refactoring would cloud the clarity of a bug fix diff, and (c) after tackling the ~40 other phar bugs I think a wholistic refactoring would be more efficient overall.

Still, I'm happy to refactor if you feel it's important to do so. I'd started to initially, but gave it thought and talked myself out of it per above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem here is not so much the goto itself, as the goto jumping back into a loop, essentially reimplementing a function call using gotos... I'd really like to avoid that. Incidentally, PHP itself prohibits such gotos at compile time :)

filename = ZSTR_VAL(Z_STR_P(zval_file));
}
if (NULL != filename) {
if (strncmp(filename, entry->filename, strlen(filename)) || '/' != entry->filename[strlen(filename)]) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading this correctly, this requires you to pass dir and does not allow dir/. Is that intentional / consistent with other functionality? Also, is it necessary to account for \\ or is this normalized in phars?

Copy link
Contributor Author

@bishopb bishopb Jan 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your read is correct, and I did implement it that way intentionally - under the expectation I'd also update the documentation to call out this requirement and would plan to improve support in a later round of refactoring / improvements to Phar.

I do feel it's consistent. For example, if the manifest contains a file foo at the top-level, it can only be accessed by foo: /foo does not work. Likewise, if a manifest contains bar/foo, then only bar/foo works: /bar/foo, bar//foo, and other conventions one might think should work from experience with shell path behavior does not apply.

Happy to address it, but I feel allowing variability here would be inconsistent with the specific path requirements of files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable, though I'm still a bit leery about this, especially as the original bug report specifically makes use of the trailing slash.

Maybe @remicollet as the maintainer of the zip extension can chime in? It also has a ZipArchive::extractTo() method, which also does not seem to support directories, so I'd like to have your opinion on both the change in general and this specific question.

} ZEND_HASH_FOREACH_END();

if (NULL != filename && 0 == extracted) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Indentation.

Addresses two comments from @nikic:

1. The check for empty manifest no longer need to be made
2. Use of spaces when tab were called for
Refactor the extraction loop to a function, and make it aware of
directories. As before, to signal a directory extraction pass the
directory name with a terminal slash: "foo/" for example. Multiple
slashes not supported and forward slash is the canonical directory
separator.
@bishopb
Copy link
Contributor Author

bishopb commented Jan 10, 2018

As per @nikic earlier comments, I've refactored the extraction into a function and required that directories be given with a terminal slash.

@bishopb
Copy link
Contributor Author

bishopb commented Jan 10, 2018

My guess on these CI failures: the build environment isn't wiped clean on each invocation. As is, the test run is not idempotent, because it creates directories later in the test that fail earlier on subsequent runs.

Remove incorrectly committed test run artifacts. Meanwhile, clean them
up after the test runs -- if they remain, a second test run will fail as
artifacts created late in the test are an error if found earlier in the
test.
@bishopb
Copy link
Contributor Author

bishopb commented Jan 23, 2018

Ping @nikic. I've got a queue of Phar bugs fixed, but not wanting to flood PR with them. Anything else I need do on this one?


/* nothing to match, or matching "/" -- extract all files */
len = search ? strlen(search) : 0;
if (0 == len || 0 == strcmp((const char *)search, "/")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The (const char *) cast here shouldn't be necessary. I also want to double check if we really want to support the "/" case. As phar paths don't have a leading /, this case stands out from the others, as it's not a prefix of the path.


/* otherwise, looking for an exact match */
} else {
entry = zend_hash_str_find_ptr(&archive->manifest, search, len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code previously used, depending on the case, either the stored length here, or directly used the zend_string for the lookup. I assume that phar paths can't contain null bytes, so binary safety is likely not an issue, but it would still be better to pass in a zend_string as search and then use zend_hash_find_ptr here.

@nikic
Copy link
Member

nikic commented Jan 24, 2018

@bishopb Sorry for the delay, I'm a bit preoccupied at the moment. Feel free to already send other phar PRs (unless you're concerned about conflicts).

Restores behavior before adding helper, which is itself consistent with
ZipArchive. The concept is this: paths in Phar do not begin with "/", so
the representation of the root dir is the lambda string, not "/". IOW,
to extract all files, pass null or lambda then be done.
Upon advice from @nikic, pass along zend_string* and only work with
char* when absolutely needed.
@bishopb
Copy link
Contributor Author

bishopb commented Jan 25, 2018

Thanks for taking a look at these, @nikic. I've touched up the code!

@nikic
Copy link
Member

nikic commented Jan 28, 2018

Sorry for the long delay, now merged as fa586ce. Thanks!

@bishopb
Copy link
Contributor Author

bishopb commented Jan 31, 2018

Thank you, @nikic, for shepherding it through! More bug fixes coming soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants