-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Fixes bug #54289 #3009
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes bug #54289 #3009
Conversation
ext/phar/phar_object.c
Outdated
@@ -4417,19 +4417,40 @@ PHP_METHOD(Phar, extractTo) | |||
phar_archive_data *phar; | |||
all_files: | |||
phar = phar_obj->archive; | |||
/* Extract all files */ | |||
/* Extract all files, or if we have jumped here from a potential | |||
directory match, just those manifest entries matching the dir */ | |||
if (!zend_hash_num_elements(&(phar->manifest))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check may prevent an exception from being thrown if the manifest is empty and a directory was specified. It should be fine to just drop this check.
ext/phar/phar_object.c
Outdated
zend_throw_exception_ex(phar_ce_PharException, 0, | ||
"Phar Error: attempted to extract non-existent file \"%s\" from phar \"%s\"", Z_STRVAL_P(zval_file), phar_obj->archive->fname); | ||
goto all_files; /* potential directory match */ | ||
dropstop: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd much prefer to have the logic for extracting a directory factored out into a function than doing these goto acrobatics :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree: the logic should be factored out. I thought to go with this gymnastic (-:) approach because (a) existing code already set the goto
precedent, (b) the refactoring would cloud the clarity of a bug fix diff, and (c) after tackling the ~40 other phar bugs I think a wholistic refactoring would be more efficient overall.
Still, I'm happy to refactor if you feel it's important to do so. I'd started to initially, but gave it thought and talked myself out of it per above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem here is not so much the goto itself, as the goto jumping back into a loop, essentially reimplementing a function call using gotos... I'd really like to avoid that. Incidentally, PHP itself prohibits such gotos at compile time :)
ext/phar/phar_object.c
Outdated
filename = ZSTR_VAL(Z_STR_P(zval_file)); | ||
} | ||
if (NULL != filename) { | ||
if (strncmp(filename, entry->filename, strlen(filename)) || '/' != entry->filename[strlen(filename)]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm reading this correctly, this requires you to pass dir
and does not allow dir/
. Is that intentional / consistent with other functionality? Also, is it necessary to account for \\
or is this normalized in phars?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your read is correct, and I did implement it that way intentionally - under the expectation I'd also update the documentation to call out this requirement and would plan to improve support in a later round of refactoring / improvements to Phar.
I do feel it's consistent. For example, if the manifest contains a file foo
at the top-level, it can only be accessed by foo
: /foo
does not work. Likewise, if a manifest contains bar/foo
, then only bar/foo
works: /bar/foo
, bar//foo
, and other conventions one might think should work from experience with shell path behavior does not apply.
Happy to address it, but I feel allowing variability here would be inconsistent with the specific path requirements of files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds reasonable, though I'm still a bit leery about this, especially as the original bug report specifically makes use of the trailing slash.
Maybe @remicollet as the maintainer of the zip extension can chime in? It also has a ZipArchive::extractTo()
method, which also does not seem to support directories, so I'd like to have your opinion on both the change in general and this specific question.
ext/phar/phar_object.c
Outdated
} ZEND_HASH_FOREACH_END(); | ||
|
||
if (NULL != filename && 0 == extracted) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Indentation.
Addresses two comments from @nikic: 1. The check for empty manifest no longer need to be made 2. Use of spaces when tab were called for
Refactor the extraction loop to a function, and make it aware of directories. As before, to signal a directory extraction pass the directory name with a terminal slash: "foo/" for example. Multiple slashes not supported and forward slash is the canonical directory separator.
As per @nikic earlier comments, I've refactored the extraction into a function and required that directories be given with a terminal slash. |
My guess on these CI failures: the build environment isn't wiped clean on each invocation. As is, the test run is not idempotent, because it creates directories later in the test that fail earlier on subsequent runs. |
Remove incorrectly committed test run artifacts. Meanwhile, clean them up after the test runs -- if they remain, a second test run will fail as artifacts created late in the test are an error if found earlier in the test.
Ping @nikic. I've got a queue of Phar bugs fixed, but not wanting to flood PR with them. Anything else I need do on this one? |
ext/phar/phar_object.c
Outdated
|
||
/* nothing to match, or matching "/" -- extract all files */ | ||
len = search ? strlen(search) : 0; | ||
if (0 == len || 0 == strcmp((const char *)search, "/")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The (const char *)
cast here shouldn't be necessary. I also want to double check if we really want to support the "/"
case. As phar paths don't have a leading /
, this case stands out from the others, as it's not a prefix of the path.
ext/phar/phar_object.c
Outdated
|
||
/* otherwise, looking for an exact match */ | ||
} else { | ||
entry = zend_hash_str_find_ptr(&archive->manifest, search, len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code previously used, depending on the case, either the stored length here, or directly used the zend_string for the lookup. I assume that phar paths can't contain null bytes, so binary safety is likely not an issue, but it would still be better to pass in a zend_string
as search
and then use zend_hash_find_ptr
here.
@bishopb Sorry for the delay, I'm a bit preoccupied at the moment. Feel free to already send other phar PRs (unless you're concerned about conflicts). |
Restores behavior before adding helper, which is itself consistent with ZipArchive. The concept is this: paths in Phar do not begin with "/", so the representation of the root dir is the lambda string, not "/". IOW, to extract all files, pass null or lambda then be done.
Upon advice from @nikic, pass along zend_string* and only work with char* when absolutely needed.
Thanks for taking a look at these, @nikic. I've touched up the code! |
Sorry for the long delay, now merged as fa586ce. Thanks! |
Thank you, @nikic, for shepherding it through! More bug fixes coming soon. |
https://bugs.php.net/bug.php?id=54289
A Phar object's manifest is a hash of file leaves: the directories are not held as distinct references. So, when the user gives a request for a directory, we need to find the list of file leaves having that directory as a prefix. If you're familiar with AWS, this is like how S3 works -- there aren't "directories" in the sense of a directed node graph: rather, there are just files, which optionally have a prefix we interpret semantically as a directory.
Ideally, we'd rather want an index of files by their directories, because with rather large .phar this can be an expensive operation. Might address that in a future refactoring, but this was the most obvious implementation: the cost being that extracting directories from large phar may be slow.