New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DI] Allow binary values in parameters. #25928

Closed
wants to merge 4 commits into
base: master
from

Conversation

Projects
None yet
8 participants
@bburnichon
Contributor

bburnichon commented Jan 25, 2018

Q A
Branch? master
Bug fix? yes
New feature? yes
BC breaks? no
Deprecations? no
Tests pass? yes
Fixed tickets #25916
License MIT
Doc PR none yet

This adds binary type in container xsd definition file

@xabbuh

This comment has been minimized.

Member

xabbuh commented Jan 25, 2018

Doesn't this apply to 2.7 too?

@xabbuh xabbuh added DependencyInjection and removed Feature labels Jan 25, 2018

@@ -1775,7 +1775,12 @@ private function export($value)
private function doExport($value, $resolveEnv = false)
{
if (is_string($value) && false !== strpos($value, "\n")) {
if (is_string($value) && preg_match('/[\x00-\x08\x0b-\x1f\x7f-\xff]+/', $value,$matches)) {

This comment has been minimized.

@xabbuh

xabbuh Jan 25, 2018

Member

Is this really necessary?

This comment has been minimized.

@bburnichon

bburnichon Jan 25, 2018

Contributor

Yes, otherwise, PhpDumper generates invalid UTF-8 files. As I was unable to properly add safely required characters

@@ -298,6 +298,10 @@ private function convertParameters(array $parameters, $type, \DOMElement $parent
$element->setAttribute('type', 'expression');
$text = $this->document->createTextNode(self::phpToXml((string) $value));
$element->appendChild($text);
} elseif (is_string($value) && preg_match('/[\x00-\x08\x0b-\x1f\x7f-\xff]/', $value)) {

This comment has been minimized.

@xabbuh

xabbuh Jan 25, 2018

Member

What about !preg_match('//u', $value) instead?

This comment has been minimized.

@bburnichon

bburnichon Jan 25, 2018

Contributor

I tried /\p{C}/u but it was not working as expected

This comment has been minimized.

@bburnichon

bburnichon Jan 26, 2018

Contributor

I retested it and tried to detect control characters instead except whitespaces

@@ -511,6 +511,12 @@ private function getArgumentsAsPhp(\DOMElement $node, $name, $file, $lowercase =
}
$arguments[$key] = new TaggedIteratorArgument($arg->getAttribute('tag'));
break;
case 'binary':
if (!preg_match('#^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$#', $arg->nodeValue)) {

This comment has been minimized.

@xabbuh

xabbuh Jan 25, 2018

Member

Is that necessary? base64_decode() would return false anyway which we could check for to detect errors, can't we?

This comment has been minimized.

@bburnichon

bburnichon Jan 25, 2018

Contributor

Yes, could be done this way also. I generally tend to avoid calling methods with invalid parameters.

This comment has been minimized.

@derrabus

derrabus Jan 25, 2018

Contributor

I would side with @xabbuh here: This class should not care about how a valid base64 string looks like. This is an implementation detail of the base64_decode function.

@bburnichon

This comment has been minimized.

Contributor

bburnichon commented Jan 25, 2018

I've done this PR against master only as it requires xsd to have a new binary definition. But the actual fix can be applied to all versions I think.

fabbot.io gives an error in a fixture file. Could this be avoided?

@@ -511,6 +511,12 @@ private function getArgumentsAsPhp(\DOMElement $node, $name, $file, $lowercase =
}
$arguments[$key] = new TaggedIteratorArgument($arg->getAttribute('tag'));
break;
case 'binary':
if (!preg_match('#^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$#', $arg->nodeValue)) {

This comment has been minimized.

@derrabus

derrabus Jan 25, 2018

Contributor

I would side with @xabbuh here: This class should not care about how a valid base64 string looks like. This is an implementation detail of the base64_decode function.

@derrabus

This comment has been minimized.

Contributor

derrabus commented Jan 25, 2018

fabbot.io gives an error in a fixture file. Could this be avoided?

You did not cause the error, so you can ignore it, imho.

@xabbuh xabbuh added this to the 2.7 milestone Jan 26, 2018

@dunglas

This comment has been minimized.

Member

dunglas commented Jan 26, 2018

It deserves a unit test.

I would be more comfortable to merge this change in master instead of in 2.7. It's not that trivial.

@bburnichon

This comment has been minimized.

Contributor

bburnichon commented Jan 26, 2018

I made PR against master, not 2.7. Also, what kind of unit test need to be added? I already added the use case in Dumpers and Loaders. Do you have another type of parameter to check?

@bburnichon

This comment has been minimized.

Contributor

bburnichon commented Feb 9, 2018

ping @derrabus @dunglas @xabbuh Can you change target to master instead of 2.7 I do agree that this is a new feature that should go to next version of php. On older version, a bug warning should indicate binary parameters are not well handled.

@nicolas-grekas nicolas-grekas modified the milestones: 2.7, 4.1 Feb 27, 2018

@nicolas-grekas nicolas-grekas changed the title from Allow binary values in parameters. to [DI] Allow binary values in parameters. Feb 27, 2018

@xabbuh

This comment has been minimized.

Member

xabbuh commented Mar 4, 2018

The CS complaints in the fixtures file can indeed be ignored. But can you please switch back to using explicit array() instead of the short array notation in the dumper class? :)

@@ -511,6 +511,12 @@ private function getArgumentsAsPhp(\DOMElement $node, $name, $file, $lowercase =
}
$arguments[$key] = new TaggedIteratorArgument($arg->getAttribute('tag'));
break;
case 'binary':
if (false === $value = base64_decode($arg->nodeValue)) {
throw new InvalidArgumentException(sprintf('Tag "<%s>" with type="binary" is not valid base64 encoded string', $name));

This comment has been minimized.

@xabbuh

xabbuh Mar 4, 2018

Member

[...] is not a valid [...]

This comment has been minimized.

@xabbuh

xabbuh Mar 4, 2018

Member

Please also terminate the exception message with a dot.

@@ -1775,7 +1775,12 @@ private function export($value)
private function doExport($value, $resolveEnv = false)
{
if (is_string($value) && false !== strpos($value, "\n")) {
if (is_string($value) && \in_array(preg_match('/[^\s\P{Cc}]/u', $value), [false, 1], true)) {

This comment has been minimized.

@xabbuh

xabbuh Mar 4, 2018

Member

Is the change for the PHP dumper really necessary? Was there anything breaking without this change?

This comment has been minimized.

@bburnichon

bburnichon Mar 4, 2018

Contributor

Not really but then generated file could contain non-utf8 characters and opening generated file with an ide leads to warnings about file encoding.

This comment has been minimized.

@xabbuh

xabbuh Mar 5, 2018

Member

nothing we need to worry about IMO

@@ -298,6 +298,10 @@ private function convertParameters(array $parameters, $type, \DOMElement $parent
$element->setAttribute('type', 'expression');
$text = $this->document->createTextNode(self::phpToXml((string) $value));
$element->appendChild($text);
} elseif (is_string($value) && \in_array(preg_match('/[^\s\P{Cc}]/u', $value), [false, 1], true)) {

This comment has been minimized.

@xabbuh

xabbuh Mar 4, 2018

Member

Why do we need to check for both false and 1? Do we cover both values with the current test suite?

This comment has been minimized.

@bburnichon

bburnichon Mar 4, 2018

Contributor

false is returned when value contains non-utf8 characters. 1 is returned when a control character is present in string. You're right a case is missing in current test suite.

@bburnichon

This comment has been minimized.

Contributor

bburnichon commented Mar 5, 2018

Used traditional array syntax and added test on parameter with a single control char.

@@ -511,6 +511,12 @@ private function getArgumentsAsPhp(\DOMElement $node, $name, $file, $lowercase =
}
$arguments[$key] = new TaggedIteratorArgument($arg->getAttribute('tag'));
break;
case 'binary':

This comment has been minimized.

@nicolas-grekas

nicolas-grekas Mar 19, 2018

Member

It this really legitimate? I mean: if !!binary is a core yaml feature, it should be transparent. Yaml tagged-values are only for custom tags, not core ones AFAIK.

This comment has been minimized.

@bburnichon

bburnichon Mar 19, 2018

Contributor

I looked for a way to indicate in a XML file (which should be a valid UTF-8 file) that a value was base64_encoded(). I though this was the exact same purpose as in yaml file. I did not knew that !!binary was part of the yaml specification.

This comment has been minimized.

@xabbuh

xabbuh Mar 19, 2018

Member

IMO using the same term here is legitimate. I would keep it as is.

This comment has been minimized.

@nicolas-grekas

nicolas-grekas Mar 20, 2018

Member

yes, sorry, I misread the code, all good here

@@ -1811,7 +1811,12 @@ private function export($value)
private function doExport($value, $resolveEnv = false)
{
if (is_string($value) && false !== strpos($value, "\n")) {
if (is_string($value) && \in_array(preg_match('/[^\s\P{Cc}]/u', $value), array(false, 1), true)) {

This comment has been minimized.

@nicolas-grekas

nicolas-grekas Mar 19, 2018

Member

When can preg_match return false here?
What's the reasoning for choosing this regexp? (any reference/inspiration maybe?)

This comment has been minimized.

@bburnichon

bburnichon Mar 19, 2018

Contributor

If $value is not an UTF-8 character string, preg_match will return false because of the u modifier.
Regexp was chosen to detects all characters which are control characters \p{Cc} and not a space (which are valid ascii). I mean, putting a bell character in a string is very unlikely done in purpose. I don't know of a way to put this in a file except by copy/paste.

This comment has been minimized.

@nicolas-grekas

nicolas-grekas Mar 19, 2018

Member

Hum, I see. Then why unicode at all? No need for the u modifier, isn't it?

This comment has been minimized.

@bburnichon

bburnichon Mar 19, 2018

Contributor

If I not set u modifier, I can NOT use \p{Cc} to find the allowed characters as its extension for unicode. My first try was just selecting all characters outside standard ASCII but it would filter too much IMO.

This comment has been minimized.

@nicolas-grekas

nicolas-grekas Mar 19, 2018

Member

no need for unicode, let's list the ASCII Cc explicitly, here they are:
https://en.wikipedia.org/wiki/Unicode_control_characters

This comment has been minimized.

@bburnichon

bburnichon Mar 20, 2018

Contributor

Do you want a different regex for XML and PHP serialization? I mean, in XML, I need to check whether string contains only valid UTF-8 characters implying check with the u modifier.
For PHP, I can only check for [\0-\x1f\x7f] but then simple strings like "Foo\tTab" would be base64 encoded whereas "\xf0" would not.

This comment has been minimized.

@nicolas-grekas

nicolas-grekas Mar 20, 2018

Member

I'd suggest only one way to detect binary, and using this:
[\x00-\x08\x0B\x0E-\x1A\x1C-\x1F\x7F]

@@ -1811,7 +1811,12 @@ private function export($value)
private function doExport($value, $resolveEnv = false)
{
if (is_string($value) && false !== strpos($value, "\n")) {
if (is_string($value) && \in_array(preg_match('/[\x00-\x08\x0B\x0E-\x1A\x1C-\x1F\x7F]/u', $value), array(false, 1), true)) {

This comment has been minimized.

@nicolas-grekas

nicolas-grekas Mar 22, 2018

Member

no need for the u modifier, thus no need for in_array

bburnichon added some commits Mar 22, 2018

@nicolas-grekas

This comment has been minimized.

Member

nicolas-grekas commented Mar 22, 2018

as bugfix on 3.4?

@fabpot

This comment has been minimized.

Member

fabpot commented Mar 27, 2018

Thank you @bburnichon.

@fabpot fabpot closed this Mar 27, 2018

fabpot added a commit that referenced this pull request Mar 27, 2018

feature #25928 [DI] Allow binary values in parameters. (bburnichon)
This PR was squashed before being merged into the 4.1-dev branch (closes #25928).

Discussion
----------

[DI] Allow binary values in parameters.

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | yes
| New feature?  | yes
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | #25916
| License        | MIT
| Doc PR        | none yet

This adds `binary` type in container xsd definition file

Commits
-------

cb23134 [DI] Allow binary values in parameters.
@stof

This comment has been minimized.

Member

stof commented Mar 27, 2018

XML Dumper changes are missing their tests

@bburnichon bburnichon deleted the bburnichon:container-with-non-utf8-parameters branch Mar 27, 2018

@stof

This comment has been minimized.

Member

stof commented Mar 27, 2018

We should also support it for argument types for consistency

@@ -298,6 +298,10 @@ private function convertParameters(array $parameters, $type, \DOMElement $parent
$element->setAttribute('type', 'expression');
$text = $this->document->createTextNode(self::phpToXml((string) $value));
$element->appendChild($text);
} elseif (\is_string($value) && !preg_match('/^[^\x00-\x08\x0B\x0E-\x1A\x1C-\x1F\x7F]*+$/u', $value)) {

This comment has been minimized.

@xabbuh

xabbuh Mar 29, 2018

Member

Do we still need the u modifier here?

This comment has been minimized.

@bburnichon

bburnichon Mar 29, 2018

Contributor

Yes, to ensure string is a valid UTF8 string for XML file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment