#20411 fix Yaml parsing for very long quoted strings #21523

Closed: RichardBradley wants to merge 3 commits into 2.7

RichardBradley commented Feb 3, 2017

| Q             | A
| ------------- | ---
| Branch?       | 2.7
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | #20411
| License       | MIT
| Doc PR        | no

This is a second fix for the issue discussed in #20411. My first PR (#21279) didn't fix the bug in all cases, sorry.

If a YAML string has too many spaces in the value, it can trigger a PREG_BACKTRACK_LIMIT_ERROR error in the Yaml parser.

There should be no behavioural change other than the bug fix.

I have included a test which fails before this fix and passes after this fix.

I have also added checks that detect other PCRE internal errors and throw a more descriptive exception. Before this patch, the YAML engine would often give incorrect results, rather than throwing, on a PCRE PREG_BACKTRACK_LIMIT_ERROR error.
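The error-detection idea described above can be sketched as a thin wrapper around the builtin `preg_match`. This is only an illustration: the function name `checkedPregMatch` and the use of `RuntimeException` are assumptions made for this sketch (the PR itself throws a `ParseException` from a static method on the parser), and the message strings are chosen for readability.

```php
<?php
// Hypothetical helper, for illustration only: behaves like preg_match(),
// but turns a PCRE internal error (return value false) into an exception.
function checkedPregMatch($pattern, $subject, &$matches = null, $flags = 0, $offset = 0)
{
    $ret = preg_match($pattern, $subject, $matches, $flags, $offset);

    if (false === $ret) {
        // preg_last_error() reports why the last PCRE call failed.
        switch (preg_last_error()) {
            case PREG_INTERNAL_ERROR:
                $error = 'Internal PCRE error.';
                break;
            case PREG_BACKTRACK_LIMIT_ERROR:
                $error = 'pcre.backtrack_limit reached.';
                break;
            case PREG_RECURSION_LIMIT_ERROR:
                $error = 'pcre.recursion_limit reached.';
                break;
            case PREG_BAD_UTF8_ERROR:
                $error = 'Malformed UTF-8 data.';
                break;
            case PREG_BAD_UTF8_OFFSET_ERROR:
                $error = 'Offset does not correspond to the start of a valid UTF-8 code point.';
                break;
            default:
                $error = 'Error.';
        }

        throw new RuntimeException($error);
    }

    // 1 on a match, 0 on no match, exactly like the builtin.
    return $ret;
}
```

With a wrapper like this, callers no longer need to distinguish `0` (no match) from `false` (engine failure) at every call site, which is the failure mode the description mentions.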

@@ -92,13 +92,13 @@ public function parse($value, $exceptionOnInvalidType = false, $objectSupport =
 }
 $isRef = $mergeNode = false;
-if (preg_match('#^\-((?P<leadspaces>\s+)(?P<value>.+?))?\s*$#u', $this->currentLine, $values)) {
+if (self::preg_match('#^\-((?P<leadspaces>\s+)(?P<value>.+))?$#u', rtrim($this->currentLine), $values)) {

RichardBradley Feb 3, 2017

Contributor

See comment on line 127 below

src/Symfony/Component/Yaml/Parser.php
@@ -124,7 +124,10 @@ public function parse($value, $exceptionOnInvalidType = false, $objectSupport =
 if ($isRef) {
 $this->refs[$isRef] = end($data);
 }
-} elseif (preg_match('#^(?P<key>'.Inline::REGEX_QUOTED_STRING.'|[^ \'"\[\{].*?) *\:(\s+(?P<value>.+?))?\s*$#u', $this->currentLine, $values) && (false === strpos($values['key'], ' #') || in_array($values['key'][0], array('"', "'")))) {
+} elseif (self::preg_match('#^(?P<key>'.Inline::REGEX_QUOTED_STRING.'|[^ \'"\[\{].*?) *\:(\s+(?P<value>.+))?$#u', rtrim($this->currentLine), $values)

RichardBradley Feb 3, 2017

Contributor

Here, as well as wrapping "preg_match", I have fixed the regex to avoid large numbers of PCRE backtracks by moving the trailing-whitespace trimming out of the regex pattern and into an "rtrim" call in the argument list.

This may be less performant in some cases (though I expect it to be more performant in most), but I have not measured this.

It demonstrably fixes a bug, as shown by the unit test I have added to this commit, which fails without this change and passes with it.
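To illustrate the change, the sketch below runs the old and new sequence-item patterns from this PR side by side on a few sample lines. The helper name `sequenceValue` and the sample inputs are assumptions made for this sketch. It only shows that the captured value agrees for these particular inputs and is not a proof of equivalence for every possible line.

```php
<?php
// Patterns copied from the sequence-item hunk in this PR.
$old = '#^\-((?P<leadspaces>\s+)(?P<value>.+?))?\s*$#u';
$new = '#^\-((?P<leadspaces>\s+)(?P<value>.+))?$#u';

// Hypothetical helper: returns the captured "value" group, or null if absent.
function sequenceValue($pattern, $line)
{
    if (1 === preg_match($pattern, $line, $m) && isset($m['value'])) {
        return $m['value'];
    }

    return null;
}

// Sample lines (illustrative), some with trailing spaces or tabs.
$lines = array('- foo', '- foo   ', "- key: value \t");

foreach ($lines as $line) {
    // Old approach: the pattern itself skips trailing whitespace.
    $before = sequenceValue($old, $line);
    // New approach: trim first, then match greedily up to the end.
    $after = sequenceValue($new, rtrim($line));

    printf("%s | %s\n", json_encode($before), json_encode($after));
}
```

Trimming before matching means the greedy `.+` can run straight to the end of the subject, so the engine never has to retry `\s*$` at successive positions.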

RichardBradley Feb 3, 2017

Contributor

The specific problem with the regex here was the ".+?" followed by a "\s*$", which leads to a great deal of backtracking in long strings. I could not find a simpler fix than the one I propose here (the possessive quantifier fix used in #21279 would not work here).
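One way to observe the difference is to lower `pcre.backtrack_limit` and compare the two patterns on a long line. The sketch below is written under several assumptions: the limit of 1000 and the input length are arbitrary, and JIT is switched off because the JIT engine accounts for limits differently. On a typical build the lazy pattern is expected to hit the limit and return `false`, but the exact outcome depends on the PCRE version, so no output is shown here.

```php
<?php
// Assumptions for this demo: interpreter path (no JIT) and a small limit,
// so that the effect shows up on a modest input.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '1000');

// A long quoted value containing many spaces, plus trailing whitespace.
$line = '- "'.str_repeat('a ', 5000).'"   ';

// Old pattern: lazy ".+?" followed by "\s*$" retries the tail at every position.
$lazy = preg_match('#^\-((?P<leadspaces>\s+)(?P<value>.+?))?\s*$#u', $line, $m1);
$lazyError = preg_last_error();

// New pattern: greedy ".+" on an rtrim()-ed subject reaches "$" directly.
$greedy = preg_match('#^\-((?P<leadspaces>\s+)(?P<value>.+))?$#u', rtrim($line), $m2);
$greedyError = preg_last_error();

var_dump($lazy, PREG_BACKTRACK_LIMIT_ERROR === $lazyError);
var_dump($greedy, PREG_NO_ERROR === $greedyError);
```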

@@ -108,7 +108,7 @@ public function parse($value, $exceptionOnInvalidType = false, $objectSupport =
 $data[] = $this->parseBlock($this->getRealCurrentLineNb() + 1, $this->getNextEmbedBlock(null, true), $exceptionOnInvalidType, $objectSupport, $objectForMap);
 } else {
 if (isset($values['leadspaces'])
-&& preg_match('#^(?P<key>'.Inline::REGEX_QUOTED_STRING.'|[^ \'"\{\[].*?) *\:(\s+(?P<value>.+?))?\s*$#u', $values['value'], $matches)
+&& self::preg_match('#^(?P<key>'.Inline::REGEX_QUOTED_STRING.'|[^ \'"\{\[].*?) *\:(\s+(?P<value>.+))?$#u', rtrim($values['value']), $matches)

RichardBradley Feb 3, 2017

Contributor

see comment on line 127

$error = 'Error.';
}
throw new ParseException($error);

RichardBradley Feb 3, 2017

Contributor

This should not be reachable, but should hopefully mean that any further undetected "backtrack_limit" bugs, or any similar bugs added in the future, will result in an exception rather than an incorrect result.

src/Symfony/Component/Yaml/Parser.php
* @throws ParseException on a PCRE internal error
* @see preg_last_error()
*/
static function preg_match($pattern, $subject, &$matches = null, $flags = 0, $offset = 0)

iltar Feb 3, 2017

Contributor

Should probably be called pregMatch if you want to call it like this. Perhaps match would be a better name.

RichardBradley Feb 3, 2017

Contributor

I deliberately named it "preg_match" so that it was a drop-in replacement for the builtin preg_match. Is that disallowed?

*
* This avoids us needing to check for "false" every time PCRE is used
* in the YAML engine
*

stof Feb 3, 2017

Member

must be @internal as we clearly don't want to support BC on it

RichardBradley Feb 3, 2017

Contributor

ok, will do

$error = 'Error.';
}
throw new ParseException($error);

stof Feb 3, 2017

Member

this misses the location of the error in the ParseException

RichardBradley Feb 3, 2017

Contributor

The location is not always available, unless I make two wrappers -- one static, for Inline and other callers, and one non-static, for use during the parse. Do you think that's worthwhile?

xabbuh Mar 2, 2017

Member

You could pass the needed context to the new method. Might not look nice, but will do what it should.
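A rough sketch of the "pass the context in" suggestion follows. Everything here is hypothetical: the function name `pregMatchWithContext`, the extra `$lineNumber` and `$snippet` parameters, the message format, and the use of `RuntimeException` are assumptions for illustration and are not the signature that was merged.

```php
<?php
// Hypothetical wrapper: a single static-style function serves both callers
// that know the current line (the parser) and callers that do not (Inline).
function pregMatchWithContext($pattern, $subject, &$matches = null, $lineNumber = -1, $snippet = null)
{
    $ret = preg_match($pattern, $subject, $matches);

    if (false === $ret) {
        $message = sprintf('PCRE error code %d', preg_last_error());

        // Location details are appended only when the caller supplied them.
        if ($lineNumber >= 0) {
            $message .= sprintf(' at line %d', $lineNumber);
        }
        if (null !== $snippet) {
            $message .= sprintf(' (near "%s")', $snippet);
        }

        throw new RuntimeException($message.'.');
    }

    return $ret;
}
```

The optional parameters keep the call sites without location information unchanged, at the cost of a longer signature, which is the trade-off the comment alludes to.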

src/Symfony/Component/Yaml/Parser.php
@@ -61,7 +61,7 @@ public function __construct($offset = 0, $totalNumberOfLines = null, array $skip
 */
 public function parse($value, $exceptionOnInvalidType = false, $objectSupport = false, $objectForMap = false)
 {
-if (!preg_match('//u', $value)) {
+if (!self::preg_match('//u', $value)) {

stof Feb 3, 2017

Member

I would not replace this, as it would throw an exception saying "Malformed UTF-8 data." instead of the expected one.
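The point here is that `preg_match('//u', $value)` is being used as a UTF-8 validity check, where a `false` result is the expected signal rather than an unexpected engine failure. A minimal sketch of that idiom, with a hypothetical function name and a placeholder message (neither is taken from the parser), might look like this:

```php
<?php
// Hypothetical helper illustrating the idiom; the name and the message text
// are placeholders, not the parser's actual wording.
function assertValidUtf8($value)
{
    // An empty pattern with the "u" modifier makes PCRE validate the subject
    // as UTF-8: it returns 1 for valid input and false for malformed input.
    if (!preg_match('//u', $value)) {
        throw new InvalidArgumentException('The value does not appear to be valid UTF-8.');
    }
}
```

Keeping the builtin call at this one site lets the caller raise its own, more specific message instead of the generic one produced by an error-checking wrapper.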

RichardBradley Feb 3, 2017

Contributor

Good point. I'll add a test to cover this

nicolas-grekas added this to the 2.7 milestone Feb 6, 2017

fabpot commented Feb 12, 2017

Member

fabbot failures must be fixed.

xabbuh commented Feb 27, 2017

Member

@RichardBradley Do you have time to finish here? :)

RichardBradley commented Feb 28, 2017

Contributor

Sorry, I have been busy. I should be able to look over the next couple of days, yes.

RichardBradley commented Mar 1, 2017

Contributor

I have pushed an update which addresses the review comments above and fixes the "fabbot" style checks.

src/Symfony/Component/Yaml/Parser.php
-} elseif (preg_match('#^(?P<key>'.Inline::REGEX_QUOTED_STRING.'|[^ \'"\[\{].*?) *\:(\s+(?P<value>.+?))?\s*$#u', $this->currentLine, $values) && (false === strpos($values['key'], ' #') || in_array($values['key'][0], array('"', "'")))) {
+} elseif (self::preg_match('#^(?P<key>'.Inline::REGEX_QUOTED_STRING.'|[^ \'"\[\{].*?) *\:(\s+(?P<value>.+))?$#u', rtrim($this->currentLine), $values)
+&& (false === strpos($values['key'], ' #')
+|| in_array($values['key'][0], array('"', "'")))) {

xabbuh Mar 2, 2017

Member

The way the if condition is wrapped now IMO doesn't make it more readable. Maybe reformat it to something like this:

} elseif (
    self::preg_match('#^(?P<key>'.Inline::REGEX_QUOTED_STRING.'|[^ \'"\[\{].*?) *\:(\s+(?P<value>.+))?$#u', rtrim($this->currentLine), $values)
    && (false === strpos($values['key'], ' #') || in_array($values['key'][0], array('"', "'")))
) {

RichardBradley Mar 10, 2017

Contributor

I have pushed a new version with your preferred indentation

 }
-throw new ParseException($error, $this->getRealCurrentLineNb() + 1, $this->currentLine);
+throw new ParseException('Unable to parse', $this->getRealCurrentLineNb() + 1, $this->currentLine);

xabbuh Mar 2, 2017

Member

Will this ever be reached now?

RichardBradley Mar 10, 2017

Contributor

Yes, this line is covered by 3 tests:

  • ParserTest::testUnindentedCollectionException
  • ParserTest::testShortcutKeyUnindentedCollectionException
  • ParserTest::testScalarInSequence

RichardBradley commented Mar 10, 2017

Contributor

I have pushed a new version which I believe addresses all the review issues raised

xabbuh approved these changes Mar 17, 2017

👍

Status: Reviewed

fabpot commented Mar 17, 2017

Member

Thank you @RichardBradley.

fabpot added a commit that referenced this pull request Mar 17, 2017

bug #21523 #20411 fix Yaml parsing for very long quoted strings (RichardBradley)

This PR was squashed before being merged into the 2.7 branch (closes #21523).

Commits
-------

c9a1c09 #20411 fix Yaml parsing for very long quoted strings

fabpot closed this Mar 17, 2017

This was referenced Apr 4, 2017
