Skip to content

Fix #75245: Don't set content of elements with only whitespaces #4533

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed

Fix #75245: Don't set content of elements with only whitespaces #4533

wants to merge 3 commits into from

Conversation

eriklundin
Copy link
Contributor

@eriklundin eriklundin commented Aug 13, 2019

Bug #75245

Test code:
var_dump(simplexml_load_string('<test1><test2> </test2><test3></test3></test1>'));

Expected results (PHP 5.4.16)

object(SimpleXMLElement)#1 (2) {
  ["test2"]=>
  object(SimpleXMLElement)#2 (0) {
  }
  ["test3"]=>
  object(SimpleXMLElement)#3 (0) {
  }
}

Results from PHP 7.3.8+:

object(SimpleXMLElement)#1 (2) {
  ["test2"]=>
  object(SimpleXMLElement)#2 (1) {
    [0]=>
    string(4) "    "
  }
  ["test3"]=>
  object(SimpleXMLElement)#3 (0) {
  }
}

Results after patch:

object(SimpleXMLElement)#1 (2) {
  ["test2"]=>
  object(SimpleXMLElement)#2 (0) {
  }
  ["test3"]=>
  object(SimpleXMLElement)#3 (0) {
  }
}

@carusogabriel
Copy link
Contributor

For reference, a running with all PHP versions: https://3v4l.org/02c8K

@saulery
Copy link

saulery commented Aug 14, 2019

Good news !

Thanks a lot !

@nikic
Copy link
Member

nikic commented Aug 14, 2019

Which commit caused the original change in behavior?

@saulery
Copy link

saulery commented Aug 14, 2019

@nikic
Copy link
Member

nikic commented Sep 17, 2019

I've tried to understand how SXE dumping is supposed to work, but was left thoroughly confused. For example:

$sxe = simplexml_load_string('<root><test>x<test2>y</test2>z</test></root>');
var_dump($sxe);
// Gives:
object(SimpleXMLElement)#4 (1) {
  ["test"]=>
  string(2) "xz"
}

Is that supposed to happen, or is this a bug?

@eriklundin
Copy link
Contributor Author

eriklundin commented Sep 17, 2019

I've been looking at the spec for XML 1.0 and i can not find a reference to how tags after character data inside a tag should be handled. I'm sure it's not valid but how it should be handled by libxml/php i don't know. PHP seems to strip out the invalid data and leave the rest. This is possibly an acceptable solution.

However, if you would want the data with the invalid characters you would need to do something like this:

$sxe = simplexml_load_string('<root><test>x&lt;test2&gt;y&lt;/test2&gt;z</test></root>');
var_dump($sxe);

object(SimpleXMLElement)#1 (1) {
  ["test"]=>
  string(18) "x<test2>y</test2>z"
}

@krakjoe
Copy link
Member

krakjoe commented Sep 30, 2019

Can we get a status update here please ?

@eriklundin
Copy link
Contributor Author

What's needed?

@krakjoe
Copy link
Member

krakjoe commented Oct 2, 2019

@eriklundin when I first read this I thought there were open questions, but I see now that you did answer ...

@krakjoe
Copy link
Member

krakjoe commented Oct 2, 2019

Merged as 6462c19

Thanks.

@krakjoe krakjoe closed this Oct 2, 2019
@nikic
Copy link
Member

nikic commented Oct 2, 2019

I've been looking at the spec for XML 1.0 and i can not find a reference to how tags after character data inside a tag should be handled. I'm sure it's not valid but how it should be handled by libxml/php i don't know. PHP seems to strip out the invalid data and leave the rest. This is possibly an acceptable solution.

Mixing character data and tags is valid XML ^^ I would expect this example to return something like

object(SimpleXMLElement)#%d (1) {
  ["test"]=>
  object(SimpleXMLElement)#%d (1) {
    [0]=>
    string(1) "x"
    ["test"] =>
    string(1) "y"
    [2]=>
    string(1) "z"
  }
}

or at least something with a similar information content. It seems pretty broken to me that SimpleXML just drops the inner tag entirely in this case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants