xml_parse encoding difference from php 5.5 #4837

Open
Stype opened this Issue Feb 13, 2015 · 2 comments


Stype commented Feb 13, 2015

I noticed a difference between HHVM and PHP 5.5 in how xml_parse handles XML that declares its own encoding. I'm on a slightly old version of HHVM, but I didn't see this reported or closed anywhere.

Running

<?php
// testutf8.php
function dataEcho( $parser, $data ) {
    echo "$data";
}
$parser = xml_parser_create();
xml_set_character_data_handler(
    $parser,
    'dataEcho'
);
xml_parse( $parser, '<?xml version="1.0" encoding="iso-8859-1"?><svg xmlns="http://www.w3.org/2000/svg"><title>dÜST</title></svg>' );
echo "\n";

Produces:
vagrant@mediawiki-vagrant:/vagrant/mediawiki/maintenance$ hhvm --version
HipHop VM 3.3.1 (rel)
Compiler: heads/master-0-g3efe530cd55a886d88b29acc014dde14c4cff755
Repo schema: b538518f4338eecd9dc2ff13865fd306926fde1b
Extension API: 20140829
vagrant@mediawiki-vagrant:/vagrant/mediawiki/maintenance$ hhvm testutf8.php
dÜST
vagrant@mediawiki-vagrant:/vagrant/mediawiki/maintenance$ php5 --version
PHP 5.5.9-1ubuntu4.5 (cli) (built: Oct 29 2014 11:59:10)
Copyright (c) 1997-2014 The PHP Group
Zend Engine v2.5.0, Copyright (c) 1998-2014 Zend Technologies
with Zend OPcache v7.0.3, Copyright (c) 1999-2014, by Zend Technologies
with Xdebug v2.2.3, Copyright (c) 2002-2013, by Derick Rethans
vagrant@mediawiki-vagrant:/vagrant/mediawiki/maintenance$ php5 testutf8.php
d�ST

Contributor

MattEagle commented Feb 20, 2015

http://3v4l.org/cbdi2

This seems to match PHP's behavior up to 5.0.1; after that, the Zend implementation changed.

gggeek commented Apr 14, 2015

It is true that this behaviour changed ages ago, and arguably it was better before: it made it easy to tell the XML parser that the expected charset came from elsewhere, e.g. from HTTP headers.

On the other hand, I have a small code variation that shows how to get more consistent behaviour from PHP, whereas HHVM still behaves differently:

http://3v4l.org/8UtWi
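
Since the terminal output above depends on how the terminal renders the bytes, a byte-level comparison may be easier to reason about. The following is only a sketch and is not the code behind the 3v4l link: `charDataHex` is a hypothetical helper name, and the sample assumes the original test file was saved as UTF-8, so `Ü` is written as the explicit bytes `\xC3\x9C`. It sets the output encoding explicitly with `XML_OPTION_TARGET_ENCODING` and prints the character data as hex, so the same script can be run under both engines and diffed.

```php
<?php
// Hypothetical helper: parse $xml and return the collected character
// data as a hex string, using an explicit target (output) encoding.
function charDataHex($xml, $targetEncoding) {
    $parser = xml_parser_create();
    // Make the output encoding explicit instead of relying on the default.
    xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, $targetEncoding);

    $buf = '';
    // The handler may be called several times for one text node, so accumulate.
    xml_set_character_data_handler($parser, function ($p, $data) use (&$buf) {
        $buf .= $data;
    });

    if (xml_parse($parser, $xml, true) !== 1) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return bin2hex($buf);
}

// Assumption: the file from the report was UTF-8, so "Ü" is C3 9C,
// even though the XML declaration claims iso-8859-1.
$sample = '<?xml version="1.0" encoding="iso-8859-1"?>'
        . '<svg xmlns="http://www.w3.org/2000/svg"><title>d' . "\xC3\x9C" . 'ST</title></svg>';

foreach (array('UTF-8', 'ISO-8859-1') as $target) {
    echo $target, ': ', charDataHex($sample, $target), "\n";
}
```

Running this under both `hhvm` and `php5` and comparing the hex lines should show whether the engines disagree on the input decoding, the output encoding, or both.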
