Parser stuck in loop #87

Closed
Gounlaf opened this issue Feb 11, 2016 · 9 comments

Gounlaf commented Feb 11, 2016

Hi,

I have an XML input with encoding problems.
I use the default KeyValue element.

Because of the encoding problem, the "keyValue" function gets stuck in its loop: the reader object never reaches the "Reader::END_ELEMENT" node.

I've tried both of these variants of the loop:

    do {

        if ($reader->nodeType === Reader::ELEMENT) {
            if ($namespace !== null && $reader->namespaceURI === $namespace) {
                $values[$reader->localName] = $reader->parseCurrentElement()['value'];
            } else {
                $clark = $reader->getClark();
                $values[$clark] = $reader->parseCurrentElement()['value'];

                if (false === $values[$clark]) {
                    $reader->next();
                }
            }
        } else {
            $reader->read();
        }
    } while ($reader->nodeType !== Reader::END_ELEMENT);

and:

    do {

        if ($reader->nodeType === Reader::ELEMENT) {
            if ($namespace !== null && $reader->namespaceURI === $namespace) {
                $values[$reader->localName] = $reader->parseCurrentElement()['value'];
            } else {
                $clark = $reader->getClark();
                $values[$clark] = $reader->parseCurrentElement()['value'];

                if (false === $values[$clark]) {
                    $reader->read();
                }
            }
        } else {
            $reader->read();
        }
    } while ($reader->nodeType !== Reader::END_ELEMENT);

Neither works; the reader still stays stuck on the same element =/

Is there any way to skip the element and move on?

Thanks,

Regards

@evert evert added the question label Feb 11, 2016
Member

evert commented Feb 11, 2016

This is not going to work well, because by the time you call read or next, you might actually be deeper inside the xml document. Calling next will also traverse beyond END_ELEMENT which is why you are never reaching it.

Could you share the xml snippet you are trying to parse, and what PHP data structure you wish to get? I might be able to rewrite it in a way where it's a lot clearer. Ideally you should never really have to do manual traversal like this, unless it's a special circumstance.
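
For illustration only, here is a minimal sketch of a traversal loop that cannot spin forever. It is written against PHP's built-in XMLReader, not sabre/xml's own code, and the helper name readChildren is hypothetical. The idea is to let the return value of read() drive the loop, so a failed read ends it instead of leaving the cursor parked on the same node.

```php
<?php

// Assumption: libxml errors are collected internally instead of being
// emitted as PHP warnings while we read.
libxml_use_internal_errors(true);

/**
 * Hypothetical helper (not part of sabre/xml): collects the direct child
 * elements of the current element as localName => text content.
 * Throws if the reader stops before the closing tag is reached.
 */
function readChildren(XMLReader $reader): array
{
    $values = [];
    $startDepth = $reader->depth;

    // An empty element such as <root/> has no children and no END_ELEMENT.
    if ($reader->isEmptyElement) {
        return $values;
    }

    // read() returns false on EOF or on a parse error, which ends the loop.
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::END_ELEMENT && $reader->depth === $startDepth) {
            return $values; // reached the closing tag of the starting element
        }
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->depth === $startDepth + 1) {
            // readString() returns the text content without moving the cursor.
            $values[$reader->localName] = $reader->readString();
        }
    }

    throw new RuntimeException('Reader stopped before reaching the closing tag');
}

// Usage: position the reader on the root element, then collect its children.
$reader = new XMLReader();
$reader->XML('<root><a>1</a><b>2</b></root>');
$reader->read();
$children = readChildren($reader);
```

Checking the depth as well as the node type keeps the loop from mistaking a nested closing tag for the end of the starting element.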

Author

Gounlaf commented Feb 11, 2016

> Could you share the xml snippet you are trying to parse, and what PHP data structure you wish to get?

I will try to share only the piece of XML that is wrong.

> I might be able to rewrite it in a way where it's a lot clearer. Ideally you should never really have to do manual traversal like this, unless it's a special circumstance.

I use your reader as shown in the documentation. I just dug deeper into the code and found this loop; I tried to "debug" it by trying to "force" the reader to move to the next element. I don't use this loop myself ^^

When my element is parsed, it gets an error, but you silence it (https://github.com/fruux/sabre-xml/blob/master/lib/Reader.php#L145).
I var_dumped the content: LibXML Error Input is not proper UTF-8, indicate encoding ! (I don't have the complete message right now)
As a result, the reader keeps looping on the same element and always gets the same error.

Anyway, I will post a complete example =)

Member

evert commented Feb 11, 2016

The error is indeed silenced there, but the error does get stored and we actually take it out here again:

https://github.com/fruux/sabre-xml/blob/master/lib/Reader.php#L151

There might be a bug in the error handling code though. The error you shared definitely seems to indicate so. So it would then be extra awesome to get a snippet of your xml that reproduces this, so we can make the parser more robust =)
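
The silence-then-collect pattern described here can be sketched with PHP's built-in libxml functions. This is an illustration using plain XMLReader under my own assumptions, not the code in Reader.php, and readAllOrFail is a hypothetical name. It suppresses libxml warnings while reading, then turns any stored error into an exception so a failed read cannot go unnoticed.

```php
<?php

/**
 * Hypothetical helper: walks every node of an XML string and returns the
 * number of nodes visited, or throws if libxml recorded an error.
 */
function readAllOrFail(string $xml): int
{
    // Store libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $reader = new XMLReader();
    $reader->XML($xml);
    $nodes = 0;

    try {
        // read() returns false both at end of input and on a parse error.
        while ($reader->read()) {
            $nodes++;
        }

        // Take the stored errors out again and surface the first one.
        $errors = libxml_get_errors();
        if (count($errors) > 0) {
            throw new RuntimeException(trim($errors[0]->message));
        }

        return $nodes;
    } finally {
        $reader->close();
        libxml_clear_errors();
        libxml_use_internal_errors($previous); // restore the caller's setting
    }
}
```

Checking the stored errors after read() returns false is what distinguishes a clean end of input from a parse failure.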

Member

evert commented Feb 12, 2016

Hi @Gounlaf , I received your data via email. Thanks very much for that.

I tried to reproduce the issue with the following script:

<?php

include 'vendor/autoload.php';

$reader = new Sabre\Xml\Reader();
$reader->open('xml_bug_wrong_encoding.xml');

var_dump($reader->parse());

This causes an exception to be triggered immediately:

Sabre\Xml\LibXMLException: Input is not proper UTF-8, indicate encoding !
Bytes: 0x1A 0x29 0x20 0x61
 on line 24, column 25 in /Users/evert/code/sabre/xml/lib/Reader.php on line 155

Call Stack:
    0.0002     228840   1. {main}() /Users/evert/code/sabre/xml/issue87.php:0
    0.0571     587552   2. Sabre\Xml\Reader->parse() /Users/evert/code/sabre/xml/issue87.php:8
    0.0571     588200   3. Sabre\Xml\Reader->parseCurrentElement() /Users/evert/code/sabre/xml/lib/Reader.php:69
    0.0572     589120   4. call_user_func:{/Users/evert/code/sabre/xml/lib/Reader.php:231}() /Users/evert/code/sabre/xml/lib/Reader.php:231
    0.1170     604496   5. Sabre\Xml\Element\Base::xmlDeserialize() /Users/evert/code/sabre/xml/lib/Reader.php:231
    0.1170     604912   6. Sabre\Xml\Reader->parseInnerTree() /Users/evert/code/sabre/xml/lib/Element/Base.php:86
    0.1171     605328   7. Sabre\Xml\Reader->parseCurrentElement() /Users/evert/code/sabre/xml/lib/Reader.php:161
    0.1171     606096   8. call_user_func:{/Users/evert/code/sabre/xml/lib/Reader.php:231}() /Users/evert/code/sabre/xml/lib/Reader.php:231
    0.1171     606128   9. Sabre\Xml\Element\Base::xmlDeserialize() /Users/evert/code/sabre/xml/lib/Reader.php:231
    0.1171     606128  10. Sabre\Xml\Reader->parseInnerTree() /Users/evert/code/sabre/xml/lib/Element/Base.php:86

I would consider this the expected behavior. Are you also seeing this when you run my test script or do you get the loop?

Member

evert commented Feb 12, 2016

The problem with your file, BTW, is that ASCII character 26 (0x1A) appears in your source. This encodes CTRL-Z and should normally never appear in a text file. Even so, the parser shouldn't go into a never-ending loop.
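
If the source file cannot be fixed upstream, one possible workaround is to strip such control characters before the data reaches the parser. The following is a minimal sketch that assumes the whole document fits in memory as a string; stripXmlControlChars is a hypothetical helper, not part of sabre/xml. XML 1.0 allows only tab, line feed and carriage return below 0x20, so every other byte in that range is removed.

```php
<?php

/**
 * Hypothetical helper: removes control characters that are not allowed in
 * XML 1.0 (everything below 0x20 except tab, line feed and carriage return).
 * Works byte-wise, so multi-byte UTF-8 sequences (bytes >= 0x80) are untouched.
 */
function stripXmlControlChars(string $xml): string
{
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $xml);
}

// Usage: 0x1A (CTRL-Z) is dropped, while the tab and newline are kept.
$clean = stripXmlControlChars("<a>x\x1Ay\t\n</a>");
```

The cleaned string could then be passed to the reader's xml() method, which Sabre\Xml\Reader inherits from XMLReader, instead of calling open() on the file. Note that this silently drops data, so it only suits cases where the stray bytes carry no meaning.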

Author

Gounlaf commented Feb 15, 2016

Hi @evert,
If I use your example, yes, the exception is thrown.

But if I use the example given at http://sabre.io/xml/reading/ (section "Using the XmlDeserializable interface"), the loop occurs. I sent you the "looping example" by email.

Member

evert commented Feb 15, 2016

Would it be possible for you to email me a script that always reproduces the error? It's a bit hard for me to figure out where this is going wrong.

The example you mention on sabre.io should be fine because it just uses parseInnerTree, but I could be wrong.

So ideally if you could send me a single php file that has a minimal sample that reproduces the problem for you, I would be very grateful!

Author

Gounlaf commented Feb 16, 2016

Hi @evert ,

I did that yesterday, using the data I had already sent you and a copy/paste of the example, adapted for the data. Anyway, you can find it below :)

<?php
include 'vendor/autoload.php';

class Offers implements Sabre\Xml\XmlDeserializable
{
    public $data = array();

    static function xmlDeserialize(Sabre\Xml\Reader $reader) {

        $offers = new self();
        $children = $reader->parseInnerTree();
        foreach($children as $child) {
            if ($child['value'] instanceof Offer) {
                $offers->data[] = $child['value'];
            }
        }
        return $offers;

    }
}

class Offer implements Sabre\Xml\XmlDeserializable {
    static function xmlDeserialize(Sabre\Xml\Reader $reader) {

        $offer = new self();

        // Borrowing a parser from the KeyValue class.
        $keyValue = Sabre\Xml\Element\KeyValue::xmlDeserialize($reader);

//        if (isset($keyValue['{http://example.org/books}title'])) {
//            $book->title = $keyValue['{http://example.org/books}title'];
//        }
//        if (isset($keyValue['{http://example.org/books}author'])) {
//            $book->author = $keyValue['{http://example.org/books}author'];
//        }

        return $offer;

    }
}

$reader = new Sabre\Xml\Reader();
$reader->elementMap = [
    '{}offres' => 'Offers',
    '{}offre' => 'Offer',
];

$reader->open('xml_bug_wrong_encoding.xml');

var_dump($reader->parse());

Member

evert commented Oct 9, 2016

I'm closing this ticket because it's been a while since the last comment.

If this is indeed still an issue, feel free to comment here so we can continue discussing.

This is just a general cleanup. Unfortunately, this ticket never got a fully working sample. I might have received it via email back in February, but I no longer have it. If you still care about this ticket, please submit the info again (preferably just on GitHub).

@evert evert closed this as completed Oct 9, 2016