Transformer crashes with wrong error description on invalid UTF-8 sequences #767

sGy1980de opened this Issue Feb 26, 2013 · 3 comments



The Transformer crashes with the wrong error description when you parse files that contain invalid UTF-8 sequences. Parsing runs as expected and structure.xml is written, but the invalid UTF-8 sequences are copied unfiltered from the original code.
I was only able to find the real issue by parsing the XML via DOMDocument. Please include the original DOMException message in such cases, as structure.xml appeared to be valid at first glance.

Exception message from DOMDocument:

Warning: DOMDocument::load(): Input is not proper UTF-8, indicate encoding !
Bytes: 0xFC 0x67 0x65 0x6E in ~/anyProject/docs/api/structure.xml, line: 2960

Console output from phpDoc:
[Screenshot: Bildschirmfoto 2013-02-26 um 10 08 08]


boenrobot commented Feb 26, 2013

If you even have "invalid UTF-8" sequences to begin with, chances are your code is not encoded as UTF-8 at all.

If that's the case, you can use the "--encoding" option to specify the actual encoding your files are in.
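The bytes quoted in the DOMDocument warning can be used to check this guess. The following is a minimal sketch in Python (not phpDocumentor code). It assumes ISO-8859-1 as a candidate source encoding, which the report itself does not confirm, and the helper name `is_valid_utf8` is made up for illustration:

```python
# Bytes quoted in the DOMDocument warning: 0xFC 0x67 0x65 0x6E
reported = bytes([0xFC, 0x67, 0x65, 0x6E])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


utf8_ok = is_valid_utf8(reported)

# ISO-8859-1 maps every byte to a code point, so this decode cannot fail.
# Treating the file as ISO-8859-1 is an assumption, not something the report states.
as_latin1 = reported.decode("iso-8859-1")

print(utf8_ok, as_latin1)
```

Under that assumption the four bytes read as "ügen", which would be consistent with source files saved in a single-byte Western European encoding rather than UTF-8.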


mvriel commented Apr 30, 2013

The error handling can be improved for the case where an invalid document is provided. I have triaged this as a bug to be fixed in version 2.0.
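One way to picture the improvement is to catch the parser's own error and pass its message through instead of replacing it. The following is a rough Python sketch that uses the standard library's expat-based `xml.etree.ElementTree` as a stand-in for PHP's DOMDocument and libxml. The function name `xml_parse_error` is hypothetical and this is not how phpDocumentor is implemented:

```python
import xml.etree.ElementTree as ET
from typing import Optional


def xml_parse_error(data: bytes) -> Optional[str]:
    """Return the parser's original error message, or None if data parses."""
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        # Keep the parser's own wording so the real cause is visible to the user.
        return str(exc)
    return None


# A document that declares UTF-8 but contains the raw byte 0xFC from the report.
broken = b'<?xml version="1.0" encoding="utf-8"?><doc>\xfcgen</doc>'
# The same text, correctly encoded as UTF-8.
valid = '<?xml version="1.0" encoding="utf-8"?><doc>\u00fcgen</doc>'.encode("utf-8")

print(xml_parse_error(broken))
print(xml_parse_error(valid))
```

The first call returns the parser's message with a line and column position, the second returns None. The exact wording of the message depends on the parser, so a libxml-based tool would report different text.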


mvriel commented May 4, 2013

I have resolved this issue in commit a46edce on the develop branch; whenever a libxml error occurs it is provided with the output.

I have tested this by using the following snippet:

php -r 'echo "<?php function \xa0\xa1(){}";' > test.php

The byte sequence \xa0\xa1 is invalid UTF-8.
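The reason these two bytes are invalid is that both have the bit pattern 10xxxxxx, which UTF-8 reserves for continuation bytes, and neither is preceded by a lead byte. A small Python sketch of that check (the helper name `is_continuation_byte` is made up for illustration):

```python
sample = b"\xa0\xa1"


def is_continuation_byte(b: int) -> bool:
    """UTF-8 continuation bytes have the bit pattern 10xxxxxx."""
    return (b & 0xC0) == 0x80


# Both bytes are continuation bytes, so the sequence has no valid lead byte.
all_continuation = all(is_continuation_byte(b) for b in sample)

try:
    sample.decode("utf-8")
    error_offset = None
except UnicodeDecodeError as exc:
    # Offset of the first byte the decoder rejected.
    error_offset = exc.start

print(all_continuation, error_offset)
```

A strict UTF-8 decoder rejects the sequence at its very first byte, which is why it makes a compact test input.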

mvriel closed this May 4, 2013
