This repository has been archived by the owner on Sep 25, 2018. It is now read-only.

Double encoding of entities. #6

Open
ajgarlag opened this issue Apr 26, 2017 · 3 comments

ajgarlag commented Apr 26, 2017

When a string with HTML entities like &nbsp; is loaded, the returned document has their ampersands encoded again, producing &amp;nbsp; and breaking the original HTML.

$payload = <<< 'HTML'
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>
      Hello&nbsp;world!
    </title>
  </head>
  <body>
    <h1>
      Hello&nbsp;world!
    </h1>
  </body>
</html>
HTML;
$doc = \Layershifter\Gumbo\Parser::load($payload);
var_dump($doc->saveHTML());

Output:

string(162) "<html lang="en"><head><meta charset="utf-8"><title>
      Hello&amp;nbsp;world!
    </title></head><body><h1>
      Hello&amp;nbsp;world!
    </h1></body></html>
"
@layershifter
Owner

@ajgarlag thanks for the report, it's one of the known issues. It happens because libxml, which saves the document, encodes entities on output.

I plan to solve this problem, but I don't think it will happen in the near future. As a temporary solution, you can decode entities after saving.
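A minimal sketch of that workaround, for illustration only. The helper name undoDoubleEncoding is hypothetical and not part of the extension, and the sketch assumes every entity in the saved output has been encoded exactly one extra time, as in the report above. It strips one level of &amp; only where an entity reference follows, rather than decoding the whole document.

```php
<?php
// Hypothetical helper (not part of the Gumbo extension): removes one level of
// "&amp;" in front of anything that looks like an entity reference.
// Assumption: the saved output double-encodes every entity uniformly.
function undoDoubleEncoding(string $html): string
{
    // "&amp;" followed by a decimal (#160), hex (#xA0) or named (nbsp)
    // reference and a closing ";". Everything else is left untouched.
    $pattern = '/&amp;(#[0-9]+|#[xX][0-9a-fA-F]+|[a-zA-Z][a-zA-Z0-9]*);/';

    // preg_replace() returns null on failure; fall back to the input then.
    return preg_replace($pattern, '&$1;', $html) ?? $html;
}

// Sample fragment taken from the output reported above.
$saved = '<h1>Hello&amp;nbsp;world!</h1>';
echo undoDoubleEncoding($saved), PHP_EOL;
```

In real use the argument would be the result of $doc->saveHTML(). Calling html_entity_decode() on the whole saved string is shorter, but it also turns references such as &lt; into literal markup characters, which can break the document, so the narrower replacement may be the safer choice.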

@ajgarlag
Author

@layershifter Thank you for your work!

In that case, a Known Issues section could be added to the README file.

@layershifter
Owner

Yes, you're right, I will add it today.
