Suggestions #59

Open
superpoincare opened this issue Jan 2, 2024 · 0 comments

Nice work. I have some suggestions.

$html = rtrim($html, "\n");

I think this unintentionally strips trailing newlines that should be kept. Presumably the intent is to remove a newline added by the preceding code, but rtrim removes every trailing newline, including any that were already at the end of the input.
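To illustrate the concern, here is a minimal sketch. The sample input and the assumption that exactly one extra "\n" gets appended are mine, not taken from the library's code.

```php
<?php
// Hypothetical input: the document legitimately ends with two newlines,
// and we ASSUME the earlier code appended exactly one extra "\n".
$original = "<p>text</p>\n\n";
$html = $original . "\n";

// rtrim() strips every trailing "\n", including the two from the input.
$trimmedAll = rtrim($html, "\n");

// Removing at most one trailing "\n" leaves the original ones intact.
$trimmedOne = substr($html, -1) === "\n" ? substr($html, 0, -1) : $html;

var_dump($trimmedAll === $original); // the original newlines were lost
var_dump($trimmedOne === $original); // the original newlines survived
```

Whether removing a single newline is the right fix depends on what the earlier code actually appends, which I have not verified.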

Another observation on this part:

// Preserve html entities
$source = preg_replace('/&([a-zA-Z]*);/', 'html5-dom-document-internal-entity1-$1-end', $source);
$source = preg_replace('/&#([0-9]*);/', 'html5-dom-document-internal-entity2-$1-end', $source);

There are also hexadecimal entities of the &#x form, which neither pattern matches. I am not sure about the following, but you could check whether a match is a genuine entity or a fake one by doing something like

html_entity_decode($matches[0], ENT_QUOTES, 'UTF-8') === $matches[0]

with preg_replace_callback. If the comparison is true, the match was not decoded and so is not a real entity. Maybe this is not needed.
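A rough sketch of what I mean, in case it helps. The function name `preserveEntities` and the `entity3` placeholder for hexadecimal references are made up for illustration; I also pass `ENT_HTML5` so that HTML5 named entities are recognized, which is an assumption about what you want.

```php
<?php
// Hypothetical helper, not part of the library: replaces only references
// that html_entity_decode() actually recognizes and leaves the rest alone.
function preserveEntities(string $source): string
{
    // Group 1: named (&amp;), group 2: decimal (&#38;), group 3: hex (&#x26;).
    $pattern = '/&(?:([a-zA-Z][a-zA-Z0-9]*)|#([0-9]+)|#[xX]([0-9a-fA-F]+));/';

    $result = preg_replace_callback($pattern, function (array $m): string {
        $decoded = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
        if ($decoded === $m[0]) {
            // Decoding changed nothing, so this is not a genuine entity.
            return $m[0];
        }
        if (($m[1] ?? '') !== '') {
            return 'html5-dom-document-internal-entity1-' . $m[1] . '-end';
        }
        if (($m[2] ?? '') !== '') {
            return 'html5-dom-document-internal-entity2-' . $m[2] . '-end';
        }
        // "entity3" is an assumed name for the hexadecimal case.
        return 'html5-dom-document-internal-entity3-' . $m[3] . '-end';
    }, $source);

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $source;
}

echo preserveEntities('a &amp; b &foobar; &#x41;'), "\n";
```

The code that restores the entities afterwards would need a matching branch for the hexadecimal placeholder.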

You could also include a fresh random string in the "internal" placeholder on every call for security purposes, so that text in the input cannot imitate a placeholder. Maybe I am saying something silly.
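Something along these lines, as a sketch only. `maskEntities` and `unmaskEntities` are hypothetical names, and the placeholder layout with the token in the middle is my assumption rather than the library's format.

```php
<?php
// Hypothetical helpers: a per-call random token makes the placeholder
// unpredictable, so input text cannot contain a matching one in advance.
function maskEntities(string $source, string $token): string
{
    $prefix = 'html5-dom-document-internal-entity-' . $token . '-';
    return preg_replace('/&([a-zA-Z]+);/', $prefix . '$1-end', $source);
}

function unmaskEntities(string $html, string $token): string
{
    $prefix = preg_quote('html5-dom-document-internal-entity-' . $token . '-', '/');
    return preg_replace('/' . $prefix . '([a-zA-Z]+)-end/', '&$1;', $html);
}

// 16 hex characters from a cryptographically secure source, new on every call.
$token = bin2hex(random_bytes(8));

$masked = maskEntities('a &amp; b', $token);
$restored = unmaskEntities($masked, $token);

// A placeholder written with some other token is left untouched.
$forged = 'html5-dom-document-internal-entity-deadbeef-amp-end';
$untouched = unmaskEntities($forged, $token);
```

The same token has to be used for masking and unmasking within one call, so it would need to be kept alongside the document while it is being processed.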
