Problem decoding certain unicode representations #84

Closed
rambii opened this issue Jan 28, 2021 · 1 comment

Comments

@rambii
rambii commented Jan 28, 2021

I have some user-provided content in my application. Sometimes users pass content that is unexpected.

Run the following code to reproduce the error.

    $content = "lorem:\x0Bipsum";
    $textContent = \Soundasleep\Html2Text::convert($content);

Note: the character after the colon ':' is invisible when pasted directly, so it is written here as the escape \x0B.

In the database it is stored as a vertical tab, U+000B (https://www.fileformat.info/info/unicode/char/000b/index.htm), but when I output it and convert it, the call fails with the error: DOMDocument::loadHTML(): Invalid char in CDATA 0xB in Entity

Is there a list of Unicode characters that are not supported in this way?
Is there another way to encode them correctly?

@edgrosvenor
Collaborator

My sense is that this issue is beyond the scope of this package. I can't know whether the right answer is to simply remove these characters or replace them with something else. That would be a decision better left to the application developer.
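
One way an application could handle this before calling the converter is sketched below. It assumes the error comes from C0 control characters other than tab, line feed, and carriage return, as the 0xB in the message suggests. The function name `stripControlChars` and its `$replacement` parameter are hypothetical and not part of this package. Whether to drop the characters or substitute something (a space, for instance) remains an application decision.

```php
<?php
// Hypothetical helper, not part of soundasleep/html2text.
// Removes or replaces C0 control characters except tab (0x09),
// line feed (0x0A), and carriage return (0x0D).
function stripControlChars(string $text, string $replacement = ''): string
{
    // The byte ranges below never occur inside UTF-8 multibyte sequences,
    // so the pattern is safe to apply without the /u modifier.
    $clean = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', $replacement, $text);

    // preg_replace() returns null on error; fall back to the input unchanged.
    return $clean ?? $text;
}

$content = "lorem:\x0Bipsum";
$safe = stripControlChars($content); // "lorem:ipsum"
```

The cleaned string can then be passed to \Soundasleep\Html2Text::convert($safe) in place of the raw input. Passing ' ' as the second argument keeps a word boundary where the control character was.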
