HTML special chars processing #89

Closed
brutto opened this issue Oct 24, 2018 · 3 comments

brutto commented Oct 24, 2018

Thanks for this powerful library. I found some strange behaviour in how it handles escaped HTML entities.

Settings

        $typoSettings = new Settings(false);

        $typoSettings->set_space_collapse();
        $typoSettings->set_smart_marks();
        $typoSettings->set_single_character_word_spacing();
        $typoSettings->set_unit_spacing();
        $typoSettings->set_numbered_abbreviation_spacing();

        $typoSettings->set_smart_quotes();
        $typoSettings->set_smart_quotes_primary();
        $typoSettings->set_smart_quotes_secondary();

Predictable (Correct)

input: He is robot, am i&nbsp;too?

process input (htmlspecialchars()): He is robot, am i&amp;nbsp;too?

typo output: He is robot, am i&amp;nbsp;too?

The output string is the same as the processed input, with &nbsp; kept as text.

Unpredictable (Incorrect)

input: He is a robot, am i&nbsp;too?

process input: He is a robot, am i&amp;nbsp;too?

typo: He is a&nbsp;robot, am i&nbsp;too?

In the output string, &nbsp; no longer survives as text; every occurrence becomes a space.

PS: the same happens with & <=> &amp;

@mundschenk-at
Owner

I'm not completely sure I understand the issue correctly. Just to clarify, the input string is not escaped by PHP-Typography, so &nbsp; is processed as a NO-BREAK SPACE literal. The $html input parameter of the PHP_Typography::process method is assumed to be a valid HTML fragment (i.e. without the document type, and surrounding <html> and <body> tags, but otherwise well-formed).
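The distinction between a real entity reference and an escaped one can be sketched with PHP's built-in `html_entity_decode()`. This is only an illustration of how an HTML fragment is read, under the assumption of UTF-8 input; it does not call PHP-Typography and says nothing about the library's internals.

```php
<?php
// Sketch only: uses PHP's built-in html_entity_decode(), not PHP-Typography.
$flags = ENT_QUOTES | ENT_HTML5;

// A real entity reference decodes to the NO-BREAK SPACE character (U+00A0).
$decoded_entity = html_entity_decode('i&nbsp;too', $flags, 'UTF-8');

// An escaped reference decodes to the literal text "&nbsp;", not to a space.
$decoded_escaped = html_entity_decode('i&amp;nbsp;too', $flags, 'UTF-8');

var_dump($decoded_entity === "i\u{00A0}too");  // bool(true)
var_dump($decoded_escaped === 'i&nbsp;too');   // bool(true)
```

So a fragment containing `&amp;nbsp;` represents the visible text `&nbsp;`, and a processor that treats its input as HTML would be expected to keep it that way.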


brutto commented Oct 24, 2018

@mundschenk-at

Here is working code for the second variant:

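$string = 'He is a robot, am i&nbsp;too?'; // <-- input
$string = htmlspecialchars($string); // <-- process input
$typo = new PHP_Typography(); // process() is an instance method
$string = $typo->process($string, $typoSettings); // <-- typo, using the Settings from above
var_dump($string);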

I expected $string to be:
He is a&nbsp;robot, am i&amp;nbsp;too?

But in reality we get:
He is a&nbsp;robot, am i&nbsp;too?

The second entity, &amp;nbsp;, becomes &nbsp;.

I haven't dug any deeper, but my first impression is that this behaviour is not correct. Am I right?

@mundschenk-at
Owner

OK, I see what you mean. Yes, the result should be He is a&nbsp;robot, am i&amp;nbsp;too? in this case. I'll investigate the issue.
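A minimal sketch of a regression check for that expectation follows. The helper name `escaped_entities_survive()` and both stand-in closures are hypothetical and exist only for illustration; the `$over_decoding` closure merely mimics the reported symptom and is not a claim about what actually causes it in PHP-Typography. To check the real library, one would pass a closure that wraps the actual `process()` call instead.

```php
<?php
// Hypothetical helper: $process stands in for whatever runs the typography pass.
function escaped_entities_survive(callable $process): bool
{
    $input   = 'He is a robot, am i&nbsp;too?';
    $escaped = htmlspecialchars($input);  // '&nbsp;' becomes '&amp;nbsp;'
    $output  = $process($escaped);

    // The escaped entity must still be present after processing.
    return strpos($output, 'i&amp;nbsp;too') !== false;
}

// Stand-in that leaves the text untouched: passes the check.
$identity = function (string $html): string {
    return $html;
};

// Stand-in that mimics the reported symptom by decoding entities once too often.
$over_decoding = function (string $html): string {
    return html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
};

var_dump(escaped_entities_survive($identity));       // bool(true)
var_dump(escaped_entities_survive($over_decoding));  // bool(false)
```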
