Need help with JoliTypo and encoding errors #7
Comments
Hey dude 👍 It looks like an encoding issue. Each time the detected encoding is UTF-8, the accents come out as these two chars:
This is the classic case. Try: $fixed = $fixer->fix(utf8_encode(utf8_decode($text))); I think your contents are not UTF-8. Let me know ASAP 😉 I have time today to help. |
Fixing the re-encoded string (decode, then encode) did not work, but fixing the decoded string works:
Code:
$decoded = utf8_decode($text);
$reEncoded = utf8_encode($decoded);
$fixed = $fixer->fix($decoded);
$fixedWrong = $fixer->fix($reEncoded);
$logs = "<script> console && console.log('-------\\nOriginal : ".$text." (".mb_detect_encoding($text).")\\nDecoded : ".$decoded." (".mb_detect_encoding($decoded).")\\nRe-encoded : ".$reEncoded." (".mb_detect_encoding($reEncoded).")\\nFix(Decoded) : ".$fixed." (".mb_detect_encoding($fixed).")\\nFix(Re-encoded) : ".$fixedWrong." (".mb_detect_encoding($fixedWrong).")')</script>";
Is it possible that my content is UTF-8 but JoliTypo only works with ISO-8859-1? |
Nope, JoliTypo forces UTF-8 at all stages.
var_dump("Mentions Légales"); // string(17) "Mentions Légales"
var_dump(utf8_decode("Mentions Légales")); // string(16) "Mentions L�gales"
var_dump(utf8_encode("Mentions Légales")); // string(19) "Mentions LÃ©gales"
The first string is already UTF-8. In the second example I decode it, but since I'm displaying it in a UTF-8 console, the accent fails to display. In the third example I UTF-8 encode the already UTF-8 string, which encodes the accent a second time. I just added a test to JoliTypo to cover this:
$isoString = mb_convert_encoding("Mentions Légales", "ISO-8859-1", "UTF-8");
$this->assertEquals("Mentions Légales", $fixer->fix(utf8_encode($isoString)));
$this->assertEquals("Mentions Légales", $fixer->fix(utf8_encode(utf8_encode($isoString)))); |
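To make the three var_dump() results above easier to check, here is a minimal sketch that prints byte lengths and hex dumps instead of relying on what the console displays. The helper names `latin1ToUtf8` and `utf8ToLatin1` are made up for this note; they use mb_convert_encoding() as a stand-in for utf8_encode() and utf8_decode(), which are deprecated in recent PHP versions.

```php
<?php
// Hypothetical helpers: mbstring equivalents of utf8_encode() / utf8_decode().
function latin1ToUtf8(string $bytes): string
{
    return mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
}

function utf8ToLatin1(string $text): string
{
    return mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');
}

$utf8     = "Mentions L\u{00E9}gales"; // well-formed UTF-8, the accent takes 2 bytes
$double   = latin1ToUtf8($utf8);       // encoded a second time, the accent takes 4 bytes
$restored = utf8ToLatin1($double);     // one decode undoes the extra encode

foreach (['utf8' => $utf8, 'double' => $double, 'restored' => $restored] as $label => $s) {
    printf(
        "%-8s %2d bytes, valid UTF-8: %s, hex: %s\n",
        $label,
        strlen($s),
        var_export(mb_check_encoding($s, 'UTF-8'), true),
        bin2hex($s)
    );
}
```

Note that the double-encoded string is still valid UTF-8, so an encoding check alone cannot tell it apart from the correct one; the byte length and the hex dump can.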
OK, so if I understand correctly, it means that my content is double-encoded in UTF-8. Good to know... |
I really do not understand. For example:
// in a UTF-8 encoded PHP file
$fixer->fix('λ');
responds λ
I must be missing something... |
$fixer = new Fixer(array('Trademark'));
var_dump($fixer->fix('λ'));
This returns:
$fixer->fix(utf8_encode('λ')); returns what you got:
So let's try to understand:
Thx! |
|
FYI, here is my fixer:
$fixers = [
"Ellipsis",
"Dimension",
"Dash",
"FrenchQuotes",
"FrenchNoBreakSpace",
"CurlyQuote",
"Hyphen",
"Trademark"];
$fixer = new Fixer($fixers);
$fixer->setLocale('fr_FR'); |
I'm testing this exact snippet (you can download the file here) and it does work as expected:
I have |
FYI, it works correctly on my machine:
Tested on
Tested on |
In the AlwaysData web-based SSH:
borisschapira@ssh:~/www/wp/wp-content/plugins/typofr$ php test.php
string(14) "λ"
λ
|
Maybe it's something related to the mbstring configuration
|
I have the exact same mbstring configuration. Can you also dump libxml version? (heavily used in the lib via DomDocument).
Also your "dom" section?
Thx! |
|
Looks like an old release of libxml2 (April 2008), let's dig into this. Can you try to run this code on your server? (It's an extract of how JoliTypo uses DomDocument): |
Yep, that's it :
|
Can't be sure if the bug is fixed in 2.7.0.
I've found a workaround on http://php.net/manual/en/domdocument.loadhtml.php. Does it work with your other test cases? |
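The thread does not say which workaround from that manual page was picked. As an illustration only, here is a sketch of one commonly suggested approach: escape every non-ASCII character as a numeric entity before calling loadHTML(), so the parser never has to guess the input encoding. The helpers `toNumericEntities` and `loadUtf8Fragment` are hypothetical names, not JoliTypo code, and this is not necessarily the fix that ended up in the library.

```php
<?php
// Hypothetical helper: escape every non-ASCII character as a numeric entity.
function toNumericEntities(string $utf8Html): string
{
    // Map format: [first code point, last code point, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($utf8Html, $map, 'UTF-8');
}

// Hypothetical helper: load an HTML fragment that is known to be UTF-8.
function loadUtf8Fragment(string $utf8Html): DOMDocument
{
    $dom = new DOMDocument('1.0', 'UTF-8');
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $dom->loadHTML('<body>' . toNumericEntities($utf8Html));
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $dom;
}

$html = "<p>\u{03BB} L\u{00E9}gales</p>";
$safe = toNumericEntities($html);
echo $safe, "\n";

$dom  = loadUtf8Fragment($html);
$text = $dom->getElementsByTagName('p')->item(0)->textContent;
echo bin2hex($text), "\n"; // hex dump, so the console encoding does not matter
```

The input handed to the parser is pure ASCII, which is why this approach does not depend on how a given libxml2 release detects encodings.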
This fix is interesting and it does not break any of the tests - I will look more closely and push a new version of JoliTypo 👍 Thx! |
Let's hope it doesn't ! |
Can you try the new branch on your server?
|
Test is ok now, so I've tried to apply it to all my website.
Seems that a test on the non-emptiness of the string is needed. |
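The error itself is not shown above. Assuming it came from passing an empty string to DOMDocument::loadHTML() (which emits a warning on older PHP versions and throws a ValueError since PHP 8.0), a guard could look like the sketch below. `loadHtmlOrNull` is a hypothetical wrapper written for this note, not the code from the actual fix.

```php
<?php
// Hypothetical wrapper: skip parsing entirely when there is nothing to parse.
function loadHtmlOrNull(string $html): ?DOMDocument
{
    if (trim($html) === '') {
        return null; // empty or whitespace-only input, nothing to fix
    }

    $dom = new DOMDocument('1.0', 'UTF-8');
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $loaded ? $dom : null;
}

var_dump(loadHtmlOrNull('') === null);
var_dump(loadHtmlOrNull('<p>ok</p>') instanceof DOMDocument);
```

A caller would then return the original text unchanged whenever the wrapper yields null.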
This error is now fixed (d5be8be),
|
Everything's fine: λ is handled and there is no more need for a UTF-8 decode before the fix. |
Thanks for yours ! |
Hi, I've been trying to use JoliTypo for personal use on http://borisschapira.com/ but it causes encoding errors on accented characters. Here is an example of what I give to the JoliTypo fixer (with the encoding determined via mb_detect_encoding) and what JoliTypo returns:
Here is my (pretty simple) code :
``` php
function typofr($text)
{
    static $fixer;
    if (!isset($fixer)) {
        $fixer = new Fixer(array(
            'Trademark'));
        $fixer->setLocale('fr_FR');
    }
    $fixed = $fixer->fix($text);
    return $text."<script>console && console.log('-------');console && console.log('".$text." (".mb_detect_encoding($text).")'); console && console.log('".$fixed." (".mb_detect_encoding($fixed).")')</script>";
}
```