-
Notifications
You must be signed in to change notification settings - Fork 528
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix unicode content parsing #415
base: master
Are you sure you want to change the base?
Conversation
fixed exception in parsing symbols like em dash or cyrillic letters. Tests are included
Your test compiles fine without applying the patch. |
Sorry, I forgot explain when it happens. then you have exception on "parse error: failed at `content: '—';". |
The mbstring.func_overload ini setting can't be changed at run-time; also, I've run into systems where the mbstring extension isn't enabled. How about prefixing strlen() and substr() calls with '_', and then adding these helper functions? // mb_orig_strlen() and mb_orig_substr() only exist when mbstring.func_overload & 2 is true and the mbstring extension is loaded
if (function_exists('mb_orig_strlen')) {
function _strlen($s) {
return mb_orig_strlen($s);
}
function _substr() {
return call_user_func_array('mb_orig_substr', func_get_args());
}
} else {
function _strlen($s) {
return strlen($s);
}
function _substr() {
return call_user_func_array('substr', func_get_args());
}
} |
The language itself doesn't use unicode so the parser doesn't need to be aware of it (no unicode keywords or anything). The buffer can be treated as a string of 8 bit chars and still have unicode characters pass through it fine. I'm guessing the problem is that mbstring.func_overload causes the string functions to work differently than the $buffer[x] character accessor which causes a mismatch and the parser to fail? Does changing the ini setting with ini_set just for the parse work? If that doesn't work then another option is to replace all instances of $this->buffer[] with substr($this->buffer, ...). There's probably a noticeable performance hit to this change though. |
Yes, you are right.. Changing compile method on this public function compile($string, $name = null) {
$locale = setlocale(LC_NUMERIC, 0);
setlocale(LC_NUMERIC, "C");
$internal_encoding = NULL;
// only if mbstring installed
if (function_exists('mb_orig_strlen')) {
$internal_encoding = ini_get('mbstring.internal_encoding');
if(!is_null($internal_encoding)){
ini_set('mbstring.internal_encoding', NULL);
}
}
$this->parser = $this->makeParser($name);
$root = $this->parser->parse($string);
$this->env = null;
$this->scope = null;
$this->formatter = $this->newFormatter();
if (!empty($this->registeredVars)) {
$this->injectVariables($this->registeredVars);
}
$this->sourceParser = $this->parser; // used for error messages
$this->compileBlock($root);
ob_start();
$this->formatter->block($this->scope);
$out = ob_get_clean();
setlocale(LC_NUMERIC, $locale);
if(!is_null($internal_encoding)){
ini_set('mbstring.internal_encoding', $internal_encoding);
}
return $out;
} helps and parsing is going well even if |
Parse error appears only if you had set mbstring.internal_encoding UTF-8
Thanks for fix. |
fixed exception in parsing symbols like em dash or cyrillic letters.
Tests are included