New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use fast text conversion filters to implement mb_convert_variables #9966
Conversation
} | ||
zend_string *ret = php_mb_convert_encoding_ex(Z_STRVAL_P(var), Z_STRLEN_P(var), to_encoding, from_encoding); | ||
zval_ptr_dtor(orig_var); | ||
ZVAL_STR(orig_var, ret); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where I would like to ask someone who knows PHP internals well... am I handling the refcounts correctly?
mb_convert_variables
replaces strings in an array or object. After converting the old zend_string
to the new one, I am using zval_ptr_dtor
to drop the reference to the old one, then ZVAL_STR
to make the zval
point to the new string.
I don't think ZVAL_STR
modifies the refcount, and when the new zend_string
is returned from php_mb_convert_encoding_ex
, it should start with a refcount of 1, so hopefully we are OK here... 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to make sense, but you can always run some tests with Valgrind or ASAN to see if there is any issue. But been staring at this for a bit and it seems to check out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that's a good idea.
@@ -17,19 +17,19 @@ $sjis = base64_decode('k/qWe4zqg2WDTINYg2eCxYK3gUIwMTIzNIJUglWCVoJXgliBQg=='); | |||
// JIS string (BASE64 encoded) | |||
$jis = base64_decode('GyRCRnxLXDhsJUYlLSU5JUgkRyQ5ISMbKEIwMTIzNBskQiM1IzYjNyM4IzkhIxsoQg=='); | |||
// EUC-JP string | |||
$euc_jp = '���ܸ�ƥ����ȤǤ���01234������������'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, interesting to see that GitHub is able to recognize this EUC-JP encoded string and display it correctly here...
On the other side, I've converted it to UTF-8 so that it will (hopefully) appear correctly in most text editors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it is preferable to have *.phpt files encoded as UTF-8 whenever possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Looks like a nice simplification, too.
@@ -17,19 +17,19 @@ $sjis = base64_decode('k/qWe4zqg2WDTINYg2eCxYK3gUIwMTIzNIJUglWCVoJXgliBQg=='); | |||
// JIS string (BASE64 encoded) | |||
$jis = base64_decode('GyRCRnxLXDhsJUYlLSU5JUgkRyQ5ISMbKEIwMTIzNBskQiM1IzYjNyM4IzkhIxsoQg=='); | |||
// EUC-JP string | |||
$euc_jp = '���ܸ�ƥ����ȤǤ���01234������������'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it is preferable to have *.phpt files encoded as UTF-8 whenever possible.
This was already merged. |
FYA @cmb69 @Girgias @nikic @kamil-tekiela