Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use fast text conversion filters to implement mb_convert_variables #9966

Closed
wants to merge 1 commit into from

Conversation

alexdowad
Copy link
Contributor

}
zend_string *ret = php_mb_convert_encoding_ex(Z_STRVAL_P(var), Z_STRLEN_P(var), to_encoding, from_encoding);
zval_ptr_dtor(orig_var);
ZVAL_STR(orig_var, ret);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where I would like to ask someone who knows PHP internals well... am I handling the refcounts correctly?

mb_convert_variables replaces strings in an array or object. After converting the old zend_string to the new one, I am using zval_ptr_dtor to drop the reference to the old one, then ZVAL_STR to make the zval point to the new string.

I don't think ZVAL_STR modifies the refcount, and when the new zend_string is returned from php_mb_convert_encoding_ex, it should start with a refcount of 1, so hopefully we are OK here... 😕

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to make sense, but you can always run some tests with Valgrind or ASAN to see if there is any issue. But been staring at this for a bit and it seems to check out

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's a good idea.

@@ -17,19 +17,19 @@ $sjis = base64_decode('k/qWe4zqg2WDTINYg2eCxYK3gUIwMTIzNIJUglWCVoJXgliBQg==');
// JIS string (BASE64 encoded)
$jis = base64_decode('GyRCRnxLXDhsJUYlLSU5JUgkRyQ5ISMbKEIwMTIzNBskQiM1IzYjNyM4IzkhIxsoQg==');
// EUC-JP string
$euc_jp = '���ܸ�ƥ����ȤǤ���01234������������';
Copy link
Contributor Author

@alexdowad alexdowad Nov 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, interesting to see that GitHub is able to recognize this EUC-JP encoded string and display it correctly here...

On the other side, I've converted it to UTF-8 so that it will (hopefully) appear correctly in most text editors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it is preferable to have *.phpt files encoded as UTF-8 whenever possible.

Copy link
Contributor

@cmb69 cmb69 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Looks like a nice simplification, too.

@@ -17,19 +17,19 @@ $sjis = base64_decode('k/qWe4zqg2WDTINYg2eCxYK3gUIwMTIzNIJUglWCVoJXgliBQg==');
// JIS string (BASE64 encoded)
$jis = base64_decode('GyRCRnxLXDhsJUYlLSU5JUgkRyQ5ISMbKEIwMTIzNBskQiM1IzYjNyM4IzkhIxsoQg==');
// EUC-JP string
$euc_jp = '���ܸ�ƥ����ȤǤ���01234������������';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it is preferable to have *.phpt files encoded as UTF-8 whenever possible.

@alexdowad
Copy link
Contributor Author

This was already merged.

@alexdowad alexdowad closed this Dec 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants