Added multibyte string functions #161
if (is_null($this->encoding) || !function_exists('mb_strlen')) {
    $actualLength = strlen($value);
} else {
    $actualLength = mb_strlen($value, $this->encoding);
Why not just use `$actualLength = function_exists('mb_strlen') ? mb_strlen($value) : strlen($value)`?
I want to just use `->length(10)` with the default encoding; I don't want to pass `'utf8'` everywhere.
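For comparison, the two approaches under discussion could be sketched as standalone functions. The function names are hypothetical and not part of the library, and the sketch assumes PHP 7.1 or later. Without an explicit encoding, `mb_strlen()` uses the internal encoding, so the second variant depends on how that is configured.

```php
<?php
// Hypothetical helper names for illustration; not the library's API.

// Variant from this PR: count characters only when an encoding is given.
function lengthWithOptionalEncoding(string $value, ?string $encoding = null): int
{
    if ($encoding === null || !function_exists('mb_strlen')) {
        return strlen($value); // number of bytes
    }

    return mb_strlen($value, $encoding); // number of characters
}

// Variant suggested in the review: count characters whenever mbstring is loaded.
function lengthWithDefaultEncoding(string $value): int
{
    return function_exists('mb_strlen') ? mb_strlen($value) : strlen($value);
}
```

The first variant keeps the existing byte-based result unless the caller opts in; the second changes the result for every caller that has mbstring installed.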
You are right. I just tried to be backwards compatible. Maybe someone uses the rule and knows about the non-multibyte behaviour. That person may have decided to count the special characters such as äöüéàè and subtract that number from the incorrectly measured length (because in UTF-8, each of these is treated as two characters).
But of course, it would be better to handle this by default.
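The backward-compatibility concern can be illustrated with a small sketch. The function `workaroundLength()` is a hypothetical example of caller code compensating for byte-based counting; it is not taken from the library or from any known user.

```php
<?php
// Hypothetical caller-side workaround, assuming UTF-8 input.
function workaroundLength(string $value, int $measuredLength): int
{
    // Count occurrences of ä ö ü é à è (two bytes each in UTF-8).
    $specialChars = (int) preg_match_all(
        '/[\x{E4}\x{F6}\x{FC}\x{E9}\x{E0}\x{E8}]/u',
        $value
    );

    // Subtract one extra byte per special character.
    return $measuredLength - $specialChars;
}

$value = "Z\u{00FC}rich"; // "Zürich": 6 characters, 7 bytes in UTF-8

// With byte-based counting the workaround yields the character count.
$fromBytes = workaroundLength($value, strlen($value)); // 7 - 1 = 6

// If the measured length were already character-based (6),
// the same workaround would undercount.
$fromChars = workaroundLength($value, 6); // 6 - 1 = 5
```

This is why switching the default to character counting could silently break callers that already correct for the byte-based result.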
@adriansuter Now it is clear. I agree that your implementation is technically backward compatible and mine is not.
This is an ambiguous question. I think that counting bytes (not characters) is a bug, but others may think the opposite because the documentation is not precise about it. I hope this question will be clarified in the next major release.
Me too. I think PHP should change the default behaviour of `strlen()`. The result most people expect, I suppose, is the number of characters. In PHP (at least when it is used for web development), one rarely has to count the byte length of a string. And if so, there is always `unpack()`.
We will see whether PHP implements that (I doubt it :-)).
@rick-nu Just wondering if this PR was not merged because the documentation updates are missing? If OpenEMR picked this PR up and added the documentation, could we get this merged in? Hope all is well!
What?
Adds the possibility to use multibyte string functions in the rules "length" and "lengthBetween".
Checklist
Linked issue
#160
Notes
`LengthTest` only. It would probably be better to create a new class `LengthTestMultibyte` which actually tests the multibyte cases, because right now I have changed `LengthTest` so that it always uses the multibyte functions. In my opinion, this unit test should only cover the original rule without encoding.