Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added multibyte string functions #161

Closed
wants to merge 1 commit into from
Closed

Added multibyte string functions #161

wants to merge 1 commit into from

Conversation

adriansuter
Copy link
Contributor

@adriansuter adriansuter commented Oct 31, 2016

What?

Adds the possibilty to use multibyte string functions in the rules "length" and "lengthBetween".

Checklist

  • Added unit test for added/fixed code
  • Updated the documentation
  • Scrutinizer code coverage is 100%
  • Scrutinizer code quality is as high as possible

Linked issue

#160

Notes

  • I have not yet updated the documentation.
  • I modified the current unit tests for LengthTest only. Probably it would be better to make a new class LengthTestMultibyte which actually tests the multibyte cases. Because right now, I have change the LengthTest such that it always uses the multibyte functions. But in my opinion this unit test should only contain the original rule without encoding.

if (is_null($this->encoding) || !function_exists('mb_strlen')) {
$actualLength = strlen($value);
} else {
$actualLength = mb_strlen($value, $this->encoding);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not to use just $actualLength = function_exists('mb_strlen') ? mb_strlen($value) : strlen($value)?

I want to just use ->length(10) with default encoding, I don't want to pass 'utf8' everywhere.

Copy link
Contributor Author

@adriansuter adriansuter Dec 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I just tried to be backwards compatible. Maybe someone uses the rule and knows about that non-multibyte behaviour. Then this person maybe decided to count the number of special chars like äöüéàè and to remove that number from the measured wrong length (because in utf8, these would get treated as two chars respecively).

But of course, it would be better to handle this as default.

Copy link

@Finesse Finesse Dec 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adriansuter Now it is clear. I agree that your implementation is technically backward compatible and my is not.

This is an ambiguous question. I think that counting bytes (not characters) is a bug, but other may think vice versa because the documentation is not precise about it. Hope that this question will be clarified in the next major release.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me too. I think PHP should change the default behaviour of strlen(). The expected result for most people I suppose, is the number of characters. In PHP (in case PHP is used in web development) one rarely has to count the byte length of a string. And if so, there is always unpack().

We will see, if PHP would implement that (I doubt it :-)).

@adriansuter adriansuter deleted the feature/multibyte-strings branch June 11, 2019 13:22
@adunsulag
Copy link

@rick-nu Just wondering if this PR was not merged due to the documentation updates missing? If OpenEMR picked this PR up and added the documentation, could we get this merged in? Hope all is well!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants