-
-
Notifications
You must be signed in to change notification settings - Fork 147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Strings: normalize strings into UTF-8 NFC #150
Conversation
05be0ad
to
39a924d
Compare
39a924d
to
1231f4c
Compare
composer.json
Outdated
@@ -15,7 +15,8 @@ | |||
} | |||
], | |||
"require": { | |||
"php": ">=7.0" | |||
"php": ">=7.0", | |||
"ext-intl": "*" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should probably remain only as suggested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should Strings
class do when there is no intl extension?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hrach could you please elaborate on it more then just thumbs down? I think I know some of your arguments, but want to be sure what is on your mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I consider the Strings class as the main functionality of the nette/utils package and therefore it's dependencies should be required. Nothing more, nothing less.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hrach That's a valid opinion, but it's outside of scope of this PR. If you want to move currently suggested stuff to require
you need another PR and do it consistently for all stuff in suggest
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hrach Agree on that. Strings is the main reason why I use nette\utils
.
@JanTvrdik That makes sense and exactly that was my next question. Will do another MR with moving all necessary dependencies into requires... There will be probably more discussion needed as we will need list of core functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in bb3a69c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opened #151
src/Utils/Strings.php
Outdated
@@ -99,10 +99,13 @@ public static function substring(string $s, int $start, int $length = null): str | |||
|
|||
|
|||
/** | |||
* Removes special controls characters and normalizes line endings and spaces in UTF-8 string. | |||
* Removes special controls characters and normalizes line endings and converts to NFC form and spaces in UTF-8 string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sentence now makes no sense. How about sth like
Removes special controls characters and normalizes line endings, spaces and normal form to NFC in UTF-8 string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 has been writing it in a car
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in ec16e40
src/Utils/Strings.php
Outdated
*/ | ||
public static function normalize(string $s): string | ||
{ | ||
// normalize string into utf8 NFC form |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- „UTF-8“ instead of „utf8“, or maybe even better to omit altogether
- the word „string“ is unnecessary
How about sth like „convert to NFC normal form “?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in b6dd526
@JanTvrdik Thanks for comments 😃 |
f6d2295
to
b6dd526
Compare
… user do not need to use Strings class at all)
11c1905
to
bb3a69c
Compare
@JanTvrdik @hrach thanks for nice argument-based discussion! I have pushed fixes into this pull request, are there any more comments on this? |
@@ -265,6 +268,9 @@ public static function capitalize(string $s): string | |||
*/ | |||
public static function compare(string $left, string $right, int $len = null): bool | |||
{ | |||
$left = \Normalizer::normalize($left, \Normalizer::FORM_D); // form NFD is faster | |||
$right = \Normalizer::normalize($right, \Normalizer::FORM_D); // form NFD is faster |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to call Normalizer directly? Or would we prefer to call self::normalize(). This will also normalizer line ending and some other stuff. I'm not sure on which level should be normalization done here in compare.
Current behaviour (proposed by @dg) causes inconsistencies as it is possible to get into situation where:
Strings::compare($a, $b); // FALSE
Strings::normalize($a) === Strings::normalize($b); // TRUE
Is this expected behaviour? It seems as confusing behaviour to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to call it after substring() ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Substring operates on code units, normalize may change length in code units (imho).
The last thing to sort out is the following inconsistency. Current behaviour (proposed by @dg) causes inconsistencies as it is possible to get into situation where: Strings::compare($a, $b); // FALSE
Strings::normalize($a) === Strings::normalize($b); // TRUE It seems as confusing behaviour to me. What are your opinions on this? |
This seems OK to me. It would be quite confusing if |
👍 Ok, it makes sense. So, now it looks like ready to merge, |
Thank you |
implementation of #149
Before merge: