Strings: normalize strings into UTF-8 NFC #150

Merged
merged 4 commits into from Oct 24, 2017

Conversation

@jkuchar
Contributor

jkuchar commented Oct 14, 2017

  • bug fix? no
  • new feature? yes
  • BC break? yes

implementation of #149

Before merge:

@jkuchar jkuchar force-pushed the grifart:149-normalize-utf8-strings branch from 05be0ad to 39a924d Oct 14, 2017
@jkuchar jkuchar force-pushed the grifart:149-normalize-utf8-strings branch from 39a924d to 1231f4c Oct 14, 2017
@@ -15,7 +15,8 @@
   }
  ],
  "require": {
-  "php": ">=7.0"
+  "php": ">=7.0",
+  "ext-intl": "*"

@JanTvrdik

JanTvrdik Oct 14, 2017

Contributor

should probably remain only as suggested

@jkuchar

jkuchar Oct 14, 2017

Author Contributor

What should Strings class do when there is no intl extension?

@jkuchar

jkuchar Oct 14, 2017

Author Contributor

@hrach could you please elaborate on it more than just a thumbs down? I think I know some of your arguments, but I want to be sure what is on your mind.

@hrach

hrach Oct 14, 2017

Contributor

Well, I consider the Strings class the main functionality of the nette/utils package, and therefore its dependencies should be required. Nothing more, nothing less.

@JanTvrdik

JanTvrdik Oct 14, 2017

Contributor

@hrach That's a valid opinion, but it's outside the scope of this PR. If you want to move the currently suggested stuff to require, you need another PR that does it consistently for everything in suggest.

@jkuchar

jkuchar Oct 16, 2017

Author Contributor

@hrach Agree on that. Strings is the main reason why I use nette\utils.

@JanTvrdik That makes sense, and that was exactly my next question. I will open another MR that moves all necessary dependencies into require... More discussion will probably be needed, as we will need a list of core functionality.

@jkuchar

jkuchar Oct 16, 2017

Author Contributor

fixed in bb3a69c

@jkuchar

jkuchar Oct 16, 2017

Author Contributor

opened #151
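
For illustration only: one way to answer the question raised in this thread (what to do when the intl extension is missing) is to treat Unicode normalization as optional and skip it when `\Normalizer` is unavailable. The helper name `toNfc` is hypothetical; this is a minimal sketch under that assumption, not the code merged in this PR.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: converts a UTF-8 string to NFC when ext-intl is loaded,
 * otherwise returns the input unchanged.
 */
function toNfc(string $s): string
{
    // Without ext-intl the Normalizer class does not exist, so skip the step.
    if (!class_exists('Normalizer', false)) {
        return $s;
    }

    // Normalizer::normalize() returns false on failure (e.g. invalid UTF-8).
    $normalized = \Normalizer::normalize($s, \Normalizer::FORM_C);

    return $normalized === false ? $s : $normalized;
}

// Pure ASCII is already in NFC, so it passes through either way.
var_dump(toNfc('abc') === 'abc');
```

The trade-off is that the result then depends on the environment: the same input may or may not be normalized, which is part of the argument for listing the extension under require instead.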

@@ -99,10 +99,13 @@ public static function substring(string $s, int $start, int $length = null): str


  /**
-  * Removes special controls characters and normalizes line endings and spaces in UTF-8 string.
+  * Removes special controls characters and normalizes line endings and converts to NFC form and spaces in UTF-8 string.

@JanTvrdik

JanTvrdik Oct 14, 2017

Contributor

The sentence now makes no sense. How about something like

Removes special controls characters and normalizes line endings, spaces and normal form to NFC in UTF-8 string

@jkuchar

jkuchar Oct 14, 2017

Author Contributor

👍 I was writing it in a car

@jkuchar

jkuchar Oct 14, 2017

Author Contributor

fixed in ec16e40

   */
  public static function normalize(string $s): string
  {
+  // normalize string into utf8 NFC form

@JanTvrdik

JanTvrdik Oct 14, 2017

Contributor
  • „UTF-8“ instead of „utf8“, or maybe even better to omit altogether
  • the word „string“ is unnecessary

How about something like „convert to NFC normal form“?

@jkuchar

jkuchar Oct 14, 2017

Author Contributor

👍

@jkuchar

jkuchar Oct 14, 2017

Author Contributor

fixed in b6dd526

@jkuchar

Contributor Author

jkuchar commented Oct 14, 2017

@JanTvrdik Thanks for comments 😃

JanTvrdik and others added 2 commits Oct 14, 2017
@jkuchar jkuchar force-pushed the grifart:149-normalize-utf8-strings branch from f6d2295 to b6dd526 Oct 14, 2017
… user do not need to use Strings class at all)
@jkuchar jkuchar force-pushed the grifart:149-normalize-utf8-strings branch from 11c1905 to bb3a69c Oct 16, 2017
@jkuchar

Contributor Author

jkuchar commented Oct 16, 2017

@JanTvrdik @hrach thanks for nice argument-based discussion!

I have pushed fixes into this pull request, are there any more comments on this?

@@ -265,6 +268,9 @@ public static function capitalize(string $s): string
   */
  public static function compare(string $left, string $right, int $len = null): bool
  {
+  $left = \Normalizer::normalize($left, \Normalizer::FORM_D); // form NFD is faster
+  $right = \Normalizer::normalize($right, \Normalizer::FORM_D); // form NFD is faster

@jkuchar

jkuchar Oct 16, 2017

Author Contributor

Does it make sense to call Normalizer directly, or would we prefer to call self::normalize()? The latter would also normalize line endings and some other stuff. I'm not sure at which level normalization should be done here in compare.

Current behaviour (proposed by @dg) causes inconsistencies, as it is possible to get into a situation where:

Strings::compare($a, $b); // FALSE
Strings::normalize($a) === Strings::normalize($b); // TRUE

Is this expected behaviour? It seems confusing to me.

@dg

dg Oct 16, 2017

Member

Is it possible to call it after substring()?

@JanTvrdik

JanTvrdik Oct 17, 2017

Contributor

No. substring() operates on code units, and normalize may change the length in code units (imho).
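
To make the point about length concrete, here is a small standalone sketch. It assumes ext-intl and ext-mbstring are loaded and uses only `\Normalizer` directly, not the Strings class. It shows that the same visible character has a different length in bytes and in code points depending on the normal form, so taking a substring before or after normalization can give different results.

```php
<?php
declare(strict_types=1);

$composed = "\u{00E9}";     // "é" as one code point (NFC form)
$decomposed = "e\u{0301}";  // "e" followed by a combining acute accent (NFD form)

// The two strings render the same but are not byte-identical.
var_dump($composed === $decomposed); // false

// Converting between forms makes them identical again.
$nfc = \Normalizer::normalize($decomposed, \Normalizer::FORM_C);
$nfd = \Normalizer::normalize($composed, \Normalizer::FORM_D);
var_dump($nfc === $composed);   // true
var_dump($nfd === $decomposed); // true

// Length differs between the forms, both in bytes and in code points.
var_dump(strlen($composed), strlen($nfd));                         // 2 and 3 bytes
var_dump(mb_strlen($composed, 'UTF-8'), mb_strlen($nfd, 'UTF-8')); // 1 and 2 code points
```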

@jkuchar jkuchar mentioned this pull request Oct 16, 2017
1 of 3 tasks complete
@jkuchar jkuchar changed the title Strings: normalizes strings into UTF-8 NFC Strings: normalize strings into UTF-8 NFC Oct 22, 2017
@jkuchar

Contributor Author

jkuchar commented Oct 22, 2017

The last thing to sort out is the following inconsistency. Current behaviour (proposed by @dg) causes inconsistencies, as it is possible to get into a situation where:

Strings::compare($a, $b); // FALSE
Strings::normalize($a) === Strings::normalize($b); // TRUE

It seems confusing to me. What are your opinions on this?

@JanTvrdik

Contributor

JanTvrdik commented Oct 23, 2017

Current behaviour (proposed by @dg) causes inconsistencies

This seems OK to me. It would be quite confusing if Strings::compare("\r\n", "\t \x00\n") returned true. The compare function is just a case-insensitive variant of ===.
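
The difference between the two levels of normalization discussed here can be sketched with two simplified stand-ins. Both `normalizeSketch` and `compareSketch` are hypothetical names invented for this illustration and deliberately do less than the real Strings methods; the sketch assumes ext-intl and ext-mbstring are available.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for Strings::normalize(): unifies line endings, then converts to NFC.
function normalizeSketch(string $s): string
{
    $s = str_replace(["\r\n", "\r"], "\n", $s);
    return \Normalizer::normalize($s, \Normalizer::FORM_C);
}

// Hypothetical stand-in for Strings::compare(): Unicode normalization only,
// followed by a case-insensitive comparison. Whitespace is left untouched.
function compareSketch(string $left, string $right): bool
{
    $left = \Normalizer::normalize($left, \Normalizer::FORM_D);
    $right = \Normalizer::normalize($right, \Normalizer::FORM_D);
    return mb_strtolower($left, 'UTF-8') === mb_strtolower($right, 'UTF-8');
}

$a = "caf\u{00E9}\r\n";   // composed "é", Windows line ending
$b = "CAFE\u{0301}\r\n";  // decomposed "É", Windows line ending
$c = "caf\u{00E9}\n";     // composed "é", Unix line ending

var_dump(compareSketch($a, $b));                        // true: differs only in case and normal form
var_dump(compareSketch($a, $c));                        // false: line endings still differ
var_dump(normalizeSketch($a) === normalizeSketch($c));  // true: line endings were unified
```

The last two lines reproduce the reported inconsistency on purpose: comparison ignores only case and Unicode normal form, while full normalization also rewrites whitespace.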

@jkuchar

Contributor Author

jkuchar commented Oct 23, 2017

👍 OK, it makes sense. So now it looks ready to merge.

@dg

Member

dg commented Oct 24, 2017

Thank you

@dg dg merged commit 6b858f6 into nette:master Oct 24, 2017
2 checks passed
continuous-integration/travis-ci/pr The Travis CI build passed
coverage/coveralls Coverage increased (+0.02%) to 91.992%
4 participants