Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strings: normalize strings into UTF-8 NFC #150

Merged
merged 4 commits into from
Oct 24, 2017

Conversation

jkuchar
Copy link
Contributor

@jkuchar jkuchar commented Oct 14, 2017

  • bug fix? no
  • new feature? yes
  • BC break? yes

implementation of #149

Before merge:

composer.json Outdated
@@ -15,7 +15,8 @@
}
],
"require": {
"php": ">=7.0"
"php": ">=7.0",
"ext-intl": "*"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should probably remain only as suggested

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should Strings class do when there is no intl extension?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hrach could you please elaborate on it more then just thumbs down? I think I know some of your arguments, but want to be sure what is on your mind.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I consider the Strings class as the main functionality of the nette/utils package and therefore it's dependencies should be required. Nothing more, nothing less.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hrach That's a valid opinion, but it's outside of scope of this PR. If you want to move currently suggested stuff to require you need another PR and do it consistently for all stuff in suggest.

Copy link
Contributor Author

@jkuchar jkuchar Oct 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hrach Agree on that. Strings is the main reason why I use nette\utils.

@JanTvrdik That makes sense and exactly that was my next question. Will do another MR with moving all necessary dependencies into requires... There will be probably more discussion needed as we will need list of core functionality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in bb3a69c

Copy link
Contributor Author

@jkuchar jkuchar Oct 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opened #151

@@ -99,10 +99,13 @@ public static function substring(string $s, int $start, int $length = null): str


/**
* Removes special controls characters and normalizes line endings and spaces in UTF-8 string.
* Removes special controls characters and normalizes line endings and converts to NFC form and spaces in UTF-8 string.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sentence now makes no sense. How about sth like

Removes special controls characters and normalizes line endings, spaces and normal form to NFC in UTF-8 string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 has been writing it in a car

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in ec16e40

*/
public static function normalize(string $s): string
{
// normalize string into utf8 NFC form
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • „UTF-8“ instead of „utf8“, or maybe even better to omit altogether
  • the word „string“ is unnecessary

How about sth like „convert to NFC normal form “?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in b6dd526

@jkuchar
Copy link
Contributor Author

jkuchar commented Oct 14, 2017

@JanTvrdik Thanks for comments 😃

… user do not need to use Strings class at all)
@jkuchar
Copy link
Contributor Author

jkuchar commented Oct 16, 2017

@JanTvrdik @hrach thanks for nice argument-based discussion!

I have pushed fixes into this pull request, are there any more comments on this?

@@ -265,6 +268,9 @@ public static function capitalize(string $s): string
*/
public static function compare(string $left, string $right, int $len = null): bool
{
$left = \Normalizer::normalize($left, \Normalizer::FORM_D); // form NFD is faster
$right = \Normalizer::normalize($right, \Normalizer::FORM_D); // form NFD is faster
Copy link
Contributor Author

@jkuchar jkuchar Oct 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to call Normalizer directly? Or would we prefer to call self::normalize(). This will also normalizer line ending and some other stuff. I'm not sure on which level should be normalization done here in compare.

Current behaviour (proposed by @dg) causes inconsistencies as it is possible to get into situation where:

Strings::compare($a, $b); // FALSE
Strings::normalize($a) === Strings::normalize($b); // TRUE

Is this expected behaviour? It seems as confusing behaviour to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to call it after substring() ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Substring operates on code units, normalize may change length in code units (imho).

@jkuchar jkuchar changed the title Strings: normalizes strings into UTF-8 NFC Strings: normalize strings into UTF-8 NFC Oct 22, 2017
@jkuchar
Copy link
Contributor Author

jkuchar commented Oct 22, 2017

The last thing to sort out is the following inconsistency. Current behaviour (proposed by @dg) causes inconsistencies as it is possible to get into situation where:

Strings::compare($a, $b); // FALSE
Strings::normalize($a) === Strings::normalize($b); // TRUE

It seems as confusing behaviour to me. What are your opinions on this?

@JanTvrdik
Copy link
Contributor

Current behaviour (proposed by @dg) causes inconsistencies

This seems OK to me. It would be quite confusing if Strings::compare("\r\n", "\t \x00\n") returned true. The compare function is just case-insensitive variant of ===.

@jkuchar
Copy link
Contributor Author

jkuchar commented Oct 23, 2017

👍 Ok, it makes sense. So, now it looks like ready to merge,

@dg
Copy link
Member

dg commented Oct 24, 2017

Thank you

@dg dg merged commit 6b858f6 into nette:master Oct 24, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants