Update Unicode Version to 9.0.0 #27822
Merged
0203f1f to
21081f0
Compare
jeremy
reviewed
Jan 27, 2017
Member
Do we have test coverage demonstrating these grapheme cluster boundary rules?
Contributor
Author
It seems that http://www.unicode.org/Public/9.0.0/ucd/auxiliary/GraphemeBreakTest.txt covers every line of `unpack_graphemes`.
The following diff is how I measured the coverage:
diff --git a/Gemfile b/Gemfile
index 2a42a4aea0..2aaf64360a 100644
--- a/Gemfile
+++ b/Gemfile
@@ -151,3 +151,5 @@ end
gem "ibm_db" if ENV["IBM_DB"]
gem "tzinfo-data", platforms: [:mingw, :mswin, :x64_mingw, :jruby]
gem "wdm", ">= 0.1.0", platforms: [:mingw, :mswin, :x64_mingw, :mswin64]
+
+gem "simplecov"
diff --git a/activesupport/cov.rb b/activesupport/cov.rb
new file mode 100644
index 0000000000..f5911a16c3
--- /dev/null
+++ b/activesupport/cov.rb
@@ -0,0 +1,7 @@
+$LOAD_PATH << "test"
+
+require "simplecov"
+
+SimpleCov.start
+
+load "test/multibyte_grapheme_break_conformance_test.rb"

$ cd activesupport
$ bundle exec ruby cov.rb
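For readers unfamiliar with the data file: each non-comment line of GraphemeBreakTest.txt lists hexadecimal code points separated by `÷` (a break is allowed) or `×` (no break), followed by a `#` comment. The following is only a minimal sketch of how such a line could be turned into expected clusters. The helper name `parse_break_test_line` is hypothetical and is not taken from the Rails conformance test.

```ruby
# Hypothetical helper (not from the Rails test suite): turn one line of
# GraphemeBreakTest.txt into an array of clusters, each an array of code points.
def parse_break_test_line(line)
  # Drop the trailing "# ..." comment; to_s guards against empty lines.
  data = line.split("#", 2).first.to_s.strip
  return nil if data.empty?

  # "÷" separates clusters, "×" joins code points inside one cluster.
  data.split("÷").map(&:strip).reject(&:empty?).map do |cluster|
    cluster.split("×").map { |hex| hex.strip.to_i(16) }
  end
end

# SPACE followed by COMBINING DIAERESIS forms one cluster, then SPACE alone.
expected = parse_break_test_line("÷ 0020 × 0308 ÷ 0020 ÷\t# example comment")
p expected
```

A conformance test can then compare these expected clusters with what the splitter under test returns for the same code points.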
Member
Worth noting how this affects grapheme cluster boundaries.
Notably, this will change string.mb_chars.grapheme_length and string.mb_chars.reverse—perhaps fixing them to work with clusters like 👩👩👧👦!
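To make the difference concrete, here is a small sketch that uses Ruby's own `\X` regexp (available with extended grapheme cluster support from Ruby 2.4) rather than `mb_chars`, so it runs without Active Support. The family emoji is written with explicit escapes so the ZERO WIDTH JOINER characters are visible. The variable names are illustrative only.

```ruby
# WOMAN, WOMAN, GIRL, BOY joined by U+200D ZERO WIDTH JOINER.
family = "\u{1F469}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}"
text = "a" + family

# Counting code points sees every joiner and every person separately.
code_point_count = text.codepoints.length

# Splitting on \X yields one entry per extended grapheme cluster.
graphemes = text.scan(/\X/)

# Reversing cluster by cluster keeps the family sequence intact.
reversed = graphemes.reverse.join

puts code_point_count
puts graphemes.length
puts reversed == family + "a"
```

Reversing code point by code point instead would scatter the joiners and break the emoji apart, which is the behaviour the comment above hopes this update fixes.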
Contributor
Author
I added some notes 😄
9.0.0 was released on June 21, 2016
http://blog.unicode.org/2016/06/announcing-unicode-standard-version-90.html
http://www.unicode.org/versions/Unicode9.0.0/

There are some changes to grapheme clusters in Unicode 9.0.0:
http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundary_Rules

------------

I noticed that `unpack_graphemes` returns [Other] when the argument is Other ÷ Prepend (it must be [Other, Prepend]).
But [Unicode 8.0.0's Prepend has no characters](http://www.unicode.org/reports/tr29/tr29-27.html#Prepend), so we don't have to backport the following patch:

```diff
 should_break =
+  if pos == eoc
+    true
```
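The end-of-input case can be illustrated with a deliberately simplified splitter. This is a sketch under stated assumptions, not the Active Support implementation: it knows only two classes (Prepend and Other), treats U+0600..U+0605 as Prepend (as I understand the Unicode 9.0.0 data), and the names `toy_gcb_class` and `toy_unpack_graphemes` are made up for this example.

```ruby
# Assumption: U+0600..U+0605 have Grapheme_Cluster_Break=Prepend in Unicode 9.0.0.
TOY_PREPEND = (0x0600..0x0605).freeze

# Toy classifier with only two classes, enough to show the end-of-input case.
def toy_gcb_class(code_point)
  TOY_PREPEND.cover?(code_point) ? :prepend : :other
end

def toy_unpack_graphemes(string)
  code_points = string.codepoints
  eoc = code_points.length - 1 # index of the last code point
  clusters = []
  current = []

  code_points.each_with_index do |code_point, pos|
    current << code_point
    should_break =
      if pos == eoc
        true # always break at the end of the text
      else
        toy_gcb_class(code_point) != :prepend # never break right after Prepend
      end

    if should_break
      clusters << current
      current = []
    end
  end

  clusters
end

p toy_unpack_graphemes("a\u0600") # Other ÷ Prepend
p toy_unpack_graphemes("\u0600a") # Prepend × Other
```

Without the `pos == eoc` branch, a trailing Prepend would never trigger a break, so its cluster would be dropped from the result. That matches the `[Other]` versus `[Other, Prepend]` symptom described above.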
21081f0 to
bdcfdef
Compare
Member
jeremy
approved these changes
Jan 30, 2017
"👩👩👧👦".mb_chars.reverse # => "👩👩👧👦"

*Fumiaki MATSUSHIMA*
Summary
9.0.0 was released on June 21, 2016
http://blog.unicode.org/2016/06/announcing-unicode-standard-version-90.html
http://www.unicode.org/versions/Unicode9.0.0/
There are some changes to grapheme clusters in Unicode 9.0.0:
http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundary_Rules
Other Information
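I noticed that `unpack_graphemes` will return [Other] when the argument is Other ÷ Prepend (it must be [Other, Prepend]).
But [Unicode 8.0.0's Prepend has no characters](http://www.unicode.org/reports/tr29/tr29-27.html#Prepend), so we don't have to backport the `pos == eoc` check that this patch adds to `should_break`.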
If we support Ruby 2.4 only, we can simply replace `unpack_graphemes` with Ruby's `scan(/\X/)` 😉 (@nurse implemented extended grapheme clusters in 2.4!)
https://bugs.ruby-lang.org/issues/12831
See #26743 for more information about replacing `AS::Multibyte` with Ruby's features.

I'm not a Unicode expert, so I'd be very happy if you could review this PR carefully 🙏
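As a rough illustration of the `scan(/\X/)` idea, the sketch below splits a string into extended grapheme clusters and returns each cluster as an array of code points. It assumes that this array-of-arrays shape is what `unpack_graphemes` returns. The method name `unpack_graphemes_via_regexp` is hypothetical and is not part of Active Support.

```ruby
# Hypothetical stand-in for unpack_graphemes on Ruby 2.4 or later:
# \X matches one extended grapheme cluster at a time.
def unpack_graphemes_via_regexp(string)
  string.scan(/\X/).map(&:codepoints)
end

# "e" + COMBINING ACUTE ACCENT stays together; "x" is its own cluster.
p unpack_graphemes_via_regexp("e\u0301x")

# CR LF is a single cluster under the grapheme cluster boundary rules.
p unpack_graphemes_via_regexp("\r\n")
```

The trade-off is that the results then follow whichever Unicode version the running Ruby ships with, rather than the tables bundled with Active Support.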