New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Unicode Version to 9.0.0 #27822

Merged
merged 1 commit into from Jan 30, 2017

Conversation

Projects
None yet
5 participants
@mtsmfm
Contributor

mtsmfm commented Jan 27, 2017

Summary

9.0.0 was released on June 21, 2016

http://blog.unicode.org/2016/06/announcing-unicode-standard-version-90.html

http://www.unicode.org/versions/Unicode9.0.0/

There are some changes about grapheme cluster in Unicode 9.0.0:

http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundary_Rules

Other Information

I noticed that unpack_graphemes will return [Other] when the argument is Other 梅 Prepend
(it must be [Other, Prepend]).
But in Unicode 8.0.0's Prepend has no characters
so we don't have to backport following patch:

should_break =
+ if pos == eoc
+   true
# GB3. CR X LF

If we support Ruby 2.4 only, we can simply replace unpack_graphemes with Ruby's scan(/\X/) 馃槈
(@nurse implemented grapheme extended cluster on 2.4! )

https://bugs.ruby-lang.org/issues/12831

def unpack_graphemes(string)
  string.scan(/\X/).map(&:codepoints)
end

See #26743 for more information about replacing AS::Multibyte with Ruby's feature.


I'm not unicode expert so I'm very happy if you review this PR carefully 馃檹

# GB13. [^RI] (RI RI)* RI X RI
elsif codepoints[marker..pos].all? { |c| database.boundary[:regional_indicator] === c } && codepoints[marker..pos].count { |c| database.boundary[:regional_indicator] === c }.even?
false
# GB999. Any 梅 Any

This comment has been minimized.

@jeremy

jeremy Jan 27, 2017

Member

Do we have test coverage demonstrating these grapheme cluster boundary rules?

@jeremy

jeremy Jan 27, 2017

Member

Do we have test coverage demonstrating these grapheme cluster boundary rules?

This comment has been minimized.

@mtsmfm

mtsmfm Jan 28, 2017

Contributor

It seems that http://www.unicode.org/Public/9.0.0/ucd/auxiliary/GraphemeBreakTest.txt covers all lines of unpack_graphemes:

https://s3-ap-northeast-1.amazonaws.com/mtsmfm/rails/27822/coverage/index.html#40076a266c43e5b028d6d14e50d80fabb410f3b5

Following diff is how I got the coverage:

diff --git a/Gemfile b/Gemfile
index 2a42a4aea0..2aaf64360a 100644
--- a/Gemfile
+++ b/Gemfile
@@ -151,3 +151,5 @@ end
 gem "ibm_db" if ENV["IBM_DB"]
 gem "tzinfo-data", platforms: [:mingw, :mswin, :x64_mingw, :jruby]
 gem "wdm", ">= 0.1.0", platforms: [:mingw, :mswin, :x64_mingw, :mswin64]
+
+gem "simplecov"
diff --git a/activesupport/cov.rb b/activesupport/cov.rb
new file mode 100644
index 0000000000..f5911a16c3
--- /dev/null
+++ b/activesupport/cov.rb
@@ -0,0 +1,7 @@
+$LOAD_PATH << "test"
+
+require "simplecov"
+
+SimpleCov.start
+
+load "test/multibyte_grapheme_break_conformance_test.rb"
$ cd activesupport
$ bundle exec ruby cov.rb
@mtsmfm

mtsmfm Jan 28, 2017

Contributor

It seems that http://www.unicode.org/Public/9.0.0/ucd/auxiliary/GraphemeBreakTest.txt covers all lines of unpack_graphemes:

https://s3-ap-northeast-1.amazonaws.com/mtsmfm/rails/27822/coverage/index.html#40076a266c43e5b028d6d14e50d80fabb410f3b5

Following diff is how I got the coverage:

diff --git a/Gemfile b/Gemfile
index 2a42a4aea0..2aaf64360a 100644
--- a/Gemfile
+++ b/Gemfile
@@ -151,3 +151,5 @@ end
 gem "ibm_db" if ENV["IBM_DB"]
 gem "tzinfo-data", platforms: [:mingw, :mswin, :x64_mingw, :jruby]
 gem "wdm", ">= 0.1.0", platforms: [:mingw, :mswin, :x64_mingw, :mswin64]
+
+gem "simplecov"
diff --git a/activesupport/cov.rb b/activesupport/cov.rb
new file mode 100644
index 0000000000..f5911a16c3
--- /dev/null
+++ b/activesupport/cov.rb
@@ -0,0 +1,7 @@
+$LOAD_PATH << "test"
+
+require "simplecov"
+
+SimpleCov.start
+
+load "test/multibyte_grapheme_break_conformance_test.rb"
$ cd activesupport
$ bundle exec ruby cov.rb
unless File.exist?(filename)
$stderr.puts "Downloading #{url.split('/').last}"
FileUtils.mkdir_p(File.dirname(filename))

This comment has been minimized.

@jeremy

jeremy Jan 27, 2017

Member

馃憤

@jeremy

jeremy Jan 27, 2017

Member

馃憤

Show outdated Hide outdated activesupport/CHANGELOG.md

@jeremy jeremy removed the needs feedback label Jan 27, 2017

@jeremy jeremy added this to the 5.1.0 milestone Jan 27, 2017

Update Unicode Version to 9.0.0
9.0.0 was released on June 21, 2016

http://blog.unicode.org/2016/06/announcing-unicode-standard-version-90.html

http://www.unicode.org/versions/Unicode9.0.0/

There are some changes about grapheme cluster in Unicode 9.0.0:

http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundary_Rules

------------

I noticed that `unpack_graphemes` returns [Other] when the argument is Other 梅 Prepend
(it must be [Other, Prepend]).
But in [Unicode 8.0.0's Prepend has no characters](http://www.unicode.org/reports/tr29/tr29-27.html#Prepend)
so we don't have to backport following patch:

```diff
should_break =
+ if pos == eoc
+   true
```
@amatsuda

This comment has been minimized.

Show comment
Hide comment
@amatsuda

amatsuda Jan 28, 2017

Member

@jeremy FYI I asked @mtsmfm to work on this because I wanted to include this in 5.1, and to me this looks all right.

Member

amatsuda commented Jan 28, 2017

@jeremy FYI I asked @mtsmfm to work on this because I wanted to include this in 5.1, and to me this looks all right.

@jeremy

jeremy approved these changes Jan 30, 2017

"馃懇鈥嶐煈┾嶐煈р嶐煈".mb_chars.reverse # => "馃懇鈥嶐煈┾嶐煈р嶐煈"
*Fumiaki MATSUSHIMA*

This comment has been minimized.

@jeremy

jeremy Jan 30, 2017

Member

鉂わ笍

@jeremy

jeremy Jan 30, 2017

Member

鉂わ笍

@jeremy jeremy merged commit 1fc9955 into rails:master Jan 30, 2017

2 checks passed

codeclimate no new or fixed issues
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@mtsmfm mtsmfm deleted the mtsmfm:bump-unicode-version branch Feb 14, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment