- Fixed bug #52981 (Unicode casing table was out-of-date).

  Updated with UnicodeData-6.0.0d7.txt and included the
  source of the generator program with the distribution.
#The replaced tables, generated circa 2002, seem to reflect
#Unicode 3.2. I was unable to generate the same property
#offsets with Unicode 3.2 data, but all the tests I ran
#indicate php_unicode_is_prop() is returning the correct
#values. The replaced file merely says it used a "modified
#version" of ucgendat, which is not very helpful. The results
#I got were not significantly different: only slightly higher
#offsets at two properties, which carried over to the
#subsequent properties.
#I was, however, able to replicate precisely the casing table.
#Beyond omitting most of the tables, using a slightly
#different layout, and multiplying the casing table offsets
#by 3, the extent of the "modifications" is unclear.
#The test suite showed no regressions; however, it covers the
#modified portion of the extension very poorly.
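
The remark about casing table offsets "multiplied by 3" makes sense if the casing table is a flat array of triples, so that an offset counts array elements rather than entries. The following is a minimal sketch under that assumption, not the code from this commit: the table `case_map`, the macro `CASE_MAP_LAST`, and the function `case_lookup` are hypothetical names, and the three entries are the lower-case code points exercised by the new test, each stored as (code, upper, title).

```c
/* Hypothetical miniature casing table: a flat array of triples,
 * sorted by the first element (the code point).
 * Assumed layout per triple: code, upper-case mapping, title-case mapping. */
static const unsigned long case_map[] = {
    0x0525UL,  0x0524UL,  0x0524UL,   /* CYRILLIC SMALL LETTER PE WITH DESCENDER */
    0x2C30UL,  0x2C00UL,  0x2C00UL,   /* GLAGOLITIC SMALL LETTER AZU */
    0x10438UL, 0x10410UL, 0x10410UL   /* DESERET SMALL LETTER H */
};

/* Element offset of the last triple; always a multiple of 3. */
#define CASE_MAP_LAST ((long)(sizeof(case_map) / sizeof(case_map[0])) - 3)

/* Binary search over the triples. l and r are element offsets that are
 * already multiplied by 3, so no scaling is needed when indexing.
 * field selects the mapping inside the triple (1 = upper, 2 = title).
 * Returns the code point unchanged when it has no entry. */
static unsigned long case_lookup(unsigned long code, long l, long r, int field)
{
    while (l <= r) {
        long m = (l + r) >> 1;
        m -= m % 3;                 /* snap to the start of a triple */
        if (code > case_map[m])
            l = m + 3;
        else if (code < case_map[m])
            r = m - 3;
        else
            return case_map[m + field];
    }
    return code;
}
```

With this layout, `case_lookup(0x2C30UL, 0, CASE_MAP_LAST, 1)` finds the triple at element offset 3 and returns its upper-case field. Storing range boundaries as element offsets avoids a multiplication on every lookup, which is one plausible reason for pre-multiplying them.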
1 parent f1d905a, commit 42dae97fd49f8d5f5d45c6254794f41fc2b32c88, cataphract committed Oct 5, 2010
Showing with 6,277 additions and 2,735 deletions.
  1. +23 −0 ext/mbstring/tests/bug52981.phpt
  2. +1,985 −0 ext/mbstring/ucgendat.c
  3. +4,269 −2,735 ext/mbstring/unicode_data.h
@@ -0,0 +1,23 @@
+--TEST--
+Bug #52981 (Unicode properties are outdated (from Unicode 3.2))
+--SKIPIF--
+<?php extension_loaded('mbstring') or die('skip mbstring not available'); ?>
+--FILE--
+<?php
+function test($str)
+{
+    $upper = mb_strtoupper($str, 'UTF-8');
+    $len = strlen($upper);
+    for ($i = 0; $i < $len; ++$i) echo dechex(ord($upper[$i])) . ' ';
+    echo "\n";
+}
+
+// OK
+test("\xF0\x90\x90\xB8"); // U+10438 DESERET SMALL LETTER H (added in 3.1.0, March 2001)
+// not OK
+test("\xE2\xB0\xB0"); // U+2C30 GLAGOLITIC SMALL LETTER AZU (added in 4.1.0, March 2005)
+test("\xD4\xA5"); // U+0525 CYRILLIC SMALL LETTER PE WITH DESCENDER (added in 5.2.0, October 2009)
+--EXPECT--
+f0 90 90 90 
+e2 b0 80 
+d4 a4 
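
The byte sequences in the test can be checked against the code points named in its comments by decoding the UTF-8 by hand. The sketch below is an illustration only and is not part of this commit; `utf8_decode` is a hypothetical helper that assumes a single well-formed sequence of 1 to 4 bytes and performs no validation.

```c
#include <stddef.h>

/* Decode one well-formed UTF-8 sequence of len bytes (1..4) into a
 * code point. No validation is done: overlong forms, surrogates and
 * malformed continuation bytes are not detected. */
static unsigned long utf8_decode(const unsigned char *s, size_t len)
{
    if (len == 1)
        return s[0];
    if (len == 2)
        return ((s[0] & 0x1FUL) << 6) | (s[1] & 0x3FUL);
    if (len == 3)
        return ((s[0] & 0x0FUL) << 12) | ((s[1] & 0x3FUL) << 6)
             | (s[2] & 0x3FUL);
    return ((s[0] & 0x07UL) << 18) | ((s[1] & 0x3FUL) << 12)
         | ((s[2] & 0x3FUL) << 6) | (s[3] & 0x3FUL);
}
```

Decoding the inputs gives U+10438, U+2C30 and U+0525, matching the comments. Decoding the expected output bytes gives U+10410, U+2C00 and U+0524, the corresponding capital letters, which is what `mb_strtoupper()` should return once the casing table covers those code points.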
