fix(react-avatar): support initials calculation for GB18030-2022 extension characters #35878
dmytrokirpa wants to merge 2 commits into microsoft:master
Pull request overview
This PR fixes @fluentui/react-avatar initials generation to correctly handle supplementary Unicode characters (notably GB18030-2022 extension characters) by avoiding surrogate-pair corruption during cleanup and by extracting initials using code point–aware logic.
Changes:
- Adjusts cleanup and unsupported-script detection regexes to preserve surrogate pairs and allow supplementary CJK characters to produce initials.
- Updates initials extraction logic to be code point–aware (including RTL swapping) and to gate unsupported-script checks on the first code point.
- Adds unit tests covering GB18030-2022 extension characters and mixed strings starting with such characters, plus a beachball change file.
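The difference between code-unit and code-point extraction can be sketched as follows. This is a minimal illustration, not the library's implementation: `firstCodePoint` and `swapInitials` are hypothetical helper names, and `Array.from` stands in for the string spread used in the PR (both iterate a string by code point).

```typescript
// Hypothetical helpers for illustration only; not part of @fluentui/react-avatar.

// Array.from iterates a string by code point, so a surrogate pair stays together.
function firstCodePoint(word: string): string {
  return Array.from(word)[0] ?? '';
}

// Swap two initials for RTL without splitting a surrogate pair.
function swapInitials(initials: string): string {
  const codePoints = Array.from(initials);
  return codePoints.length === 2 ? codePoints[1] + codePoints[0] : initials;
}

// U+20000 is the first code point of CJK Extension B (a supplementary character).
const extB = String.fromCodePoint(0x20000);
const word = extB + 'abc';

const byCodeUnit = word.charAt(0); // only the high surrogate
const byCodePoint = firstCodePoint(word); // the full character

console.log(byCodeUnit.length, byCodePoint.length); // 1 2
console.log(byCodePoint === extB); // true
console.log(swapInitials(extB + 'A') === 'A' + extB); // true
```

Indexing by code point rather than by UTF-16 code unit is what keeps both halves of a supplementary character together in the extracted initial.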
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| packages/react-components/react-avatar/library/src/utils/getInitials.ts | Fixes surrogate-pair handling and updates initials/unsupported-script logic to support supplementary characters. |
| packages/react-components/react-avatar/library/src/utils/getInitials.test.ts | Adds regression tests for supplementary CJK initials and mixed-string cases. |
| change/@fluentui-react-avatar-b3b582c1-9dae-4459-9787-66b6167b0458.json | Adds a patch change file describing the fix. |
Comments suppressed due to low confidence (1)
packages/react-components/react-avatar/library/src/utils/getInitials.ts:115
- The unsupported-script check now only tests the first code point of `displayName`. This changes behavior for mixed-script names that start with a supported character but contain unsupported scripts later (e.g. "John 松田" would now return "J松" instead of falling back to the icon). Please confirm this behavior change is intended; if not, consider applying `UNSUPPORTED_TEXT_REGEX` to the derived initials (or to each initial candidate) rather than only the first code point, and add a regression test for a mixed Latin + CJK/Japanese name.
// Check only the first code point against UNSUPPORTED_TEXT_REGEX so that names starting with a supported
// character (e.g. GB18030-2022 extension characters) produce an initial even when the rest of the string
// contains BMP CJK characters that would otherwise trigger the regex.
const firstCodePoint = [...displayName][0] ?? '';
if (
  UNSUPPORTED_TEXT_REGEX.test(firstCodePoint) ||
  (!options?.allowPhoneInitials && PHONENUMBER_REGEX.test(displayName))
) {
  return '';
}
return getInitialsLatin(displayName, isRtl, options?.firstInitialOnly);
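The alternative raised in the review comment (testing each initial candidate rather than only the first code point) might look roughly like the sketch below. Everything here is an assumption for illustration: `UNSUPPORTED_SKETCH` is a simplified stand-in covering only Hiragana/Katakana and BMP CJK Unified Ideographs, not the library's actual `UNSUPPORTED_TEXT_REGEX`, and `initialCandidates` and `initialsOrFallback` are hypothetical names.

```typescript
// Simplified stand-in (assumption): Hiragana/Katakana (U+3040-U+30FF) and
// BMP CJK Unified Ideographs (U+4E00-U+9FFF). The real regex covers more scripts.
const UNSUPPORTED_SKETCH = /[\u3040-\u30FF\u4E00-\u9FFF]/;

// First code point of each whitespace-separated word.
function initialCandidates(displayName: string): string[] {
  return displayName
    .split(/\s+/)
    .filter(w => w.length > 0)
    .map(w => Array.from(w)[0]);
}

// Returns '' (icon fallback) if any candidate initial is in an unsupported script.
function initialsOrFallback(displayName: string): string {
  const candidates = initialCandidates(displayName);
  if (candidates.some(c => UNSUPPORTED_SKETCH.test(c))) {
    return '';
  }
  const first = candidates[0] ?? '';
  const last = candidates.length > 1 ? candidates[candidates.length - 1] : '';
  return (first + last).toUpperCase();
}

const supplementary = String.fromCodePoint(0x20000); // U+20000, CJK Extension B

console.log(initialsOrFallback('John Smith')); // JS
console.log(initialsOrFallback('John 松田') === ''); // true: falls back to the icon
console.log(initialsOrFallback(supplementary + '松田') === supplementary); // true
```

Under these assumptions, a mixed Latin + Japanese name keeps the icon fallback, while a name that starts with a supplementary character still yields that character as its initial.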
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
📊 Bundle size report
Unchanged fixtures
@tudorpopams and @micahgodbolt
As far as I understand, the request was to display the correct initials (characters) for the GB18030-2022 extension. The screenshot doesn't show them because the font lacks those glyphs. Since Fluent doesn't bundle fonts, it's up to the consumer to address that.
Pull request demo site: URL
🕵🏾♀️ visual changes to review in the Visual Change Report
vr-tests-react-components/CalendarCompat 4 screenshots
| Image Name | Diff(in Pixels) | Image Type |
|---|---|---|
| vr-tests-react-components/CalendarCompat.multiDayView - High Contrast.default.chromium.png | 1208 | Changed |
| vr-tests-react-components/CalendarCompat.multiDayView - RTL.default.chromium.png | 487 | Changed |
| vr-tests-react-components/CalendarCompat.multiDayView - Dark Mode.default.chromium.png | 1090 | Changed |
| vr-tests-react-components/CalendarCompat.multiDayView.default.chromium_1.png | 488 | Changed |
vr-tests-react-components/Charts-DonutChart 3 screenshots
| Image Name | Diff(in Pixels) | Image Type |
|---|---|---|
| vr-tests-react-components/Charts-DonutChart.Dynamic.default.chromium.png | 5581 | Changed |
| vr-tests-react-components/Charts-DonutChart.Dynamic - Dark Mode.default.chromium.png | 7530 | Changed |
| vr-tests-react-components/Charts-DonutChart.Dynamic - RTL.default.chromium.png | 5570 | Changed |
vr-tests-react-components/Menu Converged - submenuIndicator slotted content 2 screenshots
| Image Name | Diff(in Pixels) | Image Type |
|---|---|---|
| vr-tests-react-components/Menu Converged - submenuIndicator slotted content.default - RTL.submenus open.chromium.png | 599 | Changed |
| vr-tests-react-components/Menu Converged - submenuIndicator slotted content.default.submenus open.chromium.png | 605 | Changed |
vr-tests-react-components/Positioning 2 screenshots
| Image Name | Diff(in Pixels) | Image Type |
|---|---|---|
| vr-tests-react-components/Positioning.Positioning end.chromium.png | 908 | Changed |
| vr-tests-react-components/Positioning.Positioning end.updated 2 times.chromium.png | 792 | Changed |
vr-tests-react-components/Skeleton converged 1 screenshots
| Image Name | Diff(in Pixels) | Image Type |
|---|---|---|
| vr-tests-react-components/Skeleton converged.Opaque Skeleton with rectangle - Dark Mode.default.chromium.png | 21 | Changed |
There were 1 duplicate changes discarded. Check the build logs for more information.

Previous Behavior
GB18030-2022 extension characters (CJK Ext B–I, GFZB-196, BX, GX, CX, HX, IX; e.g. 𬸚, 𢃾, 𪜀, 𰉖) rendered as `?` or blank when used as an Avatar `name`. Mixed strings starting with such a character (e.g. 𫚭齅䶱5𮯠灋𬘭r...) also produced no initials at all.
Three compounding bugs in `getInitials`:
- `UNWANTED_CHARS_REGEX` included `\uD800-\uFFFF` which, without the `u` flag, matched individual UTF-16 code units. This stripped the high surrogate of any supplementary character, leaving an orphaned low surrogate that rendered as `?`.
- `UNSUPPORTED_TEXT_REGEX` could no longer detect the character as CJK, so the broken code unit fell through to `getInitialsLatin`.
- `getInitialsLatin` used `charAt(0)`, which returns only the first UTF-16 code unit (half of a surrogate pair), not the full character.
New Behavior
GB18030-2022 extension characters (and all supplementary CJK characters) are correctly rendered as the initial in the Avatar. Mixed strings like 𫚭齅䶱5𮯠灋𬘭r𫟼蝌龯... use the first code point (𫚭) as the initial.
BMP CJK/Arabic/Korean/Japanese names continue to show the icon fallback (no initials), as before.
Changes made to `getInitials.ts`:
- `UNWANTED_CHARS_REGEX`: changed `\uD800-\uFFFF` → `\uE000-\uFFFF` to exclude the surrogate range, preserving surrogate pairs for supplementary characters.
- `UNSUPPORTED_TEXT_REGEX`: removed the `[\uD840-\uD869][\uDC00-\uDED6]` surrogate-pair clause for CJK Ext B so those characters also produce initials.
- `getInitials`: the unsupported-language check now tests only the first code point of the name (not the whole string), so names starting with a GB18030-2022 extension char produce an initial even when the rest contains BMP CJK characters.
- `getInitialsLatin`: replaced `charAt(0)` with `[...word][0]` (spread iterator) for correct code-point extraction; fixed the RTL swap to also use code-point iteration.
Related Issue(s)
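The effect of the range change can be reproduced in isolation with the sketch below. It uses only the two ranges quoted above, not the full `UNWANTED_CHARS_REGEX` (which contains other ranges and is not reproduced here), and `OLD_RANGE` and `NEW_RANGE` are hypothetical names. With this isolated range both halves of the surrogate pair are removed; how exactly the character is damaged in the library depends on the complete regex.

```typescript
// Without the `u` flag, a character class matches single UTF-16 code units,
// so a range that spans U+D800-U+DFFF also matches surrogate halves.
const OLD_RANGE = /[\uD800-\uFFFF]/g;
// Starting at U+E000 skips the surrogate block (U+D800-U+DFFF) entirely.
const NEW_RANGE = /[\uE000-\uFFFF]/g;

// U+20000 is the first code point of CJK Extension B, stored as a surrogate pair.
const supplementaryChar = String.fromCodePoint(0x20000);
const rawName = supplementaryChar + ' Smith';

const cleanedOld = rawName.replace(OLD_RANGE, '');
const cleanedNew = rawName.replace(NEW_RANGE, '');

console.log(supplementaryChar.length); // 2 (two UTF-16 code units)
console.log(cleanedOld === ' Smith'); // true: the supplementary character is gone
console.log(cleanedNew === rawName); // true: the surrogate pair survives cleanup
```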