Skip to content

Fix/ambigous charset#30

Merged
ralflang merged 4 commits into
FRAMEWORK_6_0from
fix/ambigousCharset
May 29, 2026
Merged

Fix/ambigous charset#30
ralflang merged 4 commits into
FRAMEWORK_6_0from
fix/ambigousCharset

Conversation

@TDannhauer
Copy link
Copy Markdown
Collaborator

Silence UConverter “ambiguous encoding” warnings for common charsets

Summary

  • Add CharacterSets::toUConverter() to map charset names that ICU treats as ambiguous to stable canonical names before constructing UConverter.
  • Use toUConverter() for both source and destination encodings in HordeString::_convertCharset() (and the legacy Horde_String wrapper).
  • Add unit tests for windows-1258 and pass-through behavior for unambiguous charsets.

Problem

When iconv and mbstring cannot convert a string, HordeString::_convertCharset() falls back to PHP’s intl UConverter. Several widely used charset labels are ambiguous in ICU: PHP accepts them but emits a warning and picks one canonical mapping, for example:

UConverter::__construct(): Ambiguous encoding specified, using ibm-5354_P100-1998

This is commonly triggered by windows-1258 (Vietnamese), which Horde lowercases from MIME labels like Windows-1258. The warning appears on line 202 of HordeString.php and can flood Horde logs during mail/ActiveSync processing even though conversion succeeds.

Other affected aliases include big5-hkscs, shift_jis, tis-620, windows-936, and windows-950.

Solution

  1. Extend CharacterSets with a small $uconverterMap (after normalize()) that maps ambiguous aliases to the same ICU canonical names PHP would select anyway (e.g. windows-1258cp1258, which resolves to ibm-5354_P100-1998 without a warning).
  2. Call CharacterSets::toUConverter($from) and CharacterSets::toUConverter($to) immediately before new UConverter($to, $from).
  3. Leave iconv and mbstring paths unchanged; they already accept names like windows-1258.

Behavior is unchanged; only log noise is removed.

Files changed

File Change
src/CharacterSets.php $uconverterMap, toUConverter()
src/HordeString.php Use toUConverter() for UConverter fallback
lib/Horde/String.php Same for legacy class
test/CharacterSetsTest.php Tests for toUConverter()

Test plan

  • With intl enabled, run: new UConverter('utf-8', 'windows-1258') before/after — confirm no Ambiguous encoding warning after the patch.
  • HordeString::convertCharset('<html>', 'UTF-8', 'Windows-1258') returns unchanged ASCII (existing testBug9528).
  • vendor/bin/phpunit test/CharacterSetsTest.php passes.
  • Process mail or ActiveSync traffic that previously logged ibm-5354_P100-1998 warnings; confirm logs are clean and message bodies still display correctly for Vietnamese (windows-1258) content.

@TDannhauer TDannhauer requested a review from ralflang May 29, 2026 07:00
@TDannhauer
Copy link
Copy Markdown
Collaborator Author

@ralflang I am really not sure obout this PR, please conduct a full depth review

@ralflang ralflang merged commit 1e82235 into FRAMEWORK_6_0 May 29, 2026
0 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants