A PHP port of ICU4J's MessagePattern parser with locale utilities. Parses ICU MessageFormat patterns into an inspectable AST and provides locale data support for internationalization in PHP.
This package focuses on:
- ICU MessagePattern Parser - Parsing ICU MessageFormat patterns into a precise token stream and AST (Abstract Syntax Tree), exposing the internal structure of messages (literals, arguments, selects, plurals, nested sub-messages, offsets, quoted text, etc.).
- Locale Utilities - Access language data, plural rules, and locale validation for internationalization support.
This package does not provide locale-aware date/number formatting itself — it provides the pattern model and utilities so you can build formatters or validators that interoperate with PHP's intl extension or other formatting libraries.
- Domenico Lupinetti — Ostico — domenico@translated.net / ostico@gmail.com
- Full tokenization of ICU MessageFormat patterns (braces, argument names/indexes, type specifiers, selectors, offsets, quoted text, etc.).
- AST representation of a parsed message pattern (message, message parts, argument placeholders, plural/select blocks, and their sub-messages).
- Utilities for traversing, validating, and reconstructing patterns.
- Error reporting with position information for malformed patterns.
- Mirrored behavior of ICU4J MessagePattern parsing logic (same parsing rules and edge-case handling).
- Languages: Comprehensive language data including localized names, RTL detection, and language code validation.
- Plural Rules: CLDR-based plural rules for 140+ languages to determine the correct plural form for any number.
- Language Domains: Domain-specific language groupings for translation workflows.
composer require matecat/icu-intl- PHP 8.3+
- ext-mbstring required
- ext-intl recommended for full formatting integration (not required for parsing)
- Composer for installation and development tasks
| Namespace | Description |
|---|---|
Matecat\ICU |
ICU MessagePattern parser and AST classes |
Matecat\Locales |
Language data, plural rules, and locale utilities |
use Matecat\ICU\MessagePattern;
use Matecat\ICU\Tokens\TokenType;
$patternText = "You have {num, plural, offset:1 =0{no messages} =1{one message} other{# messages}} in {folder}.";
$pattern = new MessagePattern($patternText);
// Get AST and traverse (Token is a small DTO with type, start, length, value, limit)
$indent = '';
foreach ($pattern as $i => $part) {
$explanation = '';
$partString = (string)$part;
$type = $part->getType();
if ($type === TokenType::MSG_START) {
$indent = str_pad('', $part->getValue() * 4, ' ', STR_PAD_LEFT);
}
if ($part->getLength() > 0) {
$explanation .= '="' . $pattern->getSubstring($part) . '"';
}
if ($type->hasNumericValue()) {
$explanation .= '=' . $pattern->getNumericValue($part);
}
printf("%2d: %s%s%s\n", $i, $indent, $partString, $explanation);
if ($type === TokenType::MSG_LIMIT) {
$nestingLevel = $part->getValue();
if ($nestingLevel > 1) {
$indent = str_pad('', ($nestingLevel - 1) * 4, ' ', STR_PAD_LEFT);
} else {
$indent = '';
}
}
}
/*
0: MSG_START(0)@0
1: ARG_START(PLURAL)@9="{"
2: ARG_NAME(0)@10="num"
3: ARG_INT(1)@30="1"=1
4: ARG_SELECTOR(0)@32="=0"
5: ARG_INT(0)@33="0"=0
6: MSG_START(1)@34="{"
7: MSG_LIMIT(1)@46="}"
8: ARG_SELECTOR(0)@48="=1"
9: ARG_INT(1)@49="1"=1
10: MSG_START(1)@50="{"
11: MSG_LIMIT(1)@62="}"
12: ARG_SELECTOR(0)@64="other"
13: MSG_START(1)@69="{"
14: REPLACE_NUMBER(0)@70="#"
15: MSG_LIMIT(1)@80="}"
16: ARG_LIMIT(PLURAL)@81="}"
17: ARG_START(NONE)@86="{"
18: ARG_NAME(0)@87="folder"
19: ARG_LIMIT(NONE)@93="}"
20: MSG_LIMIT(0)@95
*/$pattern = new MessagePattern("Hello {name}, welcome!");
/*
0: MSG_START(0)@0
1: ARG_START(NONE)@6="{"
2: ARG_NAME(0)@7="name"
3: ARG_LIMIT(NONE)@11="}"
4: MSG_LIMIT(0)@22
*/$pattern = new MessagePattern();
$pattern->parse("You have {count, plural, =0{no messages} one{# message} other{# messages}}.");$pattern = MessagePattern::parse("{gender, select, female{{num, plural, one{She has one file} other{She has # files}}} male{{num, plural, one{He has one file} other{He has # files}}} other{{num, plural, one{They have one file} other{They have # files}}}}");
/*
0: MSG_START(0)@0
1: ARG_START(SELECT)@0="{"
2: ARG_NAME(0)@1="gender"
3: ARG_SELECTOR(0)@17="female"
4: MSG_START(1)@23="{"
5: ARG_START(PLURAL)@24="{"
...
42: ARG_LIMIT(SELECT)@219="}"
43: MSG_LIMIT(0)@220
*/use Matecat\Locales\Languages;
// Get all supported languages
$languages = Languages::getInstance();
$allLanguages = $languages->getLanguages();
// Get a specific language by RFC3066 code
$english = $languages->getLanguage('en-US');
echo $english['name']; // "English US"
echo $english['localized']; // "English"
echo $english['direction']; // "ltr"
echo $english['plurals']; // 2
// Check if a language is RTL
$isRtl = $languages->isRTL('ar-SA'); // true
// Get the number of plural forms for a language
$pluralCount = Languages::getPluralsCount('ru-RU'); // 3use Matecat\ICU\Plurals\PluralRules;
// Get the plural form index for a number in a specific language
$form = PluralRules::getCardinalFormIndex('en', 1); // 0 (singular)
$form = PluralRules::getCardinalFormIndex('en', 5); // 1 (plural)
$form = PluralRules::getCardinalFormIndex('ru', 1); // 0 (one)
$form = PluralRules::getCardinalFormIndex('ru', 2); // 1 (few)
$form = PluralRules::getCardinalFormIndex('ru', 5); // 2 (many)
// Get the CLDR plural category name for a number
$category = PluralRules::getCardinalCategoryName('en', 1); // "one"
$category = PluralRules::getCardinalCategoryName('en', 5); // "other"
$category = PluralRules::getCardinalCategoryName('ru', 1); // "one"
$category = PluralRules::getCardinalCategoryName('ru', 2); // "few"
$category = PluralRules::getCardinalCategoryName('ru', 5); // "many"
$category = PluralRules::getCardinalCategoryName('ar', 0); // "zero"
$category = PluralRules::getCardinalCategoryName('ar', 1); // "one"
$category = PluralRules::getCardinalCategoryName('ar', 2); // "two"
$category = PluralRules::getCardinalCategoryName('ar', 5); // "few"
$category = PluralRules::getCardinalCategoryName('ar', 11); // "many"
$category = PluralRules::getCardinalCategoryName('ar', 100); // "other"
// Get all available plural categories for a language
$categories = PluralRules::getCardinalCategories('en'); // ["one", "other"]
$categories = PluralRules::getCardinalCategories('ru'); // ["one", "few", "many"]
$categories = PluralRules::getCardinalCategories('ar'); // ["zero", "one", "two", "few", "many", "other"]
// Use category constants for comparison
if (PluralRules::getCardinalCategoryName('en', $count) === PluralRules::CATEGORY_ONE) {
echo "Singular form";
}The MessagePatternAnalyzer validates that plural/selectordinal selectors comply with CLDR plural categories for a given locale. It provides per-argument warnings for detailed feedback.
use Matecat\ICU\MessagePattern;
use Matecat\ICU\MessagePatternAnalyzer;
use Matecat\ICU\Plurals\PluralComplianceException;
// Parse an ICU message
$pattern = new MessagePattern();
$pattern->parse('{count, plural, one{# item} other{# items}}');
// Validate plural compliance for English
$analyzer = new MessagePatternAnalyzer($pattern, 'en');
$warning = $analyzer->validatePluralCompliance();
// Returns null when all categories are valid and complete
var_dump($warning); // null - 'one' and 'other' are valid for English
// Check with Russian locale - Russian requires one/few/many/other
$analyzer = new MessagePatternAnalyzer($pattern, 'ru');
$warning = $analyzer->validatePluralCompliance();
// Returns a PluralComplianceWarning with per-argument details
$warning->getMessage(); // Human-readable warning message
$warning->getArgumentWarnings(); // Array of PluralArgumentWarning objects
$warning->getAllMissingCategories(); // ['few', 'many']
$warning->getAllWrongLocaleSelectors(); // []
// Access per-argument warnings
foreach ($warning->getArgumentWarnings() as $argWarning) {
echo $argWarning->argumentName; // 'count'
echo $argWarning->getArgumentTypeLabel(); // 'plural' or 'selectordinal'
print_r($argWarning->expectedCategories); // ['one', 'few', 'many', 'other']
print_r($argWarning->missingCategories); // ['few', 'many']
print_r($argWarning->foundSelectors); // ['one', 'other']
echo $argWarning->getMessage(); // Detailed message for this argument
}
// Invalid CLDR categories throw an exception
$pattern->parse('{count, plural, some{# items} other{# items}}'); // 'some' is not a valid CLDR category
$analyzer = new MessagePatternAnalyzer($pattern, 'en');
try {
$analyzer->validatePluralCompliance();
} catch (PluralComplianceException $e) {
echo $e->getMessage();
// "Invalid selectors found: [some]. Valid CLDR categories are: [zero, one, two, few, many, other]."
print_r($e->invalidSelectors); // ['some']
print_r($e->expectedCategories); // ['zero', 'one', 'two', 'few', 'many', 'other']
}
// Valid CLDR categories wrong for locale return warnings (not exceptions)
$pattern->parse('{count, plural, one{# item} few{# items} other{# items}}');
$analyzer = new MessagePatternAnalyzer($pattern, 'en'); // English doesn't use 'few'
$warning = $analyzer->validatePluralCompliance();
$argWarning = $warning->getArgumentWarnings()[0];
print_r($argWarning->wrongLocaleSelectors); // ['few'] - valid CLDR but not for English
// Explicit numeric selectors (=0, =1, =2) are always valid but don't substitute category keywords
$pattern->parse('{count, plural, =0{none} =1{one item} other{# items}}');
$analyzer = new MessagePatternAnalyzer($pattern, 'en');
$warning = $analyzer->validatePluralCompliance();
$argWarning = $warning->getArgumentWarnings()[0];
print_r($argWarning->numericSelectors); // ['=0', '=1']
print_r($argWarning->missingCategories); // ['one'] - =1 doesn't substitute for 'one' keyword
// Nested messages with multiple plural arguments get per-argument validation
$pattern->parse("{gender, select, female{{n, plural, one{her item} other{her items}}} male{{n, plural, one{his item} other{his items}}}}");
$analyzer = new MessagePatternAnalyzer($pattern, 'en');
$warning = $analyzer->validatePluralCompliance(); // null - all valid
// SelectOrdinal validation uses ordinal rules (different from cardinal)
$pattern->parse('{rank, selectordinal, one{#st} two{#nd} few{#rd} other{#th}}');
$analyzer = new MessagePatternAnalyzer($pattern, 'en');
$warning = $analyzer->validatePluralCompliance(); // null - English ordinal uses one/two/few/other| Selector Type | Behavior |
|---|---|
| Non-existent CLDR category (e.g., 'some') | Throws PluralComplianceException |
| Valid CLDR category wrong for locale (e.g., 'few' in English) | Returns warning in wrongLocaleSelectors |
| Missing required category for locale | Returns warning in missingCategories |
| Explicit numeric selector (=0, =1, etc.) | Always valid, tracked in numericSelectors |
| 'other' category | Always valid (ICU requires it as fallback) |
use Matecat\Locales\LanguageDomains;
// Get all language domains
$domains = LanguageDomains::getInstance();
$allDomains = $domains->getDomains();
// Get a specific domain
$domain = $domains->getDomain('technical');This library focuses on parsing and structure. If you want to format values using parsed plural/select patterns:
- Use PHP's Intl MessageFormatter (intl extension) for end-to-end ICU MessageFormat formatting (it accepts message strings and values).
- Or, implement a custom formatter that walks the AST and applies number/date formatting from ext-intl or other libraries.
__construct(?string $pattern = null, int $apostropheMode = MessagePattern::APOSTROPHE_DOUBLE_OPTIONAL)parse(string $pattern): selfparseChoiceStyle(string $pattern): selfparsePluralStyle(string $pattern): selfparseSelectStyle(string $pattern): selfclear(): voidclearPatternAndSetApostropheMode(int $mode): voidgetApostropheMode(): intgetPatternString(): stringcountParts(): intgetPart(int $index): PartgetPartType(int $index): Parts\TokenTypegetSubstring(Part $part): stringgetNumericValue(Part $part): float|int(returnsMessagePattern::NO_NUMERIC_VALUEwhen not numeric)getPluralOffset(int $argStartIndex): floatvalidateArgumentName(string $name): int(static helper)appendReducedApostrophes(string $s, int $start, int $limit, string &$out): void(static helper)- Implements
Iteratorto iterate parts.
Represents a parsed token/part with accessors:
getType(): Parts\TokenTypegetIndex(): intgetLength(): intgetValue(): mixedgetLimit(): intgetArgType(): ?ArgTypePart::MAX_LENGTHPart::MAX_VALUE
Token types used by the parser: MSG_START, MSG_LIMIT, ARG_START, ARG_NAME, ARG_NUMBER, ARG_INT, ARG_DOUBLE, ARG_TYPE, ARG_STYLE, ARG_SELECTOR, ARG_LIMIT, INSERT_CHAR, REPLACE_NUMBER, SKIP_SYNTAX, etc.
Argument classifications: NONE, SIMPLE, CHOICE, PLURAL, SELECT, SELECTORDINAL.
__construct(MessagePattern $pattern, string $language = 'en-US')containsComplexSyntax(): bool- Returns true if the pattern contains plural, select, choice, or selectordinalvalidatePluralCompliance(): ?PluralComplianceWarning- Validates if plural forms comply with the locale's expected categories. Returns null if valid, a warning object if there are issues, or throwsPluralComplianceExceptionfor invalid CLDR categories.
Returned when plural selectors have compliance issues that don't warrant an exception.
__construct(array $argumentWarnings)getArgumentWarnings(): array<PluralArgumentWarning>- Get all argument-level warningsgetAllMissingCategories(): array<string>- Get all missing categories across all argumentsgetAllWrongLocaleSelectors(): array<string>- Get all wrong locale selectors across all argumentsgetMessages(): array<string>- Get all warning messages as an arraygetMessage(): string- Human-readable warning message (joins all messages with newlines)- Implements
Stringableinterface
Detailed warning information for a single plural/selectordinal argument.
argumentName: string- The argument name (e.g., 'count', 'num_guests')argumentType: ArgType- The argument type (PLURAL or SELECTORDINAL)expectedCategories: array<string>- Valid CLDR categories for this argument type and localefoundSelectors: array<string>- All selectors found in this argumentmissingCategories: array<string>- Expected categories not foundnumericSelectors: array<string>- Explicit numeric selectors found (e.g., =0, =1)wrongLocaleSelectors: array<string>- Valid CLDR categories that don't apply to this localegetArgumentTypeLabel(): string- Returns 'plural' or 'selectordinal'getMessage(): string- Human-readable message for this argument- Implements
Stringableinterface
Thrown when a selector is not a valid CLDR category name (e.g., 'some', 'foo').
expectedCategories: array<string>- Valid CLDR categoriesfoundSelectors: array<string>- All selectors found in the messageinvalidSelectors: array<string>- Non-existent CLDR category namesmissingCategories: array<string>- (Always empty, for interface compatibility)
getInstance(): Languages(singleton)getLanguages(): arraygetLanguage(string $rfc3066code): ?arrayisRTL(string $rfc3066code): boolgetPluralsCount(string $rfc3066code): int(static)
getCardinalFormIndex(string $locale, int $n): int(static) - Returns the plural form index for a numbergetCardinalCategoryName(string $locale, int $n): string(static) - Returns the CLDR category name ('zero', 'one', 'two', 'few', 'many', 'other')getCardinalCategories(string $locale): array(static) - Returns all available cardinal category names for a localegetOrdinalCategories(string $locale): array(static) - Returns all available ordinal category names for a localegetOrdinalFormIndex(string $locale, int $n): int(static) - Returns the ordinal form index for a numberisValidCategory(string $selector): bool(static) - Checks if a selector is a valid CLDR category name
CATEGORY_ZERO= 'zero'CATEGORY_ONE= 'one'CATEGORY_TWO= 'two'CATEGORY_FEW= 'few'CATEGORY_MANY= 'many'CATEGORY_OTHER= 'other'VALID_CATEGORIES= ['zero', 'one', 'two', 'few', 'many', 'other']
getInstance(): LanguageDomains(singleton)getDomains(): arraygetDomain(string $domainKey): ?array
InvalidArgumentExceptionfor syntax errorsOutOfBoundsExceptionfor excessive sizes/nesting and indexing errorsMatecat\Locales\InvalidLanguageExceptionfor invalid language codesMatecat\ICU\Plurals\PluralComplianceExceptionfor invalid CLDR plural category names
vendor/bin/phpunitor
composer testvendor/bin/phpstan analyze- ICU Parser core:
src/ICU/MessagePattern.php - Pattern Analyzer:
src/ICU/MessagePatternAnalyzer.php - Part/token model:
src/ICU/Tokens/Part.php - Token types:
src/ICU/Tokens/TokenType.php - Argument types:
src/ICU/Tokens/ArgType.php - Plural Rules:
src/ICU/Plurals/PluralRules.php - Plural Compliance Warning:
src/ICU/Plurals/PluralComplianceWarning.php - Plural Argument Warning:
src/ICU/Plurals/PluralArgumentWarning.php - Plural Compliance Exception:
src/ICU/Plurals/PluralComplianceException.php - Languages:
src/Locales/Languages.php - Language Domains:
src/Locales/LanguageDomains.php - Tests:
tests/
This project is licensed under the GNU Lesser General Public License v3.0 or later.
- SPDX-License-Identifier:
LGPL-3.0-or-later - See the
LICENSEfile for the full license text. - More info: https://www.gnu.org/licenses/lgpl-3.0.html
This PHP port mirrors ideas from ICU4J MessagePattern.java.
- See
LICENSEfor project license and attribution to ICU4J sources: https://github.com/unicode-org/icu - Plural rules based on CLDR (Unicode Common Locale Data Repository): https://cldr.unicode.org/