Skip to content
Browse files

feature #33553 [String] a new component for object-oriented strings m…

…anagement with an abstract unit system (nicolas-grekas, hhamon, gharlan)

This PR was merged into the 5.0-dev branch.


[String] a new component for object-oriented strings management with an abstract unit system

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | yes
| Deprecations? | no
| Tickets       | -
| License       | MIT
| Doc PR        | -

This is a reboot of #22184 (thanks @hhamon for working on it) and a generalization of my previous work on the topic ([patchwork/utf8]( Unlike existing libraries (including `patchwork/utf8`), this component provides a unified API for the 3 unit systems of strings: bytes, code points and grapheme clusters.

The unified API is defined by the `AbstractString` class. It has 2 direct child classes: `BinaryString` and `AbstractUnicodeString`, itself extended by `Utf8String` and `GraphemeString`.

All objects are immutable and provide clear edge-case semantics, using exceptions and/or (nullable) types!

Two helper functions are provided to create such strings:
new GraphemeString('foo') == u('foo'); // when dealing with Unicode, prefer grapheme units
new BinaryString('foo') == b('foo');

`GraphemeString` is the most linguistic-friendly variant of them, which means it's the one ppl should use most of the time *when dealing with written text*.

Future ideas:
 - improve tests
 - add more docblocks (only where they'd add value!)
 - consider adding more methods in the string API (`is*()?`, `*Encode()`?, etc.)
 - first class Emoji support
 - merge the Inflector component into this one
 - use `width()` to improve `truncate()` and `wordwrap()`
 - move method `slug()` to a dedicated locale-aware service class
 - propose your ideas (send PRs after merge)

Out of (current) scope:
 - what [intl]( provides (collations, transliterations, confusables, segmentation, etc)

Here is the unified API I'm proposing in this PR, borrowed from looking at many existing libraries, but also Java, Python, JavaScript and Go.

function __construct(string $string = '');
static function unwrap(array $values): array
static function wrap(array $values): array
function after($needle, bool $includeNeedle = false, int $offset = 0): self;
function afterLast($needle, bool $includeNeedle = false, int $offset = 0): self;
function append(string ...$suffix): self;
function before($needle, bool $includeNeedle = false, int $offset = 0): self;
function beforeLast($needle, bool $includeNeedle = false, int $offset = 0): self;
function camel(): self;
function chunk(int $length = 1): array;
function collapseWhitespace(): self
function endsWith($suffix): bool;
function ensureEnd(string $suffix): self;
function ensureStart(string $prefix): self;
function equalsTo($string): bool;
function folded(): self;
function ignoreCase(): self;
function indexOf($needle, int $offset = 0): ?int;
function indexOfLast($needle, int $offset = 0): ?int;
function isEmpty(): bool;
function join(array $strings): self;
function jsonSerialize(): string;
function length(): int;
function lower(): self;
function match(string $pattern, int $flags = 0, int $offset = 0): array;
function padBoth(int $length, string $padStr = ' '): self;
function padEnd(int $length, string $padStr = ' '): self;
function padStart(int $length, string $padStr = ' '): self;
function prepend(string ...$prefix): self;
function repeat(int $multiplier): self;
function replace(string $from, string $to): self;
function replaceMatches(string $fromPattern, $to): self;
function slice(int $start = 0, int $length = null): self;
function snake(): self;
function splice(string $replacement, int $start = 0, int $length = null): self;
function split(string $delimiter, int $limit = null, int $flags = null): array;
function startsWith($prefix): bool;
function title(bool $allWords = false): self;
function toBinary(string $toEncoding = null): BinaryString;
function toGrapheme(): GraphemeString;
function toUtf8(): Utf8String;
function trim(string $chars = " \t\n\r\0\x0B\x0C\u{A0}\u{FEFF}"): self;
function trimEnd(string $chars = " \t\n\r\0\x0B\x0C\u{A0}\u{FEFF}"): self;
function trimStart(string $chars = " \t\n\r\0\x0B\x0C\u{A0}\u{FEFF}"): self;
function truncate(int $length, string $ellipsis = ''): self;
function upper(): self;
function width(bool $ignoreAnsiDecoration = true): int;
function wordwrap(int $width = 75, string $break = "\n", bool $cut = false): self;
function __clone();
function __toString(): string;

`AbstractUnicodeString` adds these:
static function fromCodePoints(int ...$codes): self;
function ascii(array $rules = []): self;
function codePoint(int $index = 0): ?int;
function folded(bool $compat = true): parent;
function normalize(int $form = self::NFC): self;
function slug(string $separator = '-'): self;

and `BinaryString`:
static function fromRandom(int $length = 16): self;
function byteCode(int $index = 0): ?int;
function isUtf8(): bool;
function toUtf8(string $fromEncoding = null): Utf8String;
function toGrapheme(string $fromEncoding = null): GraphemeString;

Case insensitive operations are done with the `ignoreCase()` method.
e.g. `b('abc')->ignoreCase()->indexOf('B')` will return `1`.

For reference, CLDR transliterations (used in the `ascii()` method) are defined here:


dd8745a [String] add more tests
82a0095 [String] add tests
012e92a [String] a new component for object-oriented strings management with an abstract unit system
  • Loading branch information...
fabpot committed Sep 26, 2019
2 parents 77e42d8 + dd8745a commit 5d154fb304486ce51ab47d0e62e6690ce9c3e05d
@@ -270,7 +270,7 @@ install:
(cd src/Symfony/Component/HttpFoundation; mv composer.bak composer.json)
COMPONENTS=$(git diff --name-only src/ | grep composer.json || true)
if [[ $COMPONENTS && $LEGACY && $TRAVIS_PULL_REQUEST != false ]]; then
if [[ $COMPONENTS && $LEGACY && $TRAVIS_BRANCH != master && $TRAVIS_PULL_REQUEST != false ]]; then
export FLIP='🙃'
SYMFONY_VERSION=$(echo $SYMFONY_VERSION | awk '{print $1 - 1}')
echo -e "\\n\\e[33;1mChecking out Symfony $SYMFONY_VERSION and running tests with patched components as deps\\e[0m"
@@ -27,8 +27,10 @@
"psr/log": "~1.0",
"symfony/contracts": "^1.1.7|^2",
"symfony/polyfill-ctype": "~1.8",
"symfony/polyfill-intl-grapheme": "~1.0",
"symfony/polyfill-intl-icu": "~1.0",
"symfony/polyfill-intl-idn": "^1.10",
"symfony/polyfill-intl-normalizer": "~1.0",
"symfony/polyfill-mbstring": "~1.0",
"symfony/polyfill-php73": "^1.11"
@@ -82,6 +84,7 @@
"symfony/sendgrid-mailer": "self.version",
"symfony/serializer": "self.version",
"symfony/stopwatch": "self.version",
"symfony/string": "self.version",
"symfony/templating": "self.version",
"symfony/translation": "self.version",
"symfony/twig-bridge": "self.version",
@@ -144,7 +147,10 @@
"autoload-dev": {
"files": [ "src/Symfony/Component/VarDumper/Resources/functions/dump.php" ]
"files": [
"repositories": [
@@ -0,0 +1,2 @@
/Tests export-ignore
/phpunit.xml.dist export-ignore
@@ -0,0 +1,3 @@

0 comments on commit 5d154fb

Please sign in to comment.
You can’t perform that action at this time.