Skip to content
Permalink
Browse files

feature #33553 [String] a new component for object-oriented strings m…

…anagement with an abstract unit system (nicolas-grekas, hhamon, gharlan)

This PR was merged into the 5.0-dev branch.

Discussion
----------

[String] a new component for object-oriented strings management with an abstract unit system

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | yes
| Deprecations? | no
| Tickets       | -
| License       | MIT
| Doc PR        | -

This is a reboot of #22184 (thanks @hhamon for working on it) and a generalization of my previous work on the topic ([patchwork/utf8](https://github.com/tchwork/utf8)). Unlike existing libraries (including `patchwork/utf8`), this component provides a unified API for the 3 unit systems of strings: bytes, code points and grapheme clusters.

The unified API is defined by the `AbstractString` class. It has 2 direct child classes: `BinaryString` and `AbstractUnicodeString`, itself extended by `Utf8String` and `GraphemeString`.

All objects are immutable and provide clear edge-case semantics, using exceptions and/or (nullable) types!

Two helper functions are provided to create such strings:
```php
new GraphemeString('foo') == u('foo'); // when dealing with Unicode, prefer grapheme units
new BinaryString('foo') == b('foo');
```

`GraphemeString` is the most linguistic-friendly variant of them, which means it's the one ppl should use most of the time *when dealing with written text*.

Future ideas:
 - improve tests
 - add more docblocks (only where they'd add value!)
 - consider adding more methods in the string API (`is*()?`, `*Encode()`?, etc.)
 - first class Emoji support
 - merge the Inflector component into this one
 - use `width()` to improve `truncate()` and `wordwrap()`
 - move method `slug()` to a dedicated locale-aware service class
 - propose your ideas (send PRs after merge)

Out of (current) scope:
 - what [intl](https://php.net/intl) provides (collations, transliterations, confusables, segmentation, etc)

Here is the unified API I'm proposing in this PR, borrowed from looking at many existing libraries, but also Java, Python, JavaScript and Go.

```php
function __construct(string $string = '');
static function unwrap(array $values): array
static function wrap(array $values): array
function after($needle, bool $includeNeedle = false, int $offset = 0): self;
function afterLast($needle, bool $includeNeedle = false, int $offset = 0): self;
function append(string ...$suffix): self;
function before($needle, bool $includeNeedle = false, int $offset = 0): self;
function beforeLast($needle, bool $includeNeedle = false, int $offset = 0): self;
function camel(): self;
function chunk(int $length = 1): array;
function collapseWhitespace(): self
function endsWith($suffix): bool;
function ensureEnd(string $suffix): self;
function ensureStart(string $prefix): self;
function equalsTo($string): bool;
function folded(): self;
function ignoreCase(): self;
function indexOf($needle, int $offset = 0): ?int;
function indexOfLast($needle, int $offset = 0): ?int;
function isEmpty(): bool;
function join(array $strings): self;
function jsonSerialize(): string;
function length(): int;
function lower(): self;
function match(string $pattern, int $flags = 0, int $offset = 0): array;
function padBoth(int $length, string $padStr = ' '): self;
function padEnd(int $length, string $padStr = ' '): self;
function padStart(int $length, string $padStr = ' '): self;
function prepend(string ...$prefix): self;
function repeat(int $multiplier): self;
function replace(string $from, string $to): self;
function replaceMatches(string $fromPattern, $to): self;
function slice(int $start = 0, int $length = null): self;
function snake(): self;
function splice(string $replacement, int $start = 0, int $length = null): self;
function split(string $delimiter, int $limit = null, int $flags = null): array;
function startsWith($prefix): bool;
function title(bool $allWords = false): self;
function toBinary(string $toEncoding = null): BinaryString;
function toGrapheme(): GraphemeString;
function toUtf8(): Utf8String;
function trim(string $chars = " \t\n\r\0\x0B\x0C\u{A0}\u{FEFF}"): self;
function trimEnd(string $chars = " \t\n\r\0\x0B\x0C\u{A0}\u{FEFF}"): self;
function trimStart(string $chars = " \t\n\r\0\x0B\x0C\u{A0}\u{FEFF}"): self;
function truncate(int $length, string $ellipsis = ''): self;
function upper(): self;
function width(bool $ignoreAnsiDecoration = true): int;
function wordwrap(int $width = 75, string $break = "\n", bool $cut = false): self;
function __clone();
function __toString(): string;
```

`AbstractUnicodeString` adds these:
```php
static function fromCodePoints(int ...$codes): self;
function ascii(array $rules = []): self;
function codePoint(int $index = 0): ?int;
function folded(bool $compat = true): parent;
function normalize(int $form = self::NFC): self;
function slug(string $separator = '-'): self;
```

and `BinaryString`:
```php
static function fromRandom(int $length = 16): self;
function byteCode(int $index = 0): ?int;
function isUtf8(): bool;
function toUtf8(string $fromEncoding = null): Utf8String;
function toGrapheme(string $fromEncoding = null): GraphemeString;
```

Case insensitive operations are done with the `ignoreCase()` method.
e.g. `b('abc')->ignoreCase()->indexOf('B')` will return `1`.

For reference, CLDR transliterations (used in the `ascii()` method) are defined here:
https://github.com/unicode-org/cldr/tree/master/common/transforms

Commits
-------

dd8745a [String] add more tests
82a0095 [String] add tests
012e92a [String] a new component for object-oriented strings management with an abstract unit system
  • Loading branch information...
fabpot committed Sep 26, 2019
2 parents 77e42d8 + dd8745a commit 5d154fb304486ce51ab47d0e62e6690ce9c3e05d
@@ -270,7 +270,7 @@ install:
(cd src/Symfony/Component/HttpFoundation; mv composer.bak composer.json)
COMPONENTS=$(git diff --name-only src/ | grep composer.json || true)
if [[ $COMPONENTS && $LEGACY && $TRAVIS_PULL_REQUEST != false ]]; then
if [[ $COMPONENTS && $LEGACY && $TRAVIS_BRANCH != master && $TRAVIS_PULL_REQUEST != false ]]; then
export FLIP='🙃'
SYMFONY_VERSION=$(echo $SYMFONY_VERSION | awk '{print $1 - 1}')
echo -e "\\n\\e[33;1mChecking out Symfony $SYMFONY_VERSION and running tests with patched components as deps\\e[0m"
@@ -27,8 +27,10 @@
"psr/log": "~1.0",
"symfony/contracts": "^1.1.7|^2",
"symfony/polyfill-ctype": "~1.8",
"symfony/polyfill-intl-grapheme": "~1.0",
"symfony/polyfill-intl-icu": "~1.0",
"symfony/polyfill-intl-idn": "^1.10",
"symfony/polyfill-intl-normalizer": "~1.0",
"symfony/polyfill-mbstring": "~1.0",
"symfony/polyfill-php73": "^1.11"
},
@@ -82,6 +84,7 @@
"symfony/sendgrid-mailer": "self.version",
"symfony/serializer": "self.version",
"symfony/stopwatch": "self.version",
"symfony/string": "self.version",
"symfony/templating": "self.version",
"symfony/translation": "self.version",
"symfony/twig-bridge": "self.version",
@@ -144,7 +147,10 @@
]
},
"autoload-dev": {
"files": [ "src/Symfony/Component/VarDumper/Resources/functions/dump.php" ]
"files": [
"src/Symfony/Component/String/Resources/functions.php",
"src/Symfony/Component/VarDumper/Resources/functions/dump.php"
]
},
"repositories": [
{
@@ -0,0 +1,2 @@
/Tests export-ignore
/phpunit.xml.dist export-ignore
@@ -0,0 +1,3 @@
composer.lock
phpunit.xml
vendor/

0 comments on commit 5d154fb

Please sign in to comment.
You can’t perform that action at this time.