Wrong length of surrogate Unicode codepoints #12556

@mlocati

Description

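The following code calls mb_strlen() on an ordinary character and on the first and last code point of each surrogate range: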

<?php

echo 'SUN BEHIND CLOUD: ', mb_strlen("\u{26C5}"), "\n";

echo 'Non Private Use High Surrogate, First: ', mb_strlen("\u{D800}"), "\n";
echo 'Non Private Use High Surrogate, Last: ', mb_strlen("\u{DB7F}"), "\n";

echo 'Private Use High Surrogate, First: ', mb_strlen("\u{DB80}"), "\n";
echo 'Private Use High Surrogate, Last: ', mb_strlen("\u{DBFF}"), "\n";

echo 'Low Surrogate, First: ', mb_strlen("\u{DC00}"), "\n";
echo 'Low Surrogate, Last: ', mb_strlen("\u{DFFF}"), "\n";

In PHP 8.1 and 8.2 the output is:

SUN BEHIND CLOUD: 1
Non Private Use High Surrogate, First: 1
Non Private Use High Surrogate, Last: 1
Private Use High Surrogate, First: 1
Private Use High Surrogate, Last: 1
Low Surrogate, First: 1
Low Surrogate, Last: 1

In PHP 8.3 the output is instead:

SUN BEHIND CLOUD: 1
Non Private Use High Surrogate, First: 3
Non Private Use High Surrogate, Last: 3
Private Use High Surrogate, First: 3
Private Use High Surrogate, Last: 3
Low Surrogate, First: 3
Low Surrogate, Last: 3
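
To help narrow this down, here is a minimal sketch that looks at the raw bytes behind the escapes. The helper name describe() is hypothetical and only used for illustration. Surrogate code points have no valid UTF-8 encoding, so "\u{D800}" ends up as a three-byte sequence that mbstring reports as invalid UTF-8. A result of 3 would be consistent with each of those bytes being counted separately, but that is an assumption on my part, not something confirmed here.

```php
<?php

// Hypothetical helper: summarize the raw bytes of a string and whether
// mbstring considers them valid UTF-8.
function describe(string $s): array
{
    return [
        'bytes'      => strlen($s),                    // raw byte count
        'hex'        => bin2hex($s),                   // byte values in hex
        'valid_utf8' => mb_check_encoding($s, 'UTF-8') // mbstring's validity check
    ];
}

$samples = [
    'SUN BEHIND CLOUD'      => "\u{26C5}",
    'High Surrogate, First' => "\u{D800}",
    'Low Surrogate, Last'   => "\u{DFFF}",
];

foreach ($samples as $label => $s) {
    $d = describe($s);
    printf(
        "%s: %d bytes (%s), valid UTF-8: %s\n",
        $label,
        $d['bytes'],
        $d['hex'],
        $d['valid_utf8'] ? 'yes' : 'no'
    );
}
```

All three strings are three bytes long, but only the first passes mb_check_encoding(), so the surrogate cases are really a question of how mb_strlen() counts invalid input.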

PHP Version

8.3.0-dev (81e236c)

Operating System

Ubuntu 22.04.3
