Fix/clarify dirname/basename docs wrt. locales
For basename(), we declare the behavior for paths containing invalid
characters as undefined, since it depends on the availability of mblen()
and, prior to PHP 8.0.0, also on the position of the invalid
characters[1].
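The following is a minimal Python sketch showing why basename() needs to know the character encoding. The helpers `naive_basename` and `decoded_basename` are hypothetical and are not PHP's C implementation, which uses mblen(). This sketch decodes with a named Python codec in place of mblen(). It assumes Windows-style separators, where both `/` and `\` count, and ignores trailing separators and suffix stripping. In Shift_JIS, "表" is encoded as 0x95 0x5C, and 0x5C is the backslash byte, so a byte-wise scan splits the character:

```python
# Simplified sketch, not PHP's implementation: treats both "/" and "\\" as
# separators (assumed Windows behavior) and ignores trailing separators.
SEPARATORS = ("/", "\\")


def naive_basename(path: bytes) -> bytes:
    """Cut after the last separator byte, without knowing the encoding."""
    idx = max(path.rfind(b"/"), path.rfind(b"\\"))
    return path[idx + 1:]


def decoded_basename(path: bytes, encoding: str) -> bytes:
    """Decode first, so separator bytes inside multibyte characters are skipped.

    Raises UnicodeDecodeError if path is invalid in the given encoding,
    which is roughly the case the documentation declares undefined.
    """
    text = path.decode(encoding)
    idx = max(text.rfind(sep) for sep in SEPARATORS)
    return text[idx + 1:].encode(encoding)


if __name__ == "__main__":
    # In Shift_JIS, "表" (U+8868) encodes as 0x95 0x5C; 0x5C is the backslash byte.
    path = "表.txt".encode("shift_jis")
    print(naive_basename(path))                 # b'.txt': the character was split
    print(decoded_basename(path, "shift_jis"))  # the full file name, unchanged
```

Decoding with a codec that matches the path's actual encoding plays the role that setting the matching locale plays for PHP's basename().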

dirname() is actually not locale-aware, but relies on an ASCII-compatible
character encoding as far as the directory separator is concerned.  On
Windows, however, it depends on the currently set codepage (although a
fallback to the operating system's Windows ANSI codepage[2] is still in
place if the string is not valid for the current codepage).
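To illustrate the ASCII-compatibility assumption on non-Windows systems, here is a minimal Python sketch of a byte-wise dirname. `naive_dirname` is a hypothetical helper, not PHP's implementation, and it does not handle trailing or repeated slashes. In UTF-8, every byte of a multibyte sequence is at least 0x80, so a 0x2F ("/") byte can only be a real separator. UTF-16-LE is not ASCII compatible: "į" (U+012F) encodes as 0x2F 0x01, so a byte-wise scan finds a separator that is not there:

```python
def naive_dirname(path: bytes) -> bytes:
    """Byte-wise dirname: cut at the last b"/" byte, whatever the encoding.

    Simplified sketch: trailing and repeated slashes are not handled.
    """
    idx = path.rfind(b"/")
    if idx == -1:
        return b"."  # no separator: the path is a bare file name
    if idx == 0:
        return b"/"  # the only separator is the leading root slash
    return path[:idx]


if __name__ == "__main__":
    name = "\u012f"  # "į", a bare file name with no directory part
    # UTF-8 is ASCII compatible: no 0x2F byte occurs inside a character.
    print(naive_dirname(name.encode("utf-8")))        # b'.'
    # UTF-16-LE is not: U+012F encodes as 0x2F 0x01, and 0x2F is "/".
    print(naive_dirname(name.encode("utf-16-le")))    # b'/'
    print(naive_dirname("dir/file".encode("utf-8")))  # b'dir'
```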

Again, we declare that failing to comply with these assumptions results
in undefined behavior.  Users should make sure to pass valid strings.

[1] <http://git.php.net/?p=php-src.git;a=commitdiff;h=90705d44e3da1d0aa7b8b4fd921ec597391eccb2>
[2] <https://github.com/php/php-src/blob/5e015425263c28d40fd49ee386135f02d0e76975/win32/codepage.h#L95-L106>
cmb69 committed Feb 19, 2021
1 parent 871df69 commit 88c1f8d
Showing 2 changed files with 12 additions and 3 deletions.
2 changes: 2 additions & 0 deletions reference/filesystem/functions/basename.xml
@@ -29,6 +29,8 @@
<function>basename</function> is locale aware, so for it to see the
correct basename with multibyte character paths, the matching locale must
be set using the <function>setlocale</function> function.
+If <parameter>path</parameter> contains characters which are invalid for the
+current locale, the behavior of <function>basename</function> is undefined.
</para>
</caution>
</refsect1>
13 changes: 10 additions & 3 deletions reference/filesystem/functions/dirname.xml
@@ -27,9 +27,16 @@
</note>
<caution>
<para>
-<function>dirname</function> is locale aware, so for it to see the
-correct directory name with multibyte character paths, the matching locale must
-be set using the <function>setlocale</function> function.
+On Windows, <function>dirname</function> assumes the currently set codepage, so for it to see the
+correct directory name with multibyte character paths, the matching codepage must
+be set.
+If <parameter>path</parameter> contains characters which are invalid for the
+current codepage, the behavior of <function>dirname</function> is undefined.
+</para>
+<para>
+On other systems, <function>dirname</function> assumes <parameter>path</parameter>
+to be encoded in an ASCII compatible encoding. Otherwise the behavior of
+the function is undefined.
</para>
</caution>
</refsect1>