Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix setlocale() with magic locale strings #8834

Closed
wants to merge 1 commit into from

Conversation

fredemmott
Copy link
Contributor

@fredemmott fredemmott commented May 5, 2021

There are two write-only magic locale C strings for setlocale

  • "", which means "set the locale based on the environment"
  • nullptr (represented in PHP as the string "0"), which means
    "return the current locale"

On a fresh request, setlocale(LC_ALL, "0") previously returned "",
but

  • the initial locale is "C", not inherited from the environment (this matches
    C and Python, but PHP inherits from the environment by default)
  • "" should never be returned as a locale

Additionally, setting to "" did set the locale to be inherited from the
environment, but calling setlocale(LC_ALL, "") would return the empty string
(indistinguishable from the initial state) rather than the locale.

This diff does not change the default locale or how setting works, but it does
correct the indicated locale, which was previously incorrect in these cases.

Future work will probably include

  • adding a read-only getlocale()
  • adding a read-only environmentlocale() or similar to make it possible to fetch
    it without changing the state
  • add newlocale to allow using a specific locale without creating it

Test Plan:

$ locale
LANG=
LANGUAGE=
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=en_GB.UTF-8

debugger

$ hphp/hhvm/hhvm -m debug
hphpd> =setlocale(LC_ALL, "0")
"C"
hphpd> =setlocale(LC_ALL, "")
"en_GB.UTF-8"
hphpd> =setlocale(LC_ALL, "0")
"en_GB.UTF-8"
hphpd> =setlocale(LC_CTYPE, "fr_FR")
"fr_FR"
hphpd> =setlocale(LC_ALL, "0")
"LC_CTYPE=fr_FR;LC_NUMERIC=en_GB.UTF-8;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=en_GB.UTF-8;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=en_GB.UTF-8;LC_ADDRESS=en_GB.UTF-8;LC_TELEPHONE=en_GB.UTF-8;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=en_GB.UTF-8"
hphpd> =setlocale(LC_CTYPE, "")
"en_GB.UTF-8"
hphpd> =setlocale(LC_ALL, "0")
"en_GB.UTF-8"

For multiple requests

$ cat test.hack
<<__EntryPoint>>
function main(): void {
  var_dump(setlocale(LC_ALL, "0"));
  var_dump(setlocale(LC_ALL, ""));
  var_dump(setlocale(LC_ALL, "fr_FR"));
  var_dump(setlocale(LC_ALL, "0"));
}
$ hphp/hhvm/hhvm -m server -p 8080

and in another terminal...

$ curl http://localhost:8080/test.hack
string(1) "C"
string(11) "en_GB.UTF-8"
string(5) "fr_FR"
string(5) "fr_FR"
$ siege -c 50 -r 100 http://localhost:8080/test.hack # 5000 total requests with concurrency 50
...
$ curl http://localhost:8080/test.hack # Unchanged results
string(1) "C"
string(11) "en_GB.UTF-8"
string(5) "fr_FR"
string(5) "fr_FR"

There are two write-only magic locale C strings for `setlocale`
- `""`, which means "set the locale based on the environment"
- nullptr (represented in PHP as the string `"0"`), which means
  "return the current locale"

On a fresh request, `setlocale(LC_ALL, "0")` previously returned `""`,
but
- the initial locale is `"C"`, not inherited from the environment (this matches
  C and Python, but PHP inherits from the environment by default)
- `""` should never be *returned* as a locale

Additionally, setting to `""` did set the locale to be inherited from the
environment, but calling `setlocale(LC_ALL, "")` would return the empty string
(indistinguishable from the initial state) rather than the locale.

This diff does not change the default locale or how setting works, but it does
correct the *indicated* locale, which was previously incorrect in these cases.

Future work will probably include
- adding a read-only getlocale()
- adding a read-only environmentlocale() or similar to make it possible to fetch
  it without changing the state
- add newlocale to allow using a specific locale without creating it

Test Plan:

```
$ locale
LANG=
LANGUAGE=
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=en_GB.UTF-8
```

debugger

```
$ hphp/hhvm/hhvm -m debug
hphpd> =setlocale(LC_ALL, "0")
"C"
hphpd> =setlocale(LC_ALL, "")
"en_GB.UTF-8"
hphpd> =setlocale(LC_ALL, "0")
"en_GB.UTF-8"
hphpd> =setlocale(LC_CTYPE, "fr_FR")
"fr_FR"
hphpd> =setlocale(LC_ALL, "0")
"LC_CTYPE=fr_FR;LC_NUMERIC=en_GB.UTF-8;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=en_GB.UTF-8;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=en_GB.UTF-8;LC_ADDRESS=en_GB.UTF-8;LC_TELEPHONE=en_GB.UTF-8;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=en_GB.UTF-8"
hphpd> =setlocale(LC_CTYPE, "")
"en_GB.UTF-8"
hphpd> =setlocale(LC_ALL, "0")
"en_GB.UTF-8"
```

For multiple requests

```
$ cat test.hack
<<__EntryPoint>>
function main(): void {
  var_dump(setlocale(LC_ALL, "0"));
  var_dump(setlocale(LC_ALL, ""));
  var_dump(setlocale(LC_ALL, "fr_FR"));
  var_dump(setlocale(LC_ALL, "0"));
}
$ hphp/hhvm/hhvm -m server -p 8080
```

and in another terminal...

```
$ curl http://localhost:8080/test.hack
string(1) "C"
string(11) "en_GB.UTF-8"
string(5) "fr_FR"
string(5) "fr_FR"
$ siege -c 50 -r 100 http://localhost:8080/test.hack # 5000 total requests with concurrency 50
...
$ curl http://localhost:8080/test.hack # Unchanged results
string(1) "C"
string(11) "en_GB.UTF-8"
string(5) "fr_FR"
string(5) "fr_FR"
```
@fredemmott fredemmott changed the title setlocale get Fix setlocale() with magic locale strings May 5, 2021
@facebook-github-bot
Copy link
Contributor

@fredemmott has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 1734c1e.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants