unit 1.29 unexpected returns getfilesystemencoding is ascii #817

Closed
darkbarker opened this issue Dec 22, 2022 · 19 comments

@darkbarker

Debian 11, all locale settings are *.utf-8

system (as before):

$ python
Python 3.9.2 (default, Feb 28 2021, 17:03:44) 
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getfilesystemencoding()
'utf-8'

But in an application (Django) launched on unit + unit-python3.9, sys.getfilesystemencoding() returns 'ascii' after the update from 1.28 to 1.29 (with the familiar ancient spell "UnicodeEncodeError: 'ascii' codec can't encode characters in position" etc.).

Maybe it's related to this?

Feature: support per-application cgroups on Linux.

(I found nothing more suitable.)

After rolling back the update, everything works.
I've read the docs and the issues; maybe I missed something?
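
For readers unfamiliar with the error quoted above, here is a minimal sketch of how it arises in plain Python, independent of unit. The helper name try_encode and the sample string are made up for this illustration; the point is only that the same text encodes fine as UTF-8 and fails as ASCII.

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or the error message if encoding fails."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)


# Hypothetical sample: six Cyrillic letters, none of them representable in ASCII.
sample = "\u043f\u0440\u0438\u0432\u0435\u0442"

print(try_encode(sample, "utf-8"))  # bytes
print(try_encode(sample, "ascii"))  # the UnicodeEncodeError message
```

When the interpreter believes the filesystem encoding is ASCII, file names and paths containing such characters hit the same failure.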

@tippexs
Contributor

tippexs commented Dec 22, 2022

Hi @darkbarker We are currently investigating an issue that was introduced with 1.29 as we added support for Python 3.11. See the thread on our mailing list for more details.

https://mailman.nginx.org/pipermail/unit/2022-December/GMDLF7ZJEPRNZWXVEQ3G6NKBKLGKL2PC.html

We will share a fix for this as soon as possible.

@ac000
Member

ac000 commented Dec 22, 2022

@darkbarker

Please try reverting 491d0f7

$ git revert -n 491d0f70

Ignore the warnings...

@ac000
Member

ac000 commented Dec 22, 2022

Using this simple test

{
    "listeners": {
        "[::1]:8080": {
            "pass": "applications/gh817"
        }
    },

    "applications": {
        "gh817": {
            "type": "python",
            "path": "/home/andrew/src/python/",
            "module": "gh817"
        }
    }
}

$ cat ~/src/python/gh817.py
import sys

def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    # WSGI expects an iterable of bytes objects, so wrap the body in a list.
    return [sys.getfilesystemencoding().encode('utf-8') + b'\n']

Produces

$ curl http://localhost:8080/
utf-8

That was with current unit master on Fedora 37 (Python 3.11)

Are you able to confirm if that works or fails for you? Or if you have a minimal reproducer?
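
To separate "what the standalone interpreter reports" from "what unit's embedded interpreter reports", the same kind of WSGI callable can also be invoked directly, without unit. This is only a sketch: call_app is a hypothetical helper, and the value it returns reflects the interpreter running the script, not unit.

```python
import sys
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [sys.getfilesystemencoding().encode("utf-8") + b"\n"]


def call_app(app):
    """Call a WSGI app once with a synthetic request; return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    print(call_app(application))
```

If this prints utf-8 while the same module served by unit prints ascii, the difference comes from how the interpreter is initialised rather than from the application code.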

@ac000
Member

ac000 commented Dec 22, 2022

The above works for me on Debian 11

There must be something else tickling this bug.

@thresheek
Member

Hi @darkbarker does that happen with unit.nginx.org-provided debian 11 packages?

@darkbarker
Author

@thresheek

does that happen with unit.nginx.org-provided debian 11 packages?

yes, installed from packages.nginx.org, with apt (as the documentation says).
nothing else is done (with the package, with binaries, with the systemd-unit, with any configs-in-package)

/etc/apt/sources.list.d/unit.list
deb https://packages.nginx.org/unit/debian/ bullseye unit
deb-src https://packages.nginx.org/unit/debian/ bullseye unit

@darkbarker
Author

@ac000 Building unit from source is not so quick for me 😅 I'll try a little later (if it isn't fixed by then).

@ac000
Member

ac000 commented Dec 23, 2022

@darkbarker Are you able to test the simple program above?

@ac000
Member

ac000 commented Dec 23, 2022

So far the only way I've been able to reproduce this is to run unit under a locale of say, 'C'

$ export LC_ALL=C
$ locale
LANG=en_GB.UTF-8
LANGUAGE=en_GB:en
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=C
$ /tmp/unit/sbin/unitd --no-daemon
2022/12/23 21:04:50 [warn] 10858#10858 Unit is running unprivileged, then it cannot use arbitrary user and group.
2022/12/23 21:04:50 [info] 10858#10858 unit 1.30.0 started
$ curl http://localhost:8080/
ascii

Put it back

LC_ALL=en_GB.UTF-8
$ /tmp/unit/sbin/unitd --no-daemon
2022/12/23 21:11:12 [warn] 10896#10896 Unit is running unprivileged, then it cannot use arbitrary user and group.
2022/12/23 21:11:12 [info] 10896#10896 unit 1.30.0 started
$ curl http://localhost:8080/
utf-8
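
The locale dependence shown above can also be observed with a standalone interpreter. The sketch below starts a child Python with a modified environment and returns the filesystem encoding it reports; child_fs_encoding is a hypothetical helper. Note that a standalone Python 3.7+ normally coerces the C locale or switches to UTF-8 mode on its own, so the example switches both off with PYTHONCOERCECLOCALE=0 and PYTHONUTF8=0. With those set, a Linux system is likely to report ascii for LC_ALL=C, but that depends on the platform.

```python
import os
import subprocess
import sys


def child_fs_encoding(**env_overrides):
    """Run a fresh interpreter with extra environment variables and
    return the filesystem encoding it reports."""
    env = dict(os.environ)
    env.update(env_overrides)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.getfilesystemencoding())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # C locale, with Python's own C-locale workarounds disabled.
    print(child_fs_encoding(LC_ALL="C", PYTHONCOERCECLOCALE="0", PYTHONUTF8="0"))
    # Same locale, but with UTF-8 mode explicitly enabled.
    print(child_fs_encoding(LC_ALL="C", PYTHONUTF8="1"))
```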

@darkbarker
Author

On a simple application (similar to the one above), the error could not be reproduced. With Django it reproduces again. I need some time...

@darkbarker
Author

{
    "applications": {
        "app_test_encoding": {
            "type": "python 3",
            "path": "/home/slb/app_test_encoding/",
            "module": "app_test_encoding"
        }
    },

    "listeners": {
        "127.0.0.1:8666": {
            "pass": "applications/app_test_encoding"
        }
    }
}

->

$ curl http://localhost:8666/
getfilesystemencoding: utf-8
getdefaultencoding: utf-8
getdefaultlocale: ('en_US', 'UTF-8')

But with "home" added:

{
    "applications": {
        "app_test_encoding": {
            "type": "python 3",
            "path": "/home/slb/app_test_encoding/",
            "module": "app_test_encoding",
            "home": "/home/slb/app_test_encoding/env"
        }
    },

    "listeners": {
        "127.0.0.1:8666": {
            "pass": "applications/app_test_encoding"
        }
    }
}

->

$ curl http://localhost:8666/
getfilesystemencoding: ascii
getdefaultencoding: utf-8
getdefaultlocale: ('en_US', 'UTF-8')

/home/slb/app_test_encoding/env is a pure vanilla virtualenv

***:~/app_test_encoding$ virtualenv ./env
created virtual environment CPython3.9.2.final.0-64 in 200ms
  creator CPython3Posix(dest=/home/slb/app_test_encoding/env, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/slb/.local/share/virtualenv)
    added seed packages: pip==20.3.4, pkg_resources==0.0.0, setuptools==44.1.1, wheel==0.34.2
  activators BashActivator,CShellActivator,FishActivator,PowerShellActivator,PythonActivator,XonshActivator

@darkbarker
Author

darkbarker commented Dec 27, 2022

$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

import sys
import locale

def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    body = "getfilesystemencoding: %s\ngetdefaultencoding: %s\ngetdefaultlocale: %s\n" % (
        sys.getfilesystemencoding(), sys.getdefaultencoding(), locale.getdefaultlocale())
    # WSGI expects an iterable of bytes objects, so wrap the body in a list.
    return [body.encode('utf-8')]
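
A slightly extended variant of this diagnostic app may help when comparing configurations with and without "home". This is a sketch, not something posted in the thread: collect_encoding_info is a hypothetical helper, and the extra fields (UTF-8 mode flag, isolated flag, stdout encoding, prefix) are simply the ones that turn out to matter later in this discussion.

```python
import locale
import sys


def collect_encoding_info():
    """Gather the encoding-related settings of the running interpreter."""
    return {
        "getfilesystemencoding": sys.getfilesystemencoding(),
        "getfilesystemencodeerrors": sys.getfilesystemencodeerrors(),
        "getdefaultencoding": sys.getdefaultencoding(),
        "getpreferredencoding": locale.getpreferredencoding(False),
        # sys.stdout can be None in an embedded interpreter, hence getattr.
        "stdout.encoding": getattr(sys.stdout, "encoding", None),
        "flags.utf8_mode": sys.flags.utf8_mode,
        "flags.isolated": sys.flags.isolated,
        "prefix": sys.prefix,
    }


def application(environ, start_response):
    lines = ["%s: %s\n" % item for item in collect_encoding_info().items()]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return ["".join(lines).encode("utf-8")]
```

Fetching this page once with "home" set and once without it shows which settings actually differ between the two runs.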

@ac000
Member

ac000 commented Dec 28, 2022

So it seems virtualenv is what tickles this particular bug. I don't know much about it, but this looks like a crucial piece of information. Thanks!

@darkbarker
Author

Inside the virtualenv, the encodings are correct too:

***:~$ which python
/usr/bin/python

***:~$ python
Python 3.9.2 (default, Feb 28 2021, 17:03:44) 
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getfilesystemencoding()
'utf-8'
>>> quit()

***:~$ . ~/app_test_encoding/env/bin/activate

(env) ***:~$ which python
/home/slb/app_test_encoding/env/bin/python

(env) ***:~$ python
Python 3.9.2 (default, Feb 28 2021, 17:03:44) 
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getfilesystemencoding()
'utf-8'
>>> quit()

@ac000
Member

ac000 commented Dec 30, 2022

So for Python 3.8+ when you have the home option set we now go through the PyConfig API.

When this happens, config.filesystem_encoding defaults to NULL unless it is set explicitly (which we don't do), and that seems to be the cause of the issue.

There's a simple patch you could try to verify this...

diff --git a/src/python/nxt_python.c b/src/python/nxt_python.c
index bdb04579..0f9b373e 100644
--- a/src/python/nxt_python.c
+++ b/src/python/nxt_python.c
@@ -94,6 +94,8 @@ nxt_python3_init_config(nxt_int_t pep405)
         }
     }
 
+    PyConfig_SetString(&config, &config.filesystem_encoding, L"utf-8");
+
     status = Py_InitializeFromConfig(&config);
     if (PyStatus_Exception(status)) {
         goto pyinit_exception;

When we don't go this route things seem to work out as before.

Still some work to be done...

@ac000
Member

ac000 commented Jan 5, 2023

So while the above changes the output of getfilesystemencoding it doesn't seem to actually solve the underlying issue.
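
One general Python behaviour that may be relevant here, offered as an illustration and not as a confirmed diagnosis of what unit's embedded interpreter did: a text stream encodes with its own encoding, which is configured separately from the value returned by sys.getfilesystemencoding(). The sketch below writes the same character to an in-memory stream under two encodings; write_to_stream is a hypothetical helper.

```python
import io


def write_to_stream(text, encoding):
    """Write text to an in-memory text stream that uses the given encoding.

    Returns the resulting bytes, or an error string if the text cannot be
    encoded.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    try:
        stream.write(text)
        stream.flush()
    except UnicodeEncodeError as exc:
        return "error: %s" % exc
    return raw.getvalue()


print(write_to_stream("\U0001f9ec", "utf-8"))
print(write_to_stream("\U0001f9ec", "ascii"))
```

A print() to a stream whose encoding is still ASCII fails in the same way, regardless of what the filesystem encoding has been set to.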

@ac000
Member

ac000 commented Jan 6, 2023

This patch seems to do the right thing and will enable UTF-8 depending on the setting of the LC_CTYPE environment variable (this is the default behaviour when using the non-isolated python config).

So if LC_CTYPE is either: C, POSIX or some specific UTF-8 locale then you will get UTF-8 support.

diff --git a/src/python/nxt_python.c b/src/python/nxt_python.c
index bdb04579..ce497d33 100644
--- a/src/python/nxt_python.c
+++ b/src/python/nxt_python.c
@@ -75,8 +75,17 @@ static nxt_python_proto_t    nxt_py_proto;
 static nxt_int_t
 nxt_python3_init_config(nxt_int_t pep405)
 {
-    PyStatus  status;
-    PyConfig  config;
+    PyConfig     config;
+    PyStatus     status;
+    PyPreConfig  preconfig;
+
+    PyPreConfig_InitIsolatedConfig(&preconfig);
+    preconfig.utf8_mode = -1;
+
+    status = Py_PreInitialize(&preconfig);
+    if (PyStatus_Exception(status)) {
+        return NXT_ERROR;
+    }
 
     PyConfig_InitIsolatedConfig(&config);
 

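The effect of UTF-8 mode can be observed from the Python side with the -X utf8 command line option, which the Python documentation lists as one of the ways PyPreConfig.utf8_mode gets set. The sketch below is a command-line analogue only; it does not exercise the C API used in the patch, and probe and PROBE are hypothetical names. It runs a child interpreter under LC_ALL=C with locale coercion disabled, once with UTF-8 mode off and once with it on.

```python
import json
import os
import subprocess
import sys

# Code run in the child interpreter: report the settings as JSON.
PROBE = (
    "import json, sys; "
    "print(json.dumps({'utf8_mode': sys.flags.utf8_mode, "
    "'fs': sys.getfilesystemencoding(), 'stdout': sys.stdout.encoding}))"
)


def probe(utf8_mode):
    """Run a child interpreter with -X utf8=<0|1> and return its settings."""
    # Drop variables that would override the command line option.
    env = {k: v for k, v in os.environ.items()
           if k not in ("PYTHONUTF8", "PYTHONIOENCODING")}
    env.update(LC_ALL="C", PYTHONCOERCECLOCALE="0")
    out = subprocess.run(
        [sys.executable, "-X", "utf8=%d" % utf8_mode, "-c", PROBE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return json.loads(out)


if __name__ == "__main__":
    print(probe(0))
    print(probe(1))
```

With UTF-8 mode on, both the filesystem encoding and the stdout encoding are UTF-8 even though the locale is C.
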
@hawiliali

Just compiled unit with this patch. I no longer see any ASCII-related issues logged at all.
Hope all others will have the same result.
🫡

nginx-hg-mirror pushed a commit that referenced this issue Jan 12, 2023
There was a couple of reports of Python applications failing due to the
following type of error

File "/opt/netbox/netbox/netbox/configuration.py", line 25, in _import
     print(f"\U0001f9ec loaded config '{path}'")
UnicodeEncodeError: 'ascii' codec can't encode character '\U0001f9ec' in
position 0: ordinal not in range(128)

due to the use of Unicode text in the print() statement.

This only happened for python 3.8+ when using the "home" configuration
option as this meant we were going through the new PyConfig
configuration.

When using this new configuration method with the 'isolated' specific
API (for embedded Python) UTF-8 is disabled by default,
PyPreConfig->utf8_mode = 0.

To fix this we need to setup the Python pre config and enable utf-8
mode. However rather than enable utf-8 unconditionally we can set to it
to -1 so that it will use the LC_CTYPE environment variable to determine
whether to enable utf-8 mode or not. utf-8 mode will be enabled if
LC_CTYPE is either: C, POSIX or some specific UTF-8 locale. This is the
default utf8_mode setting when using the non-isolated PyPreConfig API.

Reported-by: Tobias Genannt <tobias.genannt@kappa-velorum.net>
Tested-by: Tobias Genannt <tobias.genannt@kappa-velorum.net>
Link: <https://peps.python.org/pep-0587/>
Link: <https://docs.python.org/3/c-api/init_config.html#c.PyPreConfig.utf8_mode>
Fixes: 491d0f7 ("Python: Added support for Python 3.11.")
Closes: <#817>
Reviewed-by: Alejandro Colomar <alx@nginx.com>
Signed-off-by: Andrew Clayton <a.clayton@nginx.com>
@ac000
Member

ac000 commented Jan 12, 2023

This fix has been merged.

@ac000 ac000 closed this as completed Jan 12, 2023
nginx-hg-mirror pushed a commit that referenced this issue Feb 27, 2023