
Multiple test failures with OSError: [Errno 84] Invalid or incomplete multibyte or wide character on ZFS with utf8only=on #81765

Open
dimitern mannequin opened this issue Jul 13, 2019 · 5 comments
Labels
3.9 only security fixes 3.10 only security fixes 3.11 only security fixes tests Tests in the Lib/test dir topic-unicode type-bug An unexpected behavior, bug, or error

Comments

@dimitern
Mannequin

dimitern mannequin commented Jul 13, 2019

BPO 37584
Nosy @gpshead, @vstinner, @benjaminp, @ezio-melotti, @serhiy-storchaka, @dimitern
Files
  • cpython_test_output.log: Tests output
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2019-07-13.10:13:56.997>
    labels = ['type-bug', '3.9', '3.10', '3.11', 'tests', 'expert-unicode']
    title = 'Multiple test failures with OSError: [Errno 84] Invalid or incomplete multibyte or wide character on ZFS with utf8only=on'
    updated_at = <Date 2021-12-13.02:23:38.746>
    user = 'https://github.com/dimitern'

    bugs.python.org fields:

    activity = <Date 2021-12-13.02:23:38.746>
    actor = 'gregory.p.smith'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Tests', 'Unicode']
    creation = <Date 2019-07-13.10:13:56.997>
    creator = 'dimitern'
    dependencies = []
    files = ['48475']
    hgrepos = []
    issue_num = 37584
    keywords = []
    message_count = 5.0
    messages = ['347794', '347801', '347998', '348006', '408420']
    nosy_count = 6.0
    nosy_names = ['gregory.p.smith', 'vstinner', 'benjamin.peterson', 'ezio.melotti', 'serhiy.storchaka', 'dimitern']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'test needed'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue37584'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    @dimitern
    Mannequin Author

    dimitern mannequin commented Jul 13, 2019

    I'm running Ubuntu 19.04 on a mirrored ZFS pool, where my home partition is configured with the 'utf8only=on' attribute. I cloned cpython, and after running the tests as described in devguide.python.org, I got 11 test failures:

    == Tests result: FAILURE ==

    389 tests OK.

    11 tests failed:
    test_cmd_line_script test_httpservers test_imp test_import
    test_ntpath test_os test_posixpath test_socket test_unicode_file
    test_unicode_file_functions test_zipimport

    I've been looking for similar or matching reported issues, but could not find one. I'm on the EuroPython 2019 CPython sprint and we'll be looking into this with the help of some of the core devs.

    @dimitern dimitern mannequin added 3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes tests Tests in the Lib/test dir topic-unicode type-bug An unexpected behavior, bug, or error labels Jul 13, 2019
    @dimitern
    Mannequin Author

    dimitern mannequin commented Jul 13, 2019

    Here's some additional information I found for that specific attribute:

    From the documentation at
    http://dlc.sun.com/osol/docs/content/ZFSADMIN/gazss.html
    (link is dead, but here's where I found the section below: https://zfs-discuss.opensolaris.narkive.com/3NqQVG0H/utf8only-and-normalization-properties#post1)

    utf8only
    Boolean
    Off
    This property indicates whether a file system should reject file names
    that include characters that are not present in the UTF-8 character code
    set. If this property is explicitly set to off, the normalization
    property must either not be explicitly set or be set to none. The
    default value for the utf8only property is off. This property cannot be
    changed after the file system is created.
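To make "characters that are not present in the UTF-8 character code set" concrete: on POSIX, Python decodes file-name bytes that are not valid in the filesystem encoding into lone surrogates (PEP 383, the "surrogateescape" error handler) and turns them back into the original bytes when calling the OS. The sketch below uses the kind of name that test_httpservers' test_undecodable_filename creates (`\udce7w\udcf0`) and shows that the resulting bytes are not valid UTF-8, which is exactly what a utf8only=on dataset refuses to store. It assumes a UTF-8 filesystem encoding, the usual case on Linux.

```python
# PEP 383: undecodable bytes 0x80..0xFF are represented as U+DC80..U+DCFF
# and round-trip back to the original bytes via "surrogateescape".
name = "\udce7w\udcf0.txt"
raw = name.encode("utf-8", "surrogateescape")
print(raw)  # b'\xe7w\xf0.txt'

# 0xE7 starts a 3-byte UTF-8 sequence, but the next byte ("w") is not a
# continuation byte, so these bytes are not valid UTF-8.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)  # False
```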

    @ezio-melotti
    Member

    I think Dimiter was able to fix most of the failures, except test_unicode_file_functions.
    Yesterday, during the sprints, we looked into it and ran some tests with the following snippet:

    import os
    import unicodedata

    upsilon_diaeresis_and_hook = "ϔ"  # U+03D4

    for form in ["NFC", "NFD", "NFKC", "NFKD"]:
        unicode_filename = unicodedata.normalize(form, upsilon_diaeresis_and_hook)
        # Create one file per normalization form, containing the form's name
        with open(unicode_filename, "w") as f:
            f.write(form)
        print("N:", ascii(unicode_filename))
        print([ascii(filename) for filename in os.listdir('.')])

    On ext4 this creates 4 different files: ['\u03d4', '\u03d2\u0308', '\u03ab', '\u03a5\u0308']
    On ZFS with utf8only=on (and, I believe, normalization=formD), only 2 files are created, but each of the 4 filenames can still be used to open one of those 2 files.
    This is also the default behavior on Mac.
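Why 2 and not 4: the four normalized names fall into two groups that are canonically equivalent. A minimal sketch, assuming the filesystem compares names after NFD normalization (as suspected above for normalization=formD), computes the four forms without touching the disk and counts how many distinct names remain after folding them to NFD:

```python
import unicodedata

# U+03D4 GREEK UPSILON WITH DIAERESIS AND HOOK SYMBOL, as in the snippet above
upsilon_diaeresis_and_hook = "\u03d4"

forms = {
    form: unicodedata.normalize(form, upsilon_diaeresis_and_hook)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}
for form, name in forms.items():
    print(form, ascii(name))

# Names a normalizing filesystem would consider identical collapse together:
# NFC/NFD map to U+03D2 U+0308, NFKC/NFKD map to U+03A5 U+0308.
stored = {unicodedata.normalize("NFD", name) for name in forms.values()}
print(len(stored))  # 2
```

The four printed names match the ext4 listing above; only the compatibility forms (NFKC/NFKD) replace U+03D2 with U+03A5, which is why two groups remain.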

    The test is already skipped on darwin (Lib/test/test_unicode_file_functions.py:120) and should be skipped for ZFS too (this might depend on the exact flags used); however, we weren't able to find a portable way to determine the filesystem and its flags.

    An alternative is to try creating the 4 files and skip the test if only 2 get created and all the names can be used to open those two files; however, this might mask other failures. Unless someone can come up with a better way to do this, I think this is the only option.

    In addition, different filesystems that don't exhibit this behavior can be used on Mac, so the test shouldn't be skipped in those cases.

    @vstinner
    Member

    """
    On ext4 this creates 4 different files: ['\u03d4', '\u03d2\u0308', '\u03ab', '\u03a5\u0308']
    On ZFS with utf8only=true (and I believe normalization=formD), only 2 files are created but each of the 4 filenames can be used to access either of the 2 files.
    This is also the default behavior on Mac.

    The test is already skipped on darwin (Lib/test/test_unicode_file_functions.py:120), and should be skipped for ZFS too (might depend on the exact flags used), however we weren't able to find a portable way to determine the filesystem and flags.
    """

    I suggest creating a temporary directory, creating the 4 files, and counting how many entries os.listdir() returns. If you get 4, the FS doesn't normalize anything. If you get fewer, it's likely that the FS normalizes names.
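A minimal sketch of that probe follows. The helper name `filesystem_normalizes_names` and its `parent_dir` parameter are hypothetical, not part of CPython's test support; it assumes a UTF-8 filesystem encoding so the four names can be created at all.

```python
import os
import tempfile
import unicodedata


def filesystem_normalizes_names(parent_dir=None):
    """Hypothetical helper: return True if the filesystem holding parent_dir
    appears to treat Unicode-equivalent file names as the same file.

    Creates the four normalization forms of U+03D4 in a fresh temporary
    directory and compares how many entries os.listdir() reports.
    """
    names = [
        unicodedata.normalize(form, "\u03d4")
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    ]
    with tempfile.TemporaryDirectory(dir=parent_dir) as tmp:
        for name in names:
            # If a name aliases an existing file, "w" mode just truncates it.
            with open(os.path.join(tmp, name), "w"):
                pass
        # 4 entries: no normalization; fewer: equivalent names were folded.
        return len(os.listdir(tmp)) < len(names)
```

Because different directories can live on different filesystems, the probe should be run in the directory where the test actually creates its files, e.g. `unittest.skipIf(filesystem_normalizes_names(os.getcwd()), "filesystem normalizes Unicode file names")` on the affected test class.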

    @gpshead
    Member

    gpshead commented Dec 13, 2021

    Confirmed.

    Repro: do an Ubuntu 20.04 install and choose "experimental ZFS" support during installation (https://ubuntu.com/blog/zfs-focus-on-ubuntu-20-04-lts-whats-new). On such a ZFS filesystem, the following tests fail in 3.10 in a ./python -m test.regrtest run:

    11 tests failed:
    test_cmd_line_script test_httpservers test_imp test_import
    test_ntpath test_os test_posixpath test_socket test_unicode_file
    test_unicode_file_functions test_zipimport

    After moving over to a tmpfs, all but test_httpservers pass. test_httpservers still tries to create such a path under /tmp:

    ======================================================================
    ERROR: test_undecodable_filename (test.test_httpservers.SimpleHTTPServerTestCase)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/greg/test/cpython/Lib/test/test_httpservers.py", line 400, in test_undecodable_filename
        with open(os.path.join(self.tempdir, filename), 'wb') as f:
    OSError: [Errno 84] Invalid or incomplete multibyte or wide character: '/tmp/tmpnt9ch98x/@test_124227_tmp\udce7w\udcf0.txt'

    I expect any filesystem mounted so that it rejects non-UTF-8 pathnames to cause similar failures. Our test suite needs to detect this environment and skip these tests there.
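One way to detect this environment is to attempt the failing operation directly: create a file whose name is not valid UTF-8 and check for EILSEQ, which is errno 84 on Linux, as in the traceback above. The sketch below is POSIX-only (it relies on bytes paths being passed to the OS unchanged), and the helper name `filesystem_rejects_non_utf8_names` is hypothetical.

```python
import errno
import os
import tempfile


def filesystem_rejects_non_utf8_names(parent_dir=None):
    """Hypothetical helper: return True if creating a file whose name is not
    valid UTF-8 fails with EILSEQ in a temporary directory under parent_dir."""
    with tempfile.TemporaryDirectory(dir=parent_dir) as tmp:
        # Use a bytes path so the probe does not depend on the filesystem
        # encoding Python was started with; b"\xe7w\xf0" is not valid UTF-8.
        path = os.path.join(os.fsencode(tmp), b"probe\xe7w\xf0.txt")
        try:
            with open(path, "wb"):
                pass
        except OSError as exc:
            if exc.errno == errno.EILSEQ:
                return True
            raise  # some other failure: don't silently treat it as a skip
        return False
```

Re-raising any other OSError keeps the probe from masking unrelated failures, the concern raised earlier in the thread about skipping too broadly.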

    @gpshead gpshead added 3.10 only security fixes 3.11 only security fixes and removed 3.7 (EOL) end of life 3.8 only security fixes labels Dec 13, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022

    3 participants