Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

POSIX module incompatible with ZFS on FreeBSD #113078

Closed
renatoalencar opened this issue Dec 14, 2023 · 3 comments
Closed

POSIX module incompatible with ZFS on FreeBSD #113078

renatoalencar opened this issue Dec 14, 2023 · 3 comments
Labels
OS-freebsd type-bug An unexpected behavior, bug, or error

Comments

@renatoalencar
Copy link

renatoalencar commented Dec 14, 2023

Bug report

Bug description:

When running tests on FreeBSD 14, I've found a problem with the current wrapper implementations of POSIX's stat and makedev. The first problem resides in the fact that the code in https://github.com/python/cpython/blob/main/Modules/posixmodule.c#L937 assumes dev_t is always signed and positive, and in makedev not following the same interfaces from major and minor, which uses int and unsigned int for the later two.

On stat

In https://github.com/python/cpython/blob/main/Lib/test/test_posix.py#L694-698, it's checked if st.st_dev is positive, but it can fail for reasons that I'll explain below. For now we should know that _PyLong_FromDev in https://github.com/python/cpython/blob/main/Modules/posixmodule.c#L2521, uses PyLong_FromLongLong which may interpret unsigned 64 bit integers incorrectly.

st = posix.stat(os_helper.TESTFN)
dev = st.st_dev
self.assertIsInstance(dev, int)
self.assertGreaterEqual(dev, 0)

Leads to this test error:

======================================================================
FAIL: test_makedev (test.test_posix.PosixTester.test_makedev)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/renato/Desktop/cpython/Lib/test/test_posix.py", line 698, in test_makedev
    self.assertGreaterEqual(dev, 0)
    ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
AssertionError: -2702552447326613027 not greater than or equal to 0
 
----------------------------------------------------------------------
Ran 130 tests in 1.239s
 
FAILED (failures=1, skipped=36)
test test_posix failed
test_posix failed (1 failure)
 
== Tests result: FAILURE ==

In fact when looking into FreeBSDs C headers, that is known that dev_t is uint64 and not int64, and even glibc uses __UQUAD_TYPE.

Although st_dev shouldn't be negative, it doesn't cause any harm at first glance. It could probably crash other libraries and routines that expect it to be positive or unsigned tho.

On makedev

By assuming that dev_t is always positive the implementations of makedev doesn't expect the major and minor parameters to be unsigned. Therefore, being int, they may overflow once they use the results from stat+major/minor giving the following test error:

======================================================================
ERROR: test_makedev (test.test_posix.PosixTester.test_makedev)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/renato/Desktop/cpython/Lib/test/test_posix.py", line 700, in test_makedev
    major = posix.major(dev)
            ~~~~~~~~~~~^^^^^
OverflowError: can't convert negative int to unsigned
 
----------------------------------------------------------------------
Ran 130 tests in 1.259s
 
FAILED (errors=1, skipped=36)
test test_posix failed
test_posix failed (1 error)
 
== Tests result: FAILURE ==

Therefore, the whole round trip could cause errors in applications following this same pattern. The FreeBSD man pages for makedev, major and minor also mentions that the return values for major and minor could span the complete range of an int:

RETURN VALUES

       The major() and minor() macros return numbers whose value can span  the
       complete	range of an int.

On Linux, the return values are always unsigned int:

SYNOPSIS

       #include <sys/sysmacros.h>

       dev_t makedev(unsigned int maj, unsigned int min);

       unsigned int major(dev_t dev);
       unsigned int minor(dev_t dev);

ZFS and FreeBSD

After asking on the FreeBSD forums trying to understand why that st_dev was so large, after all it could be a bug on FreeBSD iself. Some users reported that in fact, ZFS uses such high device IDs. The reason, is that it doesn't refer to real devices, but virtual ones so OpenZFS generates the upper 56 bits of the device ID randomly, and therefore there's actually a 50% probability of that happening since the MSB is randomly generated.

More details from ralphbsz:

Modern file system software (such as ZFS) no longer has a 1-to-1 correspondence between disk drive and file system. For example, the home directory of my server is physically stored on disks /dev/ada2p1 and /dev/ada3p8(it is mirrored), which have device numbers 0x98 and 0xb2, but because I use GPT labels, ZFS finds them under /dev/gpt/hd1[46]_home which has device numbers 0xa0 and 0xcb. And in ZFS, one pool (which corresponds to a set of block devices, which are then parts of physical disks) can contain multiple file systems, so it wouldn't even work to construct a fake device ID, for example by concatenating the ones of the physical disks. Today, the file system ID has a m-to-n relationship with the device ID of the disks. The solution to this is that ZFS (and other such file systems) have to create virtual (that is: fake!) st_dev numbers. It so happens that ZFS chooses very large 64-bit numbers for st_dev. On your machine it happens to have the highest bit set; on my machine, it happens to be 3876826178434374726 for the home file system (which is a little smaller, and doesn't happen to have the highest bit set).

And from the OpenZFS code:

/*
* The fsid is 64 bits, composed of an 8-bit fs type, which
* separates our fsid from any other filesystem types, and a
* 56-bit objset unique ID.  The objset unique ID is unique to
* all objsets open on this system, provided by unique_create().
* The 8-bit fs type must be put in the low bits of fsid[1]
* because that's where other Solaris filesystems put it.
*/

I couldn't dig deeper into Linux's implementation of ZFS, but testing it on a virtual machine the same bug doesn't happen.

CPython versions tested on:

CPython main branch

Operating systems tested on:

Other

Linked PRs

@renatoalencar renatoalencar added the type-bug An unexpected behavior, bug, or error label Dec 14, 2023
renatoalencar pushed a commit to renatoalencar/cpython that referenced this issue Dec 14, 2023
Following the implementions from both FreeBSD and GNU glibc,
`PyLong_FromDev should actually be unsigned. This can break
on FreeBSD systems using ZFS, since they wouldn't generate
device IDs with 56 upper bits randomly, therefore the MSB
too.

Also, following the documentation from FreeBSD and Linux,
`makedev` should accept unsigned 32 bit integers. The reason
to that is because on FreeBSD `major` and `minor` can return
anything on the 32 bit int range and on Linux it really specifies
that it returns a unsigned integer.
renatoalencar added a commit to renatoalencar/cpython that referenced this issue Dec 14, 2023
Following the implementions from both FreeBSD and GNU glibc,
`PyLong_FromDev should actually be unsigned. This can break
on FreeBSD systems using ZFS, since they wouldn't generate
device IDs with 56 upper bits randomly, therefore the MSB
too.

Also, following the documentation from FreeBSD and Linux,
`makedev` should accept unsigned 32 bit integers. The reason
to that is because on FreeBSD `major` and `minor` can return
anything on the 32 bit int range and on Linux it really specifies
that it returns a unsigned integer.
renatoalencar added a commit to renatoalencar/cpython that referenced this issue Dec 14, 2023
Following the implementions from both FreeBSD and GNU glibc,
`PyLong_FromDev should actually be unsigned. This can break
on FreeBSD systems using ZFS, since they wouldn't generate
device IDs with 56 upper bits randomly, therefore the MSB
too.

Also, following the documentation from FreeBSD and Linux,
`makedev` should accept unsigned 32 bit integers. The reason
to that is because on FreeBSD `major` and `minor` can return
anything on the 32 bit int range and on Linux it really specifies
that it returns a unsigned integer.
@serhiy-storchaka
Copy link
Member

Thank you for your detailed report. This issue looks like a duplicate of #89928. Please test whether #31794 fixes it.

@renatoalencar
Copy link
Author

Thank you for your detailed report. This issue looks like a duplicate of #89928. Please test whether #31794 fixes it.

Sure, there's one thing tho. On you're implementation, NODEV shouldn't actually be there. makedev would never return a -1 since it only applies bitwise operations on top of major and minor. It could also be simples, since it doesn't need extra checks for NODEV.

Also, major and minor functions probably shouldn't expect for -1 as inputs as well. Since they're also only bitwise operations and don't have any semantics beyond that.

@vstinner
Copy link
Member

This issue is a duplicate of #89928.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OS-freebsd type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

3 participants