Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MacOS os.statvfs() has rollover for >4TB disks at each 4TB (32bit counter overflow?) #87804

Closed
sanderjo mannequin opened this issue Mar 27, 2021 · 19 comments
Closed

MacOS os.statvfs() has rollover for >4TB disks at each 4TB (32bit counter overflow?) #87804

sanderjo mannequin opened this issue Mar 27, 2021 · 19 comments
Labels
3.9 only security fixes extension-modules C modules in the Modules dir OS-mac release-blocker type-bug An unexpected behavior, bug, or error

Comments

@sanderjo
Copy link
Mannequin

sanderjo mannequin commented Mar 27, 2021

BPO 43638
Nosy @ronaldoussoren, @ned-deily, @Safihre, @sanderjo

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2021-03-27.10:36:42.644>
labels = ['OS-mac', 'type-bug', '3.9']
title = 'MacOS os.statvfs() has rollover for >4TB disks at each 4TB (32bit counter overflow?)'
updated_at = <Date 2021-03-29.08:41:43.606>
user = 'https://github.com/sanderjo'

bugs.python.org fields:

activity = <Date 2021-03-29.08:41:43.606>
actor = 'sanderjo'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['macOS']
creation = <Date 2021-03-27.10:36:42.644>
creator = 'sanderjo'
dependencies = []
files = []
hgrepos = []
issue_num = 43638
keywords = []
message_count = 4.0
messages = ['389596', '389607', '389668', '389674']
nosy_count = 4.0
nosy_names = ['ronaldoussoren', 'ned.deily', 'Safihre', 'sanderjo']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue43638'
versions = ['Python 3.9']

Linked PRs

@sanderjo
Copy link
Mannequin Author

sanderjo mannequin commented Mar 27, 2021

MacOS BigSur (and older), python 3.9.2 (and older)

For disks >4TB, os.statvfs() shows a wrong value for available space: too low, and always rollover at each 4TB.
As 4TB = 2^42, hypothesis: rollover in 32bit counter (with 10bit blocksize)

Example:

"df -m" does show the correct available space

df -m /Volumes/Frank/
Filesystem 1M-blocks Used Available Capacity iused ifree %iused Mounted on
//frank@SynologyBlabla._smb._tcp.local/Frank 21963360 2527744 19435615 12% 2588410474 19902070164 12% /Volumes/Frank

So available space 19902070164 MB, so about 18.5 TB. Good.

Now python's os.statvfs():

>>> s = os.statvfs("/Volumes/Frank")
>>> s.f_bavail * s.f_frsize / 1024**2
2658399.39453125

So 2.5TB, and thus wrong

The difference is 16777216 MB which is exactly 4 times 4TB.

Problem seems to be in MacOS statvfs() itself; reproducable with a few lines of C code.

We have implemented a workaround in our python program SABnzbd to directly use MacOS' libc statfs() call (not statvfs() ). A solution / workaround in python itself would be much nicer.

No problem with python on Linux with >4TB drives.

@sanderjo sanderjo mannequin added 3.9 only security fixes OS-mac type-bug An unexpected behavior, bug, or error labels Mar 27, 2021
@sanderjo
Copy link
Mannequin Author

sanderjo mannequin commented Mar 27, 2021

Correction on typo in original post / to be clear:

From the "df -m /Volumes/Frank/", Available is 19435615 in unity of 1MB blocks, so 19435615 MB. Which is 18.5 TB. All correctly reported by "df -m". But not by os.statvfs()

@ronaldoussoren
Copy link
Contributor

As you note in the title this is a 32-bit overflow in the statvfs system API, the struct it uses contains 32-bit values.

@sanderjo
Copy link
Mannequin Author

sanderjo mannequin commented Mar 29, 2021

OK. What would be a solution from/for Python to get the correct available space on a MacOS system?

In SABnzbd we implemented a workaround with a direct call to MacOS C lib's statfs(). See https://github.com/sabnzbd/sabnzbd/blob/develop/sabnzbd/filesystem.py#L948-L989

IMHO not a great solution for a python programmer. Could python's os.statvfs() use the correct (64bit) info from statfs()?

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@Safihre
Copy link

Safihre commented Apr 25, 2022

@roachcord3
Copy link

roachcord3 commented Oct 8, 2022

It seems to me that since statvfs is POSIX-compliant but not guaranteed to give 64-bit results, there ought to be either some sort of shim that will call statfs64 if it is available, and otherwise choose statvfs for compatibility, or statfs64 should just be callable as its own function.

I dug into it some more, and found that (from what I can tell, anyway) Linux decided to change the types in the struct used by statfs (and therefore, by extension, statvfs) but macOS has kept it the same as the old BSD implementation–specifically, its statvfs wraps statfs like back in the old days, and its statfs is still 32-bit.

Anyway, to help incentivize a fix, I have posted a bounty for a fix for this issue.

@ned-deily
Copy link
Member

I am not an expert on this but I think the statfs description you are linking to above is out-of-date. The current macOS man page for statfs states in part:

     The statfs64 and fstatfs64 routines are equivalent to their corresponding non-64-suffixed routine, when
     64-bit inodes are in effect.  They were added before there was support for the symbol variants, and so
     are now deprecated.  Instead of using these, set the _DARWIN_USE_64_BIT_INODE macro before including
     header files to force 64-bit inode support.

     The statfs64 structure used by these deprecated routines is the same as the statfs structure when
     64-bit inodes are in effect (see above).

and the _DARWIN_USE_64_BIT_INODE macro is available from at least macOS 10.5 on. So I would think it should be possible to implement a workaround within _posixmodule.c using macOS's statfs. A PR would be welcome :)

@roachcord3
Copy link

@ned-deily Oooh excellent find! Unfortunately I'm not much of a C programmer so I won't be taking a crack at it, but it seems like my bounty should be pretty easy to claim for someone who does know C.

@ronaldoussoren
Copy link
Contributor

Implementing os.statvfs using statfs(2) isn't as easy as I'd like: the f_namemax field of struct statvfs is not available in struct statfs. It is possibly to use pathconf(2) instead, or keep using statvfs(2) to get this field.

@roachcord3
Copy link

@ronaldoussoren did you have any luck using @ned-deily's find? It might be possible to just compile with that macro set and then the current implementation using statvfs might "just work," returning 64 bit values instead.

@ronaldoussoren
Copy link
Contributor

The block counts reported by statvfs are always 32-bit, the macro's mentioned by Ned are for statfs (not the lack of a 'v' in the name).

I've looked at the statvfs source code (yay opensource) for macOS 13 and even f_namemax is easy to replicate: libc just sets that field to a fixed value and not something that is filesystem specific. In hindsight that was expected, all filesystems users are likely to encounter support 255 characters in file names...

Longer term it might be interesting to expose statfs directly, but that's for a different issue and something that needs design work because the statfs structures used by Linux and macOS are different (and I haven't looked at other OS-es).

ronaldoussoren added a commit to ronaldoussoren/cpython that referenced this issue Nov 18, 2022
On macOS the statvfs interface returns block counts as
32-bit integers, and that results in bad reporting for
larger disks.

Therefore reimplement statvfs in terms of statfs, which
does use 64-bit integers for block counts.

Tested using a sparse filesystem image of 100TB.
@roachcord3
Copy link

@ronaldoussoren ah, thank you for explaining! I misunderstood some things! Looks like you have found a path forward though, because I see a PR. That's awesome, thank you for working on this!!

@iritkatriel iritkatriel added the extension-modules C modules in the Modules dir label Nov 29, 2023
ronaldoussoren added a commit that referenced this issue Feb 10, 2024
On macOS the statvfs interface returns block counts as
32-bit integers, and that results in bad reporting for
larger disks.

Therefore reimplement statvfs in terms of statfs, which
does use 64-bit integers for block counts.

Tested using a sparse filesystem image of 100TB.
@ronaldoussoren
Copy link
Contributor

I have (finally!) merged my PR. I won't merge the PR into 3.11 and 3.12, because the change is a bit too large for that to my taste.

@sobolevn
Copy link
Member

Sorry, I've missed this PR during the review phase :(
I have several comments with small changes, PR is ready.
@ronaldoussoren can you please take a look? :)

@sobolevn sobolevn reopened this Feb 10, 2024
sobolevn added a commit to sobolevn/cpython that referenced this issue Feb 10, 2024
@encukou
Copy link
Member

encukou commented Feb 12, 2024

After the error handling fix, the refleaks buildbot started failing.
If we don't manage to fix it today, we need to revert for alpha 4 (scheduled tomorow), and fix it for the next alpha.

@encukou
Copy link
Member

encukou commented Feb 12, 2024

Reposting investigation from @Eclips4

Bisected to #115236
I spent about an hour to try figure out what's happening.
First of all, for some reason it's not reproducible on non-freethreading build. Changes in #115236 also isn't > related to free-threading.

We have this macros:

    do {                                           \
        if (item == NULL) {                        \
            Py_DECREF(v);                          \
            return NULL;                           \
        }                                          \
        PyStructSequence_SET_ITEM(v, index, item); \
    } while (0)                                    \

If we rewrite this macros like this:

#define SET_ITEM(v, index, item)                   \
    do {                                           \
        PyStructSequence_SET_ITEM(v, index, item); \
    } while (0)                                    \

Refleaks are gone!! Why???
I've checked it and what is amazing is that the body of this if statement isn't executed at all..

@sobolevn
Copy link
Member

Working on the fix! Thanks for the report.

@vstinner
Copy link
Member

After the error handling fix, the refleaks buildbot started failing.

Apparently, the leak only occurs on macOS, on the ARM64 MacOS M1 Refleaks NoGIL 3.x worker: https://buildbot.python.org/all/#/builders/1368/builds/184

6 tests failed:
    test_largefile test_os test_pickle test_pickletools test_posix
    test_shutil

sobolevn added a commit that referenced this issue Feb 12, 2024
@sobolevn
Copy link
Member

sobolevn commented Feb 12, 2024

Looks like new runs of m1 free-threaded refleaks buildbot are successful: https://buildbot.python.org/all/#/builders/1368/builds/188

We can hopefully close this issue now :)

fsc-eriker pushed a commit to fsc-eriker/cpython that referenced this issue Feb 14, 2024
On macOS the statvfs interface returns block counts as
32-bit integers, and that results in bad reporting for
larger disks.

Therefore reimplement statvfs in terms of statfs, which
does use 64-bit integers for block counts.

Tested using a sparse filesystem image of 100TB.
fsc-eriker pushed a commit to fsc-eriker/cpython that referenced this issue Feb 14, 2024
fsc-eriker pushed a commit to fsc-eriker/cpython that referenced this issue Feb 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.9 only security fixes extension-modules C modules in the Modules dir OS-mac release-blocker type-bug An unexpected behavior, bug, or error
Projects
Development

No branches or pull requests

8 participants