New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-41092: Optionally request actual filesize via 'os.path.getsize' #21088
base: main
Are you sure you want to change the base?
Conversation
Provide the ability to get the actual, on-disk size of a file using 'os.path.getsize'. This is helpful when dealing with things like sparse files. Signed-off-by: Stephen Finucane <stephen@that.guru>
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA). CLA MissingOur records indicate the following people have not signed the CLA: For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. If you have recently signed the CLA, please wait at least one business day You can check yourself to see if the CLA has been received. Thanks again for the contribution, we look forward to reviewing it! |
|
||
.. versionchanged:: 3.6 | ||
Accepts a :term:`path-like object`. | ||
|
||
.. versionchanged:: 3.10 | ||
Add the optional *apparent* parameter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not wedded to this parameter name. Open to better suggestions
if apparent: | ||
return os.stat(filename).st_size | ||
else: | ||
return os.stat(filename).st_blocks * 512 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this magic constant coming from? Are you assuming that all file systems and operating systems have 512 bytes block size? POSIX standard does not specify the block size, see https://pubs.opengroup.org/onlinepubs/009695399/basedefs/sys/stat.h.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm taking it from fstat(2)
:
The st_blocks field indicates the number of blocks allocated to the file, 512-byte units. (This may be smaller than st_size/512 when the file has holes.)
Our own docs state much the same thing.
st_blocks
Number of 512-byte blocks allocated for file. This may be smaller than st_size/512 when the file has holes.
Note that we should not use st_blksize
because it's the preferred blocksize and presumably not relevant here. From fstat(2)
:
The st_blksize field gives the "preferred" blocksize for efficient file system I/O. (Writing to a file in smaller chunks may cause an inefficient read-modify-rewrite.)
Would a (private) constant be beneficial? Or simply a comment, perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The man page https://linux.die.net/man/2/fstat reflects the situation on Linux. Python supports many more platforms than just Linux or BSD. I guess our own documentation is also based on the Linux or BSD documentation, too.
The Open Group docs cover the generic POSIX standard. It mentions that some platforms define DEV_BSIZE
in <sys/param.h>
. I also found some code that defines DEV_BSIZE
from BSIZE
. I recommend to expose the constant from posixmodule.c
and only support the feature when the system has a usable DEV_BSIZE
or BSIZE
.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
@stephenfin, please make the changes requested. Thank you! |
Provide the ability to get the actual, on-disk size of a file using
os.path.getsize
. This is helpful when dealing with things like sparse files.https://bugs.python.org/issue41092