Looking for a way to determine the actual file size #2
Comments
Thanks for opening this issue. I haven't had time to continue working on these tools since December, but hopefully I will soon, at which point I will implement this.

APFS stores the exact file size in bytes in an extended attribute, but these tools do not yet look at xattrs. As you mention, APFS also stores the file size in blocks, rounded up, in the file extents, so this is what is currently being used (that is, we use the value given in the file extents).

As a workaround for now, we can assume that a sequence of continuous

If you have any questions or concerns, do let me know.
👍
I'm going to try to have a look at adding support for correct file sizes. If you have any advice, let me know ASAP; I feel a bit lost in the codebase.
@memecode, I will be cleaning up the codebase and adding this soon enough; we don't have any support for xattrs yet.
In my investigation so far, the inode seems to have an "uncompressed_size" field in later Apple APFS specifications, which seems like exactly what I want. However, I don't get any APFS_TYPE_INODE entries when I iterate over a given file's records. I do get some APFS_TYPE_XATTR records, so I tried to parse them, but they don't seem to map to useful j_xattr_val_t records.

I'm adding investigation code to both apfs-list.c (to try to list the right file size when printing an APFS_TYPE_DIR_REC) and apfs-recover.c (to write the correct file size). The changes I'm making are in a fork here: memecode@97409e6

I've made some of the worker functions not exit(-1) when they hit an error; I feel it's better to continue on rather than bail on the whole operation. My hope is to eventually be able to script recovering the whole volume, so these sorts of errors could mean missing out on some data.

Also in my fork is a Python script that recursively exports a whole folder tree. Currently the parameters are hard-coded in variables at the top rather than passed on the command line, which is fine during testing, but I'll make it more "user friendly" later.
Something I don't understand yet: in apfs-list.c's print_fs_records function, inside the case for APFS_TYPE_DIR_REC, I tried adding a call to get_fs_records for val->file_id, which should return all the file's records, but it sometimes fails with "read_blocks: Reached end-of-file after reading 0 blocks". Yet if I look at how apfs-recover.c works, it dives down the directory path doing exactly that, then scans through the record list for extents to write to stdout.
@memecode According to Jonathan Levin, the file size is stored in a filesystem-agnostic manner (i.e. compatible with HFS+) using an xattr.

Regarding FS record traversal, I will have to review the code; the current implementation is not great. I will be working on this project for the next few weeks, so I'd appreciate having that time to look over everything and unify the tools. There is currently quite a lot of duplicated code.
Oooo, some progress... I managed to parse through the inode's xfields for one particular file and extract the INO_EXT_TYPE_DSTREAM entry, which gives me the exact file size. So I added that to the code that writes out the extent blocks in the recover command, so that the last block written is shorter than the full nx_block_size. This should mean the output file is the right size? Hopefully? I found a reference on how to do it here:
I haven't seen one of those xattrs in my investigation so far. I'm going to run my recovery script over the whole user dir and see what I get now that the file size might be sorted. Fingers crossed.
My recovery script has successfully extracted a loadable Logic Pro X project from the corrupt hard disk, so it seems like my file size changes are working well, which is very reassuring.

The downside is that it's pretty slow. Having to reload the FS every time you list a directory or recover a file is not optimal. I'm thinking about rewriting the recover app to have a "recover a whole folder in one hit" mode, so that most of that grinding goes away.
Q: is there some reason this is in C rather than C++?
@memecode I may be porting it to C++ for OOP. I need to learn the language first, though, so I'll decide whether it's worth my time after doing some reading and making some design choices.
I don't mean rewrite the whole thing, but just start using C++ to manage memory and do automatic cleanup for you. Let it slowly creep in and make life easier.

My scripted recovery doesn't work as well as I'd hoped. It's failing to list some directories and extract some files. I'm trying to distill these issues into a concrete example, although I can't post the actual drive image anywhere.
@memecode Porting to C++ has been on my list of things to explore for months now, and the codebase is still small enough that a complete rewrite in C++ wouldn't be a big deal for me. Having said that, I may just stick with C; it depends on whether I think the benefits are worth it. Programming in a non-OOP fashion has its advantages.
@memecode says:
I've been looking over the revised spec, and that is indeed the file size if the file is not compressed via decmpfs. I will test against my problematic disk image from Nov 2019 to see if
After much too long of a wait, I have finally resolved this issue. Contrary to what Levin told me at the time (in what was a brief Twitter DM), the xattr
Hi,

I have come across the project while trying to recover my partly corrupted APFS storage. I managed to list all the files via apfs-list; it works like a charm. Moreover, apfs-recover provides me the content part. Yet I have not managed to find the right way to determine the actual file size. All blocks are multiples of 4096 and I see no other information. I have no idea what to do with binary files, as some file formats may fail to open.

According to the Apple File System Reference, j_file_extent_val_t is specified to be a multiple of the block size.
Is there a way to determine the actual file size? If not, how does stdio's fread know when to return zero once no more bytes are available? There must be some information about the total size of the file, otherwise the standard FS API would be inconvenient.

Thank you,