Looking for a way to determine the actual file size #2

Closed
avoloshko opened this issue Mar 14, 2020 · 16 comments
Labels
enhancement New feature or request

Comments

@avoloshko

avoloshko commented Mar 14, 2020

Hi,

I came across the project while trying to recover my partly corrupted APFS storage. I managed to list all the files via apfs-list; it works like a charm. Moreover, apfs-recover gives me the file contents. Yet I have not found a way to determine the actual file size: every recovered file is a multiple of 4096 bytes, and I see no other size information. I don't know what to do with binary files, as some file formats may fail to open.

According to the Apple File System Reference, the length in j_file_extent_val_t is specified to be a multiple of the block size.

Is there a way to determine the actual file size? If not, how does fread (from stdio.h) know when to return zero because no more bytes are available? There must be some record of the file's total size, otherwise the standard filesystem API would be impractical.

Thank you,

@jivanpal jivanpal added the enhancement New feature or request label Mar 21, 2020
@jivanpal
Owner

Thanks for opening this issue. I haven't had time to continue working on these tools since December, but hopefully I will soon, at which point I will implement this.

APFS stores the exact file size in bytes in an extended attribute, but these tools do not yet look at xattrs. As you mention, APFS also stores the file size in blocks, rounded up, in the file extents, so this is what is currently being used (that is, we use the value given in j_file_extent_val_t::len_and_flags).

As a workaround for now, we can assume that any run of 0x00 bytes at the end of a file generated by apfs-recover is not part of the actual file, and so the file can be truncated. You can do this with the aid of a short program and shell script, which I have now added to the unstable branch under /supplemental-tools.

If you have any questions or concerns, do let me know.

@avoloshko
Author

👍

@memecode

memecode commented Aug 7, 2020

I'm going to try to look at adding support for correct file sizes. If you have any advice, let me know ASAP. I feel a bit lost in the codebase.

@jivanpal
Owner

jivanpal commented Aug 7, 2020

@memecode, I will be cleaning up the codebase and adding this soon enough; we don't have any support for xattrs yet.

@memecode

memecode commented Aug 7, 2020

In my investigation so far, the inode seems to have an "uncompressedsize" field in later Apple APFS specifications, which seems like exactly what I want. However, I don't get any APFS_TYPE_INODE entries when I iterate over a given file's records. I do get some APFS_TYPE_XATTR records, so I tried to parse them, but they don't seem to map to useful j_xattr_val_t records.

I'm adding investigation code to both apfs-list.c (to try to list the right file size when printing an APFS_TYPE_DIR_REC) and apfs-recover.c (to write the correct file size).

The changes I'm making are in a fork here: memecode@97409e6

I've changed some of the worker functions so they no longer call exit(-1) when they hit an error. I think it's better to carry on than to abort the whole operation: my hope is to eventually be able to script recovering the whole volume, so these sorts of errors could mean missing out on some data. Also in my fork is a Python script that recursively exports a whole folder tree. Currently its parameters are hard-coded in variables at the top rather than passed as command-line arguments, which is fine during testing, but I'll make it more "user friendly" later.

@memecode

memecode commented Aug 7, 2020

Something I don't currently understand: in apfs-list.c's "print_fs_records" function, inside the case APFS_TYPE_DIR_REC, I tried adding a call to get_fs_records for "val->file_id", which should return all of the file's records... but it sometimes fails with "read_blocks: Reached end-of-file after reading 0 blocks". Yet apfs-recover.c seems to do exactly that: it dives down the directory path, then scans through the record list for extents to write to stdout.

@jivanpal
Owner

jivanpal commented Aug 7, 2020

In my investigation so far, the inode seems to have an "uncompressedsize" field in later Apple APFS specifications.

@memecode According to Jonathan Levin, the file size is stored in a filesystem-agnostic manner (i.e. compatible with HFS+) using the xattr com.apple.decmpfs.

Regarding FS record traversal, I will have to review the code; the current implementation is not great. I will be working on this project for the next few weeks, so I'd appreciate it if I could have that time to look over everything and unify the tools. There is currently quite a lot of duplicated code.

@memecode

memecode commented Aug 8, 2020

Oooo, some progress... I managed to parse the inode's xfields for one particular file and extract the INO_EXT_TYPE_DSTREAM object, which gives me the exact file size. I added that to the code in the recover command that writes out the extent blocks, i.e. the last block is now written shorter than the full nx_block_size. This should mean the output file is the right size? Hopefully?

memecode@05ab621

I found a reference on how to do it here:
https://github.com/nkondakov/apfs-fuse/blob/670e45ef92996f604fd6cd9a0b56d84fc6c3df51/ApfsLib/ApfsDir.cpp
In particular, how to update 'offset' to keep it 8-byte aligned.

@memecode
Copy link

memecode commented Aug 8, 2020

com.apple.decmpfs

I haven't seen any of those xattrs in my investigation so far.

I'm going to run my recover script over the whole user dir and see what I get now that the file size might be sorted. Fingers crossed.

@memecode

memecode commented Aug 8, 2020

My recovery script has successfully extracted a loadable Logic Pro X project from the corrupt hard disk, so it seems my file size changes are working well, which is very reassuring. The downside is that it's pretty slow: having to reload the FS every time you list a directory or recover a file is suboptimal. I'm thinking about rewriting the recover app to have a "recover a whole folder in one hit" mode, so that most of that grinding goes away.

@memecode

memecode commented Aug 8, 2020

Q: is there some reason this is in C rather than C++?

@jivanpal
Owner

jivanpal commented Aug 8, 2020

@memecode I may be porting it to C++ for OOP. I need to learn the language, though, and will decide whether it's worth my time after doing some reading and making some design choices.

@memecode

memecode commented Aug 9, 2020

@memecode I may be porting it to C++ for OOP. I need to learn the language, though, and will decide whether it's worth my time after doing some reading and making some design choices.

I don't mean rewrite the whole thing, but just start using C++ to manage memory and do automatic cleanup for you. Let it slowly creep in and make life easier.

My scripted recovery doesn't work as well as I'd hoped. It's failing to list some directories and extract some files. I'm trying to narrow these issues down to a concrete example, although I can't post the actual drive image anywhere.

@jivanpal
Owner

jivanpal commented Aug 9, 2020

@memecode Porting to C++ has been on my list of things to explore for months now, and the codebase is still small enough that a complete rewrite in C++ wouldn't be a big deal for me. Having said that, I may just stick with C; it depends on whether I think the benefits are worth it. Programming in a non-OOP fashion has its advantages.

@jivanpal
Owner

@memecode says:

In my investigation so far, the inode seems to have an "uncompressedsize" field in later Apple APFS specifications. Which seems like exactly what I want.

I've been looking over the revised spec, and that is indeed the file size if the file is not compressed via decmpfs. I will test against my problematic disk image from Nov 2019 to see if j_inode_val_t::pad2/j_inode_val_t::uncompressed_size actually holds the file size on that version of APFS.

@jivanpal
Owner

After much too long of a wait, I have finally resolved this issue. drat recover now gets the proper file size on disk from the inode. Thanks to @memecode for pointing out that the xfields contain a dstream with the file size. It seems that, at the time, I failed to realise that xfields and xattrs are different things!

Contrary to what Levin told me at the time (in what was a brief Twitter DM), the xattr com.apple.decmpfs is only used for compressed files to indicate the uncompressed file size, in the same manner that it is used under HFS+ for filesystem-agnostic compression; see here. I would guess that j_inode_val_t::uncompressed_size in later APFS revisions also serves this purpose, since that field still does not appear to be used for uncompressed files. In any case, we currently just recover the data as it appears on disk, which means that compressed files will be recovered in their compressed form, not uncompressed.


3 participants