
Support for ZPL attribute tables and embedded data #2

Closed
1 of 2 tasks
hiliev opened this issue Mar 27, 2018 · 27 comments

@hiliev
Owner

hiliev commented Mar 27, 2018

In order to be able to use py-zfs-rescue on pools created by modern OSes, the following two enhancements are needed:

  • Support for block pointers with embedded data
  • Support for ZPL attribute tables as an alternative to znode_phys_t (bonus data type 0x2c)
@eiselekd
Collaborator

I'll try to take a look at this and see whether I can make progress in that direction.

@eiselekd
Collaborator

eiselekd commented Mar 28, 2018

@hiliev I've implemented the embedded data in the dnode and detected the System Attribute bonus buffer; however, I'm trying to understand the format of its content. Is it a ZAP-encoded buffer?

Addendum: I think I found it in zdb: dump_znode(objset_t *os, uint64_t object, void *data, size_t size).
I have to first scan the SA master node, then scan the "SA attr layouts" and "SA attr registration" dnodes, then use that layout to scan the SA? Is it really that complicated?

@hiliev
Owner Author

hiliev commented Mar 28, 2018

This seems to be a bit more complicated than expected. ZFS has an attribute registration mechanism, SA. There is a bunch of layout tables that define the attributes and their offsets, and those are stored ZAP-like in several system objects. The order of the attributes may differ from pool to pool, therefore those system objects have to be parsed and the tables analysed. The objects can be seen in your output from the other issue:

0:[SA master node] ...
1:[ZFS delete queue] ...
2:[ZFS directory] ...
3:[SA attr registration] ...
4:[SA attr layouts] ...

The SA master node (judging from the hex dump, although I haven't really decompressed the embedded data) appears to be a MicroZAP that holds the object IDs of the SA attribute registration and the SA attribute layouts objects.

The attributes in the bonus buffer itself are prefixed with a sa_hdr_phys. The index of the layout used is contained in the sa_layout_info field. The sa_impl.h header is very helpful.
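
For illustration, here is a minimal sketch of decoding that prefix, assuming the field layout of sa_hdr_phys_t and the SA_HDR_LAYOUT_NUM/SA_HDR_SIZE macros from sa_impl.h; parse_sa_hdr is a made-up helper, not code from this repo:

    import struct

    SA_MAGIC = 0x2F505A  # from sa_impl.h

    def parse_sa_hdr(bonus):
        # sa_hdr_phys_t starts with a 32-bit magic and a 16-bit sa_layout_info
        sa_magic, sa_layout_info = struct.unpack_from("<IH", bonus, 0)
        if sa_magic != SA_MAGIC:
            raise ValueError("not an SA bonus buffer")
        layout_num = sa_layout_info & 0x3FF              # index into "SA attr layouts"
        hdr_size = ((sa_layout_info >> 10) & 0x3F) * 8   # header length in bytes
        # lengths of variable-size attributes follow sa_layout_info inside the
        # header; the attribute data itself starts right after the header
        return layout_num, hdr_size, bonus[hdr_size:]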

@eiselekd
Collaborator

eiselekd commented Mar 28, 2018

@hiliev I think you are right, the SA master node (index 32 actually) contains:

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
        32    1   128K    512      0     512    512  100.00  SA master node (K=inherit) (Z=inherit)
	dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED 
	dnode maxblkid: 0
	microzap: 512 bytes, 2 entries

		REGISTRY = 35 
		LAYOUTS = 36

May I ask a question: in my test pool datapool I have created a dataset with zfs create datapool/datadir, which is where my actual target file, test.bin, is located. Now I look at zdb -ddddddd datapool and try to see how this dataset is referenced starting from the MOS, but there is so much information that I cannot make out a structure.

One thing I noticed is that py-zfs-rescue collects the top-level MOS dnodes with type 16 as the target datasets to archive. The root dataset "datapool" seems to be in this set (it says there is data in it, although there are no objects inside it), but the child dataset "datapool/datadir" is not. How is the child-dataset traversal done when starting from the MOS?
Even more confusing, the search for type 16 returns 3 datasets in the MOS, of which 2 state "0 uncompressed bytes". The dataset information in the zdb dump, on the other hand, lists the hierarchical datasets that are present... Can you recommend some reading to understand how the whole structure is traversed?

@hiliev
Owner Author

hiliev commented Mar 28, 2018

I never really looked into how parent-child relationships are implemented. In my case, the MOS was broken and the root dataset was lost. I was happy to just be able to find all accessible datasets and rescue their content.

@eiselekd
Collaborator

@hiliev I have a question:

self._asize = (1 + (qword0 & 0xffffff)) << 9
does a +1 in the asize calculation. Is the +1 a safeguard?

@eiselekd
Collaborator

@hiliev The child dataset dependency seems to be retrieved as follows (rough sketch after the list):

  • DSLdataset.ds_dir_obj points to the DSLdirectory
  • DSLdirectory.child_dir_zapobj points to a ZAP holding a list of child-directory name/ID pairs, each pointing to a child DSLdirectory
  • DSLdirectory.head_dataset_obj of the child's DSLdirectory points to its dataset
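
As a rough sketch of that traversal (mos.get_object, read_zap and the attribute names on the wrappers are placeholders, not the actual py-zfs-rescue API):

    def walk_child_datasets(mos, ds_obj_id, path=""):
        ds = mos.get_object(ds_obj_id)        # dsl_dataset_phys_t
        dd = mos.get_object(ds.ds_dir_obj)    # its dsl_dir_phys_t
        yield path, ds
        # child_dir_zapobj is a ZAP of child-directory name -> object id pairs
        for name, child_dir_id in read_zap(mos.get_object(dd.child_dir_zapobj)):
            child_dir = mos.get_object(child_dir_id)
            # the child directory's head_dataset_obj is the child dataset itself
            yield from walk_child_datasets(mos, child_dir.head_dataset_obj,
                                           path + "/" + name)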

@eiselekd
Collaborator

@hiliev: pushed PR #6 for this.
Maybe you can close this issue now...

@hiliev
Owner Author

hiliev commented Mar 30, 2018

Let me test it on the pool of my server first. As for the _asize value, ZFS stores certain non-zero values in a biased format, i.e. as an offset from the minimum value, in that particular case equal to 1.
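
For reference, this biased encoding is what the BF64_GET_SB macro in the OpenZFS spa.h header expresses; a rough Python rendering (not code from this repo) would be:

    def bf64_get_sb(x, low, length, shift, bias):
        # extract a bit field and rescale it as (raw + bias) << shift
        raw = (x >> low) & ((1 << length) - 1)
        return (raw + bias) << shift

    # e.g. the logical size in the blk_prop word is stored with bias 1 in
    # 512-byte sectors, so a raw field of 0 decodes to one sector, never zero:
    # lsize = bf64_get_sb(blk_prop, 0, 16, 9, 1)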

@eiselekd
Collaborator

@hiliev: Just want to note that I have succeeded in retrieving my files now. I want to thank you for the py-zfs-rescue repo and the hints you gave. The unsorted patches are at https://github.com/eiselekd/dumpbin-py-zfs-rescue; maybe someone will find them useful in the future.

@eiselekd
Collaborator

eiselekd commented Apr 16, 2018

@hiliev: I also pushed https://github.com/eiselekd/dumpbin-py-zfs-rescue/blob/master/zfs/sa.py#L59 and https://github.com/eiselekd/dumpbin-py-zfs-rescue/blob/master/zfs/dnode.py#L194, which implement more complete handling of system attributes and bonus type 0x2c. With them, symlinks are also handled. Are you interested in getting a PR?

@hiliev
Owner Author

hiliev commented Apr 16, 2018

Sorry, I'm currently moving to a different country, my FreeNAS system is offline and in storage, and I'm very slow at testing and accepting PRs. I'll be able to work on it again in about a month.

@eiselekd
Collaborator

@hiliev: OK, I understand. If you have time, let me know and I will supply PRs. There is one error that you might be interested in: https://github.com/eiselekd/dumpbin-py-zfs-rescue/blob/d21f4c28acee0d26ab3ba227fc7d8b03881dffd8/zfs/blocktree.py#L85 In the original repo the level cache is a flat array that is shared between levels; I changed it to be a tree instead.
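
The idea of the change, as an illustration only (not the actual code in blocktree.py): cache fetched indirect blocks per (level, block id) instead of in one flat list shared by all levels, so entries from one level can no longer overwrite another level's entries.

    class LevelCache:
        def __init__(self):
            self._blocks = {}

        def get(self, level, blkid, fetch):
            # fetch(level, blkid) reads and decompresses the indirect block on a miss
            key = (level, blkid)
            if key not in self._blocks:
                self._blocks[key] = fetch(level, blkid)
            return self._blocks[key]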

@eiselekd
Collaborator

Hi again, if you are interested and have time now, I can supply patches. (Since py-zfs-rescue enabled me to restore my data, I thought I should contribute back.) Tell me which area you want to address first.

@hiliev
Owner Author

hiliev commented Jul 17, 2018

Hi @eiselekd, I'm glad my little project helped you recover your data. I had great plans for it and still have a backlog of todos geared towards making it more user friendly, in particular turning it into a visual ZFS debugger and explorer. Unfortunately, working at a startup company in a completely different field leaves me with zero spare time for this project. If you are willing to take over the CLI branch and develop it further, please feel free to do so. The areas that need attention are perhaps adding a proper command-line interface, pool scrub functionality, and support for raidz with higher parity (e.g., raidz2). If you wish, I can also make you a project collaborator, so you don't have to fork a separate version.

@eiselekd
Collaborator

@hiliev You can add me as a collaborator and maybe give me access to a special branch that I can hack around in. I could transfer the improvements from https://github.com/eiselekd/dumpbin-py-zfs-rescue back to your repo:

  • lz4 decompression (already pulled)
  • fletcher4 cksum
  • first level child datasets
  • blkptr with embedded data (already pulled)
  • improved block server protocol
  • bigger than 2TB disk support
  • support SystemAttributes, bonus type 0x2c (partially pulled)
  • variable asize (already pulled)
  • fuse (llfuse) interface for recovery

I could also contribute:
  • linux losetup or similar based testing
  • add a command line interface as you mentioned to make the configuration interactive

@hiliev
Owner Author

hiliev commented Jul 17, 2018

I sent you an invitation to become a collaborator. It gives you push access and you should be able to create branches on your own. When I find the time, I'll hack on the GUI stuff in a separate branch too.

@eiselekd
Collaborator

Accepted, thanks.

@eiselekd
Collaborator

eiselekd commented Jul 18, 2018

@hiliev: Added pull request #12, which adds (from the list above):

  • fletcher4 cksum (please pull)
  • first level child datasets (was already pulled)
  • improved block server protocol (please pull)
  • bigger than 2TB disk support (please pull)
  • support SystemAttributes, bonus type 0x2c (please pull)
  • linux losetup or similar based testing (please pull)

@hiliev
Owner Author

hiliev commented Aug 1, 2018

Do I have to accept the pull request explicitly, or do your commit rights allow you to merge it yourself?

@eiselekd
Collaborator

eiselekd commented Aug 1, 2018

I didn't try to push it myself. Also, even though I tested the code on Linux (subfolder test/Makefile), I didn't test it with disks from FreeNAS. I have been setting up a home NAS recently (FreeNAS in a KVM with a SATA controller card passed through), but I find it a bit hard to work with because /usr/ports is disabled, so I cannot work FreeBSD-style with it except within jails, which I'm not familiar with. I didn't find any description of how to re-enable /usr/ports in FreeNAS. I could run plain FreeBSD, but then I'm not sure what the delta to FreeNAS is.

@hiliev
Owner Author

hiliev commented Aug 2, 2018

FreeNAS is based on FreeBSD-STABLE kernels and the ZFS code should be the same as in vanilla FreeBSD. My FreeNAS box is back online and I'll be able to test the code.

@eiselekd
Collaborator

eiselekd commented Aug 2, 2018

I can also try it out on a FreeBSD box over the weekend.

@eiselekd
Collaborator

eiselekd commented Aug 3, 2018

I tested on FreeBSD 11.2 with mdconfig and zpool create datapool0 raidz /dev/${md0} /dev/${md1} /dev/${md2} and was able to read files back. zfs create datapool0/datadir (child datasets), on the other hand, seems to be handled differently on FreeBSD: the child dataset is found in the ZAP, but in the child-dataset code in zfs_rescue.py, child = mos[v] returns None. So child datasets only work for pools created on Linux.
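
A hypothetical guard for this case (mos, the child ZAP and the names below are stand-ins, not the actual code in zfs_rescue.py):

    def archive_child_datasets(mos, child_zap):
        for name, v in child_zap.items():
            child = mos[v]
            if child is None:
                # on FreeBSD-created pools the object id from the child ZAP
                # does not resolve here, so skip instead of crashing
                print("[-] child dataset %s (obj %d) not found in MOS, skipping" % (name, v))
                continue
            yield name, child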

@eiselekd
Collaborator

eiselekd commented Aug 3, 2018

Conclusion from my side: OK to push, but create an issue to implement child datasets on FreeBSD.

@hiliev
Owner Author

hiliev commented Aug 6, 2018

That's strange. The ZFS implementation in FreeBSD should be the one closest to the reference implementation in OpenSolaris, as it borrows most of the code directly. Perhaps Linux is the one that handles child datasets differently. It means that there are ZFS flavours, and the code should somehow be able to detect the flavour or receive it, e.g., via a command-line argument.

In any case, I'm fine with merging and creating a separate issue for ZFS on FreeBSD.

@eiselekd
Collaborator

eiselekd commented Aug 7, 2018

Attr tables and embedded data are handled

eiselekd closed this as completed on Aug 7, 2018