
Support for ZPL attribute tables and embedded data #2

Closed
1 of 2 tasks
hiliev opened this issue Mar 27, 2018 · 27 comments

@hiliev
Owner

hiliev commented Mar 27, 2018

In order to be able to use py-zfs-rescue on pools created by modern OSes, the following two enhancements are needed:

  • Support for block pointers with embedded data
  • Support for ZPL attribute tables as an alternative to znode_phys_t (bonus data type 0x2c)
@eiselekd
Collaborator

I'll try to take a look at this and see whether I can make progress in that direction.

@eiselekd
Collaborator

eiselekd commented Mar 28, 2018

@hiliev I've implemented the embedded data in the dnode and detected the System Attribute bonus buffer; however, I'm trying to understand the format of its content. Is it a ZAP-encoded buffer?

Addendum: I think I found it in zdb: dump_znode(objset_t *os, uint64_t object, void *data, size_t size).
I have to first scan the SA master node, then scan the "SA attr layouts" and "SA attr registration" dnodes, then use that layout to scan the SA? Is it really that complicated?

@hiliev
Owner Author

hiliev commented Mar 28, 2018

This seems to be a bit more complicated than expected. ZFS has an attribute registration mechanism, SA. There is a bunch of layout tables that define the attributes and their offsets, and those are stored ZAP-like in several system objects. The order of the attributes may differ from pool to pool, therefore those system objects have to be parsed and the tables analysed. The objects can be seen in your output from the other issue:

0:[SA master node] ...
1:[ZFS delete queue] ...
2:[ZFS directory] ...
3:[SA attr registration] ...
4:[SA attr layouts] ...

The SA master node (judging from the hex dump, although I haven't really decompressed the embedded data) appears to be a MicroZAP that holds the object IDs of the SA attribute registration and the SA attribute layouts objects.

The attributes in the bonus buffer itself are prefixed with a sa_hdr_phys. The index of the layout used is contained in the sa_layout_info field. The sa_impl.h header is very helpful.
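
For illustration, here is a minimal sketch of decoding that prefix, assuming the field layout of sa_hdr_phys_t and the SA_HDR_LAYOUT_NUM/SA_HDR_SIZE macros from sa_impl.h; parse_sa_hdr is a made-up helper, not code from this repo:

    import struct

    SA_MAGIC = 0x2F505A  # from sa_impl.h

    def parse_sa_hdr(bonus):
        # sa_hdr_phys_t starts with a 32-bit magic and a 16-bit sa_layout_info
        sa_magic, sa_layout_info = struct.unpack_from("<IH", bonus, 0)
        if sa_magic != SA_MAGIC:
            raise ValueError("not an SA bonus buffer")
        layout_num = sa_layout_info & 0x3FF              # index into "SA attr layouts"
        hdr_size = ((sa_layout_info >> 10) & 0x3F) * 8   # header length in bytes
        # lengths of variable-size attributes follow sa_layout_info inside the
        # header; the attribute data itself starts right after the header
        return layout_num, hdr_size, bonus[hdr_size:]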

@eiselekd
Collaborator

eiselekd commented Mar 28, 2018

@hiliev I think you are right, the SA master node (index 32 actually) contains:

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
        32    1   128K    512      0     512    512  100.00  SA master node (K=inherit) (Z=inherit)
	dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED 
	dnode maxblkid: 0
	microzap: 512 bytes, 2 entries

		REGISTRY = 35 
		LAYOUTS = 36

May I ask a question: in my test pool datapool I have created a dataset with zfs create datapool/datadir, which is where my actual target file, test.bin, is located. Now I look at zdb -ddddddd datapool and try to see how this dataset is referenced starting from the MOS, but there is so much information that I cannot make out a structure.

One thing I noticed is that py-zfs-rescue collects the top-level MOS dnodes with type 16 as the target datasets to archive. The root dataset "datapool" seems to be in this set (it says there is data in it, although there are no objects inside it), but the child dataset "datapool/datadir" is not. How is the child-dataset traversal done when starting from the MOS?
Even more confusing, the search for type 16 returns 3 datasets in the MOS, of which 2 state "0 uncompressed bytes". The dataset information in the zdb dump, on the other hand, lists the hierarchical datasets that are present... Can you recommend some reading to understand how the whole structure is traversed?

@hiliev
Owner Author

hiliev commented Mar 28, 2018

I never really looked into how parent-child relationships are implemented. In my case, the MOS was broken and the root dataset was lost. I was happy to just be able to find all accessible datasets and rescue their content.

@eiselekd
Collaborator

@hiliev I have a question:

self._asize = (1 + (qword0 & 0xffffff)) << 9
does a +1 in the asize calculation. Is the +1 a safeguard?

@eiselekd
Collaborator

@hiliev The child dataset dependency seems to be retrieved as follows (rough sketch after the list):

  • DSLdataset.ds_dir_obj points to the DSLdirectory
  • DSLdirectory.child_dir_zapobj points to a ZAP holding a list of child-directory name/ID pairs, each pointing to a child DSLdirectory
  • DSLdirectory.head_dataset_obj of the child's DSLdirectory points to its dataset
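
As a rough sketch of that traversal (mos.get_object, read_zap and the attribute names on the wrappers are placeholders, not the actual py-zfs-rescue API):

    def walk_child_datasets(mos, ds_obj_id, path=""):
        ds = mos.get_object(ds_obj_id)        # dsl_dataset_phys_t
        dd = mos.get_object(ds.ds_dir_obj)    # its dsl_dir_phys_t
        yield path, ds
        # child_dir_zapobj is a ZAP of child-directory name -> object id pairs
        for name, child_dir_id in read_zap(mos.get_object(dd.child_dir_zapobj)):
            child_dir = mos.get_object(child_dir_id)
            # the child directory's head_dataset_obj is the child dataset itself
            yield from walk_child_datasets(mos, child_dir.head_dataset_obj,
                                           path + "/" + name)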

@eiselekd
Collaborator

@hiliev: pushed PR #6 for this.
Maybe you can close this issue now...

@hiliev
Owner Author

hiliev commented Mar 30, 2018

Let me test it on the pool of my server first. As for the _asize value, ZFS stores certain non-zero values in a biased format, i.e. as an offset from the minimum value, in that particular case equal to 1.
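
For reference, this biased encoding is what the BF64_GET_SB macro in the OpenZFS spa.h header expresses; a rough Python rendering (not code from this repo) would be:

    def bf64_get_sb(x, low, length, shift, bias):
        # extract a bit field and rescale it as (raw + bias) << shift
        raw = (x >> low) & ((1 << length) - 1)
        return (raw + bias) << shift

    # e.g. the logical size in the blk_prop word is stored with bias 1 in
    # 512-byte sectors, so a raw field of 0 decodes to one sector, never zero:
    # lsize = bf64_get_sb(blk_prop, 0, 16, 9, 1)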

@eiselekd
Collaborator

@hiliev: Just want to note that I have succeeded in retrieving my files now. I want to thank you for the py-zfs-rescue repo and the hints you gave. The unsorted patches are at https://github.com/eiselekd/dumpbin-py-zfs-rescue; maybe someone will find them useful in the future.

@eiselekd
Collaborator

eiselekd commented Apr 16, 2018

@hiliev: I also pushed https://github.com/eiselekd/dumpbin-py-zfs-rescue/blob/master/zfs/sa.py#L59 and https://github.com/eiselekd/dumpbin-py-zfs-rescue/blob/master/zfs/dnode.py#L194, which implement more complete handling of system attributes and bonus type 0x2c. With them, symlinks are also handled. Are you interested in getting a PR?

@hiliev
Owner Author

hiliev commented Apr 16, 2018

Sorry, I'm currently moving to a different country, my FreeNAS system is offline and in storage, and I'm very slow at testing and accepting PRs. I'll be able to work on it again in about a month.

@eiselekd
Collaborator

@hiliev: OK, I understand. If you have time, let me know and I will supply PRs. There is one error that you might be interested in: https://github.com/eiselekd/dumpbin-py-zfs-rescue/blob/d21f4c28acee0d26ab3ba227fc7d8b03881dffd8/zfs/blocktree.py#L85 In the original repo the level cache is a flat array that is shared between levels; I changed it to be a tree instead.
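
The idea of the change, as an illustration only (not the actual code in blocktree.py): cache fetched indirect blocks per (level, block id) instead of in one flat list shared by all levels, so entries from one level can no longer overwrite another level's entries.

    class LevelCache:
        def __init__(self):
            self._blocks = {}

        def get(self, level, blkid, fetch):
            # fetch(level, blkid) reads and decompresses the indirect block on a miss
            key = (level, blkid)
            if key not in self._blocks:
                self._blocks[key] = fetch(level, blkid)
            return self._blocks[key]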

@eiselekd
Collaborator

Hi again, if you are interested and have time now, I can supply patches. (Since py-zfs-rescue enabled me to restore my data, I thought I should contribute back.) Tell me which area you want to address first.

@hiliev
Owner Author

hiliev commented Jul 17, 2018

Hi @eiselekd, I'm glad my little project helped you recover your data. I had great plans for it and still have a backlog of todos geared towards making it more user friendly, in particular turning it into a visual ZFS debugger and explorer. Unfortunately, working at a startup company in a completely different field leaves me with zero spare time for this project. If you are willing to take over the CLI branch and develop it further, please feel free to do so. The areas that need attention are perhaps adding a proper command-line interface, pool scrub functionality, and support for raidz with higher parity (e.g., raidz2). If you wish, I can also make you a project collaborator, so you don't have to fork a separate version.

@eiselekd
Collaborator

@hiliev You can add me as a collaborator and maybe give me access to a special branch that I can hack around in. I could transfer the improvements from https://github.com/eiselekd/dumpbin-py-zfs-rescue back to your repo:

  • lz4 decompression (already pulled)
  • fletcher4 cksum
  • first level child datasets
  • blkptr with embedded data (already pulled)
  • improved block server protocol
  • bigger than 2TB disk support
  • support SystemAttributes, bonus type 0x2c (partially pulled)
  • variable asize (already pulled)
  • fuse (llfuse) interface for recovery

I could also contribute:
  • linux losetup or similar based testing
  • add a command line interface as you mentioned to make the configuration interactive

@hiliev
Owner Author

hiliev commented Jul 17, 2018

I sent you an invitation to become a collaborator. It gives you push access and you should be able to create branches on your own. When I find the time, I'll hack on the GUI stuff in a separate branch too.

@eiselekd
Collaborator

Accepted, thanks.

@eiselekd
Collaborator

eiselekd commented Jul 18, 2018

@hiliev: Added pull request #12, which adds (from the list above):

  • fletcher4 cksum (please pull)
  • first level child datasets (was already pulled)
  • improved block server protocol (please pull)
  • bigger than 2TB disk support (please pull)
  • support SystemAttributes, bonus type 0x2c (please pull)
  • linux losetup or similar based testing (please pull)

@hiliev
Owner Author

hiliev commented Aug 1, 2018

Do I have to accept the pull request explicitly, or do your commit rights allow you to merge it yourself?

@eiselekd
Collaborator

eiselekd commented Aug 1, 2018

I didn't try to push it myself. Also, even though I tested the code on Linux (subfolder test/Makefile), I didn't test it with disks from FreeNAS. I have been setting up a home NAS recently (FreeNAS in a KVM with a SATA controller card passed through), but I find it a bit hard to work with because /usr/ports is disabled, so I cannot work FreeBSD-style with it except within jails, which I'm not familiar with. I didn't find any description of how to re-enable /usr/ports in FreeNAS. I could run plain FreeBSD, but then I'm not sure what the delta to FreeNAS is.

@hiliev
Owner Author

hiliev commented Aug 2, 2018

FreeNAS is based on FreeBSD-STABLE kernels and the ZFS code should be the same as in vanilla FreeBSD. My FreeNAS box is back online and I'll be able to test the code.

@eiselekd
Collaborator

eiselekd commented Aug 2, 2018

I can also try it out on a FreeBSD box over the weekend.

@eiselekd
Collaborator

eiselekd commented Aug 3, 2018

I tested on FreeBSD 11.2 with mdconfig and zpool create datapool0 raidz /dev/${md0} /dev/${md1} /dev/${md2} and was able to read files back. zfs create datapool0/datadir (child datasets), on the other hand, seems to be handled differently on FreeBSD: the child dataset is found in the ZAP, but in the child-dataset code in zfs_rescue.py, child = mos[v] returns None. So child datasets only work for pools created on Linux.
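
A hypothetical guard for this case (mos, the child ZAP and the names below are stand-ins, not the actual code in zfs_rescue.py):

    def archive_child_datasets(mos, child_zap):
        for name, v in child_zap.items():
            child = mos[v]
            if child is None:
                # on FreeBSD-created pools the object id from the child ZAP
                # does not resolve here, so skip instead of crashing
                print("[-] child dataset %s (obj %d) not found in MOS, skipping" % (name, v))
                continue
            yield name, child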

@eiselekd
Collaborator

eiselekd commented Aug 3, 2018

Conclusion from my side: OK to push, but create an issue to implement child datasets on FreeBSD.

@hiliev
Owner Author

hiliev commented Aug 6, 2018

That's strange. The ZFS implementation in FreeBSD should be the one closest to the reference implementation in OpenSolaris, as it borrows most of the code directly. Perhaps Linux is the one that handles child datasets differently. It means that there are ZFS flavours, and the code should somehow be able to detect the flavour or receive it, e.g., via a command-line argument.

In any case, I'm fine with merging and creating a separate issue for ZFS on FreeBSD.

@eiselekd
Collaborator

eiselekd commented Aug 7, 2018

Attr tables and embedded data are handled

eiselekd closed this as completed on Aug 7, 2018