Adding a program to easily do small targeted metadata searches #18

Open
knorrie opened this issue Mar 15, 2019 · 12 comments

Owner

@knorrie knorrie commented Mar 15, 2019

After moving a bunch of interesting example programs into the bin/ location I'm looking at the rest of the random pile of code in examples/

What about a tool that allows searching any metadata you want? And what if it already has a bunch of presets that select ranges from trees automatically?

btrfs-search-metadata [-l|--long] <type> [<moar> ..] <path>

So, e.g.:

  • btrfs-search-metadata chunks /mountpoint -> show list of chunks
  • btrfs-search-metadata blockgroups /mountpoint -> show list of block groups
  • btrfs-search-metadata inode /path/to/file -> show everything related to directory or file inode
  • btrfs-search-metadata devices /mountpoint -> show list of devices with their info
  • btrfs-search-metadata orphans /mountpoint -> show list of orphaned stuff, like subvolume roots that the cleaner is working on
  • btrfs-search-metadata roots /mountpoint -> show list of tree roots present with some info
  • btrfs-search-metadata tree X [--min '(A B C)'] [--max '(D E F)'] /mountpoint -> dump any range of metadata objects from any tree on the screen, with keys '(A B C)' and '(D E F)' as min/max search key. So e.g. '(31337 0 0)' and '(80085 -1 -1)' would show all metadata objects in that range. When no min or max is given, just dump the entire tree X.
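
To make the min/max semantics of the last preset concrete: btrfs tree keys are ordered as (objectid, type, offset) triples, so selecting a range amounts to a tuple comparison. The sketch below is plain Python and does not touch a filesystem. The helper names and the sample keys are made up for illustration, and it assumes that -1 stands for the largest value a field can hold (u64 for objectid and offset, u8 for type).

```python
U64_MAX = (1 << 64) - 1  # largest objectid / offset
U8_MAX = (1 << 8) - 1    # largest key type


def normalize_key(objectid, key_type, offset):
    """Replace -1 by the maximum of the field (assumed convention)."""
    return (
        U64_MAX if objectid == -1 else objectid,
        U8_MAX if key_type == -1 else key_type,
        U64_MAX if offset == -1 else offset,
    )


def keys_in_range(keys, min_key=(0, 0, 0), max_key=(-1, -1, -1)):
    """Return the keys that sort between min_key and max_key, inclusive."""
    lo = normalize_key(*min_key)
    hi = normalize_key(*max_key)
    return [key for key in keys if lo <= key <= hi]


# Hypothetical keys; type 1 is INODE_ITEM and type 108 is EXTENT_DATA.
sample_keys = [
    (256, 1, 0),
    (31337, 1, 0),
    (31337, 108, 4096),
    (80085, 108, 0),
    (80086, 1, 0),
]

selected = keys_in_range(sample_keys, (31337, 0, 0), (80085, -1, -1))
for key in selected:
    print(key)
```

Leaving out both bounds selects everything, which matches the "just dump the entire tree" case.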

All these things can be done in just a few lines of code using lib functionality. \o/

Doing things like the whole if/else tree inside examples/show_directory_contents.py is made easier by just using the new 'recursive object printer using str()' that's in develop branch now. The -l option for more elaborate output would just switch to the object pretty printer instead, which recursively prints all attributes of all nested objects that it gets fed.

Feedback / more thoughts anyone? This will remove another 80% of crap from the examples/ and when it's in distro packages it will be available to answer a lot of questions that people keep asking on the mailing list and on IRC. "How can I see which extents this file has and if they're compressed or not?" "What does my list of blockgroups look like?" etc.


@dim-geo dim-geo commented Jul 31, 2019

Hi,
Yes, that would be very useful.
It could even replace the need for a tutorial.
For example, in my project I would like to enumerate the extents of a subvolume in order to calculate the unique size of each subvolume.
So a metadata utility like that would help.

Owner Author

@knorrie knorrie commented Jul 31, 2019

No, the whole purpose of this library is to make it possible to write small targeted pieces of code that can give you what you want, INSTEAD of having another tool do it and you parsing back the output!

Easiest way to get all of them is to just dump the tree and filter it:

import btrfs

tree = 1234  # placeholder: id of the subvolume tree to search
with btrfs.FileSystem('/') as fs:
    # Walk every item in the tree and keep only the file extent items.
    for header, data in btrfs.ioctl.search_v2(fs.fd, tree):
        if header.type == btrfs.ctree.EXTENT_DATA_KEY:
            item = btrfs.ctree.FileExtentItem(header, data)
            # do something with it
            print(item)

Now you can do something with the attributes of all of those objects:
https://python-btrfs.readthedocs.io/en/stable/btrfs.html#btrfs.ctree.FileExtentItem


@dim-geo dim-geo commented Aug 1, 2019

Owner Author

@knorrie knorrie commented Aug 1, 2019

Ah, you meant that it could replace a tutorial because you can look at the code as an example and do something different based on it.

Yes, great, indeed. It would be a better organized version of a bunch of examples "in a box" than the current collection in examples/. Thanks for your +1!

Owner Author

@knorrie knorrie commented Aug 25, 2019

I started working on this. It's happening in the develop branch.

Owner Author

@knorrie knorrie commented Aug 25, 2019

Ok, I did some more stuff in the evening and the program can now do most of what was written in the initial idea. As promised, it's all very boring. Most of it is parsing command line options, and then just connecting a search query to a pretty printer and having it do its thing, streaming objects from left to right.

It's still a work in progress, and it also needs a decent man page. Next thing to do first is writing a parser for the '(objectid type offset)' strings for min_key and max_key.
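
For illustration, a parser for such key strings could look roughly like the sketch below. This is not the code from the develop branch: the function name and the regular expression are made up, it only accepts numeric fields (no symbolic type names), and it assumes that -1 means "maximum value of this field".

```python
import re

U64_MAX = (1 << 64) - 1  # largest objectid / offset
U8_MAX = (1 << 8) - 1    # largest key type

# Three integers inside parentheses, separated by spaces and/or commas.
_KEY_RE = re.compile(r"^\(\s*(-?\d+)[\s,]+(-?\d+)[\s,]+(-?\d+)\s*\)$")


def parse_key(text):
    """Turn '(objectid type offset)' into a tuple of three integers."""
    match = _KEY_RE.match(text.strip())
    if match is None:
        raise ValueError("expected '(objectid type offset)', got %r" % text)
    values = [int(group) for group in match.groups()]
    limits = (U64_MAX, U8_MAX, U64_MAX)
    key = []
    for value, limit in zip(values, limits):
        if value == -1:
            value = limit  # assumed shorthand for the field maximum
        if not 0 <= value <= limit:
            raise ValueError("field value %d out of range in %r" % (value, text))
        key.append(value)
    return tuple(key)


print(parse_key("(31337 0 0)"))
print(parse_key("(80085 -1 -1)"))
```

Returning a plain tuple keeps the result directly comparable with other keys; a real implementation would probably hand the three numbers to the library's own key type instead.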

But, in the meantime, let me know what you think of it!

Owner Author

@knorrie knorrie commented Aug 25, 2019

When confused, just start with the 'inode' preset search option, e.g.

btrfs-search-metadata inode /bin/bash
btrfs-search-metadata inode /

Owner Author

@knorrie knorrie commented Apr 18, 2020

The develop branch now has a btrfs-search-metadata that can also take min-key and max-key arguments for dump, to show a range of metadata items.

Now I'm looking at the examples/ collection to see which of them can be removed.

Owner Author

@knorrie knorrie commented Apr 18, 2020

So, TODO:

  • a sub-command that shows inode info by specifying <subvolid> <inode> <mountpoint> instead of the current inode thing that shows info about the file you point it to
  • when using the current ./show_file.py example, it prints a line like "filename ./show_tree_keys.py tree 1015 inum 130985", which is interesting,
  • for dump, add the option to only print tree keys (this will replace show_extent_tree_keys.py, show_chunk_tree_keys.py, show_tree_keys.py)
  • something to do show_block_group_contents.py would be nice
  • show_orphaned_subvols.py also is not in there yet
  • show_subvolumes.py hmmm, what to do about that?

And, decide what the cut-off is between 'more can be done' and 'enough for now'


@Zygo Zygo commented Apr 19, 2020

something to do show_block_group_contents.py would be nice

That is one of my favorites, especially with the current balance looping kernel bugs. ;)

Are there sorting options? "List all block groups not using device 4 in ascending usage order" is one of my most common use cases for RAID reshapes where device 4 is now bigger.

A 'pwd' that finds the path from fs_root to a file by looking at backrefs might be useful, but IIUC it can't be done entirely with TREE_SEARCH.

Owner Author

@knorrie knorrie commented Apr 19, 2020

Yeah, bg contents should be added, it's just a slice from min_key (bg_vaddr, 0, 0) to max_key (bg_vaddr + bg_length - 1, -1, -1) from the extent tree.
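
As a small worked example of that slice: items in the extent tree are keyed by virtual address, so the bounds follow directly from the start address and length of the block group. The function name and the 13 GiB / 1 GiB numbers below are invented for illustration, and -1 is written out as the maximum value of the field.

```python
U64_MAX = (1 << 64) - 1  # largest offset
U8_MAX = (1 << 8) - 1    # largest key type


def block_group_key_range(bg_vaddr, bg_length):
    """Return (min_key, max_key) covering everything inside a block group."""
    min_key = (bg_vaddr, 0, 0)
    # The last byte that still belongs to the block group, with type and
    # offset at their maximum so no item at that address is cut off.
    max_key = (bg_vaddr + bg_length - 1, U8_MAX, U64_MAX)
    return min_key, max_key


GiB = 1 << 30
# Hypothetical block group: starts at 13 GiB and is 1 GiB long.
min_key, max_key = block_group_key_range(13 * GiB, 1 * GiB)
print(min_key)
print(max_key)
```

The `- 1` matters: without it, the range would also pick up items that start at the first byte of the next block group.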

Sorting options? No. The first version of this thing will just show info about preset slices of metadata items. Then I'm doing the python-btrfs v12 release, and after that I'm going to work on the tutorial documentation again. That should result in some crash course docs that help you write a 10-line Python program that can answer "which block groups are not using devid 4 and do not have any metadata in RAID1 on them". It's much better to help you play with it on that level than to build a complete monster of cli options that will never suffice.
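
A rough idea of what such a small program could look like, for the "block groups not using device 4, in ascending usage order" question from above. This is a sketch, not tutorial material: the selection logic is kept in a plain function so it can be tried with stand-in objects, and `load_pairs` reflects my understanding of the python-btrfs API (`fs.chunks()`, `fs.block_group()`, `stripe.devid`, `block_group.used`), which should be checked against the library documentation. All other names are hypothetical.

```python
from collections import namedtuple


def block_groups_avoiding_device(pairs, devid):
    """Block groups with no stripe on devid, least used first.

    pairs is an iterable of (chunk, block_group) tuples.
    """
    selected = [
        block_group
        for chunk, block_group in pairs
        if all(stripe.devid != devid for stripe in chunk.stripes)
    ]
    return sorted(selected, key=lambda block_group: block_group.used)


def load_pairs(path):
    """Collect (chunk, block_group) pairs from a mounted filesystem.

    Assumed python-btrfs usage; needs the library, root and a btrfs mount.
    """
    import btrfs
    with btrfs.FileSystem(path) as fs:
        return [(chunk, fs.block_group(chunk.vaddr, chunk.length))
                for chunk in fs.chunks()]


# Stand-in objects so the selection can be tried without a filesystem.
Stripe = namedtuple("Stripe", "devid")
Chunk = namedtuple("Chunk", "vaddr length stripes")
BlockGroup = namedtuple("BlockGroup", "vaddr used")

GiB = 1 << 30
demo_pairs = [
    (Chunk(0, GiB, [Stripe(1), Stripe(4)]), BlockGroup(0, 900)),
    (Chunk(GiB, GiB, [Stripe(1), Stripe(2)]), BlockGroup(GiB, 700)),
    (Chunk(2 * GiB, GiB, [Stripe(2), Stripe(3)]), BlockGroup(2 * GiB, 100)),
]

result = block_groups_avoiding_device(demo_pairs, devid=4)
for block_group in result:
    print(block_group.vaddr, block_group.used)
```

On a real filesystem the call would be something like `block_groups_avoiding_device(load_pairs('/mountpoint'), 4)`.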

So, search-metadata should just show search slices, and not contain a lot of magic post-processing.

The pwd thing is interesting, I think I didn't try doing that yet, but it will be out of scope for this tool that has "easily do small targeted metadata searches" as goal.

Owner Author

@knorrie knorrie commented Jun 30, 2020

Ok, almost there (in develop branch).

What's left to do:

  • Writing the man page, which has a short explanation about all preset options, and an example for each of them.
  • (Not sure yet): Add some option to resolve file names for block group contents, using logical_ino_v2. This is not purely a metadata search, but I suspect the feature request would arrive pretty soon, and another example script could be deleted. Should it show file names for extents for a block group, or should it just have a shortcut option to show filenames for a single extent vaddr?

Please test and let me know what you think, right now. Especially things which raise eyebrows because they're not very intuitive etc. Also test whatever weird pretty string representation you can find for btrfs tree keys as input for the dump option for min-key or max-key.
