
Python 3 bindings for offline low level btrfs operations

The idea

The btrfs-progs project already contains much of the C code needed to do offline file system operations for btrfs.

By writing bindings from python to C, we can create a duct tape layer which allows, for example:

  • Start an interactive session in which we can "walk around" inside an offline file system to inspect it. Start at a superblock, follow the links to the chunk tree etc...
  • Quickly cobble up scripts to do ad hoc repair actions, in case we already know exactly where the problem is and how it needs to change. (e.g. without having to wait for btrfsck for a long time, or if btrfsck can't repair it)
  • Quickly cobble up scripts to damage a file system (or image) in very specific ways for testing purposes.

The library will be the non-identical twin of the current python-btrfs library which exclusively deals with mounted online file systems.

The intended audience is btrfs developers themselves, not the average end user.

Proof of concept

Where to start? Well, let's build a walking skeleton, and let's start by writing up some lines of code, imagining how we'd like it to look.

The first example that I'd want to create as a proof of concept is a python alternative for the fix-dev-count.c script, which was written as an ad hoc fix for a wrong number of devices field in a superblock, caused by the device remove bug in linux 4.8.

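#!/usr/bin/python3

# Imagined API: the cbtrfs module does not exist yet.
import cbtrfs
import os
import sys

path = sys.argv[1]           # block device or image file
devcount = int(sys.argv[2])  # the number of devices that should be recorded

# O_EXCL: refuse to open a block device that is already in use (e.g. mounted).
bd = cbtrfs.BlockDevice(path, os.O_RDWR | os.O_EXCL)
sb = bd.superblocks()[0]     # first superblock only
sb.num_devices = devcount
sb.update_checksum()         # recompute the checksum over the changed data
sb.write()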
This is just a first idea, as simple as possible. It only changes the value in the first superblock of the device you point it to, just like fix-dev-count.c.
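
To make the PoC a bit more concrete, here is a rough pure-python sketch of what those few lines would have to do at the byte level. It is not the proposed cbtrfs API and not btrfs-progs code, and all function and constant names are made up for illustration. Assumptions: the buffer holds one 4096-byte superblock (the primary copy normally sits at byte offset 65536 on the device), the magic is at offset 0x40, num_devices is a little-endian u64 at offset 0x88, and the file system uses the default crc32c checksum, stored in the first 4 bytes and computed over everything after the 32-byte checksum area. Check these against ctree.h before pointing anything like this at a real device.

```python
import struct

# Assumed on-disk layout, see the lead-in above.
SUPERBLOCK_SIZE = 4096
CSUM_SIZE = 32               # bytes 0..31 are reserved for the checksum
MAGIC_OFFSET = 0x40
MAGIC = b"_BHRfS_M"
NUM_DEVICES_OFFSET = 0x88    # little-endian u64


def _make_crc32c_table():
    """Build the lookup table for CRC-32C (Castagnoli, reflected)."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c(data):
    """Plain table-driven CRC-32C over a bytes-like object."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def set_num_devices(sb, devcount):
    """Patch num_devices in a superblock buffer and refresh its checksum.

    sb is a bytearray holding exactly one superblock. Nothing is written
    to disk here; the caller decides what to do with the buffer.
    """
    if len(sb) != SUPERBLOCK_SIZE:
        raise ValueError("expected a %d byte superblock" % SUPERBLOCK_SIZE)
    if bytes(sb[MAGIC_OFFSET:MAGIC_OFFSET + len(MAGIC)]) != MAGIC:
        raise ValueError("btrfs magic not found, refusing to touch buffer")
    struct.pack_into("<Q", sb, NUM_DEVICES_OFFSET, devcount)
    # Assumes crc32c; only the first 4 bytes of the checksum area are used.
    struct.pack_into("<I", sb, 0, crc32c(bytes(sb[CSUM_SIZE:])))
```

The point of the bindings is precisely that none of this knowledge about offsets and checksums would have to live in python: it is already in the C structs.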

PoC TODO

  • How do I write a python SuperBlock class in C?
  • How do I use btrfs-progs code in there?
  • How do I get the code compiled into a python module?
  • ...

Design: where does the code live, how is it versioned?

While the existing python-btrfs is a separate project implemented purely in python, which can work with any linux kernel version, this project will integrate tightly with the C code of btrfs-progs. Therefore, it makes sense to keep the code in there, so there's a single version number for both. That's why I'm just starting off by working in a clone.

Design: naming?

Currently, I'm using the very boring name cbtrfs. I haven't thought of anything better yet.

Design: convenience vs. control

A very interesting design question for this new library is how to find the right balance between abstracting things away for convenience (e.g. should sb.write() automatically update the checksum) vs. level of control (e.g. deliberately writing invalid data to disk).

Probably there should be different modi operandi (have to find fun names for these):

  1. A mode where you just directly edit stuff on disk. I.e. assisted hexedit with super powers. (like the above superblock example)
  2. A mode in which we can edit values in place and e.g. when editing metadata items we automatically update the checksum and write two copies when it's DUP etc.
  3. A mode in which the full cow machinery starts and we cow writes, do transactions etc...

All of this of course depends on what the btrfs-progs C code can do, and on how easily the needed building blocks can be reached and how well organized they are in that code.
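
The difference between the first two modes could be sketched roughly as follows. This is only an illustration of the design question, not a proposal for the real API: the Mode and Block names are hypothetical, the "block" is just a 4-byte checksum followed by a payload, and zlib.crc32 is used as a stand-in because it ships with python (btrfs itself uses crc32c by default). Mode 3 (full cow and transactions) is left out entirely.

```python
import enum
import io
import zlib


class Mode(enum.Enum):
    RAW = "raw"            # mode 1: write exactly what is in memory
    ASSISTED = "assisted"  # mode 2: fix up derived fields before writing


class Block:
    """Hypothetical block: 4-byte checksum followed by a payload."""

    def __init__(self, payload, mode=Mode.ASSISTED, checksum=zlib.crc32):
        self.payload = bytearray(payload)
        self.mode = mode
        self._checksum = checksum  # stand-in, not the real btrfs checksum
        self.csum = checksum(bytes(self.payload))

    def update_checksum(self):
        self.csum = self._checksum(bytes(self.payload))

    def write(self, fileobj):
        # Convenience: in assisted mode the checksum can never be stale.
        # Control: in raw mode the caller may write invalid data on purpose.
        if self.mode is Mode.ASSISTED:
            self.update_checksum()
        fileobj.write(self.csum.to_bytes(4, "little") + bytes(self.payload))


# Deliberately damaging a block for testing purposes needs raw mode.
out = io.BytesIO()
blk = Block(b"hello", mode=Mode.RAW)
blk.payload[0:1] = b"j"   # change the data ...
blk.write(out)            # ... and write it with the old checksum
```

With a per-object mode like this, the same sb.write() call from the PoC could serve both the repair script and the deliberate-damage script, at the cost of having to know which mode an object is in.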

Design: Python vs. C

  • Functionality in the C code will not be reimplemented in python.
  • The python code will look like python, and it will not look like someone is typing C code in python. There will be objects with functions and properties. There will be exceptions. There won't be just bytearrays, a separate collection of functions, side effect rich programming and integer return values, which need checking all the time.
  • All data is kept in C structures and data buffers. The python objects wrapping the data structures will also be written in C (e.g. the cbtrfs.SuperBlock class). Only when accessing a specific value (e.g. sb.num_devices in the PoC), the value will be taken from the struct, and translated to a python integer object. So, leaving everything in C and duct taping things together should be fast. Actually accessing a lot of data, causing it to be dragged back and forth over the C / python border will be slower.
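
What "looks like python" could mean in practice is sketched below in pure python, as a stand-in for a class that would really be written in C. The data stays in one raw buffer, a value is only converted to a python integer when the property is accessed, and errors are exceptions instead of return codes. The class names and the exception are hypothetical, and the field offsets (generation at 0x48, num_devices at 0x88) are assumptions about the superblock layout that should be checked against ctree.h.

```python
import struct


class SuperBlockError(Exception):
    """Raised instead of handing back an integer error code."""


class U64Field:
    """Descriptor exposing a little-endian u64 inside the owner's buffer.

    Nothing is decoded up front: conversion to a python int happens only
    when the attribute is read, and assignment writes straight back into
    the buffer.
    """

    def __init__(self, offset):
        self.offset = offset

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return struct.unpack_from("<Q", obj._buf, self.offset)[0]

    def __set__(self, obj, value):
        if not 0 <= value < 2 ** 64:
            raise SuperBlockError("value %r does not fit in a u64" % (value,))
        struct.pack_into("<Q", obj._buf, self.offset, value)


class SuperBlock:
    """Thin wrapper; the buffer stands in for the C struct."""

    SIZE = 4096

    generation = U64Field(0x48)    # assumed offset
    num_devices = U64Field(0x88)   # assumed offset

    def __init__(self, buf):
        if len(buf) < self.SIZE:
            raise SuperBlockError("buffer too small for a superblock")
        self._buf = buf
```

Reading sb.num_devices in a loop over thousands of objects would cross the border between the buffer and python objects every time, which is the slow path described above; passing the wrapped object around without looking inside it costs nothing.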

Roadmap

No, just baby steps. Maybe after trying some things we find out that the whole idea is garbage and throw it away.

Also, no timeline or planning. I'm just working on this for fun, in my spare time.
