Skip to content

Conversation

@Schamper
Copy link
Member

@Schamper Schamper commented Aug 12, 2025

Some additional improvements on top of #52. Also add unit tests and closes #54.

Introduces some breaking changes that will need to be addressed in dissect.target (fox-it/dissect.target#1160).

@Schamper Schamper force-pushed the qcow2-improvements branch from 78c3fbe to a3327fd Compare August 12, 2025 10:36
@codecov
Copy link

codecov bot commented Aug 12, 2025

Codecov Report

❌ Patch coverage is 86.86131% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.80%. Comparing base (f9bdad9) to head (9037794).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
dissect/hypervisor/disk/qcow2.py 86.66% 18 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main      #61       +/-   ##
===========================================
+ Coverage   66.06%   76.80%   +10.73%     
===========================================
  Files          24       24               
  Lines        2066     2100       +34     
===========================================
+ Hits         1365     1613      +248     
+ Misses        701      487      -214     
Flag Coverage Δ
unittests 76.80% <86.86%> (+10.73%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

allow_no_backing_file: bool = False,
):
self.fh = fh.open("rb") if isinstance(fh, Path) else fh
self.fh.seek(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the __init__ method a bit hard to follow.

I suggest something like:

def __init__(
        self,
        fh: BinaryIO,
        data_file: BinaryIO | None = None,
        backing_file: BinaryIO | None = None,
        *,
        allow_no_data_file: bool = False,
        allow_no_backing_file: bool = False,
        # The original path is passed to resolve relative data/backing files
        original_path: Path | None = None,
    ):
        self.fh = fh
        self.original_path = original_path
        self.fh.seek(0)

        self.header = c_qcow2.QCowHeader(self.fh)
        self._validate_header()

        self._initialize_properties()

        self._read_extensions()

        self.data_file = self._setup_data_file(data_file, allow_no_data_file)
        self.backing_file = self._setup_backing_file(backing_file, allow_no_backing_file)

        self.l2_table = lru_cache(128)(self.l2_table)

    @classmethod
    def from_path(
        cls,
        path: str | Path,
        data_file: BinaryIO | None = None,
        backing_file: BinaryIO | None = None,
        *,
        allow_no_data_file: bool = False,
        allow_no_backing_file: bool = False,
    ) -> QCow2:
        """Create a QCow2 instance from a file path."""
        path = Path(path)
        fh = path.open("rb")
        return cls(
            fh,
            data_file,
            backing_file,
            allow_no_data_file=allow_no_data_file,
            allow_no_backing_file=allow_no_backing_file,
            original_path=path,
        )

Copy link
Member Author

@Schamper Schamper Aug 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the opening of the data and backing file out, but I'm in general not a fan of "obscuring" the header validation or initialization of important properties. I find it harder to keep mental track of what happens or is initialized where with it hidden in "magic setter and non-reusable methods".

With regards to the path thing, it's not super ideal but I don't find your proposed solution to be the best either. The way it's currently added is at least consistent with e.g. VMDK or VHDX (and part of the "it just works Dissect API"). I propose we defer changing that until we think of something better. I did slightly change it to make it a bit easier to understand and follow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the opening of the data and backing file out, but I'm in general not a fan of "obscuring" the header validation or initialization of important properties. I find it harder to keep mental track of what happens or is initialized where with it hidden in "magic setter and non-reusable methods".

That is fair. I would still go all the way though, because we are now mixing levels of abstraction in the __init__ method, where the high-level abstractions are calls to open_data_file and open_backing_file, and the low level abstractions like self.csize_shift = 62 - (self.cluster_bits - 8), which causes mental fatigue in my brain.
That being said, I think we found a middle ground

With regards to the path thing, it's not super ideal but I don't find your proposed solution to be the best either. The way it's currently added is at least consistent with e.g. VMDK or VHDX (and part of the "it just works Dissect API"). I propose we defer changing that until we think of something better. I did slightly change it to make it a bit easier to understand and follow.

I was going to reply with "constructors are not part of the interface of a class", but I guess in Python they actually are.
In any case, I only see explicit solutions now, like the builder pattern, but that contrasts even more with the implicit "It just works Dissect API".
I am fine with deferring.

Supports both data-file and backing-file, but must be manually given as arguments.
If a data-file is required and ``fh`` is not a ``Path``, it's required to manually pass a file like object
in the `data_file` argument. Otherwise, the data file will be automatically opened if it exists in the same directory.
It's possible to defer opening the data file by passing ``allow_no_data_file=True``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the usecase of deferring?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not used yet, but in dissect.target we may want to defer loading it in case the backing file is not a QCOW2 image (which would be the only thing we can support within this code right now), and load it instead using container.open.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well perhaps we could remove the functionality for now but I am fine either way.

@Schamper Schamper requested a review from twiggler August 15, 2025 08:16
Copy link
Contributor

@twiggler twiggler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved (added 2 replies)

allow_no_backing_file: bool = False,
):
self.fh = fh.open("rb") if isinstance(fh, Path) else fh
self.fh.seek(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the opening of the data and backing file out, but I'm in general not a fan of "obscuring" the header validation or initialization of important properties. I find it harder to keep mental track of what happens or is initialized where with it hidden in "magic setter and non-reusable methods".

That is fair. I would still go all the way though, because we are now mixing levels of abstraction in the __init__ method, where the high-level abstractions are calls to open_data_file and open_backing_file, and the low level abstractions like self.csize_shift = 62 - (self.cluster_bits - 8), which causes mental fatigue in my brain.
That being said, I think we found a middle ground

With regards to the path thing, it's not super ideal but I don't find your proposed solution to be the best either. The way it's currently added is at least consistent with e.g. VMDK or VHDX (and part of the "it just works Dissect API"). I propose we defer changing that until we think of something better. I did slightly change it to make it a bit easier to understand and follow.

I was going to reply with "constructors are not part of the interface of a class", but I guess in Python they actually are.
In any case, I only see explicit solutions now, like the builder pattern, but that contrasts even more with the implicit "It just works Dissect API".
I am fine with deferring.

Supports both data-file and backing-file, but must be manually given as arguments.
If a data-file is required and ``fh`` is not a ``Path``, it's required to manually pass a file like object
in the `data_file` argument. Otherwise, the data file will be automatically opened if it exists in the same directory.
It's possible to defer opening the data file by passing ``allow_no_data_file=True``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well perhaps we could remove the functionality for now but I am fine either way.

twiggler pushed a commit to andreia-oca/dissect.target that referenced this pull request Aug 18, 2025
@Schamper Schamper merged commit 2e2d0aa into main Aug 18, 2025
25 checks passed
@Schamper Schamper deleted the qcow2-improvements branch August 18, 2025 11:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Change QCOW2 implementation to return stream on .open() instead of being a stream

3 participants