-
Notifications
You must be signed in to change notification settings - Fork 9
Some QCOW2 improvements and unit tests #61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
78c3fbe to
a3327fd
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #61 +/- ##
===========================================
+ Coverage 66.06% 76.80% +10.73%
===========================================
Files 24 24
Lines 2066 2100 +34
===========================================
+ Hits 1365 1613 +248
+ Misses 701 487 -214
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
a3327fd to
45eb564
Compare
| allow_no_backing_file: bool = False, | ||
| ): | ||
| self.fh = fh.open("rb") if isinstance(fh, Path) else fh | ||
| self.fh.seek(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the __init__ method a bit hard to follow.
I suggest something like:
def __init__(
self,
fh: BinaryIO,
data_file: BinaryIO | None = None,
backing_file: BinaryIO | None = None,
*,
allow_no_data_file: bool = False,
allow_no_backing_file: bool = False,
# The original path is passed to resolve relative data/backing files
original_path: Path | None = None,
):
self.fh = fh
self.original_path = original_path
self.fh.seek(0)
self.header = c_qcow2.QCowHeader(self.fh)
self._validate_header()
self._initialize_properties()
self._read_extensions()
self.data_file = self._setup_data_file(data_file, allow_no_data_file)
self.backing_file = self._setup_backing_file(backing_file, allow_no_backing_file)
self.l2_table = lru_cache(128)(self.l2_table)
@classmethod
def from_path(
cls,
path: str | Path,
data_file: BinaryIO | None = None,
backing_file: BinaryIO | None = None,
*,
allow_no_data_file: bool = False,
allow_no_backing_file: bool = False,
) -> QCow2:
"""Create a QCow2 instance from a file path."""
path = Path(path)
fh = path.open("rb")
return cls(
fh,
data_file,
backing_file,
allow_no_data_file=allow_no_data_file,
allow_no_backing_file=allow_no_backing_file,
original_path=path,
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the opening of the data and backing file out, but I'm in general not a fan of "obscuring" the header validation or initialization of important properties. I find it harder to keep mental track of what happens or is initialized where with it hidden in "magic setter and non-reusable methods".
With regards to the path thing, it's not super ideal but I don't find your proposed solution to be the best either. The way it's currently added is at least consistent with e.g. VMDK or VHDX (and part of the "it just works Dissect API"). I propose we defer changing that until we think of something better. I did slightly change it to make it a bit easier to understand and follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the opening of the data and backing file out, but I'm in general not a fan of "obscuring" the header validation or initialization of important properties. I find it harder to keep mental track of what happens or is initialized where with it hidden in "magic setter and non-reusable methods".
That is fair. I would still go all the way though, because we are now mixing levels of abstraction in the __init__ method, where the high-level abstractions are calls to open_data_file and open_backing_file, and the low level abstractions like self.csize_shift = 62 - (self.cluster_bits - 8), which causes mental fatigue in my brain.
That being said, I think we found a middle ground
With regards to the path thing, it's not super ideal but I don't find your proposed solution to be the best either. The way it's currently added is at least consistent with e.g. VMDK or VHDX (and part of the "it just works Dissect API"). I propose we defer changing that until we think of something better. I did slightly change it to make it a bit easier to understand and follow.
I was going to reply with "constructors are not part of the interface of a class", but I guess in Python they actually are.
In any case, I only see explicit solutions now, like the builder pattern, but that contrasts even more with the implicit "It just works Dissect API".
I am fine with deferring.
| Supports both data-file and backing-file, but must be manually given as arguments. | ||
| If a data-file is required and ``fh`` is not a ``Path``, it's required to manually pass a file like object | ||
| in the `data_file` argument. Otherwise, the data file will be automatically opened if it exists in the same directory. | ||
| It's possible to defer opening the data file by passing ``allow_no_data_file=True``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the usecase of deferring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not used yet, but in dissect.target we may want to defer loading it in case the backing file is not a QCOW2 image (which would be the only thing we can support within this code right now), and load it instead using container.open.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well perhaps we could remove the functionality for now but I am fine either way.
twiggler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved (added 2 replies)
| allow_no_backing_file: bool = False, | ||
| ): | ||
| self.fh = fh.open("rb") if isinstance(fh, Path) else fh | ||
| self.fh.seek(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the opening of the data and backing file out, but I'm in general not a fan of "obscuring" the header validation or initialization of important properties. I find it harder to keep mental track of what happens or is initialized where with it hidden in "magic setter and non-reusable methods".
That is fair. I would still go all the way though, because we are now mixing levels of abstraction in the __init__ method, where the high-level abstractions are calls to open_data_file and open_backing_file, and the low level abstractions like self.csize_shift = 62 - (self.cluster_bits - 8), which causes mental fatigue in my brain.
That being said, I think we found a middle ground
With regards to the path thing, it's not super ideal but I don't find your proposed solution to be the best either. The way it's currently added is at least consistent with e.g. VMDK or VHDX (and part of the "it just works Dissect API"). I propose we defer changing that until we think of something better. I did slightly change it to make it a bit easier to understand and follow.
I was going to reply with "constructors are not part of the interface of a class", but I guess in Python they actually are.
In any case, I only see explicit solutions now, like the builder pattern, but that contrasts even more with the implicit "It just works Dissect API".
I am fine with deferring.
| Supports both data-file and backing-file, but must be manually given as arguments. | ||
| If a data-file is required and ``fh`` is not a ``Path``, it's required to manually pass a file like object | ||
| in the `data_file` argument. Otherwise, the data file will be automatically opened if it exists in the same directory. | ||
| It's possible to defer opening the data file by passing ``allow_no_data_file=True``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well perhaps we could remove the functionality for now but I am fine either way.
Some additional improvements on top of #52. Also add unit tests and closes #54.
Introduces some breaking changes that will need to be addressed in dissect.target (fox-it/dissect.target#1160).