feat(processing): add multi-volume handler support #564

Merged
merged 5 commits into from May 26, 2023

martonilles

qkaiser commented Apr 26, 2023

Wait, since when do you develop in Rust ?? :)

@qkaiser qkaiser added enhancement New feature or request performance performance improvements tasks labels Apr 26, 2023

@e3krisztian e3krisztian left a comment

It is ready to be merged. I have commented on some minor things that would be better fixed now than lived with (Extractable.extractable_id).

I would also like to get #579 merged first, to have an up to date type checker as soon as possible.

@martonilles martonilles force-pushed the multi-volume branch 3 times, most recently from 5e69bc9 to 3345e0d Compare May 15, 2023 16:56

qkaiser commented May 17, 2023

The whole thing looks good! I guess we could take advantage of these changes later on to allow users to provide a directory rather than a file on the command line by using a DirectoryTask, what do you think?

@e3krisztian e3krisztian force-pushed the multi-volume branch 2 times, most recently from 958a05e to 97052dc Compare May 24, 2023 19:54
@martonilles martonilles changed the title feat(processing): add multi-volume handler support (PoC) feat(processing): add multi-volume handler support May 25, 2023
@martonilles martonilles marked this pull request as ready for review May 25, 2023 12:42
There are certain formats where the content is split between
multiple files. Currently unblob operates under the assumption
that all content resides in a single file.

A few examples where this might be relevant:
- multi-volume archives, such as 7zip, rar, etc.
- VM snapshots
- content + index type formats

This change introduces a DirectoryHandler which can operate
on multiple files residing in one directory, or at least under
one subtree. For most formats there is a "main" file which can
be identified by a file name pattern within the directory.
Starting from this first file, the handler can identify the
other files and return a MultiFile object, similar to
ValidChunks.
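
The lookup described above could be sketched roughly as follows. This is not the code from this PR: `SplitArchiveDirectoryHandler`, `MAIN_FILE_GLOB`, `calculate_multifile`, `find_multifiles` and the `MultiFile` fields are hypothetical names chosen for illustration, and the `.7z.001` naming scheme is an assumed example.

```python
import re
from dataclasses import dataclass, field
from pathlib import Path
from typing import List


@dataclass
class MultiFile:
    """Hypothetical stand-in: a named group of files forming one unit."""

    name: str
    paths: List[Path] = field(default_factory=list)


class SplitArchiveDirectoryHandler:
    """Illustrative handler for volumes named 'data.7z.001', 'data.7z.002', ..."""

    # Glob that matches only the "main" (first) volume inside a directory.
    MAIN_FILE_GLOB = "*.7z.001"

    def calculate_multifile(self, main_file: Path) -> MultiFile:
        # Drop the numeric suffix to get the shared prefix, e.g. "data.7z".
        prefix = main_file.name[: -len(".001")]
        volume_re = re.compile(re.escape(prefix) + r"\.\d{3}")
        # Collect every sibling whose name is the prefix plus a 3-digit suffix.
        paths = sorted(
            p
            for p in main_file.parent.iterdir()
            if p.is_file() and volume_re.fullmatch(p.name)
        )
        return MultiFile(name=prefix, paths=paths)


def find_multifiles(directory: Path, handler) -> List[MultiFile]:
    """Run the handler once for every main file found in the directory."""
    return [
        handler.calculate_multifile(main_file)
        for main_file in sorted(directory.glob(handler.MAIN_FILE_GLOB))
    ]
```

Only the main file is matched by the pattern, so each group of volumes is discovered exactly once, no matter how many volumes it has.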

We do not support a single file being part of multiple
MultiFiles. A file processed & extracted in the context of a
MultiFile is also not processed by traditional handlers. There
is no carving step either: the files are extracted directly
into an extraction directory. The original files are kept and
never deleted, as these are normal files, unlike carved-out
temporary chunks.
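
A minimal sketch of the two rules above, assuming a hypothetical `partition_directory` helper that is not taken from this PR. Raising an error on a file claimed twice is an assumption made for the example; the actual change may handle that conflict differently.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Set, Tuple


def partition_directory(
    files: Iterable[Path], multi_files: Dict[str, List[Path]]
) -> Tuple[Set[Path], List[Path]]:
    """Split files into those claimed by a MultiFile and the remaining ones.

    `multi_files` maps a (hypothetical) MultiFile name to its member paths.
    """
    claimed: Dict[Path, str] = {}
    for name, paths in multi_files.items():
        for path in paths:
            # A file may belong to at most one MultiFile.
            if path in claimed:
                raise ValueError(
                    f"{path} belongs to both {claimed[path]!r} and {name!r}"
                )
            claimed[path] = name
    # Only unclaimed files are left for the traditional, single-file handlers.
    remaining = [f for f in files if f not in claimed]
    return set(claimed), remaining
```

Nothing in this sketch moves or deletes the inputs, which matches the point that the original files are ordinary files and stay in place.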

Files extracted from a MultiFile have that MultiFile as their
parent. This required extending the current File -> Chunk
reporting concept by introducing an abstract Blob type, which
is the parent of both Chunk and MultiFile.
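
The type relationship can be illustrated with a small sketch. The class names `Blob`, `Chunk` and `MultiFile` come from the description above, but every field, the `describe` method and the `ExtractedFile` class are assumptions made for this example, not the definitions in unblob/models.py.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path
from typing import List


class Blob(ABC):
    """Abstract parent of anything that files can be extracted from."""

    @abstractmethod
    def describe(self) -> str:
        ...


@dataclass
class Chunk(Blob):
    # A byte range carved out of a single file.
    source: Path
    start_offset: int
    end_offset: int

    def describe(self) -> str:
        return f"chunk {self.start_offset:#x}-{self.end_offset:#x} of {self.source}"


@dataclass
class MultiFile(Blob):
    # A group of whole files that together form one unit.
    name: str
    paths: List[Path]

    def describe(self) -> str:
        return f"multi-file {self.name} ({len(self.paths)} files)"


@dataclass
class ExtractedFile:
    # The parent can be either kind of Blob, so reporting treats both alike.
    path: Path
    parent: Blob
```

With a shared parent type, code that walks from an extracted file to its origin does not need to know whether that origin was a carved chunk or a set of volumes.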

MultiFileReports are reported under the directory Task, but
contain all included file paths as well.
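
As a rough picture of that report shape, the sketch below attaches a report to a directory-level result and serializes it. `MultiFileReport` is named in the text above, but its fields here, the `DirectoryTaskResult` class and the example paths are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class MultiFileReport:
    # Assumed fields: a group name plus every member file path.
    name: str
    paths: List[str]


@dataclass
class DirectoryTaskResult:
    # Hypothetical container for everything reported for one directory.
    directory: str
    reports: List[MultiFileReport] = field(default_factory=list)


result = DirectoryTaskResult(directory="firmware")
result.reports.append(
    MultiFileReport(
        name="data.7z",
        paths=["firmware/data.7z.001", "firmware/data.7z.002"],
    )
)
print(json.dumps(asdict(result), indent=2))
```

The report hangs off the directory rather than off any single file, while the listed paths still make it possible to trace which files were consumed.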
@martonilles martonilles merged commit e0865e9 into main May 26, 2023
11 checks passed
@martonilles martonilles deleted the multi-volume branch May 26, 2023 08:27