Added support for voidtools everything DB #515

Open
wants to merge 3 commits into
base: main

Contributor

@cobyge cobyge commented Jan 26, 2024

Inspired by #505, I remembered I had some code lying around to parse the database of Voidtools Everything, a tool very similar to mlocate/plocate, but for Windows.

I updated the code and added it to the codebase.
Because Everything is closed source, this parser is based entirely on reverse-engineering the program. I haven't found any reference implementation on the internet to help (AFAIK this is the only parser), so it all rests on my (not too great) reversing skills.
I've tested this on ~10 random database files I had lying around, from multiple computers, and all of them gave exactly the same results as Everything itself (checked by exporting to CSV and comparing md5sums).
It should support any DB created since 2017, and if given a file that fails to parse, I'm willing to add support for earlier versions as well.
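
The CSV comparison described above can be sketched roughly as follows. This is only an illustration of the checking approach, not the script that was actually used; the function names `md5_of_file` and `exports_match` are hypothetical, and it assumes both exports are written with identical column order and line endings.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def exports_match(reference_csv: Path, parser_csv: Path) -> bool:
    """True when both CSV exports have the same MD5 digest."""
    return md5_of_file(reference_csv) == md5_of_file(parser_csv)
```

A byte-level digest only matches when both files are identical, so any difference in quoting or ordering shows up as a mismatch even if the underlying entries agree.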

All comments are mine, written while reversing the code.

This is relatively slow code (it takes 4.5 seconds for a DB with 126828 files).
I have a version written in Rust which is 22 times faster, and if that's something you are interested in, I'm happy to try creating bindings with PyO3.

@cobyge cobyge force-pushed the feature/add-everything-plugin branch 2 times, most recently from eaffcfa to 00f8351 Compare January 27, 2024 10:18
@cobyge cobyge force-pushed the feature/add-everything-plugin branch from 90979c3 to a228c74 Compare January 27, 2024 23:14
Contributor Author

cobyge commented Jan 27, 2024

I've now added support for all filesystem types supported by Everything stable (currently NTFS/ReFS/EFU/Folder), along with tests for each.

When I have some more time I'll add support for more versions (Everything 1.5.0alpha currently uses version 1.7.49 and also supports FAT, network drives, and network indexes).

@Horofic Horofic self-requested a review January 28, 2024 16:12
Contributor

Horofic commented Jan 29, 2024

Really cool PR! Since this is another big one, please give it some time for us to do the review :). Stay tuned!

@Schamper Schamper self-requested a review January 29, 2024 23:00
Contributor

Why is this file not stored in LFS?

Contributor

And if possible, it is a good idea to compress the bigger files

pyproject.toml (outdated, resolved)
dissect/target/helpers/locate/everything.py (outdated, resolved)
dissect/target/helpers/locate/everything.py (outdated, resolved)
class EverythingDBParser:
    def __init__(self, file_handle: IO[bytes]):
        self.fh = file_handle
        magic = self.__parse_magic(self.fh)
Contributor

I see you are currently doing a lot of manual reading in this parser. Can't you convert it to dissect.cstruct definitions? That would make it more readable and consistent with dissect.target.
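
To illustrate the general idea behind this suggestion (declare the layout once, then unpack it in a single call instead of reading field by field), here is a minimal sketch. It uses Python's standard `struct` module rather than dissect.cstruct, and the header layout, the `DEMO` magic, and the names `DemoHeader` and `read_header` are all invented for the example; none of them describe the real Everything database format.

```python
import io
import struct
from typing import IO, NamedTuple


class DemoHeader(NamedTuple):
    """Hypothetical header, NOT the real Everything layout."""

    magic: bytes
    version_major: int
    version_minor: int


# Assumed layout: 4-byte magic followed by two little-endian uint16 values.
DEMO_HEADER = struct.Struct("<4sHH")


def read_header(fh: IO[bytes]) -> DemoHeader:
    """Read and unpack the whole header in one step."""
    raw = fh.read(DEMO_HEADER.size)
    if len(raw) != DEMO_HEADER.size:
        raise EOFError("truncated header")
    return DemoHeader(*DEMO_HEADER.unpack(raw))


header = read_header(io.BytesIO(b"DEMO\x01\x00\x07\x00"))
```

With dissect.cstruct the layout would be written as a C-like struct definition instead of a format string, but the benefit is the same: the field order and sizes live in one declaration.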

dissect/target/helpers/locate/everything.py (outdated, resolved)
elif isinstance(item, EverythingFile):
    typ = EverythingFileRecord
else:
    raise NotImplementedError(f"type {type(item)} is not Recordable")
Contributor

If you do it like this in the plugin, the exception will end the entire plugin run, even if there is still valid data in everything_file.

The same happens even if it still needs to process other paths in self.config.
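
One way to address this concern, sketched under assumptions: catch the error per database file, log it, and move on to the next path. The function `iter_records` and its `parse` and `warn` parameters are hypothetical stand-ins and not part of the plugin or of dissect.target.

```python
from typing import Callable, Iterable, Iterator


def iter_records(
    paths: Iterable[str],
    parse: Callable[[str], Iterable[dict]],
    warn: Callable[[str], None] = print,
) -> Iterator[dict]:
    """Yield records from every path, skipping files that fail to parse."""
    for path in paths:
        try:
            # Records yielded before a failure are kept; only the rest of
            # the failing file is skipped.
            yield from parse(path)
        except (NotImplementedError, ValueError) as exc:
            warn(f"Skipping {path}: {exc}")
            continue
```

The design choice here is to scope the `try` to a single file, so one unsupported item costs at most the remainder of that file rather than the whole run.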

dissect/target/plugins/os/windows/everything.py (outdated, resolved)
Comment on lines 86 to 89
if isinstance(item, EverythingDirectory):
    typ = EverythingDirectoryRecord
elif isinstance(item, EverythingFile):
    typ = EverythingFileRecord
Contributor

Can't you just make an EverythingRecord that adds a file_type field?
The same goes for the distinction between EverythingFile and EverythingDirectory.
Wouldn't an attribute on the previous class suffice?
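
A rough sketch of what this suggestion could look like: one entry type with a `file_type` attribute instead of two classes and an `isinstance` chain. It uses a plain dataclass for illustration; the names `FileType`, `EverythingEntry`, and `to_row` are hypothetical, and the real plugin would presumably use dissect's own record descriptors instead.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FileType(Enum):
    FILE = "file"
    DIRECTORY = "directory"


@dataclass(frozen=True)
class EverythingEntry:
    """Single entry type; file_type replaces separate file/directory classes."""

    path: str
    file_type: FileType
    size: Optional[int] = None


def to_row(entry: EverythingEntry) -> dict:
    """Convert any entry to one row shape, with no isinstance dispatch."""
    return {
        "path": entry.path,
        "file_type": entry.file_type.value,
        "size": entry.size,
    }
```

Because every entry has the same shape, there is also no fallback branch that needs to raise for an unknown type.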

Contributor Author

cobyge commented Mar 2, 2024

Hey, thanks for the review.
I updated the code according to your requests, and I've also added support for a previous version of Everything, in order to show what supporting multiple versions might look like.

The only request I haven't worked on yet is the one about using dissect.cstruct. I'll have to think a bit about how to implement it, because the structs differ between versions.

I'd be happy to hear thoughts about how I handled different versions in the code (I'm not quite happy with the version handling).
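
One possible shape for per-version struct handling, offered only as a sketch: keep a table that maps each database version to its layout, and look the layout up once. The version tuples, field names, and formats below are made up for the example and do not correspond to real Everything versions; `Layout`, `LAYOUTS`, and `parse_entry` are hypothetical names, and the standard `struct` module stands in for dissect.cstruct.

```python
import struct
from typing import NamedTuple, Tuple


class Layout(NamedTuple):
    fmt: struct.Struct
    fields: Tuple[str, ...]


# Made-up versions and layouts: the newer one adds a trailing field.
LAYOUTS = {
    (1, 0): Layout(struct.Struct("<IQ"), ("flags", "size")),
    (1, 1): Layout(struct.Struct("<IQI"), ("flags", "size", "attributes")),
}


def parse_entry(version: Tuple[int, int], raw: bytes) -> dict:
    """Unpack one entry using the layout registered for this version."""
    layout = LAYOUTS.get(version)
    if layout is None:
        raise NotImplementedError(f"unsupported database version {version}")
    return dict(zip(layout.fields, layout.fmt.unpack_from(raw)))
```

Adding a version then means adding one table entry, and the parsing code itself stays free of version checks.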

3 participants