Extend --skip-magic option for better matching #647

Closed
7homasSutter opened this issue Aug 30, 2023 · 3 comments · Fixed by #693 or #695
Labels
enhancement New feature or request question Further information is requested

Comments


7homasSutter commented Aug 30, 2023

It would be nice to have a more fine-grained way to select which file types are skipped. The current --skip-magic option lets you specify the types to skip, but it replaces the default list of magics instead of extending it. It would be nice to be able to extend the default list rather than overwrite it. Moreover, matching on a magic prefix is confusing when different file types share the same magic bytes (for example, zip and apk files).

Is your feature request related to a problem? Please describe.
Issue #262 described the exact same problem, and the solution was to add the --skip-magic option. That is a nice way to solve the problem, but I suggest extending the feature.

The problem I'm facing is that I want to extract an Android image without extracting certain file types (e.g., .apk, .ttf). However, the --skip-magic option isn't really user-friendly here, because I would need to pass a --skip-magic parameter for every file type to exclude, plus one for every entry in unblob's default list of magics.

Consider the following example: let's assume I have a .zip file that contains only three files (an .xlsx, an .apk, and a .jar file). We want to extract this archive with unblob without extracting the .apk, .jar, and .xlsx files. As a user, I would expect it to be sufficient to add --skip-magic "APK" --skip-magic "JAR" to skip these file types. However, these two parameters do not seem to match apk and jar files. Moreover, setting any --skip-magic parameter overwrites unblob's default skip-magic list. Thus, unblob extracts all the files, including the .xlsx, which is not what we want.

docker run --platform=linux/amd64 --rm --pull always -v /Volumes/ExtremeSSD/test/output/:/data/output -v /Volumes/ExtremeSSD/test/input/:/data/input ghcr.io/onekey-sec/unblob:latest --skip-magic "APK" --skip-magic "JAR" "/data/input/Test.apk.zip"

To overcome this problem, we have to figure out the correct magic prefix for apk and jar files. We found that the magics "Android" and "Java" actually skip the apk and jar files. However, to avoid losing the defaults and to also skip the .xlsx file, we would need to add another --skip-magic parameter for every default. The list of defaults is quite long, so we would need around 20 --skip-magic parameters to cover them all.

docker run --platform=linux/amd64 --rm --pull always -v /Volumes/ExtremeSSD/test/output/:/data/output -v /Volumes/ExtremeSSD/test/input/:/data/input ghcr.io/onekey-sec/unblob:latest --skip-magic "Android" --skip-magic "Java" --skip-magic "Microsoft Excel" "/data/input/Test.apk.zip"
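
To illustrate why "APK" and "JAR" match nothing while "Android" and "Java" do, here is a minimal Python sketch. It assumes that --skip-magic compares each value against the start of the libmagic description (the "magic prefix" mentioned above). The file names and description strings are illustrative assumptions, since the exact output depends on the libmagic version, and is_skipped and skipped_files are hypothetical helpers, not unblob functions.

```python
# Illustrative libmagic descriptions (assumed; real output varies by version).
EXAMPLE_MAGIC = {
    "app.apk": "Android package (APK)",
    "lib.jar": "Java archive data (JAR)",
    "sheet.xlsx": "Microsoft Excel 2007+",
}


def is_skipped(magic: str, skip_prefixes: list[str]) -> bool:
    """Return True if the description starts with any of the skip prefixes."""
    return any(magic.startswith(prefix) for prefix in skip_prefixes)


def skipped_files(skip_prefixes: list[str]) -> list[str]:
    """List the example files that a given set of prefixes would skip."""
    return sorted(
        name
        for name, magic in EXAMPLE_MAGIC.items()
        if is_skipped(magic, skip_prefixes)
    )


# "APK" and "JAR" appear inside the descriptions, but not at the start.
print(skipped_files(["APK", "JAR"]))
print(skipped_files(["Android", "Java", "Microsoft Excel"]))
```

Under these assumptions, the first call skips nothing and the second one skips all three files.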

I hope the example is understandable.

Describe the solution you'd like
There are two things I would like to suggest to make the --skip-magic parameter more user-friendly:

  1. Add the possibility to extend the default magic list without overwriting it.
  2. Map file extensions within unblob to a magic if it is a known file type. For instance, "APK" = "Android"
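
The first suggestion could look roughly like the following Python sketch. It is only an illustration under stated assumptions: DEFAULT_SKIP_MAGIC is a hypothetical three-entry stand-in for the real (much longer) default list, and effective_skip_magic with its extra and replace parameters is a made-up helper, not an existing unblob function or CLI option.

```python
# Hypothetical stand-in for the default list; the real one is much longer.
DEFAULT_SKIP_MAGIC = ("Microsoft Excel", "PDF document", "JPEG")


def effective_skip_magic(extra=(), replace=None):
    """Build the list of magic prefixes to skip.

    extra   -- user values appended to the defaults (the requested behavior)
    replace -- if given, used instead of the defaults (the current behavior)
    """
    base = DEFAULT_SKIP_MAGIC if replace is None else tuple(replace)
    # dict.fromkeys drops duplicates while preserving insertion order.
    return tuple(dict.fromkeys((*base, *extra)))


# Extending keeps the defaults, so the .xlsx would still be skipped.
print(effective_skip_magic(extra=["Android", "Java"]))
# Replacing drops the defaults, which is what happens today.
print(effective_skip_magic(replace=["Android", "Java"]))
```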

I think users should just be able to type --skip-magic "<some-file-extension>" to match a correct magic instead of having to extract the magic from a file by themselves.
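
As a rough Python sketch of that idea, assuming a small lookup table built only from the pairs found in this issue ("APK" = "Android", "JAR" = "Java", "XLSX" = "Microsoft Excel"). Both EXTENSION_TO_MAGIC and resolve_skip_value are hypothetical names used for illustration; nothing like this exists in unblob as far as I know.

```python
# Hypothetical extension-to-magic table, limited to the pairs from this issue.
EXTENSION_TO_MAGIC = {
    "apk": "Android",
    "jar": "Java",
    "xlsx": "Microsoft Excel",
}


def resolve_skip_value(value: str) -> str:
    """Translate a known file extension into its magic prefix.

    Values that are not known extensions are returned unchanged,
    so plain magic prefixes keep working as before.
    """
    key = value.lower().lstrip(".")
    return EXTENSION_TO_MAGIC.get(key, value)


print(resolve_skip_value("APK"))
print(resolve_skip_value(".jar"))
print(resolve_skip_value("Microsoft Excel"))
```

A value that is not in the table falls through untouched, so existing --skip-magic usage would not change.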

P.S. If there is a better way to match apk files, I'm open to suggestions.

7homasSutter changed the title from "Extend --skip-magic options for better matching" to "Extend --skip-magic option for better matching" Aug 30, 2023
@qkaiser qkaiser added enhancement New feature or request question Further information is requested labels Aug 31, 2023
Contributor

qkaiser commented Aug 31, 2023

Hi @7homasSutter ! Thanks for the suggestion, we'll discuss it internally to see what would be the best course of action here. Will keep you posted.


qkaiser commented Aug 31, 2023

Discussion has some relation to #243


qkaiser commented Dec 24, 2023

Hi @7homasSutter ! Finally got some time to work on unblob feature requests. I opened an MR that changes the way we handle skip magic lists. See #693

Regarding the ability to filter between apk, jar, zip, etc., I think the best way to handle it would be to introduce an extension-based filter. Even more so if the libmagic version differs, since some older versions do not differentiate between an apk and a jar, for example.

Introducing a mapping between extension and magic mime as you suggest would bring too much confusion since end users would not really know if they need to provide a magic or an extension.

We have a similar problem described in #600, where we do not want to extract .rlib files, but their magic is "current ar archive". If we want to keep extracting other ar archives, we need to filter on extension.

From my perspective, the use case you described should be solved by #693 and the introduction of a --skip-extension CLI argument. We'll see what my team members have to say about this tho :)
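
To make the extension-based idea concrete, here is a minimal Python sketch of what such a check might look like. It is not the implementation from #693: has_skipped_extension is a hypothetical helper, and comparing only the final suffix, case-insensitively, is an assumption made for the example.

```python
from pathlib import Path


def has_skipped_extension(path: str, skip_extensions: list[str]) -> bool:
    """Return True if the file's final suffix is one of the skipped extensions.

    Extensions may be given with or without a leading dot; the comparison
    ignores case and only looks at the last suffix of the file name.
    """
    normalized = {"." + ext.lower().lstrip(".") for ext in skip_extensions}
    return Path(path).suffix.lower() in normalized


# .rlib files are skipped while other ar archives are still extracted.
print(has_skipped_extension("libfoo.rlib", [".rlib"]))
print(has_skipped_extension("libfoo.a", [".rlib"]))
# Only the final suffix counts, so the outer zip is not skipped here.
print(has_skipped_extension("Test.apk.zip", ["apk"]))
```

Because the check never consults libmagic, it behaves the same across libmagic versions, including those that do not tell an apk from a jar.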
