Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace imghdr #650

Merged
merged 24 commits into from
Jul 13, 2023
Merged

Replace imghdr #650

merged 24 commits into from
Jul 13, 2023

Conversation

jcjgraf
Copy link
Contributor

@jcjgraf jcjgraf commented Jul 2, 2023

replaces #646

Since imghdr is getting deprecated from the standard library (see #579), this PR replaces that module by a own (and thereby, legally completely decoupled independent (unlike in #646)), implementation of the magic byte file type detection.

Some advantages of this approach, over using the package:

  • Detect (all) file types which are actually supported by QT natively or extended (by using qtiamgeformats)
    • Well, some are not detectable...
  • Detect none which are not supported
  • Have a central place for all magic byte checks
    • Cleanup of utils/files and plugin/imageformats
  • Well structured magic byte checks allows for better maintainability
  • plugin/imageformats has a specific and well defined purpose
    • I.e. enable image type support that require the (manual) installation of some other QT module

And something that I like particularly well:
The runtime validation of check functions, (used to be done for all checks that were registered using utils/files/add_image_formats), is now done for all image types, except the ones natively supported by QT. This way, clients get a notification when they try to open an image of a type that:

  • is supported only by qtimageformats, without having that module installed
  • is supported by a 3rd party QT module and was activated via plugins/imageformats, without having that module installed
    This should prevent confusions as in Unable to open WebP image format #619, and previous similar issues.

Steps

  • Add check function for all types natively or extended supported
    • WBMP is completely undetectable
    • TGA version 1 is undetectable, while version 2 is
    • XBM has no signature, but can be matched based on its content
    • See docstring in utils/imagheader for status supported types
  • Add registration function
    • Allow for runtime check validation
  • Add type detection function
  • Cleanup and adaption of codebase to new utils/imageheader
  • Cleanup of plugins/imageformats
  • Tests

I have already added a basic tests that verifies the detection of all filetypes which QPixmap is capable of writing (/creating) (as you suggested in the other PR). Unfortunately, it is capable of creating less types than what it is able to read. I have to think about ways to create and test the other types.

So, code-wise this PR is done, and if time allows, you may have a look at it. I will spend some more time for testing.

In what way does it make sense that copyright in the header of the new file is required to start at 2017? Should it not start in the year the file was first added to the project (i.e. 2023 in this case)?

fixes #579
replaces #646

Also copy own check functions from `utils.files` over.
This is equivalent to the functionality originally provided by
`add_image_format`. This implementation has the advantage that it
allows uf us to add this validation for ALL types not natively
supported (including types supported by qtimageformates).
This also includes removing jp2, as this check is included in
`utils.imageheaer`.
Added wrong function to `_registry`, resulting in dynamic check not
actually be done.
@buzzingwires
Copy link
Contributor

buzzingwires commented Jul 4, 2023

A few comments on this implementation of image detection:

  • Have you considered using an existing library like python-magic to detect images? It is permissively licensed and libmagic has very extensive format support. libmagic should already be very commonly available on Linux systems, though I don't know how easily it will be to support that on Windows.

  • For MNG files, isn't the magic number 8A 4D 4E 47 0D 0A 1A 0A, very similar to the PNG magic number, but with 4D 4E 47 == MNG instead?

  • I think WBMP also lack explicit magic numbers, and I'm not sure what .cur files even are. Mouse cursor images for Windows?

  • XBM and TGA indeed lack explicit magic numbers.

  • For XBM files, at least the way imghdr does it is to just see if the file starts with '#define'. A better way to do it might be how libmagic does it, which is roughly equivalent to the Regex #define [a-zA-Z0-9]+_width [0-9]+\n#define [a-zA-Z0-9]+_height [0-9]+. Either way, XBM starts out in a pretty distinct way.

  • TGA files are much less trivial, but some newer files end with TRUEVISION-XFILE.\0 Otherwise, Targa identification seems to work based on the first 18 bytes of the file, and there are many rules to determine what generally 'makes sense' for a Targa file. In my opinion though, TGA is probably the most likely to be encountered of the more obscure formats mentioned, since it still has relevance to video games. I think this image identification module should be very solid with the already-supported formats, the low-hanging fruit of MNG and XBM, and nontrivial but relevant TGA support.

@karlch
Copy link
Owner

karlch commented Jul 4, 2023

@jcjgraf thanks a lot for the effort that went into this, this is really nice, I hope to take a closer look during this week!

Concerning the Copyright header, you are probably right, but I wouldn't bother in this PR. It may make sense to simplify this overall and remove the years and just keep a small notice at the top of each file. Discussion in a new issue?

@buzzingwires thanks a bunch for your comments and details! 😊

We actually did start a discussion on this in #579 . The reason for pursuing this path was initially, that just copying imghdr seemed absolutely straightforward and did the trick for years for us. To keep this PR on the implementation, I will add another comment in this issue directly to continue the discussion.

@karlch karlch mentioned this pull request Jul 4, 2023
Copy link
Owner

@karlch karlch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is absolutely amazing work, thanks again! I only left a few nitpicks, so besides the points you mentioned and including the magic bytes described by @buzzingwires, this is basically ready to go.

It is really cool to see how much cleaner this makes the handling of filetypes, together with the Qt support, and the error / warning messages. And the documentation is amazing.

tests/unit/utils/test_imageheader.py Outdated Show resolved Hide resolved
vimiv/api/__init__.py Show resolved Hide resolved
vimiv/utils/imageheader.py Outdated Show resolved Hide resolved
vimiv/utils/imageheader.py Outdated Show resolved Hide resolved
Initially, I thought (hoped) that this is not needed. But it seems that
for certain types it is required. That is e.g. the case when the first
32 bytes are not sufficient to detect type.
Version 1 is undetectable
Seeking relative to the end of file seems to not work for some files,
resulting in an exception. This was fixed.
Assuming a certain seek is not a good idea. Also, apparently, the prefix
of `_width` and `_height` is optional (QImageWriter does not produce
one).
It would be even cleaner to omit the else bloc and set `check =
check_verified` in the then block, and register `check` instead.
But python has some recursion issue with that.
It does not like my informative capitalization ;(
@jcjgraf
Copy link
Contributor Author

jcjgraf commented Jul 12, 2023

@buzzingwires Thanks for your really helpful comment! And sorry for the late reply...

I know that some of the formats are probably used by literally nobody (anymore). But since these formats (including .cur, and yes, I had also no clue that they are😅) are supported by QT, and therefore do not require any additional effort except registering a check function, I would like to support them.

For MNG files, isn't the magic number 8A 4D 4E 47 0D 0A 1A 0A

Your correct, thanks! I have added a check for it.

For XBM files, at least the way imghdr does it is to just see if the file starts with '#define'. A better way to do it might be how libmagic does it, which is roughly equivalent to the Regex #define [a-zA-Z0-9]+_width [0-9]+\n#define [a-zA-Z0-9]+_height [0-9]+. Either way, XBM starts out in a pretty distinct way.

XBM is indeed a funky file definition; it is a valid C source file🙈 I have also added a check for it.

TGA files are much less trivial, but some newer files end with TRUEVISION-XFILE.

Indeed. There are two version of TGA. In version 2 they have added a signature to the footer of the file. I have added support for that.

Otherwise, Targa identification seems to work based on the first 18 bytes of the file, and there are many rules to determine what generally 'makes sense' for a Targa file.

I am not really sure if it is works the effort to check for dynamic set bytes. This seems rather complex and is in the end not more that a heuristic (well, that is also the case for some other types, that have a rather "badly" chosen signature 😂), that may lead to false positives.
I think I will not add support for them here, but may do so if there is a specific request.

@karlch karlch mentioned this pull request Jul 12, 2023
4 tasks
From python 3.11 on try induces zero-overhead when no exception is
raised in the try. This is much more efficient than using suppress.
The try is placed inside for loop to ensure that a check that raises
does not affect other checks.
@jcjgraf jcjgraf requested a review from karlch July 13, 2023 06:16
Could also be useful in other unit tests in the future.
@karlch karlch merged commit 39f7be2 into karlch:master Jul 13, 2023
jcjgraf added a commit to jcjgraf/vimiv-qt that referenced this pull request Jul 31, 2023
This should have been changed as port of karlch#650.
This plugin is no longer needed for jp2 support, but jp2 is natively
supported, given that the qt-imageformats module is installed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deprecation of imghdr module in python 3.11
3 participants