Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Internal metadata keys #325

Open
wants to merge 21 commits into
base: master
Choose a base branch
from

Conversation

jcjgraf
Copy link
Contributor

@jcjgraf jcjgraf commented Jan 22, 2021

This PR adds internal metadata keys to vimiv, as outlined in #324. Unlike @karlch has suggested, I have put everything into the preexisting ExifHandler. I had the feeling that the proposed idea of having separate classes for the internal and external keys is slightly overkill for the few internal keys we are actually adding.

The basic idea is the following: Clients do not longer call the ExifHandlers child-specific get_formatted_exif but get_formatted_metadata. This method is not overwritten by any child. It iterates over all desired_keys and handles all internal keys directly. For the exif keys, the child-specific _fetch_external_key is called which extracts the exif using the appropriate library. get_formatted_metadata combines all the extracted data and returns it to the client.

To get the image dimension I use the QImageReader class instead of accessing the pixmap somehow. This has the advantage that it is fully decoupled from the qt pixmap of vimiv. This reader class is actually meant for accessing image properties without rendering a fullblown pixmap.

In my view, the new functionality could be added the ExifHandler in a rather clean manor and the complexity of the class is still reasonable.

Nevertheless, this PR is just a proposition and in the current state merely a proof-of-concept. If you think that it is still better to split the functionality I will redo this PR.

If we use this PR as a basis, then there are the following open TODOs:

  • Rename class, methods and docstring to metadata instead of exif
  • Add/update docstrings of new/changed methods.
  • How should we handle the case when no exiv library is installed? Just present these 4 lonely keys? 😆
  • How to represent the keys internally? Currently, all keys are stored in a list, but in the get_formatted_metadata I still redefine them multiple times. Would something like a dict cleanup the mess?
  • Are there any other useful internal keys?
  • Update docs.
  • Better way to create the generator in get_keys?
  • Unit test
  • Changelog
  • Improve check if key is equal to desired_key in piexif

@karlch
Copy link
Owner

karlch commented Jan 23, 2021

That was fast, awesome, thanks for the PR! I really like using QImageReader for this, that makes a lot of sense.

I still feel like an additional class for the internal keys could make sense, especially if it inherited from dict so we could simplify the get_formatted_metadata to something like:

    def get_formatted_metadata(self, desired_keys: Sequence[str]
    ) -> Dict[Any, Tuple[str, str]]:
        metadata = dict()

        for base_key in desired_keys:
            try:
                key, key_name, key_value = self._fetch_key(base_key)
                metadata[key] = key_name, key_value
            except (KeyError, TypeError):
                _logger.debug("Invalid key '%s'", base_key)

        return metadata

    def _fetch_key(self, key: str):
        if key.lower().startswith("vimiv"):
            return self._internal_handler[key]
        return self._fetch_external_key(key)

If it were in it's own class, the class could also lazy-initialize the QImageReader to avoid constructing it multiple times, i.e. something like

class _InternalKeyHandler(dict):

    def __init__(self, path: str):
        super().__init__(
            # Define all the keys here with the corresponding actions
        )
        self._path = path
        self._reader: Optional[QImageReader] = None

    @property
    def reader(self) -> QImageReader:
        if self._reader is None:
            self._reader = QImageReader(self._path)
        return self._reader

We would then override __getitem__ in the internal key handler to:

  • Find the corresponding action case-insensitvely
  • Call the action to get the corresponding value
  • Return in the same format as _fetch_external_key

This is getting rather long, but I will try to add my thoughts on some of the other open issues:

  • I am not sure if we should expose the 4 lonely keys. If so, we may need to do some refactoring in the metadata widget, and certainly remove the has_exif_support part.
  • If we go for the internal key handler class, this class would store and encapsulate all the keys.
  • I really like the way you chain the two parts into the get_keys generator. Possibly this logic could only be done once in the base class, so we don't have to repeat the code concerning chaining / getting the internal keys in the children. Very similar to how you split get_formatted_metadata and _fetch_external_key.

@karlch karlch mentioned this pull request Jan 23, 2021
@jcjgraf
Copy link
Contributor Author

jcjgraf commented Jan 23, 2021

Thank you for your comments and suggestions. I will look into it and see what I can do. But be patient, I am really busy the new two weeks an I am unsure when I can look into this.

@karlch
Copy link
Owner

karlch commented Jan 23, 2021

Awesome, thank you, and as always: no hurry! 😊

@jcjgraf
Copy link
Contributor Author

jcjgraf commented Jan 28, 2021

I have split the ExifHander class into three separate ones; MetadataHandler is the main class and it is used by clients to interact with metadata. InternalKeyHandler deals with the internal keys and it build upon the framework you have given me. ExternalKeyHandler{Base, Piexif, Exiv} is the old ExifHander{Base, Piexif, Exiv}.

There is still lots of cleanup etc. to do. But before I do that I would be grateful if you could have a glimpse at the code and comment on the general structure. Just to make sure that I do not have to do the same work twice when we decide to rearrange some larger things... 😊

@karlch
Copy link
Owner

karlch commented Jan 31, 2021

So you did end up adding another class which combines the two handlers. Personally, I like this a lot, we keep the fetching and combining logic separated. Only downside is the need to more or less redefine the functions for copy_metadata and get_date_time. How about also exposing the ExternalKeyHandler so we can call copy_metadata and get_date_time directly in the respective parts of the code, and only use MetadataHandler to get the metadata keys and information?

I will add a few inline comments to clarify some of my ideas 😊

vimiv/imutils/exif.py Outdated Show resolved Hide resolved
vimiv/imutils/exif.py Outdated Show resolved Hide resolved
vimiv/imutils/exif.py Outdated Show resolved Hide resolved
@jcjgraf
Copy link
Contributor Author

jcjgraf commented Jan 31, 2021

Yeah, as you initially suggested, three classes allow to split these different modules in a clean manor.
I am against exposing the ExternalKeyHandler. I think the clients just want the metadata and they should not have to bother if the data is coming from the ExternalKeyHandler or from the MetadataHandler.

@karlch
Copy link
Owner

karlch commented Feb 1, 2021

Oh, sorry, the wording "expose" was probably rather bad. I fully agree that no-one should have to bother where metadata comes from. What I meant was to make the _ExternalKeyHandler class public for usage within vimiv so we don't have to wrap the two functions copy_metadata and get_date_time. We currently only use these internally in imutils._file_handler / imutils.filelist and I don't really see the point in having them in api-related functionality after exposing the metadata handler to the api as proposed in #323.

In this case I actually wonder if it also makes sense to make MetadataHandler._fetch_key public or even make it __getitem__ so, once exposed via the api, it is easy for clients to retrieve single keys.

@jcjgraf
Copy link
Contributor Author

jcjgraf commented Feb 1, 2021

Ah, this makes much more sense. I thought you are talking about exposing it to the API 🙃
Thanks for these suggestions and ideas! I will look into it in the following week.

@jcjgraf jcjgraf force-pushed the feature/internalmetadata branch 2 times, most recently from c60e20c to 56775ac Compare February 2, 2021 20:33
vimiv/version.py Outdated Show resolved Hide resolved
@jcjgraf
Copy link
Contributor Author

jcjgraf commented Feb 4, 2021

Commit 71ac242 has somehow broken the test_switch_metadata_information_upon_new_image testcase. Do you have an idea what went wrong (I am sorry, I am not very familiar with unit tests).

How should we unit test this module? If I am not mistaken the metadata functionality is only tested via the metadata widget. Probably it would make sense to test these handlerclasses separately.

Copy link
Owner

@karlch karlch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great! I left a few comments that should hopefully fix the tests as well as make the complete CI pass. Have a nice week-end! 😊

vimiv/gui/metadatawidget.py Outdated Show resolved Hide resolved
vimiv/gui/metadatawidget.py Outdated Show resolved Hide resolved
vimiv/imutils/metadata.py Outdated Show resolved Hide resolved
vimiv/imutils/metadata.py Outdated Show resolved Hide resolved
vimiv/imutils/metadata.py Outdated Show resolved Hide resolved
vimiv/imutils/metadata.py Outdated Show resolved Hide resolved
@karlch
Copy link
Owner

karlch commented Feb 6, 2021

Concerning the unit-tests: I agree that we should probably test them all in a nice and isolated fashion in tests/unit/test_metadata.py. As we no longer have the setting defining which metadata keys to get, this should be possible quite nicely by first setting up an image with known exif content, and then testing the individual parts fetch_key and get_keys once on the individual handlers and then maybe even on the combined one.

If you like, you can take a shot at this and ask any questions. Otherwise I am happy to do this after merging.

@jcjgraf
Copy link
Contributor Author

jcjgraf commented Feb 7, 2021

Thank you for your suggestions, hits and comments. I will go though it in the following week(s) and also give it a try to create some appropriate unit test. Though be warned, I have to not that before 😄

@karlch
Copy link
Owner

karlch commented Feb 14, 2021

Awesome work, this is taking getting in good shape! 😊

I noticed that the noexif and the piexif environments are failing, both are trivial fixes:

  • noexif: We apply a custom skip marker to the tests in the test_exif.py unit test in tests/conftest.py. As the test was renamed, the marker has to be renamed as well. This is in the beggining of the apply_exif_markers function.
  • piexif: I had a typo in the code snippet I sent you. It should be contextlib.suppress(piexif.InvalidImageDataError) not contextlib.suppress(piexif.InvalidImageDateError). Sorry 😄

Finally, you somehow managed to wreck havoc with the git history 😆 I guess a simple rebase against the current master here should fix this.

@karlch
Copy link
Owner

karlch commented Apr 11, 2021

Would you like some more time to work on this or is it time to take another look and hopefully provide some help? 😊

@jcjgraf
Copy link
Contributor Author

jcjgraf commented Apr 11, 2021

I am sorry that I have neglected this PR a little in the last few weeks ☺️

It would actually be great if you could give me some hints any tips. I am kind off stuck with two things:

  • Concerning the test setup and preparation of a dummy test image. Metadatahandlerrequires the path to an image (and not an instance if QImage or something). Therefore, in imutils/test_metadata.py have added the fixture dummy_image() which creates a image using PIL and saves it to the disk.

    • Is this actually the right approach or is there some cleaner method?
    • How to remove the image after test? Just create a fixture wich runs something like os.system("rm ./image.jpg") after the test?
  • I have started with a first test test_MetadataHandler_fetch_key() which should test the MetadataHandler.fetch_key(). In this test, firstly I get a dictionary of pyexiv2.exif.ExifTag instances (returned by metadata_content()). Then I iterate over all elements of this dictionary in order to check if my prepared image contains all these metadata keys/values. Weirdly, accessing properties of these pyexiv2.exif.ExifTag instances while iterating over the dictionary causes a segmentation fault.

    • Now, I am unsure if I do make some obvious mistake or if this is a non-trivial problem (e.g. a problem with pyexiv2).
    • Currently, the test just fails but does not report segfault. So see the segfault change the test to e.g.
def test_MetadataHandler_fetch_key(get_MetadataHandler, metadata_content):
    for key, value in metadata_content.items():
        print(value.human_value)
    assert True

After solving the first two points, a follow up question is the following. Let's say I have defined the test test_A, test_B, ... test_Z. How do I run these test on piexif, pyexiv2 and noexif? I guess it is somehow relate to monkeypatches but I have not yet figured out how that all words.

I hope explaining me how to write these tests does not take more time if you would just write them yourself.

@karlch
Copy link
Owner

karlch commented Apr 11, 2021

No worries at all, I have neglected work on vimiv completely over the past months ...

This took some debugging, and it indeed required writing the code, so I will just dump the snippets I came up with and some hopefully useful information for future PRs 😊

@pytest.fixture()
def dummy_image(qapp, tmp_path):
    filename = str(tmp_path / "image.jpg")  # We use pytest's tmp_path fixture to create the image in a tmp directory that is cleaned for us
    QPixmap(300, 300).save(filename)  # Let's use PyQt directly instead of another external module, requires the "qapp" fixture to create a default QApplication
    return filename


@pytest.fixture
def metadata_handler(add_metadata_information, dummy_image, metadata_content):
    assert pyexiv2 is not None, "pyexiv2 required to add metadata information"
    add_metadata_information(dummy_image, metadata_content)
    return MetadataHandler(dummy_image)


@pytest.fixture
def metadata_handler_piexif(dummy_image, metadata_handler):
    # This is certainly not beautiful, but I could not come up with a consistent solution as we need pyexiv2
    # to add the metadata information and the external handler is defined upon import level
    metadata_handler._ext_handler = metadata._ExternalKeyHandlerPiexif(dummy_image)
    return metadata_handler


def test_metadatahandler_fetch_key(metadata_handler, metadata_content):
    for key, value in metadata_content.items():
        fetched_key, _, fetched_value = metadata_handler.fetch_key(key)
        assert fetched_key == key
        try:
            assert fetched_value == value.human_value
        except AttributeError:
            assert fetched_value == value.raw_value


def test_metadatahandler_fetch_key_piexif(metadata_handler_piexif, metadata_content):
    # I also ended up writing an additional test for piexif as
    # 1) the keys are in their short form and thus must be compared differently
    # 2) there is no support for the human value also making this comparison different
    for key, value in metadata_content.items():
        fetched_key, _, fetched_value = metadata_handler_piexif.fetch_key(key)
        short_key = key.rpartition(".")[-1]
        assert fetched_key == short_key
        assert fetched_value == value.raw_value

and I had to modify the add_metadata_information fixture as follows:

        for tag in content.values():
            _metadata[tag.key] = tag.value

It seems like pyexiv2 invalidates the python ExifTag object after writing and reading. Here is a short script that can reproduce the segfault without any of the vimiv-related stuff:

import pyexiv2
from PyQt5.QtGui import QImage


tag = pyexiv2.ExifTag("Exif.Image.Copyright", "vimiv-AUTHORS-2021")

path = "image.jpg"
image = QImage(300, 300, QImage.Format_ARGB32)
image.save(path)

handler = pyexiv2.ImageMetadata(path)
handler.read()
# Uncomment for segfault
# handler[tag.key] = tag
handler[tag.key] = tag.value

handler.write()

handler = pyexiv2.ImageMetadata(path)
handler.read()

print(tag.human_value)

Concerning the piexif part:
In the quick snippet I ended up defining an additional fixture to force using piexif and not pyexiv2 as the test needed to be adjusted anyway. Once there are tests that run as-is with both handlers, we should be able to make us of the parametrized external_handler fixture in combination with another metadata_handler_both(external_handler) fixture that updates the external handler much like I did for the piexif one. When this is then used in any tests, the tests are run with both (all) parameters of the fixture parametrization.

Let me know if there are any questions and happy hacking 😊

@TornaxO7
Copy link

This pull request would probably solve this issue (it pops up, if I open vimiv within vifm):
image

Just mentioning this for "motivation" ;)

@jcjgraf
Copy link
Contributor Author

jcjgraf commented Apr 16, 2021

Thanks @TornaxO7 for using vimiv and reporting this incidence! It is a bit unfortunate that vifm reports this an error but it is actually by purpose. You are missing the optional dependency which is required for metadata support. See here for the link to the required library: https://karlch.github.io/vimiv-qt/documentation/exif.html

@karlch Maybe we should improve this log by indicating how to enable metadata support to prevent further confusion. Don't you think so too?

@karlch
Copy link
Owner

karlch commented Apr 16, 2021

Welcome to vimiv and thanks from me as well @TornaxO7.

Yes, I fully agree with everything you said @jcjgraf! Something simple like

    _logger.warning(
        "There is no exif support and therefore:\n"
        "1. Exif data is lost when writing images to disk.\n"
        "2. The `:metadata` command and associated `i` keybinding is not available.\n"
        "3. The {exif-date-time} statusbar module is not available.\n\n"
        "Please install pyexiv2 or piexif to silence this warning.\n"
        "For more information see\n"
        "https://karlch.github.io/vimiv-qt/documentation/exif.html.\n"
    )

could suffice IMHO. If you agree with this and the wording, I can just commit this snippet to master.

- Propagate KeyError from KeyHandler
- Do not catch UnsupportedMetadataOperation in metadatawidget since it
will never be thrown
- Fix lint
@jcjgraf jcjgraf force-pushed the feature/internalmetadata branch 2 times, most recently from 0cad693 to aae1fe2 Compare April 27, 2021 18:30
However, this is not supported by piexif
Again, that is not supported by piexif
Upon loading metadata information of a image with a type not supported by
`piexif` a not caught exception was raised.
Thanks @BachoSeven for pointing this out!
@jcjgraf jcjgraf marked this pull request as ready for review April 28, 2021 18:11
@jcjgraf
Copy link
Contributor Author

jcjgraf commented Apr 28, 2021

This PR is finally ready for an in-depth review. I hope my commits are not too messy. I did not dear to clean it up more. I fear that I would break something 😄

I am not sure what happend with the CI pipeline. Somehow it has gone 😕 Probably due to the conflicts with the master.

Commit 3050b22 is "cherry-pick" from #371 since it turned out to be a little bit too painfull to rebase this branch with master. Meaning, the conflicts with docs/documentation/exif.rst and vimiv/imutils/exif.py can be ignored (overwrite changes to master). While changes to docs/changelog.rst require manuel merging.

Copy link
Owner

@karlch karlch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again, this was a lot of great work!

The tests test what they should for the most part, but I feel like the structure could be improved in general. There are an awful lot of fixtures, maybe this could also be broken down with some more parametrization? If there is anything I can help with, just let me know!

None of this is super-critical, so feel free to tell me in case you don't want to go into the havoc of refactoring the tests, I can also take a look at this some other time. On the other hand, if you feel like it could be a fun way to play around with pytest, just go ahead and hack away 😊

docs/documentation/metadata.rst Outdated Show resolved Hide resolved
docs/documentation/metadata.rst Outdated Show resolved Hide resolved
vimiv/gui/metadatawidget.py Outdated Show resolved Hide resolved
vimiv/imutils/metadata.py Outdated Show resolved Hide resolved
vimiv/imutils/metadata.py Outdated Show resolved Hide resolved
value: {val}\
tag: {tag}"
)
if keyname != key:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite a brutal approach to look through every tag for every key. I propose keeping this for now, but considering thinking about an improvement, e.g. reading the piexif stuff into a dictionary we can deal with upon construction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Um yeah, I agree that this could be solved more efficiently. I have added it to the todo 😊

vimiv/imutils/metadata.py Outdated Show resolved Hide resolved
vimiv/imutils/metadata.py Outdated Show resolved Hide resolved
@@ -0,0 +1,335 @@
# vim: ft=python fileencoding=utf-8 sw=4 et sts=4
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I like having the fixtures at the top of a test module, followed by all the tests, to give some more structure to the file. Feel free to adopt this if you like 😊

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought was to add more specific fixtures (ones which are not widely used) above the test cases in which they are used. But obviously I will change it to conform to the general styling rule of this project.


def test_external_keyhandler_get_date_time(
external_keyhandler, external_keyhandler_piexif, dummy_image
):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really funky way to do this as pytest provides parametrization precisely for this purpose 😊

As an example, the (unused?) fixture at the top called external_handler will make tests run twice, once with metadata.ExternalKeyHandler as value for external_handler and once with metadata._ExternalKeyHandlerPiexif as the argument. This fixture could easily be modified to actually create the handler with some image as path, but it might be nice to do this within this test, so the structure would be something along:

  • add metadata to dummy image if any
  • create handler with path to dummy image
  • compare to expected result

Both the metadata and the expected results could also be parameters, now not of the fixture but of the test itself. The structure would end up looking somewhat like this (dummy code):

@pytest.mark.parametrize("metadata, expected_date", [({"name": ExifTag("name", "value"}, "value"), ({}, "")]
def test_external_keyhandler_copy_metadata(external_handler, dummy_image, add_metadata_information, metadata, expected_date):
    src = dummy_image("image.jpg")
    add_metadata_information(src, metadata)
    handler = external_handler(src)
    assert handler.get_date_time() == expected_date

While this doesn't look much better than the four assert statements in your implementation, it does have a few advantages:

  • Each assert is run in its own test, i.e. we test pyexiv2 and piexif as well as content / no content separately
  • We can very easily add more combinations

This kind of logic applies to some more parts of the tests you wrote, e.g. the copy_metadata test could then also be parametrized instead of having a for loop. Hope this makes sense 😊

Mention the new internal key and improve the comparison between the two
libraries
Thanks @karlch for pointing them out
@jcjgraf
Copy link
Contributor Author

jcjgraf commented May 3, 2021

Thanks for catching some silly typos and mistakes and for the constructive feedback on my tests. I have written so many fixtures in order that I have great flexibility in how to combine them 🙃 But you are right, I have not read the full documentation of pytest yet, else I would probably not have written a single testcases by now 😏

I will do some more research on parametrisation and try to improve the test accordingly!

@jcjgraf
Copy link
Contributor Author

jcjgraf commented May 3, 2021

@karlch Am I heading into the right direction based on the changes 2fb9a4c?

What do you think about the fixtures external_content, internal_content and metadata_content? Can I leave them and use them as they are or is there a better way to deal with them? Personally, I do not see much of an advantage of removing them and re-defining all desired metadata data in each parametrisations.

@jcjgraf jcjgraf mentioned this pull request May 3, 2021
4 tasks
@karlch
Copy link
Owner

karlch commented May 23, 2021

This is a nice first step, I guess this style can also be used for the test_metadatahandler_get_keys test case to avoid looping over the handlers.

I am perfectly fine keeping those fixtures, you could also just define a global constant for the fixed dictionaries at the top to reduce the number of fixtures passed to a testcase. I.e. for example:

EXTERNAL_CONTENT = {
...
}

Independent of the fixtures, the try ... except AttributeError nesting has gotten a bit out of hand in value_match and value_match_piexif 😄

@jcjgraf jcjgraf mentioned this pull request Feb 25, 2022
32 tasks
@karlch karlch added this to the v0.9.0 milestone Jan 7, 2023
@jcjgraf jcjgraf mentioned this pull request Jul 15, 2023
4 tasks
@karlch karlch modified the milestones: v0.9.0, v0.10.0 Jul 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants