Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop flattening EXIF IFD into getexif() #4947

Merged
merged 7 commits into from
Mar 19, 2021
Merged

Conversation

radarhere
Copy link
Member

@radarhere radarhere commented Oct 5, 2020

Resolves #3973 and #5273

At the moment, the Exif class flattens the EXIF IFD into the rest of the Exif data. Losing information in this way is not ideal.

Here is the new test added to demonstrate the difference.

def test_exif_ifd(self):
    im = Image.open("Tests/images/flower.jpg")
    exif = im.getexif()

    reloaded_exif = Image.Exif()
    reloaded_exif.load(exif.tobytes())
    assert reloaded_exif.get_ifd(0x8769) == exif.get_ifd(0x8769)

To try and prevent this, while maintaining as much backwards compatibility as possible, this PR stops this flattening only for getexif(), while keeping it in place for the older _getexif(). So this is backwards incompatible, but only for code written after the release of 6.0.0.

Only one existing test had to be changed to match this breaking change.

@@ -3346,91 +3351,100 @@ def tobytes(self, offset=8):
head = b"MM\x00\x2A\x00\x00\x00\x08"
ifd = TiffImagePlugin.ImageFileDirectory_v2(ifh=head)
for tag, value in self.items():
if tag in [0x8769, 0x8225] and not isinstance(value, dict):
value = self.get_ifd(tag)
Copy link
Contributor

@kkopachev kkopachev Oct 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not work for interop since it's inside of Exif IFD.

im = Image.open('10_years_of_Wikipedia_by_Guillaume_Paumier.jpg')
good_exif = im.getexif()

bad_exif = Exif()
bad_exif.load(good_exif.tobytes())

print("Good interop dictionary size: %d" % len(good_exif.get_ifd(0xA005)))
print(" Bad interop dictionary size: %d" % len(bad_exif.get_ifd(0xA005)))

I suspect same for makernote too, but since makernote is vendor specific, I think it would be safer to make it read only property.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've added another commit to cover Interop.

Copy link
Contributor

@kkopachev kkopachev Oct 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've being thinking through this on and off for some time.
Thinking of consumer of this class, it would make sense when reading Exif IFD to get Interop value decoded into a dict at that time. Or shimmed with ImageFileDirectory_v2 instance. Otherwise caller must know Interop tag ID, must know to call get_ifd, etc. So that's a bit inconvenient and adds extra knowledge to consumer.
Same for Makernote, however saving it back we don't want to encode it back to vendor-specific format and probably worth just copying whatever bytes were there originally.
On the other hand Exif is quite complicated and some knowledge from consumer is expected anyway.
🤷
What do you think of this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another benefit of using ImageFileDirectory_v2 as a shim for sub-IFDs is that we get lazy loading.

@kkopachev
Copy link
Contributor

Also consider that Exif might contain multiple IFDs one after another. Currently only first IFD is preserved on save.

src/PIL/Image.py Outdated Show resolved Hide resolved
@radarhere
Copy link
Member Author

Also consider that Exif might contain multiple IFDs one after another. Currently only first IFD is preserved on save.

I'm not following. What do you mean, 'only first IFD'?

@kkopachev
Copy link
Contributor

I was referring to IFD1 - thumbnail image info which might be present in the file. Accessible through self._info.next after reading IFD0.
Not sure how important it is to preserve it, thought we're here already and it might affect code design anyway.

f"Expecting to read {size} bytes but only got "
f"{len(data)}. Skipping tag {ifd_tag}"
if tag not in self._ifds:
if tag in [0x8769, 0x8825]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think __getitem__ method should be updated as well to call get_ifd for Exif IFD. I can't comment on non-changed code, but currently it's line 3474 should be if tag in [0x8769, 0x8825]:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I can see that this would be more useful for most cases, if you wanted to actually read the IFD offset, you wouldn't be able to anymore (apart from looking directly into _data or _info)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point.
I think getting offset of Exif IFD is useless by itself, unless you want manually parse IFD structure yourself, in which case you could do so with raw exif bytes.
Additionally, I think code should be consistent: either decode both GPS and Exif IFDs into dictionaries on read in __getitem__, or return offset in both cases.

@radarhere
Copy link
Member Author

I was referring to IFD1 - thumbnail image info which might be present in the file. Accessible through self._info.next after reading IFD0.
Not sure how important it is to preserve it, thought we're here already and it might affect code design anyway.

Couldn't that easily lead to a situation where the thumbnail image no longer matches the main image, if the main image is modified by Pillow, but the thumbnail image isn't?

@kkopachev
Copy link
Contributor

Couldn't that easily lead to a situation where the thumbnail image no longer matches the main image, if the main image is modified by Pillow, but the thumbnail image isn't?

That's a great point. In that case it's better to ignore IFD1 for writing.

@hugovk
Copy link
Member

hugovk commented Mar 7, 2021

@kkopachev How does this PR look now?

@kkopachev
Copy link
Contributor

@hugovk Apart from inconsistent treatment of GPS IFD in __getitem__, it looks good :shipit:

@radarhere
Copy link
Member Author

Ok, I've added a commit to treat it consistently.

@hugovk hugovk merged commit 5a20908 into python-pillow:master Mar 19, 2021
@hugovk
Copy link
Member

hugovk commented Mar 19, 2021

Thanks both!

@radarhere radarhere deleted the exif branch March 19, 2021 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

exif_transpose errors out on some images
3 participants