Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop flattening EXIF IFD into getexif() #4947

Merged
merged 7 commits into from Mar 19, 2021
Merged

Conversation

@radarhere
Copy link
Member

@radarhere radarhere commented Oct 5, 2020

Resolves #3973 and #5273

At the moment, the Exif class flattens the EXIF IFD into the rest of the Exif data. Losing information in this way is not ideal.

Here is the new test added to demonstrate the difference.

def test_exif_ifd(self):
    im = Image.open("Tests/images/flower.jpg")
    exif = im.getexif()

    reloaded_exif = Image.Exif()
    reloaded_exif.load(exif.tobytes())
    assert reloaded_exif.get_ifd(0x8769) == exif.get_ifd(0x8769)

To try and prevent this, while maintaining as much backwards compatibility as possible, this PR stops this flattening only for getexif(), while keeping it in place for the older _getexif(). So this is backwards incompatible, but only for code written after the release of 6.0.0.

Only one existing test had to be changed to match this breaking change.

@@ -3346,91 +3351,100 @@ def tobytes(self, offset=8):
head = b"MM\x00\x2A\x00\x00\x00\x08"
ifd = TiffImagePlugin.ImageFileDirectory_v2(ifh=head)
for tag, value in self.items():
if tag in [0x8769, 0x8225] and not isinstance(value, dict):
value = self.get_ifd(tag)
Copy link
Contributor

@kkopachev kkopachev Oct 6, 2020

This does not work for interop since it's inside of Exif IFD.

im = Image.open('10_years_of_Wikipedia_by_Guillaume_Paumier.jpg')
good_exif = im.getexif()

bad_exif = Exif()
bad_exif.load(good_exif.tobytes())

print("Good interop dictionary size: %d" % len(good_exif.get_ifd(0xA005)))
print(" Bad interop dictionary size: %d" % len(bad_exif.get_ifd(0xA005)))

I suspect same for makernote too, but since makernote is vendor specific, I think it would be safer to make it read only property.

Copy link
Member Author

@radarhere radarhere Oct 7, 2020

Ok, I've added another commit to cover Interop.

Copy link
Contributor

@kkopachev kkopachev Oct 7, 2020

I've being thinking through this on and off for some time.
Thinking of consumer of this class, it would make sense when reading Exif IFD to get Interop value decoded into a dict at that time. Or shimmed with ImageFileDirectory_v2 instance. Otherwise caller must know Interop tag ID, must know to call get_ifd, etc. So that's a bit inconvenient and adds extra knowledge to consumer.
Same for Makernote, however saving it back we don't want to encode it back to vendor-specific format and probably worth just copying whatever bytes were there originally.
On the other hand Exif is quite complicated and some knowledge from consumer is expected anyway.
🤷
What do you think of this?

Copy link
Contributor

@kkopachev kkopachev Oct 7, 2020

Another benefit of using ImageFileDirectory_v2 as a shim for sub-IFDs is that we get lazy loading.

@kkopachev
Copy link
Contributor

@kkopachev kkopachev commented Oct 6, 2020

Also consider that Exif might contain multiple IFDs one after another. Currently only first IFD is preserved on save.

src/PIL/Image.py Outdated Show resolved Hide resolved
@radarhere
Copy link
Member Author

@radarhere radarhere commented Oct 7, 2020

Also consider that Exif might contain multiple IFDs one after another. Currently only first IFD is preserved on save.

I'm not following. What do you mean, 'only first IFD'?

@kkopachev
Copy link
Contributor

@kkopachev kkopachev commented Oct 7, 2020

I was referring to IFD1 - thumbnail image info which might be present in the file. Accessible through self._info.next after reading IFD0.
Not sure how important it is to preserve it, thought we're here already and it might affect code design anyway.

f"Expecting to read {size} bytes but only got "
f"{len(data)}. Skipping tag {ifd_tag}"
if tag not in self._ifds:
if tag in [0x8769, 0x8825]:
Copy link
Contributor

@kkopachev kkopachev Oct 7, 2020

I think __getitem__ method should be updated as well to call get_ifd for Exif IFD. I can't comment on non-changed code, but currently it's line 3474 should be if tag in [0x8769, 0x8825]:

Copy link
Member Author

@radarhere radarhere Oct 11, 2020

While I can see that this would be more useful for most cases, if you wanted to actually read the IFD offset, you wouldn't be able to anymore (apart from looking directly into _data or _info)

Copy link
Contributor

@kkopachev kkopachev Oct 12, 2020

I see your point.
I think getting offset of Exif IFD is useless by itself, unless you want manually parse IFD structure yourself, in which case you could do so with raw exif bytes.
Additionally, I think code should be consistent: either decode both GPS and Exif IFDs into dictionaries on read in __getitem__, or return offset in both cases.

@radarhere
Copy link
Member Author

@radarhere radarhere commented Oct 12, 2020

I was referring to IFD1 - thumbnail image info which might be present in the file. Accessible through self._info.next after reading IFD0.
Not sure how important it is to preserve it, thought we're here already and it might affect code design anyway.

Couldn't that easily lead to a situation where the thumbnail image no longer matches the main image, if the main image is modified by Pillow, but the thumbnail image isn't?

@kkopachev
Copy link
Contributor

@kkopachev kkopachev commented Oct 12, 2020

Couldn't that easily lead to a situation where the thumbnail image no longer matches the main image, if the main image is modified by Pillow, but the thumbnail image isn't?

That's a great point. In that case it's better to ignore IFD1 for writing.

@hugovk
Copy link
Member

@hugovk hugovk commented Mar 7, 2021

@kkopachev How does this PR look now?

@kkopachev
Copy link
Contributor

@kkopachev kkopachev commented Mar 9, 2021

@hugovk Apart from inconsistent treatment of GPS IFD in __getitem__, it looks good :shipit:

@radarhere
Copy link
Member Author

@radarhere radarhere commented Mar 15, 2021

Ok, I've added a commit to treat it consistently.

@hugovk hugovk merged commit 5a20908 into python-pillow:master Mar 19, 2021
50 checks passed
@hugovk
Copy link
Member

@hugovk hugovk commented Mar 19, 2021

Thanks both!

@radarhere radarhere deleted the exif branch Mar 19, 2021
radarhere added a commit to radarhere/Pillow that referenced this issue Mar 19, 2021
hugovk added a commit that referenced this issue Mar 19, 2021
@radarhere radarhere mentioned this pull request Jun 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

3 participants