Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exif: separate entries into raw vs human-readable #2518

Closed

Conversation

lovell
Copy link
Member

@lovell lovell commented Nov 6, 2021

Here's a work-in-progress PR to fix #2388, which takes the approach suggested in kleisauke/net-vips#74 (comment)

It still needs unit tests (and more testing).

$ vipsheader -a test-1.jpeg | grep exif
exif-data: 5872 bytes of binary data
exif-ifd0-Orientation: 1
exif-ifd0-Orientation-human: Top-left, Short, 1 components, 2 bytes
exif-ifd0-XResolution: 72/1
exif-ifd0-XResolution-human: 72, Rational, 1 components, 8 bytes
exif-ifd0-YResolution: 72/1
exif-ifd0-YResolution-human: 72, Rational, 1 components, 8 bytes
exif-ifd0-ResolutionUnit: 2
exif-ifd0-ResolutionUnit-human: Inch, Short, 1 components, 2 bytes
exif-ifd0-Software: Windows Photo Editor 10.0.10011.16384
exif-ifd0-Software-human: Windows Photo Editor 10.0.10011.16384, ASCII, 38 components, 38 bytes
exif-ifd0-DateTime: 2021:08:06 17:17:03
exif-ifd0-DateTime-human: 2021:08:06 17:17:03, ASCII, 20 components, 20 bytes
exif-ifd0-Copyright: Arnold Jerocki/Divergence (Arnold Jerocki/Divergence (Photographer) - [None]
exif-ifd0-Copyright-human: Arnold Jerocki/Divergence (Arnold Jerocki/Divergence (Photographer) - [None] (Photographer) - [None] (Editor), ASCII, 77 components, 77 bytes
exif-ifd0-Padding: 2060 bytes undefined data
exif-ifd0-Padding-human: 2060 bytes undefined data, Undefined, 2060 components, 2060 bytes
exif-ifd1-Compression: 6
exif-ifd1-Compression-human: JPEG compression, Short, 1 components, 2 bytes
exif-ifd1-XResolution: 96/1
exif-ifd1-XResolution-human: 96, Rational, 1 components, 8 bytes
exif-ifd1-YResolution: 96/1
exif-ifd1-YResolution-human: 96, Rational, 1 components, 8 bytes
exif-ifd1-ResolutionUnit: 2
exif-ifd1-ResolutionUnit-human: Inch, Short, 1 components, 2 bytes
exif-ifd2-ExifVersion: Exif Version 2.1
exif-ifd2-ExifVersion-human: Exif Version 2.1, Undefined, 4 components, 4 bytes
exif-ifd2-DateTimeOriginal: 2021:08:05 12:37:28
exif-ifd2-DateTimeOriginal-human: 2021:08:05 12:37:28, ASCII, 20 components, 20 bytes
exif-ifd2-DateTimeDigitized: 2021:08:05 12:37:28
exif-ifd2-DateTimeDigitized-human: 2021:08:05 12:37:28, ASCII, 20 components, 20 bytes
exif-ifd2-SubSecTimeOriginal: 58
exif-ifd2-SubSecTimeOriginal-human: 58, ASCII, 3 components, 3 bytes
exif-ifd2-SubSecTimeDigitized: 58
exif-ifd2-SubSecTimeDigitized-human: 58, ASCII, 3 components, 3 bytes
exif-ifd2-FlashPixVersion: FlashPix Version 1.0
exif-ifd2-FlashPixVersion-human: FlashPix Version 1.0, Undefined, 4 components, 4 bytes
exif-ifd2-ColorSpace: 1
exif-ifd2-ColorSpace-human: sRGB, Short, 1 components, 2 bytes
exif-ifd2-PixelXDimension: 1440
exif-ifd2-PixelXDimension-human: 1440, Long, 1 components, 4 bytes
exif-ifd2-PixelYDimension: 960
exif-ifd2-PixelYDimension-human: 960, Long, 1 components, 4 bytes
exif-ifd2-Padding: 2060 bytes undefined data
exif-ifd2-Padding-human: 2060 bytes undefined data, Undefined, 2060 components, 2060 bytes

@jcupitt
Copy link
Member

jcupitt commented Nov 6, 2021

I had a thought about this --- how about leaving the current exif outpuf as is for compatibility, and (as fredprobidi suggests) adding a new set of fields using JSON? Something like:

exif-ifd2-ShutterSpeedValue: 253/50 (5.06 EV (1/33 sec.), SRational, 1 components, 8 bytes)
exif-json-ifd2-ShutterSpeedValue: {value: "253/50", humanValue: "5.06 EV (1/33 sec.)", dataType: "SRational", components: 1, length: 8}

Now we get compatibility, plus easy and reliable structured data handling in all language bindings. We could maybe hide the old exif tags in vipsheader output. We'd need to detect user edits to the JSON on save.

We'd need to add a dependency for a JSON read-write library though :-( An obvious choice might be:

https://gitlab.gnome.org/GNOME/json-glib

It's mature, well-maintained, integrates with glib, it's by ebassi (who is great), etc.

@lovell
Copy link
Member Author

lovell commented Nov 6, 2021

If we're thinking of providing EXIF as JSON then perhaps this should be rolled up at a higher level as a single property, so the dirs and keys are also part of the JSON?

{
  "ifd2": {
    "ShutterSpeed": {
      "value": "253/50",
      "humanValue": "5.06 EV (1/33 sec.)",
      "dataType": "SRational",
      "components": 1,
      "length": 8
    }
  }
}

Alternatively, for backwards compatibility, we could provide the non-human values in new -raw suffix properties:

exif-ifd2-ShutterSpeed: 253/50 (5.06 EV (1/33 sec.), SRational, 1 components, 8 bytes)
exif-ifd2-ShutterSpeed-raw: 253/50

@jcupitt
Copy link
Member

jcupitt commented Nov 7, 2021

That's true, though you wonder what value we are adding then --- the user could simply parse the exif-data blob to make json themselves.

Off-topic, but I just reread exif.c to remind myself how this works and ... what a mess! libexif (and exif itself) is pretty bad.

Alternatively, for backwards compatibility, we could provide the non-human values in new -raw suffix properties:

Maybe this is the best compromise. We keep back compat, and we don't need to add a json library. I'm not sure how useful the extra info in the json output would be anyway.

@lovell
Copy link
Member Author

lovell commented Nov 8, 2021

A possible problem with the alternative -raw suffix approach is that whilst it keeps the read "API" backwards compatible, it either breaks the write "API" or makes the implementation more complex.

At the moment you can e.g. call image.set("exif-ifd2-ShutterSpeed", "253/50") and it overwrites the non-suffixed "human" variant. To continue to support this with two variants, where there is no longer any "tail dropping", we'll probably need to track which variant is to be written to the output (original raw vs updated human) for each field.

@jcupitt
Copy link
Member

jcupitt commented Nov 8, 2021

Yes, write is harder. We could maybe do:

parse the exif-data block (contains the original values) to make `exif`

for each item in `exif`:
    is there a tag called `exif-$tag-raw`? If not:
        the user must have deleted the tag, so we must delete the tag from `exif` as well
    also test for `exif-$tag`. If that's missing:
        we must be dealing with old code. Also delete `$tag` from `exif`

for each metadata item on the image starting with `exif`:
    if it ends `-raw`:
        add the tag to `exif`, or update any existing tag
    if it doesn't end with `-raw`, AND there's no matching `-raw` tag:
        we are dealing with old API code, add or update `exif` from the `xx (yy)` value

ie. we add a raw case, and it takes priority over the old xx (yy) API. I think that code's all there now, except for the raw stuff.

@jcupitt
Copy link
Member

jcupitt commented Nov 8, 2021

Thinking about this again, perhaps we could fix this another way?

Here's a tiny ruby program to show the issue:

#!/usr/bin/ruby

require "vips"

x = Vips::Image.new_from_file ARGV[0]

x = x.mutate do |mutable| 
  mutable.set_type! GObject::GSTR_TYPE, "exif-ifd0-XPComment", "hello ( there"
end

x.write_to_file ARGV[1]

I can run like this and it works:

$ ~/try/badcomment.rb k2.jpg x.jpg
$ vipsheader -f exif-ifd0-XPComment x.jpg 
hello ( there (hello ( there, Byte, 28 components, 28 bytes)

So that's the correct xx (xx, Byte, 28 components, 28 bytes) value, it's just that xx contains an embedded ( character.

When we later copy the image, it fails:

$ vips copy x.jpg y.jpg
$ vipsheader -f exif-ifd0-XPComment y.jpg 
hello ( there (hello (hello ( there (hello, Byte, 42 components, 42 bytes)

What's happened is:

  1. The 28-byte value (it's in utf-16) hello ( there is read from x.jpg.
  2. It is correctly coded as hello ( there (hello ( there, Byte, 28 components, 28 bytes) and attached to the image.
  3. On jpegsave, exif.c takes the coded value from the image metadata, finds the trailing ( ) group, ie, ( there, Byte, 28 components, 28 bytes), and removes it, leaving hello ( there (hello.
  4. This stripped value is compared to the value in the original exif data block (hello ( there) and is found to be different.
  5. The exif data is updated with what we have erroneously identified as the new value, and the saved string grows to hello ( there (hello.

So for a fix, how about just changing steps 3 and 4.

Don't try to transform the new image metadata value to match the original exif value (ie. reversing the stringification from step 2, a difficult problem), instead just re-stringify the exifdata block and look for differences in the coded value.

If we find they are different, we simply attach the new value to the image, and assume the user didn't painstakingly create the trailing (yy) stuff (very hard to get right anyway).

This removes the need to reverse the stringification and hopefully fixes the bug. What do you think?

It means we keep this stupid and broken interface. Perhaps we should have a separate issue for replacing it with something better.

jcupitt added a commit that referenced this pull request Nov 9, 2021
@jcupitt
Copy link
Member

jcupitt commented Nov 9, 2021

I did a quick hack: 8195b67 It was easier than I thought -- you only need to add a change detector near the start. With this I see:

$ ~/try/badcomment.rb k2.jpg x.jpg
$ vipsheader -f exif-ifd0-XPComment x.jpg 
hello ( there (hello ( there, Byte, 28 components, 28 bytes)
$ vips copy x.jpg y.jpg
$ vipsheader -f exif-ifd0-XPComment y.jpg 
hello ( there (hello ( there, Byte, 28 components, 28 bytes)

ie. it's correctly seeing that the item hasn't changed and shouldn't be regenerated.

What do you think?

@jcupitt jcupitt added this to the 8.12 milestone Nov 12, 2021
@lovell
Copy link
Member Author

lovell commented Nov 12, 2021

Looks good, and much simpler. Is it possible to remove an EXIF property with this approach?

@kleisauke
Copy link
Member

Looks good to me as well. I tried to cover this by a unit test, see commit kleisauke@008fe2d.

@jcupitt
Copy link
Member

jcupitt commented Nov 13, 2021

Yes, remove should work, I expanded the test prog:

#!/usr/bin/ruby

require "vips"

# set the bad comment
x = Vips::Image.new_from_file ARGV[0]
x = x.mutate do |mutable|
  mutable.set_type! GObject::GSTR_TYPE, "exif-ifd0-XPComment", "hello ( there"
end
x.write_to_file "t1.jpg"

# copy the image with the bad comment
x = Vips::Image.new_from_file "t1.jpg"
comment = x.get("exif-ifd0-XPComment")
puts "comment after save, load:"
puts "#{comment}"
x.write_to_file "t2.jpg"

# read back again and check the comment is OK
x = Vips::Image.new_from_file "t2.jpg"
comment = x.get("exif-ifd0-XPComment")
puts "comment after save, load, save, load:"
puts "#{comment}"

# delete and save
x = x.mutate do |mutable|
  mutable.remove! "exif-ifd0-XPComment"
end
x.write_to_file "t3.jpg"

# read back again and check the comment is OK
x = Vips::Image.new_from_file "t3.jpg"
puts "try to get after remove, save, load:"
typeof = x.get_typeof("exif-ifd0-XPComment")
puts "typeof = #{typeof}" 

I see:

$ ./badcomment.rb ~/pics/k2.jpg
comment after save, load:
hello ( there (hello ( there, Byte, 28 components, 28 bytes)
comment after save, load, save, load:
hello ( there (hello ( there, Byte, 28 components, 28 bytes)
try to get after remove, save, load:
typeof = 0

So it's successfully saved and loaded again unchanged, and you can remove.

@jcupitt
Copy link
Member

jcupitt commented Nov 13, 2021

Ah, @kleisauke 's unit test tested remove as well, that's easy.

OK, I've merged the change to master, I think we're done! Let's kill off the last few tiny things and make 8.12-rc1.

@jcupitt jcupitt closed this Nov 13, 2021
@lovell
Copy link
Member Author

lovell commented Nov 13, 2021

Excellent, thanks John.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EXIF metadata strings containing multiple brackets have their values altered
3 participants