How to write simple Exif metadata to JPEG from scratch. #586

Closed
garretwilson opened this issue Jan 31, 2021 · 15 comments

@garretwilson

garretwilson commented Jan 31, 2021

I don't understand how to write simple Exif metadata to a JPEG file I've read using Java Image I/O.

I'm successfully reading, processing, and writing JPEG metadata using Java Image I/O. It works fine.

I set my image reader to ignore reading metadata. I don't care about the original metadata (in this context). Indeed the original JPEG may have already had its metadata removed, and have no metadata whatsoever:

ImageInputStream imageInputStream = …;
ImageReader imageReader = …;
ImageReadParam imageReadParam = imageReader.getDefaultReadParam();
imageReader.setInput(imageInputStream, true, true); // seekForwardOnly = true, ignoreMetadata = true
BufferedImage image = imageReader.read(0, imageReadParam);

I write it out:

ImageWriter imageWriter = getImageWriter(imageReader);
ImageWriteParam imageWriteParam = imageWriter.getDefaultWriteParam();
…
final ImageOutputStream imageOutputStream = …;
imageWriter.setOutput(imageOutputStream);
IIOImage iioImage = new IIOImage(newImage, null, null);
imageWriter.write(null, iioImage, imageWriteParam);

This gives me no metadata, as I would expect.

Now with TwelveMonkeys, I want to add the simplest of the very simplest Exif metadata to my JPEG. Let's say I want to add the Exif field ImageDescription (270, 0x010E) with a value of "foobar". What could be simpler? But I don't see how to do that with TwelveMonkeys.

In How to write Exif to a JPEG with TwelveMonkeys' ExifWriter class, @haraldk mentioned some sort of super low-level approach that involved actually serializing strings to bytes. I'm not sure how this relates to the TwelveMonkeys EXIFWriter/TIFFWriter, as I see that they seem to do serializing internally.

But the bigger issue is how to get an IIOMetadata instance. If I had one of those, I could just pass it to new IIOImage(newImage, null, iioMetadata), couldn't I? I see that there is a JPEGImage10Metadata class, but how do I get one of those? Moreover, how do I do this in a plugin-agnostic way, so that I don't tie the code to TwelveMonkeys?

I'm a little lost here. Could someone point me in the right direction? Surely writing a single Exif "foobar" description value from scratch is the simplest Exif test I could make, but I can't even figure out how to do that.

@garretwilson
Author

As a workaround I'm using Apache Commons Imaging to completely rewrite the metadata after the image is completely processed. This is the only way I've found so far in Java to discard all metadata and write Exif metadata from scratch. But it seems so wasteful to rewrite the entire image file when I already had it loaded in memory and was processing it with Java Image I/O. I hope you can tell me a way to simply add in the metadata as I process the image, as explained above.

@haraldk
Owner

haraldk commented Feb 1, 2021

Hi Garret,

I agree that the approach described in the StackOverflow answer is quite verbose and somewhat low-level. But that's really a problem with the javax.imageio.metadata.* API, which is quite verbose and low-level in order to allow for any kind of image metadata.

> Surely writing a single Exif "foobar" description value from scratch is the simplest Exif test I could make, but I can't even figure out how to do that.

I'm afraid this assumption is at best a little misleading, as writing Exif data given the javax.imageio.metadata.* API isn't simple... And the thing about Exif data in JPEG, is that it is actually a TIFF structure inside the JPEG stream, making this a multi-step process. Finally, the JPEG metadata class has no support for Exif.

But, really, everything you need except adding the specific ImageDescription tag to the entries is there in the StackOverflow answer. It's somewhat trivial:

entries.add(new TIFFEntry(TIFF.TAG_IMAGE_DESCRIPTION, "foobar"));

You can obtain the image metadata in many ways, most typically you would get it from the original image, through the ImageReader, using either getImageMetadata or the readAll method (returns an IIOImage with metadata and thumbnails). But as mentioned, you can also obtain a "blank" one from the ImageWriter, using the getDefaultImageMetadata. These methods are all plugin-agnostic, but the metadata instances they return are not (passing metadata read from one plugin to another plugin's writer will typically only preserve a small subset of information).
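As a rough illustration of the "blank" metadata route, here is a minimal sketch that uses only the standard javax.imageio API. The class and method names (BlankJpegMetadata, blankMetadataFor) are made up for this example, and which plugin answers the request depends on what is installed at runtime.

```java
import javax.imageio.ImageIO;
import javax.imageio.ImageTypeSpecifier;
import javax.imageio.ImageWriter;
import javax.imageio.metadata.IIOMetadata;
import java.awt.image.BufferedImage;
import java.util.Iterator;

// Hypothetical helper, not part of Image I/O or TwelveMonkeys.
final class BlankJpegMetadata {

    /** Asks whichever JPEG writer plugin is installed for default ("blank") image metadata. */
    static IIOMetadata blankMetadataFor(BufferedImage image) {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("jpeg");
        if (!writers.hasNext()) {
            throw new IllegalStateException("No JPEG writer plugin installed");
        }
        ImageWriter writer = writers.next();
        try {
            return writer.getDefaultImageMetadata(
                    ImageTypeSpecifier.createFromRenderedImage(image),
                    writer.getDefaultWriteParam());
        } finally {
            writer.dispose();
        }
    }

    public static void main(String[] args) {
        BufferedImage image = new BufferedImage(8, 8, BufferedImage.TYPE_INT_RGB);
        IIOMetadata metadata = blankMetadataFor(image);
        // The native format name tells you which tree layout this plugin understands.
        System.out.println(metadata.getNativeMetadataFormatName());
    }
}
```

Checking getNativeMetadataFormatName() before building a tree is one way to guard against a plugin that does not use the JPEG native format.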

The full code will be something like:

// Get a "blank" JPEG metadata
ImageWriteParam param = writer.getDefaultWriteParam();
IIOMetadata metadata = writer.getDefaultImageMetadata(ImageTypeSpecifier.createFromRenderedImage(image), param); 

IIOMetadataNode root = new IIOMetadataNode("javax_imageio_jpeg_image_1.0");
IIOMetadataNode markerSequence = new IIOMetadataNode("markerSequence");
root.appendChild(markerSequence);
    
Collection<Entry> entries = new ArrayList<>();
entries.add(new TIFFEntry(TIFF.TAG_IMAGE_DESCRIPTION, "foobar"));

// Write the full Exif segment data
ByteArrayOutputStream bytes = new ByteArrayOutputStream();
// APPn segments are prepended with a 0-terminated ASCII identifier
bytes.write("Exif".getBytes(StandardCharsets.US_ASCII));
bytes.write(new byte[2]); // Exif uses 0-termination + 0 pad for some reason
// Write the Exif data (note that Exif is a TIFF structure)
new TIFFWriter().write(entries, new MemoryCacheImageOutputStream(bytes));

// Wrap it all in a meta data node
IIOMetadataNode exif = new IIOMetadataNode("unknown");
exif.setAttribute("MarkerTag", String.valueOf(0xE1)); // APP1 or "225"
exif.setUserObject(bytes.toByteArray());

// Append Exif node 
markerSequence.appendChild(exif);

// Merge with original data 
metadata.mergeTree("javax_imageio_jpeg_image_1.0", root);

The code is otherwise the same as the SO example.
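To make the "TIFF structure inside the JPEG stream" point concrete, here is a hand-rolled sketch of an Exif APP1 payload holding a single ImageDescription entry, written with only the JDK. It is not TwelveMonkeys code, and TIFFWriter may lay the bytes out differently (byte order, for instance); the class name MinimalExifPayload is invented for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the byte layout; not what TIFFWriter necessarily emits.
final class MinimalExifPayload {
    static final int TAG_IMAGE_DESCRIPTION = 0x010E; // 270
    static final int TYPE_ASCII = 2;

    /** Builds "Exif\0\0" followed by a big-endian TIFF with one IFD entry. */
    static byte[] build(String description) {
        // Non-ASCII characters become '?' here; real code should validate the input.
        byte[] text = description.getBytes(StandardCharsets.US_ASCII);
        int count = text.length + 1;              // ASCII values include the NUL terminator
        boolean inline = count <= 4;              // short values live in the entry itself
        int ifdOffset = 8;                        // IFD follows the 8-byte TIFF header
        int valueOffset = ifdOffset + 2 + 12 + 4; // entry count + one entry + next-IFD pointer

        // ByteBuffer is big-endian by default, matching the "MM" byte-order mark below.
        ByteBuffer out = ByteBuffer.allocate(6 + valueOffset + (inline ? 0 : count));

        // APP1 identifier: "Exif" plus NUL terminator plus one pad byte
        out.put("Exif".getBytes(StandardCharsets.US_ASCII)).put((byte) 0).put((byte) 0);

        // TIFF header (all offsets below are relative to the start of this header)
        out.put((byte) 'M').put((byte) 'M');      // big-endian
        out.putShort((short) 42);                 // TIFF magic number
        out.putInt(ifdOffset);

        // IFD with a single 12-byte entry
        out.putShort((short) 1);
        out.putShort((short) TAG_IMAGE_DESCRIPTION);
        out.putShort((short) TYPE_ASCII);
        out.putInt(count);
        if (inline) {
            out.put(text).put(new byte[4 - text.length]); // NUL terminator plus padding
        } else {
            out.putInt(valueOffset);
        }
        out.putInt(0);                            // no further IFDs

        if (!inline) {
            out.put(text).put((byte) 0);          // the value itself, NUL-terminated
        }
        return out.array();
    }
}
```

The resulting array could be used as the user object of the "unknown" node in the same way as bytes.toByteArray() above.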

Feel free to suggest a method for adding Exif data, or contribute a PR with such a method. 😀

--
Harald K

@garretwilson
Author

Good morning and thank you so much, @haraldk . I will put the Apache Commons Imaging approach on hold and see how far I can get with what you provided. I have a couple of clarification questions.

I see the code you gave uses TwelveMonkeys classes, but there is really no guarantee that the SPI is TwelveMonkeys. Will adding this metadata (e.g. TIFFEntry) still work if the SPI for the current image happens to be some other implementation?

Secondly, I started out adding this new feature to my product thinking I could add both Exif and XMP metadata, but now just adding a little Exif metadata seems like a big goal. I would still want to add XMP or IPTC metadata down the road. Do you know how that would interact with the Exif approach outlined above? Could I still add XMP metadata alongside this, or would I have to switch approaches completely (e.g. rewriting the entire file using Apache Commons Imaging or something)?

@garretwilson
Author

garretwilson commented Feb 1, 2021

I have a slightly unrelated question if you don't mind, @haraldk . I see this interesting code:

ImageTypeSpecifier.createFromRenderedImage(image)

In the code I have to scale an image (which I wrote five or seven years ago), I was determining the image type for the scaled image like this:

Image scaledImage = oldImage.getScaledInstance(newWidth, newHeight, Image.SCALE_SMOOTH);
int oldImageType = oldImage.getType();
int newImageType = oldImageType != BufferedImage.TYPE_CUSTOM ? oldImageType //use the existing image type if it isn't custom
    : (oldImage.getTransparency() == Transparency.OPAQUE) ? BufferedImage.TYPE_INT_RGB : BufferedImage.TYPE_INT_ARGB; //otherwise use RGB unless ARGB is needed for transparency

Back then (as now) it is hard to find info on Java Image I/O. I think I pulled this logic from scaling code. Does the ImageTypeSpecifier.createFromRenderedImage(image) approach do everything this would do, and is it better?

Update: This int is a different type than ImageTypeSpecifier, and may not even be referring to the same thing. I'm a little distracted this morning with something else. I'll look at this calmly another day, but any insights into "image type" and ImageTypeSpecifier are welcome!

(Sorry for getting too off-topic. If you prefer I can transfer it to Stack Overflow.)

@haraldk
Owner

haraldk commented Feb 1, 2021

> Will adding this metadata (e.g. TIFFEntry) still work if the SPI for the current image happens to be some other implementation?

Yes, it should. There will be no TIFFEntry in the metadata. new TIFFWriter().write(...) will serialize the Exif to a byte array blob. This blob should be completely opaque to the writer.

However, keep in mind that everything in the answer is metadata-format specific, i.e., it will only work for plugins that support the javax_imageio_jpeg_image_1.0 format.

> I would still want to add XMP or IPTC metadata ... Do you know how that would interact with the Exif approach outlined above?

There's no interaction. It's just binary data attached.

> Could I still add XMP metadata alongside this?

Yes, you could use the same approach to add IPTC or XMP too. Just find the proper APPn segment, add the identifier and serialize the data to a byte array. But be aware that the application (i.e., your code) would need to keep the values in sync between the different metadata blobs (this is probably mostly a concern when adding XMP).
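For XMP, the same pattern might look like the sketch below. It assumes the usual convention that XMP in JPEG is carried in an APP1 segment whose data starts with the NUL-terminated namespace string http://ns.adobe.com/xap/1.0/; the class name XmpSegmentNode is invented here, and the packet must be small enough to fit in one segment (under 64 KiB).

```java
import javax.imageio.metadata.IIOMetadataNode;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: wraps a serialized XMP packet in an "unknown" marker node.
final class XmpSegmentNode {
    // Standard XMP identifier for JPEG APP1 segments, including the NUL terminator.
    static final String XMP_IDENTIFIER = "http://ns.adobe.com/xap/1.0/\0";

    static IIOMetadataNode create(String xmpPacket) {
        byte[] id = XMP_IDENTIFIER.getBytes(StandardCharsets.US_ASCII);
        byte[] xmp = xmpPacket.getBytes(StandardCharsets.UTF_8);

        // Segment data = identifier immediately followed by the packet bytes
        byte[] data = new byte[id.length + xmp.length];
        System.arraycopy(id, 0, data, 0, id.length);
        System.arraycopy(xmp, 0, data, id.length, xmp.length);

        IIOMetadataNode node = new IIOMetadataNode("unknown");
        node.setAttribute("MarkerTag", String.valueOf(0xE1)); // APP1, same marker as Exif
        node.setUserObject(data);
        return node;
    }
}
```

The returned node would be appended to the markerSequence node next to the Exif one; the writer treats both as opaque bytes.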

--
Harald K

@haraldk
Owner

haraldk commented Feb 1, 2021

@garretwilson Yes, I think general questions that aren't library-specific are better left for SO. 😉

ImageTypeSpecifier.createFromRenderedImage(image) is "better" in the sense that it can also tell different types of TYPE_CUSTOM apart. But there's no need to use it, just create a new BufferedImage using the ColorModel and a compatible WritableRaster from the original in your scaling code.

But then again, if using TYPE_INT_(A)RGB is good enough why change it?
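A minimal sketch of the ColorModel plus compatible WritableRaster idea, assuming plain Graphics2D scaling is acceptable; the class name CompatibleScaler and the choice of bilinear interpolation are illustrative only. Some custom color models may not support createCompatibleWritableRaster or drawing at all.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.WritableRaster;

// Hypothetical helper: scales into an image that shares the source's color model.
final class CompatibleScaler {

    static BufferedImage scale(BufferedImage source, int width, int height) {
        ColorModel colorModel = source.getColorModel();
        // A raster whose layout matches the color model, so no image type guessing is needed
        WritableRaster raster = colorModel.createCompatibleWritableRaster(width, height);
        BufferedImage target =
                new BufferedImage(colorModel, raster, colorModel.isAlphaPremultiplied(), null);

        Graphics2D g = target.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(source, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }
        return target;
    }
}
```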

--
Harald K

@garretwilson
Author

> But, be aware that the application (ie, your code) would need to keep the values in sync between the different meta data blobs (this is probably mostly a concern when adding XMP).

Keep in mind that I've stripped out all the metadata and I'm only writing controlled metadata from scratch, so keeping metadata segments synchronized is not an issue in my case.

Thanks so much! I'll try all this out in a day or two.

@garretwilson
Author

I just got the chance to try this out. I get an exception:

Caused by: javax.imageio.metadata.IIOInvalidTreeException: JPEGvariety and markerSequence nodes must be present
	at java.desktop/com.sun.imageio.plugins.jpeg.JPEGMetadata.mergeNativeTree(JPEGMetadata.java:1101)
	at java.desktop/com.sun.imageio.plugins.jpeg.JPEGMetadata.mergeTree(JPEGMetadata.java:1077)
	at io.guise.mummy.mummify.image.DefaultImageMummifier.processImage(DefaultImageMummifier.java:246)
	at io.guise.mummy.mummify.image.DefaultImageMummifier.mummifyFile(DefaultImageMummifier.java:130)
	... 20 common frames omitted

@garretwilson
Author

garretwilson commented Feb 2, 2021

I've thought about my approach to image processing. Even though I want to completely rewrite the Exif/IPTC/XMP metadata, there may be a color profile (I don't know how all that stuff is stored) or other metadata that I need to keep.

It looks like (as usual) I'm going to have to get my hands dirty and dig deep down into the Java Image I/O library, the IIOMetadata structure, the JPEG/JFIF image format, the Exif format, etc. At the end of the day I guess I'll have to do low-level manipulation of this stuff: walk the IIOMetadata tree, overwrite Exif sections, discard IPTC/XMP sections (until I can figure out how to update them with the same values as I put in the Exif section), and keep all the other metadata sections. (I can't believe that in 10 or 20 years nobody has written something like this in Java.)

So it looks like I'm going to have to become an expert in all this. I have basically every Java media-processing book since the AWT/Swing days, but I don't know anything that covers IIOMetadata. I'll read the API docs, but any other books or articles you can recommend would be appreciated.

@haraldk
Owner

haraldk commented Feb 2, 2021

You can hire me, if you like? 😉

I've learnt the API by reading the API docs/specifications and following the Java 2D tutorials, but mostly by using it, and developing my own plugins through more than 10 years... Never read a book I'm afraid... Maybe I should write one? 🤔

I sent you the link to the JPEG plugin's metadata specification yesterday. As you see, the JPEGvariety node is mandatory in the DTD, along with its app0JFIF sub node (which may be empty). I think the code snippet I sent was originally written using the existing metadata as a starting point, so I missed it, but it's an easy fix.
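A sketch of what that fix could look like, using only the standard javax.imageio API. It assumes that an empty JPEGvariety node placed before markerSequence is enough to satisfy the check behind the exception above; the class and method names (App1Merger, mergeApp1, countSegments) are invented for this example.

```java
import javax.imageio.metadata.IIOInvalidTreeException;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.metadata.IIOMetadataNode;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for merging one APP1 segment into JPEG image metadata.
final class App1Merger {
    static final String JPEG_NATIVE_FORMAT = "javax_imageio_jpeg_image_1.0";

    /** Merges segmentData (identifier plus payload) as an APP1 marker segment. */
    static void mergeApp1(IIOMetadata metadata, byte[] segmentData) {
        IIOMetadataNode root = new IIOMetadataNode(JPEG_NATIVE_FORMAT);
        // The root needs both children, in this order; JPEGvariety is left empty here.
        root.appendChild(new IIOMetadataNode("JPEGvariety"));
        IIOMetadataNode markerSequence = new IIOMetadataNode("markerSequence");
        root.appendChild(markerSequence);

        IIOMetadataNode app1 = new IIOMetadataNode("unknown");
        app1.setAttribute("MarkerTag", String.valueOf(0xE1)); // APP1 or "225"
        app1.setUserObject(segmentData);
        markerSequence.appendChild(app1);

        try {
            metadata.mergeTree(JPEG_NATIVE_FORMAT, root);
        } catch (IIOInvalidTreeException e) {
            throw new IllegalArgumentException("Metadata rejected the JPEG native tree", e);
        }
    }

    /** Counts "unknown" marker segments with the given tag, e.g. to verify a merge. */
    static int countSegments(IIOMetadata metadata, int markerTag) {
        Node markerSequence = metadata.getAsTree(JPEG_NATIVE_FORMAT).getLastChild();
        NodeList segments = markerSequence.getChildNodes();
        int count = 0;
        for (int i = 0; i < segments.getLength(); i++) {
            Node segment = segments.item(i);
            if ("unknown".equals(segment.getNodeName())
                    && String.valueOf(markerTag).equals(
                            ((IIOMetadataNode) segment).getAttribute("MarkerTag"))) {
                count++;
            }
        }
        return count;
    }
}
```

After merging, the metadata can be passed to new IIOImage(image, null, metadata) and written as usual.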

Anyway, I hear you. Personally, I think the ImageIO metadata API is really sad, compared to the rest of the ImageIO API, which usually makes sense. The entire idea of using XML-ish document and nodes in the metadata is just... Mmm.. It creates extremely cluttered, verbose and inefficient code.

There's also the (missing) distinction between what I consider "real" metadata (like caption, artist, date of creation, etc.) and "essential" data disguised as metadata (like color profile, compression, pixel layout etc). It seems to me that you are mostly concerned about the former, but due to the API, you have to understand the latter. Which shouldn't really be necessary if the API was better (although the distinction is sometimes more blurry).

Creating something that works for all formats is hard, I guess. But I do think we can make something better in 2021. 😀

--
Harald K

@garretwilson
Author

> You can hire me, if you like?

Actually that is a possibility. How can I get in touch with you?

@haraldk
Owner

haraldk commented Feb 3, 2021

I can be reached at harald d kuhr a gmail d com.

I also started looking into a more reasonable metadata API, using immutable objects and the builder pattern, but I'm realizing this is quite a task... 😀

--
Harald K

@garretwilson
Author

garretwilson commented Feb 6, 2021

@haraldk thank you for the offline discussion. I had tentatively decided, at least in the short term, to use Apache Commons Imaging to add metadata to my image after first processing it in Java Image I/O. Although this is less efficient than adding metadata during the initial processing, at least that would get me up and running immediately while I consider improved options in the future.

However it appears that Apache Commons Imaging can't even write two simple Exif values to a JPEG without corrupting them! I've filed ticket IMAGING-281. Seeing that the same corrupted value is returned by ExifTool, Metadata++, and metadata-extractor (see drewnoakes/metadata-extractor#528), I'm inclined to think that Apache Commons Imaging is broken (unless I'm making a mistake in my code, but then why would the other value show up fine, and why can both values be read in IrfanView?).

This is unbelievable. Do you realize that this means that there does not exist in 2021 a way to write simple, non-corrupted Exif metadata to a JPEG file from scratch in Java? This blows my mind.

@haraldk
Owner

haraldk commented Apr 25, 2021

Closing for now. Might add something to the Wiki later.

@haraldk closed this as completed Apr 25, 2021

@garretwilson
Author

garretwilson commented Apr 25, 2021

> However it appears that Apache Commons Imaging can't even write two simple Exif values to a JPEG without corrupting them!

I wanted to circle back and note that I was mistaken on this point. As I explained in IMAGING-281, the Exif property in question was a special Windows property that required UCS-2 rather than UTF-8. That's why a single tag seemed to be "corrupted" while the others were working.
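To illustrate the encoding difference, here is a small sketch that assumes the property is one of the Windows "XP" Exif tags, which are commonly stored as little-endian UCS-2 bytes with a two-byte terminator. The thread does not name the tag, so treat both that assumption and the class name WindowsXpTagText as hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper showing the byte layout; the exact tag is an assumption.
final class WindowsXpTagText {

    /** Encodes text as UTF-16LE (a superset of UCS-2) with a trailing 0x0000. */
    static byte[] encode(String value) {
        byte[] utf16 = value.getBytes(StandardCharsets.UTF_16LE);
        // copyOf pads with zeros, which appends the two-byte terminator
        return Arrays.copyOf(utf16, utf16.length + 2);
    }
}
```

A reader that decodes these bytes as UTF-8 or ASCII sees a NUL after every Latin character, which is one way such a value can look corrupted in some tools.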

So Apache Commons Imaging is able to write metadata from scratch to a JPEG, although it requires rewriting an existing image. For now this is working for me, although it would be preferable at some point to add the metadata dynamically while I am processing it using Java Image I/O.
