This repository has been archived by the owner on Sep 8, 2021. It is now read-only.

Implementing Exif Parsing #1

Open
niclashoyer opened this issue Oct 18, 2015 · 8 comments

Comments

@niclashoyer

Hi,

I started to write some basic Exif parsing for JPEGs based on this description.

Basically, Exif is TIFF-encoded data inside the APP1 payload. Unfortunately, the format is quite messy. One problem that bothers me is that the value of each Exif tag is stored at an offset if it is larger than 4 bytes. This offset is relative to the start of the TIFF header (APP1 starting point + fixed offset).
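To make the 4-byte rule concrete, here is a minimal sketch of classifying a single 12-byte IFD entry (tag, type, count, value/offset field). The names `ValueLocation`, `type_size` and `locate_value` are made up for this illustration and are not part of immeta; the type sizes are the standard TIFF field types 1 to 12.

```rust
/// Where the value of an IFD entry lives (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum ValueLocation {
    /// The value fits in the 4-byte value/offset field; raw bytes, left-justified.
    Inline([u8; 4]),
    /// The value is stored elsewhere, at this offset from the start of the TIFF header.
    Offset(u32),
}

/// Size in bytes of one element of the given TIFF field type, if known.
pub fn type_size(field_type: u16) -> Option<u32> {
    match field_type {
        1 | 2 | 6 | 7 => Some(1), // BYTE, ASCII, SBYTE, UNDEFINED
        3 | 8 => Some(2),         // SHORT, SSHORT
        4 | 9 | 11 => Some(4),    // LONG, SLONG, FLOAT
        5 | 10 | 12 => Some(8),   // RATIONAL, SRATIONAL, DOUBLE
        _ => None,
    }
}

/// Decodes the tag of a 12-byte IFD entry and decides whether its value is
/// inline or referenced by offset. Returns `None` for unknown field types
/// or when the total size overflows.
pub fn locate_value(entry: &[u8; 12], little_endian: bool) -> Option<(u16, ValueLocation)> {
    let u16_at = |i: usize| {
        let b = [entry[i], entry[i + 1]];
        if little_endian { u16::from_le_bytes(b) } else { u16::from_be_bytes(b) }
    };
    let u32_at = |i: usize| {
        let b = [entry[i], entry[i + 1], entry[i + 2], entry[i + 3]];
        if little_endian { u32::from_le_bytes(b) } else { u32::from_be_bytes(b) }
    };

    let tag = u16_at(0);
    let total = type_size(u16_at(2))?.checked_mul(u32_at(4))?;
    let location = if total <= 4 {
        ValueLocation::Inline([entry[8], entry[9], entry[10], entry[11]])
    } else {
        ValueLocation::Offset(u32_at(8))
    };
    Some((tag, location))
}
```

An ASCII value of 6 bytes ends up as `Offset(..)`, while a single SHORT stays `Inline(..)`.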

I see two strategies here:

  1. Store all references to data and proceed until all tags are read. Then read all referenced values (they are usually stored directly after the IFD). This would involve sorting the references, because they can be in any order and we need to read them in order.
  2. Read each referenced value directly after reading its reference. This involves seeking, so the LoadableMetadata::load method needs Seek. But that gives an error for load_from_buf, as Seek is not implemented for &[u8].
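The first strategy could look roughly like the sketch below: collect the references, sort them by offset, then walk forward through a plain `Read`. `PendingRef` and `read_deferred` are hypothetical names for this illustration, and the sketch assumes that referenced values do not overlap and all lie ahead of the current position.

```rust
use std::io::{self, Read};

/// A value that lives outside its IFD entry (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct PendingRef {
    pub tag: u16,
    /// Offset relative to the start of the TIFF header.
    pub offset: u64,
    /// Length of the value in bytes.
    pub len: usize,
}

/// Reads all pending values from a forward-only reader.
/// `pos` is the reader's current position relative to the TIFF header.
pub fn read_deferred<R: Read>(
    r: &mut R,
    mut pos: u64,
    mut refs: Vec<PendingRef>,
) -> io::Result<Vec<(u16, Vec<u8>)>> {
    // Without Seek we can only move forward, so visit values in offset order.
    refs.sort_by_key(|p| p.offset);

    let mut out = Vec::with_capacity(refs.len());
    for p in refs {
        if p.offset < pos {
            // Assumption of this sketch: no overlapping or backward references.
            return Err(io::Error::new(io::ErrorKind::InvalidData, "backward reference"));
        }
        // Discard the bytes between the current position and the value.
        let gap = p.offset - pos;
        let skipped = io::copy(&mut r.by_ref().take(gap), &mut io::sink())?;
        if skipped != gap {
            return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "truncated data"));
        }
        let mut buf = vec![0u8; p.len];
        r.read_exact(&mut buf)?;
        pos = p.offset + p.len as u64;
        out.push((p.tag, buf));
    }
    Ok(out)
}
```

The position bookkeeping (`pos`) is the price of this approach: the caller has to know how far into the TIFF data the reader already is.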

I prefer the second method, because it would make parsing a lot easier and doesn't need any sorting of references, but I don't know whether load_from_buf can be fixed to be compatible.

Any ideas?

@fisch42

fisch42 commented Oct 18, 2015

Maybe you could buffer the whole Exif payload. Is that feasible? It could waste some memory, though.

@netvl
Owner

netvl commented Oct 19, 2015

I don't think that adding Seek to LoadableMetadata::load() is feasible, because I intended this library to be usable in arbitrary contexts, for example to read image metadata in streaming mode from a socket. A Seek requirement is too strong. That said, to overcome the error in load_from_buf() you can use std::io::Cursor, which implements Seek.
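As a minimal sketch of the `Cursor` suggestion: wrapping a `&[u8]` in `std::io::Cursor` yields a value that implements both `Read` and `Seek`, so seeking code can be reused for in-memory buffers. `read_at` and `read_at_in_slice` are hypothetical helpers for this illustration, not immeta functions.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Reads `len` bytes at absolute `offset`, then restores the previous position.
pub fn read_at<R: Read + Seek>(r: &mut R, offset: u64, len: usize) -> io::Result<Vec<u8>> {
    let saved = r.stream_position()?;
    r.seek(SeekFrom::Start(offset))?;
    let mut buf = vec![0u8; len];
    r.read_exact(&mut buf)?;
    // Go back so the caller can continue with the next IFD entry.
    r.seek(SeekFrom::Start(saved))?;
    Ok(buf)
}

/// `&[u8]` does not implement `Seek`, but `Cursor<&[u8]>` does.
pub fn read_at_in_slice(buf: &[u8], offset: u64, len: usize) -> io::Result<Vec<u8>> {
    let mut cursor = Cursor::new(buf);
    read_at(&mut cursor, offset, len)
}
```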

So, I think that the first approach is better. Alternatively, as @fisch42 suggested, you can load the whole EXIF payload into memory. I think that this kind of buffering is okay; it is unlikely that EXIF metadata would take lots of memory. Then you can use direct offsets in a byte slice.
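A rough sketch of the buffering variant, assuming the payload length is already known from the segment header: since a JPEG segment length is a 16-bit field, a single APP1 payload stays below 64 KiB. `buffer_payload` and `value_at` are hypothetical names for this illustration.

```rust
use std::io::{self, Read};

/// Reads exactly `len` bytes of payload into memory from a forward-only reader.
pub fn buffer_payload<R: Read>(r: &mut R, len: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    let n = r.by_ref().take(len).read_to_end(&mut buf)?;
    if n as u64 != len {
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "truncated payload"));
    }
    Ok(buf)
}

/// Returns `len` bytes at `offset` (relative to the TIFF header, i.e. the
/// start of `tiff`), or `None` if the range is out of bounds.
pub fn value_at(tiff: &[u8], offset: u32, len: usize) -> Option<&[u8]> {
    let start = offset as usize;
    let end = start.checked_add(len)?;
    tiff.get(start..end)
}
```

With the payload in a slice, out-of-order offsets need neither sorting nor seeking, and the underlying reader is left exactly at the end of the buffered region.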

@tdryer

tdryer commented Nov 17, 2015

I've implemented basic Exif support in my fork. The code probably isn't great (first thing I've written in Rust), but it might be a useful starting point.

My first approach for reading the tags was storing references, sorting them, and reading the data in order. I ended up dropping that and buffering the Exif data since using Seek simplified the code, and I was going to have to track how much data had been read in order to ensure it ends up at the end of the segment when finished parsing.

Another annoying thing about the Exif format is that it can be either big or little endian. To deal with that, I had to add my own read_u16 and read_u32 methods that take the byte order as an argument.
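For illustration, byte-order-aware readers along these lines might look as follows. This is a sketch and not the code from the fork; only the names read_u16 and read_u32 come from the comment above, while `ByteOrder` and `byte_order_from_marker` are assumptions. The TIFF header starts with "II" for little endian or "MM" for big endian.

```rust
use std::io::{self, Read};

/// Byte order declared in the first two bytes of the TIFF header.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ByteOrder {
    Big,
    Little,
}

/// Maps the TIFF byte-order marker to a `ByteOrder`.
pub fn byte_order_from_marker(marker: [u8; 2]) -> Option<ByteOrder> {
    match &marker {
        b"II" => Some(ByteOrder::Little), // "Intel"
        b"MM" => Some(ByteOrder::Big),    // "Motorola"
        _ => None,
    }
}

/// Reads a 16-bit unsigned integer in the given byte order.
pub fn read_u16<R: Read>(r: &mut R, order: ByteOrder) -> io::Result<u16> {
    let mut b = [0u8; 2];
    r.read_exact(&mut b)?;
    Ok(match order {
        ByteOrder::Big => u16::from_be_bytes(b),
        ByteOrder::Little => u16::from_le_bytes(b),
    })
}

/// Reads a 32-bit unsigned integer in the given byte order.
pub fn read_u32<R: Read>(r: &mut R, order: ByteOrder) -> io::Result<u32> {
    let mut b = [0u8; 4];
    r.read_exact(&mut b)?;
    Ok(match order {
        ByteOrder::Big => u32::from_be_bytes(b),
        ByteOrder::Little => u32::from_le_bytes(b),
    })
}
```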

@niclashoyer
Author

The implementation looks great so far! One thing I noticed while trying to implement it using buffering is that Exif data really is just TIFF, so conceptually it would be best to have a TIFF metadata parser and use that to parse Exif. The biggest problem with TIFF is that image data can be anywhere in the file (just like Exif values), so unlike Exif, TIFF data can get really large, and buffering is not an efficient option there.

@netvl
Owner

netvl commented Nov 17, 2015

Looks great, thanks! If you want, you can submit a pull request.

However, I agree with @niclashoyer that I want to do things in a general way if possible; this means that implementing a TIFF parser is the best option.

Well, it seems I should think about how to integrate the ability to seek inside the image data without requiring a Seek implementation for those image formats which don't need it...

@niclashoyer
Author

@netvl just a thought: if you design optional seeking, keep in mind that it may be worth implementing both variants (seeking / non-seeking) and letting the user of the library decide. For example, if one wants to use immeta to parse TIFF files sent over the network, buffering the whole file is really bad and a slightly more complex implementation is "ok". But if one wants to use immeta to parse TIFF files from a hard disk, a seeking implementation is the best option, as it is still very fast.

@netvl
Owner

netvl commented Nov 17, 2015

Yes, that's something I was thinking of when I was writing that sentence, thanks!

@netvl
Owner

netvl commented Jan 30, 2016

I've added a read_from_seek<R: BufRead + Seek>() to LoadableMetadata and changed read() to require BufRead instead of just Read. This should make EXIF parsing implementation easier.
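To illustrate the general shape of such a design (this is a hypothetical sketch, not immeta's actual trait definition): a trait can offer a streaming entry point plus a seeking one that defaults to the streaming path, so only formats that benefit from Seek need to override it. `LoadSketch`, `Toy` and the toy format (first byte is an offset to a one-byte value) are all invented for this example.

```rust
use std::io::{self, BufRead, Read, Seek, SeekFrom};

/// Hypothetical trait with a streaming and an optional seeking entry point.
pub trait LoadSketch: Sized {
    /// Streaming entry point; works for sockets and other forward-only sources.
    fn load<R: BufRead>(r: &mut R) -> io::Result<Self>;

    /// Seeking entry point; falls back to streaming unless overridden.
    fn load_from_seek<R: BufRead + Seek>(r: &mut R) -> io::Result<Self> {
        Self::load(r)
    }
}

/// Toy format: the first byte is the offset (from the start) of a one-byte value.
#[derive(Debug, PartialEq)]
pub struct Toy(pub u8);

fn read_byte<R: Read>(r: &mut R) -> io::Result<u8> {
    let mut b = [0u8; 1];
    r.read_exact(&mut b)?;
    Ok(b[0])
}

fn check_offset(offset: u8) -> io::Result<u64> {
    if offset == 0 {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "offset points at itself"));
    }
    Ok(u64::from(offset))
}

impl LoadSketch for Toy {
    fn load<R: BufRead>(r: &mut R) -> io::Result<Self> {
        let offset = check_offset(read_byte(r)?)?;
        // One byte is already consumed; discard the rest up to the value.
        io::copy(&mut r.by_ref().take(offset - 1), &mut io::sink())?;
        Ok(Toy(read_byte(r)?))
    }

    fn load_from_seek<R: BufRead + Seek>(r: &mut R) -> io::Result<Self> {
        let base = r.stream_position()?;
        let offset = check_offset(read_byte(r)?)?;
        // Jump straight to the value instead of reading through the gap.
        r.seek(SeekFrom::Start(base + offset))?;
        Ok(Toy(read_byte(r)?))
    }
}
```

Both entry points return the same result for the same input; the caller picks whichever matches the source it has.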
