Skip to content
This repository has been archived by the owner on Jul 22, 2018. It is now read-only.

Implement a safer TiffEntry/CiffEntry way of accessing data #142

Open
pedrocr opened this issue Feb 21, 2016 · 7 comments
Open

Implement a safer TiffEntry/CiffEntry way of accessing data #142

pedrocr opened this issue Feb 21, 2016 · 7 comments

Comments

@pedrocr
Copy link
Contributor

pedrocr commented Feb 21, 2016

After the work in PR #141 I've come up with a sketch for a further API change to achieve two things at once:

  1. Make the TiffEntry/CiffEntry APIs safer against out of bounds access
  2. Finally remove the TiffIFD/TiffIFDBE split, as discussed in PR Rename 'BE' to "Swap", #27

So here's the basic sketch:

  • Remove getIntArray()/getShortArray() functions and instead make getInt()/getShort()/getData() take an optional integer argument that's the array offset (and in the case of getData() a count as well)
  • Make TiffEntry/CiffEntry no longer have a pointer to data but instead just keep around the reference to the underlying FileMap
  • Move the get4BE() and friends macros into the FileMap API
  • Have TiffEntry/CiffEntry know the host and file endianness (pushing into FileMap is also possible but sometimes there are BE bits inside a LE file and vice-versa)
  • Now when doing getInt()/getShort()/getData() just call into FileMap with getNBE()/getNLE()/getData() insuring proper bounds checks. It is also endian clean so there's no need to keep around the swapped versions of TiffIFD/TiffEntry. It also avoids the memory copying in TiffEntryBE.

Since the TiffEntry API is used for metadata only the performance implications should be minimal if any. How does the plan sound?

@pedrocr
Copy link
Contributor Author

pedrocr commented Mar 25, 2016

@klauspost any opinion on this? getShortArray() and getIntArray() make bounds checking non-obvious so I'd rather just move everything into getInt() and getShort() as described. Any objection to that?

@axxel
Copy link

axxel commented Nov 30, 2016

@pedrocr : looking at the memcpy vs aligned int cast issue (see #164) I came from one thing to the next and ended up thinking about how to improve the code duplication situation around the 5 more-or-less equivalent bitpump implementations and the two ByteStream(Swap) variants, which is related to your ideas about removing TiffEntryBE. Are you still pursuing this idea? And if so, got this stuck just because @klauspost did not give any feedback to your suggestions?

@LebedevRI
Copy link
Contributor

As far i'm concerned this got implemented and merged a while ago.

@axxel
Copy link

axxel commented Nov 30, 2016

I was referring to the second point and the 'sketch', especially the FileMap API changes regarding byte swapping.

@pedrocr
Copy link
Contributor Author

pedrocr commented Nov 30, 2016

@axxel I haven't worked on this in a while but it's still a simplification opportunity in the code. Of the sketch I describe above only the first point is implemented. I've been toying with a rust implementation for a raw loading library and have implemented it like that:

https://github.com/pedrocr/rawloader/blob/master/src/decoders/tiff.rs

There is no TiffEntry/TiffEntryBE split, there's only a single TiffEntry that gets instantiated with an endianness and then an Endian implementation that abstracts away the swapping of bytes:

https://github.com/pedrocr/rawloader/blob/master/src/decoders/basics.rs

Works quite well. As for the BitPumps I've so far only implemented two and did them with code duplication so far but the rawspeed way of just having a base class and then implementing filling sounds good to me. BitPumps are very sensitive performance wise so optimizing for less code isn't ideal. I did that initially for the MSB/LSB pumps and the performance penalty was significant.

@axxel
Copy link

axxel commented Nov 30, 2016

@pedrocr Are you intending to pursue those sketched out plans further?

As to the potential performance implications: I am well aware of them and would in no way sacrifice performance (my motivation looking into this in the first place, was my worry about a performance drop caused by the memcpy() fix for the alignment issues on some platforms). And I see further performance improvement opportunities in the handling of the byteswapping. Therefore my asking about your plans in this regard.

@pedrocr
Copy link
Contributor Author

pedrocr commented Nov 30, 2016

@axxel not really, if you're interested go ahead

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants