DeltaVision: Improve accuracy of isThisType stream checks #2658

rleigh-codelibre · 2016-11-08T11:24:49Z

- Check that it's not a TIFF
- Check that the file size meets a minimum size

Testing: check that the images from QA 17411 https://www.openmicroscopy.org/qa2/qa/feedback/17411/ are now detected as TIFF, not DeltaVision.


melissalinkert · 2016-11-08T14:52:30Z

components/formats-gpl/src/loci/formats/in/DeltavisionReader.java

@@ -174,7 +180,7 @@ public boolean isThisType(RandomAccessInputStream stream) throws IOException {
    int x = stream.readInt();
    int y = stream.readInt();
    int count = stream.readInt();
-    return x > 0 && y > 0 && count > 0;
+    return x > 0 && y > 0 && count > 0 && (x * y * count > stream.length());


Shouldn't this be x * y * count < stream.length()? It would probably also be good to cast x, y, and count to long when multiplying, otherwise integer overflow could cause a false positive.
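A minimal sketch of the overflow described in the review, using plain ints instead of a RandomAccessInputStream. The class and method names are hypothetical, and the sketch reads the check as treating x * y * count as a lower bound on the file size in bytes (at least one byte per pixel). Both variants use the corrected `<` comparison; only the arithmetic width differs.

```java
class SizeCheckSketch {
  /** Corrected comparison, but int arithmetic: the product can wrap around first. */
  static boolean fitsInt(int x, int y, int count, long length) {
    return x > 0 && y > 0 && count > 0 && x * y * count < length;
  }

  /** Suggested form: widen to long before multiplying. */
  static boolean fitsLong(int x, int y, int count, long length) {
    return x > 0 && y > 0 && count > 0 && (long) x * y * count < length;
  }

  public static void main(String[] args) {
    // 65536 * 65536 = 2^32, which wraps to 0 as an int.
    int x = 65536, y = 65536, count = 1;
    long length = 1024; // a tiny file
    System.out.println(fitsInt(x, y, count, length));  // true: false positive
    System.out.println(fitsLong(x, y, count, length)); // false
  }
}
```

With int arithmetic the wrapped product (0) is smaller than any file length, so a header that should be rejected is accepted.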

Oops, yes. Fixed.

melissalinkert · 2016-11-08T14:56:04Z

components/formats-gpl/src/loci/formats/in/DeltavisionReader.java

+    // false negatives.
+    if(new TiffParser(stream).isValidHeader()) {
+      return false;
+    }


I think this will be redundant if the below check is fixed. isValidHeader is checking the first four bytes; in the case of a valid header, these would be 0x49492a00 or 0x4d4d002a, which would be read as x below and should already fail the comparison with stream.length().
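To make the four-byte argument concrete, the sketch below checks the two classic TIFF signatures and shows what those same bytes become when read as the DV x value. `looksLikeTiff` is a simplified, hypothetical stand-in for TiffParser.isValidHeader(): it ignores BigTIFF and does no IFD validation. Both signatures produce a positive x in either byte order, so whether the size comparison rejects a TIFF depends on what y and count happen to be.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class TiffMagicSketch {
  /** Simplified stand-in: only "II*\0" (little-endian) and "MM\0*" (big-endian). */
  static boolean looksLikeTiff(byte[] h) {
    if (h.length < 4) return false;
    boolean little = h[0] == 'I' && h[1] == 'I' && h[2] == 42 && h[3] == 0;
    boolean big = h[0] == 'M' && h[1] == 'M' && h[2] == 0 && h[3] == 42;
    return little || big;
  }

  /** The value the DV check would see as x if it read the same four bytes. */
  static int firstIntAs(byte[] h, ByteOrder order) {
    return ByteBuffer.wrap(h, 0, 4).order(order).getInt();
  }

  public static void main(String[] args) {
    byte[] ii = {0x49, 0x49, 0x2a, 0x00};
    byte[] mm = {0x4d, 0x4d, 0x00, 0x2a};
    for (byte[] h : new byte[][] {ii, mm}) {
      System.out.printf("tiff=%b x(BE)=%d x(LE)=%d%n",
          looksLikeTiff(h),
          firstIntAs(h, ByteOrder.BIG_ENDIAN),
          firstIntAs(h, ByteOrder.LITTLE_ENDIAN));
    }
  }
}
```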

Unfortunately, this isn't the case. The length check still passes with TIFF data, including the test data from QA 17411, so this check is needed to be sure it's not a TIFF. The DV magic and length checks are simply too ambiguous on their own to guarantee a correct false return, unless there's something more definitive we can check for.

This check will at least cover most TIFF-based formats to prevent misbehaviour, though ideally it wouldn't be needed.

This stream-based DV check is pretty weak; it's a pity we can't make more use of the filename and fall back to stream-based identification only as a very last resort, when neither the filename nor another reader's byte or magic-number pattern offers more confidence that it is some other format.

It also occurs to me that, in reverse, a TIFF-like header could occasionally be valid DV data. So while this content-based check is likely more robust overall with this change (lower overall failure rates, since it increases the probability of a correct identification), it is still far from perfect.

I'm still not 100% convinced that this can't be replaced by a further check on x * y * count later on, which is negative in this case due to long overflow. Perhaps wrapping each variable in a BigInteger instead of casting to long before multiplying would help and allow this to be removed?
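A minimal sketch of the BigInteger suggestion. The inputs are contrived (Integer.MAX_VALUE for x and y, count = 4) and are not taken from the QA 17411 files; they simply show that a product of three positive ints can overflow even a long and wrap negative, while the BigInteger product is exact. Class and method names are hypothetical.

```java
import java.math.BigInteger;

class BigIntegerSizeCheckSketch {
  /** long arithmetic: safe from int overflow, but three ints can still overflow a long. */
  static boolean fitsLong(int x, int y, int count, long length) {
    return x > 0 && y > 0 && count > 0 && (long) x * y * count < length;
  }

  /** BigInteger arithmetic: the product is exact, so it can never wrap negative. */
  static boolean fitsBig(int x, int y, int count, long length) {
    if (x <= 0 || y <= 0 || count <= 0) return false;
    BigInteger product = BigInteger.valueOf(x)
        .multiply(BigInteger.valueOf(y))
        .multiply(BigInteger.valueOf(count));
    return product.compareTo(BigInteger.valueOf(length)) < 0;
  }

  public static void main(String[] args) {
    int x = Integer.MAX_VALUE, y = Integer.MAX_VALUE, count = 4;
    long length = 1L << 20; // a 1 MiB file
    System.out.println((long) x * y * count);          // -17179869180: wrapped
    System.out.println(fitsLong(x, y, count, length)); // true: false positive
    System.out.println(fitsBig(x, y, count, length));  // false
  }
}
```

The cost is a few small allocations per call, which is negligible for a one-off type check.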

simleo · 2016-11-09T15:16:04Z

Note that, without this PR, the behaviour is different with the two files provided. _t20270.tif fails to convert only if the -stitch option is used (the file is first detected as a TIFF, then as a DV after it goes through FileStitcher), while _t20271.tif always fails (immediately detected as DV).

Anyway, with this PR both files are correctly identified and converted, with or without -stitch.

mtbc · 2016-11-10T10:48:04Z

This looks good to me, at least as an improvement. If I were to criticize the TIFF check, it would be that it reflects the particular sample data that triggered this issue, rather than more properly putting the more confident checks from other, non-TIFF formats ahead of the weak DV check as well. It's a real shame we can't make more use of the filename in this circumstance too, for whatever reason.
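A minimal sketch of the ordering idea: filename plus header first, then header-only checks with the most confident detectors ahead of the weak DV heuristic. Everything here is hypothetical: the `Detector` class, the `.tif`/`.dv` suffix mapping, and `weakDv`, which only loosely mimics the DV x/y/count test (no magic or size check). It is not how Bio-Formats' ImageReader is actually organized.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

class DetectionOrderSketch {
  /** Hypothetical detector: a format name, a typical suffix and a header test. */
  static final class Detector {
    final String format;
    final String suffix;
    final Predicate<byte[]> matches;
    Detector(String format, String suffix, Predicate<byte[]> matches) {
      this.format = format;
      this.suffix = suffix;
      this.matches = matches;
    }
  }

  /** Confident check: classic TIFF signatures only (simplified). */
  static boolean tiffMagic(byte[] h) {
    return h.length >= 4
        && ((h[0] == 'I' && h[1] == 'I' && h[2] == 42 && h[3] == 0)
         || (h[0] == 'M' && h[1] == 'M' && h[2] == 0 && h[3] == 42));
  }

  /** Weak check: three positive big-endian ints, loosely like the DV x/y/count test. */
  static boolean weakDv(byte[] h) {
    if (h.length < 12) return false;
    ByteBuffer b = ByteBuffer.wrap(h);
    return b.getInt(0) > 0 && b.getInt(4) > 0 && b.getInt(8) > 0;
  }

  // Most confident detectors first; the weak heuristic goes last.
  static final List<Detector> DETECTORS = Arrays.asList(
      new Detector("TIFF", ".tif", DetectionOrderSketch::tiffMagic),
      new Detector("DeltaVision", ".dv", DetectionOrderSketch::weakDv));

  static String identify(String fileName, byte[] header) {
    String lower = fileName.toLowerCase(Locale.ROOT);
    // Pass 1: prefer a format whose suffix matches and whose header test agrees.
    for (Detector d : DETECTORS) {
      if (lower.endsWith(d.suffix) && d.matches.test(header)) return d.format;
    }
    // Pass 2: header only, in confidence order.
    for (Detector d : DETECTORS) {
      if (d.matches.test(header)) return d.format;
    }
    return null;
  }

  public static void main(String[] args) {
    // TIFF signature followed by bytes that would also satisfy the weak DV test.
    byte[] tiffLike = {0x49, 0x49, 0x2a, 0x00, 0, 0, 0, 8, 0, 0, 0, 1};
    System.out.println(identify("_t20271.tif", tiffLike));  // TIFF
    System.out.println(identify("no-extension", tiffLike)); // TIFF (ordering wins)
  }
}
```

Because both detectors accept `tiffLike`, only the ordering (or the filename) keeps it from being reported as DeltaVision.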

sbesson · 2017-06-06T21:31:42Z

@rleigh-codelibre is it worth trying to resolve the changes discussed above for 5.5.2 or should we create a card to fix this in a later release?

rleigh-codelibre · 2017-06-06T22:03:10Z

I'll double-check tomorrow.

rleigh-codelibre · 2017-06-07T14:49:04Z

Refreshing my memory of the discussion, I'm unsure that the suggested size check addition is sufficiently robust: a valid TIFF could still end up being identified as DV. It all comes down to chance, which makes me uneasy about it.

sbesson · 2017-06-08T20:47:12Z

@rleigh-codelibre can you open a configuration PR for adding QA 17411 to the data repository tests?

sbesson · 2017-06-26T08:12:05Z

See https://ci.openmicroscopy.org/view/Failing/job/BIOFORMATS-DEV-merge-full-repository/573/. Merging as an improvement to a long-running detection issue.

DeltaVision: Improve accuracy of isThisType stream checks

3730b79

- Check that it's not a TIFF
- Check that the file size meets a minimum size

rleigh-codelibre mentioned this pull request Nov 8, 2016

TIFF incorrectly identified as DeltaVision #2656

Closed

melissalinkert requested changes Nov 8, 2016


DeltaVision: Correct is this type conditional

c9decfe

dgault mentioned this pull request Jun 13, 2017

Add version history for 5.5.2 #2870

Merged

sbesson modified the milestone: 5.5.3 Jun 14, 2017

sbesson merged commit eb46bad into ome:develop Jun 26, 2017

rleigh-codelibre deleted the dv-type-check branch June 26, 2017 08:41


rleigh-codelibre commented Nov 8, 2016

melissalinkert Nov 8, 2016

rleigh-codelibre Nov 8, 2016

melissalinkert Nov 8, 2016

rleigh-codelibre Nov 8, 2016

mtbc Nov 10, 2016

rleigh-codelibre Nov 10, 2016

melissalinkert Nov 14, 2016

simleo commented Nov 9, 2016

mtbc commented Nov 10, 2016

sbesson commented Jun 6, 2017

rleigh-codelibre commented Jun 6, 2017

rleigh-codelibre commented Jun 7, 2017

sbesson commented Jun 8, 2017

sbesson commented Jun 26, 2017
