Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DeltaVision: Improve accuracy of isThisType stream checks #2658

Merged
merged 2 commits into from
Jun 26, 2017
Merged

DeltaVision: Improve accuracy of isThisType stream checks #2658

merged 2 commits into from
Jun 26, 2017

Conversation

rleigh-codelibre
Copy link
Contributor

  • Check that it's not a TIFF
  • Check that the file size meets a minimum size

Testing: Check with the images from QA 17411 https://www.openmicroscopy.org/qa2/qa/feedback/17411/ which should now be detected as TIFF, not DeltaVision.

- Check that it's not a TIFF
- Check that the file size meets a minimum size
@@ -174,7 +180,7 @@ public boolean isThisType(RandomAccessInputStream stream) throws IOException {
int x = stream.readInt();
int y = stream.readInt();
int count = stream.readInt();
return x > 0 && y > 0 && count > 0;
return x > 0 && y > 0 && count > 0 && (x * y * count > stream.length());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be x * y * count < stream.length()? It would probably also be good to cast x, y, and count to long when multiplying, otherwise integer overflow could cause a false positive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, yes. Fixed.

// false negatives.
if(new TiffParser(stream).isValidHeader()) {
return false;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will be redundant if the below check is fixed. isValidHeader is checking the first four bytes; in the case of a valid header, these would be 0x49492a00 or 0x4d4d002a, which would be read as x below and should already fail the comparison with stream.length().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this isn't the case. The length check still passes with TIFF data, including the test data from 17411, so this is needed to be sure it's not TIFF. The dv magic and length checks are simply too ambiguous on their own to guarantee a correct false return, unless there's something more definitive we can check for.

This check will at least cover most TIFF-based formats to prevent misbehaviour, though ideally it wouldn't be needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stream-based DV check is pretty weak, it's a pity we can't make more use of the filename and fall back to stream-based identification only as a very last resort if filename or some other reader's byte or magic number pattern can't offer more confidence it is some other format.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also occurs to me that in reverse, a TIFF header could actually be valid DV data on occasion, so while this content-based check is likely more robust all around with this change (in terms of overall failure rates, by increasing the probability of a correct identification), but still far from perfect.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not 100% convinced that this can't be replaced by a further check on x * y * count later on, which is negative in this case due to long overflow. Perhaps wrapping each variable in a BigInteger instead of casting to long before multiplying would help and allow this to be removed?

@simleo
Copy link
Member

simleo commented Nov 9, 2016

Note that, without this PR, the behaviour is different with the two files provided. _t20270.tif fails to convert only if the -stitch option is used (the file is first detected as a TIFF, then as a DV after it goes through FileStitcher), while _t20271.tif always fails (immediately detected as DV).

Anyway, with this PR both files are correctly identified and converted, with or without -stitch.

@mtbc
Copy link
Member

mtbc commented Nov 10, 2016

This looks good to me as at least being an improvement. If I were to criticize the TIFF check it would be more that it reflects the particular sample data we have that triggered this issue rather than our more properly putting the more confident checks from other non-TIFF formats too ahead of the weak DV check here. Real shame we can't make more use of filename too though in this circumstance, for whatever reason.

@sbesson
Copy link
Member

sbesson commented Jun 6, 2017

@rleigh-codelibre is it worth trying to resolve the changes discussed above for 5.5.2 or should we create a card to fix this in a later release?

@rleigh-codelibre
Copy link
Contributor Author

I'll double-check tomorrow.

@rleigh-codelibre
Copy link
Contributor Author

Refreshing my memory of the discussion, I'm unsure that the suggsted size check addition is sufficiently robust-a valid TIFF could still end up being identified as DV. It all comes down to chance, which makes me uneasy about it.

@sbesson
Copy link
Member

sbesson commented Jun 8, 2017

@rleigh-codelibre can you open a configuration PR for adding QA 17411 to the data repository tests?

@sbesson sbesson modified the milestone: 5.5.3 Jun 14, 2017
@sbesson
Copy link
Member

sbesson commented Jun 26, 2017

See https://ci.openmicroscopy.org/view/Failing/job/BIOFORMATS-DEV-merge-full-repository/573/. Merging as an improvement to a long-running detection issue.

@sbesson sbesson merged commit eb46bad into ome:develop Jun 26, 2017
@rleigh-codelibre rleigh-codelibre deleted the dv-type-check branch June 26, 2017 08:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants