-
Notifications
You must be signed in to change notification settings - Fork 120
Identify raw images #3632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Identify raw images #3632
Conversation
Test file is first 1kB of a raw file
Digital Negative Image (raw image format)
|
Tested and it works great! Probably should be revisited (to remove magic identification) after this is resolved upstream: drewnoakes/metadata-extractor#75. It correctly prevents uploading of Regarding future, better, support for DNGs, we could extract the preview JPEG and upload that after checking it’s the same dimensions as the RAW file. This is better than trying to develop RAW files ourselves, because they often come with proprietary development instructions, so using JPEG preview should get us closer to artists’ intent. But even that wouldn’t be 100% solution as previews may be out of date and even native resolution preview generated by Adobe apps is a highly compressed JPEG :-(. |
andrew-nowak
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
working on test 👍
| <mime-info> | ||
| <mime-type type="image/x-raw-canon"> | ||
| <_comment>Canon raw image</_comment> | ||
| <magic priority="50"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This has the same priority as tiff files which I think should also match this magic. Should we increase this priority to make sure that whatever tika uses to tiebreak doesn't accidentally switch back to identifying these as tiffs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds reasonable. I think custom-mimetypes have higher priority already, but no harm in increasing this.
| private def isDng(file: File) = { | ||
| val metadata = TiffMetadataReader.readMetadata(file) | ||
| val directory = metadata.getFirstDirectoryOfType(classOf[ExifIFD0Directory]) | ||
| directory.containsTag(0xC612) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think technically this could NPE (if somehow there wasn't a 0th ifd on the tiff), other places in the code we've checked that it's not null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent spot, thank you.
Great, thanks for testing it.
Happy to look at this when I get a chance, just send me some example images.
I'm reluctant to attempt to extract JPEG info from a DNG since I don't think this is a workflow we want to encourage. I'd rather we displayed a sensible error message. |
readMetadata can throw an exception, and getFirstDirectoryOfType can return null
This needs to be treated as higher priority than Tiff. Tika seems to already treat it has higher priority than Tiff, but this gives us confidence that it will continue to in future.
|
Seen on auth, usage, cropper, collections, media-api, kahuna, image-loader, metadata-editor, thrall, leases (merged by @davidfurey 8 minutes and 33 seconds ago) Please check your changes! |
|
Seen on image-loader-projection (merged by @davidfurey 8 minutes and 38 seconds ago) Please check your changes! |
|
Thanks!
Just download any from the folders I linked to. eg. this and that.
Yeah. Thing is, the current state of industry prevents us from always doing what’s best: proprietary Adobe, Sony or any other metadata development commands cannot be interpreted by open source software, because the renderers are proprietary too (not even commercial software can read competitors’ data). The error message could read something like: One alternative would be to sniff for acceptable previews (that have developments baked-in). Another would be whatever dcraw does: probably just renders raws using vanilla settings, although it may take some metadata into account, I don’t know (I would expect at least white balance to be interpreted correctly). Latter alternative would render raws/dngs different to how they were saved from raw developing software if they indeed were (although we could exclude those with proprietary-specific metadata?). In any case, this seems involved. Preview ingest less so than trying to be clever about raw development. Users should upload developed files saved as highest quality JPEGs… |
|
For future reference, I think the Canon raw magic came from http://lclevy.free.fr/cr2/#key_info / https://github.com/lclevy/libcraw2/blob/master/docs/cr2_poster.pdf But I might have got it from somewhere else and then cross-referenced with this resource |
What does this change?
This improves the mimetype detection so that DNG and CR2 files are correctly identified. Previously these were classified as Tiffs, and then later failed to import, with users seeing an unfriendly error message and developers getting a PagerDuty alert.
Before:
After
How can success be measured?
Fewer PagerDuty alerts
Tested? Documented?