-
-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
No n_frames, or bad value for n_frames #1631
Comments
I'd expect .jpg files to always return 1 as the n_frames attribute; some from some cameras have an embedded thumbnail in the metadata, but I don't think that is particularly standard? Same environment as for #1630: Windows 10, Python 3.5.0, Pillow 3.0.0. So I've added a hasattr workaround to avoid the Attribute error, but when the attribute exists, but is wrong? Harder to deal with... |
Hi. I think the simple answer to your question is that 20151110-105826gl.jpg is not actually a JPEG file. It's an MPO, and Pillow reads it as such, giving it an JPEGs do not have multiple frames, and so currently don't have the If you think that a change should be made to Pillow to clarify this somehow, feel free to say. |
With regards to from PIL import Image
im = Image.open('test.jpg')
im.seek(1)
im.load() So it has nothing to do with n_frames. Is there any reason you think that the file isn't truncated as the error suggests? |
Re: no n_frames for JPEG... I thought I saw some comments here and a pull request for adding n_frames to file types that didn't have them, for uniformity. Maybe that isn't released yet. Re: MPO support. I had never heard of MPO until you mentioned them. I found a little info on Wikipedia, and my camera is listed as one that uses them. The picture is one I took with my camera. All consistent so far. I copied the file from the camera. I didn't truncate it. I don't know why the camera would. I haven't found good documentation for MPO yet, that explains what EXIF fields distinguish them from JPG. None of my other JPG software has complained about them not being JPG. This error may have nothing to do with n_frames, except that one shouldn't seek(1) if there are not n_frames, and n_frames produces 2. Thanks again for your analysis. I have some things to learn now... but there may be a bug, or at least a necessary workaround, for MPO files named JPG that come off standard consumer cameras. |
OK, I pulled a picture directly from the camera card. I found a version of the MPO spec at https://web.archive.org/web/20190227081740/http://www.cipa.jp/std/documents/e/DC-007_E.pdf and manually decoded my first MPO headers using a HEX editor, and as far as I can tell, it says 2 pictures. But also, as far as I (or PILLOW) can tell, there is only one picture. So it would seem that 1. My camera lies OR 2. there is a newer version of the spec that says that cameras are allowed to lie. It would seem that because my camera lies, that maybe other cameras lie also. So it would seem that maybe, in addition to checking the MPO image count field, that the 5.2.3.3.3 Individual Image Data Offset fields should be checked against EOF to determine the validity of the count. Sad, when cameras lie. |
Although, it seems (from Page 56 of the above standard) that files named .JPG are supposed to be treated as .JPG, and multi-image files are supposed to be named as .MPO, so perhaps even if a .JPG contains APP2 extension headers (MPO headers), that it should still be treated as a single image file format! |
Here's my work-around code, if others bump in to this... # some formats don't get n_frames
# some .jpg (having APP2/MPO headers) get n_frames != 1 !! (typically 2)
if hasattr( im, 'n_frames'):
imgcnt = im.n_frames
# if named .jpg, only one image, really. Multi-image named .mpo
if fnext.lower() == '.jpg':
imgcnt = 1 |
Oh, fnext -- File Name EXTension -- a variable containing that, for checking the name. And im is the Image that corresponds. |
From my reading of specification, extended multipicture files (multiple primary images) should have the extension MPO, and baseline multipicture files (single primary image) should have the extension JPG. However, I believe that while multipicture files with the extension JPEG should only contain one primary image, they may still have 'MPO' features, and so should be recognised as MPOs. I don't believe that Pillow supports any of these additional features at the moment, but I think it's still better to recognise that it's not an ordinary JPG. So the only way to stick further to the specification that I can see would be to drop any subsequent images, which seems unhelpful. |
Yes, I agree with your analysis of the specification, that is what I found too. I really can't say why the JPG from my camera claims to have a second image. There is no second image... but the MPO data say there is. There is no subsequent image to drop, so it would be quite impossible for Pillow to drop any subsequent images! However, there might be a compromise solution that would work better inside Pillow than my workaround hack above: checking the MPO data pointer for file size (already being done, maybe) and if the pointer is out of bounds and the file name is JPG, then adjusting n_frames. This is more work, indeed, to calculate a lesser value for n_frames based on other data, rather than using the n_frames value that the MPO headers supply, but we have here a situation where the n_frames data is clearly wrong, from a camera that is "in the wild" busily creating files that have this inconsistency. Mine probably isn't the only model camera from that vendor that is doing that sort of thing. This solution would allow future image files with JPG extension and multiple actual images to still work with Pillow, and allow access to all the images, yet would not require all Pillow programmers to be aware of the need to trap the "File is truncated" error as possibly spurious in the presence of JPG with n_frames > 1. Nor would they have to use my workaround above. |
There is a property in ImageFile, from PIL import Image, ImageFile
im = Image.open('test.jpg')
ImageFile.LOAD_TRUNCATED_IMAGES = True
im.seek(1)
im.show() This means that to implement your suggestion, the value of |
I was unaware of LOAD_TRUNCATED_IMAGES. But in this case, there is no (second) image, not a truncated one. Meaning that the "start" offset is already past EOF, rather than there being some bytes available to read. |
Is this a bug then or not an issue? |
Once radar educated me to the existence of MPO files, I would say the bug is really in the JPG file... but such JPG files are being created every click by lots of cameras. So that leaves Pillow a bunch of non-conformant JPG files to have thrown at it. The situation is detectable in two ways: the file extension, and the calculation of the second frame starting past EOF. So if it were to detect the condition, it could handle them like JPG, which would be helpful. |
I've noticed in a similar image (available below) that the MPO issue causes a segfault when using The image I used was the one referenced there (scikit-image/scikit-image#2445): However, upon further digging, I found that while As far as I can tell this means that PIL just hasn't yet integrated the full MPO spec correctly, right? Code to reproduce - the last line causes a segfault. from PIL import Image, ImageFile
im = Image.open("./bad_image.jpg")
im.load()
print(im)
print("Size 1:", im.size)
print(im.n_frames)
print("In header thing:", im.mpinfo[0xB001])
offsets = [mp["DataOffset"] for mp in im.mpinfo[0xB002]]
print("Offsets:", offsets)
print(im._MpoImageFile__mpoffsets)
print("Reading starting from second offset...")
with open("./bad_image.jpg", "rb") as fin:
#print(len(fin.read()))
fin.seek(im._MpoImageFile__mpoffsets[1])
#dat = fin.read()
im2 = Image.open(fin)
print("Size 2:", im2.size)
im2.load()
print("Load successful!")
print("Trying to load after seek(1)...")
im = Image.open("./bad_image.jpg")
im.load()
print(im.offset)
im.seek(1)
print("Seeked to 1, offset is:")
print(im.offset)
im.load() |
Looking at this further, MPO files contain multiple images. The images within the current Pillow test suite are 'Multi-Frame Image'. The two images causing a problem here have a 'Large Thumbnail' after the first image. Are either of the images in this issue able to added as test images, under Pillow's license? |
@abuccts is the OP who posted the image in the issue (scikit-image/scikit-image#2445) I referenced, they presumably will know about the usage rights for that image. |
I remember I crawled that image from Google Image Search two years ago but I cannot find the original link now. I think the usage rights is Fair Use. |
I may have found it: Page is : https://commons.wikimedia.org/wiki/File:EPP_summit_-_Vienna,_20._June_2013_(9093655476).jpg and the license is creative commons. |
I can confirm that PR #3588 fixes the issue I was having! 👍 |
When I feed the test file flower2.jpg into this code (from #1630)
I get "AttributeError: n_frames"
But other .jpg files do not get that error... this one: http://nevcal.com/temporary/20151110-105826gl.jpg has no problem with the attribute error on that line, but it gets a value of 2, apparently handles the seek OK, but dies in the second call to getcolors, with "OSError: image file is truncated (0 bytes not processed)".
The text was updated successfully, but these errors were encountered: