imghdr doesn't recognize variant jpeg formats #60716
imghdr doesn't support jpegs that include an ICC Profile. |
Can you provide a patch? |
I can try, yes. I'll add one ASAP |
Here it is... It is against the latest hg version, should I write one for 2.7 too? |
Thanks for the patch.
Not necessary, 2.7 only gets bug fixes. |
It looks like the test just walks a directory recursively while trying to identify its files; there's no "classic" test of the "this is a JPEG, is it detected correctly" type. |
The attached patch is insufficient; for example, it fails on http://nationalpostnews.files.wordpress.com/2013/03/budget.jpeg?w=300&h=1571 Note that the Linux file utility identifies a file as "JPEG Image data" if the first two bytes of the file are \xff\xd8. A slightly stricter test that catches more JPEG files:
def test_jpeg(h, f):
    if (h[6:10] in (b'JFIF', b'Exif')) or (h[:2] == b'\xff\xd8' and b'JFIF' in h[:32]):
        return 'jpeg' |
I vote we forget about JFIF/Exif headers and only use \xff\xd8 to identify the file. They are optional and there are tons of files out in the wild without such headers, for example: https://coverartarchive.org/release/5044b557-a9ed-4a74-b763-e20580ced85d/3354872309.jpg Proposed patch at https://bitbucket.org/intgr/cpython/commits/012cde305316e22a999d674a0a009200d3e76fdb |
Using \xff\xd8 sounds good to me. |
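To make the difference between the two approaches concrete, here is a minimal sketch that compares the marker-based check (`h[6:10] in (b'JFIF', b'Exif')`) with the proposed check on the first two bytes. The function names and the header byte strings are made up for illustration; the headers are synthetic and only mimic the first bytes of a JFIF file, a file whose first segment is an ICC profile, and a file that goes straight to a quantization table.

```python
def is_jpeg_by_app_marker(header: bytes) -> bool:
    """Marker-based check: look for 'JFIF' or 'Exif' at offset 6."""
    return header[6:10] in (b"JFIF", b"Exif")


def is_jpeg_by_soi(header: bytes) -> bool:
    """Proposed check: only require the Start Of Image marker (FF D8)."""
    return header[:2] == b"\xff\xd8"


# Synthetic headers (illustrative only, not taken from real files).
JFIF_HEADER = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01"
ICC_FIRST_HEADER = b"\xff\xd8\xff\xe2\x0c\x58ICC_PROFILE\x00\x01\x01"
NO_APP_SEGMENT_HEADER = b"\xff\xd8\xff\xdb\x00\x43\x00" + b"\x08" * 8
PNG_HEADER = b"\x89PNG\r\n\x1a\n"

for name, header in [
    ("JFIF", JFIF_HEADER),
    ("ICC profile first", ICC_FIRST_HEADER),
    ("no APPn segment", NO_APP_SEGMENT_HEADER),
    ("PNG", PNG_HEADER),
]:
    print(name, is_jpeg_by_app_marker(header), is_jpeg_by_soi(header))
```

With these inputs, the marker-based check only accepts the JFIF header, while the two-byte check accepts all three JPEG-like headers and still rejects the PNG signature. The trade-off is that two bytes are a weaker signature, so a non-JPEG file that happens to start with \xff\xd8 would also be accepted.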
FYI, the test I currently use in calibre, which has not failed so far for millions of users:
def test_jpeg(h, f):
    if (h[6:10] in (b'JFIF', b'Exif')) or (h[:2] == b'\xff\xd8' and (b'JFIF' in h[:32] or b'8BIM' in h[:32])):
        return 'jpeg' |
I'm okay with just testing the first two bytes; it's the method we currently use for our
But it might be worthwhile to add another test to detect incomplete files.
We use this patch of imghdr:
def test_jpeg(h, f):
    """JPEG data in JFIF or Exif format"""
    if not h.startswith(b'\xff\xd8'):  # Rejects empty files and files with an incorrect start
        return None
    else:
        if f:  # When testing a file, also check the end of the JPEG
            f.seek(-2, 2)
            if f.read(2).endswith(b'\xff\xd9'):
                return 'jpeg'
        else:  # When testing only the header, assume a valid JPEG and skip the end-of-file check
            return 'jpeg' |
You cannot assume the file-like object passed to imghdr is seekable. And IMO it is not the job of imghdr to check file validity, especially since it does not do that for all formats. |
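The seekability concern can be illustrated with a small sketch. This is not part of imghdr or of any patch in this thread; `ends_with_eoi` and `NonSeekableStream` are hypothetical names. The helper only attempts the end-of-file check when the stream reports that it is seekable, and returns `None` when the check cannot be performed.

```python
import io


class NonSeekableStream(io.RawIOBase):
    """Stand-in for a pipe or socket: readable but not seekable."""

    def readable(self):
        return True

    def seekable(self):
        return False


def ends_with_eoi(f):
    """Return True/False if the last two bytes can be checked, else None.

    None means the stream is not seekable, so nothing can be said
    about the End Of Image marker (FF D9).
    """
    if not (hasattr(f, "seekable") and f.seekable()):
        return None
    start = f.tell()
    try:
        f.seek(0, io.SEEK_END)
        if f.tell() < 2:
            return False
        f.seek(-2, io.SEEK_END)
        return f.read(2) == b"\xff\xd9"
    finally:
        # Restore the caller's position so later reads are unaffected.
        f.seek(start)


complete = io.BytesIO(b"\xff\xd8" + b"\x00" * 4 + b"\xff\xd9")
truncated = io.BytesIO(b"\xff\xd8" + b"\x00" * 4)
print(ends_with_eoi(complete), ends_with_eoi(truncated), ends_with_eoi(NonSeekableStream()))
```

Even with the guard, a trailer check like this is a validity test rather than a format test, and files that carry extra bytes after the final marker would be reported as incomplete.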
@jcea, @bitdancer, @intgr, @PCManticore The imghdr module is now deprecated following the acceptance of PEP 594 by the Steering Council. Bugfixes and improvements to the module are therefore now considered to be very low priority. Given the large backlog of issues in the CPython repo, and the fact that this module has no active maintainer in the core dev team, I am therefore closing this issue as per the policy laid out in the dev guide. |