Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

imghdr doesn't recognize variant jpeg formats #60716

Closed
joril mannequin opened this issue Nov 20, 2012 · 14 comments
Closed

imghdr doesn't recognize variant jpeg formats #60716

joril mannequin opened this issue Nov 20, 2012 · 14 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@joril
Copy link
Mannequin

joril mannequin commented Nov 20, 2012

BPO 16512
Nosy @jcea, @vstinner, @ezio-melotti, @bitdancer, @intgr, @PCManticore
PRs
  • bpo-16512: one jpeg image added with different profile, code modified to detect all types of jpg #8322
  • [bpo-16512](https://bugs.python.org/issue16512): Improve jpeg detection in imghdr #14862
  • Files
  • peanuts15.jpg: JPEG including an ICC profile
  • imghdr_icc_jpeg.patch: patch against hg head
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2012-11-20.10:21:20.051>
    labels = ['type-feature', 'library']
    title = "imghdr doesn't recognize variant jpeg formats"
    updated_at = <Date 2019-07-19.14:24:34.361>
    user = 'https://bugs.python.org/joril'

    bugs.python.org fields:

    activity = <Date 2019-07-19.14:24:34.361>
    actor = 'pchopin'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2012-11-20.10:21:20.051>
    creator = 'joril'
    dependencies = []
    files = ['28046', '28054']
    hgrepos = ['210']
    issue_num = 16512
    keywords = ['patch']
    message_count = 13.0
    messages = ['175984', '175985', '175986', '176009', '176010', '176045', '184742', '198034', '220345', '220346', '220409', '221185', '221214']
    nosy_count = 9.0
    nosy_names = ['jcea', 'vstinner', 'ezio.melotti', 'r.david.murray', 'intgr', 'Claudiu.Popa', 'kovid', 'joril', 'mvignali']
    pr_nums = ['8322', '14862']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue16512'
    versions = ['Python 3.5']

    @joril
    Copy link
    Mannequin Author

    joril mannequin commented Nov 20, 2012

    imghdr doesn't support jpegs that include an ICC Profile.
    This is because imghdr looks for "JFIF" somewhere at the beginning of the file, but the ICC_PROFILE shifts that further.
    (The ICC spec is here http://www.color.org/specification/ICC1v43_2010-12.pdf, annex B)

    @joril joril mannequin added the type-feature A feature request or enhancement label Nov 20, 2012
    @ezio-melotti
    Copy link
    Member

    Can you provide a patch?

    @joril
    Copy link
    Mannequin Author

    joril mannequin commented Nov 20, 2012

    I can try, yes. I'll add one ASAP

    @joril
    Copy link
    Mannequin Author

    joril mannequin commented Nov 20, 2012

    Here it is... It is against the latest hg version, should I write one for 2.7 too?

    @ezio-melotti
    Copy link
    Member

    Thanks for the patch.

    should I write one for 2.7 too?

    Not necessary, 2.7 only gets bugs fixes.
    OTOH it would be nice to have some tests for this new features (and for the module in general), but there doesn't seem to be any Lib/test/test_imghdr.py file. The module itself seems to contain some kind of tests at the end though.

    @ezio-melotti ezio-melotti added the stdlib Python modules in the Lib dir label Nov 20, 2012
    @joril
    Copy link
    Mannequin Author

    joril mannequin commented Nov 21, 2012

    It looks like the test just walks a directory recursively while trying to identify its files, there's no "classic" test of the "this is a JPEG, is it detected correctly"-type

    @kovid
    Copy link
    Mannequin

    kovid mannequin commented Mar 20, 2013

    The attached patch is insufficient, for example, it fails on http://nationalpostnews.files.wordpress.com/2013/03/budget.jpeg?w=300&h=1571

    Note that the linux file utility identifies a files as "JPEG Image data" if the first two bytes of the file are \xff\xd8.

    A slightly stricter test that catches more jpeg files:

    def test_jpeg(h, f):
        if (h[6:10] in (b'JFIF', b'Exif')) or (h[:2] == b'\xff\xd8' and b'JFIF' in h[:32]):
            return 'jpeg'

    @intgr
    Copy link
    Mannequin

    intgr mannequin commented Sep 18, 2013

    I vote we forget about JFIF/Exif headers and only use \xff\xd8 to identify the file. They are optional and there are tons of files out in the wild without such headers, for example: https://coverartarchive.org/release/5044b557-a9ed-4a74-b763-e20580ced85d/3354872309.jpg

    Proposed patch at https://bitbucket.org/intgr/cpython/commits/012cde305316e22a999d674a0a009200d3e76fdb

    @PCManticore
    Copy link
    Mannequin

    PCManticore mannequin commented Jun 12, 2014

    Using \xff\xd8 sounds good to me.

    @kovid
    Copy link
    Mannequin

    kovid mannequin commented Jun 12, 2014

    FYI, the test I currently use in calibre, which has not failed so far for millions of users:

    def test_jpeg(h, f):    
        if (h[6:10] in (b'JFIF', b'Exif')) or (h[:2] == b'\xff\xd8' and (b'JFIF' in h[:32] or b'8BIM' in h[:32])):
            return 'jpeg'

    @bitdancer
    Copy link
    Member

    bpo-21230 reports a parallel problem with recognizing photoshop images.

    We need a patch with tests covering the variant types we know about. I don't have a strong opinion on the simple two byte test versus the more complex test in msg220346, but following 'file' makes sense to me.

    @bitdancer bitdancer changed the title imghdr doesn't support jpegs with an ICC profile imghdr doesn't recognize variant jpeg formats Jun 13, 2014
    @mvignali
    Copy link
    Mannequin

    mvignali mannequin commented Jun 21, 2014

    I'm okay with just testing the first two bytes, it's the method we currently use for our
    internal tools.

    But maybe it can be interesting, to add another test, in order to detect incomplete file
    (created when a camera make a recording error for example, and very useful to detect, because an incomplete jpeg file, make a crash for a lot of application)

    We use this patch of imghdr :

    --------------------------------------

    def test_jpeg(h, f):
        """JPEG data in JFIF or Exif format"""
        if not h.startswith(b'\xff\xd8'):#Test empty files, and incorrect start of file
            return None
        else:
            if f:#if we test a file, test end of jpeg
                f.seek(-2,2)
                if f.read(2).endswith(b'\xff\xd9'):
                    return 'jpeg'
            else:#if we just test the header, consider this is a valid jpeg and not test end of file
                return 'jpeg'

    @kovid
    Copy link
    Mannequin

    kovid mannequin commented Jun 22, 2014

    You cannot assume the file like object passed to imghdr is seekable. And IMO it is not the job of imghdr to check file validity, especially since it does not do that for all formats.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @AlexWaygood
    Copy link
    Member

    @jcea, @bitdancer, @intgr, @PCManticore

    The imghdr module is now deprecated following the acceptance of PEP 594 by the Steering Council. Bugfixes and improvements to the module are therefore now considered to be very low priority. Given the large backlog of issues in the CPython repo, and the fact that this module has no active maintainer in the core dev team, I am therefore closing this issue as per the policy laid out in the dev guide.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants