New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

image/gif: gif decoding fails `with too much image data` #16146

Closed
odeke-em opened this Issue Jun 22, 2016 · 15 comments

Comments

Projects
None yet
@odeke-em
Copy link
Member

odeke-em commented Jun 22, 2016

Please answer these questions before submitting your issue. Thanks!

  1. What version of Go are you using (go version)?
    go version devel +1f44643 Wed Jun 22 00:12:55 2016 +0000 darwin/amd64
  2. What operating system and processor architecture are you using (go env)?
    GOARCH="amd64"
    GOBIN="/Users/emmanuelodeke/go/bin"
    GOEXE=""
    GOHOSTARCH="amd64"
    GOHOSTOS="darwin"
    GOOS="darwin"
    GOPATH="/Users/emmanuelodeke/go"
    GORACE=""
    GOROOT="/Users/emmanuelodeke/go/src/go.googlesource.com/go"
    GOTOOLDIR="/Users/emmanuelodeke/go/src/go.googlesource.com/go/pkg/tool/darwin_amd64"
    CC="clang"
    GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/kn/s30g1srd4h50bh6sd322tppm0000gn/T/go-build038095302=/tmp/go-build -gno-record-gcc-switches -fno-common"
    CXX="clang++"
    CGO_ENABLED="1"
  3. What did you do?
    Run program at https://play.golang.org/p/NqqvAkiIao with source of http://ualberta.ca/~odeke/tx.gif
$ go run NqqvAkiIao.go --source http://ualberta.ca/~odeke/tx.gif

If possible, provide a recipe for reproducing the error.
A complete runnable program is good.
A link on play.golang.org is best.

  1. What did you expect to see?
    Successful decoding of a gif with no errors
  2. What did you see instead?
    gif: too much image data

/cc @montanaflynn who reported this issue first.

Code inlined

/*
  After running this gif
      https://67.media.tumblr.com//b02659b27081314a412215f4eb31dacf/tumblr_nu2x4whJgy1udf6d3o1_400.gif
  through ffmpeg, Go fails to decode the gif with error:
      `gif: too much image data`

  Command:
  $ ffmpeg -i https://67.media.tumblr.com//b02659b27081314a412215f4eb31dacf/tumblr_nu2x4whJgy1udf6d3o1_400.gif outf.gif
  $ go run main.go --source outf.gif
  - Repro URL
  http://ualberta.ca/~odeke/tx.gif

  Interestingly:
  - The outfile is 1.0MB.
  - The original file is 1.4MB.
*/
package main

import (
    "flag"
    "image/gif"
    "log"
    "net/http"
    "path/filepath"
    "strings"
)

func localAndNetBasedClient() *http.Client {
    transport := new(http.Transport)
    transport.RegisterProtocol("file", http.NewFileTransport(http.Dir("/")))
    client := new(http.Client)
    client.Transport = transport
    return client
}

var httpPrefixes = []string{"http", "https"}

func ensureFitForFetch(source string) string {
    for _, prefix := range httpPrefixes {
        if strings.HasPrefix(source, prefix) {
            return source
        }
    }

    absSource, err := filepath.Abs(source)
    if err == nil {
        source = absSource
    }
    return strings.Join([]string{"file://", source}, "/")
}

func main() {
    var source string
    flag.StringVar(&source, "source",
        "https://67.media.tumblr.com//b02659b27081314a412215f4eb31dacf/tumblr_nu2x4whJgy1udf6d3o1_400.gif",
        "the URI of the gif you want to ensure properly decodes in Go")
    flag.Parse()

    client := localAndNetBasedClient()
    source = ensureFitForFetch(source)
    res, err := client.Get(source)
    if err != nil {
        log.Fatal(err)
    }

    defer res.Body.Close()

    if _, err := gif.DecodeAll(res.Body); err != nil {
        log.Fatal(err)
    }
}
@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

ianlancetaylor commented Jun 22, 2016

CC @nigeltao

This happens with every Go release since 1.1, so it is not a new problem.

With 1.0.3 I get gif: extra data after image.

@ianlancetaylor ianlancetaylor added this to the Go1.8 milestone Jun 22, 2016

@ianlancetaylor ianlancetaylor changed the title image/gif: gif decoding fails `with too much data` image/gif: gif decoding fails `with too much image data` Jun 22, 2016

@nigeltao nigeltao self-assigned this Jun 23, 2016

@quentinmit quentinmit added the NeedsFix label Oct 7, 2016

@rsc rsc modified the milestones: Go1.9, Go1.8 Oct 26, 2016

@vsiv

This comment has been minimized.

Copy link

vsiv commented Feb 14, 2017

Any updates on this, we process thousands of images daily, and run into this everyday, albeit on a very small number of them.

Thanks.

@nigeltao

This comment has been minimized.

Copy link
Contributor

nigeltao commented Feb 15, 2017

No update, but it would be helpful if you could attach a small (in terms of file size) GIF that exhibits this problem. The original post linked to something weighing 1 MB but that's still kind of unwieldy to debug. The smallest 'bad' GIF you've got would be great.

@vsiv

This comment has been minimized.

Copy link

vsiv commented Feb 15, 2017

Most of our images come from giphy.com where, a typical GIF is > 1MB, so I havent been able to find one thats smaller - will post if I find one.

There is one that's quite wonky - its smaller but fails with a different message -
http://a3.fanbread.com/uploads/listicle/featured_image/65227/extra_large_cropped_wino.gif
fails with gif: frame bounds larger than image bounds

@montanaflynn

This comment has been minimized.

Copy link

montanaflynn commented Feb 15, 2017

@nigeltao here is one that's under 500kb and less than 15 frames: https://j.gifs.com/Z6KoKJ.gif

It seems much more prominent on larger gifs though.

@horgh

This comment has been minimized.

Copy link
Contributor

horgh commented Feb 20, 2017

I've debugged this. I have an explanation and a recommendation.

The problem gifs in question (tx.gif from @odeke-em and Z6KoKJ.gif from @montanaflynn) indeed have extra/too much data. In both cases, the extra data is a data sub-block in the image data with size 1 containing 0x00. In both cases, there is an LZW End of Information Code, then this extra sub-block, and then the Block Terminator.

In tx.gif this happens in the second image in the gif. In Z6KoKJ.gif it happens in the 11th image in the gif.

In well formed gifs, the last sub-block would not be present. It does not need to be there since we completed decompression.

Currently the decoder fully reads/decompresses the image's data, and then uses the block reader to check whether there is more. It sees there is more data (in this case, there is a block of 1 byte found) and raises this error.

If we don't decompress the data and simply look at the image data as a series of data sub-blocks, it is not extra. That is, the data sub-blocks are well formed. It is extra only when we take into account reaching the LZW End of Information code.

The question is what we should be doing. Other decoders (GIFLIB, browsers) apparently accept this. As far as the specification goes, it does not say one way or the other. In Appendix F it says "the code that terminates the LZW compressed data must appear before Block Terminator". It does not state what to do if there is data after the LZW data ends, nor that the code must be the last piece of data. Section 22 states that "[t]he sequence of indices is encoded using the LZW Algorithm [...]" so I think this data should not be treated as an index. But it is not useful and presumably is technically invalid.

I looked at what the most recent version of GIFLIB (5.1.4) does. After it fully decompresses an image, it silently reads and discards any extra data such as this. It stops when it reads the Block Terminator. You can see this in its DGifGetLine() function (the do while loop). I think doing this would work for Go's decoder too. Once we read and decompress the image, any extra data in its sub-blocks can be discarded. There should not be any, but if there are, ignore them.

As for where these problem gifs are coming from, given the transformation using ffmpeg produced one with the defect, its gif encoder may have a bug!

Note I didn't analyze @vsiv's extra_large_cropped_wino.gif. It appears corrupted to me in Firefox and is showing a different error, so possibly it is either truly corrupted or a different issue.

I wrote a patch with a possible solution. I realize this will need to be submitted differently, but I thought I would include it and get feedback here before doing so. https://github.com/horgh/go/commit/ab144b8f336c0b99eec9b416a7577a305190dbdc

Thanks for any feedback!

@cespare

This comment has been minimized.

Copy link
Contributor

cespare commented Feb 20, 2017

Hi @horgh, nice debugging work. I suggest you submit a CL. We cannot review code without a signed CLA, so a Gerrit CL is the only format for discussing patches.

@bradfitz

This comment has been minimized.

Copy link
Member

bradfitz commented Feb 20, 2017

@horgh, great debugging.

@nigeltao, thoughts?

@vsiv

This comment has been minimized.

Copy link

vsiv commented Feb 20, 2017

@horgh - thanks for the debugging!
All - In an ideal world, if chrome/safari can handle a gif without user errors, golang accomodates it as well - but i understand this can be a slippery slope, because as u pointed out, some of these gif encoders have bugs in them, and many being used in existence are older versions of those libs, so even though the lib fixes the issue, new gif's appear on the web with the buggy encoding. This is similar to forgiving html authoring errors, that browsers accomodate.

Another option is to work out an external gif decoder lib here on github for concerning users to use if needed and leave the stricter lib as part of the golang distribution. thanks.

@gopherbot

This comment has been minimized.

Copy link

gopherbot commented Feb 20, 2017

CL https://golang.org/cl/37258 mentions this issue.

@horgh

This comment has been minimized.

Copy link
Contributor

horgh commented Feb 20, 2017

Thanks all! I've uploaded the possible patch to Gerrit now too: https://go-review.googlesource.com/37258 (oh I see @gopherbot beat me!)

@nightlyone

This comment has been minimized.

Copy link
Contributor

nightlyone commented Feb 20, 2017

Could it be limited somehow, how much stray data is allowed? Otherwise it feels like a new attack vector. What do other gif libraries do to limit it?

@horgh

This comment has been minimized.

Copy link
Contributor

horgh commented Feb 21, 2017

Good point @nightlyone.

Other gif libraries:

  • GIFLIB has no limit on how much extra data it reads (see the function I referenced in my first comment)
  • ffmpeg's decoder also has no limit as best I can tell. See its gif_read_image() call to ff_lzw_decode_tail() which states "read the garbage data until end marker is found". The while loop in the latter function reads and skips bytes until it reaches a block terminator.

That is not to say we need to use that same approach here. In the two samples mentioned in this issue, only a single byte is extra. We could be conservative and allow as little as 1 byte or some arbitrary small limit.

@nigeltao

This comment has been minimized.

Copy link
Contributor

nigeltao commented Feb 23, 2017

@horgh great debugging indeed.

I'd like to limit the slack to at most one extra byte. We can continue the conversation on the code review.

@gopherbot

This comment has been minimized.

Copy link

gopherbot commented Oct 5, 2017

Change https://golang.org/cl/68350 mentions this issue: image/gif: make blockReader a ByteReader, harden tests

gopherbot pushed a commit that referenced this issue Oct 19, 2017

image/gif: make blockReader a ByteReader, harden tests
golang.org/cl/37258 was committed to fix issue #16146.

This patch seemed intent to allow at most one dangling byte.  But, as
implemented, many more bytes may actually slip through.  This is because
the LZW layer creates a bufio.Reader which will itself consume data
beyond the end of the LZW stream, and this isn't accounted for anywhere.

This change means to avoid the allocation of the bufio.Reader by making
blockReader implement io.ByteReader.  Further, it adds a close() method
which detects extra data in the block sequence.  To avoid any
regressions with poorly encoded GIFs which may have worked accidentally,
there are no restrictions on how many extra bytes may exist in the final
full sub-block that contained LZW data.  If the end of the LZW stream
happened to align with the end of a sub-block, at most one more
sub-block with a length of 1 byte may exist before the block terminator.

This change aims to be at least as performant as the prior
implementation.  But the primary gain is avoiding the allocation of a
bufio.Reader per frame:

name      old time/op    new time/op    delta
Decode-8     276µs ± 0%     275µs ± 2%    ~     (p=0.690 n=5+5)

name      old speed      new speed      delta
Decode-8  55.9MB/s ± 0%  56.3MB/s ± 2%    ~     (p=0.690 n=5+5)

name      old alloc/op   new alloc/op   delta
Decode-8    49.2kB ± 0%    44.8kB ± 0%  -9.10%  (p=0.008 n=5+5)

name      old allocs/op  new allocs/op  delta
Decode-8       269 ± 0%       267 ± 0%  -0.74%  (p=0.008 n=5+5)

Change-Id: Iec4f9b895561ad52266313fbc73ec82c070c3349
Reviewed-on: https://go-review.googlesource.com/68350
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
Reviewed-by: Nigel Tao <nigeltao@golang.org>

@golang golang locked and limited conversation to collaborators Oct 5, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.