Spurious failure extracting zip archive #171

Closed
alexcrichton opened this issue Aug 9, 2017 · 4 comments

@alexcrichton
Contributor

We've run into Invalid checksum errors a few times when working on rust-lang/rust, for example at https://ci.appveyor.com/project/rust-lang/rust/build/1.0.4224/job/ow4l9bb15wy56sht. This string apparently appears in the zip crate and comes from an invalid crc32 checksum.

How that actually managed to happen I'm not entirely sure! It could be a corrupt entry in the cache or a failed download, and if the download failed, I'm not sure why it wasn't caught sooner...
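
For reference, a minimal sketch of where that check lives when extracting with the zip crate (illustrative only, not sccache's code): reading an entry streams the bytes through zip's CRC32 check, and a mismatch surfaces as a read error whose message is "Invalid checksum".

```rust
use std::fs::File;
use std::io;

// Illustrative only: iterate over a downloaded cache archive and read each
// entry, which is roughly what io::copy does when extracting.
fn read_all_entries(path: &str) -> zip::result::ZipResult<()> {
    let file = File::open(path)?;
    let mut archive = zip::ZipArchive::new(file)?;
    for i in 0..archive.len() {
        let mut entry = archive.by_index(i)?;
        // Reading goes through zip's Crc32Reader; if the stored CRC32 doesn't
        // match the bytes actually read, the read fails with "Invalid checksum".
        io::copy(&mut entry, &mut io::sink())?;
    }
    Ok(())
}
```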

@luser
Contributor

luser commented Aug 9, 2017

OK, so that error originates from inside zip's Crc32Reader::read, and that struct is used inside the ZipFileReader members, so presumably it's failing inside the io::copy in CacheRead::get_object.

...but yeah, I'm not really sure how we'd get an invalid zip file here unless the HTTP download failed somehow? I wonder if we could hash the zip file and store that digest as a header when storing in S3, and compare on download? Seems like something that shouldn't happen, in any event.
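
Something along these lines, as an illustrative sketch only (the SHA-256 choice, the sha2/hex crates, and the metadata key are assumptions, not sccache's actual code):

```rust
use sha2::{Digest, Sha256};

// Hypothetical: digest we would attach as user-defined object metadata
// (e.g. an x-amz-meta-* header) when uploading the cache entry to S3.
fn cache_entry_digest(zip_bytes: &[u8]) -> String {
    let mut hasher = Sha256::new();
    hasher.update(zip_bytes);
    hex::encode(hasher.finalize())
}

// Hypothetical check on download: recompute the digest of the bytes we
// actually received and compare it with the digest stored at upload time.
fn verify_download(zip_bytes: &[u8], stored_digest: &str) -> Result<(), String> {
    let actual = cache_entry_digest(zip_bytes);
    if actual == stored_digest {
        Ok(())
    } else {
        Err(format!(
            "cache entry digest mismatch: expected {}, got {}",
            stored_digest, actual
        ))
    }
}
```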

@alexcrichton
Contributor Author

Oh for some reason I thought that's what happened already but apparently not!

I think we uploaded a valid zip archive because we're not getting a 100% failure rate on MSVC right now. Presumably they're all getting the same cached value, and later builds succeed after one fails. In that sense I think this is a download failure of some form. As to what kind of download failure... unsure!

We're checking for an is_success HTTP status and validating that we read all the bytes, but we only validate the latter if there's a Content-Length header.
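
Roughly this kind of check, as a sketch (the function and its signature are illustrative, not sccache's actual code):

```rust
// Compare the number of body bytes actually read against the Content-Length
// header, if one was present. Without the header (e.g. chunked transfer) a
// truncated body is indistinguishable from a complete one at this point.
fn check_body_length(content_length: Option<u64>, bytes_read: u64) -> Result<(), String> {
    match content_length {
        Some(expected) if expected != bytes_read => Err(format!(
            "short read: expected {} bytes, got {}",
            expected, bytes_read
        )),
        _ => Ok(()),
    }
}
```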

I'm not sure if S3 could serve us invalid content?

In any case, one thing we could do is detect a failed extraction of the archive and just count it as a cache miss, maybe? That may be difficult to thread through.
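
A rough sketch of that fallback (hypothetical names, not sccache's actual cache types):

```rust
use std::io::{Cursor, Read};

// Hypothetical outcome type; sccache's real cache API looks different.
enum CacheLookup {
    Hit(Vec<u8>),
    Miss,
}

// Try to extract the first entry of a freshly downloaded cache archive.
// Any failure to open or read it (truncated download, bad CRC, ...) is
// reported as a miss, so the caller falls back to a real compile instead
// of dying with a fatal error.
fn lookup(downloaded: Vec<u8>) -> CacheLookup {
    let mut archive = match zip::ZipArchive::new(Cursor::new(downloaded)) {
        Ok(a) => a,
        Err(_) => return CacheLookup::Miss, // unreadable archive => treat as miss
    };
    let mut out = Vec::new();
    let read = archive
        .by_index(0)
        .and_then(|mut entry| entry.read_to_end(&mut out).map_err(Into::into));
    match read {
        Ok(_) => CacheLookup::Hit(out),
        Err(_) => CacheLookup::Miss, // e.g. "Invalid checksum" => treat as miss
    }
}
```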

@ezyang

ezyang commented Feb 13, 2019

We've seen this error too, on a different project: https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-devtoolset7-rocmrpm-centos7.5-build/2203//console

08:12:20 [ 72%] Building CXX object caffe2/CMakeFiles/caffe2.dir/sgd/learning_rate_adaption_op.cc.o
08:12:21 sccache: encountered fatal error
08:12:21 sccache: error : Invalid checksum
08:12:21 sccache:  cause: Invalid checksum
08:12:21 make[2]: *** [caffe2/CMakeFiles/caffe2.dir/build.make:15872: caffe2/CMakeFiles/caffe2.dir/sgd/gftrl_op.cc.o] Error 254
08:12:21 make[2]: *** Waiting for unfinished jobs....
08:12:21 make[1]: *** [CMakeFiles/Makefile2:3664: caffe2/CMakeFiles/caffe2.dir/all] Error 2

It's durable (the failure persists across rebuilds), so it definitely looks like there is something corrupted inside the cache.

@sylvestre
Collaborator

I think it doesn't happen anymore

sylvestre closed this as not planned (won't fix, can't repro, duplicate, stale) on Feb 19, 2024.