Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

failure to restore cache produces corrupted files passed to incremental builds #929

Closed
asottile opened this issue Oct 11, 2018 · 12 comments · Fixed by #930
Closed

failure to restore cache produces corrupted files passed to incremental builds #929

asottile opened this issue Oct 11, 2018 · 12 comments · Fixed by #930
Assignees

Comments

@asottile
Copy link
Contributor

there appear to be two problems here, first that a valid tar is being rejected and second that the partially written files are still being passed into subsequent builds.

I've had difficulty finding a minimal testcase yet so I'll include my logs, observations, and in parallel work on trying to make a minimal reproduction.

The command we're using to build is:

s2i build \
    --pull-policy=never --incremental-pull-policy=never \
    --incremental --copy \
    "$DIRECTORY" "$BUILDER" "$TARGET_TAG"

The log outputs (I've prefixed our helper tool's output with ~~~ )

Checking if image "$BUILDER" is available locally ...
Checking if image "$BUILDER" is available locally ...
Checking if image "$TARGET_TAG" is available locally ...
~~~ Saving pre-commit-cache...
~~~ --> sha256 of inputs adf5190953
ERROR: Error extracting tar stream
ERROR: Error extracting file "pre-commit-cache.tar": unexpected EOF
WARNING: Clean build will be performed because of error saving previous build artifacts
~~~ Attempting to restore pre-commit-cache...
~~~ --> sha256 of inputs adf5190953
~~~ tar: Skipping to next header
~~~ tar: Skipping to next header
~~~ tar: Skipping to next header
~~~ tar: Skipping to next header
~~~ tar: Skipping to next header
~~~ tar: Unexpected EOF in archive
~~~ tar: Error is not recoverable: exiting now

(see below for the structure of our artifacts, this error happens as it tries to untar pre-commit-cache.tar -- we're using tar-of-tar to work around the IO bottlenecks alluded to in #832)

If I manually run the save-artifacts myself it appears to succeed:

$ docker run --rm -ti 926f31bc68fb bash
lyftsvc@88d6197c4ad3:/code$ time /usr/local/s2i/save-artifacts > /tmp/artifact.tgz
Saving pre-commit-cache...
--> sha256 of inputs adf5190953

real	0m10.680s
user	0m0.152s
sys	0m5.376s

And that tarfile is fine:

$ tar --list -f /tmp/artifact.tgz 
pre-commit-cache.s4i.json
pre-commit-cache.tar

Is it possibly a file size thing -- the artifact involved is quite large:

$ du -hs /tmp/artifact.tgz 
678M	/tmp/artifact.tgz

The messages from our tool "Attempting to restore ..." only appear if something is in /tmp/artifacts -- which should (?) be empty given the message WARNING: Clean build will be performed because of error saving previous build artifacts.

I'll try and make a minimal reproduction that doesn't involve our builders / helpers / infra.

@bparees
Copy link
Contributor

bparees commented Oct 11, 2018

your save-artifacts script should not output anything except the tar stream. it's possible we are combining stdout+stderr. I'd ask that you revise your save-artifacts script to eliminate extraneous output before we investigate this further.

@asottile
Copy link
Contributor Author

stderr is displayed to the screen and not sent to the tar, we have many other builds which do the same and don't have issue (just with this particularly large one)

@bparees
Copy link
Contributor

bparees commented Oct 11, 2018

if you pass this flag:
https://github.com/openshift/source-to-image/blob/master/pkg/cmd/cli/util/util.go#L31

you can examine the staging dir that is being injected into your build and see what's in there.

@asottile
Copy link
Contributor Author

Here is the same with the prints removed from save-artifacts (yes I actually removed them -- I didn't just doctor the output!)

Checking if image "$BUILDER" is available locally ...
Checking if image "$BUILDER" is available locally ...
Checking if image "$TARGET_TAG" is available locally ...
ERROR: Error extracting tar stream
ERROR: Error extracting file "pre-commit-cache.tar": unexpected EOF
WARNING: Clean build will be performed because of error saving previous build artifacts
~~~ Attempting to restore pre-commit-cache...
~~~ --> sha256 of inputs adf5190953
~~~ tar: Skipping to next header
~~~ tar: Skipping to next header
~~~ tar: Skipping to next header
~~~ tar: Skipping to next header
~~~ tar: Unexpected EOF in archive
~~~ tar: Error is not recoverable: exiting now

@asottile
Copy link
Contributor Author

Here's what it's sending along:

# ls -ahl /tmp/s2i029383798/upload/*
/tmp/s2i029383798/upload/artifacts:
total 447M
drwx------ 2 root root 4.0K Oct 11 23:23 .
drwxr-xr-x 5 root root 4.0K Oct 11 23:23 ..
-rw-r--r-- 1 root root  110 Oct 11 19:13 pre-commit-cache.s4i.json
-rw-r--r-- 1 root root 447M Oct 11 23:23 pre-commit-cache.tar

/tmp/s2i029383798/upload/scripts:
total 8.0K
drwxr-xr-x 2 root root 4.0K Oct 11 23:23 .
drwxr-xr-x 5 root root 4.0K Oct 11 23:23 ..

/tmp/s2i029383798/upload/src:
total 152K
drwxr-xr-x 14 root root 4.0K Oct 11 23:23 .
...

note that that tar is missing ~200 MB

on subsequent runs that tar file is sometimes different sized -- maybe some timeout of some sort?

@bparees
Copy link
Contributor

bparees commented Oct 11, 2018

can you run with loglevel 5?

@asottile
Copy link
Contributor Author

gotta grab my train, I'll try with that tomorrow

@asottile
Copy link
Contributor Author

here's log level 5, I've cut out the filenames and substituted the variables seen in the original post:

I1012 15:23:11.765555     105 build.go:50] Running S2I version "v1.1.10"
I1012 15:23:11.766047     105 util.go:57] Getting docker credentials for $BUILDER
I1012 15:23:11.766098     105 util.go:57] Getting docker credentials for $TARGET_TAG
I1012 15:23:11.767302     105 util.go:269] Checking if image "$BUILDER" is available locally ...
I1012 15:23:11.770890     105 build.go:150] 
Builder Image:			$BUILDER
Source:				$DIRECTORY
Output Image Tag:		$TARGET_TAG
Environment:			CODE_ROOT=/code/fannersql,IAM_ROLE=fannersql-development-iad@173840052742,IMAGE_NAME=fannersql,IMAGE_VERSION=f428183904b061e060288228072b25e9352371dc,JENKINS_REF=__jenkins_ref_not_set__
Labels:				
Incremental Build:		enabled
Incremental Image Pull User:	
Remove Old Build:		disabled
Builder Pull Policy:		never
Previous Image Pull Policy:	never
Quiet:				disabled
Layered Build:			disabled
Docker Endpoint:		unix:///var/run/docker_sockets/docker.sock
Docker Pull Config:		/root/.docker/config.json
Docker Pull User:		
Bind mounts:			/tmp/control/tmp/tmpp9nu1c3u->/tmp/home/.ssh:ro

I1012 15:23:11.770917     105 util.go:269] Checking if image "$BUILDER" is available locally ...
I1012 15:23:11.777259     105 docker.go:510] Using locally available image "$BUILDER"
I1012 15:23:11.777365     105 docker.go:741] Image sha256:62b7a31a7aa69b8ef9472d30703b5a34d539440c6a0f83b2d321c7492769c38f contains io.openshift.s2i.scripts-url set to "image:///usr/local/s2i"
I1012 15:23:11.778145     105 scm.go:20] DownloadForSource $DIRECTORY
I1012 15:23:11.778204     105 sti.go:198] Preparing to build $TARGET_TAG
I1012 15:23:11.778826     105 download.go:30] Copying sources from "$DIRECTORY" to "/tmp/s2i434513589/upload/src"


... (snipped out the file list here)

I1012 15:23:11.783183     105 fs.go:236] F "$DIRECTORY/.editorconfig" -> "/tmp/s2i434513589/upload/src/.editorconfig"

...

I1012 15:23:17.262639     105 install.go:249] Using "assemble" installed from "image:///usr/local/s2i/assemble"
I1012 15:23:17.262790     105 install.go:249] Using "run" installed from "image:///usr/local/s2i/run"
I1012 15:23:17.262846     105 install.go:249] Using "save-artifacts" installed from "image:///usr/local/s2i/save-artifacts"
I1012 15:23:17.263011     105 ignore.go:63] .s2iignore file does not exist
I1012 15:23:17.263094     105 util.go:269] Checking if image "$TARGET_TAG" is available locally ...
I1012 15:23:17.269683     105 sti.go:205] Existing image for tag $TARGET_TAG detected for incremental build
I1012 15:23:17.269706     105 sti.go:210] Performing source build from $DIRECTORY
I1012 15:23:17.269789     105 sti.go:502] Saving build artifacts from image $TARGET_TAG to path /tmp/s2i434513589/upload/artifacts
I1012 15:23:17.272310     105 sti.go:521] The assemble user is not set, defaulting to "0:0" user
I1012 15:23:17.292252     105 docker.go:741] Image sha256:e3f325241e189a48db39f1fc69b3ec64131b43f472c159a54c33904f0616ce92 contains io.openshift.s2i.scripts-url set to "image:///usr/local/s2i"
I1012 15:23:17.292327     105 docker.go:816] Base directory for S2I scripts is '/usr/local/s2i'. Untarring destination is '/tmp'.
I1012 15:23:17.292356     105 docker.go:972] Setting "/usr/local/s2i/save-artifacts" command for container ...
I1012 15:23:17.292670     105 docker.go:981] Creating container with options {Name:"s2i_s2i_build_tmp_fannersql_4a4d0e8a" Config:{Hostname: Domainname: User:0:0 AttachStdin:false AttachStdout:true AttachStderr:false ExposedPorts:map[] Tty:false OpenStdin:false StdinOnce:false Env:[] Cmd:[/usr/local/s2i/save-artifacts] Healthcheck:<nil> ArgsEscaped:false Image:$TARGET_TAG:latest Volumes:map[] WorkingDir: Entrypoint:[] NetworkDisabled:false MacAddress: OnBuild:[] Labels:map[] StopSignal: StopTimeout:<nil> Shell:[]} HostConfig:&{Binds:[/tmp/control/tmp/tmpp9nu1c3u:/tmp/home/.ssh:ro] ContainerIDFile: LogConfig:{Type: Config:map[]} NetworkMode: PortBindings:map[] RestartPolicy:{Name: MaximumRetryCount:0} AutoRemove:false VolumeDriver: VolumesFrom:[] CapAdd:[] CapDrop:[] DNS:[] DNSOptions:[] DNSSearch:[] ExtraHosts:[] GroupAdd:[] IpcMode: Cgroup: Links:[] OomScoreAdj:0 PidMode: Privileged:false PublishAllPorts:false ReadonlyRootfs:false SecurityOpt:[] StorageOpt:map[] Tmpfs:map[] UTSMode: UsernsMode: ShmSize:67108864 Sysctls:map[] Runtime: ConsoleSize:[0 0] Isolation: Resources:{CPUShares:0 Memory:0 NanoCPUs:0 CgroupParent: BlkioWeight:0 BlkioWeightDevice:[] BlkioDeviceReadBps:[] BlkioDeviceWriteBps:[] BlkioDeviceReadIOps:[] BlkioDeviceWriteIOps:[] CPUPeriod:0 CPUQuota:0 CPURealtimePeriod:0 CPURealtimeRuntime:0 CpusetCpus: CpusetMems: Devices:[] DeviceCgroupRules:[] DiskQuota:0 KernelMemory:0 MemoryReservation:0 MemorySwap:0 MemorySwappiness:<nil> OomKillDisable:<nil> PidsLimit:0 Ulimits:[] CPUCount:0 CPUPercent:0 IOMaximumIOps:0 IOMaximumBandwidth:0} Mounts:[] Init:<nil>}} ...
I1012 15:23:17.446180     105 docker.go:1013] Attaching to container "28d2d713985675909c901ecefc9021fbaa4b511fc818ed80b99ccf5b46818e3e" ...
I1012 15:23:17.448370     105 docker.go:1024] Starting container "28d2d713985675909c901ecefc9021fbaa4b511fc818ed80b99ccf5b46818e3e" ...
I1012 15:23:28.137260     105 tar.go:376] Creating directory /tmp/s2i434513589/upload/artifacts
I1012 15:23:28.137712     105 tar.go:444] Creating /tmp/s2i434513589/upload/artifacts/pre-commit-cache.s4i.json
I1012 15:23:28.138202     105 tar.go:454] Extracting/writing /tmp/s2i434513589/upload/artifacts/pre-commit-cache.s4i.json
I1012 15:23:28.138590     105 tar.go:376] Creating directory /tmp/s2i434513589/upload/artifacts
I1012 15:23:28.138670     105 tar.go:444] Creating /tmp/s2i434513589/upload/artifacts/pre-commit-cache.tar
I1012 15:23:28.139043     105 tar.go:454] Extracting/writing /tmp/s2i434513589/upload/artifacts/pre-commit-cache.tar
E1012 15:23:58.145766     105 tar.go:398] Error extracting tar stream
I1012 15:24:15.869611     105 docker.go:1055] Waiting for container "28d2d713985675909c901ecefc9021fbaa4b511fc818ed80b99ccf5b46818e3e" to stop ...
E1012 15:24:15.870792     105 tar.go:390] Error extracting file "pre-commit-cache.tar": unexpected EOF
I1012 15:24:16.219287     105 docker.go:991] Removing container "28d2d713985675909c901ecefc9021fbaa4b511fc818ed80b99ccf5b46818e3e" ...
I1012 15:24:16.649248     105 docker.go:1001] Removed container "28d2d713985675909c901ecefc9021fbaa4b511fc818ed80b99ccf5b46818e3e"
W1012 15:24:16.650596     105 sti.go:213] Clean build will be performed because of error saving previous build artifacts
I1012 15:24:16.650635     105 sti.go:214] error: timeout waiting for tar stream
I1012 15:24:16.650688     105 sti.go:221] Running "assemble" in "$TARGET_TAG"
I1012 15:24:16.650702     105 sti.go:559] Using image name $BUILDER
I1012 15:24:16.658499     105 docker.go:510] Using locally available image "$BUILDER"
I1012 15:24:16.659003     105 sti.go:445] No user environment provided (no environment file found in application sources)
I1012 15:24:16.659274     105 sti.go:673] starting the source uploading ...
I1012 15:24:16.659317     105 tar.go:217] Adding "/tmp/s2i434513589/upload" to tar ...
I1012 15:24:16.659466     105 tar.go:312] Adding to tar: /tmp/s2i434513589/upload/artifacts as artifacts
I1012 15:24:16.665528     105 docker.go:741] Image sha256:62b7a31a7aa69b8ef9472d30703b5a34d539440c6a0f83b2d321c7492769c38f contains io.openshift.s2i.scripts-url set to "image:///usr/local/s2i"
I1012 15:24:16.665555     105 docker.go:816] Base directory for S2I scripts is '/usr/local/s2i'. Untarring destination is '/tmp'.
I1012 15:24:16.665569     105 docker.go:972] Setting "/bin/sh -c tar -C /tmp -xf - && /usr/local/s2i/assemble" command for container ...
I1012 15:24:16.665635     105 docker.go:981] Creating container with options {Name:"s2i_lyft_precommit_6d020ab9a22e9c28bd230e8a1f9f55b36d3b3b38_c794daa9" Config:{Hostname: Domainname: User: AttachStdin:false AttachStdout:true AttachStderr:false ExposedPorts:map[] Tty:false OpenStdin:true StdinOnce:true Env:[CODE_ROOT=/code/fannersql IAM_ROLE=fannersql-development-iad@173840052742 IMAGE_NAME=fannersql IMAGE_VERSION=f428183904b061e060288228072b25e9352371dc JENKINS_REF=__jenkins_ref_not_set__] Cmd:[/bin/sh -c tar -C /tmp -xf - && /usr/local/s2i/assemble] Healthcheck:<nil> ArgsEscaped:false Image:$BUILDER Volumes:map[] WorkingDir: Entrypoint:[] NetworkDisabled:false MacAddress: OnBuild:[] Labels:map[] StopSignal: StopTimeout:<nil> Shell:[]} HostConfig:&{Binds:[/tmp/control/tmp/tmpp9nu1c3u:/tmp/home/.ssh:ro] ContainerIDFile: LogConfig:{Type: Config:map[]} NetworkMode: PortBindings:map[] RestartPolicy:{Name: MaximumRetryCount:0} AutoRemove:false VolumeDriver: VolumesFrom:[] CapAdd:[] CapDrop:[] DNS:[] DNSOptions:[] DNSSearch:[] ExtraHosts:[] GroupAdd:[] IpcMode: Cgroup: Links:[] OomScoreAdj:0 PidMode: Privileged:false PublishAllPorts:false ReadonlyRootfs:false SecurityOpt:[] StorageOpt:map[] Tmpfs:map[] UTSMode: UsernsMode: ShmSize:67108864 Sysctls:map[] Runtime: ConsoleSize:[0 0] Isolation: Resources:{CPUShares:0 Memory:0 NanoCPUs:0 CgroupParent: BlkioWeight:0 BlkioWeightDevice:[] BlkioDeviceReadBps:[] BlkioDeviceWriteBps:[] BlkioDeviceReadIOps:[] BlkioDeviceWriteIOps:[] CPUPeriod:0 CPUQuota:0 CPURealtimePeriod:0 CPURealtimeRuntime:0 CpusetCpus: CpusetMems: Devices:[] DeviceCgroupRules:[] DiskQuota:0 KernelMemory:0 MemoryReservation:0 MemorySwap:0 MemorySwappiness:<nil> OomKillDisable:<nil> PidsLimit:0 Ulimits:[] CPUCount:0 CPUPercent:0 IOMaximumIOps:0 IOMaximumBandwidth:0} Mounts:[] Init:<nil>}} ...
I1012 15:24:16.797838     105 docker.go:1013] Attaching to container "f97c5eca1ddc16e7a4430c5d4a31d9060107524152acae00da479a4f9cfa3931" ...
I1012 15:24:16.799094     105 docker.go:1024] Starting container "f97c5eca1ddc16e7a4430c5d4a31d9060107524152acae00da479a4f9cfa3931" ...
I1012 15:24:17.326727     105 tar.go:312] Adding to tar: /tmp/s2i434513589/upload/artifacts/pre-commit-cache.s4i.json as artifacts/pre-commit-cache.s4i.json
I1012 15:24:17.327360     105 tar.go:312] Adding to tar: /tmp/s2i434513589/upload/artifacts/pre-commit-cache.tar as artifacts/pre-commit-cache.tar
I1012 15:24:19.449448     105 tar.go:312] Adding to tar: /tmp/s2i434513589/upload/scripts as scripts
I1012 15:24:19.450058     105 tar.go:312] Adding to tar: /tmp/s2i434513589/upload/src as src
I1012 15:24:19.450453     105 tar.go:312] Adding to tar: /tmp/s2i434513589/upload/src/.editorconfig as src/.editorconfig

... snipped many files here

I1012 15:24:20.894961     105 sti.go:685] Attempting to restore pre-commit-cache...
I1012 15:24:20.895270     105 sti.go:685] --> sha256 of inputs adf5190953
I1012 15:24:20.982413     105 sti.go:685] tar: Skipping to next header
I1012 15:24:20.983488     105 sti.go:685] tar: Skipping to next header
I1012 15:24:20.983891     105 sti.go:685] tar: Skipping to next header
I1012 15:24:20.984393     105 sti.go:685] tar: Skipping to next header
I1012 15:24:20.985976     105 sti.go:685] tar: Skipping to next header
I1012 15:24:20.986550     105 sti.go:685] tar: Skipping to next header
I1012 15:24:20.986954     105 sti.go:685] tar: Skipping to next header
I1012 15:24:20.987673     105 sti.go:685] tar: Archive contains ‘\0\330\331\035\0\0\0\0\0\343\331\035’ where numeric time_t value expected
I1012 15:24:20.987685     105 sti.go:685] tar: Skipping to next header
I1012 15:24:20.991310     105 sti.go:685] tar: Archive contains ‘\177\237’ where numeric mode_t value expected
I1012 15:24:20.991437     105 sti.go:685] tar: Archive contains ‘\0\004\0\221\277\177\237’ where numeric time_t value expected
I1012 15:24:20.991448     105 sti.go:685] tar: Skipping to next header
I1012 15:24:20.992482     105 sti.go:685] tar: Exiting with failure status due to previous errors
I1012 15:24:20.994548     105 sti.go:681] --> MISS! (cache tar corrupted)

... snipped (our script cold build, I've made it intentionally fail to terminate the build quicker)

I1012 15:27:07.906032     105 docker.go:1055] Waiting for container "f97c5eca1ddc16e7a4430c5d4a31d9060107524152acae00da479a4f9cfa3931" to stop ...
I1012 15:27:08.206605     105 docker.go:991] Removing container "f97c5eca1ddc16e7a4430c5d4a31d9060107524152acae00da479a4f9cfa3931" ...
I1012 15:27:08.397375     105 docker.go:1001] Removed container "f97c5eca1ddc16e7a4430c5d4a31d9060107524152acae00da479a4f9cfa3931"
I1012 15:27:08.398053     105 cleanup.go:31] Temporary directory "/tmp/s2i434513589" will be saved, not deleted
I1012 15:27:08.398075     105 build.go:156] Build failed
E1012 15:27:08.398089     105 errors.go:276] An error occurred: non-zero (13) exit code from $BUILDER
2018-10-12 15:27:09,409 control.cli: ERROR s2i build exited 1

we're using v1.1.10

@bparees
Copy link
Contributor

bparees commented Oct 12, 2018

Looks like there is indeed a 30s timeout, and you are hitting it:

I1012 15:23:28.139043     105 tar.go:454] Extracting/writing /tmp/s2i434513589/upload/artifacts/pre-commit-cache.tar
E1012 15:23:58.145766     105 tar.go:398] Error extracting tar stream

i'll create a fix to bump it up significantly.

@asottile
Copy link
Contributor Author

should there be an additional PR to empty the artifacts dir in the event that tar fails?

@bparees
Copy link
Contributor

bparees commented Oct 12, 2018

should there be an additional PR to empty the artifacts dir in the event that tar fails?

it wouldn't be a bad idea, any chance you can contribute it?

@asottile
Copy link
Contributor Author

I'll see if I can, the current messaging is misleadingly wrong about that though 😆

WARNING: Clean build will be performed because of error saving previous build artifacts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants