
Large artifacts fail to upload #223

Closed
gajwani opened this issue May 30, 2014 · 13 comments

@gajwani
Contributor

commented May 30, 2014

We were trying to store a .zip file artifact from a job, and the upload repeatedly failed. We tried to create a minimal reproducible pipeline and found that a file containing between 64M and 500M of random data never uploaded, while a file of 32M of random data uploaded successfully.

Here is the snippet of pipeline XML:

    <pipeline name="filesize-test">
      <materials>
        <git url="https://github.com/octocat/Spoon-Knife.git" />
      </materials>
      <stage name="sizetest">
        <jobs>
          <job name="test">
            <tasks>
              <exec command="dd">
                <arg>if=/dev/urandom</arg>
                <arg>of=big_file</arg>
                <arg>count=32</arg>
                <arg>bs=1M</arg>
                <runif status="passed" />
              </exec>
            </tasks>
            <artifacts>
              <artifact src="big_file" />
            </artifacts>
          </job>
        </jobs>
      </stage>
    </pipeline>

Changing the count to 64, 125, 250, or 500 causes the upload to fail in every case. Using a count of 500 with if=/dev/zero successfully stored the artifact, presumably because the file compresses very well.
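
For reference, the same size sweep can be reproduced outside the pipeline with plain dd; this is just a local sketch of what the job above does, using the counts listed, and nothing in it is GoCD-specific:

    # Generate test files of the sizes discussed above (local reproduction sketch).
    for count in 32 64 125 250 500; do
      dd if=/dev/urandom of="big_file_${count}M" bs=1M count="$count"
    done

    # Compressible control file, analogous to the /dev/zero case that uploads fine.
    dd if=/dev/zero of=big_file_zero bs=1M count=500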

Our agent is running on Ubuntu 14.04 Trusty, and that machine is a c3.xlarge instance on EC2 (which has about 7GB of RAM). There was nothing else running on the instance. We are running ThoughtWorks Go version 14.1.0(18882-d0272e1d227b5e).

@gajwani

Contributor Author

commented May 30, 2014

Another observation: files around 64M in size upload intermittently, but files of 128M, 256M, and above consistently fail.

@mdaliejaz

Contributor

commented Jun 2, 2014

How much time does the upload take in the failure scenario?
If it takes longer than 5 minutes, this could be due to issue #174.

@mdaliejaz

Contributor

commented Jun 2, 2014

Another possible cause is insufficient disk space. The free space on the disk should be at least double the size of the artifact being uploaded.

Could you check the server logs for a line similar to the one below:
[Artifact Upload] Artifact upload (Required Size * 2 = <artifact size * 2>) was denied by the server because it has run out of disk space (Available Space ).
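
For what it's worth, a rough way to compare free space against twice the artifact size on the machine in question (the artifact name and the /tmp path below are only examples, not the exact paths Go uses):

    # Rough check: is free space on the temp volume at least twice the artifact size?
    ARTIFACT=big_file                                   # example artifact path
    NEEDED_KB=$(( $(du -k "$ARTIFACT" | cut -f1) * 2 ))
    AVAIL_KB=$(df -Pk /tmp | awk 'NR==2 {print $4}')
    echo "needed: ${NEEDED_KB} KB, available: ${AVAIL_KB} KB"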

@gajwani

Contributor Author

commented Jun 3, 2014

From what we can tell, neither of those is the case. The server has its TMPDIR variable set up correctly, its drive has 100GB of free space, and this is all in Amazon's EC2, which means upload/download speeds are quite high and we are not hitting the 5-minute timeout.

@sachinsudheendra

Contributor

commented Jun 4, 2014

@gajwani Is it possible to give us access to that instance? Also, I'm assuming TMPDIR isn't a separate 100G mount and that there is sufficient space on every mount (this is with reference to an older issue #16).

@gajwani

Contributor Author

commented Jun 4, 2014

It turns out that the JVM does not respect the system $TMPDIR variable and writes temporary files straight to /tmp on Linux. We've set -Djava.io.tmpdir=$TMPDIR when invoking the server and agents.
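
Roughly what we did, for anyone else hitting this; the jar names and paths are illustrative, so adjust for however the server and agent are actually launched:

    # Illustrative only: Java does not pick up $TMPDIR on its own, so pass it explicitly.
    export TMPDIR=/mnt/bigtmp
    java -Djava.io.tmpdir="$TMPDIR" -jar go.jar                  # server (example invocation)
    java -Djava.io.tmpdir="$TMPDIR" -jar agent-bootstrapper.jar  # agent (example invocation)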

@sachinsudheendra

Contributor

commented Jun 5, 2014

@gajwani I'm a little confused by your reply. Are you saying that you have now set -Djava.io.tmpdir=$TMPDIR and it works as expected, or was it always set to $TMPDIR and the Java process continued to use /tmp?

@gajwani

Contributor Author

commented Jun 5, 2014

@sachinsudheendra As of today, we are setting the parameter and it works as expected; previously it wasn't set and that's why we were seeing upload failures.

@sahilm

Contributor

commented Jun 10, 2014

Java doesn't look at $TMPDIR on Linux - http://stackoverflow.com/a/1924576. Can this issue be closed? /cc @gajwani @sachinsudheendra
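
Easy to confirm on a Linux box, assuming a reasonably recent JDK (7 or later, which supports -XshowSettings):

    # java.io.tmpdir stays /tmp even when $TMPDIR points elsewhere.
    TMPDIR=/mnt/bigtmp java -XshowSettings:properties -version 2>&1 | grep tmpdir
    #     java.io.tmpdir = /tmp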

@sachinsudheendra

Contributor

commented Jun 11, 2014

@sahilm I suppose this could be closed.

Unless @gajwani expects Go to set

-Djava.io.tmpdir=$TMPDIR

in the init script (when $TMPDIR is defined) before launching go-server. Happy to accept a pull request in that case.
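
A sketch of what that guard could look like in the start script (the variable name here is illustrative, not the actual script's):

    # Only override java.io.tmpdir when the user has defined $TMPDIR.
    if [ -n "$TMPDIR" ]; then
        JVM_ARGS="$JVM_ARGS -Djava.io.tmpdir=$TMPDIR"
    fi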

sahilm added a commit to sahilm/gocd that referenced this issue Jun 12, 2014

if $TMPDIR is defined set java.io.tmpdir=$TMPDIR.
Java ignores the $TMPDIR variable. This surprises
users who expect the Go server and agent to use $TMPDIR
as the temporary directory.

Fixes gocd#223.
@gajwani

Contributor Author

commented Jun 12, 2014

@sachinsudheendra @sahilm thanks for the fix - we were at a conference for the last couple of days and didn't have time to create a pull request ourselves. I think this can be considered closed as soon as the pull request is merged.

Thanks for the help gentlemen!

@sahilm

Contributor

commented Jun 13, 2014

@gajwani thanks for reporting and diagnosing the issue :)

@arikagoyal

Contributor

commented Jun 18, 2014

Verified on 14.2.0(295-34653bc44c058e)
