Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set status 503 when backup is in progress. Fixes #5221 #5222

Merged
merged 1 commit into from Oct 3, 2018

Conversation

Projects
None yet
2 participants
@ketan
Copy link
Member

commented Oct 2, 2018

Related to #5221

@ketan

This comment has been minimized.

Copy link
Member Author

commented Oct 2, 2018

Initial PR, I feel we should consider sending a 201 accepted status for requests to files, so that agents continue to "wait" and builds do not fail.

WDYT?

@arvindsv

This comment has been minimized.

Copy link
Member

commented Oct 2, 2018

Unless they wait indefinitely, it might not be enough. As far as I know, they try 3 times and then fail anyway.

@ketan

This comment has been minimized.

Copy link
Member Author

commented Oct 2, 2018

Alternatively, let calls to /go/files pass-through, since they do not affect the DB in any way.

@arvindsv

This comment has been minimized.

Copy link
Member

commented Oct 2, 2018

As long as permissions are handled properly, sure.

@@ -68,6 +68,12 @@ public void doFilter(ServletRequest request, ServletResponse response, FilterCha
}

String url = ((HttpServletRequest) request).getRequestURI();

if (isFilesAPI(url)) {

This comment has been minimized.

Copy link
@arvindsv

arvindsv Oct 3, 2018

Member

@ketan Rethinking this, I'm not sure if this is a good idea. Can't calls such as this affect the DB, while a backup is in progress?

This comment has been minimized.

Copy link
@ketan

ketan Oct 3, 2018

Author Member

Fair point, I'll yank that commit out.

This comment has been minimized.

Copy link
@ketan

ketan Oct 3, 2018

Author Member

Yanked. Can someone take a look?

@ketan ketan force-pushed the ketan:fix-5221 branch from 2a53091 to 9f17781 Oct 3, 2018

@arvindsv

This comment has been minimized.

Copy link
Member

commented Oct 3, 2018

Looks ok to me. Tried with a config such as this:

  <pipelines group="defaultGroup">
    <pipeline name="Upstream">
      <materials>
        <git url="https://github.com/arvindsv/faketime.git" />
      </materials>
      <stage name="stage1">
        <jobs>
          <job name="defaultJob">
            <tasks>
              <exec command="touch">
                <arg>abc.txt</arg>
                <runif status="passed" />
              </exec>
            </tasks>
            <artifacts>
              <artifact type="build" src="abc.txt" />
            </artifacts>
          </job>
        </jobs>
      </stage>
    </pipeline>
    <pipeline name="Downstream">
      <materials>
        <pipeline pipelineName="Upstream" stageName="stage1" />
      </materials>
      <stage name="defaultStage">
        <jobs>
          <job name="defaultJob">
            <tasks>
              <exec command="bash">
                <arg>-c</arg>
                <arg>while [ ! -f "/tmp/a" ]; do echo "Waiting ..."; sleep 1; done</arg>
                <runif status="passed" />
              </exec>
              <fetchartifact artifactOrigin="gocd" srcfile="abc.txt" pipeline="Upstream" stage="stage1" job="defaultJob">
                <runif status="passed" />
              </fetchartifact>
            </tasks>
          </job>
        </jobs>
      </stage>
    </pipeline>
  </pipelines>

... and suspended the thread at a breakpoint in BackupService#doPerformBackup.

Saw fetch artifact fail like this:

WARN  [scheduler-1] DownloadAction:70 - Could not fetch artifact https://127.0.0.1:8154/go/remoting/files/Upstream/1/stage1/1/defaultJob/cruise-output/md5.checksum. Pausing 14 seconds to retry. Error was : Unsuccessful response '503' from the server
WARN  [scheduler-1] DownloadAction:70 - Could not fetch artifact https://127.0.0.1:8154/go/remoting/files/Upstream/1/stage1/1/defaultJob/cruise-output/md5.checksum. Pausing 25 seconds to retry. Error was : Unsuccessful response '503' from the server
WARN  [scheduler-1] DownloadAction:70 - Could not fetch artifact https://127.0.0.1:8154/go/remoting/files/Upstream/1/stage1/1/defaultJob/cruise-output/md5.checksum. Pausing 37 seconds to retry. Error was : Unsuccessful response '503' from the server
ERROR [scheduler-1] DownloadAction:58 - Giving up fetching resource 'https://127.0.0.1:8154/go/remoting/files/Upstream/1/stage1/1/defaultJob/cruise-output/md5.checksum'. Tried 4 times and failed.
Giving up fetching resource 'https://127.0.0.1:8154/go/remoting/files/Upstream/1/stage1/1/defaultJob/cruise-output/md5.checksum'. Tried 4 times and failed.
  at com.thoughtworks.go.domain.DownloadAction.perform(DownloadAction.java:59)
  at com.thoughtworks.go.domain.builder.FetchArtifactBuilder.pullArtifact(FetchArtifactBuilder.java:67)
  at com.thoughtworks.go.domain.builder.FetchArtifactBuilder.downloadChecksumFile(FetchArtifactBuilder.java:63)
  at com.thoughtworks.go.domain.builder.FetchArtifactBuilder.fetch(FetchArtifactBuilder.java:52)
  at com.thoughtworks.go.publishers.GoArtifactsManipulator.fetch(GoArtifactsManipulator.java:172)
  at com.thoughtworks.go.work.DefaultGoPublisher.fetch(DefaultGoPublisher.java:76)
  at com.thoughtworks.go.domain.builder.FetchArtifactBuilder.build(FetchArtifactBuilder.java:48)
  at com.thoughtworks.go.remote.work.Builders.build(Builders.java:74)
  at com.thoughtworks.go.remote.work.BuildWork.execute(BuildWork.java:220)
  at com.thoughtworks.go.remote.work.BuildWork.buildJob(BuildWork.java:191)
  at com.thoughtworks.go.remote.work.BuildWork.build(BuildWork.java:137)
  at com.thoughtworks.go.remote.work.BuildWork.doWork(BuildWork.java:82)
  at com.thoughtworks.go.agent.JobRunner.run(JobRunner.java:54)
  at com.thoughtworks.go.agent.AgentHTTPClientController.retrieveWork(AgentHTTPClientController.java:134)
  at com.thoughtworks.go.agent.AgentHTTPClientController.work(AgentHTTPClientController.java:109)
  at com.thoughtworks.go.agent.AgentController.loop(AgentController.java:85)
  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Then, once the backup finished, the job rescheduled and finished.

Would have been nice to have had a way to test this by slowing down the backup. Currently, it can only be tested by suspending the thread.

@arvindsv arvindsv merged commit 3055c3a into gocd:master Oct 3, 2018

1 check passed

license/cla Contributor License Agreement is signed.
Details
@arvindsv

This comment has been minimized.

Copy link
Member

commented Oct 3, 2018

Also ran a few curl commands and they all returned 503.

@arvindsv

This comment has been minimized.

Copy link
Member

commented Oct 3, 2018

@ketan One thing I was thinking of. It would have been nice to see "Service Unavailable. Backup in progress" in the agent logs.

@arvindsv arvindsv added this to the Release 18.10 milestone Oct 3, 2018

@ketan ketan deleted the ketan:fix-5221 branch Oct 5, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.