Jenkins deletes workspace for mozmill-environment matrix-job after 30 days of last config change #151

Closed
whimboo opened this Issue Aug 21, 2012 · 18 comments

@whimboo
Contributor
whimboo commented Aug 21, 2012

As seen:
http://10.250.73.243:8080/job/mozilla-central_addons/10/console

Started by user anonymous
Building remotely on mm-win-xp-1 in workspace c:\jenkins\workspace\mozilla-central_addons

Deleting project workspace... No emails were triggered.
Unable to access upstream workspace for artifact copy. Slave node offline?
Build step 'Copy artifacts from another project' marked build as failure
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure
Sending email to: mozmill-ci@mozilla.org
Finished: FAILURE

Somehow this is related to the mozmill-environment artifact. I manually checked the workspace for each platform and none of them existed anymore, even though I saw them yesterday after running this job. So somehow they have been deleted. Running the jobs per platform fixes the problem for now. Let's monitor this and get it fixed for real.

@whimboo whimboo was assigned Aug 21, 2012
Member

This happened again today. See http://10.250.73.243:8080/job/ondemand_update/1517/console

Rerunning get_mozmill-environments fixed it once more.

Contributor
whimboo commented Sep 22, 2012

But it failed again a couple of hours later when Juan did the ondemand update testrun. I have restarted Jenkins and now it seems to stick. I think early next quarter we have to ensure that we can upgrade Jenkins and all the plugins ASAP. I can believe that it is related to some bad interaction.

Contributor
whimboo commented Oct 22, 2012

So this issue is directly related to our mozmill-environment job, which is a multi-configuration (matrix) job in Jenkins. It gets triggered manually whenever we release a new version of the environment. All the jobs have ended successfully so far.

But under some unknown circumstances the workspace of this job gets deleted. We do not know why that happens, and we have to investigate further. Dave will ask in #jenkins whether this is a known issue. If it is not and we can't get it fixed soon, I will create a crontab script that checks for the workspace folder every 5 minutes and, if it is not present, runs the job via the Jenkins API and sends us an email with details. That should help us analyze the system log to retrieve further information.
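
A minimal sketch of the watchdog described above, in Python. The Jenkins URL, the workspace path, and the job name are assumptions for illustration, as is anonymous access to the build trigger; a secured Jenkins would additionally need credentials. Instead of sending mail itself, the sketch prints a message and relies on cron mailing a job's output to the configured `MAILTO` address.

```python
import os
import urllib.parse
import urllib.request

# All three values are assumptions for illustration, not the real setup.
JENKINS_URL = "http://localhost:8080"
JOB_NAME = "get_mozmill-environments"
WORKSPACE = "/var/lib/jenkins/workspace/get_mozmill-environments"


def build_trigger_url(base_url, job_name):
    """Return the Jenkins remote API URL that starts a build of job_name."""
    return "{}/job/{}/build".format(base_url.rstrip("/"),
                                    urllib.parse.quote(job_name))


def trigger_build(url):
    """Send an empty POST to the build URL and return the HTTP status."""
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status


def check_workspace(workspace, trigger, notify):
    """Trigger a rebuild and notify if the workspace folder is missing.

    Returns True when the workspace was missing, False otherwise.
    """
    if os.path.isdir(workspace):
        return False
    trigger()
    notify("Workspace missing, job re-triggered: " + workspace)
    return True


def main():
    url = build_trigger_url(JENKINS_URL, JOB_NAME)
    # print() is enough here: cron mails whatever the job writes to stdout.
    check_workspace(WORKSPACE, trigger=lambda: trigger_build(url), notify=print)
```

Calling `main()` from a script that cron runs every five minutes (`*/5 * * * *`) would match the interval proposed above. The message only records which path was missing; the details worth analyzing would still come from the Jenkins system log.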

Member

It would appear that we are hitting JENKINS-4501

Member

Some suggestions:

  • Change this job to a standard (not matrix) job, which downloads the environment for all platforms. Subsequent jobs could unzip just the relevant environment by name.
  • Disable workspace cleanup using the system property hudson.model.WorkspaceCleanupThread.disabled
  • Schedule this job to run regularly (once a week, for example) so that the workspace directory is always considered recent
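
The first suggestion could look roughly like the following Python sketch on the consuming side: all platform archives sit in one artifact directory, and each downstream job unpacks only the one matching its platform. The archive naming scheme (`mozmill-env-<platform>.zip`) and the function names are hypothetical, chosen only to illustrate selecting an environment by name.

```python
import os
import zipfile


def archive_name(platform):
    """Return the archive file name for a platform (hypothetical scheme)."""
    return "mozmill-env-{}.zip".format(platform)


def unpack_environment(artifact_dir, platform, target_dir):
    """Extract only the archive that belongs to the given platform.

    Raises FileNotFoundError if no archive for that platform was copied.
    """
    archive = os.path.join(artifact_dir, archive_name(platform))
    if not os.path.isfile(archive):
        raise FileNotFoundError("No environment archive: " + archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
    return target_dir
```

A downstream job on Windows would then call something like `unpack_environment("artifacts", "windows", "mozmill-env")` after the copy-artifacts step, where the directory and platform names are again placeholders.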
Contributor
whimboo commented Oct 22, 2012

If it is possible that this job doesn't have to be a matrix job, let's go with that! I'm totally behind that idea. Would you have the time to work on that, Dave? If not, I could try to get this started tomorrow.

Member

I might be able to work on this on the plane tomorrow.

Contributor
whimboo commented Oct 26, 2012

We landed the patch from Dave and I think we don't have to go back to a matrix job. Let's close this issue now.

@whimboo whimboo closed this Oct 26, 2012
@whimboo whimboo reopened this Oct 29, 2012
Contributor
whimboo commented Oct 29, 2012

Looks like a bug in the xshell plugin stopped us from delivering it to production. As Dave pointed out it has been fixed by jenkinsci/xshell-plugin@a13e799 but we cannot upgrade to it because it requires a higher Jenkins version. Not sure why we haven't seen this problem earlier.

Member
davehunt commented Sep 6, 2013

We're not blocked from changing the environments job back to a standard job any more, but I don't think we've seen this error recently. If not, I think we should close this issue for now. What do you think @whimboo?

Contributor
whimboo commented Sep 6, 2013

If we could really switch away from a matrix job I would support that move!

Member
davehunt commented Sep 6, 2013

Probably just need to unbitrot my https://github.com/davehunt/mozmill-ci/tree/remove-matrix branch.

Member

We saw this again today on staging

Contributor
whimboo commented Sep 30, 2013

As a workaround we might want to re-save the config of that matrix job twice a month. As Dave mentioned, he might find the time to work on that next week. Our goal would be to get the matrix job removed completely.

Contributor
whimboo commented Nov 5, 2013

I pushed the code from PR #325 to staging and noticed that we do not restrict this job to be run on master only. That means curl will fail because it is not installed on all the slaves. I will come up with a follow-up PR to fix that.

@whimboo whimboo added a commit to whimboo/mozmill-ci that referenced this issue Nov 5, 2013
@whimboo whimboo Don't allow roaming for get_mozmill-environments job (#151) 68b37e8
Contributor
whimboo commented Nov 5, 2013

OK, everything is live on staging and works as expected. I saw another occurrence of this issue on staging today, so let's see how everything works over the next few days.

@whimboo whimboo added a commit to whimboo/mozmill-ci that referenced this issue Nov 6, 2013
@whimboo whimboo Fix copy artifacts filter for Aurora update job (#151) 8b63cb7
Contributor
whimboo commented Nov 6, 2013

The follow-up fix for the broken aurora_update job has been pushed to staging and is active now. The only missing piece now is the xshell plugin, but @davehunt is working on a temporary version we can use in the interim.

Contributor
whimboo commented Nov 18, 2013

This is no longer an issue for us given that we do not use the matrix job anymore. The change is live on production, and we can close out this issue.

@whimboo whimboo closed this Nov 18, 2013
@whimboo whimboo was unassigned by moz-hwine Aug 14, 2015