Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generic worker cot #114

Merged
merged 5 commits into from May 31, 2017

Conversation

escapewindow
Copy link
Contributor

@petemoore , I am currently

  • detecting generic worker by the existence of task.payload.osGroups or task.payload.mounts, and
  • doing no checks about interactivity or docker image shas for generic worker. Are there some checks to make sure no one has RDPed or ssh'ed in?

Do those two things sound like valid tests for generic worker?

Once we have task.payload.features.chainOfTrust set in the build tasks and the pubkey for the level 3 windows build workerTypes in the cot repo, I think this will work.

@coveralls
Copy link

coveralls commented May 26, 2017

Coverage Status

Coverage decreased (-0.2%) to 99.825% when pulling 029046b on escapewindow:generic-worker-cot into 32cb707 on mozilla-releng:master.

@coveralls
Copy link

coveralls commented May 26, 2017

Coverage Status

Coverage remained the same at 100.0% when pulling 9705b3c on escapewindow:generic-worker-cot into 32cb707 on mozilla-releng:master.

@petemoore
Copy link
Contributor

Hi @escapewindow

Sorry for my delay in replying. Indeed, if mounts or osGroups payload property exist, it is fair to reason that this is a generic worker task payload. I fear that this could be something that changes (e.g. maybe mounts gets added to taskcluster-worker).

We don't currently have interactive support, although @grenade is working on this, so at some point soon we will. Machines that are loaned out will be terminated - but I guess an attack avenue might be to stop a machine from being terminated. However, @grenade can advise - I think secrets get wiped as part of the loan process, so it would be impossible for a worker to claim a new task after it has been loaned, I believe, even if the loaner was able to interfere with it and prevent it from being terminated. @grenade can confirm for sure.

The gecko windows machines don't have an ssh daemon (although other windows worker types, such as nss worker types do).

A question - if a task doesn't mount anything, and doesn't use osGroups feature - will these checks still work?

Also if there is anything I can do to make it less magic (like adding something in the cot certificate to make it clear it is generic worker) let me know.

Thanks!

@escapewindow
Copy link
Contributor Author

Sorry for my delay in replying. Indeed, if mounts or osGroups payload property exist, it is fair to reason that this is a generic worker task payload. I fear that this could be something that changes (e.g. maybe mounts gets added to taskcluster-worker).

We'll have to modify scriptworker.cot.verify to support taskcluster-worker once it's ready, so we'll have to make changes then anyway. I do worry about how we'll detect taskcluster-worker and differentiate it from generic-worker via task definition alone, but we still have the ability to define that. Worst case, we can curate a set of known workerTypes.

We don't currently have interactive support, although @grenade is working on this, so at some point soon we will. Machines that are loaned out will be terminated - but I guess an attack avenue might be to stop a machine from being terminated. However, @grenade can advise - I think secrets get wiped as part of the loan process, so it would be impossible for a worker to claim a new task after it has been loaned, I believe, even if the loaner was able to interfere with it and prevent it from being terminated. @grenade can confirm for sure.

Terminating loaners and removing their secrets sounds like a good plan. In that case, they won't be able to validly sign the cot artifact, and we won't have to worry about otherwise verifying validly-signed tasks aren't run on loaners.

The gecko windows machines don't have an ssh daemon (although other windows worker types, such as nss worker types do).

Which will be turned off unless it's a loaner, yes?

A question - if a task doesn't mount anything, and doesn't use osGroups feature - will these checks still work?

It checks to see if osGroups is in the task definition. It can be an empty list. If generic-worker takes the task even if osGroups and mounts are not keys in the task definition, then it will cause problems, and we may have to move to the workerType solution sooner. Alternately, we could put a 'workerImpl': 'generic-worker' in the task definition somewhere, and the absence of that or any scriptworker-isms would imply taskcluster-worker.

Also if there is anything I can do to make it less magic (like adding something in the cot certificate to make it clear it is generic worker) let me know.

We use the worker implementation detection to determine which gpg homedir to use to validate the cot certificate, so it's a bit of a circular dependency. We could read the cot certificate twice, once ignoring signature to determine worker implementation, second to validate the signature... that may be another option alongside the workerType list.

These are added in
mozilla-releng/cot-gpg-keys#26 ; let's ignore
them.
@coveralls
Copy link

coveralls commented May 30, 2017

Coverage Status

Coverage remained the same at 100.0% when pulling cbac162 on escapewindow:generic-worker-cot into 32cb707 on mozilla-releng:master.

Copy link
Contributor

@JohanLorenzo JohanLorenzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supporting a new type of worker this way, looks good to me.

IIUC, generic-workers translates (for now) to windows machines, right?

Copy link
Contributor

@JohanLorenzo JohanLorenzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supporting a new type of worker this way, looks good to me.
IIUC, generic-workers translates (for now) to windows machines, right?

Meh, Github was down, I couldn't see my first comment.

@escapewindow
Copy link
Contributor Author

IIUC, generic-workers translates (for now) to windows machines, right?

I believe so, yes. Thanks for the super fast reviews!

@coveralls
Copy link

coveralls commented May 31, 2017

Coverage Status

Coverage remained the same at 100.0% when pulling 4090cbf on escapewindow:generic-worker-cot into 32cb707 on mozilla-releng:master.

@escapewindow escapewindow merged commit 9b77b39 into mozilla-releng:master May 31, 2017
@petemoore
Copy link
Contributor

IIUC, generic-workers translates (for now) to windows machines, right?

I believe so, yes. Thanks for the super fast reviews!

Update: we now have OS X tests running in generic worker in production.

@escapewindow
Copy link
Contributor Author

escapewindow commented Jun 19, 2017 via email

@escapewindow escapewindow deleted the generic-worker-cot branch August 22, 2017 19:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants