This repository has been archived by the owner on Sep 4, 2021. It is now read-only.

Implement Persistent Volumes #3897

Merged
merged 10 commits into from Mar 9, 2017

@lmars lmars commented Feb 2, 2017

This pull request is a continuation of the work already done on the persistent-volumes branch to implement persistent volumes as per #2438.

Summary:

  • the scheduler syncs volumes from all hosts and persists them to the volumes and job_volumes tables in the controller database
  • when placing a job which requires volumes, the scheduler first looks for an existing, unassigned volume matching the job's app / release / type and the volume's path. If it finds one, it places the job on the same host as the volume and assigns the volume to the job (i.e. sets Volume.JobID)
  • volumes can be "decommissioned" via flynn volume decommission ID, which prevents the volume from being attached to any new jobs
  • if a host which has unassigned volumes is down, jobs are still scheduled on that host but enter a new blocked state, waiting for either the host to come back up or the volume to be decommissioned. This means that if a host which has data for a process type is rebooted, the job stays down until the host comes back rather than being restarted on a different host with an empty volume. If the host has really gone away, it is up to the operator to decommission the volume, which unblocks the job so it can be scheduled on a different host
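The placement rules in the summary above can be sketched roughly as follows. This is a minimal illustration, not the scheduler's actual code: the Volume, Job and Placement types, the placeJob function, the hostUp map and the three state names are all hypothetical, and whether a blocked job is already assigned the volume is left open here (the sketch leaves it unassigned).

```go
package main

import "fmt"

// Hypothetical placement outcomes; the real scheduler's state names may differ.
const (
	StatePlaced    = "placed"     // matching volume found and its host is up
	StateBlocked   = "blocked"    // matching volume found but its host is down
	StateNewVolume = "new-volume" // no usable volume, so any host may be chosen
)

// Volume is a simplified stand-in for a row in the volumes table.
type Volume struct {
	ID, HostID, AppID, ReleaseID, Type, Path string
	JobID          string // empty means the volume is unassigned
	Decommissioned bool
}

// Job is a simplified stand-in for a job awaiting placement.
type Job struct {
	ID, AppID, ReleaseID, Type string
}

// Placement describes where (and whether) the job can be placed.
type Placement struct {
	State, HostID, VolumeID string
}

// placeJob looks for an unassigned, non-decommissioned volume matching the
// job's app / release / type and the requested path. If the volume's host is
// up, the volume is assigned to the job; if the host is down, the job is
// reported as blocked on that host.
func placeJob(job *Job, path string, volumes []*Volume, hostUp map[string]bool) Placement {
	for _, vol := range volumes {
		if vol.Decommissioned || vol.JobID != "" {
			continue
		}
		if vol.AppID != job.AppID || vol.ReleaseID != job.ReleaseID ||
			vol.Type != job.Type || vol.Path != path {
			continue
		}
		if !hostUp[vol.HostID] {
			return Placement{State: StateBlocked, HostID: vol.HostID, VolumeID: vol.ID}
		}
		vol.JobID = job.ID
		return Placement{State: StatePlaced, HostID: vol.HostID, VolumeID: vol.ID}
	}
	return Placement{State: StateNewVolume}
}

func main() {
	volumes := []*Volume{
		{ID: "vol1", HostID: "host1", AppID: "app", ReleaseID: "rel", Type: "db", Path: "/data"},
	}
	job := &Job{ID: "job1", AppID: "app", ReleaseID: "rel", Type: "db"}
	down := map[string]bool{"host1": false}

	// The volume's host is down, so the job waits for it.
	fmt.Println(placeJob(job, "/data", volumes, down).State)

	// Decommissioning the volume lets the job move to another host.
	volumes[0].Decommissioned = true
	fmt.Println(placeJob(job, "/data", volumes, down).State)
}
```

In this sketch, decommissioning is the only thing that releases a job from a down host, which mirrors the operator workflow described in the last bullet.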

Things which need to be considered but are not included in this PR:

  • volumes won't be persisted through deployments, but the design allows for that to be added later (see this comment)
  • we should add flynn volume backup and flynn volume restore. If a host is lost and its volumes need to be decommissioned in order to move jobs to other hosts, operators could then first restore the volumes so that the unblocked job doesn't have to start with a completely empty volume

@lmars lmars force-pushed the persistent-volumes-1 branch 2 times, most recently from e24e66d to 0c3914e Compare February 10, 2017 16:34
register("volume", runVolume, `
usage: flynn volume
flynn volume show [--json] <id>
flynn volume decommission <id>

Should this command be scoped to the app by default?

			Type: VolumeEventTypeDestroy,
		}
	}
	ch <- e

Should this select on h.stop as well?
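For context, the pattern the question refers to might look like the sketch below: the send is wrapped in a select so that a closed stop channel unblocks the sender even when nobody is receiving. The VolumeEvent type and the sendEvent helper are hypothetical stand-ins; only the ch <- e send and the h.stop name come from the diff and the comment.

```go
package main

import "fmt"

// VolumeEvent is a simplified, hypothetical stand-in for the event type in the diff.
type VolumeEvent struct {
	Type string
}

// sendEvent tries to deliver e on ch but gives up as soon as stop is closed.
// It reports whether the event was delivered.
func sendEvent(ch chan<- *VolumeEvent, stop <-chan struct{}, e *VolumeEvent) bool {
	select {
	case ch <- e:
		return true
	case <-stop:
		return false
	}
}

func main() {
	ch := make(chan *VolumeEvent) // unbuffered, and nothing is receiving
	stop := make(chan struct{})
	close(stop)

	// A bare "ch <- e" would block forever here; the select returns instead.
	fmt.Println(sendEvent(ch, stop, &VolumeEvent{Type: "destroy"}))
}
```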

	// and this volume doesn't exist on that host
	if job.HostID != "" && vol.HostID != job.HostID {
		continue
	}

Is exclusivity enforced in flynn-host too? There can technically be two schedulers scheduling jobs at once under failure conditions.

@@ -1035,6 +1274,10 @@ func (s *Scheduler) HandleInternalStateRequest(req *InternalStateRequest) {
		req.State.Formations[key.String()] = &f
	}

	for id, vol := range s.volumes {
		req.State.Volumes[id] = &(*vol)

What is the reason for &(*vol)?

Contributor Author

We need to copy it to create a "snapshot" of the scheduler's state to pass back to the caller.
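One detail worth checking when reading this exchange: in Go, taking the address of a dereferenced pointer yields the original pointer, so &(*vol) on its own does not allocate a new value. A snapshot needs the struct copied into a fresh variable first. The sketch below shows the difference; the Volume type and the aliasOf and copyOf helpers are hypothetical and only illustrate the language behaviour, not the code that was eventually merged.

```go
package main

import "fmt"

// Volume is a simplified, hypothetical stand-in for the scheduler's volume type.
type Volume struct {
	ID    string
	JobID string
}

// aliasOf mirrors the expression in the diff: it returns the same pointer it
// was given, not a copy.
func aliasOf(vol *Volume) *Volume {
	return &(*vol)
}

// copyOf returns a pointer to a shallow copy, which later changes to the
// original do not affect.
func copyOf(vol *Volume) *Volume {
	v := *vol
	return &v
}

func main() {
	vol := &Volume{ID: "vol1"}
	alias := aliasOf(vol)
	snapshot := copyOf(vol)

	vol.JobID = "job1" // the scheduler's state changes after the snapshot

	fmt.Println(alias == vol, alias.JobID)          // true job1
	fmt.Println(snapshot == vol, snapshot.JobID == "") // false true
}
```

The copy is shallow, so any pointer, map or slice fields inside the struct would still be shared with the original.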

lmars commented Mar 2, 2017

@titanous comments addressed.

lmars added 10 commits March 8, 2017 22:16
Signed-off-by: Lewis Marshall <lewis@lmars.net>
Useful for the scheduler creating volumes which it needs to track.

Signed-off-by: Lewis Marshall <lewis@lmars.net>
@lmars lmars merged commit 7669dff into master Mar 9, 2017
@lmars lmars deleted the persistent-volumes-1 branch March 9, 2017 14:36
@titanous titanous mentioned this pull request Dec 5, 2017