-
Notifications
You must be signed in to change notification settings - Fork 245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Distributer Pipeline execution via Workers #107
Comments
Deals with #46. |
Awesome work, @Skarlso . 🤗 A few hints from my side:
|
|
@michelvocks just a heads up. I'm going to start on this soon, which means there will be merge conflicts all over the place as I extract the building functionality into a worker package. :) |
Awesome @Skarlso 🤗 Looking forward to all the merge conflicts 💀 🤓 |
@michelvocks Added some more info about Worker tags and requirements for resource tagging. |
One additional point which is missing in the RFC: Securing the communication between Master and Worker. Following the proposal:
This should work without any problems. The only disadvantage (from security perspective) is the initial registration process where the secret is sent to the master in plain text. What do you think @Skarlso ? |
Abstract
This document discusses the problem of executing pipelines in a distributed
manner.
Table of Contents
Introduction
Problem Statement
The problem poses the following set of challenges for Gaia:
worker.
or Go, Python, Java SDK is available on it... etc.
Terminology
Gaia Master: The Gaia Master is a running instance of gaia launched via make or the
released Gaia binary.
Worker: A worker is a server which is connected to the Gaia Master and has
certain capabilities like, what kind of SDK it supports or what operating system
is installed on it.
Pipeline: A pipeline is a configured entity with a set of Jobs.
Job: A job is a single running task like, create a user. A pipeline can have multiple jobs.
RPC: Remote Procedure Call
Architecture Diagram
Proposed Worker Distribution Model
The proposed model which aims to solve this problem is laid out as follows.
Managing Workers
The managing of the workers will happen through a set of API endpoints.
All workers are stored in the database with a designated set of labels
assigned name and IP address.
These endpoints will be Delete / Suspend. Since adding will be taken care of
by the Gaia agent, we don't support that operation here specifically.
Delete: Delete will simply remove the server from the rotation. It won't restart
the server, or shut it down, it will just simply delete it from the database which
holds the worker instances.
Suspend: Suspending a worker will take it out of rotation but will not delete it.
Suspended this worker will not be able to run any pipelines. This is a good option
if some kind of maintenance needs to be performed on the machine.
Worker Tags
The workers will need to be tagged with what kind of resource they are providing. For example:
When a pipeline is first created in needs to set on the pipeline creation window what kind of resources it requires. These tags will need to be made accessible by a drop down list for ease of usage. These tags can be created when a Worker is created and saved to Gaia. Tagging them can be done manually on the Worker Manager screen.
The Worker RPC API
The Workers will talk to the Gaia Master via a set of defined RPC interfaces.
These are as follows:
Gaia Master - Agent
The current Gaia implementation will still hold and will be designated as Gaia Master.
The master will be a hub for the worker to connect to, get pipelines from, and report
back on the current state of the pipelines they are running.
As such, Gaia Master will no longer be solely responsible to build and distribute
binaries. Since the operation system of the worker decides in what format the binaries
will be in, the workers will build their own binaries.
Which means a worker will get a repository to pull code from and do the whole thing
that Gaia does currently. This will not involve duplicating code however, since the
whole thing will be in the
worker
package. Gaia Master will use this package bysetting worker to
localhost
.The Workers will need to have the go-plugin extracted because HashiCorp's plugin
system does not support RPC calls over the network. Just strictly localhost communication
is allowed. Pipeline execution and communication between jobs' running and state
changes are all through RPC.
Scheduling Jobs
Scheduling jobs will also have to be included into the workers. Workers will schedule
their own parallel jobs execution model and Gaia Master will have to schedule and manage
which worker to distribute pipelines to. This means that the workers will need an indicator
to define when they are too busy to accept more pipelines.
Where jobs are built
Currently, once a user initiates a pipeline build, that pipeline is saved and built on Gaia Master.
This has to change in order for the worker to be able to run the pipeline. The binary
needs to be built on the worker. However, Gaia also needs to be aware of the jobs,
and does pre-validation which means it also needs to build the pipeline.
Scenario 1:
We build the pipeline on both, the Gaia master, and the Worker. Which means we get immediate validation of the pipeline but have to duplicate the building process.
Scenario 2:
We only build on the worker and just save the pipeline on the master to track it. The validation will be deferred until it's actually built on one of the workers. This way, validation is deferred but the building process isn't duplicated.
Implementation Approach
Extract all functionality regarding running and building pipelines including
the SDK and the go-plugin facility into a worker package. This should not change
the current behavior of Gaia. All tests should still pass. Including the WebHook
capability which should be able to still just call build. The worker package should
take care of building and distributing the binary.
Create the API which handles most of the things worker related. But still don't
bother extracting it.
Create an Agent binary which calls back to master's RPC API and registers a
server as a worker.
Implement the managing of the servers below settings on the left of the admin
screen.
The text was updated successfully, but these errors were encountered: