I would like to propose a set of changes to the teleprobe architecture that, if accepted, should allow it to scale for running multiple jobs:
- Split teleprobe-server into two parts: teleprobe-api (one instance) and teleprobe-worker (many instances).
- The teleprobe-api:
  - Accepts requests to run a job. A job includes a list of binaries, each with associated tags that identify which targets the binary should run on.
  - Maintains an in-memory queue of jobs and schedules them across workers.
  - Is public facing and authenticates requests to run jobs.
- The teleprobe-worker runs binaries and reports results and logs back to the teleprobe-api:
  - A worker is configured with a list of targets. Each target contains the same information as today, plus a set of tags/labels.
  - At startup, each worker announces its identity to the teleprobe-api, along with the list of targets it supports and their tags/labels.
  - Workers poll the teleprobe-api for binaries to run (long-polling with a timeout) and run those binaries. A worker can run several binaries in parallel, and the api knows whether a worker is busy.
  - Workers report logs and results back to the teleprobe-api.
  - Workers are not public facing and are assumed to reach the teleprobe-api over an internal network.
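
To make the scheduling part concrete, here is a minimal sketch of the in-memory queue and tag matching, using only the standard library. Every name in it (`Scheduler`, `Task`, `Target`, `announce`, `submit`, `poll`, `report`) is hypothetical and not part of teleprobe today. It assumes a binary may run on a target when the target's tags are a superset of the binary's tags, and it tracks "busy" per target rather than per worker. HTTP, authentication, log transport and the long-poll timeout are left out, so `poll` returns immediately instead of holding the request open.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// Tags/labels on a target, or required by a binary (hypothetical representation).
type Tags = BTreeSet<String>;

/// Convenience helper to build a tag set from string literals.
fn tags(list: &[&str]) -> Tags {
    list.iter().map(|s| s.to_string()).collect()
}

/// One binary of a job, plus the tags a target must carry to run it.
#[derive(Debug, Clone, PartialEq)]
struct Task {
    job_id: u64,
    binary: String, // name only here; a real task would carry the ELF bytes
    required: Tags,
}

/// A target as announced by a worker at startup.
#[derive(Debug, Clone)]
struct Target {
    name: String,
    tags: Tags,
    busy: bool,
}

impl Target {
    fn new(name: &str, tags: Tags) -> Self {
        Target { name: name.to_string(), tags, busy: false }
    }
}

/// In-memory state held by the (hypothetical) teleprobe-api.
#[derive(Default)]
struct Scheduler {
    queue: VecDeque<Task>,
    workers: HashMap<String, Vec<Target>>,
}

impl Scheduler {
    /// A worker announces its identity and targets; re-announcing replaces the old list.
    fn announce(&mut self, worker: &str, targets: Vec<Target>) {
        self.workers.insert(worker.to_string(), targets);
    }

    /// Queue every binary of a job as its own task, in submission order.
    fn submit(&mut self, job_id: u64, binaries: &[(&str, Tags)]) {
        for (binary, required) in binaries {
            self.queue.push_back(Task {
                job_id,
                binary: binary.to_string(),
                required: required.clone(),
            });
        }
    }

    /// Hand the polling worker the oldest task that one of its idle targets can run.
    /// Returns the chosen target name together with the task.
    fn poll(&mut self, worker: &str) -> Option<(String, Task)> {
        let targets = self.workers.get_mut(worker)?;
        for i in 0..self.queue.len() {
            let required = &self.queue[i].required;
            if let Some(t) = targets
                .iter_mut()
                .find(|t| !t.busy && required.is_subset(&t.tags))
            {
                t.busy = true;
                let name = t.name.clone();
                let task = self.queue.remove(i)?;
                return Some((name, task));
            }
        }
        None
    }

    /// The worker reports that a run finished, which frees the target again.
    fn report(&mut self, worker: &str, target: &str) {
        if let Some(targets) = self.workers.get_mut(worker) {
            for t in targets.iter_mut().filter(|t| t.name == target) {
                t.busy = false;
            }
        }
    }
}
```

In this sketch a worker calls `announce` once at startup, then loops over `poll`, runs the returned binary on the named target, and calls `report` when done. Tasks whose tags match no idle target simply stay in the queue until a suitable worker polls.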
A further improvement could be to split the teleprobe-api itself into an API part and a scheduler part. That would allow job information to persist across restarts and multiple API instances to run for failover, but it would require introducing persistence and some form of coordination. I consider that a future step, and a natural evolution of the above should the need arise.
I'm happy to instead fork teleprobe for this capability, but it feels like a lot of overlap in the use case.