Improve Scaling Jobs docs and jobs scaling logic #187

Closed
hmoravec opened this issue Jun 5, 2020 · 3 comments
Labels
stale (All issues that are marked as stale due to inactivity), triage

Comments

@hmoravec
Contributor

hmoravec commented Jun 5, 2020

The current documentation on job scaling is incomplete, and the jobs scaler does not behave as one would intuitively expect.

The algorithm for spawning new jobs is not described in detail in the docs. I would expect that:

  1. When a job pulls a new task from the queue, it should dequeue that task as soon as it starts.
  2. The value of the metric exposed by the scaler should be the length of the queue (the number of tasks that have not yet been pulled by any job).
  3. After each pollingInterval, KEDA spawns new jobs. Their number equals the value of the metric (the length of the queue) minus the number of jobs that do not have running or completed status.
  4. At the same time, the number of new jobs is capped by the inequality: number of new jobs to spawn + number of jobs without running or completed status <= maxReplicaCount.
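The expectations in points 3 and 4 can be sketched as a small calculation. This is only an illustration of the proposal in this issue, not KEDA's actual implementation; the function name `jobs_to_spawn` and its parameters are hypothetical, and "pending jobs" here means jobs without running or completed status.

```python
def jobs_to_spawn(queue_length: int, pending_jobs: int, max_replica_count: int) -> int:
    """Number of new jobs to create in one polling interval (hypothetical helper).

    queue_length      -- metric value: tasks not yet pulled by any job
    pending_jobs      -- jobs that have neither running nor completed status
    max_replica_count -- cap from point 4 of the proposal
    """
    # Point 3: pending jobs will pick up tasks once they start, so subtract them.
    wanted = queue_length - pending_jobs
    # Point 4: new jobs + pending jobs must not exceed maxReplicaCount.
    headroom = max_replica_count - pending_jobs
    # Never return a negative number of jobs.
    return max(0, min(wanted, headroom))


print(jobs_to_spawn(queue_length=10, pending_jobs=3, max_replica_count=100))  # 7
print(jobs_to_spawn(queue_length=10, pending_jobs=3, max_replica_count=8))    # 5
print(jobs_to_spawn(queue_length=2, pending_jobs=5, max_replica_count=100))   # 0
```

The second call shows the cap taking effect: 7 jobs are wanted, but only 5 fit under maxReplicaCount once the 3 pending jobs are counted.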

Point 3 was also proposed in kedacore/keda#525 (comment), but it has still not been resolved. If jobs are already in pending status, KEDA does not take them into account and spawns a job for every task in the queue again during the next polling interval. This can result in a huge number of jobs if, for example, the pending jobs cannot be scheduled or are waiting for new nodes to be provisioned.
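To make the pile-up concrete, here is a toy simulation of the scenario described above: the queue holds a fixed number of tasks and no job can be scheduled, so nothing is ever dequeued. It is a simplified model under those assumptions, not KEDA code; `simulate_polls` is a hypothetical name, and maxReplicaCount is deliberately left out to isolate the effect of ignoring pending jobs.

```python
def simulate_polls(queue_length: int, polls: int, account_for_pending: bool) -> list[int]:
    """Return the total number of pending jobs after each polling interval.

    Assumes no job ever starts, so the queue length stays constant.
    """
    pending = 0
    history = []
    for _ in range(polls):
        if account_for_pending:
            # Proposed behavior: only spawn jobs for tasks not already covered.
            new_jobs = max(0, queue_length - pending)
        else:
            # Reported behavior: spawn one job per queued task on every poll.
            new_jobs = queue_length
        pending += new_jobs
        history.append(pending)
    return history


print(simulate_polls(queue_length=5, polls=3, account_for_pending=False))  # [5, 10, 15]
print(simulate_polls(queue_length=5, polls=3, account_for_pending=True))   # [5, 5, 5]
```

In this model the reported behavior grows by one queue length per poll, while the proposed behavior settles at one job per task.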

Furthermore, it is not clear what cooldownPeriod, minReplicaCount, and the threshold in the Prometheus trigger (and similar parameters in other triggers) mean in the context of job scaling.

@tomkerkhove
Member

We are working on setting better expectations around Jobs in v2.0 and think this is good input for our docs.
As far as I know, this works the way we are aiming for, right @zroubalik?

@hmoravec Are you up for contributing improved docs on this front?

@stale

stale bot commented Oct 13, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Oct 13, 2021
@stale

stale bot commented Oct 20, 2021

This issue has been automatically closed due to inactivity.

@stale stale bot closed this as completed Oct 20, 2021