Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Housekeeping] Enqueue in subqueue only on actionable Pod phase #2856

Open
2 tasks done
Tracked by #2917
hamersaw opened this issue Sep 7, 2022 · 7 comments
Open
2 tasks done
Tracked by #2917

[Housekeeping] Enqueue in subqueue only on actionable Pod phase #2856

hamersaw opened this issue Sep 7, 2022 · 7 comments
Labels
good first issue Good for newcomers housekeeping Issues that help maintain flyte and keep it tech-debt free propeller Issues related to flyte propeller

Comments

@hamersaw
Copy link
Contributor

hamersaw commented Sep 7, 2022

Describe the issue

Currently the FlytePropeller workqueue contains two levels. The first is where FlytePropeller pulls workflows from the periodically evaluate. The second is used as a batching queue to ensure fast downstream node evaluations while mitigating issues caused by high frequency execution. FlytePropeller adds workflows the the subqueue when k8s resource updates are observed.

This means that everytime a Pod phase is updated we re-evaluate the workflow, which may result is many unnecessary evaluations. For example, on Pod transition from "QUEUED" to "INITIALIZING" there is nothing that FlytePropeller can do. It may make sense to only only enqueue the workflow when k8s resources reach terminal phases (ex. "FAILED" or "SUCCEEDED") and FlytePropeller can actually reschedule or schedule downstream operations.

What if we do not do this?

The subqueue may be flooded with workflows and FlytePropeller will unnecessarily evaluate the workflow, incurring additional overhead.

Related component(s)

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@hamersaw hamersaw added good first issue Good for newcomers propeller Issues related to flyte propeller housekeeping Issues that help maintain flyte and keep it tech-debt free labels Sep 7, 2022
@hamersaw hamersaw added this to the 1.3.0-candidate milestone Sep 7, 2022
@P3rcy-8685
Copy link

I would like to work on this issue @samhita-alla @hamersaw

@P3rcy-8685
Copy link

I would like to work on this issue @samhita-alla @hamersaw

I would love some guidance on this issue. I am familiar with the language but I am participating in open source projects for the first time

@daniel-shuy
Copy link
Contributor

@hamersaw Looking through the code, I see that the WorkerPool is getting the workflow namespace and name from the subqueue (https://github.com/flyteorg/flytepropeller/blob/26ad85757c57cda41bf23a2b054d49eccaa8145d/pkg/controller/workers.go#L39-L91) and calling Propeller.Handle() to reconcile the workflow.

Correct me if I'm wrong, but this seems contrary to your explanation though, as Propeller.Handle() ignores terminated workflows (https://github.com/flyteorg/flytepropeller/blob/26ad85757c57cda41bf23a2b054d49eccaa8145d/pkg/controller/handler.go#L205-L211):

	if w.GetExecutionStatus().IsTerminated() {
		if HasCompletedLabel(w) && !HasFinalizer(w) {
			logger.Debugf(ctx, "Workflow is terminated.")
			// This workflow had previously completed, let us ignore it
			return nil
		}
	}

@hamersaw
Copy link
Contributor Author

hamersaw commented Oct 31, 2022

@daniel-shuy I think I may not have been clear enough in the description. So the workqueue retrieval that you linked to is for the top-level queue. When we create the queue we create a two-tiered structure. Basically, every time we want FlytePropeller to process a worklfow it is added to the top-level queue.

Items are added to the subqueue when k8s resource changes are detected. This means that every time a Pod phase changes (ex. Initialized -> Queued) the item is added to the subqueue. Then periodically, items from the subqueue are batched and added to the top-level queue. This scheme was designed because in large workflows Pod updates can happen very frequently and we shouldn't be evaluating a workflow every few milliseconds.

So this issue was designed to further minimize the number of times FlytePropeller needs to evaluate the workflow. Now that I review this, I think we need more than terminal Pod phases, because we still want to update node phase status' dynamically. But we only need to add the workflow to the subqueue if FlytePropeller can update the workflow using the Pod phase. So phase updates from Initializing -> Queued or Queued -> Running may not be useful. This is going to require some parsing through which Pod phase are actionable. Looking at the k8s Pod plugin may be a great place to start.

@hamersaw hamersaw changed the title [Housekeeping] Enqueue in subqueue only on terminal Pod phase [Housekeeping] Enqueue in subqueue only on actionable Pod phase Oct 31, 2022
@daniel-shuy
Copy link
Contributor

@hamersaw Thanks for the detailed explanation. Looks like this is even more complicated than I expected 😅

@cosmicBboy cosmicBboy removed this from the 1.3.0-candidate milestone Jan 23, 2023
Copy link

Hello 👋, this issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will engage on it to decide if it is still applicable.
Thank you for your contribution and understanding! 🙏

@github-actions github-actions bot added the stale label Apr 11, 2024
@hamersaw
Copy link
Contributor Author

commenting to keep open.

@github-actions github-actions bot removed the stale label Apr 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers housekeeping Issues that help maintain flyte and keep it tech-debt free propeller Issues related to flyte propeller
Projects
None yet
Development

No branches or pull requests

5 participants