Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change dataflow and nificluster controllers to avoid changing status on every reconciliation loop #142

Merged
merged 6 commits into from
Aug 25, 2022

Conversation

mh013370
Copy link
Member

Q A
Bug fix? yes
New feature? no
API breaks? no
Deprecations? no
Related tickets fixes #119
License Apache 2.0

What's in this PR?

Tweaks NifiCluster and NifiDataflow controllers to avoid changing their respective statuses on every reconciliation loop. Instead, the controllers will only change the status to ready/done and log any actions taken thereafter and they will respect their reconciliation interval settings (15s by default). Previously to this, the controllers would reconcile many times per second, which drastically increases the load on the control plane and on NiFi itself.

Additionally, i changed the logger timestamp encoder to user ISO8601 instead of epoch time so it is human readable.

Why?

Issue #119 does an excellent job of explaining why this is an issue.

Checklist

  • Implementation tested
  • Error handling code meets the guideline
  • Logging code meets the guideline
  • User guide and development docs updated (if needed)
  • Append changelog with changes

}

return Reconciled()
return RequeueAfter(intervalRunning)
Copy link
Member Author

@mh013370 mh013370 Aug 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cannonpalms asked:

Can you speak to the strategy for watching child resources (namely pods) now that the NifiCluster controller is not flapping?

Prior to this change, the NifiCluster controller perpetually triggered reconciliation requests by updating its own status (Reconciling vs Running). The controller still watches all of the resources it was watching before, including all of the pods. The difference is that now the controller schedules reconciliation requests respecting its reconciliation interval of 15s -- that's the change on this line of code. Reconciliation requests will also be triggered if the NifiCluster CRD is modified by a user.

The same is true for the NifiDataflow controller.

Copy link
Contributor

@erdrix erdrix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to replace the deleted state change with an event notification to keep the information, but in a better place ?

@mh013370
Copy link
Member Author

It would be nice to replace the deleted state change with an event notification to keep the information, but in a better place ?

Definitely! Let me add an event there instead

@mh013370
Copy link
Member Author

@erdrix I've made sure an event notification and log message are generated when the controllers begin & end reconciliation 👍

@r65535
Copy link
Contributor

r65535 commented Aug 24, 2022

LGTM!
@erdrix - was there a reason for adding RequeueAfter(interval / 3) to the dataflow controller here?

@erdrix
Copy link
Contributor

erdrix commented Aug 24, 2022

LGTM! @erdrix - was there a reason for adding RequeueAfter(interval / 3) to the dataflow controller here?

If so, I can't remember why...

@erdrix
Copy link
Contributor

erdrix commented Aug 24, 2022

@cannonpalms, can you confirm that this solves some of your problems?

@mh013370
Copy link
Member Author

mh013370 commented Aug 25, 2022

LGTM! @erdrix - was there a reason for adding RequeueAfter(interval / 3) to the dataflow controller here?

If so, I can't remember why...

Since the dataflow controller will now respect its interval, should we either revert the division or reset the default interval the 5 (15 / 3)? Happy to do this in a separate PR to avoid scope creep.

DataFlowRequeueInterval: util.MustConvertToInt(util.GetEnvWithDefault("DATAFLOW_REQUEUE_INTERVAL", "15"), "DATAFLOW_REQUEUE_INTERVAL"),

@mh013370 mh013370 closed this Aug 25, 2022
@mh013370 mh013370 reopened this Aug 25, 2022
@mh013370
Copy link
Member Author

mh013370 commented Aug 25, 2022

Just FYI i've deployed this branch w/ argo to a remote k8s cluster and verified that the NifiCluster and NifiDataflow resource are no longer constantly updating

image

A screenshot doesn't really prove this, but the resource resourceVersion fields are not updating unless I change the CRDs.

@erdrix erdrix merged commit 7f555fe into konpyutaika:master Aug 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NifiCluster reconciliation loop is non-idempotent in ready state
3 participants