Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce cached client behind feature flag #1037

Merged
merged 3 commits into from Jun 3, 2021

Conversation

djwhatle
Copy link
Contributor

@djwhatle djwhatle commented Mar 28, 2021

Summary

  • Makes mig-controller go BRRR
  • Reintroduces remote cluster cache behind feature flag ENABLE_CACHED_CLIENT
    • Each remote cluster gets its own manager with cache
    • On Create and Update, blocks until the cache catches up

Complement to operator PR: migtools/mig-operator#675

To test

# Local
ENABLE_CACHED_CLIENT=true make run-fast

Open (Potential) Problems:

  • Need to add safe map access to the stored list of clients.

  • Need to properly handle teardown and recreation of client in shared map when a migcluster credential set changes

    • Need a graceful strategy for dealing with bad creds being given to the cache manager due to bad MigCluster creds (this may be handled by current logic)
  • Need to properly destroy manager which holds the cache when a migcluster is removed.

    • This removal must be safe such that we don't crash due to someone else using the manager client as we destroy it
  • Need to make sure that update calls are blocking until receipt of update is present in our cache (e.g. a GET shows the expected resourceVersion)

  • Need to add a while polling for cache to populate after a create or update (for a rare race condition I'm imagining where we create something but something else deletes, we wouldn't want to hang forever waiting, maybe 10 second max)

pkg/compat/client.go Outdated Show resolved Hide resolved
@pranavgaikwad
Copy link
Contributor

I used this PR and I would just like to post the results here.

  1. Before this PR, RunRsyncOperations phase in DVM took about 10-15 seconds to run once:

Screenshot from 2021-03-31 13-05-20

  1. After this PR, the same phase takes a few seconds (worst case) to run once:

Screenshot from 2021-03-31 13-06-04

@djwhatle djwhatle force-pushed the safe-cache branch 2 times, most recently from 28fb419 to 4613a9b Compare April 21, 2021 20:20
@djwhatle djwhatle changed the title Safe cached client Safe cached client - wait for cache consistency May 10, 2021
@djwhatle djwhatle changed the title Safe cached client - wait for cache consistency Safe cached client May 10, 2021
@djwhatle djwhatle marked this pull request as ready for review May 19, 2021 23:08
@djwhatle djwhatle changed the title Safe cached client Introduce cached client behind feature flag May 19, 2021
Copy link
Contributor

@pranavgaikwad pranavgaikwad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me. I will wait for operator PR for the feature flag.

@djwhatle
Copy link
Contributor Author

Need to rebase. Doing it now.

@djwhatle
Copy link
Contributor Author

Squashed to make the rebase easier.

Save client

It is alive

Checkpoint

Pre cleanup

Log name of migcluster starting remote watch for

Filter list of events with exact watch to allow for cache usage

Start transitioning event logging to keying on UID

clientMap access safety

add timout on spinlock

add consistency check for remote watch

add all clients to the shared client map to avoid reconstruction

change spin lock implementation

improve cache consistency logic for clients

Rename vars related to cached client

Pre-refactor to separate client names

Reorganize getClient

Cleaning up helper code checking for cache population

Fix event indexer

Tested and working

Add switch to enable / disable cached client

Cleanup compat client, return errors from helpers

Clean up redundant comment

Remove stunnel.go from bad rebase

Put all cached client logic inside feature gate in migcluster_types

Fix log statement with mismatched kv pairs
@djwhatle
Copy link
Contributor Author

djwhatle commented Jun 3, 2021

Just got the rebase finished. Need to do a bit of testing to verify that everything is working as expected.

Copy link
Contributor

@pranavgaikwad pranavgaikwad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@sseago sseago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The manager stop changes look sane to me.

@djwhatle djwhatle merged commit 3869e3a into migtools:master Jun 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants