
Build A MVP Persistent Caching Proxy #342

Closed
9 of 18 tasks
Tracked by #1225
sttts opened this issue Dec 21, 2021 · 10 comments
Labels
area/sharding (Issues or PRs related to sharding changes), kind/feature (Categorizes issue or PR as related to a new feature)
Comments

@sttts
Member

sttts commented Dec 21, 2021

Certain data in a kcp cluster is critical for operation and a SPOF in a multi-region/AZ environment, e.g. org workspaces. It is feasible to build a consistent cache hierarchy which

  1. persists objects to a local etcd, or
  2. keeps data in memory, if the proxies are made highly available through multiple instances,

and which serves consistent data by checking the freshness of the cache on quorum reads. Such a setup could give read-only availability to e.g. org workspaces (see the sketch below).
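
A minimal sketch of what the freshness check on quorum reads could look like, in Go. The idea: before serving from the cache, do one linearizable (quorum) read of the authoritative store's current revision, then wait until the cache has replicated at least that far. The `Store`/`Cache` interfaces and their method names are illustrative assumptions, not an existing kcp API:

```go
// Package cacheproxy sketches a freshness gate for the caching proxy:
// answer reads from the cache only once it has observed the revision
// returned by a quorum read against the authoritative store.
package cacheproxy

import (
	"context"
	"fmt"
	"time"
)

// Store is the authoritative backend (e.g. a shard's etcd). QuorumRevision
// is assumed to perform a linearizable read of the current revision.
type Store interface {
	QuorumRevision(ctx context.Context) (int64, error)
}

// Cache reports the latest revision it has replicated from the store.
type Cache interface {
	ObservedRevision() int64
}

// WaitFresh blocks until the cache has caught up to a quorum-read revision
// or the context expires. After it returns nil, reads served from the cache
// are at least as fresh as the moment WaitFresh was called.
func WaitFresh(ctx context.Context, store Store, cache Cache) error {
	target, err := store.QuorumRevision(ctx)
	if err != nil {
		return fmt.Errorf("quorum read failed: %w", err)
	}
	tick := time.NewTicker(10 * time.Millisecond)
	defer tick.Stop()
	for cache.ObservedRevision() < target {
		select {
		case <-ctx.Done():
			return fmt.Errorf("cache stale: observed %d < quorum %d: %w",
				cache.ObservedRevision(), target, ctx.Err())
		case <-tick.C:
		}
	}
	return nil
}
```

Note that during an outage the quorum read itself fails; the proxy can then still answer, but only with explicitly stale data, which leads to the question below.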

Big question: do we run into time-travel problems like we do with pods and kubelets? We must be super careful when answering consistent reads with potentially stale data in an outage situation. But for certain operations, like the personal workspace virtual workspace from @davidfestal, a stale read is good enough.

Acceptance Criteria

We're just looking for an MVP implementation here:

  • shards push data to the proxy
  • no auth{n,z} needed
  • APIExport + APIResourceSchema hard-coded as types to push
  • dynamically determine which types to serve. For example, the built-in types (APIExports and APIResourceSchema) could be automatically served as CRDs.
  • turn on the watch cache in the cache server
  • have secondary informers in kcp reconcilers use the cache server (see the sketch after this list)
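
For the last item, a minimal sketch of pointing secondary informers at the cache server using the stock dynamic informer machinery from client-go. The cache server URL, the elided auth details, and the `apis.kcp.io` group/resource names are assumptions for illustration:

```go
package main

import (
	"time"

	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/dynamic/dynamicinformer"
	"k8s.io/client-go/rest"
)

func main() {
	// Assumption: the cache server speaks plain Kubernetes REST on this URL;
	// credentials and CA config are elided for the sketch.
	cfg := &rest.Config{Host: "https://cache-server.kcp.svc:6443"}

	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Secondary informers list/watch against the cache server instead of
	// hitting the shard's apiserver directly.
	factory := dynamicinformer.NewDynamicSharedInformerFactory(client, 10*time.Minute)

	// APIExport is one of the hard-coded types the shards push (the group
	// and resource names here are assumptions).
	gvr := schema.GroupVersionResource{Group: "apis.kcp.io", Version: "v1alpha1", Resource: "apiexports"}
	informer := factory.ForResource(gvr).Informer()

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	_ = informer // event handlers for the reconciler would be registered here
}
```

Using the dynamic client keeps the sketch independent of the generated kcp clientsets; the real wiring would likely reuse those instead.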

Items

authorization

  • shards should only be able to write their own shard partition of the cache (a sketch follows below)
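
A minimal sketch of such write isolation as a k8s.io/apiserver authorizer. The shard identity convention (`system:kcp:shard:<name>`) and the partitioned request path are assumptions for illustration, not the actual kcp wiring:

```go
// Package authz sketches per-shard write isolation for the cache server.
package authz

import (
	"context"
	"strings"

	"k8s.io/apiserver/pkg/authorization/authorizer"
)

// shardAuthorizer denies writes that target another shard's partition.
type shardAuthorizer struct{}

var _ = authorizer.Authorizer(shardAuthorizer{})

func (shardAuthorizer) Authorize(ctx context.Context, attr authorizer.Attributes) (authorizer.Decision, string, error) {
	// Reads are left to the rest of the authorizer chain.
	if attr.IsReadOnly() {
		return authorizer.DecisionNoOpinion, "", nil
	}
	// Assumed convention: shards authenticate as system:kcp:shard:<name>.
	const prefix = "system:kcp:shard:"
	u := attr.GetUser()
	if u == nil || !strings.HasPrefix(u.GetName(), prefix) {
		return authorizer.DecisionNoOpinion, "not a shard identity", nil
	}
	shard := strings.TrimPrefix(u.GetName(), prefix)
	// Assumed convention: the request path carries the shard partition,
	// e.g. /services/cache/shards/<shard>/apis/....
	if strings.HasPrefix(attr.GetPath(), "/services/cache/shards/"+shard+"/") {
		return authorizer.DecisionAllow, "", nil
	}
	return authorizer.DecisionDeny, "a shard may only write its own partition", nil
}
```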

data removal

  • unused resources could be automatically removed

shard management, a few loose ideas:

  • what will decide when a new shard needs to be created?
  • which cluster does it get deployed to?
  • how do we make other shards aware of the new shard?

on kcp server

  • wire informers both ways; this includes a controller to cope with not-yet-synced informers
  • start the second informer factory (aka TemporaryRootShardKcpSharedInformerFactory) in a dedicated post-start-hook (a sketch follows below)
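
A minimal sketch of the dedicated post-start-hook, using the k8s.io/apiserver hook API as it looked at the time of this issue (`PostStartHookContext.StopCh`); the hook name and factory wiring are illustrative assumptions:

```go
// Package server sketches starting the second informer factory (the one
// backed by the cache server) from a dedicated post-start-hook.
package server

import (
	"fmt"

	genericapiserver "k8s.io/apiserver/pkg/server"
	"k8s.io/client-go/informers"
)

// addCacheInformerHook registers a hook that starts the factory only once
// the server is up, so reconcilers never see a not-yet-started factory.
func addCacheInformerHook(s *genericapiserver.GenericAPIServer, factory informers.SharedInformerFactory) error {
	return s.AddPostStartHook("kcp-start-cache-informers", func(ctx genericapiserver.PostStartHookContext) error {
		factory.Start(ctx.StopCh) // StopCh closes on shutdown and stops the informers
		for typ, synced := range factory.WaitForCacheSync(ctx.StopCh) {
			if !synced {
				return fmt.Errorf("informer %v failed to sync", typ)
			}
		}
		return nil
	})
}
```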
@sttts added the kind/feature label Dec 21, 2021
@stevekuznetsov
Contributor

Option 2 seems quite a lot harder than option 1, FWIW.

@sttts
Member Author

sttts commented Jan 5, 2022

We have the freshness problem either way, so I don't think one option is harder than the other. The first is reboot-safe; the second is operationally simpler.

@ncdc added this to the Prototype 4 milestone Feb 23, 2022
@ncdc removed this from the v0.4.0 milestone Apr 13, 2022
@ncdc
Member

ncdc commented Apr 13, 2022

Clearing milestone to re-triage

@ncdc added this to the v0.6.0 milestone May 31, 2022
@sttts
Member Author

sttts commented Jun 14, 2022

Part of multi-release epic #1225

@stevekuznetsov
Contributor

@p0lyn0mial FYI for tracking: this is the issue for implementing the cache server; it would be cool to link PRs to it as new ones come in.

@stevekuznetsov changed the title from "Consistent caching proxy" to "Build A MVP Persistent Caching Proxy" Sep 16, 2022
@ncdc modified the milestones: v0.9, v0.10 Oct 5, 2022
@ncdc
Member

ncdc commented Dec 5, 2022

@p0lyn0mial @sttts is this going to be completed this week for v0.10? It looks like we still have several outstanding items?

@p0lyn0mial
Contributor

@p0lyn0mial @sttts is this going to be completed this week for v0.10? It looks like we still have several outstanding items?

We need to finish the workspace refactoring before we can finish this feature.

@ncdc modified the milestones: v0.10, v0.11 Dec 6, 2022
@p0lyn0mial
Contributor

p0lyn0mial commented Feb 22, 2023

I think we can close this issue. Not all items have been implemented, but it is unclear to me whether we will use the cache server in the long run. Perhaps it will be replaced by CRDB (CockroachDB).

@ncdc
Member

ncdc commented Feb 22, 2023

👍
/close

@openshift-ci bot closed this as completed Feb 22, 2023
@openshift-ci
Contributor

openshift-ci bot commented Feb 22, 2023

@ncdc: Closing this issue.

In response to this:

👍
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
