Improving Sharding Downtime #806
Thanks for this recap, @priyawadhwa! I would be a fan of trying to automate as much of this as possible instead of doing manual shard config / restart. The manual portions introduce the most downtime, and I also think there is some possibility for user error down the line if the project continues to grow and scale. Could the issue with Trillian trees not necessarily being chronologically ordered be fixed by Rekor code ordering them and keeping track?
Other ideas floated in the meeting:
We can order trees chronologically during runtime. I think we can achieve way less downtime by keeping the config but removing the requirement for knowing the size of the inactive shard ahead of time; we can figure out the size during runtime. Sharding would look something like this:
WDYT @lkatalin?
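A rough sketch of the runtime size lookup described above, assuming direct access to the Trillian log gRPC API (the endpoint address and tree ID are placeholders, and this is not Rekor's actual code):

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/google/trillian"
	"github.com/google/trillian/types"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// inactiveShardSize asks Trillian for the current size of a (frozen) tree,
// so the length never has to be written into the sharding config by hand.
func inactiveShardSize(ctx context.Context, client trillian.TrillianLogClient, treeID int64) (uint64, error) {
	resp, err := client.GetLatestSignedLogRoot(ctx, &trillian.GetLatestSignedLogRootRequest{LogId: treeID})
	if err != nil {
		return 0, err
	}
	var root types.LogRootV1
	if err := root.UnmarshalBinary(resp.SignedLogRoot.LogRoot); err != nil {
		return 0, err
	}
	return root.TreeSize, nil
}

func main() {
	// Placeholder address and tree ID, for illustration only.
	conn, err := grpc.Dial("trillian-log-server:8090", grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	size, err := inactiveShardSize(context.Background(), trillian.NewTrillianLogClient(conn), 1234567890)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("frozen shard size:", size)
}
```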
@priyawadhwa I read this a couple of times, at first thinking you meant these steps would be implemented in the code (except redeployment) but now realizing that steps 1-4 are manual and 5 is implemented in code - is this right? So the change would be that instead of passing in tree sizes in the config, Rekor can calculate them at runtime, but the config is still used to specify and order the shards. I think that in itself would be a good change. My first (I think mistaken) read of your idea was that these steps would be automated, and that led me to think about the utility of eventually having an API endpoint for sharding commands that could perform some of these functions. Then we could avoid having to do a redeployment to update the config, and instead the new config could be picked up during runtime. This would be in addition to having Rekor compute the inactive shard sizes separately from the config. Something like:
In addition to not needing to do a redeployment, the benefits would be less work for whoever is creating the config or updating the server to use it, and less room for human error, while still giving the option to do overrides. I think the step you described of having Rekor get the sizes of inactive shards during runtime is more important, and this could be additional if we want to do it. There may be some problems as I'm still thinking through it, but it seems beneficial to skip mandatory redeployment if possible. What do others think?
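Purely as a hypothetical sketch of what such a sharding endpoint could look like (neither the route nor the `shardEntry` shape exists in Rekor today, and real authentication/authorization is elided, as the next comment points out):

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"sync/atomic"
)

// shardEntry is a hypothetical wire format for one shard in the config.
type shardEntry struct {
	TreeID     int64  `json:"treeID"`
	TreeLength uint64 `json:"treeLength,omitempty"` // optional override; computed at runtime if omitted
}

// current holds the live sharding config; it is swapped atomically so
// requests in flight keep a consistent view. Lookup code would Load() it.
var current atomic.Value // []shardEntry

// handleShardConfig would let an authorized operator push a new shard
// ordering without redeploying the server.
func handleShardConfig(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	var shards []shardEntry
	if err := json.NewDecoder(r.Body).Decode(&shards); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	current.Store(shards)
	w.WriteHeader(http.StatusNoContent)
}

func main() {
	current.Store([]shardEntry{})
	http.HandleFunc("/admin/sharding", handleShardConfig)
	log.Fatal(http.ListenAndServe(":3000", nil))
}
```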
Yep! I think that removing the tree length requirement is the minimal change we can make to get sharding to work without downtime. I'll probably do some manual testing to confirm that.
This could be really useful; we'd just need some way to make sure only authorized users could hit this endpoint!
Update: With #810 we've pretty much removed downtime; instead there is a small race condition when the Rekor deployment is spinning down old pods (which point to the old shard) and spinning up new pods (which point to the new shard). The risk is that the LoadBalancer would send requests to the old shard, and the new pods may not pick up those new entries as they come up. We have gotten around this for now by scaling the Deployment down to 1 pod for the turnover and sharding staging when request volume is low (which is most of the time for staging 😅). For a seamless experience in production I think we could do the following (none of which should require big changes):
@lkatalin WDYT?
Regarding the proposal around seamless updates, is it rollback-safe? I.e., if something goes wrong with the new Rekor deployment during the rollout, or for some window after the rollout has completed, can the system quickly and easily be restored to the previous state? With the current approach, there is a small window of downtime, but it does seem to have rollback safety at the cost of a small window of data loss.
That's a good point. If we're automatically marking logs as FROZEN when they're in the sharding config, we would need to automatically mark them as ACTIVE again if we rolled back.
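Un-freezing the old shard on rollback would be a single Trillian admin call. A rough sketch, assuming a recent Trillian Go client (the endpoint address and tree ID are placeholders):

```go
package main

import (
	"context"
	"log"

	"github.com/google/trillian"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/protobuf/types/known/fieldmaskpb"
)

// setTreeState flips a Trillian tree between ACTIVE and FROZEN, which is
// what an automated rollback would need to do for the old shard.
func setTreeState(ctx context.Context, admin trillian.TrillianAdminClient, treeID int64, state trillian.TreeState) error {
	_, err := admin.UpdateTree(ctx, &trillian.UpdateTreeRequest{
		Tree:       &trillian.Tree{TreeId: treeID, TreeState: state},
		UpdateMask: &fieldmaskpb.FieldMask{Paths: []string{"tree_state"}},
	})
	return err
}

func main() {
	// Placeholder address and tree ID, for illustration only.
	conn, err := grpc.Dial("trillian-log-server:8090", grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Roll the previously frozen shard back to ACTIVE.
	if err := setTreeState(context.Background(), trillian.NewTrillianAdminClient(conn), 1234567890, trillian.TreeState_ACTIVE); err != nil {
		log.Fatal(err)
	}
}
```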
Re: my initial idea around marking logs as FROZEN; I think an easier way of achieving the same thing would be to set the Rekor deployment strategy to Recreate, so that all old pods are terminated before new ones come up. The current strategy is the default RollingUpdate. The nice thing about this is that if something goes wrong, we can easily roll back.
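For reference, setting the strategy programmatically via client-go might look like the sketch below; in practice this is a one-line change in the deployment manifest, and the namespace and deployment name here are assumptions:

```go
package main

import (
	"context"
	"log"

	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	ctx := context.Background()
	// Namespace and deployment name are assumptions, for illustration only.
	deployments := client.AppsV1().Deployments("rekor-system")
	deploy, err := deployments.Get(ctx, "rekor-server", metav1.GetOptions{})
	if err != nil {
		log.Fatal(err)
	}

	// Recreate terminates all old pods (still pointing at the old shard)
	// before any new pods come up, instead of the default RollingUpdate.
	deploy.Spec.Strategy = appsv1.DeploymentStrategy{Type: appsv1.RecreateDeploymentStrategyType}

	if _, err := deployments.Update(ctx, deploy, metav1.UpdateOptions{}); err != nil {
		log.Fatal(err)
	}
}
```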
@priyawadhwa This is complete, correct?
Yep!
Right now, sharding the log requires about 10-20 minutes of downtime.
The sharding process currently looks like this:
Downtime occurs between freezing the log and updating the sharding config. We require the length of the inactive shard in the config to properly compute virtual indexes across shards.
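To make that dependency concrete, here is a small sketch of how a virtual (global) index across shards can be computed; the types and field names are illustrative, not Rekor's actual ones:

```go
package main

import "fmt"

// shard describes one log shard as the config might see it: a Trillian tree
// ID plus the number of entries it holds (fixed once the tree is frozen).
type shard struct {
	treeID     int64
	treeLength uint64
}

// virtualIndex turns a per-shard index into a single monotonically increasing
// index across all shards, assuming shards are listed oldest first. Without a
// known length for every inactive shard, the offsets cannot be computed.
func virtualIndex(shards []shard, treeID int64, localIndex uint64) (uint64, error) {
	var offset uint64
	for _, s := range shards {
		if s.treeID == treeID {
			return offset + localIndex, nil
		}
		offset += s.treeLength
	}
	return 0, fmt.Errorf("tree %d not found in sharding config", treeID)
}

func main() {
	shards := []shard{
		{treeID: 111, treeLength: 1000}, // frozen shard
		{treeID: 222, treeLength: 0},    // active shard; its length isn't needed
	}
	idx, _ := virtualIndex(shards, 222, 5)
	fmt.Println(idx) // 1005: entry 5 of the active shard, offset by the frozen shard's length
}
```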
FAQ
In this case, the inactive shard wouldn't be correctly queryable for some portion of time. That's because Rekor relies on the sharding config to determine which shards to search and how to determine the "virtual index" across shards. In the period between switching over traffic and updating the sharding config, we could append entries to the log, but we wouldn't be able to access entries in the frozen log: almost all verifications would fail and lookups by log index would be incorrect.
The sharding config keeps track of inactive shards, and for each one it tracks:
We'd have to guess at how to order the shards (this determines the virtual log index computation). We could assume shards should be ordered chronologically; if we're comfortable with that, it could reduce downtime.
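If Rekor were to order shards itself rather than guess, one option is sorting trees by their Trillian create time. A rough sketch, assuming a recent Trillian Go client (the endpoint address is a placeholder):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"sort"

	"github.com/google/trillian"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// Placeholder address, for illustration only.
	conn, err := grpc.Dial("trillian-log-server:8090", grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	admin := trillian.NewTrillianAdminClient(conn)
	resp, err := admin.ListTrees(context.Background(), &trillian.ListTreesRequest{})
	if err != nil {
		log.Fatal(err)
	}

	// Order trees oldest first, so the virtual index is stable across shards.
	trees := resp.Tree
	sort.Slice(trees, func(i, j int) bool {
		return trees[i].CreateTime.AsTime().Before(trees[j].CreateTime.AsTime())
	})
	for _, t := range trees {
		fmt.Println(t.TreeId, t.TreeState, t.CreateTime.AsTime())
	}
}
```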