Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support LIST starting from a key #66981

Closed
caesarxuchao opened this issue Aug 4, 2018 · 10 comments

Comments

@caesarxuchao
Copy link
Member

commented Aug 4, 2018

/kind feature
/sig api-machinery

@lavalamp @deads2k do you think this feature request needs a KEP, or can we view it as completing kubernetes/community#896?

Motivation

Kubernetes supports API chunking on LIST requests. It allows a large LIST response to be broken into smaller, consistent partial responses. Consistency here means that all the partial responses is for a single resource version. However, apiservers instruct etcd to delete old versions every 5 minutes, so if a client cannot complete all the chunks in 5 minutes, the late requests will fail with “resource version too old” error.

As listed in the API chunking proposal, some clients care more about getting a lexically complete list than getting a consistent list. That is, when a “resource version too old” error happens, the client wants to continue the list from the next key (in kubernetes case, keys are lexically ordered), with the latest resource version. Such clients include kubectl get and migration style commands (e.g., those rewrite data in etcd from json to proto; or convert the Kubernetes API version of the data in etcd).

Proposal

Adding a BeginningName and a BeginningNamespace to the ListOptions. They can be used with or without chunking.

ListOptions {
	…
	BeginningNamespace string
	BeginningName string
}

Let's walk through a few cases:

1. Path: /api/v1/pods

ListOptions {
	// Required, i.e., cannot be “”.
	BeginningNamespace: “staging”,
	// Required
	BeginningName: “a”,
}

Results: apiserver starts listing pods whose key in etcd is lexically greater than “staging/a”


2. Path: /api/v1/namespaces/{namespace}/pods

ListOptions {
	// Required, must match the namespace in the path.
	BeginningNamespace: “staging”,
	// Required
	BeginningName: “a”,
}

Results: apiserver starts listing pods whose key in etcd is prefixed with “staging/” and lexically greater than “staging/a”.

3. Path: /api/v1/nodes (nodes are cluster-scoped):

ListOptions {
	// Must be “”
	BeginningNamespace: “”,
	// Required
	BeginningName: “a”,
}

Results: apiserver starts listing nodes whose key in etcd is lexically greater than “a”

4. Path: N/A

ListOptions: {
	Limit: 50,
	Continue: “some_value”, 
	// Must be empty
	BeginningNamespace: “”,
	// Must be empty
	BeginningName: “”,
}

Results: apiserver behaves the same as it currently does, sending back results based on the continue token.

5. Path: N/A

ListOptions: {
	Limit: 50
	Continue: “”, 
	BeginningNamespace: “staging”,
	BeginningName: “a”,
}

Results: apiserver returns no more than 50 elements whose etcd key is lexically greater than “staging/a”. 
If there are more elements, apiserver sends back a continue token. To continue the consistent chunking, the client should continue with

ListOptions: {
	Limit: 50
	Continue: <token sent back by the apiserver>, 
	// Must be empty	
	BeginningNamespace: “”,
	// Must be empty
	BeginningName: “”,
}

Security implications

Currently Kubernetes supports two kind of LIST access control:

  1. Giving a user permissions to list a resource in all namespaces
  2. Giving a user permissions to list a resource in specific namespaces

For 1, the user already has the highest possible permission.
For 2, because “BeginningNamespace” must match the namespace in path, the “list from a key” feature doesn’t break today’s security model.

@caesarxuchao

This comment has been minimized.

Copy link
Member Author

commented Aug 4, 2018

cc @cheftako @jpbetz this is the feature we discussed for storage encoded version migration.

@fedebongio

This comment has been minimized.

Copy link
Contributor

commented Aug 6, 2018

/assign @lavalamp

@liggitt

This comment has been minimized.

Copy link
Member

commented Aug 6, 2018

in kubernetes case, keys are lexically ordered

is that actually a guarantee of our API, or just an artifact of the current etcd impl? promoting starting namespace/name into ListOptions would certainly constrain any future storage implementations to be ordered this way (and to be able to start cross-namespace traversals at arbitrary points)

@liggitt

This comment has been minimized.

Copy link
Member

commented Aug 6, 2018

rather than expose the startingName/startingNamespace options, would it make more sense to optionally augment the returned 410 error with a continue token the client can use to continue traversing without respect to resourceVersion? I like that for a few reasons:

  • keeps the existing list/continue flow as-is
  • signals in the error whether the server supports continuing from an arbitrary point (rather than sending new ListOptions we don't know if the server supports)
  • avoids growing the surface area of ListOptions in ways that constrain storage implementations
@caesarxuchao

This comment has been minimized.

Copy link
Member Author

commented Aug 7, 2018

Returning continue token with 410 error sounds good. The token can be hash(next key, resourceVersion=0). It's sufficient for the kubectl get and migration use cases. @lavalamp wdyt?

@lavalamp

This comment has been minimized.

Copy link
Member

commented Aug 7, 2018

in kubernetes case, keys are lexically ordered

is that actually a guarantee of our API, or just an artifact of the current etcd impl? ...

It is definitely the latter, and we'd like to eventually store by UID (with name lookup supported via an index). It happens that @caesarxuchao's use case doesn't actually care what they're sorted by as long as it can start somewhere in the middle, but that's hard to express in an API.

So for that reason, I think issuing a new continue token when an old one hits an out-of-window error is a good idea.

@lavalamp

This comment has been minimized.

Copy link
Member

commented Aug 7, 2018

Additional thought on the API proposed:

This is trying to encode a database query. It's not a standard or extensible approach (e.g. we wouldn't want to see 100 different queries done in the api like this). I also don't think we're ready to tackle a general query system. So, for these reasons, I'm not in favor of this API.

@caesarxuchao

This comment has been minimized.

Copy link
Member Author

commented Aug 7, 2018

Thanks for the inputs. Issuing a continue token that doesn't care about resourceVersion is a small code change. I don't think that requires a KEP. I'll send a PR.

@lavalamp

This comment has been minimized.

Copy link
Member

commented Aug 7, 2018

@smarterclayton

This comment has been minimized.

Copy link
Contributor

commented Aug 12, 2018

I'm ok with this.

BenTheElder pushed a commit to BenTheElder/kubernetes that referenced this issue Sep 1, 2018
Kubernetes Submit Queue
Automatic merge from submit-queue (batch tested with PRs 67571, 67284, 66835, 68096, 68152). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

apiserver returns continue together with the 410 error

Implements kubernetes#66981 (comment).

Closes kubernetes#66981.

/sig api-machinery
/assign @lavalamp @liggitt @smarterclayton 

```release-note
Upon receiving a LIST request with expired continue token, the apiserver now returns a continue token together with the 410 "the from parameter is too old " error. If the client does not care about getting a list from a consistent snapshot, the client can use this token to continue listing from the next key, but the returned chunk will be from the latest snapshot.
```
k8s-publishing-bot added a commit to kubernetes/apimachinery that referenced this issue Sep 1, 2018
Automatic merge from submit-queue (batch tested with PRs 67571, 67284, 66835, 68096, 68152). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

apiserver returns continue together with the 410 error

Implements kubernetes/kubernetes#66981 (comment).

Closes #66981.

/sig api-machinery
/assign @lavalamp @liggitt @smarterclayton 

```release-note
Upon receiving a LIST request with expired continue token, the apiserver now returns a continue token together with the 410 "the from parameter is too old " error. If the client does not care about getting a list from a consistent snapshot, the client can use this token to continue listing from the next key, but the returned chunk will be from the latest snapshot.
```

Kubernetes-commit: 5b916f8b02f6cb7255bffa497000684f9ecd2dd5
k8s-publishing-bot added a commit to kubernetes/apiserver that referenced this issue Sep 1, 2018
Automatic merge from submit-queue (batch tested with PRs 67571, 67284, 66835, 68096, 68152). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

apiserver returns continue together with the 410 error

Implements kubernetes/kubernetes#66981 (comment).

Closes #66981.

/sig api-machinery
/assign @lavalamp @liggitt @smarterclayton

```release-note
Upon receiving a LIST request with expired continue token, the apiserver now returns a continue token together with the 410 "the from parameter is too old " error. If the client does not care about getting a list from a consistent snapshot, the client can use this token to continue listing from the next key, but the returned chunk will be from the latest snapshot.
```

Kubernetes-commit: 5b916f8b02f6cb7255bffa497000684f9ecd2dd5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.