Proposal for indexing labels to support efficient queries by label selectors #2203

ccding · 2018-06-01T00:56:03Z

…lectors

k8s-ci-robot · 2018-06-01T00:56:07Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: idvoretskyi

Assign the PR to them by writing /assign @idvoretskyi in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

keps/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

idvoretskyi · 2018-06-01T09:26:06Z

/ok-to-test

smarterclayton · 2018-06-02T19:52:37Z

@kubernetes/sig-api-machinery-api-reviews

smarterclayton · 2018-06-02T19:54:46Z

keps/0012-label-index.md

+ReplicaSet, when selecting pods, need to select all the pods by namespace and
+then select the ones wanted filtering by index. While running a large number
+of pods, we observed that listing all pods and then filtering takes several
+seconds. This proposal presents solutions to improve this.


Can you describe the scenario you are hitting? With the watch cache enabled I have never seen a filtered label query take longer than 1s even on large sets, so I’d want to understand the data you have and how it could be applied.

Eventually we will have to do this. However, I don't want to do it too early because it's a lot of complexity, and I'm not sure apiserver is the right place to do it. I think it makes more sense to have the storage layer do this. This requires either adding a layer between apiserver and etcd, or implementing a bunch of features in etcd. Neither is going to happen soon. In the meantime, I don't want to do this in apiserver unless it's absolutely necessary.

Added evaluation results. PTAL

@lavalamp Adding a layer between apiserver and etcd sounds like a reasonable approach, build DB like index query into etcd seems like not that possible. So this KEP just focus on client side.

adohe-zz · 2018-06-03T05:08:47Z

keps/0012-label-index.md

+
+This proposal presents solutions to make the queries by label selectors in the
+list API efficient.
+


To make it clear, this proposal focus on client cache or apiserver watch cache?

See Proposal section 1 and 4. It focuses on client cash, while apiserver watch cache is optional.

These two things shouldn't be in the same KEP, they don't have much in common.

Removed the apiserver side cache. We will do it in the client side first.

adohe-zz · 2018-06-03T05:19:38Z

keps/0012-label-index.md

+all the index for all labels as the first step.
+
+4. `Optional` Adds index in API-server cache to improve queries directly going
+to API-server.


actually there are two things: first we need to evaluate controller side cache label index, then we could go to API-server cache.

You can already build a client-side index for isolated queries pretty easily.

Removed the apiserver side cache. We will do it in the client side first.

lavalamp · 2018-06-04T18:39:29Z

keps/0012-label-index.md

+
+### Proposal
+
+1. Adds label based index in controllers to facilitate lookup by label.


This is not really a design :)

How will this be accomplished? What components will change? How will you support all of our selector options, in particular IN?

(I'm not saying these questions are not answerable and we can't do it, I'm saying they are and this document should provide the answers)

Oh I missed the word "controllers", you can already add indexes. Which controllers do you want to change?

I am not sure if you need to have a super detailed design in the kep itself but it needs to be somewhere.

lavalamp · 2018-06-04T18:42:04Z

keps/0012-label-index.md

+Since objects are created/deleted dynamically, we also need to update the index
+efficiently.
+
+2. Calculates the count for each index and evaluate the most selective first.


So, ah, I don't want to own a query planning engine without a lot of justification. We don't add value by implementing DB functions, at least not so far. Even though this sounds like a lot of fun to write.

lavalamp · 2018-06-04T18:45:28Z

keps/0012-label-index.md

+
+2. Calculates the count for each index and evaluate the most selective first.
+
+3. Provides the ability in API for users to mark which labels to index or build


I'm against passing this complexity on to users. Or are you talking about an API for configuring controller manager (as a substitute for a flag)? Even there I'd expect this to be just on or off, not require an administrator to figure out what their expensive label selectors are.

lavalamp · 2018-06-04T18:49:21Z

keps/0012-label-index.md

+reviewers:
+  - TBD
+approvers:
+  - TBD


are these people approving the KEP or approving the PRs that result? If the former, add me & deads2k.

lavalamp · 2018-06-04T18:51:07Z

keps/0012-label-index.md

+  - TBD
+creation-date: "2018-05-31"
+last-updated: "2018-05-31"
+status: provisional


This should not be merged it its current state. It needs: 1) better justification 2) focus on apiserver or clients and 3) a more detailed mechanism design.

lavalamp · 2018-06-04T23:02:20Z

keps/0012-label-index.md

+
+* With increasing number of pods, the response time of the list operations
+  increases linearly.
+* With more than 10K pods, it takes 14 seconds to list all the pods, and 5


And with client side filtering, it will take 14 seconds in both cases? I don't understand how this scenario can measure client performance?

lavalamp · 2018-06-05T18:01:19Z

/hold

This KEP is still ambiguous in multiple places about what it wants to do clientside vs in the server. I think that should be very clear throughout before the discussion continues too much further.

k8s-ci-robot · 2018-06-16T23:05:27Z

@ccding: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Proposal for indexing labels to support efficient queries by label se…

fb4ad10

…lectors

k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jun 1, 2018

k8s-ci-robot added sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 1, 2018

k8s-ci-robot requested review from calebamiles and idvoretskyi June 1, 2018 00:56

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 1, 2018

k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 1, 2018

k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API labels Jun 2, 2018

smarterclayton reviewed Jun 2, 2018

View reviewed changes

adohe-zz reviewed Jun 3, 2018

View reviewed changes

lavalamp reviewed Jun 4, 2018

View reviewed changes

indexing labels: remove apiserver side cache and add evaluation

5b2e671

lavalamp reviewed Jun 4, 2018

View reviewed changes

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 5, 2018

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 16, 2018

ccding closed this Jun 21, 2018

skonto mentioned this pull request Jan 14, 2020

Find the namespace where cloudflow is installed lightbend/cloudflow#104

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal for indexing labels to support efficient queries by label selectors #2203

Proposal for indexing labels to support efficient queries by label selectors #2203

ccding commented Jun 1, 2018 •

edited

Loading

k8s-ci-robot commented Jun 1, 2018

idvoretskyi commented Jun 1, 2018

smarterclayton commented Jun 2, 2018

smarterclayton Jun 2, 2018

lavalamp Jun 4, 2018

ccding Jun 4, 2018

adohe-zz Jun 5, 2018

adohe-zz Jun 3, 2018

ccding Jun 3, 2018

lavalamp Jun 4, 2018

ccding Jun 4, 2018

adohe-zz Jun 3, 2018

lavalamp Jun 4, 2018

ccding Jun 4, 2018

lavalamp Jun 4, 2018

lavalamp Jun 4, 2018

lavalamp Jun 4, 2018

lavalamp Jun 4, 2018

lavalamp Jun 4, 2018

lavalamp Jun 4, 2018

lavalamp Jun 4, 2018

lavalamp Jun 4, 2018

lavalamp commented Jun 5, 2018

k8s-ci-robot commented Jun 16, 2018


		This proposal presents solutions to make the queries by label selectors in the
		list API efficient.


		### Proposal

		1. Adds label based index in controllers to facilitate lookup by label.


		2. Calculates the count for each index and evaluate the most selective first.

		3. Provides the ability in API for users to mark which labels to index or build

Proposal for indexing labels to support efficient queries by label selectors #2203

Proposal for indexing labels to support efficient queries by label selectors #2203

Conversation

ccding commented Jun 1, 2018 • edited Loading

k8s-ci-robot commented Jun 1, 2018

idvoretskyi commented Jun 1, 2018

smarterclayton commented Jun 2, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lavalamp commented Jun 5, 2018

k8s-ci-robot commented Jun 16, 2018

ccding commented Jun 1, 2018 •

edited

Loading