Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Significant regression in API server due to policy code path #9368

Closed
smarterclayton opened this issue Jun 16, 2016 · 16 comments
Closed

Significant regression in API server due to policy code path #9368

smarterclayton opened this issue Jun 16, 2016 · 16 comments

Comments

@smarterclayton
Copy link
Contributor

... the bad news is that we're allocating too much in policy again :)

From test-cmd in memprofile, it looks like we are bypassing / flushing the policy cache very aggressively (in an N^2 or N^3 pattern, maybe?) and so we are allocating lots of objects. For a test-cmd run we make ~40k API calls, but we are allocating ~26.5 million objects. It looks like confirmNoEscalation is the source of the GetClusterPolicy and GetEffectivePolicyRules calls, and it does not appear to be hitting caches. Should be reproducible with the following command (will dig in tomorrow and look for anything obvious).

$ OPENSHIFT_PROFILE=mem hack/test-cmd.sh

21188269 of 77583010 total (27.31%)
Dropped 4165 nodes (cum <= 387915)
Showing top 30 nodes out of 307 (cum >= 11648542)
      flat  flat%   sum%        cum   cum%
         0     0%     0%   35049045 45.18%  runtime.goexit
         0     0%     0%   26480274 34.13%  k8s.io/kubernetes/pkg/runtime/serializer/versioning.(*codec).Decode
     70156  0.09%  0.09%   23742880 30.60%  k8s.io/kubernetes/pkg/runtime/serializer/json.(*Serializer).Decode
         0     0%  0.09%   23215417 29.92%  github.com/ugorji/go/codec.(*Decoder).Decode
         0     0%  0.09%   23215417 29.92%  github.com/ugorji/go/codec.(*Decoder).decode
         0     0%  0.09%   22511738 29.02%  github.com/ugorji/go/codec.(*Decoder).decodeI
   4215935  5.43%  5.52%   22444678 28.93%  github.com/ugorji/go/codec.(*decFnInfo).kStruct
         0     0%  5.52%   21898462 28.23%  github.com/ugorji/go/codec.(*Decoder).decodeValue
     11791 0.015%  5.54%   20675631 26.65%  k8s.io/kubernetes/pkg/storage/etcd.(*etcdHelper).bodyAndExtractObj
     22210 0.029%  5.57%   20280901 26.14%  k8s.io/kubernetes/pkg/storage/etcd.(*etcdHelper).extractObj
     26964 0.035%  5.60%   19947641 25.71%  k8s.io/kubernetes/pkg/runtime/serializer/recognizer.(*decoder).Decode
         0     0%  5.60%   19898074 25.65%  k8s.io/kubernetes/pkg/storage/etcd.(*etcdHelper).Get
         0     0%  5.60%   19840076 25.57%  k8s.io/kubernetes/pkg/registry/generic/registry.(*Store).Get
         0     0%  5.60%   19572675 25.23%  github.com/ugorji/go/codec.(*decFnInfo).kSlice
         0     0%  5.60%   19179545 24.72%  k8s.io/kubernetes/pkg/storage.(*Cacher).Get
     21270 0.027%  5.63%   17591168 22.67%  github.com/openshift/origin/pkg/authorization/rulevalidation.(*DefaultRuleResolver).GetEffectivePolicyRules
         0     0%  5.63%   17293989 22.29%  github.com/openshift/origin/pkg/authorization/registry/clusterpolicy.(*storage).GetClusterPolicy
@deads2k
Copy link
Contributor

deads2k commented Jun 16, 2016

@smarterclayton How about holding on the re-run until I've had time to plumb through shared caches?

@smarterclayton
Copy link
Contributor Author

This seems a straight up regression, unless there's something I don't know
about the policy caches. That not the case?

On Jun 16, 2016, at 7:31 AM, David Eads notifications@github.com wrote:

@smarterclayton https://github.com/smarterclayton How about holding on
the re-run until I've had time to plumb through shared caches?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#9368 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/ABG_p7IwDjLdgWYj0RFMG0p-s7YP7N22ks5qMTQTgaJpZM4I3Bqk
.

@soltysh
Copy link
Member

soltysh commented Jun 16, 2016

/sub

@deads2k
Copy link
Contributor

deads2k commented Jun 16, 2016

This seems a straight up regression, unless there's something I don't know
about the policy caches. That not the case?

There are a couple spots to do with scoped tokens that bypass the cache since we didn't have clean plumbing for them. Fixing that is near the top of my list. I also want to re-wire our existing caches to use the new cache style we have.

@smarterclayton
Copy link
Contributor Author

smarterclayton commented Jun 16, 2016 via email

@smarterclayton smarterclayton changed the title The good news is that I have performance numbers Significant regression in API server due to policy code path Jun 16, 2016
@timothysc
Copy link

ack /cc @mffiedler

@smarterclayton
Copy link
Contributor Author

mem.pdf

@smarterclayton
Copy link
Contributor Author

goroutine 19490 [running]:
runtime/debug.Stack(0x0, 0x0, 0x0)
        /usr/local/Cellar/go/1.6.2/libexec/src/runtime/debug/stack.go:24 +0x80
runtime/debug.PrintStack()
        /usr/local/Cellar/go/1.6.2/libexec/src/runtime/debug/stack.go:16 +0x18
github.com/openshift/origin/pkg/authorization/registry/clusterpolicy.(*storage).GetClusterPolicy(0xc821a50b40, 0x82405d0, 0xc82cf3d980, 0x36f5140, 0x7, 0xc82cf3d980, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/registry/clusterpolicy/registry.go:78 +0x2e
github.com/openshift/origin/pkg/authorization/registry/clusterpolicy.ReadOnlyClusterPolicy.Get(0x768dfa8, 0xc821a50b40, 0x36f5140, 0x7, 0xc82d9fa8c0, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/registry/clusterpolicy/registry.go:130 +0x12a
github.com/openshift/origin/pkg/authorization/registry/clusterpolicy.(*ReadOnlyClusterPolicy).Get(0xc821a50e30, 0x36f5140, 0x7, 0x0, 0x0, 0x0)
        <autogenerated>:45 +0xb8
github.com/openshift/origin/pkg/authorization/rulevalidation.(*DefaultRuleResolver).GetRole(0xc821e61e40, 0x86880c8, 0xc82d51a900, 0x0, 0x0, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/rulevalidation/find_rules.go:79 +0x13f
github.com/openshift/origin/pkg/authorization/rulevalidation.(*DefaultRuleResolver).GetEffectivePolicyRules(0xc821e61e40, 0x82405d0, 0xc82e482630, 0x0, 0x0, 0x0, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/rulevalidation/find_rules.go:136 +0x539
github.com/openshift/origin/pkg/authorization/rulevalidation.ConfirmNoEscalation(0x82405d0, 0xc82e482630, 0x0, 0x0, 0x3752620, 0xb, 0xc82d9bb390, 0xe, 0x82465b0, 0xc821e61e40, ...)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/rulevalidation/user_covers.go:21 +0xb7
github.com/openshift/origin/pkg/authorization/registry/rolebinding/policybased.(*VirtualStorage).confirmNoEscalation(0xc821e61e80, 0x82405d0, 0xc82e482630, 0xc82ea36300, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/registry/rolebinding/policybased/virtual_storage.go:221 +0x400
github.com/openshift/origin/pkg/authorization/registry/rolebinding/policybased.(*VirtualStorage).updateRoleBinding(0xc821e61e80, 0x82405d0, 0xc82e482630, 0x72c1a10, 0xc82ea36300, 0x1000, 0x1000, 0xc82236e000, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/registry/rolebinding/policybased/virtual_storage.go:179 +0x4ad
github.com/openshift/origin/pkg/authorization/registry/rolebinding/policybased.(*VirtualStorage).Update(0xc821e61e80, 0x82405d0, 0xc82e482630, 0x72c1a10, 0xc82ea36300, 0x0, 0x0, 0xac3e78, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/registry/rolebinding/policybased/virtual_storage.go:154 +0x82
github.com/openshift/origin/pkg/authorization/registry/clusterrolebinding/proxy.(*ClusterRoleBindingStorage).Update(0xc821e61e80, 0x82405d0, 0xc82e482630, 0x72c1c90, 0xc82ea36180, 0x0, 0x0, 0xc828808ec8, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/registry/clusterrolebinding/proxy/proxy.go:89 +0xf5
k8s.io/kubernetes/pkg/apiserver.UpdateResource.func1.1(0x0, 0x0, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/Godeps/_workspace/src/k8s.io/kubernetes/pkg/apiserver/resthandler.go:682 +0xac
k8s.io/kubernetes/pkg/apiserver.finishRequest.func1(0xc827b75c20, 0xc82cc42f80, 0xc827b75bc0, 0xc827b75b60)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/Godeps/_workspace/src/k8s.io/kubernetes/pkg/apiserver/resthandler.go:918 +0xd9
created by k8s.io/kubernetes/pkg/apiserver.finishRequest
        /Users/clayton/projects/origin/src/github.com/openshift/origin/Godeps/_workspace/src/k8s.io/kubernetes/pkg/apiserver/resthandler.go:923 +0xf1

@smarterclayton
Copy link
Contributor Author

This is @ 7c4fc5a

@smarterclayton
Copy link
Contributor Author

goroutine 3091 [running]:
runtime/debug.Stack(0x0, 0x0, 0x0)
        /usr/local/Cellar/go/1.6.2/libexec/src/runtime/debug/stack.go:24 +0x80
runtime/debug.PrintStack()
        /usr/local/Cellar/go/1.6.2/libexec/src/runtime/debug/stack.go:16 +0x18
github.com/openshift/origin/pkg/authorization/registry/clusterpolicy.(*storage).GetClusterPolicy(0xc821a50b40, 0x82405d0, 0xc826ee25a0, 0x36f5140, 0x7, 0xc826ee25a0, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/registry/clusterpolicy/registry.go:78 +0x2e
github.com/openshift/origin/pkg/authorization/registry/clusterpolicy.ReadOnlyClusterPolicy.Get(0x768dfa8, 0xc821a50b40, 0x36f5140, 0x7, 0xc827082910, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/registry/clusterpolicy/registry.go:130 +0x12a
github.com/openshift/origin/pkg/authorization/registry/clusterpolicy.(*ReadOnlyClusterPolicy).Get(0xc821a50e30, 0x36f5140, 0x7, 0x0, 0x0, 0x0)
        <autogenerated>:45 +0xb8
github.com/openshift/origin/pkg/authorization/rulevalidation.(*DefaultRuleResolver).GetRole(0xc821e61e40, 0x86880c8, 0xc823b14480, 0x0, 0x0, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/rulevalidation/find_rules.go:79 +0x13f
github.com/openshift/origin/pkg/authorization/rulevalidation.(*DefaultRuleResolver).GetEffectivePolicyRules(0xc821e61e40, 0x82405d0, 0xc824c089c0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/rulevalidation/find_rules.go:136 +0x539
github.com/openshift/origin/pkg/authorization/rulevalidation.ConfirmNoEscalation(0x82405d0, 0xc824c089c0, 0x0, 0x0, 0x3752620, 0xb, 0xc8255cc600, 0x1c, 0x82465b0, 0xc821e61e40, ...)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/rulevalidation/user_covers.go:21 +0xb7
github.com/openshift/origin/pkg/authorization/registry/rolebinding/policybased.(*VirtualStorage).confirmNoEscalation(0xc821e61e80, 0x82405d0, 0xc824c089c0, 0xc824df6d80, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/registry/rolebinding/policybased/virtual_storage.go:221 +0x400
github.com/openshift/origin/pkg/authorization/registry/rolebinding/policybased.(*VirtualStorage).createRoleBinding(0xc821e61e80, 0x82405d0, 0xc824c089c0, 0x72c1a10, 0xc824df6d80, 0x82e7b00, 0x12f78, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/registry/rolebinding/policybased/virtual_storage.go:127 +0x1ba
github.com/openshift/origin/pkg/authorization/registry/rolebinding/policybased.(*VirtualStorage).Create(0xc821e61e80, 0x82405d0, 0xc824c089c0, 0x72c1a10, 0xc824df6d80, 0x0, 0x0, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/registry/rolebinding/policybased/virtual_storage.go:109 +0x7c
github.com/openshift/origin/pkg/authorization/registry/clusterrolebinding/proxy.(*ClusterRoleBindingStorage).Create(0xc821e61e80, 0x82405d0, 0xc824c089c0, 0x72c1c90, 0xc824df6000, 0x0, 0x0, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/registry/clusterrolebinding/proxy/proxy.go:77 +0xef
k8s.io/kubernetes/pkg/apiserver.(*namedCreaterAdapter).Create(0xc821440190, 0x82405d0, 0xc824c089c0, 0x0, 0x0, 0x72c1c90, 0xc824df6000, 0x0, 0x0, 0x0, ...)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/Godeps/_workspace/src/k8s.io/kubernetes/pkg/apiserver/resthandler.go:439 +0x93
k8s.io/kubernetes/pkg/apiserver.createHandler.func1.1(0x0, 0x0, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/Godeps/_workspace/src/k8s.io/kubernetes/pkg/apiserver/resthandler.go:402 +0xc8
k8s.io/kubernetes/pkg/apiserver.finishRequest.func1(0xc820f728a0, 0xc82787ceb0, 0xc820f72840, 0xc820f72720)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/Godeps/_workspace/src/k8s.io/kubernetes/pkg/apiserver/resthandler.go:918 +0xd9
created by k8s.io/kubernetes/pkg/apiserver.finishRequest
        /Users/clayton/projects/origin/src/github.com/openshift/origin/Godeps/_workspace/src/k8s.io/kubernetes/pkg/apiserver/resthandler.go:923 +0xf1

which I think is the same, is the bulk of them. It looks like we make ~1-3 per mutation call.

@deads2k
Copy link
Contributor

deads2k commented Jun 24, 2016

which I think is the same, is the bulk of them. It looks like we make ~1-3 per mutation call.

How easy/hard is it to get the comparison mem chart for 1.2? Do you have a script?

Given that its the GetEffectivePolicyRules call, I'd suspect this copy: https://github.com/openshift/origin/blob/master/pkg/authorization/rulevalidation/find_rules.go#L130-L145, but its been around forever. I can refactor it to [][]PolicyRule and avoid copying all the rules every request.

@smarterclayton
Copy link
Contributor Author

I think the command has worked since 1.0.

On Jun 24, 2016, at 7:54 AM, David Eads notifications@github.com wrote:

which I think is the same, is the bulk of them. It looks like we make ~1-3
per mutation call.

How easy/hard is it to get the comparison mem chart for 1.2? Do you have a
script?

Given that its the GetEffectivePolicyRules call, I'd suspect this copy:
https://github.com/openshift/origin/blob/master/pkg/authorization/rulevalidation/find_rules.go#L130-L145,
but its been around forever. I can refactor it to [][]PolicyRule and avoid
copying all the rules every request.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#9368 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/ABG_p_nvo81TIs37pqBsR7g6MMEXBceHks5qO8VQgaJpZM4I3Bqk
.

@deads2k
Copy link
Contributor

deads2k commented Jul 20, 2016

Clayton thinks he's got this.

@smarterclayton
Copy link
Contributor Author

Almost resolved with #9814

@deads2k
Copy link
Contributor

deads2k commented Jul 25, 2016

Almost resolved with #9814

Wasn't that pre-existing?

@deads2k deads2k closed this as completed Jul 25, 2016
@deads2k deads2k reopened this Jul 25, 2016
@smarterclayton
Copy link
Contributor Author

The fix in there fixes conversions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants