Significant regression in API server due to policy code path #9368

smarterclayton · 2016-06-16T04:58:46Z

... the bad news is that we're allocating too much in policy again :)

From a memory profile of test-cmd, it looks like we are bypassing or flushing the policy cache very aggressively (in an N^2 or N^3 pattern, maybe?), so we are allocating lots of objects. A test-cmd run makes ~40k API calls, but we are allocating ~26.5 million objects. confirmNoEscalation looks like the source of the GetClusterPolicy and GetEffectivePolicyRules calls, and it does not appear to be hitting caches. This should be reproducible with the following command (I will dig in tomorrow and look for anything obvious).

$ OPENSHIFT_PROFILE=mem hack/test-cmd.sh

21188269 of 77583010 total (27.31%)
Dropped 4165 nodes (cum <= 387915)
Showing top 30 nodes out of 307 (cum >= 11648542)
      flat  flat%   sum%        cum   cum%
         0     0%     0%   35049045 45.18%  runtime.goexit
         0     0%     0%   26480274 34.13%  k8s.io/kubernetes/pkg/runtime/serializer/versioning.(*codec).Decode
     70156  0.09%  0.09%   23742880 30.60%  k8s.io/kubernetes/pkg/runtime/serializer/json.(*Serializer).Decode
         0     0%  0.09%   23215417 29.92%  github.com/ugorji/go/codec.(*Decoder).Decode
         0     0%  0.09%   23215417 29.92%  github.com/ugorji/go/codec.(*Decoder).decode
         0     0%  0.09%   22511738 29.02%  github.com/ugorji/go/codec.(*Decoder).decodeI
   4215935  5.43%  5.52%   22444678 28.93%  github.com/ugorji/go/codec.(*decFnInfo).kStruct
         0     0%  5.52%   21898462 28.23%  github.com/ugorji/go/codec.(*Decoder).decodeValue
     11791 0.015%  5.54%   20675631 26.65%  k8s.io/kubernetes/pkg/storage/etcd.(*etcdHelper).bodyAndExtractObj
     22210 0.029%  5.57%   20280901 26.14%  k8s.io/kubernetes/pkg/storage/etcd.(*etcdHelper).extractObj
     26964 0.035%  5.60%   19947641 25.71%  k8s.io/kubernetes/pkg/runtime/serializer/recognizer.(*decoder).Decode
         0     0%  5.60%   19898074 25.65%  k8s.io/kubernetes/pkg/storage/etcd.(*etcdHelper).Get
         0     0%  5.60%   19840076 25.57%  k8s.io/kubernetes/pkg/registry/generic/registry.(*Store).Get
         0     0%  5.60%   19572675 25.23%  github.com/ugorji/go/codec.(*decFnInfo).kSlice
         0     0%  5.60%   19179545 24.72%  k8s.io/kubernetes/pkg/storage.(*Cacher).Get
     21270 0.027%  5.63%   17591168 22.67%  github.com/openshift/origin/pkg/authorization/rulevalidation.(*DefaultRuleResolver).GetEffectivePolicyRules
         0     0%  5.63%   17293989 22.29%  github.com/openshift/origin/pkg/authorization/registry/clusterpolicy.(*storage).GetClusterPolicy

deads2k · 2016-06-16T11:31:29Z

@smarterclayton How about holding on the re-run until I've had time to plumb through shared caches?

smarterclayton · 2016-06-16T12:50:15Z

This seems like a straight-up regression, unless there's something I don't know about the policy caches. Is that not the case?

soltysh · 2016-06-16T12:54:01Z

/sub

deads2k · 2016-06-16T12:57:03Z

> This seems like a straight-up regression, unless there's something I don't know about the policy caches. Is that not the case?

There are a couple of spots to do with scoped tokens that bypass the cache, since we didn't have clean plumbing for them. Fixing that is near the top of my list. I also want to re-wire our existing caches to use the new cache style we have.

smarterclayton · 2016-06-16T13:35:28Z

Ok. Will keep this as a high priority so we don't lose sight. We should make sure performance knows not to start testing until we land this. @timothysc

timothysc · 2016-06-16T17:27:15Z

ack /cc @mffiedler

smarterclayton · 2016-06-23T20:48:30Z

mem.pdf

smarterclayton · 2016-06-23T20:51:45Z

goroutine 19490 [running]:
runtime/debug.Stack(0x0, 0x0, 0x0)
        /usr/local/Cellar/go/1.6.2/libexec/src/runtime/debug/stack.go:24 +0x80
runtime/debug.PrintStack()
        /usr/local/Cellar/go/1.6.2/libexec/src/runtime/debug/stack.go:16 +0x18
github.com/openshift/origin/pkg/authorization/registry/clusterpolicy.(*storage).GetClusterPolicy(0xc821a50b40, 0x82405d0, 0xc82cf3d980, 0x36f5140, 0x7, 0xc82cf3d980, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/registry/clusterpolicy/registry.go:78 +0x2e
github.com/openshift/origin/pkg/authorization/registry/clusterpolicy.ReadOnlyClusterPolicy.Get(0x768dfa8, 0xc821a50b40, 0x36f5140, 0x7, 0xc82d9fa8c0, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/registry/clusterpolicy/registry.go:130 +0x12a
github.com/openshift/origin/pkg/authorization/registry/clusterpolicy.(*ReadOnlyClusterPolicy).Get(0xc821a50e30, 0x36f5140, 0x7, 0x0, 0x0, 0x0)
        <autogenerated>:45 +0xb8
github.com/openshift/origin/pkg/authorization/rulevalidation.(*DefaultRuleResolver).GetRole(0xc821e61e40, 0x86880c8, 0xc82d51a900, 0x0, 0x0, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/rulevalidation/find_rules.go:79 +0x13f
github.com/openshift/origin/pkg/authorization/rulevalidation.(*DefaultRuleResolver).GetEffectivePolicyRules(0xc821e61e40, 0x82405d0, 0xc82e482630, 0x0, 0x0, 0x0, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/rulevalidation/find_rules.go:136 +0x539
github.com/openshift/origin/pkg/authorization/rulevalidation.ConfirmNoEscalation(0x82405d0, 0xc82e482630, 0x0, 0x0, 0x3752620, 0xb, 0xc82d9bb390, 0xe, 0x82465b0, 0xc821e61e40, ...)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/rulevalidation/user_covers.go:21 +0xb7
github.com/openshift/origin/pkg/authorization/registry/rolebinding/policybased.(*VirtualStorage).confirmNoEscalation(0xc821e61e80, 0x82405d0, 0xc82e482630, 0xc82ea36300, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/registry/rolebinding/policybased/virtual_storage.go:221 +0x400
github.com/openshift/origin/pkg/authorization/registry/rolebinding/policybased.(*VirtualStorage).updateRoleBinding(0xc821e61e80, 0x82405d0, 0xc82e482630, 0x72c1a10, 0xc82ea36300, 0x1000, 0x1000, 0xc82236e000, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/registry/rolebinding/policybased/virtual_storage.go:179 +0x4ad
github.com/openshift/origin/pkg/authorization/registry/rolebinding/policybased.(*VirtualStorage).Update(0xc821e61e80, 0x82405d0, 0xc82e482630, 0x72c1a10, 0xc82ea36300, 0x0, 0x0, 0xac3e78, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/registry/rolebinding/policybased/virtual_storage.go:154 +0x82
github.com/openshift/origin/pkg/authorization/registry/clusterrolebinding/proxy.(*ClusterRoleBindingStorage).Update(0xc821e61e80, 0x82405d0, 0xc82e482630, 0x72c1c90, 0xc82ea36180, 0x0, 0x0, 0xc828808ec8, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/registry/clusterrolebinding/proxy/proxy.go:89 +0xf5
k8s.io/kubernetes/pkg/apiserver.UpdateResource.func1.1(0x0, 0x0, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/Godeps/_workspace/src/k8s.io/kubernetes/pkg/apiserver/resthandler.go:682 +0xac
k8s.io/kubernetes/pkg/apiserver.finishRequest.func1(0xc827b75c20, 0xc82cc42f80, 0xc827b75bc0, 0xc827b75b60)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/Godeps/_workspace/src/k8s.io/kubernetes/pkg/apiserver/resthandler.go:918 +0xd9
created by k8s.io/kubernetes/pkg/apiserver.finishRequest
        /Users/clayton/projects/origin/src/github.com/openshift/origin/Godeps/_workspace/src/k8s.io/kubernetes/pkg/apiserver/resthandler.go:923 +0xf1

smarterclayton · 2016-06-23T20:52:02Z

This is @ 7c4fc5a

smarterclayton · 2016-06-23T20:58:19Z

goroutine 3091 [running]:
runtime/debug.Stack(0x0, 0x0, 0x0)
        /usr/local/Cellar/go/1.6.2/libexec/src/runtime/debug/stack.go:24 +0x80
runtime/debug.PrintStack()
        /usr/local/Cellar/go/1.6.2/libexec/src/runtime/debug/stack.go:16 +0x18
github.com/openshift/origin/pkg/authorization/registry/clusterpolicy.(*storage).GetClusterPolicy(0xc821a50b40, 0x82405d0, 0xc826ee25a0, 0x36f5140, 0x7, 0xc826ee25a0, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/registry/clusterpolicy/registry.go:78 +0x2e
github.com/openshift/origin/pkg/authorization/registry/clusterpolicy.ReadOnlyClusterPolicy.Get(0x768dfa8, 0xc821a50b40, 0x36f5140, 0x7, 0xc827082910, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/registry/clusterpolicy/registry.go:130 +0x12a
github.com/openshift/origin/pkg/authorization/registry/clusterpolicy.(*ReadOnlyClusterPolicy).Get(0xc821a50e30, 0x36f5140, 0x7, 0x0, 0x0, 0x0)
        <autogenerated>:45 +0xb8
github.com/openshift/origin/pkg/authorization/rulevalidation.(*DefaultRuleResolver).GetRole(0xc821e61e40, 0x86880c8, 0xc823b14480, 0x0, 0x0, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/rulevalidation/find_rules.go:79 +0x13f
github.com/openshift/origin/pkg/authorization/rulevalidation.(*DefaultRuleResolver).GetEffectivePolicyRules(0xc821e61e40, 0x82405d0, 0xc824c089c0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/rulevalidation/find_rules.go:136 +0x539
github.com/openshift/origin/pkg/authorization/rulevalidation.ConfirmNoEscalation(0x82405d0, 0xc824c089c0, 0x0, 0x0, 0x3752620, 0xb, 0xc8255cc600, 0x1c, 0x82465b0, 0xc821e61e40, ...)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/rulevalidation/user_covers.go:21 +0xb7
github.com/openshift/origin/pkg/authorization/registry/rolebinding/policybased.(*VirtualStorage).confirmNoEscalation(0xc821e61e80, 0x82405d0, 0xc824c089c0, 0xc824df6d80, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/registry/rolebinding/policybased/virtual_storage.go:221 +0x400
github.com/openshift/origin/pkg/authorization/registry/rolebinding/policybased.(*VirtualStorage).createRoleBinding(0xc821e61e80, 0x82405d0, 0xc824c089c0, 0x72c1a10, 0xc824df6d80, 0x82e7b00, 0x12f78, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/registry/rolebinding/policybased/virtual_storage.go:127 +0x1ba
github.com/openshift/origin/pkg/authorization/registry/rolebinding/policybased.(*VirtualStorage).Create(0xc821e61e80, 0x82405d0, 0xc824c089c0, 0x72c1a10, 0xc824df6d80, 0x0, 0x0, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/registry/rolebinding/policybased/virtual_storage.go:109 +0x7c
github.com/openshift/origin/pkg/authorization/registry/clusterrolebinding/proxy.(*ClusterRoleBindingStorage).Create(0xc821e61e80, 0x82405d0, 0xc824c089c0, 0x72c1c90, 0xc824df6000, 0x0, 0x0, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/pkg/authorization/registry/clusterrolebinding/proxy/proxy.go:77 +0xef
k8s.io/kubernetes/pkg/apiserver.(*namedCreaterAdapter).Create(0xc821440190, 0x82405d0, 0xc824c089c0, 0x0, 0x0, 0x72c1c90, 0xc824df6000, 0x0, 0x0, 0x0, ...)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/Godeps/_workspace/src/k8s.io/kubernetes/pkg/apiserver/resthandler.go:439 +0x93
k8s.io/kubernetes/pkg/apiserver.createHandler.func1.1(0x0, 0x0, 0x0, 0x0)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/Godeps/_workspace/src/k8s.io/kubernetes/pkg/apiserver/resthandler.go:402 +0xc8
k8s.io/kubernetes/pkg/apiserver.finishRequest.func1(0xc820f728a0, 0xc82787ceb0, 0xc820f72840, 0xc820f72720)
        /Users/clayton/projects/origin/src/github.com/openshift/origin/Godeps/_workspace/src/k8s.io/kubernetes/pkg/apiserver/resthandler.go:918 +0xd9
created by k8s.io/kubernetes/pkg/apiserver.finishRequest
        /Users/clayton/projects/origin/src/github.com/openshift/origin/Godeps/_workspace/src/k8s.io/kubernetes/pkg/apiserver/resthandler.go:923 +0xf1

This trace, which I think is the same path, is the bulk of them. It looks like we make ~1-3 of these calls per mutation call.

deads2k · 2016-06-24T11:53:50Z

> This trace, which I think is the same path, is the bulk of them. It looks like we make ~1-3 of these calls per mutation call.

How easy/hard is it to get the comparison mem chart for 1.2? Do you have a script?

Given that it's the GetEffectivePolicyRules call, I'd suspect this copy: https://github.com/openshift/origin/blob/master/pkg/authorization/rulevalidation/find_rules.go#L130-L145, but it's been around forever. I can refactor it to return [][]PolicyRule and avoid copying all the rules on every request.

smarterclayton · 2016-06-24T14:20:18Z

I think the command has worked since 1.0.

deads2k · 2016-07-20T11:33:12Z

Clayton thinks he's got this.

smarterclayton · 2016-07-25T17:43:03Z

Almost resolved with #9814

deads2k · 2016-07-25T18:23:56Z

> Almost resolved with #9814

Wasn't that pre-existing?

smarterclayton · 2016-07-25T18:43:43Z

The fix in there fixes conversions.

smarterclayton added the area/performance label Jun 16, 2016

smarterclayton assigned deads2k Jun 16, 2016

danmcp added component/auth priority/P2 labels Jun 16, 2016

smarterclayton added priority/P1 and removed priority/P2 labels Jun 16, 2016

smarterclayton changed the title ~~The good news is that I have performance numbers~~ Significant regression in API server due to policy code path Jun 16, 2016

deads2k mentioned this issue Jun 20, 2016

Collapse authorization onto SharedInformer caches #9442

Merged

smarterclayton mentioned this issue Jul 14, 2016

Avoid allocations while checking role bindings #9836

Merged

deads2k assigned smarterclayton and unassigned deads2k Jul 20, 2016

deads2k closed this as completed Jul 25, 2016

deads2k reopened this Jul 25, 2016

smarterclayton closed this as completed Jul 25, 2016
