Add a metric to track usage of inflight request limit. #58342
Conversation
)
currentMutatingInflightRequests = prometheus.NewGauge(
	prometheus.GaugeOpts{
		Name: "apiserver_current_mutating_inflight_requests",
Can mutating be a metric label?
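For context, a minimal sketch of what that alternative could look like: a single gauge vector keyed by a request-kind label instead of two separate gauges (metric and label names here are illustrative, not the PR's):

```go
package main

import "github.com/prometheus/client_golang/prometheus"

// One gauge vector with a label instead of separate mutating/non-mutating gauges.
var currentInflightRequests = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "apiserver_current_inflight_requests",
		Help: "Watermark of currently used inflight request limit, per request kind.",
	},
	[]string{"requestKind"},
)

func main() {
	prometheus.MustRegister(currentInflightRequests)
	currentInflightRequests.WithLabelValues("mutating").Set(3)
	currentInflightRequests.WithLabelValues("readOnly").Set(7)
}
```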
#58340 (comment) - It might be cleaner, but I'm not sure we wouldn't be paying for that with a bit of lock contention.
SG, thanks! Maybe add a comment here to avoid further questions?
523a30a to 4205590
@@ -40,6 +50,40 @@ func handleError(w http.ResponseWriter, r *http.Request, err error) {
	glog.Errorf(err.Error())
}

// requestWatermark is used to track maximal usage of inflight requests.
type requestWatermark struct {
	sync.Mutex
I think we have a convention not to embed sync.Mutex directly. So let's change it to:
lock sync.Mutex
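For what it's worth, a small sketch of the difference (type names here are made up): embedding promotes Lock/Unlock onto the type's public method set, while a named field keeps locking an internal detail:

```go
package main

import "sync"

// Embedding: Lock/Unlock become part of the type's exported method set.
type embeddedWatermark struct {
	sync.Mutex
	readOnlyWatermark int
}

// Named field: only code in this package can touch the lock.
type fieldWatermark struct {
	lock              sync.Mutex
	readOnlyWatermark int
}

func main() {
	e := &embeddedWatermark{}
	e.Lock() // promoted method, callable from outside the package
	e.Unlock()

	f := &fieldWatermark{}
	f.lock.Lock()
	f.lock.Unlock()
}
```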
Hmm... I thought that at some point we had a convention that said we should do it for simple things like this.
Done.
Hmm, maybe I'm not aware of something...
But in any case, I think what you have now is definitely good too.
/approve no-issue
/lgtm
@@ -93,6 +138,7 @@ func WithMaxInFlightLimit(
	select {
	case c <- true:
		defer func() { <-c }()
		watermark.record(len(nonMutatingChan), len(mutatingChan))
Do we really want to add a blocking lock here?
Fair point. I updated the PR. Take a look and see if you like it better.
@@ -92,7 +137,12 @@ func WithMaxInFlightLimit(

	select {
	case c <- true:
		defer func() { <-c }()
		nonMutatingLen := len(nonMutatingChan)
We should probably only record the length of the channel we're using for this request.
Done.

func (w *requestWatermark) record(readOnlyVal, mutatingVal int) {
	w.lock.Lock()
	defer w.lock.Unlock()
This is still a blocking lock. It might be a good use case for non-locking updates (we'd want to benchmark with contention to be sure):
oldVal := atomic.LoadInt64(&w.readOnlyWatermark)
for oldVal < newVal && !atomic.CompareAndSwapInt64(&w.readOnlyWatermark, oldVal, newVal) {
	oldVal = atomic.LoadInt64(&w.readOnlyWatermark)
}
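Spelled out as a self-contained sketch, assuming the watermark fields are changed to int64 (which sync/atomic requires); type and method names here are illustrative:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type requestWatermark struct {
	readOnlyWatermark int64
	mutatingWatermark int64
}

// recordReadOnly raises the read-only watermark to newVal if it is higher,
// retrying the compare-and-swap if another goroutine races with us.
func (w *requestWatermark) recordReadOnly(newVal int64) {
	oldVal := atomic.LoadInt64(&w.readOnlyWatermark)
	for oldVal < newVal && !atomic.CompareAndSwapInt64(&w.readOnlyWatermark, oldVal, newVal) {
		oldVal = atomic.LoadInt64(&w.readOnlyWatermark)
	}
}

func main() {
	w := &requestWatermark{}
	w.recordReadOnly(5)
	w.recordReadOnly(3)                                 // a lower value never overwrites the watermark
	fmt.Println(atomic.LoadInt64(&w.readOnlyWatermark)) // 5
}
```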
My feeling is that 3 atomic operations are worse than a lock. I agree a benchmark would be needed to be sure, but even that may change in future versions of Go, which is why I'd rather proceed with the current version. Locking is now done after processing the request and releasing the "inflight" token, which should prevent this from impacting the critical path in the apiserver. Does that sound good to you?
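Roughly, a simplified sketch of that shape (not the PR's exact code; withMaxInFlightLimit and record here are stand-ins): the channel length is snapshotted while the slot is held, and the watermark is recorded only after the handler has run and the slot has been released, so contention inside record never delays freeing inflight capacity:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

func withMaxInFlightLimit(handler http.Handler, limit int, record func(inflight int)) http.Handler {
	c := make(chan bool, limit)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case c <- true:
			inflight := len(c) // snapshot while we hold the slot
			defer func() {
				<-c              // release the inflight slot first...
				record(inflight) // ...then update the watermark
			}()
			handler.ServeHTTP(w, r)
		default:
			http.Error(w, "too many requests", http.StatusTooManyRequests)
		}
	})
}

func main() {
	h := withMaxInFlightLimit(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	}), 2, func(inflight int) { fmt.Println("inflight sample:", inflight) })

	srv := httptest.NewServer(h)
	defer srv.Close()
	if _, err := http.Get(srv.URL); err != nil {
		panic(err)
	}
}
```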
> My feeling is that 3 atomic operations are worse than a lock.

That would only happen on requests that bumped the high-water mark... the majority would be a single atomic read.
> Which is why I'd want to proceed with the current version. Locking is now done after processing the request and releasing the "inflight" token, which should prevent this from impacting the critical path in the apiserver.

I don't know the HTTP stack well enough to know whether this would still block request completion. It seems bad to contend on a single write lock in the handler path for all requests.
Would moving this to a separate goroutine fix that?
> That would only happen on requests that bumped the high-water mark... the majority would be a single atomic read.

I missed that this gets reset when the metric is reported. Yeah... this approach wouldn't be great.
wait.Forever(func() {
	watermark.lock.Lock()
	defer watermark.lock.Unlock()
	metrics.UpdateInflightRequestMetrics(watermark.readOnlyWatermark, watermark.mutatingWatermark)
We're holding a lock that hangs all requests at this point... either read into local vars or use atomic.LoadInt64/atomic.StoreInt64 to avoid a critical section that has callouts to Prometheus here.
I'd need to both atomically read and write those values. I'm not sure it's worth it.
Just read them locally to keep the critical section small before calling Prometheus:
1. lock
2. read to local vars
3. reset
4. unlock
5. pass local vars to Prometheus
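A minimal sketch of that pattern, reusing the field names from the diff above; the package name, retrieveAndReset, and startWatermarkReporting are hypothetical:

```go
package filters

import (
	"sync"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

type requestWatermark struct {
	lock              sync.Mutex
	readOnlyWatermark int
	mutatingWatermark int
}

// retrieveAndReset copies the current watermarks and resets them under the
// lock, so the call out to Prometheus happens outside the critical section.
func (w *requestWatermark) retrieveAndReset() (readOnly, mutating int) {
	w.lock.Lock()
	defer w.lock.Unlock()
	readOnly, mutating = w.readOnlyWatermark, w.mutatingWatermark
	w.readOnlyWatermark, w.mutatingWatermark = 0, 0
	return readOnly, mutating
}

func startWatermarkReporting(w *requestWatermark, report func(readOnly, mutating int)) {
	go wait.Forever(func() {
		readOnly, mutating := w.retrieveAndReset()
		report(readOnly, mutating) // e.g. metrics.UpdateInflightRequestMetrics
	}, time.Second)
}
```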
Yup, I was going to do this yesterday, but life attacked me before I was able to.
f6c82c9 to 6be50a5
Addressed all comments. @liggitt PTAL.
/retest
	}
}

var watermark requestWatermark
Since sync.Mutex is not supposed to be copied, passing this to anything would result in a copy of the lock. It would be safer as var watermark = &requestWatermark{}.
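A small illustration of the concern (report is a made-up function): copying a struct that contains a sync.Mutex copies the lock state, and go vet's copylocks check flags it, whereas a shared pointer avoids the copy entirely:

```go
package main

import "sync"

type requestWatermark struct {
	lock              sync.Mutex
	readOnlyWatermark int
}

var watermark = &requestWatermark{} // shared pointer; the lock itself is never copied

// func report(w requestWatermark) int  // by value: `go vet` warns "passes lock by value"
func report(w *requestWatermark) int { // by pointer: safe
	w.lock.Lock()
	defer w.lock.Unlock()
	return w.readOnlyWatermark
}

func main() {
	_ = report(watermark)
}
```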
Done.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: gmarek, liggitt, wojtek-t
Associated issue requirement bypassed by: wojtek-t
The full list of commands accepted by this bot can be found here.
/test all [submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue (batch tested with PRs 55792, 58342). If you want to cherry-pick this change to another branch, please follow the instructions here.
	prometheus.MustRegister(currentInflightRequests)
}

func UpdateInflightRequestMetrics(nonmutating, mutating int) {
This metric seems like it belongs in the filter package, not here. We shouldn't have to register metrics in this package. This is a weird coupling.
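A sketch of the decoupling being suggested, as I read it: the filter package would define and register the gauges it updates itself, instead of the metrics package exposing UpdateInflightRequestMetrics for it (the package layout and help strings here are assumptions):

```go
package filters

import "github.com/prometheus/client_golang/prometheus"

var (
	currentNonMutatingInflight = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "apiserver_current_inflight_requests",
		Help: "Watermark of non-mutating inflight requests over the last second.",
	})
	currentMutatingInflight = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "apiserver_current_mutating_inflight_requests",
		Help: "Watermark of mutating inflight requests over the last second.",
	})
)

func init() {
	prometheus.MustRegister(currentNonMutatingInflight, currentMutatingInflight)
}

// updateInflightRequestMetrics stays private to the filter package.
func updateInflightRequestMetrics(nonMutating, mutating int) {
	currentNonMutatingInflight.Set(float64(nonMutating))
	currentMutatingInflight.Set(float64(mutating))
}
```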
…-#58340-#58342-upstream-release-1.8
Automatic merge from submit-queue. Automated cherry pick of #58340: Add apiserver metric for number of requests dropped by #58342: Add a metric to track usage of inflight request limit. Cherry pick of #58340 #58342 on release-1.8.
```release-note
Add apiserver metric for current inflight-request usage and number of requests dropped because of inflight limit.
```
…58342-upstream-release-1.9
Automatic merge from submit-queue. Automated cherry pick of #58340: Add apiserver metric for number of requests dropped by #58342: Add a metric to track usage of inflight request limit. Cherry pick of #58340 #58342 on release-1.9.
```release-note
Add apiserver metric for current inflight-request usage and number of requests dropped because of inflight limit.
```
This one is tricky. The goal is to know how 'loaded' a given apiserver is before we start dropping load, so we need to somehow expose the 'fullness' of the inflight channels.
Sadly, this value is pretty volatile, so it's not clear how to expose it correctly. I decided to do pre-aggregation to smooth the metric a bit: in the current implementation the metric publishes the maximum 'usage' of the inflight limit over the previous second.
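A toy illustration of why the pre-aggregation helps (the numbers are made up): sampling only the instantaneous inflight count at scrape time can miss short spikes entirely, while publishing the per-second maximum captures them:

```go
package main

import "fmt"

func main() {
	// Inflight counts observed at request time during one second.
	observed := []int{3, 5, 42, 4, 2} // a brief spike to 42
	watermark := 0
	for _, v := range observed {
		if v > watermark {
			watermark = v
		}
	}
	fmt.Println("instantaneous sample at scrape time:", observed[len(observed)-1]) // 2
	fmt.Println("published per-second watermark:", watermark)                      // 42
}
```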
If you have any ideas please share.
@smarterclayton @lavalamp @wojtek-t @liggitt @deads2k @caesarxuchao @sttts @crassirostris @hulkholden