
High latency with resource="imagestreamimports" and verb="POST" #21508

Closed
Reamer opened this issue Nov 16, 2018 · 22 comments
Labels
component/image lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@Reamer

Reamer commented Nov 16, 2018

With the default monitoring from cluster-monitoring-operator, the KubeAPILatencyHigh alert periodically reports high latency on the API server.
The "imagestreamimports" resource is hit periodically because I have image streams with "importPolicy.scheduled" = true.

Version
oc v3.11.0+0cbc58b
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://s-openshift.mycompany.com:443
openshift v3.11.0+6c2b013-59
kubernetes v1.11.0+d4cacc0
Steps To Reproduce
  1. Install OpenShift 3.11 with Ansible
  2. Verify that openshift-monitoring is installed with the default configuration
  3. Create some image streams with importPolicy.scheduled = true (roughly as in the example below)
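
For illustration, an image stream that triggers these scheduled imports looks roughly like this (stream name, namespace, and upstream image are placeholders):

apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  name: example                            # placeholder stream name
  namespace: my-project                    # placeholder namespace
spec:
  tags:
  - name: latest
    from:
      kind: DockerImage
      name: docker.io/library/nginx:latest # placeholder remote image
    importPolicy:
      scheduled: true                      # re-import this tag periodically
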
Current Result

API call latency is quite high, 1-4 seconds.
[Screenshot: Prometheus graph of API request latency]

Expected Result

Fast API calls with low latency.

Additional Info

Possibly related to #14264.
Noticed these messages in the API server log.

I1116 10:15:42.621803       1 trace.go:76] Trace[2046733054]: "Create /apis/image.openshift.io/v1/namespaces/my-project/imagestreamimports" (started: 2018-11-16 10:15:41.486368579 +0000 UTC m=+80018.770131016) (total time: 1.135397696s):
Trace[2046733054]: [1.13489078s] [1.134427655s] Object stored in database
I1116 10:19:28.517513       1 trace.go:76] Trace[1052075296]: "Create /apis/image.openshift.io/v1/namespaces/my-project/imagestreamimports" (started: 2018-11-16 10:19:26.48801308 +0000 UTC m=+80243.771775530) (total time: 2.029461307s):
Trace[1052075296]: [2.028804306s] [2.028362867s] Object stored in database
I1116 10:26:59.096277       1 trace.go:76] Trace[1356838264]: "Create /apis/image.openshift.io/v1/namespaces/my-project/imagestreamimports" (started: 2018-11-16 10:26:56.48712211 +0000 UTC m=+80693.770884485) (total time: 2.609073903s):
Trace[1356838264]: [2.608495955s] [2.607996341s] Object stored in database
I1116 10:34:28.377486       1 trace.go:76] Trace[1656023571]: "Create /apis/image.openshift.io/v1/namespaces/my-project/imagestreamimports" (started: 2018-11-16 10:34:26.485752391 +0000 UTC m=+81143.769514793) (total time: 1.891700389s):
Trace[1656023571]: [1.891163987s] [1.890827358s] Object stored in database
I1116 10:42:01.529790       1 trace.go:76] Trace[40799593]: "Create /apis/image.openshift.io/v1/namespaces/my-project/imagestreamimports" (started: 2018-11-16 10:41:56.677180492 +0000 UTC m=+81593.960942873) (total time: 4.852573087s):
Trace[40799593]: [4.851734298s] [4.851280253s] Object stored in database
I1116 10:45:42.642385       1 trace.go:76] Trace[556365896]: "Create /apis/image.openshift.io/v1/namespaces/my-project/imagestreamimports" (started: 2018-11-16 10:45:41.48695107 +0000 UTC m=+81818.770713475) (total time: 1.155388409s):
Trace[556365896]: [1.154500425s] [1.154029628s] Object stored in database
@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 14, 2019
@Reamer
Author

Reamer commented Feb 14, 2019

/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 14, 2019
@dmage
Member

dmage commented Feb 21, 2019

This API call goes out to remote registries, so in general we can't guarantee that it will be fast.

@StoneIsle

+1

@Reamer
Author

Reamer commented Mar 25, 2019

@dmage Can you explain a bit more?
Some thoughts:
With the image stream setting importPolicy.scheduled: true, we enable a periodic sync with remote image registries. I assume some kind of scheduler checks for this configuration flag. The scheduler should then ask the remote registry for updated information, and once all the information is in memory, it should push it to the API server in one request, which then updates the API objects.

@dmage
Member

dmage commented Mar 25, 2019

@Reamer we have the controller "openshift.io/image-import" that goes through all image streams that have "importPolicy.scheduled: true" and creates ImageStreamImport objects. The image API has a special handler for creating ImageStreamImport objects: it gets the list of remote registries from the ImageStreamImport object and from the image stream, fetches manifests from the remote registries, and writes information about new images back to the image stream.
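
Roughly sketched, the object that this controller POSTs for each scheduled stream looks something like the following (names are illustrative; the real controller includes every scheduled tag of the stream):

apiVersion: image.openshift.io/v1
kind: ImageStreamImport
metadata:
  name: example                            # same name as the image stream being refreshed
  namespace: my-project                    # placeholder namespace
spec:
  import: true                             # actually perform the import, not a dry run
  images:
  - from:
      kind: DockerImage
      name: docker.io/library/nginx:latest # placeholder remote image
    to:
      name: latest                         # tag on the image stream to update
    importPolicy:
      scheduled: true

The POST only returns after the manifests have been fetched from the remote registry, so the latency the alert measures is essentially the remote registry's response time.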

@Reamer
Author

Reamer commented Mar 25, 2019

I disabled the alert rule in the Alertmanager configuration because it's too noisy (see the sketch below). Maybe we should disable the alert rule for this specific API endpoint in general, because I think nearly everyone will hit this problem if they use Docker Hub as a remote registry.
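
One way to mute it at the Alertmanager level is a route that sends just these alerts to an empty receiver. A minimal sketch, assuming the alert carries the resource and verb labels from the underlying recording rule (adjust the matchers if your labels differ):

route:
  receiver: default                        # placeholder for the existing default receiver
  routes:
  - match:
      alertname: KubeAPILatencyHigh
      resource: imagestreamimports         # assumption: these labels are present on the alert
      verb: POST
    receiver: "null"
receivers:
- name: default                            # placeholder, keep your real receiver here
- name: "null"                             # no notification configs, so matching alerts are dropped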

@clcollins
Member

Just adding a voice here - the alarm began to trigger for us when we configured the image streams for scheduled updates for the official OpenShift images in the openshift namespace, on two different clusters.

@yocum137

yocum137 commented Jul 1, 2019

Just adding a voice here - the alarm began to trigger for us when we configured the image streams for scheduled updates for the official OpenShift images in the openshift namespace, on two different clusters.

Are you using registry.access.redhat.com or registry.redhat.io?

@nate-duke

@yocum137 I work on the same cluster @clcollins was referring to; we're not using either of those registries. The registries in our openshift namespace are mostly on docker.io or registry.centos.org.

We're OKD, not OCP.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 24, 2019
@Reamer
Author

Reamer commented Oct 25, 2019

/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 25, 2019
@trumbaut

We have this issue too:

  • Using OCP v3.11.146 (Kubernetes v1.11.0+d4cacc0).
  • After enabling importPolicy.scheduled: true for some imagestreams.
  • These imagestreams are pointing to registry.redhat.io.
  • They are correctly being synchronized.
  • They produce a KubeAPILatencyHigh alert in Prometheus (The API server has a 99th percentile latency of 7.826666666666666 seconds for POST imagestreamimports.)

I will contact Red Hat Support about how to handle this.

@trumbaut

For your reference: the same issue is being described at Red Hat Bugzilla - Bug 1670380 - Alertmanager triggers error when updating the image.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 18, 2020
@trumbaut

The issue should have been fixed in Red Hat OpenShift Container Platform 3.11.170, but I haven't tested it myself. Can someone confirm?

Related:

@nate-duke

nate-duke commented Feb 20, 2020

@trumbaut that's great that it's fixed downstream! Any idea where we could find that in openshift/origin? The last official release we have here is v3.11.0, from October 2018.

Did a bit of digging based on that "Port ... to 3.11" MR; maybe this is in the quay.io/openshift/origin-cluster-monitoring-operator images, possibly on the v3.11.0 tag, but I have a hard time finding anything concrete about the history of images on Quay. We could probably find a CI process log somewhere that connects that MR to an image being pushed to the origin-cluster-monitoring-operator image repository. I'm willing to test in our dev cluster if we can find at least a shred of evidence that the change is actually available outside of OCP.

@trumbaut

trumbaut commented Mar 4, 2020

@nate-duke :

Any idea where we could find that in openshift/origin? The last official release we have here is v3.11.0, from October 2018.

The changes can be found at

But no, I have no idea if new releases for OKD 3.11 will still be delivered. OKD 4 can be found at https://github.com/openshift/okd.

@nate-duke

But no, I have no idea if new releases for OKD 3.11 will still be delivered. OKD 4 can be found at https://github.com/openshift/okd.

We've since silenced this alert, but I did do the legwork of updating our images, so there's a chance we picked up these fixes if they were ever included in anything built for OKD 3.11.

I'm glad to see that development of OKD 4 is proceeding; I've been following that repo for the last few months. I think we'll likely wait until FCOS and support for it in OKD 4.0 have matured a bit more before we build a new cluster to actually try to run it.

@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 3, 2020
@openshift-ci-robot openshift-ci-robot added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Apr 3, 2020
@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
