Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clair high cpu load 24/7 #415

Closed
zetaab opened this issue Jun 12, 2017 · 9 comments
Closed

clair high cpu load 24/7 #415

zetaab opened this issue Jun 12, 2017 · 9 comments

Comments

@zetaab
Copy link

zetaab commented Jun 12, 2017

image

this might be related to #414 but I am wondering why clair cpu usage is all what it can get all the time? Is it really doing something 24/7 if not new images are uploaded?

It might be that 0.2cpu is too low for clair anyway.

@jzelinskie
Copy link
Contributor

Did you recently upgrade from 1.x to 2.x? How are you using the API? Quay Enterprise?

@zetaab
Copy link
Author

zetaab commented Jun 12, 2017

I started from clean 2.0.0. I have coded custom integration which is located between docker registry and clair. I am calling clair using these API specs https://coreos.com/clair/docs/latest/api_v1.html

@jzelinskie
Copy link
Contributor

The update worker might just keep trying to update then failing and repeating due to the error you posted in #414. Is the CPU usage the same if there are no API requests at all?

@ErikThoreson
Copy link
Contributor

ErikThoreson commented Jun 12, 2017

This is exactly what is happening to me. Typically the ubuntu update or the NVD downloads often timeout which sends the updater into a never ending retry cycle.

Running the latest v2.0.0 docker container in a K8s setup

{"Event":"update finished","Level":"info","Location":"updater.go:198","Time":"2017-06-12 21:23:51.869203"}
{"Event":"attempting to obtain update lock","Level":"debug","Location":"updater.go:99","Time":"2017-06-12 21:23:51.875354"}
{"Event":"updating vulnerabilities","Level":"info","Location":"updater.go:167","Time":"2017-06-12 21:23:51.880889"}
{"Event":"fetching vulnerability updates","Level":"info","Location":"updater.go:213","Time":"2017-06-12 21:23:51.880952"}
{"Event":"Start fetching vulnerabilities","Level":"info","Location":"alpine.go:52","Time":"2017-06-12 21:23:51.881073","package":"Alpine"}
{"Event":"Start fetching vulnerabilities","Level":"info","Location":"oracle.go:119","Time":"2017-06-12 21:23:51.881114","package":"Oracle Linux"}
{"Event":"Start fetching vulnerabilities","Level":"info","Location":"rhel.go:92","Time":"2017-06-12 21:23:51.881170","package":"RHEL"}
{"Event":"Start fetching vulnerabilities","Level":"info","Location":"ubuntu.go:88","Time":"2017-06-12 21:23:51.881203","package":"Ubuntu"}
{"Event":"Start fetching vulnerabilities","Level":"info","Location":"debian.go:63","Time":"2017-06-12 21:23:51.881252","package":"Debian"}
{"Event":"could not pull Ubuntu repository","Level":"error","Location":"ubuntu.go:189","Time":"2017-06-12 21:23:52.653941","error":"exit status 3","output":"bzr: ERROR: Not a branch: \"/tmp/ubuntu-cve-tracker374160068/.bzr/branch/\": location is a repository.\n"}
{"Event":"an error occured when fetching update","Level":"error","Location":"updater.go:220","Time":"2017-06-12 21:23:52.654043","error":"could not download requested resource","updater name":"ubuntu"}
{"Event":"no update","Level":"debug","Location":"rhel.go:149","Time":"2017-06-12 21:23:53.001010","package":"Red Hat"}
{"Event":"finished fetching","Level":"info","Location":"updater.go:227","Time":"2017-06-12 21:23:53.001099","updater name":"rhel"}
{"Event":"no update","Level":"debug","Location":"alpine.go:74","Time":"2017-06-12 21:23:53.351490","package":"alpine"}
{"Event":"finished fetching","Level":"info","Location":"updater.go:227","Time":"2017-06-12 21:23:53.351597","updater name":"alpine"}
{"Event":"no update","Level":"debug","Location":"oracle.go:178","Time":"2017-06-12 21:23:53.930796","package":"Oracle Linux"}
{"Event":"finished fetching","Level":"info","Location":"updater.go:227","Time":"2017-06-12 21:23:53.930897","updater name":"oracle"}
{"Event":"no update","Level":"debug","Location":"debian.go:116","Time":"2017-06-12 21:24:32.456501","package":"Debian"}
{"Event":"finished fetching","Level":"info","Location":"updater.go:227","Time":"2017-06-12 21:24:32.456602","updater name":"debian"}
{"Event":"adding metadata to vulnerabilities","Level":"info","Location":"updater.go:253","Time":"2017-06-12 21:24:32.456639"}
{"Event":"could not decode NVD data feed","Level":"error","Location":"nvd.go:93","Time":"2017-06-12 21:24:33.936441","data feed name":"2016","error":"EOF"}
{"Event":"an error occured when loading metadata fetcher","Level":"error","Location":"updater.go:274","Time":"2017-06-12 21:24:33.936539","appender name":"NVD","error":"updater/fetchers: could not parse"}

Example of incomplete NVD downloads.

/go/src/github.com/coreos/clair # cd /tmp/
/tmp # ls -al
total 81044
drwxrwxrwt    1 root     root          4096 Jun 12 21:24 .
drwxr-xr-x    1 root     root            90 Jun 12 20:50 ..
drwx------    7 root     root           102 Jun 12 20:50 alpine-secdb708459664
drwx------    2 root     root             6 Jun 12 20:59 nvd-data260805698
-rw-r--r--    1 root     root             0 Jun 12 20:59 nvd-data2608056982002.xml
-rw-r--r--    1 root     root             0 Jun 12 20:59 nvd-data2608056982003.xml
-rw-r--r--    1 root     root             0 Jun 12 20:59 nvd-data2608056982004.xml
-rw-r--r--    1 root     root             0 Jun 12 20:59 nvd-data2608056982005.xml
-rw-r--r--    1 root     root             0 Jun 12 20:59 nvd-data2608056982006.xml
-rw-r--r--    1 root     root             0 Jun 12 20:59 nvd-data2608056982007.xml
-rw-r--r--    1 root     root             0 Jun 12 20:59 nvd-data2608056982008.xml
-rw-r--r--    1 root     root             0 Jun 12 20:59 nvd-data2608056982009.xml
-rw-r--r--    1 root     root             0 Jun 12 20:59 nvd-data2608056982010.xml
-rw-r--r--    1 root     root             0 Jun 12 20:59 nvd-data2608056982011.xml
-rw-r--r--    1 root     root             0 Jun 12 20:59 nvd-data2608056982012.xml
-rw-r--r--    1 root     root      45423389 Jun 12 20:59 nvd-data2608056982013.xml
-rw-r--r--    1 root     root             0 Jun 12 20:59 nvd-data2608056982014.xml
-rw-r--r--    1 root     root      32840022 Jun 12 20:59 nvd-data2608056982015.xml
-rw-r--r--    1 root     root             0 Jun 12 20:59 nvd-data2608056982016.xml
-rw-r--r--    1 root     root       4716287 Jun 12 20:59 nvd-data2608056982017.xml
drwx------    3 root     root            18 Jun 12 21:24 ubuntu-cve-tracker147402067

@ErikThoreson
Copy link
Contributor

its possible mine is related to this.

#323 (comment)

I'm playing with my cpu and mem requests and limits. Will post if i can find a sweet spot.

@zetaab
Copy link
Author

zetaab commented Jun 13, 2017

image
last 6 hours with higher cpu limit. It is doing quite much tasks there? Our registry is not that active.

@ErikThoreson
Copy link
Contributor

ErikThoreson commented Jun 13, 2017

changing the CPU and memory limits on the container allowed the ubuntu to finish finally and NVD got much further but still failed.

        resources:
          limits:
            cpu: 4000m
            memory: 4096Mi
          requests:
            cpu: 250m
            memory: 512Mi
{"Event":"running database migrations","Level":"info","Location":"pgsql.go:216","Time":"2017-06-13 02:38:28.663323"}
{"Event":"database migration ran successfully","Level":"info","Location":"pgsql.go:223","Time":"2017-06-13 02:38:28.696005"}
{"Event":"starting health API","Level":"info","Location":"api.go:85","Time":"2017-06-13 02:38:28.696449","port":6061}
{"Event":"notifier service is disabled","Level":"info","Location":"notifier.go:77","Time":"2017-06-13 02:38:28.696630"}
{"Event":"starting main API","Level":"info","Location":"api.go:52","Time":"2017-06-13 02:38:28.696702","port":6060}
{"Event":"updater service started","Level":"info","Location":"updater.go:80","Time":"2017-06-13 02:38:28.696702","lock identifier":"6a702c44-2484-4c70-8bab-4c6bf94d477e"}
{"Event":"attempting to obtain update lock","Level":"debug","Location":"updater.go:99","Time":"2017-06-13 02:38:28.701316"}
{"Event":"updating vulnerabilities","Level":"info","Location":"updater.go:167","Time":"2017-06-13 02:38:28.707629"}
{"Event":"fetching vulnerability updates","Level":"info","Location":"updater.go:213","Time":"2017-06-13 02:38:28.707671"}
{"Event":"Start fetching vulnerabilities","Level":"info","Location":"ubuntu.go:88","Time":"2017-06-13 02:38:28.707764","package":"Ubuntu"}
{"Event":"Start fetching vulnerabilities","Level":"info","Location":"debian.go:63","Time":"2017-06-13 02:38:28.707797","package":"Debian"}
{"Event":"Start fetching vulnerabilities","Level":"info","Location":"alpine.go:52","Time":"2017-06-13 02:38:28.707848","package":"Alpine"}
{"Event":"Start fetching vulnerabilities","Level":"info","Location":"rhel.go:92","Time":"2017-06-13 02:38:28.707943","package":"RHEL"}
{"Event":"Start fetching vulnerabilities","Level":"info","Location":"oracle.go:119","Time":"2017-06-13 02:38:28.707902","package":"Oracle Linux"}
{"Event":"no update","Level":"debug","Location":"rhel.go:149","Time":"2017-06-13 02:38:29.400731","package":"Red Hat"}
{"Event":"finished fetching","Level":"info","Location":"updater.go:227","Time":"2017-06-13 02:38:29.400826","updater name":"rhel"}
{"Event":"no update","Level":"debug","Location":"oracle.go:178","Time":"2017-06-13 02:38:36.579170","package":"Oracle Linux"}
{"Event":"finished fetching","Level":"info","Location":"updater.go:227","Time":"2017-06-13 02:38:36.579281","updater name":"oracle"}
{"Event":"no update","Level":"debug","Location":"alpine.go:74","Time":"2017-06-13 02:38:40.787443","package":"alpine"}
{"Event":"finished fetching","Level":"info","Location":"updater.go:227","Time":"2017-06-13 02:38:40.787547","updater name":"alpine"}
{"Event":"finished fetching","Level":"info","Location":"updater.go:227","Time":"2017-06-13 02:39:31.088258","updater name":"debian"}
{"Event":"finished fetching","Level":"info","Location":"updater.go:227","Time":"2017-06-13 06:50:44.800606","updater name":"ubuntu"}
{"Event":"adding metadata to vulnerabilities","Level":"info","Location":"updater.go:253","Time":"2017-06-13 06:50:45.184249"}
{"Event":"could not decode NVD data feed","Level":"error","Location":"nvd.go:93","Time":"2017-06-13 06:51:16.814940","data feed name":"2008","error":"read tcp 192.168.5.29:58424-\u003e129.6.13.177:443: read: connection reset by peer"}
{"Event":"an error occured when loading metadata fetcher","Level":"error","Location":"updater.go:274","Time":"2017-06-13 06:51:16.815053","appender name":"NVD","error":"updater/fetchers: could not parse"}
{"Event":"inserting vulnerabilities for update","Level":"debug","Location":"updater.go:173","Time":"2017-06-13 06:51:16.815108","count":111934}
{"Event":"fetcher note","Level":"warning","Location":"updater.go:189","Time":"2017-06-13 07:13:13.748680","note":"Ubuntu precise/esm is not mapped to any version number (eg. trusty-\u003e14.04). Please update me."}
{"Event":"update finished","Level":"info","Location":"updater.go:198","Time":"2017-06-13 07:13:13.752952"}
{"Event":"next update attempt scheduled","Level":"debug","Location":"updater.go:143","Time":"2017-06-13 07:13:13.760320","scheduled time":"2017-06-14T07:13:14Z"}
drwxrwxrwt    1 root     root          4096 Jun 13 06:50 .
drwxr-xr-x    1 root     root            90 Jun 13 02:38 ..
drwx------    7 root     root           102 Jun 13 02:38 alpine-secdb729155134
drwx------    2 root     root             6 Jun 13 06:50 nvd-data169518725
-rw-r--r--    1 root     root      19575401 Jun 13 06:50 nvd-data1695187252002.xml
-rw-r--r--    1 root     root       5713679 Jun 13 06:50 nvd-data1695187252003.xml
-rw-r--r--    1 root     root      11833655 Jun 13 06:51 nvd-data1695187252004.xml
-rw-r--r--    1 root     root      18237722 Jun 13 06:51 nvd-data1695187252005.xml
-rw-r--r--    1 root     root      28329687 Jun 13 06:50 nvd-data1695187252006.xml
-rw-r--r--    1 root     root      25801638 Jun 13 06:51 nvd-data1695187252007.xml
-rw-r--r--    1 root     root       2718733 Jun 13 06:51 nvd-data1695187252008.xml
-rw-r--r--    1 root     root      31871257 Jun 13 06:51 nvd-data1695187252009.xml
-rw-r--r--    1 root     root      47065101 Jun 13 06:51 nvd-data1695187252010.xml
-rw-r--r--    1 root     root             0 Jun 13 06:50 nvd-data1695187252011.xml
-rw-r--r--    1 root     root             0 Jun 13 06:50 nvd-data1695187252012.xml
-rw-r--r--    1 root     root      45423389 Jun 13 06:51 nvd-data1695187252013.xml
-rw-r--r--    1 root     root             0 Jun 13 06:50 nvd-data1695187252014.xml
-rw-r--r--    1 root     root             0 Jun 13 06:50 nvd-data1695187252015.xml
-rw-r--r--    1 root     root      37865174 Jun 13 06:51 nvd-data1695187252016.xml
-rw-r--r--    1 root     root      11773684 Jun 13 06:51 nvd-data1695187252017.xml
drwx------   10 root     root          4096 Jun 13 02:50 ubuntu-cve-tracker154171291

@zetaab
Copy link
Author

zetaab commented Jun 13, 2017

how I can disable this ubuntu thing? We do not have any ubuntu based images.

@jzelinskie
Copy link
Contributor

jzelinskie commented Jun 28, 2017

You'd have to recompile Clair without the import for the ubuntu extension. I'd recommend building off of one of the stable release branches (release-1.x or release-2.x).

I've also created a PR for the loop eating the CPU.

KeyboardNerd pushed a commit to KeyboardNerd/clair that referenced this issue Feb 2, 2018
This change makes the updater sleep for a small portion of time before
it continues to try and obtain a lock for running the updates. This will
prevent the CPU from being pinned in the case where an error is
consistently failing an update.

Fixes quay#415.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants