High CPU usage with no TICK scripts #1950

Open
alex-phillips opened this issue Jun 4, 2018 · 2 comments

Comments

@alex-phillips

Our monitoring stack spans 2 availability zones, each with an instance (r4.large) running kapacitor. Both run kapacitor 24x7, but only one is ever the primary with our TICK scripts enabled, while the other runs with no tasks. We have automation that runs every 5 minutes to determine which instance is the primary; if the primary has failed or its instance was terminated, it adds and enables the TICK scripts on the secondary.

The issue we are experiencing is that the instance in one of our availability zones constantly has higher CPU usage than our other instance, regardless of whether it is running the TICK scripts or not. We've isolated the issue to kapacitor using an unacceptable amount of CPU, but cannot determine why. Debug logging outputs nothing other than lvl=debug msg="linking subscription for cluster" service=influxdb cluster=localhost cluster=localhost every minute. We have also tried terminating the instance and bringing up a new one in its place several times, with no change.

The 2 instances are nearly identical: the exact same AMI, the same version of kapacitor (1.5.0), and the same configuration files, which are shared via EFS. The only difference we can find is the availability zone they are in.

Below is a screenshot of our monitoring of the CPU usage on the 2 instances. The blue is our problematic machine and the green is the normal, primary machine. In this case, the green is running our TICK scripts while the blue has none enabled. The drop in usage for blue at around 15:31 is when we disabled kapacitor on the secondary machine. Once it is disabled, you can see the usage drops to mirror that of the primary.

[Screenshot: kapacitor]

Please let me know if you need any more information. We have tried everything and are unable to determine what is causing this high kapacitor usage.

@faskiri

faskiri commented Jun 6, 2018

Can you look at the kapacitor stats to see what it is up to? You can enable the stats in the kapacitor config and then publish them to influx for monitoring using:

stream
    |from()
        .database('_kapacitor')
        .retentionPolicy('autogen')
    |influxDBOut()

@pabdavis

pabdavis commented Aug 29, 2018

We have determined what was causing this high kapacitor CPU usage. It was actually high on both instances. Influxdb and kapacitor both run on each instance, as mentioned above. The influxdb data persisted through new instance creation in each availability zone; however, the /var/lib/kapacitor directory did not.

When kapacitor was started on a new instance, it would self-register an influxdb subscription for every database and retention policy (5 subscriptions in our configuration). Since there is no self-unregister, we accumulated more than 70 subscriptions in our influxdb.

Whether kapacitor was processing or rejecting the other subscriptions is not known, but kapacitor was working much more than it needed to be.
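For anyone checking for the same kind of build-up, here is a minimal Python sketch that counts subscriptions per database and retention policy. It assumes you already have the parsed JSON body that the InfluxDB 1.x /query endpoint returns for SHOW SUBSCRIPTIONS (one series per database, with columns retention_policy, name, mode, destinations). The function name count_subscriptions and the sample data are made up for illustration; they are not taken from the instances described above.

```python
from collections import Counter


def count_subscriptions(response):
    """Count subscriptions per (database, retention policy).

    `response` is assumed to be the parsed JSON body returned by the
    InfluxDB 1.x /query endpoint for the statement SHOW SUBSCRIPTIONS.
    """
    counts = Counter()
    for result in response.get("results", []):
        for series in result.get("series", []):
            database = series["name"]  # the series name is the database
            rp_index = series["columns"].index("retention_policy")
            for row in series.get("values", []):
                counts[(database, row[rp_index])] += 1
    return counts


# Hypothetical sample response; names and hosts are invented.
sample = {
    "results": [
        {
            "statement_id": 0,
            "series": [
                {
                    "name": "telegraf",
                    "columns": ["retention_policy", "name", "mode", "destinations"],
                    "values": [
                        ["autogen", "kapacitor-aaa", "ANY", ["http://host-a:9092"]],
                        ["autogen", "kapacitor-bbb", "ANY", ["http://host-b:9092"]],
                        ["autogen", "kapacitor-ccc", "ANY", ["http://host-c:9092"]],
                    ],
                },
                {
                    "name": "_internal",
                    "columns": ["retention_policy", "name", "mode", "destinations"],
                    "values": [
                        ["monitor", "kapacitor-aaa", "ANY", ["http://host-a:9092"]],
                    ],
                },
            ],
        }
    ]
}

counts = count_subscriptions(sample)
for (database, rp), n in sorted(counts.items()):
    print(f"{database}.{rp}: {n} subscription(s)")
```

More than one subscription on the same database and retention policy, with only one kapacitor actually running, would be a hint that old registrations were left behind.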

We stopped kapacitor, dropped all subscriptions from influxdb, relocated the /var/lib/kapacitor directory to persistent storage, then reconfigured and restarted kapacitor. Kapacitor self-registered only 5 subscriptions. We have since rebooted and created a new instance, and kapacitor has not registered additional subscriptions.
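The "dropped all subscriptions" step can be sketched roughly as below. This is only an illustration, not the exact commands that were run: it builds InfluxQL DROP SUBSCRIPTION statements from a list of (database, retention policy, subscription name) tuples, which you would then run yourself through the influx CLI or the HTTP API. The helper name drop_statements, the keep_names parameter, and the sample names are hypothetical, and the sketch assumes none of the identifiers contain double quotes.

```python
def drop_statements(subscriptions, keep_names=()):
    """Build a DROP SUBSCRIPTION statement for each subscription not kept.

    `subscriptions` is an iterable of (database, retention_policy, name)
    tuples. Identifiers are assumed to contain no double quotes.
    """
    keep = set(keep_names)
    statements = []
    for database, rp, name in subscriptions:
        if name in keep:
            continue  # leave this subscription in place
        statements.append(
            f'DROP SUBSCRIPTION "{name}" ON "{database}"."{rp}"'
        )
    return statements


# Hypothetical subscriptions; these names are invented for the example.
subs = [
    ("telegraf", "autogen", "kapacitor-old-1"),
    ("telegraf", "autogen", "kapacitor-old-2"),
    ("telegraf", "autogen", "kapacitor-current"),
]

# With kapacitor stopped, dropping everything is the simplest option;
# kapacitor registers its subscriptions again when it starts.
all_drops = drop_statements(subs)

# Alternatively, keep a subscription that is known to be live.
stale_drops = drop_statements(subs, keep_names={"kapacitor-current"})

for statement in stale_drops:
    print(statement)
```

Printing the statements first, rather than executing them directly, leaves room to review the list before anything is removed.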

Our load avg has dropped from the 7-10 range to < 1 on both instances.

Once we found the large number of subscriptions, but before we knew this was the cause, this documentation was helpful:

https://docs.influxdata.com/kapacitor/v1.5/administration/subscription-management/#duplicate-kapacitor-subscriptions

https://docs.influxdata.com/influxdb/v1.6/administration/subscription-management/#inaccessible-or-decommissioned-subscription-endpoints

As well as finding this issue:
#870
