This repository has been archived by the owner on Sep 2, 2020. It is now read-only.

Constant high CPU usage #39

Closed
pdolan opened this issue Jun 30, 2014 · 2 comments


pdolan commented Jun 30, 2014

Hi there,

Currently working with a carbon relay receiving about 1.2million metrics per minute.
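[For scale, that relay volume works out to a substantial per-second write rate; a quick back-of-the-envelope check (plain arithmetic, not from the thread):]

```python
# 1.2 million metrics per minute, converted to a per-second rate.
metrics_per_minute = 1_200_000
metrics_per_second = metrics_per_minute / 60
print(metrics_per_second)  # 20000.0 metrics/sec sustained
```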

For testing purposes, have deployed a two node cassandra cluster with each cassandra node having a cyanite process attempting to write metrics. This setup seems to function fine when I throw some simple stress-test metrics at it.

However, when I direct a portion of our production metrics at the cluster, CPU utilisation jumps to near 100% for the cyanite process across all available cores (currently 8 per instance), and it continues to spin at that level even after I stop sending metrics. Throughout all of this, Cassandra writes and CPU utilisation stay very low (roughly 5% usage on a single core).

I initially thought that @addisonj had a pull request (#37) that would address this issue, as a number of exceptions relating to badly formed metrics were being thrown in the cyanite.log file. However, after manually merging the pull request and retrying, the issue persists (although the formatting exceptions are now handled elegantly!).

Any pointers for this one? Quite excited to get cyanite working on our production metric volume!

-Paul

@addisonj (Contributor) commented:

@pdolan What path store are you using?

I had this issue with the memory path store; I assume the CPU is going into searching the path store. Switching to the Elasticsearch path store greatly reduced CPU and increased throughput.
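[A hedged illustration of why an in-memory path store can burn CPU this way: if every path query is a glob-style scan over all known paths, the cost grows linearly with the number of distinct metric names, whereas an indexed store like Elasticsearch answers the same query without a full scan. The class and method names below are illustrative only, not Cyanite's actual API:]

```python
import fnmatch

class MemoryPathStore:
    """Illustrative only: answers queries by scanning every known path (O(n))."""

    def __init__(self):
        self.paths = set()

    def register(self, path):
        self.paths.add(path)

    def search(self, pattern):
        # Linear scan with glob matching: cheap for a handful of paths,
        # expensive once millions of distinct metric names accumulate.
        return sorted(p for p in self.paths if fnmatch.fnmatch(p, pattern))

store = MemoryPathStore()
store.register("servers.web01.cpu.user")
store.register("servers.web02.cpu.user")
print(store.search("servers.*.cpu.user"))
# ['servers.web01.cpu.user', 'servers.web02.cpu.user']
```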

You will also want to make sure #36 is merged, which adds a local cache that checks whether a metric already exists.

With those two fixes in place I can handle quite a few metrics; however, I do see lag under load, where metrics can get behind. I am still debugging this issue.
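[The local cache idea from #36 can be pictured like this (a minimal sketch, assuming the goal is simply to skip re-registering metric names the process has already seen; `write_path_to_store` is a hypothetical stand-in for the real store write, not Cyanite code):]

```python
# In-process cache of metric names already registered with the path store.
seen_paths = set()

def write_path_to_store(path):
    # Hypothetical stand-in for the real path-store write
    # (e.g., an index request to Elasticsearch).
    pass

def register_metric(path):
    """Only hit the path store the first time a metric name is seen."""
    if path in seen_paths:
        return False  # cache hit: no store round-trip
    write_path_to_store(path)
    seen_paths.add(path)
    return True

print(register_metric("servers.web01.cpu.user"))  # True: first sighting, store written
print(register_metric("servers.web01.cpu.user"))  # False: served from the local cache
```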



pdolan commented Jul 1, 2014

@addisonj

We had gone back and forth on using ES due to a previous requirement for graphite-web. However, putting ES in place and merging in #36 has made a huge difference - the metrics now seem to be flowing.

Had to double the number of cyanite processes (to 4) and am still seeing some drops from the relay, but this definitely feels like a step in the right direction.

I hope your debugging bears fruit!

@pdolan pdolan closed this as completed Jul 1, 2014