This repository has been archived by the owner on Sep 2, 2020. It is now read-only.

Constant high CPU usage #39

Closed
pdolan opened this issue Jun 30, 2014 · 2 comments


pdolan commented Jun 30, 2014

Hi there,

Currently working with a carbon relay receiving about 1.2million metrics per minute.
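[For scale, that relay volume works out to a substantial per-second write rate; a quick back-of-the-envelope check (plain arithmetic, not from the thread):]

```python
# 1.2 million metrics per minute, converted to a per-second rate.
metrics_per_minute = 1_200_000
metrics_per_second = metrics_per_minute / 60
print(metrics_per_second)  # 20000.0 metrics/sec sustained
```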

For testing purposes, have deployed a two node cassandra cluster with each cassandra node having a cyanite process attempting to write metrics. This setup seems to function fine when I throw some simple stress-test metrics at it.

However, when I direct a portion of our production metrics at the cluster, CPU utilisation jumps to near 100% for the cyanite process across all available cores (currently 8 per instance), and it continues to spin at that level even after I stop sending metrics. Throughout all of this, Cassandra writes and CPU utilisation stay very low (roughly 5% usage on a single core).

I initially thought that @addisonj had a pull request (#37) that would address this issue, as a number of exceptions relating to badly formed metrics were being thrown in the cyanite.log file. However, after manually merging the pull request and retrying, the issue persists (although the formatting exceptions are now handled elegantly!).

Any pointers for this one? Quite excited to get cyanite working on our production metric volume!

-Paul

@addisonj (Contributor) commented:

@pdolan What path store are you using?

I had this issue with the memory path store; I assume the CPU is going into searching the path store. Switching to the Elasticsearch path store greatly reduced CPU and increased throughput.
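[A hedged illustration of why an in-memory path store can burn CPU this way: if every path query is a glob-style scan over all known paths, the cost grows linearly with the number of distinct metric names, whereas an indexed store like Elasticsearch answers the same query without a full scan. The class and method names below are illustrative only, not Cyanite's actual API:]

```python
import fnmatch

class MemoryPathStore:
    """Illustrative only: answers queries by scanning every known path (O(n))."""

    def __init__(self):
        self.paths = set()

    def register(self, path):
        self.paths.add(path)

    def search(self, pattern):
        # Linear scan with glob matching: cheap for a handful of paths,
        # expensive once millions of distinct metric names accumulate.
        return sorted(p for p in self.paths if fnmatch.fnmatch(p, pattern))

store = MemoryPathStore()
store.register("servers.web01.cpu.user")
store.register("servers.web02.cpu.user")
print(store.search("servers.*.cpu.user"))
# ['servers.web01.cpu.user', 'servers.web02.cpu.user']
```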

You will also want to make sure #36 is merged, which adds a local cache that checks whether a metric already exists.

With those two fixes in place I can handle quite a few metrics; however, I do see lag under load, where metrics can get behind. I am still debugging this issue.
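[The local cache idea from #36 can be pictured like this (a minimal sketch, assuming the goal is simply to skip re-registering metric names the process has already seen; `write_path_to_store` is a hypothetical stand-in for the real store write, not Cyanite code):]

```python
# In-process cache of metric names already registered with the path store.
seen_paths = set()

def write_path_to_store(path):
    # Hypothetical stand-in for the real path-store write
    # (e.g., an index request to Elasticsearch).
    pass

def register_metric(path):
    """Only hit the path store the first time a metric name is seen."""
    if path in seen_paths:
        return False  # cache hit: no store round-trip
    write_path_to_store(path)
    seen_paths.add(path)
    return True

print(register_metric("servers.web01.cpu.user"))  # True: first sighting, store written
print(register_metric("servers.web01.cpu.user"))  # False: served from the local cache
```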



pdolan commented Jul 1, 2014

@addisonj

We had gone back and forth on using ES due to a previous requirement for graphite-web. However, putting ES in place and merging in #36 has made a huge difference - the metrics now seem to be flowing.

Had to double the number of cyanite processes (to 4) and am still seeing some drops from the relay, but this definitely feels like a step in the right direction.

I hope your debugging bears fruit!

@pdolan pdolan closed this as completed Jul 1, 2014