Carbon vs Megacarbon and Roadmap? #235

Closed
toni-moreno opened this Issue Apr 7, 2014 · 18 comments

@toni-moreno

toni-moreno commented Apr 7, 2014

Hi guys.

I'm planning a big Graphite installation and I have to decide on the final infrastructure and releases to install.

I have tested Graphite 0.9.x and 0.10-alpha, Whisper and Ceres, and also Carbon (both the master and megacarbon branches). I'm planning a multi-node Graphite cluster.

My only documentation sources for clustering Graphite are:

http://graphite.readthedocs.org/en/latest/carbon-daemons.html
http://bitprophet.org/blog/2013/03/07/graphite/
http://anatolijd.blogspot.com.es/2013/06/graphitemegacarbonceres-multi-node.html

I'm worried because of the lack of information on carbon vs. megacarbon features. I'm also watching updates, and master seems to be updated more recently than the megacarbon branch.

Which is best for a big platform right now?
What's the roadmap for carbon and megacarbon in the next few months?
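
(For reference, the "multi-node" setup in question is usually wired together roughly like this. This is only a minimal sketch in the spirit of the docs linked above; the addresses, ports, and instance names are made up, but RELAY_METHOD, REPLICATION_FACTOR and DESTINATIONS are the stock carbon.conf relay options, and CLUSTER_SERVERS is the stock graphite-web setting.)

$ cat /opt/graphite/conf/carbon.conf
[relay]
# hash every metric name onto one of the cache instances listed below
RELAY_METHOD = consistent-hashing
REPLICATION_FACTOR = 1
# ip:pickle_port:instance for every carbon-cache in the cluster (example addresses)
DESTINATIONS = 10.0.0.1:2004:a, 10.0.0.1:2104:b, 10.0.0.2:2004:a, 10.0.0.2:2104:b

$ cat /opt/graphite/webapp/graphite/local_settings.py
# graphite-web fans read queries out to the webapp on every storage node
CLUSTER_SERVERS = ["10.0.0.1:8080", "10.0.0.2:8080"]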

@meteozond

meteozond commented Apr 9, 2014

+1

@nickchappell

nickchappell commented Apr 25, 2014

+1

@nickchappell

nickchappell commented May 1, 2014

Any more info or updates?

@psych0d0g

psych0d0g commented May 6, 2014

+1

@payamsabz

payamsabz commented Jun 10, 2014

+1

@proover

proover commented Jul 7, 2014

+1

@esc

Member

esc commented Jul 10, 2014

I don't think anyone knows, TBH. Anyone have an idea what could/should be done?

@zstyblik

zstyblik commented Jul 12, 2014

> Anyone have an idea what could/should be done?

It's simple:

  1. fork it
  2. document the hell out of it
  3. attract developers (the previous step should do the trick)
  4. secure some backing (= money to pay for dev time)

Please note, we're talking about 3-4 separate projects. That is:

  • graphite-web
  • carbon
  • whisper
  • ceres
@esc

Member

esc commented Jul 12, 2014

I was asking more about carbon vs. megacarbon, rather than the state of the project itself.

If you want to help out with releases, feel free to look at:

graphite-project/graphite-web#677

@zstyblik

zstyblik commented Jul 12, 2014

EDIT: mysterious trigger of post button.

> I was asking more about carbon vs. megacarbon, rather than the state of the project itself.

I see. Now, my own opinion and experience: I think megacarbon is a step in the right direction, given that a Python process can't run on more than one CPU. And I find the game with relays and caches and all that stuff rather appalling. But hey, it apparently works for some people.
And since PRs for megacarbon aren't being accepted for whatever reason, we've ended up with megacarbon from a "3rd party" git repository, and so will anybody else should they run megacarbon.
The same goes for Ceres and its maintenance tools.

And actually, if anybody, or even you, could say which one it is going to be in the future (I mean megacarbon, and Whisper or Ceres), it would save me a lot of questions at my current job. I'm being asked about that a lot.

> If you want to help out with releases, feel free to look at:

I have read it now, and thank you for bringing it to my attention. However, that thread is 95% about whether the next release is going to be labeled 0.9.x or 0.10.x and only ~3% about combing things up.
I don't know what you want or expect me to say.

@steve-dave

Member

steve-dave commented Jul 14, 2014

FWIW, I haven't used megacarbon. I made the choice based on the fact that it didn't yet seem to be considered "stable". I haven't hit any performance issues with carbon, but I haven't yet pushed a single relay past 2M+ metrics/min. Perhaps if you're planning to go significantly higher than that, megacarbon may be your only option.

@SEJeff

Member

SEJeff commented Jul 14, 2014

@steve-dave Honestly, if you're going > 1M metrics every 10 seconds, carbon-relay doesn't work at all. For super-scalable and large installations, cyanite with the graphite storage backend is a better option. It uses Cassandra as the backing store.

@steve-dave

Member

steve-dave commented Jul 14, 2014

@SEJeff Are you saying that there's no real use for megacarbon then, i.e. that by the time you've maxed out carbon-relay, megacarbon won't help?

@SEJeff

Member

SEJeff commented Jul 15, 2014

Not exactly. I was throwing (at peak) 3.5 million metrics through collectd to carbon-relay every 10 seconds from many hosts. The caches weren't idle, but we had maxed out the SSDs in our cache nodes, so IO was not the limiting factor; the relay was. We tuned the kernel, we put it on a 10G network, we tuned everything a large and senior group of sysadmins would.

Megacarbon allows you to distribute things with Ceres for the caches, but it doesn't do much at all for the relays. Megacarbon just makes carbon's backend pluggable, and as it stands it supports both Ceres and Whisper. This is a massive improvement for scaling carbon horizontally, but again, it does nothing for the relay.

Now, the relay falling over is something we investigated a LOT. We tried swapping out the hashing algo in the relay (hoping it would lower CPU usage), and we were tasksetting each relay process onto its own dedicated CPU core (using the isolcpus functionality to totally isolate the relay CPU cores from the general scheduler). That improved performance by a few hundred thousand metrics every 10 seconds, but even so, with some serious tuning, we didn't get the performance we wanted. We tried "striping" the load from collectd aggregators to different relay processes with the same config set via config management, and that sort of worked, but it was clunky at best. Again, carbon-cache was able to keep up, but the relays simply fell over. The relay was and stayed CPU-bound under heavy load. Once you reach the "max throughput" it can handle, it just starts queuing and eventually will start dropping metrics on the floor. We wanted high-resolution metrics from large clusters of computers and were out to build a reliable and stable platform to store them all in. I used megacarbon "in production" and it worked perfectly fine. For the graphite-web and carbon side of things, megacarbon works perfectly well.
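
(For anyone who wants to try the same pinning, a rough sketch of what it looks like, assuming a box where cores 2 and 3 are reserved for two relay instances; the core numbers, paths, and instance names here are purely illustrative:)

# reserve cores 2 and 3 from the general scheduler (kernel command line, e.g. via GRUB)
GRUB_CMDLINE_LINUX="... isolcpus=2,3"

# pin each carbon-relay instance to one of the reserved cores
$ taskset -c 2 /opt/graphite/bin/carbon-relay.py --instance=a start
$ taskset -c 3 /opt/graphite/bin/carbon-relay.py --instance=b start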

Now, @pcn has done some interesting things with saving metrics to disk and having another process send them, but looking closely at it, it is just a band-aid that does indeed help scale the relays to higher throughput. One solution was to rewrite a large chunk of the relay in C, but haproxy works just as well: as long as your datastore is multi-master, you can evenly round-robin requests over the nodes using a battle-tested codebase such as haproxy or LVS.
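
(A minimal sketch of that haproxy variant, balancing plaintext carbon traffic round-robin across a pool of relays or multi-master store nodes; the server names and addresses are hypothetical:)

$ cat /etc/haproxy/haproxy.cfg
listen carbon
    bind *:2003
    mode tcp
    balance roundrobin
    option tcp-check
    server relay-a 10.0.0.10:2003 check
    server relay-b 10.0.0.11:2003 check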

Keep in mind that most Graphite installs won't remotely come close to 3.5+ million metrics every 10 seconds, and 25 million metrics per minute is a pretty large number of incoming datapoints. We found that carbon struggled massively while OpenTSDB along with Cassandra was fine (with appropriate tuning, obviously). I know the hostedgraphite.com guys use Riak to do the exact same thing and their experience mimics mine, hence them building their own backend on top of Riak. Carbon is a great idea, but fundamentally, Twisted doesn't do what carbon-relay or carbon-aggregator were built to do when hit with sustained and heavy throughput. Much to my chagrin, concurrency isn't one of Python's core competencies.

Sorry for the long response, but for a LOT of scale, carbon is just fundamentally not a wonderful piece of code.

@ccope

ccope commented Jul 15, 2014

@SEJeff what about using round-robin DNS (or haproxy) with a scalable tier of just relay nodes? I suppose you might have to stop your relays when adding new cache nodes until their configs are updated, but that seems like a minor annoyance (collectd on the hosts should buffer data until the relays are available again...)

@steve-dave

Member

steve-dave commented Sep 1, 2014

@SEJeff Thanks for sharing your experience above!

@obfuscurity

Member

obfuscurity commented Sep 1, 2014

With regards to "what is the future of Carbon": the megacarbon branch was carefully merged into graphite-web some time ago, but it was never merged into master here in the Carbon project. That work remains, and it won't be easy.

Per a recent conversation with @mleinart, the process will look something like this:

$ git log --pretty=oneline 5a286f5..master | wc -l
      77
$ git merge ~77
  ... fix conflicts ...
$ git merge ~76
  ... fix conflicts ...

Insert joke about 99 bottles of beer on the wall.
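
(If anyone wants to picture that commit-by-commit replay in practice, here is one way to script it with plain git, using the same 5a286f5..master range; this is only a sketch of the idea, not an agreed-upon plan:)

$ for c in $(git rev-list --reverse 5a286f5..master); do
>     git cherry-pick "$c" || break   # stop here, fix conflicts, then continue
> done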

@toni-moreno

toni-moreno commented May 4, 2016

Closed due to lack of activity.

@toni-moreno toni-moreno closed this May 4, 2016
