Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Statsd needs some serious refactoring #529

Closed
coykitten opened this issue Jul 17, 2015 · 10 comments
Closed

Statsd needs some serious refactoring #529

coykitten opened this issue Jul 17, 2015 · 10 comments

Comments

@coykitten
Copy link
Contributor

After my second go at fixing #462 (EADDRINUSE on activated config reload) I think that this issue is worth raising:

etsy/statsd needs some serious refactoring if it is going to continue to grow/develop and generally be useful.

The issue with #462 is simple: the admin tcp interface attempts to be naively recreated on config reload without detecting whether one is already running. In the attached pr, which is nowhere near complete, you can see I've done the sane thing and attempted to pull the telnet server out into its own module that we can require in statsd-core (stats.js). This has fixed #462 but now new issues are arising, namely that the admin interface does not receive the new config following a live config reload (This also applies to the other many many globals in stats.js such as the metricsHash).

I understand that this application was originally envisioned when node.js was in infancy but we are at a point where we need to refactor this into a modern javascript project. I'd like to argue that Statsd is no longer a "simple daemon for stats aggregation," it outgrew that box a long time ago and we need to compartmentalize the code better. Currently it is near impossible to safely extend the current statsd-core code. I would like to envision a future where we have statsd as a proper node_module, following Common.js, where developers can require('statsd'); to include a statsd based metric server in their node projects. For statsd, stand-alone, we'd include a proper bin/statsd that initializes and runs the entire app. A refactor like this would allow us to write fully testable code, something that is important to a piece of software that is relied on so much.

Such a refactor is very difficult without good test cases, or even testable code. I would like to open discussion into the possible re-architecting of stats.js, the core of statsd, into something that is testable, manageable and more importantly easy to change.

As an example, this could be accomplished by having stats.js export a global application object and appending all concerns to it. This way plugins need only a reference to the application object to change/check config values, interact with metric buckets, etc.

As we are the reference implementation of statsd I think we owe it to ourselves to grab the reigns here and actively guide the future of this open source project, one that is critical infrastructure for many groups and individuals.

All the above said, if anyone can figure out why the telnet server in the current pr doesn't receive the newest config on a config reload that would be a great step in the direction that we need to go.

@coykitten
Copy link
Contributor Author

@shindere

@till
Copy link
Contributor

till commented Jul 18, 2015

Generally you are right and it often seems like a refactoring is needed.

I personally have no idea though how to “successfully” structure a JavaScript/nodejs app so I can’t say what needs to be done here. I can follow the example though for future contributions.

I would keep in mind that breakage from setup or maintenance of statsd is relatively minimal currently. At least in my (ops) experience. I would not want to trade this in for cleaner code.

Best of luck!

@ghost
Copy link

ghost commented Jul 18, 2015

We actually use the golang port of statsd in production. But I agree, it would be nice to be able to include statsd in other nodejs projects, as not everything can be done with forwarding the data to Graphite

@mcfunley
Copy link
Contributor

There should be a "Friday night coding meltdowns" tumblr for moments like this.

You're right of course, but what is needed is the incremental path from A to B.

One strategy:

  • Change one thing at a time, adding test coverage. This could be functional test coverage for the time being, although I hate functional test coverage and typically advise against it.
  • Validate those changes by updating statsd within Etsy and making sure nothing's completely messed up.
  • Repeat until we get to where we want to go.

Alternatively:

  • Put your proposed rewrite next to the working version, with a switch that enables it (like statsd --refactor or something).
  • Tee traffic to that and observe until we're confident that it's identical.
  • Delete the old version.

What you can't do is update it in place without validating it against production workloads.

@coykitten
Copy link
Contributor Author

It's really unfortunate as both strategies are very expensive. I'd probably opt for enabling new code paths with a flag until we're confident that a newer implementation is both identical and is on par or more performant.

@mcfunley
Copy link
Contributor

I would literally copy the existing code into a new file, make a switch that selects between the two, and bang on it with method #2 until it works. (I'd propose calling this the "Ross Snyder method.")

@mikhailov
Copy link

why not https://github.com/github/brubeck ?

@jroper
Copy link
Contributor

jroper commented May 12, 2016

Wow... I missed this last year, just got the notification that it was closed. So I contribute to statsd once, and now I'm blamed for the state of its codebase? I don't even remember contributing to statsd....

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants