Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Harmful processor registration #230

Closed
jalvz opened this issue Oct 11, 2017 · 1 comment
Closed

Harmful processor registration #230

jalvz opened this issue Oct 11, 2017 · 1 comment
Labels

Comments

@jalvz
Copy link
Contributor

jalvz commented Oct 11, 2017

I'd like to revisit the way in which we initialize processors. This has been discussed before, but I think circumstances (and general knowledge about the project) changed over the last months so IMO this worths a second look. Ill try to explain myself as best I can.

Right now, each processor registers itself in an init function, and we make sure those functions are executed with a make command ( https://github.com/elastic/apm-server/blob/master/Makefile#L39 ) that generates a go file just with blank imports, and then we blank-import that file wherever is needed:

_ "github.com/elastic/apm-server/include"
_ "github.com/elastic/apm-server/include"
_ "github.com/elastic/apm-server/include"

The original goal of this was to support arbitrary plugins, so that anyone could "plug" in a processor without having to worry about the rest. This is inspired by other beats.
However there are a few caveats applying this approach to the apm server:

  • Since the apm server is not self-contained, you can't be oblivious to the agents and the tailored UI. The server is in middle of a 3-steps contract. You can't just "plug" a processor and expect to it work without considering how might impact other agents, how to spec it out, how to test it end-to-end, what happens to the curated dashboards, etc. You need the big picture no matter what.

  • Is hard to picture these plugins. I expect the biggest room for growth in APM is by adding agent support for more languages and frameworks, rather than customized features in the server.
    But even that, features in the apm server don't necessarily fit the processor pattern of "ingest data over HTTP, then pipe it to ES". Consider for instance sourcemaps or the onboarding document...

  • Even if we want to facilitate plugin development, we should make an use-case based effort to address them so we have a more concrete idea of what do they entail, how they work, what value they provide, etc.; instead of prematurely lay out the codebase in certain way to accommodate some (more or less blurry) expectations.

That said, I found the processor initialization to be a problem while developing #227

  • I wanted to create eg. a frontend/transaction package that could reuse transaction stuff, but the empty include trick wouldn't pick it up. This limits me how to write processors.

  • I got stuck with some failing tests for quite some time because i forgot to run make update first so to import the right thing (aka to trigger the right side effect). This is very counter-intuitive, you just have to know it. At some point I got myself into a situation wether either make update worked and make unit didn't, or the other way around.

Admittedly, blank identifier imports are hacky. A command that generates a go file with just blank imports is even more hacky. Exceptions to the rule on already hacky code is definitely a concern:

if protocol == "model":

Another concern is the bug that we saw in the past about mixed tracing data. This was because processors were holding state, while only one processor instance for each endpoint is created for the entire life-cycle of the server in the init method.
It is very easy and tempting to add state to processors (they are just structs), without realising how dangerous can it be, and I fear it can happen again.

TL;DR

I think we can simplify a lot the code, remove all this workarounds / side effects and have safer processors just by registering them centrally in the beater.

This way you can have all the endpoint-processor-handlers mapping in the same one place, and the implementation details (what processors do) are left in the processor packages.

I don't think that facilitating plugins development is a pressing need, and we can always tackle it in due time.

My attempt to solve this is at #229

@roncohen
Copy link
Contributor

I agree that registering centrally could be way simpler. I seems to recall that we had a problem with circular imports, but if circular imports is not a problem, I'm +1 on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants