I'd like to revisit the way in which we initialize processors. This has been discussed before, but I think circumstances (and general knowledge about the project) have changed over the last months, so IMO this is worth a second look. I'll try to explain myself as best I can.
Right now, each processor registers itself in an init function, and we make sure those functions are executed with a make command ( https://github.com/elastic/apm-server/blob/master/Makefile#L39 ) that generates a go file with just blank imports, and then we blank-import that file wherever it's needed (e.g. in apm-server/main.go, script/output_data/output_data.go and beater/beater_test.go):
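To make the discussion concrete, here is a rough sketch of the init-registration pattern described above. The names (Register, the registry map) are hypothetical stand-ins, not the real apm-server API:

```go
package main

import "fmt"

// registry simulates the shared processor registry that each
// processor package fills from its init function.
var registry = map[string]func() string{}

// Register is what each processor package would call from init.
func Register(name string, factory func() string) {
	registry[name] = factory
}

// In the real layout each processor lives in its own package, so
// its init only runs if that package is imported somewhere -- hence
// the generated file of blank imports, e.g.:
//
//	import _ "github.com/elastic/apm-server/processor/transaction"
func init() {
	Register("transaction", func() string { return "transaction processor" })
}

func main() {
	// If the blank import is missing, init never runs and the
	// registry is silently empty -- the failure mode discussed below.
	fmt.Println("registered processors:", len(registry))
}
```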
The original goal of this was to support arbitrary plugins, so that anyone could "plug" in a processor without having to worry about the rest. This is inspired by other beats.
However there are a few caveats applying this approach to the apm server:
Since the apm server is not self-contained, you can't be oblivious to the agents and the tailored UI. The server is in the middle of a 3-step contract. You can't just "plug in" a processor and expect it to work without considering how it might impact other agents, how to spec it out, how to test it end-to-end, what happens to the curated dashboards, etc. You need the big picture no matter what.
It's hard to picture these plugins. I expect the biggest room for growth in APM is in adding agent support for more languages and frameworks, rather than customized features in the server.
But even then, features in the apm server don't necessarily fit the processor pattern of "ingest data over HTTP, then pipe it to ES". Consider for instance sourcemaps or the onboarding document...
Even if we want to facilitate plugin development, we should make a use-case-based effort to address them so we have a more concrete idea of what they entail, how they work, what value they provide, etc., instead of prematurely laying out the codebase in a certain way to accommodate some (more or less blurry) expectations.
That said, I found the processor initialization to be a problem while developing #227
I wanted to create e.g. a frontend/transaction package that could reuse transaction stuff, but the blank-import trick wouldn't pick it up. This limits how I can write processors.
I got stuck with some failing tests for quite some time because I forgot to run make update first so as to import the right thing (aka to trigger the right side effect). This is very counter-intuitive; you just have to know it. At some point I got myself into a situation where either make update worked and make unit didn't, or the other way around.
Admittedly, blank identifier imports are hacky. A command that generates a go file with just blank imports is even hackier. Exceptions to the rule on already hacky code (see apm-server/script/generate_imports.py) are definitely a concern.
Another concern is the bug we saw in the past about mixed tracing data. It happened because processors were holding state, while only one processor instance per endpoint is created in the init method for the entire life-cycle of the server.
It is very easy and tempting to add state to processors (they are just structs) without realising how dangerous it can be, and I fear it can happen again.
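The failure mode above can be sketched as follows. This is an illustrative reconstruction, not the actual buggy code; the struct shapes and names are made up:

```go
package main

import "fmt"

// A single processor instance serves every request for the whole
// server life-cycle, so any mutable field on it is shared between
// requests -- this shape invites mixed-up tracing data:
type riskyProcessor struct {
	events []string // shared mutable state, racy under concurrency
}

func (p *riskyProcessor) Process(raw string) []string {
	p.events = append(p.events, raw) // leaks across requests
	return p.events
}

// A safer shape keeps the struct stateless: every call works only
// on its own input and returns its own result.
type statelessProcessor struct{}

func (statelessProcessor) Process(raw string) []string {
	return []string{raw}
}

func main() {
	risky := &riskyProcessor{}
	risky.Process("trace-a")
	fmt.Println(risky.Process("trace-b")) // [trace-a trace-b]: data from two requests mixed

	var safe statelessProcessor
	fmt.Println(safe.Process("trace-b")) // [trace-b]: isolated per call
}
```

Nothing in the language stops you from adding a field to the risky version, which is why the pattern is tempting to misuse.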
TL;DR
I think we can simplify the code a lot, remove all these workarounds / side effects, and have safer processors just by registering them centrally in the beater.
This way you have the whole endpoint-processor-handler mapping in one place, and the implementation details (what processors do) are left in the processor packages.
I don't think that facilitating plugin development is a pressing need, and we can always tackle it in due time.
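A minimal sketch of what central registration in the beater could look like, assuming a simplified Processor interface and illustrative endpoint paths (neither is the server's actual API):

```go
package main

import "fmt"

// Processor is a minimal stand-in for whatever interface the real
// processor packages export.
type Processor interface {
	Name() string
}

type transactionProcessor struct{}

func (transactionProcessor) Name() string { return "transaction" }

type errorProcessor struct{}

func (errorProcessor) Name() string { return "error" }

// routes builds the endpoint-to-processor mapping explicitly in
// the beater, instead of relying on init-time side effects in each
// processor package.
func routes() map[string]Processor {
	return map[string]Processor{
		"/v1/transactions": transactionProcessor{},
		"/v1/errors":       errorProcessor{},
	}
}

func main() {
	// The beater would wrap each processor in an HTTP handler here;
	// the full wiring is visible in this one place.
	for path, p := range routes() {
		fmt.Printf("%s -> %s\n", path, p.Name())
	}
}
```

With this shape, adding a processor is a one-line change in routes(), no code generation or blank imports required.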
I agree that registering centrally could be way simpler. I seem to recall that we had a problem with circular imports, but if circular imports aren't a problem, I'm +1 on this.
My attempt to solve this is at #229