I'd like to revisit the way in which we initialize processors. This has been discussed before, but I think circumstances (and general knowledge about the project) have changed over the last months, so IMO this is worth a second look. I'll try to explain myself as best I can.
Right now, each processor registers itself in an init function, and we make sure those functions are executed with a make command ( https://github.com/elastic/apm-server/blob/master/Makefile#L39 ) that generates a go file with just blank imports, and then we blank-import that file wherever it's needed (e.g. in apm-server/main.go, script/output_data/output_data.go and beater/beater_test.go):
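To make the discussion concrete, here is a rough sketch of the init-registration pattern described above. The names (Register, the registry map) are hypothetical stand-ins, not the real apm-server API:

```go
package main

import "fmt"

// registry simulates the shared processor registry that each
// processor package fills from its init function.
var registry = map[string]func() string{}

// Register is what each processor package would call from init.
func Register(name string, factory func() string) {
	registry[name] = factory
}

// In the real layout each processor lives in its own package, so
// its init only runs if that package is imported somewhere -- hence
// the generated file of blank imports, e.g.:
//
//	import _ "github.com/elastic/apm-server/processor/transaction"
func init() {
	Register("transaction", func() string { return "transaction processor" })
}

func main() {
	// If the blank import is missing, init never runs and the
	// registry is silently empty -- the failure mode discussed below.
	fmt.Println("registered processors:", len(registry))
}
```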
The original goal of this was to support arbitrary plugins, so that anyone could "plug" in a processor without having to worry about the rest. This is inspired by other beats.
However there are a few caveats applying this approach to the apm server:
Since the apm server is not self-contained, you can't be oblivious to the agents and the tailored UI. The server is in the middle of a 3-step contract. You can't just "plug in" a processor and expect it to work without considering how it might impact other agents, how to spec it out, how to test it end-to-end, what happens to the curated dashboards, etc. You need the big picture no matter what.
It's hard to picture these plugins. I expect the biggest room for growth in APM is in adding agent support for more languages and frameworks, rather than customized features in the server.
But even then, features in the apm server don't necessarily fit the processor pattern of "ingest data over HTTP, then pipe it to ES". Consider for instance sourcemaps or the onboarding document...
Even if we want to facilitate plugin development, we should make a use-case-based effort to address them so we have a more concrete idea of what they entail, how they work, what value they provide, etc., instead of prematurely laying out the codebase in a certain way to accommodate some (more or less blurry) expectations.
That said, I found the processor initialization to be a problem while developing #227
I wanted to create e.g. a frontend/transaction package that could reuse transaction stuff, but the blank-import trick wouldn't pick it up. This limits how I can write processors.
I got stuck with some failing tests for quite some time because I forgot to run make update first so as to import the right thing (aka to trigger the right side effect). This is very counter-intuitive; you just have to know it. At some point I got myself into a situation where either make update worked and make unit didn't, or the other way around.
Admittedly, blank identifier imports are hacky. A command that generates a go file with just blank imports is even hackier. Exceptions to the rule on already hacky code (see apm-server/script/generate_imports.py) are definitely a concern.
Another concern is the bug we saw in the past about mixed tracing data. It happened because processors were holding state, while only one processor instance per endpoint is created in the init method for the entire life-cycle of the server.
It is very easy and tempting to add state to processors (they are just structs) without realising how dangerous it can be, and I fear it can happen again.
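The failure mode above can be sketched as follows. This is an illustrative reconstruction, not the actual buggy code; the struct shapes and names are made up:

```go
package main

import "fmt"

// A single processor instance serves every request for the whole
// server life-cycle, so any mutable field on it is shared between
// requests -- this shape invites mixed-up tracing data:
type riskyProcessor struct {
	events []string // shared mutable state, racy under concurrency
}

func (p *riskyProcessor) Process(raw string) []string {
	p.events = append(p.events, raw) // leaks across requests
	return p.events
}

// A safer shape keeps the struct stateless: every call works only
// on its own input and returns its own result.
type statelessProcessor struct{}

func (statelessProcessor) Process(raw string) []string {
	return []string{raw}
}

func main() {
	risky := &riskyProcessor{}
	risky.Process("trace-a")
	fmt.Println(risky.Process("trace-b")) // [trace-a trace-b]: data from two requests mixed

	var safe statelessProcessor
	fmt.Println(safe.Process("trace-b")) // [trace-b]: isolated per call
}
```

Nothing in the language stops you from adding a field to the risky version, which is why the pattern is tempting to misuse.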
TL;DR
I think we can simplify the code a lot, remove all these workarounds / side effects, and have safer processors just by registering them centrally in the beater.
This way you have the whole endpoint-processor-handler mapping in one place, and the implementation details (what processors do) are left in the processor packages.
I don't think that facilitating plugin development is a pressing need, and we can always tackle it in due time.
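A minimal sketch of what central registration in the beater could look like, assuming a simplified Processor interface and illustrative endpoint paths (neither is the server's actual API):

```go
package main

import "fmt"

// Processor is a minimal stand-in for whatever interface the real
// processor packages export.
type Processor interface {
	Name() string
}

type transactionProcessor struct{}

func (transactionProcessor) Name() string { return "transaction" }

type errorProcessor struct{}

func (errorProcessor) Name() string { return "error" }

// routes builds the endpoint-to-processor mapping explicitly in
// the beater, instead of relying on init-time side effects in each
// processor package.
func routes() map[string]Processor {
	return map[string]Processor{
		"/v1/transactions": transactionProcessor{},
		"/v1/errors":       errorProcessor{},
	}
}

func main() {
	// The beater would wrap each processor in an HTTP handler here;
	// the full wiring is visible in this one place.
	for path, p := range routes() {
		fmt.Printf("%s -> %s\n", path, p.Name())
	}
}
```

With this shape, adding a processor is a one-line change in routes(), no code generation or blank imports required.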
I agree that registering centrally could be way simpler. I seem to recall that we had a problem with circular imports, but if circular imports aren't a problem, I'm +1 on this.
My attempt to solve this is at #229