Integrate with Elastic Agent #4004

Closed
14 of 15 tasks
axw opened this issue Jul 21, 2020 · 20 comments

@axw (Member) commented Jul 21, 2020

Things to investigate/consider

  • elastic-agent configuration
  • elastic-agent program spec
  • indexing strategy (e.g. include service name in data stream name? specific data sets for transaction metrics, breakdown metrics, etc.?), constant_keyword fields, reducing flexibility in configuration
  • how to read/write non-timeseries data (tail-based sampling, API Keys, source maps)
  • how to integrate with cloud-on-k8s (depends on working group) (WIP, tracked separately)
  • upgrade plan to 8.0 (will be tracked separately)
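
As a rough illustration of the constant_keyword idea mentioned in the indexing strategy bullet, such a field could be declared in a package's fields.yml along these lines (the field name and description are hypothetical, not the actual apm package definition):

```yaml
# Hypothetical fields.yml entry; constant_keyword stores a single value
# per index, which suits values that are constant within a data stream.
- name: processor.event
  type: constant_keyword
  description: Event type (e.g. transaction, span), constant within a data stream.
```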

Update 03/11, 08/12 (Juan)

Main lines of work required


Issues

This list is non-exhaustive. Check the comments for links to PRs related to the APM package.

Required

Closed issues

@jalvz (Contributor) commented Sep 10, 2020

@axw you mentioned that you had some details you wanted to add here, right? Or did I misinterpret?

@axw (Member, Author) commented Sep 14, 2020

@jalvz I had previously added a list, but I think I forgot to save and it was lost. I'll add things as I remember them 😅
Feel free to add things you think we should look at too.

@jalvz (Contributor) commented Oct 21, 2020

  • In Fleet / Ingest Manager, the top level entity in an integration is the data stream. Each integration can have 1 or more data streams. Each data stream has a type. Currently there are 2 types defined in the spec: logs and metrics.

  • Each data stream can have any number of inputs. Every input has its own settings, although there is a way to associate settings to a given data stream and have them applied to all its inputs. There is no way however to go up 1 level and have a setting mapped to the whole integration.

  • Data streams and inputs are defined in manifest.yml files for each package. Both data streams and inputs can be enabled or disabled. Kibana will parse the manifest file and display the information in 2 levels, with a toggle switch for each.
    This is how it looks for the Apache integration:

[screenshot: Apache integration settings in Kibana]

Note that the hosts setting applies to the whole metrics data stream, which currently only has 1 input (status metrics).
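
The structure described above can be sketched in a manifest.yml roughly like this (field names are illustrative and may not match the exact package-spec schema of the time):

```yaml
# Hypothetical data stream manifest.yml: one data stream of type
# "metrics" with a single input, whose settings Kibana renders
# with a toggle switch.
title: Apache status metrics
type: metrics            # spec currently defines two types: logs and metrics
streams:
  - input: apache/metrics
    vars:
      - name: hosts      # applies to the whole metrics data stream
        type: text
        multi: true
        default:
          - "http://127.0.0.1"
```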


Now we need to define an APM integration in terms of data streams and inputs with the structure described above. I'm going with the assumption that we will have only one APM integration for apm-server and APM agents.

Following the Indexing Strategy proposal, we are going to introduce a new traces type, and use both existing metrics and logs types. The most suitable option then is to treat APM agents as inputs.
Then, to configure e.g. the Java agent, one would find some settings under traces, others under metrics, etc.
We could further decide that the inputs under traces carry general and tracing-related settings, while the inputs under the other data streams (metrics, logs) have settings specific to that type only.

From there, 2 options come to mind:

  • We could adopt the convention of having apm-server configuration under traces, and make sure that the traces data stream can't be disabled in the UI. We would also need the UI to propagate that configuration even if no input is enabled (this is currently not supported in Ingest Manager).

A simplified version of this would look like:

[screenshot: simplified APM integration settings]

  • Alternatively, we could treat apm-server as a separate input. In that case its settings would be scattered across the data streams, as for APM agents; we would still need to pick one type for generic settings (which are almost all of them), and we would still need the UI to enforce that specific input (i.e. not allow users to disable apm-server).
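
To make the first option concrete, a minimal sketch of apm-server configuration attached to the traces data stream might look like this (variable names are hypothetical, not the actual package definition):

```yaml
# Hypothetical traces data stream manifest carrying apm-server settings;
# the UI would need to keep this data stream always enabled.
title: Traces
type: traces             # new type introduced by the Indexing Strategy proposal
streams:
  - input: apm
    vars:
      - name: host
        type: text
        default: "localhost:8200"
      - name: secret_token
        type: text
        required: false
```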

@jalvz (Contributor) commented Oct 21, 2020

@elastic/apm-server @felixbarny @ruflin looking for feedback on the comment above, if anything of what I wrote makes sense.

@graphaelli (Member) commented

I would vote for the simplest UI that gets APM Server and the Agents deployed. If possible, just a "Monitor applications" section with a toggle per language, deferring all opting in/out of trace, log, profile, and metric collection, and any configuration customization, to central configuration. Picking on these screenshots, I would not expect to see secret token, reporting interval, or max spans per transaction at this point in the integration setup.

@axw (Member, Author) commented Oct 22, 2020

I agree with @graphaelli's comments above, with the exception of maybe secret_token. That's something that's defined in the server's config, so I think it does belong in that UI... Maybe we shouldn't add it though, and instead only provide an option to enable/disable API Key auth?

I don't think it makes a heap of sense to opt in/out of individual data types for APM agents at this level, since it doesn't give a way of filtering by service. You might have multiple Java services, but want to enable logs for only a handful of them. This is different to existing integrations, where the integration is tied to a specific service (apache, nginx, etc.)

Going further, I think the only agent-related config that should go in the package should be to do with auto-instrumentation: whether to enable/disable per language, and perhaps some parameters like a pattern for matching process names. @felixbarny WDYT?

@jalvz (Contributor) commented Oct 22, 2020

Thanks all,

I don't intend to add any agent-related stuff at this point at all, but since we have to make room for it, I wanted to show how it could look. I don't have an opinion on what settings should be there; this was just an example.

There is no way around the toggle for the data types and inputs (that is what I tried to explain in standup yesterday @axw), I agree it doesn't make sense for us.

I guess my biggest question now is whether to treat apm-server as an input (confusing UI IMO), or just have all its settings under eg. the traces data stream (needs additional support in Ingest Manager to have those settings propagated downstream even if there are 0 inputs).

@felixbarny (Member) commented

I don't think it makes a heap of sense to opt in/out of individual data types for APM agents at this level

++
I think we should not constrain ourselves with sticking to an input per data stream. Instead, let's think about what would be an ideal solution from the perspective of the user.

I've created a document with a proposal of how to integrate agents in the workflow that includes some mockups:
https://docs.google.com/document/d/1b_Fy0KcgQBer_jQpgCjVaIbsg8--oGX_XqNy6mSONsU/edit?usp=sharing

I agree with @graphaelli's comments above, with the exception of maybe secret_token.

Do we need a secret token? When APM Server is installed locally, could we make it accept connections only from localhost? On k8s, can we inject (via a mutating webhook) the secret_token as an environment variable from a k8s secret?
See also this section of the proposal doc.
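
A sketch of what such an injection could produce in the pod spec, assuming the agent reads the token from an environment variable (ELASTIC_APM_SECRET_TOKEN is used by several APM agents; the Secret and image names here are made up):

```yaml
# Pod spec fragment a mutating webhook could patch in: the token comes
# from a k8s Secret rather than being hardcoded in the agent policy.
apiVersion: v1
kind: Pod
metadata:
  name: instrumented-app
spec:
  containers:
    - name: app
      image: my-app:latest       # hypothetical application image
      env:
        - name: ELASTIC_APM_SECRET_TOKEN
          valueFrom:
            secretKeyRef:
              name: apm-secret   # hypothetical Secret holding the token
              key: secret_token
```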

deferring all opting in/out of trace, log, profile, metric collection and any configuration customization to central configuration.

++
An exception to that rule is if we'd want users to set options that are not dynamic, meaning they have to be known upfront, before querying remote config.

@jalvz (Contributor) commented Oct 22, 2020

I think we should not constrain ourselves with sticking to an input per data stream.

Not sure if I understand, I think we are not doing that?

let's think about what would be an ideal solution from the perspective of the user

SGTM, but note that the proposal in the doc you linked (one data stream per agent, which is what I first thought of) might not be very compatible with how Ingest Manager works now. E.g., what would the type of the agent data stream be?
Depending on the release expectations, that can be more or less ambitious, so for now I just wanted to find the path of least resistance to move forward (progress over perfection and all that).

@felixbarny (Member) commented

In the very first iteration, I think it's probably fine to not have any agent related inputs. The next step would be to include documentation on how to install the agents (similar to the add data dialog). On top of that, we can add controls for auto attachment of agents.

@axw (Member, Author) commented Oct 22, 2020

Do we need a secret token? When APM Server is installed locally, could we make it accept connections only from localhost?

Good point, we probably don't need it to start with. Perhaps later we could orchestrate secrets (e.g. server generates secret_tokens and/or API Keys, and passes them to instrumented apps). It's probably not high priority since it's localhost-only.

@jalvz reiterating my statement from standup for posterity: I'm on board with squishing the config into an arbitrary (i.e. traces) data stream config as a way to move forward, and working with the Fleet team to improve the UX.

@axw (Member, Author) commented Oct 22, 2020

Copying a little bit of conversation from Slack. I failed to write down some thoughts I had swimming in my mind which led to some confusion.

We talked a bit about needing a way to configure APM Server for hosted cases, e.g. I may want to run an APM Server for RUM, Lambda, etc., where APM Server does not/cannot run on the same host. I'm thinking we might want two separate packages: one that is all about Hosted APM Server, and one that is focused more on APM Agents, where APM Server is configured transparently in order to service the agents.

@jalvz (Contributor) commented Oct 22, 2020

Ok, so let's elaborate that. The integrations page would offer something like:

[screenshot]

Click on the left one:

[screenshot]

Again, that is just an example. But I was also thinking that it would make sense (eventually) to include agent settings not supported by central config?
Anyways, move on to the APM Server integration:

[screenshot]

Now maybe it is more obvious what I mentioned about the confusing UI:

  • We have 2 levels ("Collect application traces" -> "APM Server") that we don't need. We can change the titles, but there is no way around the 2 levels. I wouldn't bet on a "simple" fix for that in Ingest Manager, but maybe I am wrong.
  • In addition to the problematic toggle switch for traces, we now have a problematic toggle switch for the APM Server input as well, which doesn't make sense.

Having 2 similar-yet-different ways to install/manage APM Server means that we need 2 ways of handling configuration coming from Elastic agent in the apm-server code, more documentation, user confusion, etc.

All in all, I think the option with best trade-offs is still the initial suggestion.

@jalvz (Contributor) commented Oct 22, 2020

Answering myself here and cross-posting my confusion: we might need the apm-server input after all, as otherwise templates will not be installed if there are no inputs...

@jalvz (Contributor) commented Nov 23, 2020

@jalvz (Contributor) commented Dec 1, 2020

Update 01 Dec

Work that was WIP at the time of the last status update has been finished. This means: Kibana support for top-level variables, a way for us to generate a package out of our fields.yml, and some code refactoring to have data streams and ILM not interfere with each other. This is needed e.g. to ensure that setup subcommands don't work when data streams are enabled.

The main body of work currently WIP is #4473. The most important thing that it brings to the package are pipelines.
APM Server uses pipelines to drop span metadata (usually not needed), add an ingest timestamp, and enrich events with user agent and geo data.
There was a misunderstanding: the apm package expected to define pipelines outside of data streams (following the described format), because some of them need to run for different event types. However, that feature is not actually implemented. We should be able to move forward by copying the pipelines to each relevant data stream, so this is not a blocker.
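
For reference, the enrichment described above could be expressed as a per-data-stream ingest pipeline along these lines (a simplified sketch, not the actual apm-server pipeline definitions):

```yaml
# Hypothetical ingest_pipeline/default.yml for one data stream.
description: Simplified APM enrichment pipeline
processors:
  - set:                       # add an ingest timestamp
      field: event.ingested
      value: "{{_ingest.timestamp}}"
  - user_agent:                # parse the raw user agent string
      field: user_agent.original
      ignore_missing: true
  - geoip:                     # enrich with geo data from the client IP
      field: client.ip
      target_field: client.geo
      ignore_missing: true
```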

The templates currently generated by the apm package seem all right. Once we wire the result of kibana/83878 down to the apm-server config reloader and verify that the whole thing (hopefully!) works, we can finally copy the package to elastic/integrations and take it from there.

@jalvz (Contributor) commented Dec 8, 2020

Update 08 Dec

Work that was WIP at the time of the last status update has been finished. It took a couple of rounds of fixes to get pipelines working, and with the addition of 4486 we now have all the pieces together.
Installation, ingestion, and the APM UI (with index setting changes) seem to work fine together, but more exhaustive testing is needed.

I opened apm-server issues to iterate on docs and testing, and to add icons/screenshots (to be worked on next). Also filed package-registry/659 to improve pipeline support in Fleet.

First package should come to the snapshot registry hopefully soon via package-storage/688, which is now waiting on a pending registry release.

@ph (Contributor) commented Dec 10, 2020

Thanks @jalvz for the updates!

@jalvz (Contributor) commented Dec 16, 2020

Update 16 Dec

Work that was WIP at the time of the last status update has been finished. We have the APM package version 0.1.0-dev.1 in the snapshot registry, and 0.1.0-dev.2 is on its way, bringing more configuration options (and an icon).

We identified and fixed a permission issue in Fleet, which now needs an enhancement to smooth out the experience for 7.12.

There is now support in apm-integration-testing to run apm-server under Agent, and we will continue to iterate on that in the upcoming weeks and flip the switch to run apm-server under Agent by default (that is, in apm-integration-testing).

There has also been good progress this week adding tests in apm-server for the package; and last but not least, a new proposal for sourcemapping is out for review (this already has a POC implementation).

In the server weekly today we decided not to release the APM package in 7.11, and instead use a whole release cycle to improve, test, and play with it. The reason is that the package as-is doesn't bring a huge benefit to existing APM users; they would just have fewer features and a bit more friction.
All code is in place and has been backported to 7.x.

Therefore I will close this issue soon and open a new meta issue for tracking 7.12 goals.

@jalvz (Contributor) commented Dec 16, 2020

Forgot to mention that there is an ongoing discussion about whether or not to bundle apm-server with Agent.

I'll close this now in favor of #4558, thanks for following 👋
