Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instrument Kibana with Elastic APM #43548

Open
wants to merge 7 commits into
base: master
from

Conversation

@watson
Copy link
Member

commented Aug 19, 2019

Summary

Instruments Kibana with Elastic APM by adding the Node.js and JavaScript RUM agents to the source code. The agents are not turned on by default but can be enabled by setting the environment variable ELASTIC_APM_ACTIVE=true or by creating an apm config file called config/apm.dev.js and setting active: true inside of it.

This implementation is not meant to be used by end-users of Kibana as it lacks integration with the regular Kibana config file. For now, this is meant as a useful internal tool for Elastic employees when developing Kibana.

By default, it's pre-configured with a serverUrl pointing to an APM Server hosted on Elastic Cloud. The data is stored in an ES cluster accessible only by Elastic employees. These defaults can easily be overwritten using environment variables or via the custom config file.

Closes #32491

Checklist

  • Add yarn apm snapshot command to boot up local APM Server Update: Turns out this is pretty complicated, so let's not do this during this PR
  • Integrate with native Kibana logger Update: The native Kibana logger doesn't support %s, %d etc.
  • Figure out why span stack traces cannot be captured (fixed by elastic/apm-agent-nodejs#1299)
  • Get rid of "Elastic APM agent is inactive due to configuration" message in console when agent isn't active (fixed by elastic/apm-agent-nodejs#1300)
  • Allow config file to be loaded via config option (fixed by elastic/apm-agent-nodejs#1303)
  • Lock down Kibana when an uncaught exception occurs so it can be sent without Kibana accepts any incoming HTTP requests etc Update: This doesn't seem feasible in this iteration
  • Release new version of the Node.js agent with the above changes and bump the dependency here
  • Send all data collected during development to a shared APM Server and Kibana instance
  • Add Elastic APM RUM agent Update: pushed to a separate PR
  • This was checked for cross-browser compatibility, including a check against IE11
  • Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n support
  • Documentation was added for features that require explanation or tutorials
  • Unit or functional tests were updated or added to match the most common scenarios
  • This was checked for keyboard-only and screenreader accessibility

For maintainers

@watson watson force-pushed the watson:apm branch from 1d9671e to 462ed94 Aug 19, 2019

@elasticmachine

This comment was marked as outdated.

Copy link
Contributor

commented Aug 19, 2019

@elasticmachine

This comment was marked as outdated.

Copy link
Contributor

commented Aug 20, 2019

@elasticmachine

This comment was marked as outdated.

Copy link
Contributor

commented Aug 21, 2019

@watson watson self-assigned this Aug 22, 2019

@watson watson marked this pull request as ready for review Aug 23, 2019

@watson watson requested a review from elastic/kibana-platform as a code owner Aug 23, 2019

@watson watson force-pushed the watson:apm branch 2 times, most recently from 04d00aa to 458104e Aug 23, 2019

@elasticmachine

This comment was marked as outdated.

Copy link
Contributor

commented Aug 23, 2019

@elasticmachine

This comment was marked as outdated.

Copy link
Contributor

commented Aug 23, 2019

@watson watson force-pushed the watson:apm branch from 458104e to ebf465e Aug 26, 2019

@elasticmachine

This comment was marked as outdated.

Copy link
Contributor

commented Aug 26, 2019

@watson watson force-pushed the watson:apm branch from ebf465e to 268c8cd Aug 26, 2019

@elasticmachine

This comment was marked as outdated.

Copy link
Contributor

commented Aug 26, 2019

@watson watson force-pushed the watson:apm branch from 268c8cd to d8b8eae Sep 2, 2019

@watson watson force-pushed the watson:apm branch from d8b8eae to 2d94ee3 Sep 2, 2019

@watson watson requested a review from elastic/kibana-operations as a code owner Sep 2, 2019

@watson watson force-pushed the watson:apm branch from 2d94ee3 to 8178d9e Sep 2, 2019

@elasticmachine

This comment was marked as outdated.

Copy link
Contributor

commented Sep 2, 2019

@watson watson force-pushed the watson:apm branch from 8178d9e to 5a6f4df Sep 2, 2019

@elasticmachine

This comment was marked as outdated.

Copy link
Contributor

commented Sep 2, 2019

@mistic
mistic approved these changes Sep 3, 2019
Copy link
Member

left a comment

LGTM from the operations team changes perspective

@jbudz

This comment has been minimized.

Copy link
Contributor

commented Sep 3, 2019

I know there was some discussion here and will defer to @elastic/kibana-security, but I'd push for the default localhost or the apm server to allow anonymous requests. Mostly to avoid us showing up on any of the lists that scrape github.

@legrego

This comment has been minimized.

Copy link
Contributor

commented Sep 3, 2019

I know there was some discussion here and will defer to @elastic/kibana-security, but I'd push for the default localhost or the apm server to allow anonymous requests. Mostly to avoid us showing up on any of the lists that scrape github.

I agree with @jbudz that we should either default to localhost, or include no default and require the developer to specify their own endpoint if they choose to enable APM. End-users would need to do this anyway if/when we make this a public feature, so I don't feel like it's that big of an ask. Exposing the URL via environment variable makes it easy for folks to setup their development environment once, without messing with their config files if they don't want to.

@tylersmalley

This comment has been minimized.

Copy link
Member

commented Sep 3, 2019

@legrego, the idea for when this would be made public would be that we would be the ones collecting this information to understand the performance metrics of Kibana in the wild. Having the default provided seems to be the only way to accomplish this. Is there a security issue we see with including this default?

@tylersmalley

This comment has been minimized.

Copy link
Member

commented Sep 3, 2019

Tested the PR, looks good to me. There are a couple things which would be nice to have that can be a follow-up PR which I am happy to do if it's not done here:

  • Enable for CI: We can set environment variables here
  • Add hostname/IP tagging to help isolate your own metrics.
@watson

This comment has been minimized.

Copy link
Member Author

commented Sep 3, 2019

Add hostname/IP tagging to help isolate your own metrics.

We already label all collected APM data with the Kibana uuid. You can use the query bar in the APM UI to filter for your data using this uuid. We also collect the hostname, but that's not guaranteed to be unique.

@watson

This comment has been minimized.

Copy link
Member Author

commented Sep 3, 2019

Enable for CI [...]

@tylersmalley Is this so that we also get APM data collected from the Kibana instances running under CI? That would be interesting and give is a lot of coverage hopefully. All that's needed is to set ELASTIC_APM_ACTIVE=true

@tylersmalley

This comment has been minimized.

Copy link
Member

commented Sep 3, 2019

Yeah, just exporting that in setup.sh is all which is needed for that.

@watson

This comment has been minimized.

Copy link
Member Author

commented Sep 3, 2019

@jbudz @legrego From a security perspective I see two areas worth discussing:

  1. We expose the secretToken to the APM Server which means that someone could send us garbage data. However, they will not be able to read any data, only write, so this would at worst be annoying. After discussing this with other Kibana developers we decided to postpone worrying about this only if it ever became a problem. But I'm open to reconsidering this if it's necessary.
  2. We want to centrally collect performance metrics of core Kibana developers running Kibana on their development machines so that we can get a better picture of the Kibana performance bottlenecks. To get as many core devs to do this as possible we decided to bake in the serverUrl of a shared cluster. Alternatively, each dev would need configure this themselves, which we deemed would lead to fewer people doing it. It's worth noting that this feature is opt-in as you need to activate the APM agent before it collects any data and I've tried to document what will happen if you do so. Could you possibly explain the risks of this approach a bit more if you think it's a blocker?
@legrego

This comment has been minimized.

Copy link
Contributor

commented Sep 3, 2019

@legrego, the idea for when this would be made public would be that we would be the ones collecting this information to understand the performance metrics of Kibana in the wild. Having the default provided seems to be the only way to accomplish this. Is there a security issue we see with including this default?

We want to centrally collect performance metrics of core Kibana developers running Kibana on their development machines so that we can get a better picture of the Kibana performance bottlenecks. To get as many core devs to do this as possible we decided to bake in the serverUrl of a shared cluster. Alternatively, each dev would need configure this themselves, which we deemed would lead to fewer people doing it. It's worth noting that this feature is opt-in as you need to activate the APM agent before it collects any data and I've tried to document what will happen if you do so

I see, thanks for clarifying @tylersmalley & @watson. I was under the impression that this would allow users to ship their data to their own APM instances for their own use, and that the hard-coded URL was a shortcut for developers to get up and running. This sounds more like an extension of our telemetry collection then, although it's configurable in such a way that end-users could ship to their own clusters if they wanted to.

(tl;dr I should have read the very well written docs a bit closer, sorry about that @watson)

We expose the secretToken to the APM Server which means that someone could send us garbage data. However, they will not be able to read any data, only write, so this would at worst be annoying. After discussing this with other Kibana developers we decided to postpone worrying about this only if it ever became a problem. But I'm open to reconsidering this if it's necessary.

I'm struggling with the secretToken -- by specifying this in a public repository, it's effectively not secret. Does it buy us anything to keep it in knowing that it's a public value? I worry that it'd create noise like @jbudz alluded to.

@tylersmalley

This comment has been minimized.

Copy link
Member

commented Sep 3, 2019

I see with the RUM agent we only require the server URL. What is the reason the RUM agent doesn't require the secret when the rest of the agents do?

Regarding being able to send garbage data. I just want to point out that this is the case with most analytics services. For instance, with Google Analytics you must embed your tracking ID into the front-end code which is used to identify the site. You could use this to send garbage data as well.

@watson

This comment has been minimized.

Copy link
Member Author

commented Sep 3, 2019

Regarding the secret token being in the source code:

Yes, this is a bit of a hack. Normally an APM Server can be configured to not require a secret token, but since we host this specific APM Server on Elastic Cloud, I don't think it's possible to change that part of the configuration. But I might be mistaken.

The JS RUM agent uses a different endpoint than the Node.js agent and since the RUM agent is designed to be baked into applications that end-users have access to, there's no way to effectively hide a secret token inside the source code. So we simply don't require a secret token for requests to the RUM intake API on the APM Server. Instead, that API is more heavily rate limited.

@legrego

This comment has been minimized.

Copy link
Contributor

commented Sep 3, 2019

Yes, this is a bit of a hack. Normally an APM Server can be configured to not require a secret token, but since we host this specific APM Server on Elastic Cloud, I don't think it's possible to change that part of the configuration. But I might be mistaken.

I see. How do you feel about documenting something to this effect where we define the secretToken, so that casual observers don't try to file security reports against it?

@vigneshshanmugam vigneshshanmugam referenced this pull request Sep 4, 2019
7 of 13 tasks complete

@watson watson force-pushed the watson:apm branch from 5a6f4df to d455b4e Sep 10, 2019

@watson

This comment has been minimized.

Copy link
Member Author

commented Sep 10, 2019

@tylersmalley I think I activated the apm agent during CI runs now in setup.sh. But I'm not 100% sure I did it in the right location. Let me know if you think it's ok.

@legrego I added a disclaimer as you suggested next to the hardcoded secretToken

src/dev/ci_setup/setup.sh Outdated Show resolved Hide resolved
src/dev/ci_setup/setup.sh Outdated Show resolved Hide resolved
watson and others added 5 commits Aug 19, 2019
Fix export of APM ci env vars
Co-Authored-By: Tyler Smalley <tylersmalley@me.com>

@watson watson force-pushed the watson:apm branch from 28565f1 to c277b5f Sep 10, 2019

@elasticmachine

This comment was marked as outdated.

Copy link
Contributor

commented Sep 10, 2019

@elasticmachine

This comment was marked as outdated.

Copy link
Contributor

commented Sep 10, 2019

@tylersmalley

This comment has been minimized.

Copy link
Member

commented Sep 11, 2019

Well, the failures we were seeing in master are resolved - but appears there are still issues with this branch. Looks like the failures started when we enabled APM in CI. I am also seeing some errors:

(node:48893) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
 error  [23:02:20.695] [warning][process] Authorization Exception :: {"path":"/.apm-agent-configuration","query":{},"statusCode":403,"response":""}
    at respond (/Users/tyler/elastic/kibana/node_modules/elasticsearch/src/lib/transport.js:315:15)
    at checkRespForFailure (/Users/tyler/elastic/kibana/node_modules/elasticsearch/src/lib/transport.js:274:7)
    at HttpConnector.<anonymous> (/Users/tyler/elastic/kibana/node_modules/elasticsearch/src/lib/connectors/http.js:166:7)
    at IncomingMessage.wrapper (/Users/tyler/elastic/kibana/node_modules/elasticsearch/node_modules/lodash/lodash.js:4929:19)
    at IncomingMessage.emit (events.js:194:15)
    at endReadableNT (_stream_readable.js:1103:12)
    at process._tickCallback (internal/process/next_tick.js:63:19)

I can help look into it further tomorrow.

@watson

This comment has been minimized.

Copy link
Member Author

commented Sep 11, 2019

@tylersmalley ah yes, I actually recall seeing this previously when the APM agent was active by default all the time. Not sure what's going on either, but I'll try to have a look as well. But last I checked I couldn't figure it out, so I'd love to spar with you on this one if you have time at some point.

@elasticmachine

This comment has been minimized.

Copy link
Contributor

commented Sep 11, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.