[Infra] Create service in apm_data_access to fetch APM collected hosts #189046

Closed

Conversation

crespocarlos
Contributor

@crespocarlos crespocarlos commented Jul 24, 2024

Closes #188752

Summary

Create a service in APM data access that returns host names collected by APM agents. This change is key to allowing the Hosts view to list only hosts that are monitored with the system module #188756.

I've made some changes to the Hosts API, simplifying the code. The payload has been slightly changed.

How to test

  • Start local Kibana and Elasticsearch instances
  • Run node scripts/synthtrace infra_hosts_with_apm_hosts --live
  • Navigate to Infrastructure > Hosts

@crespocarlos crespocarlos added release_note:skip Skip the PR/issue when compiling release notes Feature:ObsHosts Hosts feature within Observability Team:obs-ux-infra_services Observability Infrastructure & Services User Experience Team v8.16.0 labels Jul 24, 2024
@crespocarlos
Contributor Author

/ci

@crespocarlos
Contributor Author

/ci

@crespocarlos
Contributor Author

@elasticmachine merge upstream

@crespocarlos
Contributor Author

/ci

@crespocarlos crespocarlos marked this pull request as ready for review July 25, 2024 09:14
@crespocarlos crespocarlos requested review from a team as code owners July 25, 2024 09:14
@elasticmachine
Contributor

Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)

@botelastic botelastic bot added the ci:project-deploy-observability Create an Observability project label Jul 25, 2024
@crespocarlos
Contributor Author

@elasticmachine merge upstream

const getCoreStart = () => core.getStartServices().then(([coreStart]) => coreStart);
const getResourcesForServices = async () => {
  const coreStart = await getCoreStart();
  const soClient = coreStart.savedObjects.createInternalRepository();

Contributor

Have you tried changing the APM index patterns on the APM settings page and checking whether the correct indices are returned with this createInternalRepository?

Contributor Author

Just tested it and it worked. APM also uses it here. It could be because Kibana does a full page refresh when the APM indices are changed on the Settings page.

@crespocarlos
Contributor Author

@elasticmachine merge upstream

const { apmIndices, esClient } = await params.getResourcesForServices();

const esResponse = await esClient.search({
  index: [apmIndices.metric, apmIndices.transaction, apmIndices.span, apmIndices.error],

Member

This will be slow, because you're aggregating over all data. Even if it's just a single service, you'll run into performance issues. What would be more efficient here is if you figure out what data source to query first (tx metrics or raw events, 1m vs 10m and 60m), and then execute it.

Contributor Author

that would be the same thing that get_document_sources does. We'd have to move that to apm_data_access. Is that your idea?

Member

yeah, how involved would that be?

Contributor Author

it's a relatively big change. A few things would have to be moved (could be more):

document_type, get_document_sources and the types they depend on.

Member

ok, sounds fine to do it in a follow-up but I'd like to make it happen, to set a good first example

Contributor Author

yeah, makes sense. I'm going to get a sense of how much effort it would be. It might well be doable in this PR.
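
The exchange above suggests deciding which data source to query (transaction metrics or raw events, and which rollup interval) before running the aggregation. A minimal sketch of that selection step could look like the following. The `DocumentSource` shape, the `pickDocumentSource` helper, and the selection rule are assumptions made for illustration; this is not the logic of get_document_sources.

```typescript
// Hypothetical types for illustration; not the APM plugin's actual definitions.
type DocumentType = 'transactionMetric' | 'transactionEvent';
type RollupInterval = 'none' | '1m' | '10m' | '60m';

interface DocumentSource {
  documentType: DocumentType;
  rollupInterval: RollupInterval;
  hasDocs: boolean; // whether this source holds data for the requested time range
}

const INTERVAL_SECONDS: Record<RollupInterval, number> = {
  none: 0,
  '1m': 60,
  '10m': 600,
  '60m': 3600,
};

// Fallback when no pre-aggregated source qualifies: query raw events.
const RAW_EVENTS: DocumentSource = {
  documentType: 'transactionEvent',
  rollupInterval: 'none',
  hasDocs: true,
};

// Assumed rule: prefer the coarsest metric rollup that has data and is still
// at least as fine-grained as the bucket size the caller needs.
function pickDocumentSource(
  sources: DocumentSource[],
  bucketSizeInSeconds: number
): DocumentSource {
  const candidates = sources
    .filter(
      (source) =>
        source.documentType === 'transactionMetric' &&
        source.hasDocs &&
        INTERVAL_SECONDS[source.rollupInterval] <= bucketSizeInSeconds
    )
    .sort((a, b) => INTERVAL_SECONDS[b.rollupInterval] - INTERVAL_SECONDS[a.rollupInterval]);

  return candidates[0] ?? RAW_EVENTS;
}

// Example input: 1m and 10m rollups have data, the 60m rollup does not.
const availableSources: DocumentSource[] = [
  { documentType: 'transactionMetric', rollupInterval: '1m', hasDocs: true },
  { documentType: 'transactionMetric', rollupInterval: '10m', hasDocs: true },
  { documentType: 'transactionMetric', rollupInterval: '60m', hasDocs: false },
];

const chosenSource = pickDocumentSource(availableSources, 900);
```

Only the chosen source would then be passed to the search, rather than aggregating over the metric, transaction, span, and error indices at once.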


export function createGetHostNames(params: RegisterParams) {
  return async ({ from: timeFrom, to: timeTo, query, limit }: ServicesHostNamesParams) => {
    const { apmIndices, esClient } = await params.getResourcesForServices();

Member

I think we should stay away from patterns like this, and expect the consumer to pass in asynchronously resolved dependencies, such as getting the APM indices. If we don't do that, we'll end up with a ton of small requests - I see this happen with e.g. the ML plugin that does a ton of small requests because it encapsulates privilege and other checks in each service call. For instance: #161229

Contributor Author

Yeah, I was just having second thoughts about this approach.

Contributor Author

@crespocarlos crespocarlos Jul 29, 2024

though it's kind of weird asking the consumer to pass the APM indices to a service in the apm_data_access plugin.

Member

it is a little weird, and there might be a better way (e.g. to have a per-request service that does per-request caching, but that also might create some risks).

Contributor Author

@crespocarlos crespocarlos Jul 29, 2024

I can't really see the risks of getting the APM indices when calling a service in apm_data_access. What could happen? If we just get the APM indices there, do you think it would still end up in the same situation as the ML plugin?

I'm more concerned about the fact I'm using the internal user to get both apm indices and typed esClient.

Contributor Author

One downside is potentially requesting APM indices twice.

Member

> One downside is potentially requesting APM indices twice.

This is my (only) concern. It might end up being more than twice, however. Especially in the context of e.g. rule executors this becomes more important. My experience is that if we abstract away async operations, people lose awareness of what is happening as part of the service call, and it leads to performance bottlenecks. I'd be less concerned about this if we had really good insight into production bottlenecks, but we don't.

Contributor Author

For lack of better alternatives, I'll leave it up to the consumer to send the APM indices for now. It's bad DX, but for the time being it's better than ending up in a situation where it might lead to performance problems.
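
The outcome of this thread, where the consumer resolves the APM indices and passes them in, could be sketched roughly as below. Everything here is hypothetical and for illustration only: the `ApmIndices` shape, the `createGetHostNamesRequest` and `handleRequest` names, and the placeholder index patterns are not taken from the PR.

```typescript
// Hypothetical shape; the index pattern values used below are placeholders.
interface ApmIndices {
  metric: string;
  transaction: string;
  span: string;
  error: string;
}

interface HostNamesRequest {
  index: string[];
  from: number;
  to: number;
}

// The service receives already-resolved indices, so calling it triggers no
// hidden async lookups.
function createGetHostNamesRequest(apmIndices: ApmIndices) {
  return ({ from, to }: { from: number; to: number }): HostNamesRequest => ({
    index: [apmIndices.metric, apmIndices.transaction, apmIndices.span, apmIndices.error],
    from,
    to,
  });
}

// Simulated async lookup that counts how often it runs.
let indexLookups = 0;
async function resolveApmIndices(): Promise<ApmIndices> {
  indexLookups += 1;
  return {
    metric: 'placeholder-metrics-*',
    transaction: 'placeholder-transactions-*',
    span: 'placeholder-spans-*',
    error: 'placeholder-errors-*',
  };
}

// Consumer side: resolve the indices once per request and reuse them for
// every service call made while handling that request.
async function handleRequest(): Promise<HostNamesRequest[]> {
  const apmIndices = await resolveApmIndices();
  const buildRequest = createGetHostNamesRequest(apmIndices);
  return [buildRequest({ from: 0, to: 1000 }), buildRequest({ from: 1000, to: 2000 })];
}
```

The trade-off matches the one discussed above: the consumer has to know about the APM indices, but the number of lookups stays visible and under the caller's control.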

aggs: {
  hostNames: {
    terms: {
      field: 'host.name',

Member

there is a constant for this somewhere (HOST_NAME).

Contributor Author

not in the apm_data_access. we need to either move the constants from apm plugin here or create a new constants file.

Member

yeah, let's move the constants then

Contributor Author

I'll do that in a follow-up PR; moving them will touch over 250 files.

Member

package might be better at that point, WDYT?

Contributor Author

yeah. When doing this #189046 (comment), some types could also be moved to this new package.
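
As a rough illustration of the suggestion in this thread, the terms aggregation could reference a shared constant instead of the string literal. In this sketch the constant is declared locally, since the thread notes it does not yet exist in apm_data_access. The `buildHostNamesAggs` and `extractHostNames` helpers are hypothetical, and mapping `limit` to the aggregation's `size` is an assumption.

```typescript
// Declared locally for the sketch; in the thread this would come from a
// shared constants file or package.
const HOST_NAME = 'host.name';

// Builds the aggregation part of the search request.
// Assumption: `limit` is used as the terms aggregation `size`.
function buildHostNamesAggs(limit: number) {
  return {
    hostNames: {
      terms: {
        field: HOST_NAME,
        size: limit,
      },
    },
  };
}

// Minimal shape of a terms aggregation result: one bucket per distinct value.
interface TermsBucket {
  key: string;
  doc_count: number;
}

// Collects the host names from the aggregation buckets.
function extractHostNames(buckets: TermsBucket[] | undefined): string[] {
  return (buckets ?? []).map((bucket) => bucket.key);
}

const exampleAggs = buildHostNamesAggs(500);
const exampleHostNames = extractHostNames([
  { key: 'host-a', doc_count: 12 },
  { key: 'host-b', doc_count: 3 },
]);
```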

@crespocarlos crespocarlos requested a review from a team as a code owner July 29, 2024 21:07
@kibana-ci
Collaborator

kibana-ci commented Jul 29, 2024

💔 Build Failed

Failed CI Steps

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| apmDataAccess | 9 | 10 | +1 |

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| infra | 1.5MB | 1.5MB | -87.0B |

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| apmDataAccess | 0 | 1 | +1 |

Unknown metric groups

API count

| id | before | after | diff |
| --- | --- | --- | --- |
| apmDataAccess | 9 | 10 | +1 |

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@crespocarlos
Contributor Author

I'll open a new PR for this.

Labels
ci:project-deploy-observability Create an Observability project Feature:ObsHosts Hosts feature within Observability release_note:skip Skip the PR/issue when compiling release notes Team:obs-ux-infra_services Observability Infrastructure & Services User Experience Team v8.16.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Infra][APM] Create a service in apm_data_access to provide APM collected host names
7 participants