Skip to content

Conversation

@swiatekm
Copy link
Contributor

What does this PR do?

For all variable provider types, we store mappings as untyped dictionaries and only convert them to AST when we're about to generate actual vars. This means we have to recompute all of it every time there's a change to any of the provider data, even if all the other data is unaffected. For example, if a Kubernetes Pod gains a new label, we need to regenerate vars for all Pods.

Instead, we can store mappings as AST to begin with. Then the only cost of emitting new vars after a change is updating a bunch of references.

Why is it important?

This unnecessary recomputation can be quite expensive when there's a lot of mappings from the dynamic provider. See the benchstat report below:

goos: linux
goarch: amd64
pkg: github.com/elastic/elastic-agent/internal/pkg/composable
cpu: 13th Gen Intel(R) Core(TM) i7-13700H
                       │ bench_main.txt │          bench_dynamic.txt          │
                       │     sec/op     │   sec/op     vs base                │
GenerateVars100Pods-20     755.49µ ± 3%   76.87µ ± 2%  -89.82% (p=0.000 n=10)

                       │ bench_main.txt │          bench_dynamic.txt           │
                       │      B/op      │     B/op      vs base                │
GenerateVars100Pods-20    735.00Ki ± 0%   98.69Ki ± 0%  -86.57% (p=0.000 n=10)

                       │ bench_main.txt │          bench_dynamic.txt          │
                       │   allocs/op    │  allocs/op   vs base                │
GenerateVars100Pods-20     21.335k ± 0%   1.364k ± 0%  -93.61% (p=0.000 n=10)

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas

Related issues

@swiatekm swiatekm added enhancement New feature or request skip-changelog backport-8.15 Automated backport to the 8.15 branch with mergify backport-8.x Automated backport to the 8.x branch with mergify backport-8.16 Automated backport with mergify backport-8.17 Automated backport with mergify labels Nov 21, 2024
@swiatekm swiatekm requested a review from a team as a code owner November 21, 2024 12:49
@swiatekm swiatekm mentioned this pull request Nov 21, 2024
2 tasks
@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label Nov 21, 2024
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@swiatekm swiatekm force-pushed the fix/dynamic-provider-ast branch from 3c62d14 to 8cd14c8 Compare November 21, 2024 14:03
@swiatekm
Copy link
Contributor Author

/test

@swiatekm swiatekm force-pushed the fix/dynamic-provider-ast branch from 8cd14c8 to 6860e91 Compare November 21, 2024 18:34
@elastic-sonarqube
Copy link

func (a *AST) Equal(other *AST) bool {
if a.root == nil || other.root == nil {
return a.root == other.root
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit 🙂

Suggested change
}
}
// nil check above is mandatory as calling Hash() on a nil may cause a panic

Copy link
Contributor

@pkoutsovasilis pkoutsovasilis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code change LGTM. the CI failures are due to TestOtelAPMIngestion (#4148) and TestRpmFleetUpgrade. The last one fails because it doesn't download in time the elastic-agent-8.17.0-SNAPSHOT-x86_64.rpm artifact with a context deadline exceeded (code ref)

cc @ycombinator should we create an issue about the latter?

@ycombinator
Copy link
Contributor

the CI failures are due to TestOtelAPMIngestion (#4148) and TestRpmFleetUpgrade. The last one fails because it doesn't download in time the elastic-agent-8.17.0-SNAPSHOT-x86_64.rpm artifact with a context deadline exceeded (code ref)

cc @ycombinator should we create an issue about the latter?

I'm not seeing these tests failing on main so either something in this PR is causing them to fail (unlikely) or they are flaky.
I just re-ran the integration tests step. If it passes, it means those tests are flaky and we should create an issue for each of them.

@swiatekm swiatekm removed the backport-8.15 Automated backport to the 8.15 branch with mergify label Nov 22, 2024
Copy link
Contributor

@blakerouse blakerouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great performance improvement! Nicely done.

@swiatekm swiatekm merged commit fe0f6b0 into main Nov 27, 2024
15 checks passed
@swiatekm swiatekm deleted the fix/dynamic-provider-ast branch November 27, 2024 15:44
mergify bot pushed a commit that referenced this pull request Nov 27, 2024
* Store AST directly in dynamic provider mappings

* Store AST directly in context provider mappings

(cherry picked from commit fe0f6b0)
mergify bot pushed a commit that referenced this pull request Nov 27, 2024
* Store AST directly in dynamic provider mappings

* Store AST directly in context provider mappings

(cherry picked from commit fe0f6b0)
mergify bot pushed a commit that referenced this pull request Nov 27, 2024
* Store AST directly in dynamic provider mappings

* Store AST directly in context provider mappings

(cherry picked from commit fe0f6b0)

# Conflicts:
#	internal/pkg/composable/controller.go
swiatekm added a commit that referenced this pull request Nov 27, 2024
* Store AST directly in dynamic provider mappings

* Store AST directly in context provider mappings

(cherry picked from commit fe0f6b0)

Co-authored-by: Mikołaj Świątek <mail@mikolajswiatek.com>
swiatekm added a commit that referenced this pull request Nov 27, 2024
* Store AST directly in dynamic provider mappings

* Store AST directly in context provider mappings

(cherry picked from commit fe0f6b0)

Co-authored-by: Mikołaj Świątek <mail@mikolajswiatek.com>
ycombinator pushed a commit that referenced this pull request Dec 11, 2024
* Store AST directly in dynamic provider mappings

* Store AST directly in context provider mappings

(cherry picked from commit fe0f6b0)

# Conflicts:
#	internal/pkg/composable/controller.go
swiatekm added a commit that referenced this pull request Dec 13, 2024
(cherry picked from commit fe0f6b0)

# Conflicts:
#	internal/pkg/composable/controller.go

# Conflicts:
#	internal/pkg/composable/controller.go
swiatekm added a commit that referenced this pull request Dec 13, 2024
(cherry picked from commit fe0f6b0)

# Conflicts:
#	internal/pkg/composable/controller.go

# Conflicts:
#	internal/pkg/composable/controller.go

Co-authored-by: Mikołaj Świątek <mail@mikolajswiatek.com>
orestisfl added a commit to orestisfl/elastic-agent that referenced this pull request Jan 24, 2025
swiatekm added a commit that referenced this pull request Jan 24, 2025
We only signal from providers if the emitted data is different from the
current state. This
caused a bug where the data wasn't ever emitted if it was empty. The
provider controller now distinguishes between no data (yet) and empty
data. As a result, it's again possible to effectively disable all
providers by using a local provider with no data.

This is a regression caused by
#6114. It doesn't apply to
the main and 8.x branches due to
#6169 refactoring provider
initialisation in those.
swiatekm added a commit that referenced this pull request Jan 27, 2025
We only signal from providers if the emitted data is different from the
current state. This
caused a bug where the data wasn't ever emitted if it was empty. The
provider controller now distinguishes between no data (yet) and empty
data. As a result, it's again possible to effectively disable all
providers by using a local provider with no data.

This is a regression caused by
#6114. It doesn't apply to
the main and 8.x branches due to
#6169 refactoring provider
initialisation in those.
mergify bot pushed a commit that referenced this pull request Jan 27, 2025
We only signal from providers if the emitted data is different from the
current state. This
caused a bug where the data wasn't ever emitted if it was empty. The
provider controller now distinguishes between no data (yet) and empty
data. As a result, it's again possible to effectively disable all
providers by using a local provider with no data.

This is a regression caused by
#6114. It doesn't apply to
the main and 8.x branches due to
#6169 refactoring provider
initialisation in those.

(cherry picked from commit 8ea9439)
swiatekm added a commit that referenced this pull request Jan 27, 2025
We only signal from providers if the emitted data is different from the
current state. This
caused a bug where the data wasn't ever emitted if it was empty. The
provider controller now distinguishes between no data (yet) and empty
data. As a result, it's again possible to effectively disable all
providers by using a local provider with no data.

This is a regression caused by
#6114. It doesn't apply to
the main and 8.x branches due to
#6169 refactoring provider
initialisation in those.

(cherry picked from commit 8ea9439)

Co-authored-by: Mikołaj Świątek <mail@mikolajswiatek.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-8.x Automated backport to the 8.x branch with mergify backport-8.16 Automated backport with mergify backport-8.17 Automated backport with mergify enhancement New feature or request skip-changelog Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants