fix: reset metadata fetcher init error after successful sync #10360

Merged
merged 3 commits into from
Feb 28, 2023

kruskall
Member

Motivation/summary

From the related issue:

If the new Elasticsearch sourcemap fetcher fails to initialise, it sets an error that is never cleared, and all future fetch attempts fail with that error. Temporary issues, such as Elasticsearch being unavailable during startup, therefore render the fetcher permanently broken.

Checklist

For functional changes, consider:

  • Is it observable through the addition of either logging or metrics?
  • Is its use being published in telemetry to enable product improvement?
  • Have system tests been added to avoid regression?

How to test these changes

Related issues

Related to #10338

@mergify
Contributor

mergify bot commented Feb 27, 2023

This pull request does not have a backport label. Could you fix it @kruskall? 🙏
To fix up this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-7.x is the label to automatically backport to the 7.x branch.
  • backport-7./d is the label to automatically backport to the 7./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip label (Skip notification from the automated backport with mergify) on Feb 27, 2023
@apmmachine
Collaborator

apmmachine commented Feb 27, 2023

💚 Build Succeeded

Build stats

  • Start Time: 2023-02-28T12:54:50.202+0000

  • Duration: 10 min 42 sec

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate and publish the docker images.

  • /test windows : Build & tests on Windows.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@apmmachine
Collaborator

apmmachine commented Feb 27, 2023

📚 Go benchmark report

Diff with the main branch

goos: linux
goarch: amd64
pkg: github.com/elastic/apm-server/internal/agentcfg
cpu: 12th Gen Intel(R) Core(TM) i5-12500
                                  │ build/main/bench.out │             bench.out              │
                                  │        sec/op        │    sec/op     vs base              │
FetchAndAdd/FetchFromCache-12               46.13n ± ∞ ¹   46.20n ± ∞ ¹  +0.15% (p=0.040 n=5)
geomean                                     69.03n         69.42n        +0.56%
¹ need >= 6 samples for confidence interval at level 0.95

                                  │ build/main/bench.out │              bench.out              │
                                  │         B/op         │    B/op      vs base                │
geomean                                                ³                +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

                                  │ build/main/bench.out │              bench.out              │
                                  │      allocs/op       │  allocs/op   vs base                │
geomean                                                ³                +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

pkg: github.com/elastic/apm-server/internal/beater/request
                                             │ build/main/bench.out │             bench.out              │
                                             │        sec/op        │    sec/op     vs base              │
ContextResetContentEncoding/uncompressed-12            159.6n ± ∞ ¹   160.4n ± ∞ ¹  +0.50% (p=0.040 n=5)
geomean                                                893.4n         894.6n        +0.14%
¹ need >= 6 samples for confidence interval at level 0.95

                                             │ build/main/bench.out │               bench.out               │
                                             │         B/op         │     B/op       vs base                │
geomean                                                           ³                  +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

                                             │ build/main/bench.out │              bench.out              │
                                             │      allocs/op       │  allocs/op   vs base                │
geomean                                                           ³                +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

pkg: github.com/elastic/apm-server/internal/publish
             │ build/main/bench.out │          bench.out           │
             │        sec/op        │   sec/op     vs base         │
¹ need >= 6 samples for confidence interval at level 0.95

             │ build/main/bench.out │           bench.out            │
             │         B/op         │     B/op       vs base         │
¹ need >= 6 samples for confidence interval at level 0.95

             │ build/main/bench.out │           bench.out           │
             │      allocs/op       │  allocs/op    vs base         │
¹ need >= 6 samples for confidence interval at level 0.95

pkg: github.com/elastic/apm-server/x-pack/apm-server/aggregation/spanmetrics
                 │ build/main/bench.out │           bench.out           │
                 │        sec/op        │    sec/op     vs base         │
¹ need >= 6 samples for confidence interval at level 0.95

                 │ build/main/bench.out │            bench.out             │
                 │         B/op         │     B/op       vs base           │
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

                 │ build/main/bench.out │           bench.out            │
                 │      allocs/op       │  allocs/op   vs base           │
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

pkg: github.com/elastic/apm-server/x-pack/apm-server/aggregation/txmetrics
                        │ build/main/bench.out │           bench.out           │
                        │        sec/op        │    sec/op     vs base         │
¹ need >= 6 samples for confidence interval at level 0.95

                        │ build/main/bench.out │           bench.out            │
                        │         B/op         │    B/op      vs base           │
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

                        │ build/main/bench.out │           bench.out            │
                        │      allocs/op       │  allocs/op   vs base           │
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

pkg: github.com/elastic/apm-server/x-pack/apm-server/sampling
               │ build/main/bench.out │             bench.out              │
               │        sec/op        │    sec/op     vs base              │
geomean                  598.2n         618.9n        +3.47%
¹ need >= 6 samples for confidence interval at level 0.95

               │ build/main/bench.out │               bench.out               │
               │         B/op         │     B/op       vs base                │
geomean                             ³                  +0.05%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

               │ build/main/bench.out │              bench.out              │
               │      allocs/op       │  allocs/op   vs base                │
geomean                             ³                +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

pkg: github.com/elastic/apm-server/x-pack/apm-server/sampling/eventstorage
                                            │ build/main/bench.out │              bench.out              │
                                            │        sec/op        │    sec/op     vs base               │
ReadEvents/nop_codec/1000_events-12                  1004.0µ ± ∞ ¹   880.6µ ± ∞ ¹  -12.29% (p=0.008 n=5)
geomean                                               31.68µ         31.14µ         -1.71%
¹ need >= 6 samples for confidence interval at level 0.95

                                            │ build/main/bench.out │               bench.out                │
                                            │         B/op         │      B/op       vs base                │
geomean                                              31.37Ki          31.37Ki        -0.02%
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

                                            │ build/main/bench.out │              bench.out               │
                                            │      allocs/op       │  allocs/op    vs base                │
geomean                                                144.7          144.7        +0.00%
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

report generated with https://pkg.go.dev/golang.org/x/perf/cmd/benchstat

Member

@axw axw left a comment

This is an improvement, but I still think we can do better.

Can you please respond to my comment here? #10338 (comment)

What is the benefit of the initial ping? Why do we block in SourcemapFetcher.Fetch? Would it be a problem if we just returned ErrUnavailable until the metadata cache is populated? As it is, there's a window where SourcemapFetcher.Fetch may behave as if the cache is empty, after the initial ping succeeds but before the sync completes.

Copy link
Member

@axw axw left a comment

LGTM. I think we should probably remove initErr in #10367 though, and only mark the metadata fetcher as ready when it has successfully synced. Any errors would be logged, but not returned to the SourcemapFetcher.

@kruskall kruskall enabled auto-merge (squash) February 28, 2023 12:54
@kruskall kruskall merged commit f3e8719 into elastic:main Feb 28, 2023
@kruskall kruskall deleted the fix/metadata-fetcher-reset-initerr branch February 28, 2023 14:22
@kruskall
Member Author

kruskall commented Mar 1, 2023

@Mergifyio backport 8.7

@mergify
Contributor

mergify bot commented Mar 1, 2023

backport 8.7

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Mar 1, 2023
kruskall added a commit that referenced this pull request Mar 1, 2023
…#10398)

(cherry picked from commit f3e8719)

Co-authored-by: kruskall <99559985+kruskall@users.noreply.github.com>
Labels
backport-skip (Skip notification from the automated backport with mergify), v8.7.0
5 participants