fix: reset metadata fetcher init error after successful sync #10360

kruskall · 2023-02-27T00:33:18Z

Motivation/summary

From the related issue:

If the new Elasticsearch sourcemap fetcher fails to initialise, then it sets an error that is never cleared, and all future attempts to fetch will fail with that error. Temporary issues, such as Elasticsearch being unavailable during startup, will render the fetcher permanently broken.

Checklist

Update CHANGELOG.asciidoc
Update package changelog.yml (only if changes to apmpackage have been made)
Documentation has been updated

For functional changes, consider:

Is it observable through the addition of either logging or metrics?
Is its use being published in telemetry to enable product improvement?
Have system tests been added to avoid regression?

How to test these changes

Related issues

Related to #10338

mergify · 2023-02-27T00:33:52Z

This pull request does not have a backport label. Could you fix it @kruskall? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

backport-7.x is the label to automatically backport to the 7.x branch.
backport-7./d is the label to automatically backport to the 7./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

apmmachine · 2023-02-27T00:47:31Z

💚 Build Succeeded

Build stats

Start Time: 2023-02-28T12:54:50.202+0000
Duration: 10 min 42 sec

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.
/package : Generate and publish the docker images.
/test windows : Build & tests on Windows.
run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

apmmachine · 2023-02-27T00:47:35Z

📚 Go benchmark report

Diff with the main branch

goos: linux
goarch: amd64
pkg: github.com/elastic/apm-server/internal/agentcfg
cpu: 12th Gen Intel(R) Core(TM) i5-12500
                                  │ build/main/bench.out │             bench.out              │
                                  │        sec/op        │    sec/op     vs base              │
FetchAndAdd/FetchFromCache-12               46.13n ± ∞ ¹   46.20n ± ∞ ¹  +0.15% (p=0.040 n=5)
geomean                                     69.03n         69.42n        +0.56%
¹ need >= 6 samples for confidence interval at level 0.95

                                  │ build/main/bench.out │              bench.out              │
                                  │         B/op         │    B/op      vs base                │
geomean                                                ³                +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

                                  │ build/main/bench.out │              bench.out              │
                                  │      allocs/op       │  allocs/op   vs base                │
geomean                                                ³                +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

pkg: github.com/elastic/apm-server/internal/beater/request
                                             │ build/main/bench.out │             bench.out              │
                                             │        sec/op        │    sec/op     vs base              │
ContextResetContentEncoding/uncompressed-12            159.6n ± ∞ ¹   160.4n ± ∞ ¹  +0.50% (p=0.040 n=5)
geomean                                                893.4n         894.6n        +0.14%
¹ need >= 6 samples for confidence interval at level 0.95

                                             │ build/main/bench.out │               bench.out               │
                                             │         B/op         │     B/op       vs base                │
geomean                                                           ³                  +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

                                             │ build/main/bench.out │              bench.out              │
                                             │      allocs/op       │  allocs/op   vs base                │
geomean                                                           ³                +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

pkg: github.com/elastic/apm-server/internal/publish
             │ build/main/bench.out │          bench.out           │
             │        sec/op        │   sec/op     vs base         │
¹ need >= 6 samples for confidence interval at level 0.95

             │ build/main/bench.out │           bench.out            │
             │         B/op         │     B/op       vs base         │
¹ need >= 6 samples for confidence interval at level 0.95

             │ build/main/bench.out │           bench.out           │
             │      allocs/op       │  allocs/op    vs base         │
¹ need >= 6 samples for confidence interval at level 0.95

pkg: github.com/elastic/apm-server/x-pack/apm-server/aggregation/spanmetrics
                 │ build/main/bench.out │           bench.out           │
                 │        sec/op        │    sec/op     vs base         │
¹ need >= 6 samples for confidence interval at level 0.95

                 │ build/main/bench.out │            bench.out             │
                 │         B/op         │     B/op       vs base           │
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

                 │ build/main/bench.out │           bench.out            │
                 │      allocs/op       │  allocs/op   vs base           │
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

pkg: github.com/elastic/apm-server/x-pack/apm-server/aggregation/txmetrics
                        │ build/main/bench.out │           bench.out           │
                        │        sec/op        │    sec/op     vs base         │
¹ need >= 6 samples for confidence interval at level 0.95

                        │ build/main/bench.out │           bench.out            │
                        │         B/op         │    B/op      vs base           │
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

                        │ build/main/bench.out │           bench.out            │
                        │      allocs/op       │  allocs/op   vs base           │
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

pkg: github.com/elastic/apm-server/x-pack/apm-server/sampling
               │ build/main/bench.out │             bench.out              │
               │        sec/op        │    sec/op     vs base              │
geomean                  598.2n         618.9n        +3.47%
¹ need >= 6 samples for confidence interval at level 0.95

               │ build/main/bench.out │               bench.out               │
               │         B/op         │     B/op       vs base                │
geomean                             ³                  +0.05%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

               │ build/main/bench.out │              bench.out              │
               │      allocs/op       │  allocs/op   vs base                │
geomean                             ³                +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

pkg: github.com/elastic/apm-server/x-pack/apm-server/sampling/eventstorage
                                            │ build/main/bench.out │              bench.out              │
                                            │        sec/op        │    sec/op     vs base               │
ReadEvents/nop_codec/1000_events-12                  1004.0µ ± ∞ ¹   880.6µ ± ∞ ¹  -12.29% (p=0.008 n=5)
geomean                                               31.68µ         31.14µ         -1.71%
¹ need >= 6 samples for confidence interval at level 0.95

                                            │ build/main/bench.out │               bench.out                │
                                            │         B/op         │      B/op       vs base                │
geomean                                              31.37Ki          31.37Ki        -0.02%
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

                                            │ build/main/bench.out │              bench.out               │
                                            │      allocs/op       │  allocs/op    vs base                │
geomean                                                144.7          144.7        +0.00%
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

report generated with https://pkg.go.dev/golang.org/x/perf/cmd/benchstat

axw

This is an improvement, but I still think we can do better.

Can you please respond to my comment here? #10338 (comment)

What is the benefit of the initial ping? Why do we block in SourcemapFetcher.Fetch? Would it be a problem if we just returned ErrUnavailable until the metadata cache is populated? As it is, there's a window where SourcemapFetcher.Fetch may behave as if the cache is empty, after the initial ping succeeds but before the sync completes.

axw

LGTM. I think we should probably remove initErr in #10367 though, and only mark the metadata fetcher as ready when it has successfully synced. Any errors would be logged, but not returned to the SourcemapFetcher.

kruskall · 2023-03-01T08:28:46Z

@Mergifyio backport 8.7

mergify · 2023-03-01T08:29:02Z

backport 8.7

✅ Backports have been created

#10398 fix: reset metadata fetcher init error after successful sync (backport #10360) has been created for branch 8.7

fix: reset metadata fetcher init error after successful sync

73bda9b

mergify bot added the backport-skip label Feb 27, 2023

axw reviewed Feb 27, 2023

Merge branch 'main' into fix/metadata-fetcher-reset-initerr

b6f3da3

kruskall mentioned this pull request Feb 28, 2023

sourcemap: failing to ping ES within 1s in metadata fetcher renders fetcher permanently broken #10338

Closed

axw approved these changes Feb 28, 2023

Merge branch 'main' into fix/metadata-fetcher-reset-initerr

063c766

kruskall enabled auto-merge (squash) February 28, 2023 12:54

kruskall merged commit f3e8719 into elastic:main Feb 28, 2023

kruskall deleted the fix/metadata-fetcher-reset-initerr branch February 28, 2023 14:22

mergify bot mentioned this pull request Mar 1, 2023

fix: reset metadata fetcher init error after successful sync (backport #10360) #10398

Merged

mergify bot pushed a commit that referenced this pull request Mar 1, 2023

fix: reset metadata fetcher init error after successful sync (#10360)

4ef6cb2

(cherry picked from commit f3e8719)

kruskall added a commit that referenced this pull request Mar 1, 2023

fix: reset metadata fetcher init error after successful sync (#10360) (…

0bbbecd

…#10398) (cherry picked from commit f3e8719) Co-authored-by: kruskall <99559985+kruskall@users.noreply.github.com>

carsonip added test-plan v8.7.1 v8.8.0 and removed v8.7.1 labels Apr 26, 2023

simitt assigned lahsivjar May 2, 2023

lahsivjar added v8.7.0 and removed test-plan v8.8.0 labels May 4, 2023
