Skip to content

Conversation

@zhaohuabing
Copy link
Member

@zhaohuabing zhaohuabing commented Jan 7, 2026

This PR fixes a potential go routine leak by the runnerDone channel - the config loader starts a new go routine for every change and the go routine can block on runnersDone <- struct{}{} and linger forever. This change ensures those goroutines can exit cleanly.

@zhaohuabing zhaohuabing requested a review from a team as a code owner January 7, 2026 08:41
@zhaohuabing zhaohuabing marked this pull request as draft January 7, 2026 08:42
@codecov
Copy link

codecov bot commented Jan 7, 2026

Codecov Report

❌ Patch coverage is 0% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.78%. Comparing base (2a1b624) to head (e6b9318).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
internal/cmd/server.go 0.00% 4 Missing ⚠️

❌ Your patch status has failed because the patch coverage (0.00%) is below the target coverage (60.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7880      +/-   ##
==========================================
- Coverage   72.83%   72.78%   -0.06%     
==========================================
  Files         235      235              
  Lines       35176    35177       +1     
==========================================
- Hits        25622    25603      -19     
- Misses       7741     7757      +16     
- Partials     1813     1817       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@zhaohuabing zhaohuabing force-pushed the fix-leak-goroutine branch 2 times, most recently from 6799252 to ce86584 Compare January 7, 2026 08:57
@zhaohuabing zhaohuabing marked this pull request as ready for review January 7, 2026 08:57
Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>
@netlify
Copy link

netlify bot commented Jan 7, 2026

Deploy Preview for cerulean-figolla-1f9435 ready!

Name Link
🔨 Latest commit e6b9318
🔍 Latest deploy log https://app.netlify.com/projects/cerulean-figolla-1f9435/deploys/695e98f6a733b000088b104a
😎 Deploy Preview https://deploy-preview-7880--cerulean-figolla-1f9435.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@arkodg
Copy link
Contributor

arkodg commented Jan 7, 2026

hey @zhaohuabing the diff looks good, has this been manually tested to make sure there's no indefinite block

@jukie
Copy link
Contributor

jukie commented Jan 7, 2026

/retest

@zhaohuabing
Copy link
Member Author

hey @zhaohuabing the diff looks good, has this been manually tested to make sure there's no indefinite block

Yes, I've manually tested this PR in my local kind cluster by changing the envoy-gateway-config configMap.

This is also covered by e2e tests - the e2e tests apply the envoy-gateway-config configMap after deploying the EG.

@zhaohuabing zhaohuabing merged commit 051bc2f into envoyproxy:main Jan 8, 2026
54 of 59 checks passed
zirain pushed a commit to zirain/gateway that referenced this pull request Jan 8, 2026
fix goroutine leak

Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>
Signed-off-by: zirain <zirain2009@gmail.com>
@zhaohuabing zhaohuabing deleted the fix-leak-goroutine branch January 8, 2026 09:55
rudrakhp pushed a commit to rudrakhp/gateway that referenced this pull request Jan 8, 2026
fix goroutine leak

Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>
Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>
zirain added a commit that referenced this pull request Jan 9, 2026
* fix: set observedGeneration in envoy patch policy (#7715)

* fix: set observedGeneration in envoy patch policy

Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com>

* add release note

Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com>

---------

Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com>
Signed-off-by: zirain <zirain2009@gmail.com>

* fix: add validation for request buffer limit (#7687)

Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com>
Signed-off-by: zirain <zirain2009@gmail.com>

* fix: setting externalTrafficPolicy for NodePort service type (#7823)

Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>
Signed-off-by: zirain <zirain2009@gmail.com>

* fix: make port-forward worked for OTel collector on port 19001 (#7860)

Signed-off-by: zirain <zirain2009@gmail.com>

* chore: fix goroutine leak (#7880)

fix goroutine leak

Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>
Signed-off-by: zirain <zirain2009@gmail.com>

* fix gen

Signed-off-by: zirain <zirain2009@gmail.com>

* bump envoy to 1.35.8

Signed-off-by: zirain <zirain2009@gmail.com>

---------

Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com>
Signed-off-by: zirain <zirain2009@gmail.com>
Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>
Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>
Co-authored-by: Kota Kimura <86363983+kkk777-7@users.noreply.github.com>
Co-authored-by: Rudrakh Panigrahi <rudrakh97@gmail.com>
Co-authored-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com>
rudrakhp added a commit that referenced this pull request Jan 9, 2026
* fix: set observedGeneration in envoy patch policy (#7715)

* fix: set observedGeneration in envoy patch policy

Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com>

* add release note

Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com>

---------

Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com>
Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>

* fix: add validation for request buffer limit (#7687)

Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com>
Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>

* fix: nil pointer error when applying BackendTrafficPolicy to HTTPRoute with no backendRefs (#7765)

* fix: checking route section name in backend traffic policy

Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com>
Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>

* fix: setting externalTrafficPolicy for NodePort service type (#7823)

Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>

* fix: add indexing and processing for CRL references in ClientTrafficPolicies (#7829)

Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>

* feat: change the benchmark report to json format (#6818)

* benchmark json output

Signed-off-by: zirain <zirain2009@gmail.com>

* fix

Signed-off-by: zirain <zirain2009@gmail.com>

* fix

Signed-off-by: zirain <zirain2009@gmail.com>

* fix lint

Signed-off-by: zirain <zirain2009@gmail.com>

* fix

Signed-off-by: zirain <zirain2009@gmail.com>

* revert

Signed-off-by: zirain <zirain2009@gmail.com>

* fix seconds

Signed-off-by: zirain <zirain2009@gmail.com>

---------

Signed-off-by: zirain <zirain2009@gmail.com>
Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>

* bechmark: scale up RPS to test data plane CPU performance (#7810)

* Scale up RPS to test data plane CPU performance

Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>

* set duration to 120s

Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>

* discard invalid samples

Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>

* change scrape interval to 10s

Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>

* remove invalid cpu sampling data

Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>

* reduce duration to 60

Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>

* fix benchmark end time

Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>

* fix data plane benchmark start time

Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>

* increase test time to get more samples

Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>

* adjust rps for each scale

Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>

* address comments

Signed-off-by: Huabing(Robin) Zhao <zhaohuabing@gmail.com>

---------

Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>
Signed-off-by: Huabing(Robin) Zhao <zhaohuabing@gmail.com>
Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>

* fix: make port-forward worked for OTel collector on port 19001 (#7860)

Signed-off-by: zirain <zirain2009@gmail.com>
Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>

* chore: fix goroutine leak (#7880)

fix goroutine leak

Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>
Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>

* fix gen-check

Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>

---------

Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com>
Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>
Signed-off-by: zirain <zirain2009@gmail.com>
Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>
Signed-off-by: Huabing(Robin) Zhao <zhaohuabing@gmail.com>
Co-authored-by: Kota Kimura <86363983+kkk777-7@users.noreply.github.com>
Co-authored-by: zirain <zirain2009@gmail.com>
Co-authored-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com>
runnersDone := make(chan struct{})
var hookWG sync.WaitGroup
hook := func(c context.Context, cfg *config.Server) error {
hookWG.Add(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is causing a race condition in the main routine. Waitgroup.Add is not supposed to be called from other goroutine than the waiter. @envoyproxy/gateway-maintainers @nacx unfortunately this is introduced and released already in v1.6.2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been suggesting since the last year that EG definitely needs to have a -race flag enabled end to end tests. There has been too many race condition like this being fixed and being introduced multiple times. #5539

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words the so-called "goroutine leak" is not fixed with this PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, before this, the chan would be block if the config load more than once.
I would say we fix the goroutinu leak, but introuduce another race.
Hope #7964 will fix it.

andreik-n2 pushed a commit to andreik-n2/gateway that referenced this pull request Jan 15, 2026
fix goroutine leak

Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>
@zirain zirain mentioned this pull request Jan 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants