@ti-chi-bot

This is an automated cherry-pick of #798

What problem does this PR solve?

Issue Number: close #779

Problem Summary:
When the following sequence occurs, connScore, connCount, and the pending migration count become wrong, and a backend is never removed:

  1. A connection finishes redirecting, and the signal loop tries to notify the router but blocks on the router lock
  2. The connection is closing and acquires the router lock. It checks for a pending redirection result but finds nothing
  3. onConnClosed clears the connection and its scores, and then releases the router lock
  4. onRedirectFinished is then called and corrupts the scores and the connection counts

What is changed and how it works:
Properly fixing the concurrency problem requires refactoring. Because this fix will be merged into a stable version, I'm applying a minimal temporary fix instead.

  • Move redirectingAddr from the router to backendConnMgr, because the router's copy may be out of date once the redirection finishes
  • In onRedirectFinished, do not update the connection if it is already closed
  • Add a metric, tiproxy_balance_pending_migrate, to expose the number of pending migrations

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code
  1. Start a workload to create both short and long connections
  2. Start a chaos that makes the network between one TiDB and PD unstable so that migration happens
  3. Finish the workload
  4. Check the values of tiproxy_balance_pending_migrate and tiproxy_balance_b_score{factor="conn"}, both of which should be 0

Notable changes

  • Has configuration change
  • Has HTTP API interfaces change
  • Has tiproxyctl change
  • Other user behavior changes

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

- Fix the issue that connection scores may be wrong when a connection closes and migrates concurrently

@ti-chi-bot ti-chi-bot bot added the lgtm label May 20, 2025
@ti-chi-bot
ti-chi-bot bot commented May 20, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: djshow832

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot
ti-chi-bot bot commented May 20, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-05-20 07:03:40.398719243 +0000 UTC m=+72457.499886450: ☑️ agreed by djshow832.

@ti-chi-bot ti-chi-bot bot added the approved label May 20, 2025
@codecov-commenter
Codecov Report

Attention: Patch coverage is 52.17391% with 11 lines in your changes missing coverage. Please review.

Please upload report for BASE (release-1.3@8ba103e). Learn more about missing BASE report.

Files with missing lines                 Patch %   Lines
pkg/balance/router/router_score.go       60.00%    5 Missing and 1 partial ⚠️
pkg/proxy/backend/backend_conn_mgr.go    20.00%    2 Missing and 2 partials ⚠️
pkg/balance/router/router_static.go       0.00%    1 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff               @@
##             release-1.3     #799   +/-   ##
==============================================
  Coverage               ?   66.47%           
==============================================
  Files                  ?      122           
  Lines                  ?    11272           
  Branches               ?        0           
==============================================
  Hits                   ?     7493           
  Misses                 ?     3246           
  Partials               ?      533           
Flag Coverage Δ
unit 66.47% <52.17%> (?)

@ti-chi-bot ti-chi-bot bot merged commit a69bb7c into pingcap:release-1.3 May 20, 2025
6 checks passed