fix data race in bft deliverer#5464

Merged
C0rWin merged 1 commit into
hyperledger:mainfrom
pfi79:fix-bft-deliverer
May 10, 2026

Conversation

@pfi79
Contributor

@pfi79 pfi79 commented Apr 25, 2026

Logs from the test TestBFTDeliverer_CensorshipMonitorEvents/repeated censorship events, with exponential backoff show that two instances of FetchBlocks can run at the same time (the lines marked with * are the entries to and exits from FetchBlocks).

* 2026-05-02 07:27:05.984 UTC 3bf1 DEBU [BFTDeliverer.test] FetchBlocks -> Trying to fetch blocks from orderer: orderer-address-3
2026-05-02 07:27:05.989 UTC 3c05 DEBU [BFTDeliverer.test] handleFetchAndCensorshipEvents -> Entry 
2026-05-02 07:27:05.989 UTC 3c07 INFO [BlockReceiver] Start -> BlockReceiver starting orderer-address=orderer-address-3
2026-05-02 07:27:05.992 UTC 3c09 INFO [BFTDeliverer.test] func3 -> monitor error channel returns censorship error num: 15
2026-05-02 07:27:05.993 UTC 3c0d DEBU [BFTDeliverer.test] handleFetchAndCensorshipEvents -> Error received from censorshipMonitor.ErrorsChannel: censorship 15 
2026-05-02 07:27:05.993 UTC 3c0e WARN [BFTDeliverer.test] handleFetchAndCensorshipEvents -> Censorship suspicion: censorship 15; going to retry fetching blocks from another orderer 
2026-05-02 07:27:05.993 UTC 3c0f INFO [BlockReceiver] Stop -> BlockReceiver stopped orderer-address=orderer-address-3
2026-05-02 07:27:05.993 UTC 3c10 WARN [BFTDeliverer.test] retryBackoff -> Failed to fetch blocks, count=15, round=3, going to retry in 800ms 
2026-05-02 07:27:05.993 UTC 3c14 DEBU [BFTDeliverer.test] handleFetchAndCensorshipEvents -> Entry 
* 2026-05-02 07:27:05.994 UTC 3c16 DEBU [BFTDeliverer.test] FetchBlocks -> Trying to fetch blocks from orderer: orderer-address-4
2026-05-02 07:27:05.996 UTC 3c24 INFO [BlockReceiver] Start -> BlockReceiver starting orderer-address=orderer-address-4
2026-05-02 07:27:05.994 UTC 3c1c INFO [BlockReceiver] ProcessIncoming -> BlockReceiver got a signal to stop orderer-address=orderer-address-3
2026-05-02 07:27:05.997 UTC 3c28 WARN [BlockReceiver] func1 -> Encountered an error reading from deliver stream: fake-recv-step-error orderer-address=orderer-address-3
* 2026-05-02 07:27:05.998 UTC 3c31 DEBU [BFTDeliverer.test] FetchBlocks -> BlockReceiver stopped while processing incoming blocks: got a signal to stop 
2026-05-02 07:27:06.003 UTC 3c33 INFO [BFTDeliverer.test] func3 -> monitor error channel returns censorship error num: 16
2026-05-02 07:27:06.004 UTC 3c37 DEBU [BFTDeliverer.test] handleFetchAndCensorshipEvents -> Error received from censorshipMonitor.ErrorsChannel: censorship 16
2026-05-02 07:27:06.004 UTC 3c38 WARN [BFTDeliverer.test] handleFetchAndCensorshipEvents -> Censorship suspicion: censorship 16; going to retry fetching blocks from another orderer
2026-05-02 07:27:06.004 UTC 3c39 INFO [BlockReceiver] ProcessIncoming -> BlockReceiver got a signal to stop orderer-address=orderer-address-4
2026-05-02 07:27:06.005 UTC 3c3e WARN [BlockReceiver] func1 -> Encountered an error reading from deliver stream: fake-recv-step-error orderer-address=orderer-address-4
* 2026-05-02 07:27:06.006 UTC 3c42 DEBU [BFTDeliverer.test] FetchBlocks -> BlockReceiver stopped while processing incoming blocks: got a signal to stop
2026-05-02 07:27:06.007 UTC 3c43 INFO [BlockReceiver] Stop -> BlockReceiver stopped orderer-address=orderer-address-4
2026-05-02 07:27:06.007 UTC 3c44 WARN [BFTDeliverer.test] retryBackoff -> Failed to fetch blocks, count=16, round=4, going to retry in 1.6s 
2026-05-02 07:27:06.007 UTC 3c48 DEBU [BFTDeliverer.test] handleFetchAndCensorshipEvents -> Entry 

As a result, several fields of the BFTDeliverer struct (fetchErrorsC and blockReceiver) are accessed by both instances concurrently and can end up in an inconsistent state.
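To illustrate why overlapping runs are a problem for a field such as fetchErrorsC, here is a minimal sketch. It does not reproduce the Fabric code: the deliverer type, startFetch, and staleErrorDemo are hypothetical names, and the two attempts are interleaved sequentially so the outcome is deterministic, whereas in the real code the accesses happen on separate goroutines.

```go
package main

import "fmt"

// deliverer is a hypothetical stand-in for BFTDeliverer; only the shared
// error channel field is modelled here.
type deliverer struct {
	fetchErrorsC chan error
}

// startFetch models the start of one fetch attempt. It replaces the shared
// channel and returns a function that reports the attempt's result through
// whatever channel the field holds at the time of the call.
func (d *deliverer) startFetch(source string) (report func()) {
	d.fetchErrorsC = make(chan error, 1)
	return func() {
		d.fetchErrorsC <- fmt.Errorf("fetch from %s stopped", source)
	}
}

// staleErrorDemo interleaves two attempts the way the log above suggests:
// the second attempt starts before the first one has reported.
func staleErrorDemo() string {
	d := &deliverer{}
	reportOld := d.startFetch("orderer-address-3")
	_ = d.startFetch("orderer-address-4") // overwrites the shared channel

	// The first attempt finishes late and writes into the second attempt's channel.
	reportOld()

	// A reader waiting on the current attempt receives the stale error.
	return (<-d.fetchErrorsC).Error()
}

func main() {
	fmt.Println(staleErrorDemo())
}
```

The reader that is waiting on the attempt against orderer-address-4 receives the error that belongs to orderer-address-3, because both attempts resolve the channel through the same struct field.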

Solution:

  1. Explicitly pass the fetchErrorsC channel to FetchBlocks.
  2. Add another call to blockReceiver.Stop().
  3. Change the logging. Logs written through testing.T are not printed immediately; they accumulate and are printed at the end of the test. Consequently, log lines from different packages appear out of chronological order in the output.
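The first item can be sketched as follows. This is an illustration under assumptions, not the code from this PR: fetchBlocks and overlappingAttempts are hypothetical names, and the real FetchBlocks signature is not shown in this conversation. The idea is that each attempt reports only through the channel it was handed, so overlapping attempts cannot pick up each other's channel.

```go
package main

import "fmt"

// fetchBlocks is a hypothetical stand-in for FetchBlocks. It reports through
// the channel passed as an argument and never reads shared state to find it.
func fetchBlocks(source string, errC chan<- error) {
	errC <- fmt.Errorf("fetch from %s stopped", source)
}

// overlappingAttempts runs two attempts concurrently, each with its own
// channel, and returns the error text each channel delivered.
func overlappingAttempts() (first, second string) {
	firstC := make(chan error, 1)
	secondC := make(chan error, 1)

	go fetchBlocks("orderer-address-3", firstC)
	go fetchBlocks("orderer-address-4", secondC)

	// Each channel can only ever carry the result of its own attempt.
	return (<-firstC).Error(), (<-secondC).Error()
}

func main() {
	first, second := overlappingAttempts()
	fmt.Println(first)
	fmt.Println(second)
}
```

Because the channel is an argument rather than a struct field, the caller decides which channel belongs to which attempt, and a late result from an old attempt lands on the old channel.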

@pfi79 pfi79 requested a review from a team as a code owner April 25, 2026 10:59
@pfi79 pfi79 marked this pull request as draft April 29, 2026 21:53
@tock-ibm
Contributor

tock-ibm commented May 3, 2026

@pfi79 Can you share the race detector output?

@pfi79
Contributor Author

pfi79 commented May 3, 2026

> @pfi79 Can you share the race detector output?

There is a data race, but the race detector does not catch it reliably.
It reported the race only once, and the problem is more complicated than that. I'm still working on it.

@tock-ibm
Contributor

tock-ibm commented May 3, 2026

@pfi79 Yes, very well! We are dealing with a similar issue in fabric-x, so I'll keep an eye on what is going on here. In fabric-x the race detector complains in unit tests some of the time. Not sure if it is a test flake or a production code issue, though. I assume test flake.

Signed-off-by: Fedor Partanskiy <fredprtnsk@gmail.com>
@pfi79 pfi79 force-pushed the fix-bft-deliverer branch from 12c5262 to cb4f3c8 Compare May 4, 2026 08:28
@pfi79 pfi79 marked this pull request as ready for review May 4, 2026 08:55
@pfi79
Contributor Author

pfi79 commented May 4, 2026

> @pfi79 Yes, very well! We are dealing with a similar issue in fabric-x, so I'll keep an eye on what is going on here. In fabric-x the race detector complains in unit tests some of the time. Not sure if it is a test flake or a production code issue, though. I assume test flake.

Figured it out. I added a description above.

@C0rWin C0rWin merged commit dadd739 into hyperledger:main May 10, 2026
16 checks passed
@pfi79 pfi79 deleted the fix-bft-deliverer branch May 10, 2026 09:51

3 participants