Skip to content

mssqlserver: various test fixes#4304

Open
mmatczuk wants to merge 6 commits intomainfrom
mmt/mssqlserver-fixes-v2
Open

mssqlserver: various test fixes#4304
mmatczuk wants to merge 6 commits intomainfrom
mmt/mssqlserver-fixes-v2

Conversation

@mmatczuk
Copy link
Copy Markdown
Contributor

Commits

  • guard Connect against concurrent calls
  • fix flaky CDC streaming tests under x86 emulation
  • fix flaky CDC checkpoint resume test under x86 emulation
  • tolerate context.Canceled in resume test stream goroutine
  • store checkpoint LSN as varbinary
  • synchronise ordering test stream goroutine

Jira

  • CON-382
  • CON-383
  • CON-389
  • CON-414
  • CON-437

The benthos framework can call Connect() concurrently or re-enter it
before the background goroutine has stopped. Without a guard, two
goroutines share the same publisher and checkpoint tracker, corrupting
state and causing the streaming phase to hang indefinitely.

Add a mutex and a HasStoppedChan check so that a second Connect() is
a no-op while the previous goroutine is still running.

Fixes CON-389
The CDC capture agent in SQL Server under x86 emulation on Apple Silicon
is too slow to process streaming inserts within the test timeout,
especially with 3 parallel subtests each spinning up their own container.

- Add WaitForCDCChanges helper that polls CDC change tables until the
  expected row count appears, replacing fixed time.Sleep(5s)
- Reduce streaming insert count from 3000 to 30 (snapshot already
  validates volume at 3000 rows)
- Wait for CDC to process streaming inserts before asserting

Fixes CON-414
Same CDC agent slowness issue as CON-414. The test inserts rows and
expects them via pure CDC streaming, but under emulation the capture
agent is too slow.

- Reduce rows per phase from 1000 to 100
- Wait for CDC to process inserts before starting the stream
- Remove fixed time.Sleep in favor of deterministic CDC polling

Fixes CON-382
When StopWithin cancels the resume stream context, Run returns
context.Canceled. Match the first launch's handling so the goroutine
does not fail the test on expected shutdown.

Fixes CON-437
The CDC checkpoint cache stored LSNs in a varchar(100) column. When
writing a []byte to a varchar, go-mssqldb performs an implicit
binary-to-hex-string conversion (e.g. "0x0000002d00000bc3b80003"), so
reads returned the ASCII representation of the LSN rather than the
original binary value. On resume, those ASCII bytes were passed as a
varbinary parameter to the change-table query and, starting with 0x30,
sorted above any real LSN (which starts with 0x00), leaving the stream
spinning with no rows matched.

Change the cache column and stored procedure parameter to varbinary(10)
so the driver transfers the LSN as-is.

Fixes CON-382
The ordering-of-iterator test's background goroutine called t.Error if
stream.Run returned a non-context.Canceled error. When the test failed
early or timed out, the goroutine could still be running and fire
t.Error after the test function had returned, panicking and surfacing
as an unnamed package-level event.

Park the Run error on a buffered channel and drain it in t.Cleanup
(after calling StopWithin), so late errors are reported against the
still-active test instead of a completed one.

Fixes CON-383
@claude
Copy link
Copy Markdown

claude Bot commented Apr 20, 2026

Commits
LGTM

Review
Targeted fixes for a cluster of mssqlserver CDC flakiness issues: LSN stored as varbinary (root cause for resume hang), Connect guarded against concurrent re-entry, integration tests polled via a new WaitForCDCChanges helper instead of sleeping, and two test goroutines synchronised so late errors report to the still-active test.

LGTM

This was referenced Apr 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants