🐛 [Bug]: Isolation Issue with Parallel Subtests #2871

sixcolors · 2024-02-21T20:34:07Z

An issue that appears throughout the tests. Running tests in parallel using the t.Parallel() function can show problems when the tests that are not isolated. When tests share a state, such as a common instance of app, running them in parallel can lead to race conditions or other unexpected behavior leading to unpredictable results.

To fix this issue [with false results in tests], we could refactor the tests and make sure that each test has its isolated state.

If we want to eliminate repetitive code for setup and teardown for each parallel test, we could consider using the testify/suite package.

For example I ran into issues with the Test_Client_UserAgent.

Example, Test_Client_UserAgent original:

func Test_Client_UserAgent(t *testing.T) {
	t.Parallel()

	ln := fasthttputil.NewInmemoryListener()

	app := New()

	app.Get("/", func(c Ctx) error {
		return c.Send(c.Request().Header.UserAgent())
	})

	go func() {
		require.NoError(t, app.Listener(ln, ListenConfig{
			DisableStartupMessage: true,
		}))
	}()

	t.Run("default", func(t *testing.T) {
		t.Parallel()
		for i := 0; i < 5; i++ {
			a := Get("http://example.com")

			a.HostClient.Dial = func(_ string) (net.Conn, error) { return ln.Dial() }

			code, body, errs := a.String()

			require.Equal(t, StatusOK, code)
			require.Equal(t, defaultUserAgent, body)
			require.Empty(t, errs)
		}
	})

	t.Run("custom", func(t *testing.T) {
		t.Parallel()
		for i := 0; i < 5; i++ {
			c := AcquireClient()
			c.UserAgent = "ua"

			a := c.Get("http://example.com")

			a.HostClient.Dial = func(_ string) (net.Conn, error) { return ln.Dial() }

			code, body, errs := a.String()

			require.Equal(t, StatusOK, code)
			require.Equal(t, "ua", body)
			require.Empty(t, errs)
			ReleaseClient(c)
		}
	})
}

Originally posted by @sixcolors in #2864 (comment)

The issues found in three test files, namely app_test.go, client_test.go, and ctx_test.go.

The text was updated successfully, but these errors were encountered:

efectn · 2024-02-21T20:56:03Z

I've noticed a common issue that appears throughout the tests. Running tests in parallel using the t.Parallel() function can cause problems if the tests are not isolated correctly. When tests share a state, such as a common instance of app, running them in parallel can lead to race conditions. This means that one test could modify the state while another is reading it, leading to unpredictable results.

To fix this issue, we should refactor the tests and make sure that each test has its isolated state.

If we want to eliminate repetitive code for setup and teardown for each parallel test, we could consider using the testify/suite package.

For example, in the Test_Ctx_Parsers noted above, I also ran into issues with the Test_Client_UserAgent.

Example, Test_Client_UserAgent with a fix:
func Test_Client_UserAgent(t *testing.T) {
	t.Parallel()

	setupApp := func(t *testing.T) *fasthttputil.InmemoryListener {
		t.Helper()
		ln := fasthttputil.NewInmemoryListener()
		app := New()
		app.Get("/", func(c Ctx) error {
			return c.Send(c.Request().Header.UserAgent())
		})
		go func() {
			require.NoError(t, app.Listener(ln, ListenConfig{
				DisableStartupMessage: true,
			}))
		}()
		time.Sleep(100 * time.Millisecond) // wait for server to start
		return ln
	}

	t.Run("default", func(t *testing.T) {
		t.Parallel()
		ln := setupApp(t)
		for i := 0; i < 5; i++ {
			a := Get("http://example.com")

			a.HostClient.Dial = func(_ string) (net.Conn, error) { return ln.Dial() }

			code, body, errs := a.String()

			require.Equal(t, StatusOK, code)
			require.Equal(t, defaultUserAgent, body)
			require.Empty(t, errs)
		}
	})

	t.Run("custom", func(t *testing.T) {
		t.Parallel()
		ln := setupApp(t)
		for i := 0; i < 5; i++ {
			c := AcquireClient()
			c.UserAgent = "ua"

			a := c.Get("http://example.com")

			a.HostClient.Dial = func(_ string) (net.Conn, error) { return ln.Dial() }

			code, body, errs := a.String()

			require.Equal(t, StatusOK, code)
			require.Equal(t, "ua", body)
			require.Empty(t, errs)
			ReleaseClient(c)
		}
	})
}
Please let me know if you have any questions or concerns.

Originally posted by @sixcolors in #2864 (comment)

There are some issues with a particular pattern found in three test files, namely app_test.go, client_test.go, and ctx_test.go. However, it seems that earlydata_test.go is doing it correctly by using t.Helper(). Additionally, I noticed that the subtests in etag_test.go, session_test.go, and store_test.go are also using the subtest pattern, but they are not using shared instances as far as I can tell from a quick scan.

We should also detect the shared stuff between tests. Something may be wrong or just false positive.

Btw i agree about setupApp. Maybe we can create something like helper to reduce code duolication on tests/benchmarks

sixcolors · 2024-02-21T21:00:52Z

A similar problem was just noticed in internal/storage/memory/memory_test.go where var testStore = new is set globally in the package, and I got memory corruption issues, that randomly occurred. Since this pkg is using sync and a mutex, we should probably investigate the cause.

ReneWerner87 · 2024-02-21T21:47:22Z

we should investigate these cases and, in my opinion, not simply fix them through isolation

because normally it is a valid setup
if a server is up and running and processing multiple simultaneous requests, if something is shared there, then we should fix the problem rather than the test

ReneWerner87 · 2024-02-21T21:58:44Z

https://github.com/gofiber/fiber/blob/v2/ctx_test.go#L1376-L1467
there must be a bug somewhere in the merge branch or v3, the same test works perfectly in v2 and also in parallel

ReneWerner87 · 2024-02-21T22:27:23Z

we found the problem,
The fasthttp ctx is filled when the struct is created for the pool and is set to nil when it is packed back into the pool

If you take it out of the pool again it's bad
because it was only filled when the pool item was created and not, as with v2, when the item was configured from the pool

Will fix it tomorrow or the day after, so not isolating it has shown us a real world bug which also occurs in a real application when ctx is not completely recreated, but is reused from the pool

ReneWerner87 · 2024-02-21T22:35:06Z

v2

fiber/ctx.go

Line 181 in ddc6b23

c.fasthttp = fctx

fiber/ctx.go

Line 193 in ddc6b23

c.fasthttp = nil

v3

fiber/ctx_interface.go

Lines 446 to 452 in 26346d6

    
           func (app *App) AcquireCtx() Ctx { 
        
           	ctx, ok := app.pool.Get().(Ctx) 
        
           	if !ok { 
        
           		panic(errors.New("failed to type-assert to Ctx")) 
        
           	} 
        
           	return ctx 
        
           }

fiber/ctx_interface.go

Lines 455 to 458 in 26346d6

    
           func (app *App) ReleaseCtx(c Ctx) { 
        
           	c.release() 
        
           	app.pool.Put(c) 
        
           }

fiber/ctx_interface.go

Line 489 in 26346d6

c.fasthttp = nil

The problem here is that the fasthttp context is only set when new items are created for the pool, but not when items from the pool are reused, so we would have to move this to AcquireCtx

fiber/app.go

Lines 497 to 501 in 26346d6

    
           app.pool = sync.Pool{ 
        
           	New: func() any { 
        
           		return app.NewCtx(&fasthttp.RequestCtx{}) 
        
           	}, 
        
           }

We should also examine this process again

fiber/ctx_interface.go

Lines 461 to 484 in 26346d6

    
           func (c *DefaultCtx) Reset(fctx *fasthttp.RequestCtx) { 
        
           	// Reset route and handler index 
        
           	c.indexRoute = -1 
        
           	c.indexHandler = 0 
        
           	// Reset matched flag 
        
           	c.matched = false 
        
           	// Set paths 
        
           	c.pathOriginal = c.app.getString(fctx.URI().PathOriginal()) 
        
           	// Attach *fasthttp.RequestCtx to ctx 
        
           	c.fasthttp = fctx 
        
           	// reset base uri 
        
           	c.baseURI = "" 
        
           	// Set method 
        
           	c.method = c.app.getString(fctx.Request.Header.Method()) 
        
           	c.methodINT = c.app.methodInt(c.method) 
        
           	// Prettify path 
        
           	c.configDependentPaths() 
        
           }

ReneWerner87 · 2024-02-21T22:37:32Z

We also have to use the code of the Reset method in the AcquireCtx

sixcolors · 2024-02-22T04:28:32Z

@efectn

We should also detect the shared stuff between tests. Something may be wrong or just false positive.

@ReneWerner87

we should investigate these cases and, in my opinion, not simply fix them through isolation

because normally it is a valid setup
if a server is up and running and processing multiple simultaneous requests, if something is shared there, then we should fix the problem rather than the test

I appreciate the thoughtful discussion. My intention in raising the issue was to address what appeared to be false positives [failures] in the unit tests. I understand the importance of considering the validity of shared resources and agree that fixing the underlying problem was crucial. I didn't mean to suggest avoiding further investigation into the root cause.

What occurred to me is that the failures might be signaling a need for more granular unit testing with greater isolation, but I acknowledge that this approach could be impractical. Achieving effective coverage using more isolated unit tests could lead to requiring an extensive set of additional tests, covering such interaction scenarios with concurrency and other complexities. The current tests, though indirectly, did help identify an issue.

Thanks for your understanding. Open to further insights or discussion.

nickajacks1 · 2024-02-22T15:48:54Z

I think there are two important points to take home.

Testing concurrency is important. For example, recent changes by @gaby to testing & benchmarking revealed several concurrency issues in the templates project.
Having independent tests is important to prevent both false positives and false negatives. Here is a false positive example, 🚨 Test: fix race condition in parallel tests #2734 . A hypothetical false negative might be that you test that foo() creates a file called bar.txt, but that file was actually created in a separate test.

It seems to me that it might be good to at the very least comb over the existing tests to make sure that we are addressing both points. Given how many tests there are, that would probably be a lot of work.

gaby · 2024-02-23T05:25:07Z

@nickajacks1 I'm planning on starting to take a look at the benchmarks for the core and the middlewares.

Several middlewares are missing Benchmarks. We also need to add RunParallel() benchmarks. None of the benchmarks are asserting inside the loop which is what exposed the issue with the templates.

Will start tomorrow with 1-2 of them to see if anything raises, we can go from there.

leonklingele · 2024-02-23T05:49:18Z

I have made almost all tests work with t.Parallel() in #2469 which got closed unfortunately. Maybe some new work can partly be based upon that PR.

ReneWerner87 · 2024-02-28T06:49:48Z

bug fixed with 5a0167a

gaby · 2024-03-17T12:54:15Z

This is mostly fix by #2892 I'm going to double check today how many tests are not parallel.

ReneWerner87 · 2024-03-17T12:56:01Z

ok then no need for keeping this open

sixcolors added ☢️ Bug v3 v2 labels Feb 21, 2024

sixcolors assigned gaby, ReneWerner87, nickajacks1, sixcolors and efectn Feb 21, 2024

sixcolors mentioned this issue Feb 21, 2024

🐛[Bug]: internal/storage/memory/memory_test #2872

Closed

sixcolors removed the v2 label Feb 22, 2024

sixcolors unassigned gaby, sixcolors, ReneWerner87, efectn and nickajacks1 Feb 22, 2024

ReneWerner87 mentioned this issue Feb 28, 2024

♻️ refactor: Migrate HealthChecker to v3 #2884

Merged

13 tasks

gaby mentioned this issue Mar 3, 2024

🐛 [Bug]: Enable go-require for testify-lint #2891

Closed

3 tasks

ReneWerner87 closed this as completed Mar 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

🐛 [Bug]: Isolation Issue with Parallel Subtests #2871

🐛 [Bug]: Isolation Issue with Parallel Subtests #2871

sixcolors commented Feb 21, 2024 •

edited

efectn commented Feb 21, 2024

sixcolors commented Feb 21, 2024 •

edited

ReneWerner87 commented Feb 21, 2024

ReneWerner87 commented Feb 21, 2024

ReneWerner87 commented Feb 21, 2024

ReneWerner87 commented Feb 21, 2024

ReneWerner87 commented Feb 21, 2024

sixcolors commented Feb 22, 2024 •

edited

nickajacks1 commented Feb 22, 2024

gaby commented Feb 23, 2024 •

edited

leonklingele commented Feb 23, 2024

ReneWerner87 commented Feb 28, 2024

gaby commented Mar 17, 2024

ReneWerner87 commented Mar 17, 2024

🐛 [Bug]: Isolation Issue with Parallel Subtests #2871

🐛 [Bug]: Isolation Issue with Parallel Subtests #2871

Comments

sixcolors commented Feb 21, 2024 • edited

efectn commented Feb 21, 2024

sixcolors commented Feb 21, 2024 • edited

ReneWerner87 commented Feb 21, 2024

ReneWerner87 commented Feb 21, 2024

ReneWerner87 commented Feb 21, 2024

ReneWerner87 commented Feb 21, 2024

ReneWerner87 commented Feb 21, 2024

sixcolors commented Feb 22, 2024 • edited

nickajacks1 commented Feb 22, 2024

gaby commented Feb 23, 2024 • edited

leonklingele commented Feb 23, 2024

ReneWerner87 commented Feb 28, 2024

gaby commented Mar 17, 2024

ReneWerner87 commented Mar 17, 2024

sixcolors commented Feb 21, 2024 •

edited

sixcolors commented Feb 21, 2024 •

edited

sixcolors commented Feb 22, 2024 •

edited

gaby commented Feb 23, 2024 •

edited