Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TestConnectionClosureAbnormal is flaky #39

Open
imiric opened this issue Oct 27, 2021 · 9 comments
Open

TestConnectionClosureAbnormal is flaky #39

imiric opened this issue Oct 27, 2021 · 9 comments
Labels
bug Something isn't working tests user request Requested by the community

Comments

@imiric
Copy link
Contributor

imiric commented Oct 27, 2021

We've seen intermittent failures of TestConnectionClosureAbnormal/closure_abnormal mostly in CI and ocasionally in local runs as well.

The current test expects the error code 1006 (abnormal closure), but sometimes--typically under high load or in VMs running on overprovisioned hardware, as is usually the case for GH Actions--it will fail with code 1001 (going away):

=== RUN   TestConnectionClosureAbnormal/closure_abnormal
    connection_test.go:67:
                Error Trace:    connection_test.go:67 Error:          Error message not equal:
                                expected: "websocket: close 1006 (abnormal closure): unexpected EOF"
                                actual  : "websocket: close 1001 (going away)"
                Test:           TestConnectionClosureAbnormal/closure_abnormal

We tried simply handling both cases in #33, but they're separate scenarios and the test should only fail with 1006 as that's what the /closure-abnormal endpoint should cause.

@inancgumus
Copy link
Member

inancgumus commented Oct 27, 2021

I tried to add a test for the 1001 (going away) status. Let's keep it here only for reference.

func TestConnectionGoingAway(t *testing.T) {
	const wantErr = "websocket: close 1001" // going away

	done := make(chan struct{})
	server := testutils.NewWSTestServer(t,
		"/going-away",
		http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
			conn, err := (&websocket.Upgrader{}).Upgrade(w, req, w.Header())
			require.NoError(t, err)

			conn.WriteControl(
				websocket.CloseMessage,
				websocket.FormatCloseMessage(websocket.CloseGoingAway, ""),
				time.Now().Add(time.Second),
			)

			<-done
			defer func() {
				done <- struct{}{}
			}()

			var any interface{}
			err = conn.ReadJSON(any)
			require.ErrorContains(t, err, wantErr)
		}))
	defer server.ServerHTTP.Close()

	url, _ := url.Parse(server.ServerHTTP.URL)
	wsURL := fmt.Sprintf("ws://%s/going-away", url.Host)
	ctx := context.Background()
	conn, err := NewConnection(
		ctx,
		wsURL,
		NewLogger(ctx, NullLogger(), false, nil),
	)
	require.NoError(t, err)

	// close the connection before the client has a chance to read
	// from it. this will result in a "going away" error.
	conn.Close()
	done <- struct{}{}

	action := target.SetDiscoverTargets(true)
	err = action.Do(cdp.WithExecutor(ctx, conn))
	require.ErrorContains(t, err, wantErr)

	<-done
}

@inancgumus
Copy link
Member

@inancgumus
Copy link
Member

inancgumus commented Oct 27, 2021

This is what I saw after running the test for half an hour on my Mac:

--- FAIL: TestConnectionClosureAbnormal (0.10s)
    --- FAIL: TestConnectionClosureAbnormal/closure_abnormal (0.07s)
        connection_test.go:67:
            	Error Trace:	connection_test.go:67
            	Error:      	Error message not equal:
            	            	expected: "websocket: close 1006 (abnormal closure): unexpected EOF"
            	            	actual  : "websocket: close 1001 (going away)"
            	Test:       	TestConnectionClosureAbnormal/closure_abnormal
--- FAIL: TestConnectionClosureAbnormal (1.04s)
    --- FAIL: TestConnectionClosureAbnormal/closure_abnormal (1.01s)
        connection_test.go:67:
            	Error Trace:	connection_test.go:67
            	Error:      	Error message not equal:
            	            	expected: "websocket: close 1006 (abnormal closure): unexpected EOF"
            	            	actual  : "websocket: close 1001 (going away)"
            	Test:       	TestConnectionClosureAbnormal/closure_abnormal
--- FAIL: TestConnectionClosureAbnormal (2.00s)
    --- FAIL: TestConnectionClosureAbnormal/closure_abnormal (2.00s)
        connection_test.go:67:
            	Error Trace:	connection_test.go:67
            	Error:      	Error message not equal:
            	            	expected: "websocket: close 1006 (abnormal closure): unexpected EOF"
            	            	actual  : "websocket: close 1001 (going away)"
            	Test:       	TestConnectionClosureAbnormal/closure_abnormal
panic: test timed out after 10m0s

It happens while sending to or receiving from CDP. It's probably an I/O timeout issue. So we might want to give it a timeout somewhere in the code (probably in the connection.go):

goroutine 451 [select, 9 minutes]:
github.com/grafana/xk6-browser/common.(*Connection).sendLoop(0xc000ce41c0)
	/Users/inanc/xk6-browser/common/connection.go:351 +0x1d9
created by github.com/grafana/xk6-browser/common.NewConnection
	/Users/inanc/xk6-browser/common/connection.go:148 +0x4d5

goroutine 74 [IO wait, 10 minutes]:
internal/poll.runtime_pollWait(0x6e00bf0, 0x72)
	/usr/local/Cellar/go/1.17.1/libexec/src/runtime/netpoll.go:229 +0x89
internal/poll.(*pollDesc).wait(0xc000138498, 0xc0001fe000, 0x0)
	/usr/local/Cellar/go/1.17.1/libexec/src/internal/poll/fd_poll_runtime.go:84 +0xbd
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/Cellar/go/1.17.1/libexec/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000138480, {0xc0001fe000, 0x1000, 0x1000})
	/usr/local/Cellar/go/1.17.1/libexec/src/internal/poll/fd_unix.go:167 +0x419
net.(*netFD).Read(0xc000138480, {0xc0001fe000, 0x1000, 0x1000})
	/usr/local/Cellar/go/1.17.1/libexec/src/net/fd_posix.go:56 +0x51
net.(*conn).Read(0xc00011e000, {0xc0001fe000, 0x1000, 0x1000})
	/usr/local/Cellar/go/1.17.1/libexec/src/net/net.go:183 +0xb1
bufio.(*Reader).fill(0xc0001981e0)
	/usr/local/Cellar/go/1.17.1/libexec/src/bufio/bufio.go:101 +0x294
bufio.(*Reader).Peek(0xc0001981e0, 0x2)
	/usr/local/Cellar/go/1.17.1/libexec/src/bufio/bufio.go:139 +0xcc
github.com/gorilla/websocket.(*Conn).read(0xc000268420, 0x63a2c20)
	/Users/inanc/go/pkg/mod/github.com/gorilla/websocket@v1.4.2/conn.go:370 +0x5b
github.com/gorilla/websocket.(*Conn).advanceFrame(0xc000268420)
	/Users/inanc/go/pkg/mod/github.com/gorilla/websocket@v1.4.2/conn.go:798 +0x125
github.com/gorilla/websocket.(*Conn).NextReader(0xc000268420)
	/Users/inanc/go/pkg/mod/github.com/gorilla/websocket@v1.4.2/conn.go:980 +0x165
github.com/gorilla/websocket.(*Conn).ReadMessage(0xc000491ec8)
	/Users/inanc/go/pkg/mod/github.com/gorilla/websocket@v1.4.2/conn.go:1064 +0x31
github.com/grafana/xk6-browser/testutils.CDPReadMsg(0xc00019c001)
	/Users/inanc/xk6-browser/testutils/cdp.go:151 +0x45
github.com/grafana/xk6-browser/testutils.getWebsocketHandlerCDP.func1.1()
	/Users/inanc/xk6-browser/testutils/cdp.go:79 +0xa5
created by github.com/grafana/xk6-browser/testutils.getWebsocketHandlerCDP.func1
	/Users/inanc/xk6-browser/testutils/cdp.go:71 +0x2a5

@inancgumus
Copy link
Member

inancgumus commented Jan 7, 2022

I've removed the low prio tag from this because it's such an issue that happens almost every time I commit, and I need to rerun the CI jobs over and over 🤷

@ankur22 ankur22 added this to the v0.7.0 milestone Nov 10, 2022
@andrewslotin andrewslotin modified the milestones: v0.7.0, v0.8.0 Jan 3, 2023
@elimatripathy
Copy link

I am also facing this issue in my local and it is happening almost 99% of the time even with one VU .

ERRO[0043] communicating with browser: websocket: close 1006 (abnormal closure): unexpected EOF category=cdp elapsed="0 ms" goroutine=5988
ERRO[0043] process with PID 28665 unexpectedly ended: signal: killed category=browser elapsed="1 ms" goroutine=5985

Below is my test script
`/* init code - Load local files, import modules, declare lifecycle functions

  • init context: importing modules */
    import { sleep, check } from 'k6';
    import { urlradix, username, password } from "../backendScripts/common.js";

/* init context: define custom function

  • Function to login into radix and navigate to requests tab in activity center*/
    export function loginHomePage(page, browser) {
    page
    .goto(urlradix, { waitUntil: 'networkidle' })
    .then(() => {
    // Enter login credentials and login
    page.locator('input[placeholder="name@company.com"]').type(username);
    page.locator('input[placeholder="Enter password..."]').type(password);
    page.locator('button').click(),
    sleep(6.0),
    check(page, {
    'Radix Navbar Company LOGO': page.locator('#navbar__action--user').textContent() == 'tu',
    },
    {myTag:'notificationUI'})
    }).finally(() => {
    page.screenshot({ path: 'screenshot.png' });
    page.close();
    browser.close();
    });
    }`

@ankur22 ankur22 modified the milestones: v0.8.0, v0.9.0 Feb 2, 2023
@inancgumus
Copy link
Member

Hey, @elimatripathy, could you try this again with the k6 version 0.43.1 and let us know if this error still happens?

Could you also let me know about your environment, like CPUs, memory size, OS, etc.?

@inancgumus inancgumus added the awaiting waiting for user to respond label Feb 28, 2023
@inancgumus inancgumus removed this from the v0.9.0 milestone Mar 31, 2023
@elimatripathy
Copy link

elimatripathy commented Mar 31, 2023

thank you @inancgumus . Yes this issue is reproducible with the latest k6 version 0.43.1 as well and I tried many things to make sure that this is not happening due to my system limitations like running tests in both local and docker containers plus also running performance tests on the facebook.com site. Below are my findings

Script for facebook login:

import { check, sleep } from 'k6';
import { chromium } from 'k6/x/browser';

export const options = {
  thresholds: {
    checks: ["rate==1.0"]
  },
  ext: {
		loadimpact: {
		  distribution: {
			private: {
			  loadZone: 'amazon:ca:montreal',
			  percent: 50,
			},
		  },
		  projectID:$enterTheProjectID,
		},
	  },
  scenarios: {
    login: {
   executor: 'per-vu-iterations',
   vus: 35, 
   iterations: 1,
   maxDuration: '1m'
    }
  },
}

export default async function() {
  const browser = chromium.launch({
    headless: __ENV.XK6_HEADLESS ? false : true,
  });
  const context = browser.newContext();
  const page = context.newPage();

  try {
    // Goto front page, find login link and click it
    await page.goto('https://www.facebook.com/', { waitUntil: 'networkidle' });
   // page.waitForSelector('img[class="background15"]');
    // Enter login credentials and login
     page.locator('input[name="email"]').type('$');
     page.locator('input[name="pass"]').type('$');
    // We expect the form submission to trigger a navigation, so to prevent a
    // race condition, setup a waiter concurrently while waiting for the click
    // to resolve.
    await Promise.all([
       page.waitForNavigation(),
       page.locator('button[type="submit"]').click(),
    ]);
    sleep(1);
    check(page, {
      'header': find(page, '#facebook').isVisible()
    });
  } finally {
    page.screenshot({ path: 'screenshot.png' });
    page.close();
    browser.close();
  }
}

// Custom method to find web element
export function find(page,locator,waitTime) {
  waitTime = waitTime || 10000;
  let flag=false;
  let iter=0;
  let element;
  while(flag==false){
    try {
      element = page.$(locator);
      if(element.isVisible()){
        flag=true;
            }
    } catch (error) {} 
    sleep(0.01);
    iter++;
    if(iter>waitTime){
      break;
    }
  }
  return element;
}

Running this script either in my local or docker containers with more than 35 virtual users caused the exception as below
image

This issue with xk6-browser is definitely stopping to test performance for any application fully . This should be prioritized or please let me know if anything I can do to fix it.

Below are my system configuration
image

@inancgumus
Copy link
Member

Hi @elimatripathy, thanks for the detailed information. We'll consider this in our next planning.

@inancgumus inancgumus added user request Requested by the community and removed awaiting waiting for user to respond labels Apr 3, 2023
@inancgumus
Copy link
Member

inancgumus commented May 15, 2024

This has happened again while running tests locally (v1.5.1):

--- FAIL: TestConnectionClosureAbnormal (0.00s)
    --- FAIL: TestConnectionClosureAbnormal/closure_abnormal (0.01s)
        connection_test.go:55: 
                Error Trace:    /Users/inanc/grafana/k6browser/main/common/connection_test.go:55
                Error:          Error "read tcp 127.0.0.1:53920->127.0.0.1:53910: read: connection reset by peer" does not contain "websocket: close 1006 (abnormal closure): unexpected EOF"
                Test:           TestConnectionClosureAbnormal/closure_abnormal

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working tests user request Requested by the community
Projects
None yet
Development

No branches or pull requests

5 participants