
[🐛 Bug]: Grid is going down every day and I need to manually restart the hub #14467

Closed
abhi-iac opened this issue Sep 2, 2024 · 9 comments

@abhi-iac

abhi-iac commented Sep 2, 2024

What happened?

Every day, at a specific time, my Grid goes down, causing all the pipelines to fail. The Grid is fine on weekends, when no pipelines run against it.

I am really unsure why the Grid goes down even though there are sufficient sessions available. While the pipeline runs, the error shows "Cannot find session with id: 92598c7905b9a20bfcbe25414a79b6de Build info: version: '4.4.0', revision: 'e5c75ed026a' System info: host: 'xxxx', ip: 'x.x.x.x', os.name: 'Windows Server 2019', os.arch: 'amd64', os.version: '10.0', java.version: '11.0.16.1' Driver info: driver.version: unknown"

But once the pipeline completes, the error is "OpenQA.Selenium.WebDriverException : The HTTP request to the remote WebDriver server for URL http://selenium-grid.unum.com:4444/wd/hub/session/2044c12416348b5c4f61ca618922146d/element timed out after 120 seconds"

I found these similar bugs, but they are about Docker:
SeleniumHQ/docker-selenium#2135
#14322

How can we reproduce the issue?

It happens at a specific time every day.

Relevant log output

2024-08-30T08:05:53.5372772Z   Stack Trace:
2024-08-30T08:05:53.5372897Z      at OpenQA.Selenium.Remote.HttpCommandExecutor.MakeHttpRequest(HttpRequestInfo requestInfo)
2024-08-30T08:05:53.5373067Z    at OpenQA.Selenium.Remote.HttpCommandExecutor.Execute(Command commandToExecute)
2024-08-30T08:05:53.5373245Z    at OpenQA.Selenium.Remote.RemoteWebDriver.Execute(String driverCommandToExecute, Dictionary`2 parameters)
2024-08-30T08:05:53.5373419Z    at OpenQA.Selenium.Remote.RemoteWebDriver.FindElement(String mechanism, String value)

Operating System

Windows Server 2019 Datacenter

Selenium version

openjdk 11.0.16.1 2022-08-12 LTS

What are the browser(s) and version(s) where you see this issue?

Chrome 128.0.6613.85

What are the browser driver(s) and version(s) where you see this issue?

ChromeDriver.128.0.6613.84

Are you using Selenium Grid?

Selenium 4.152369


github-actions bot commented Sep 2, 2024

@abhi-iac, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then the I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

@diemol
Member

diemol commented Sep 2, 2024

There is no information we can use to triage this. In addition, you are using Selenium 4.4.

Please update to the latest (4.24) on both tests and Grid and open a new issue with the complete template filled out if you still face the problem.

diemol closed this as not planned (won't fix, can't repro, duplicate, stale) on Sep 2, 2024
@abhi-iac
Author

abhi-iac commented Sep 5, 2024

Even if it is a version discrepancy, it should throw that error; instead, it throws an error like "The HTTP request to the remote WebDriver server for URL http://selenium-grid.xxxxxxxx.com:4444/wd/hub/session timed out after 120 seconds. ---> System.Net.WebException: The request was aborted: The operation has timed out". We have the timeout set to the default of 300.

Do you have a better explanation?

@diemol
Member

diemol commented Sep 5, 2024

4.4 was released two years ago, and the Grid has had many bug fixes and changes since then. I cannot explain why; my assumption is that session creation took too long and the client timed out.
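The "timed out after 120 seconds" message comes from the .NET client's `HttpCommandExecutor` (visible in the stack trace above), so it reflects the client's own HTTP command timeout, independent of whatever timeout is configured on the Grid. The sketch below illustrates that with only the JDK (`java.net.http` and `com.sun.net.httpserver`); it does not use Selenium or a real Grid. The local stand-in server, the `/session` path, the delays, and the class name are all illustrative assumptions.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ClientTimeoutDemo {

    /** Starts a slow local stand-in endpoint, POSTs to it once, and reports which side finished first. */
    static String runDemo(long serverDelayMillis, Duration clientTimeout) {
        ExecutorService pool = Executors.newCachedThreadPool();
        HttpServer server;
        try {
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0); // port 0 = any free port
        } catch (IOException e) {
            pool.shutdownNow();
            throw new UncheckedIOException(e);
        }
        server.createContext("/session", exchange -> {
            try {
                Thread.sleep(serverDelayMillis); // hypothetical slow session creation on the server side
                byte[] body = "{\"value\":{}}".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // the demo is shutting down
            } finally {
                exchange.close();
            }
        });
        server.setExecutor(pool);
        server.start();
        try {
            HttpClient client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/session");
            HttpRequest request = HttpRequest.newBuilder(uri)
                    .timeout(clientTimeout) // client-side budget; the server never sees this value
                    .POST(HttpRequest.BodyPublishers.ofString("{}"))
                    .build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            return "HTTP " + response.statusCode();
        } catch (HttpTimeoutException e) {
            return "client timed out";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            server.stop(0);
            pool.shutdownNow(); // interrupts any handler that is still sleeping
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo(2_000, Duration.ofMillis(500))); // client gives up first
        System.out.println(runDemo(100, Duration.ofSeconds(5)));    // server answers in time
    }
}
```

The server would still have answered the first request, but the client had already stopped waiting. That is the same shape as a Grid that is still queueing or creating a session when the 120-second client timeout fires.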

@abhi-iac
Author

abhi-iac commented Sep 5, 2024

We are seeing this error because the Selenium Grid itself randomly goes down, but we really can't determine why. Anyway, thank you for your reply. We will debug it further.
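Because the outage happens at roughly the same time each day, it may help to record exactly when the hub stops responding instead of inferring it from failed pipelines. Here is a minimal sketch, assuming the hub/Router serves Selenium Grid 4's `/status` endpoint, whose JSON body carries a `ready` flag under `value`. The URL, the 10-second timeout, and the class and enum names are hypothetical. The regex is a deliberate simplification standing in for a real JSON parser.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.time.LocalTime;
import java.util.regex.Pattern;

public class GridStatusProbe {

    enum GridState { READY, NOT_READY, UNREACHABLE }

    // Simplification: look for "ready": true in the body instead of parsing the JSON properly.
    private static final Pattern READY_TRUE = Pattern.compile("\"ready\"\\s*:\\s*true");

    /** Classifies a /status response; kept pure so it can be checked without a live Grid. */
    static GridState classify(int statusCode, String body) {
        if (statusCode == 200 && body != null && READY_TRUE.matcher(body).find()) {
            return GridState.READY;
        }
        return GridState.NOT_READY;
    }

    /** Probes the status endpoint once; any I/O failure or timeout counts as unreachable. */
    static GridState probe(URI statusUri, Duration timeout) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(timeout)
                .version(HttpClient.Version.HTTP_1_1)
                .build();
        HttpRequest request = HttpRequest.newBuilder(statusUri).timeout(timeout).GET().build();
        try {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            return classify(response.statusCode(), response.body());
        } catch (IOException e) {
            return GridState.UNREACHABLE; // connection refused, reset, or timed out
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return GridState.UNREACHABLE;
        }
    }

    public static void main(String[] args) {
        // Hypothetical default address; pass your own hub/Router status URL as the first argument.
        URI status = URI.create(args.length > 0 ? args[0] : "http://localhost:4444/status");
        System.out.println(LocalTime.now() + " " + probe(status, Duration.ofSeconds(10)));
    }
}
```

You could run this from a scheduler every minute around the problem window and keep the timestamped output. The first `NOT_READY` or `UNREACHABLE` line would then show which minute to inspect in the Router and Distributor logs.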

@abhi-iac
Author

I dug in further and added logging to the Distributor, Router, SessionQueue, Session and Event Bus services on my server. As I mentioned, I see the logs below every day at a specific time in the Router and Distributor services. With your expertise, have you seen something similar in the past?

PS: I know I haven't updated the Grid version yet; some automation pipelines depend on this Grid.
Router

04:08:12.823 WARN [SeleniumSpanExporter$1.lambda$export$1] - Unable to execute request for an existing session: Unable to find session with ID: 62a591c058a29e59e9dcc254adb27b28
Build info: version: '4.4.0', revision: 'e5c75ed026a'
System info: host: 'xxxxxxx', ip: 'xxxxxxx', os.name: 'Windows Server 2019', os.arch: 'amd64', os.version: '10.0', java.version: '11.0.16.1'
Driver info: driver.version: unknown
Build info: version: '4.4.0', revision: 'e5c75ed026a'
System info: host: 'xxxxxxx', ip: 'xxxxxxx', os.name: 'Windows Server 2019', os.arch: 'amd64', os.version: '10.0', java.version: '11.0.16.1'
Driver info: driver.version: unknown
04:08:12.823 WARN [SeleniumSpanExporter$1.lambda$export$1] - org.openqa.selenium.NoSuchSessionException: Unable to find session with ID: 62a591c058a29e59e9dcc254adb27b28
Build info: version: '4.4.0', revision: 'e5c75ed026a'
System info: host: 'xxxxxxx', ip: 'xxxxxxx', os.name: 'Windows Server 2019', os.arch: 'amd64', os.version: '10.0', java.version: '11.0.16.1'
Driver info: driver.version: unknown
Build info: version: '4.4.0', revision: 'e5c75ed026a'
System info: host: 'xxxxxxx', ip: 'xxxxxxx', os.name: 'Windows Server 2019', os.arch: 'amd64', os.version: '10.0', java.version: '11.0.16.1'
Driver info: driver.version: unknown
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at org.openqa.selenium.remote.ErrorCodec.decode(ErrorCodec.java:134)
at org.openqa.selenium.grid.web.Values.get(Values.java:48)
at org.openqa.selenium.grid.sessionmap.remote.RemoteSessionMap.makeRequest(RemoteSessionMap.java:119)
at org.openqa.selenium.grid.sessionmap.remote.RemoteSessionMap.get(RemoteSessionMap.java:90)
at org.openqa.selenium.grid.router.HandleSession.lambda$loadSessionId$4(HandleSession.java:159)
at io.opentelemetry.context.Context.lambda$wrap$2(Context.java:224)
at org.openqa.selenium.grid.router.HandleSession.execute(HandleSession.java:122)
at org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)
at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
at org.openqa.selenium.grid.router.Router.execute(Router.java:91)
at org.openqa.selenium.grid.web.EnsureSpecCompliantResponseHeaders.lambda$apply$0(EnsureSpecCompliantResponseHeaders.java:34)
at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
at org.openqa.selenium.remote.http.Route$NestedRoute.handle(Route.java:270)
at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
at org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)
at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
at org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)

04:08:12.823 WARN [SeleniumSpanExporter$1.lambda$export$3] - {"traceId": "7506a231bc88ffffed52b53f4822f12b","eventTime": 1726474092826299300,"eventName": "exception","attributes": {"exception.message": "Unable to execute request for an existing session: Unable to find session with ID: 62a591c058a29e59e9dcc254adb27b28\nBuild info: version: '4.4.0', revision: 'e5c75ed026a'\nSystem info: host: 'xxxxxxx', ip: 'xxxxxxx', os.name: 'Windows Server 2019', os.arch: 'amd64', os.version: '10.0', java.version: '11.0.16.1'\nDriver info: driver.version: unknown\nBuild info: version: '4.4.0', revision: 'e5c75ed026a'\nSystem info: host: 'xxxxxxx', ip: 'xxxxxxx', os.name: 'Windows Server 2019', os.arch: 'amd64', os.version: '10.0', java.version: '11.0.16.1'\nDriver info: driver.version: unknown","exception.stacktrace": "org.openqa.selenium.NoSuchSessionException: Unable to find session with ID: 62a591c058a29e59e9dcc254adb27b28\nBuild info: version: '4.4.0', revision: 'e5c75ed026a'\nSystem info: host: 'xxxxxxx', ip: 'xxxxxxx', os.name: 'Windows Server 2019', os.arch: 'amd64', os.version: '10.0', java.version: '11.0.16.1'\nDriver info: driver.version: unknown\nBuild info: version: '4.4.0', revision: 'e5c75ed026a'\nSystem info: host: 'xxxxxxx', ip: 'xxxxxxx', os.name: 'Windows Server 2019', os.arch: 'amd64', os.version: '10.0', java.version: '11.0.16.1'\nDriver info: driver.version: unknown\r\n\tat java.base\u002fjdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\r\n\tat java.base\u002fjdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)\r\n\tat java.base\u002fjdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\r\n\tat java.base\u002fjava.lang.reflect.Constructor.newInstance(Constructor.java:490)\r\n\tat org.openqa.selenium.remote.ErrorCodec.decode(ErrorCodec.java:134)\r\n\tat org.openqa.selenium.grid.web.Values.get(Values.java:48)\r\n\tat 
org.openqa.selenium.grid.sessionmap.remote.RemoteSessionMap.makeRequest(RemoteSessionMap.java:119)\r\n\tat org.openqa.selenium.grid.sessionmap.remote.RemoteSessionMap.get(RemoteSessionMap.java:90)\r\n\tat org.openqa.selenium.grid.router.HandleSession.lambda$loadSessionId$4(HandleSession.java:159)\r\n\tat io.opentelemetry.context.Context.lambda$wrap$2(Context.java:224)\r\n\tat org.openqa.selenium.grid.router.HandleSession.execute(HandleSession.java:122)\r\n\tat org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)\r\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\r\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\r\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\r\n\tat org.openqa.selenium.grid.router.Router.execute(Router.java:91)\r\n\tat org.openqa.selenium.grid.web.EnsureSpecCompliantResponseHeaders.lambda$apply$0(EnsureSpecCompliantResponseHeaders.java:34)\r\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\r\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\r\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\r\n\tat org.openqa.selenium.remote.http.Route$NestedRoute.handle(Route.java:270)\r\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\r\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\r\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\r\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\r\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\r\n\tat org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)\r\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\r\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\r\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\r\n\tat 
org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\r\n\tat org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)\r\n\tat java.base\u002fjava.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\r\n\tat java.base\u002fjava.util.concurrent.FutureTask.run(FutureTask.java:264)\r\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\r\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\r\n\tat java.base\u002fjava.lang.Thread.run(Thread.java:829)\r\n","exception.type": "org.openqa.selenium.NoSuchSessionException","http.flavor": 1,"http.handler_class": "org.openqa.selenium.grid.router.HandleSession","http.host": "selenium-grid.unum.com:4444","http.method": "POST","http.request_content_length": "58","http.scheme": "HTTP","http.target": "\u002fsession\u002f62a591c058a29e59e9dcc254adb27b28\u002felement","http.user_agent": "selenium\u002f4.13.0 (.net windows)","session.id": "62a591c058a29e59e9dcc254adb27b28"}}

Distributor
04:02:37.375 INFO [LocalDistributor.newSession] - Session created by the Distributor. Id: 62a591c058a29e59e9dcc254adb27b28
Caps: Capabilities {acceptInsecureCerts: false, browserName: chrome, browserVersion: 128.0.6613.138, chrome: {chromedriverVersion: xxxxxxx (fe621c5aa2d..., userDataDir: C:\Users\xxxxxxx\AppData\L...}, fedcm:accounts: true, goog:chromeOptions: {debuggerAddress: localhost:57324}, networkConnectionEnabled: false, pageLoadStrategy: normal, platformName: WINDOWS, proxy: {}, se:cdp: ws://xxxxxxx:4444/ses..., se:cdpVersion: 128.0.6613.138, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify, webauthn:extension:credBlob: true, webauthn:extension:largeBlob: true, webauthn:extension:minPinLength: true, webauthn:extension:prf: true, webauthn:virtualAuthenticators: true}
04:02:40.867 INFO [GridModel.release] - Releasing slot for session id b458e3630a7536e4bd052db11d7dfd5b
04:02:51.443 INFO [LocalDistributor.newSession] - Session request received by the Distributor:
[Capabilities {browserName: chrome, goog:chromeOptions: {args: [--headless=new, --force-device-scale-factor...]}}]
04:02:52.365 INFO [LocalDistributor.newSession] - Session created by the Distributor. Id: 4a2f14540412670cd1c5770e87630903
Caps: Capabilities {acceptInsecureCerts: false, browserName: chrome, browserVersion: 128.0.6613.138, chrome: {chromedriverVersion: xxxxxxx (fe621c5aa2d..., userDataDir: C:\Users\xxxxxxx\AppData\L...}, fedcm:accounts: true, goog:chromeOptions: {debuggerAddress: localhost:55155}, networkConnectionEnabled: false, pageLoadStrategy: normal, platformName: WINDOWS, proxy: {}, se:cdp: ws://xxxxxxx:4444/ses..., se:cdpVersion: 128.0.6613.138, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify, webauthn:extension:credBlob: true, webauthn:extension:largeBlob: true, webauthn:extension:minPinLength: true, webauthn:extension:prf: true, webauthn:virtualAuthenticators: true}
04:03:49.882 INFO [GridModel.release] - Releasing slot for session id c2e7c4e4134d3b56418fe183809e22be
04:03:59.771 INFO [GridModel.release] - Releasing slot for session id ccdde5b4f88b73e871cdd443f6fb8072
04:04:50.078 INFO [GridModel.release] - Releasing slot for session id 32d93f13b41e65419578b620d14208d4
04:07:49.876 INFO [GridModel.release] - Releasing slot for session id 62a591c058a29e59e9dcc254adb27b28
04:07:50.073 INFO [GridModel.release] - Releasing slot for session id e4c0b07b66500c2ccdef79120316d8fe
04:07:59.792 INFO [GridModel.release] - Releasing slot for session id 173f192b17c00e4cdcb39161d60e4e85
04:07:59.960 INFO [GridModel.release] - Releasing slot for session id ad16c800b59e1ae987aa72e177db41f2
04:08:20.057 INFO [GridModel.release] - Releasing slot for session id 4a2f14540412670cd1c5770e87630903
04:16:03.649 INFO [LoggingOptions.configureLogEncoding] - Using the system default encoding
04:16:03.659 INFO [OpenTelemetryTracer.createTracer] - Using OpenTelemetry for tracing
04:16:03.739 INFO [UnboundZmqEventBus.<init>] - Connecting to tcp://xxxxxxx:4442 and tcp://xxxxxxx:4443
04:16:03.799 INFO [UnboundZmqEventBus.<init>] - Sockets created
04:16:04.804 INFO [UnboundZmqEventBus.<init>] - Event bus ready
04:16:05.477 INFO [DistributorServer.execute] - Started Selenium Distributor 4.4.0 (revision e5c75ed): http://xxxxxxx:4447

From 04:16:03.649 onward the Distributor is up again, because I killed the Distributor and Router tasks and they were started again. I have sufficient memory allocated for all the services.
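To line these logs up, it can help to compute how long each session lived between the Distributor creating it and releasing its slot, and to compare that with the Router's "Unable to find session" warnings. Here is a minimal sketch, assuming the Grid 4.4.0 line format shown in the Distributor excerpt above and that creation and release fall on the same day. The class and method names are illustrative. For the excerpt, it reports 312.501 s for session 62a591c058a29e59e9dcc254adb27b28 and 327.692 s for 4a2f14540412670cd1c5770e87630903. The Router's `NoSuchSessionException` for 62a591c0… appears at 04:08:12.823, about 23 s after that slot was released.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SessionLifetimes {

    // Patterns for the two Distributor messages in the excerpt (Grid 4.4.0 format, assumed stable).
    private static final Pattern CREATED = Pattern.compile(
            "^(\\d{2}:\\d{2}:\\d{2}\\.\\d{3}) INFO \\[LocalDistributor\\.newSession\\] - "
            + "Session created by the Distributor\\. Id: ([0-9a-f]+)");
    private static final Pattern RELEASED = Pattern.compile(
            "^(\\d{2}:\\d{2}:\\d{2}\\.\\d{3}) INFO \\[GridModel\\.release\\] - "
            + "Releasing slot for session id ([0-9a-f]+)");

    /** Maps session id to the time between creation and slot release, for sessions seen both ways. */
    static Map<String, Duration> lifetimes(List<String> lines) {
        Map<String, LocalTime> created = new LinkedHashMap<>();
        Map<String, Duration> result = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher c = CREATED.matcher(line);
            if (c.find()) {
                created.put(c.group(2), LocalTime.parse(c.group(1)));
                continue;
            }
            Matcher r = RELEASED.matcher(line);
            if (r.find() && created.containsKey(r.group(2))) {
                // Assumes no midnight rollover between creation and release.
                LocalTime releasedAt = LocalTime.parse(r.group(1));
                result.put(r.group(2), Duration.between(created.get(r.group(2)), releasedAt));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> excerpt = List.of(
            "04:02:37.375 INFO [LocalDistributor.newSession] - Session created by the Distributor. Id: 62a591c058a29e59e9dcc254adb27b28",
            "04:02:52.365 INFO [LocalDistributor.newSession] - Session created by the Distributor. Id: 4a2f14540412670cd1c5770e87630903",
            "04:07:49.876 INFO [GridModel.release] - Releasing slot for session id 62a591c058a29e59e9dcc254adb27b28",
            "04:08:20.057 INFO [GridModel.release] - Releasing slot for session id 4a2f14540412670cd1c5770e87630903");
        lifetimes(excerpt).forEach((id, d) ->
            System.out.printf("%s %d.%03d s%n", id, d.toSeconds(), d.toMillisPart()));
    }
}
```

If lifetimes across many days cluster just above 300 s, compare them with the Node's `--session-timeout` setting (300 seconds by default), which ends sessions that have had no activity for that long. This is a lead to check against the Node logs, not a confirmed cause.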

@abhi-iac
Author

Hello, has anyone had a similar issue? I have been unable to find a resolution for a while now.

@abhi-iac
Author

abhi-iac commented Oct 8, 2024

After upgrading the Grid to 4.24.0 and adding a few arguments to the Distributor and Router, the Grid looks stable now.


github-actions bot commented Nov 7, 2024

This issue has been automatically locked since there has not been any recent activity since it was closed. Please open a new issue for related bugs.

github-actions bot locked and limited conversation to collaborators on Nov 7, 2024