Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) #1802

Open
eravion opened this issue Oct 21, 2022 · 15 comments

Comments

@eravion
Copy link

eravion commented Oct 21, 2022

Hello,

I have an issue and open two ticket on Selenium and RobotFramework :
SeleniumHQ/selenium#10258
robotframework/robotframework#4194

Version installed :
Python 3.8

Version of Lib :
robotframework 4.1.2
robotframework-appiumlibrary 1.6.3
robotframework-pythonlibcore 3.0.0
robotframework-requests 0.9.2
robotframework-seleniumlibrary 5.1.3
selenium 4.1.0

Selenium grid hub and nodes Version 4.4.0

May I ask you to have a look please ?

@emanlove
Copy link
Member

emanlove commented Oct 23, 2022

@eravion Do you have a python traceback stack on when this occurs? See it in the issue robotframework/robotframework#4194

@emanlove
Copy link
Member

I'm going through the two mentioned issues as well as a general search on that error. Looking at the traceback in #4194 it looks to me as if the connection this error response is coming from is between selenium and webdriver.

It interesting that you are seeing it on Get Element Count xpath and Go To. So there is nothing special about these two and all selenium python calls to the webdriver go through similar code. Knowing that one thing I am wondering or thinking about is that I could see these taking some time; especially the xpath count one. Few questions here ..

  • Are you seeing it with other keywords?
  • What is the ratio of times it occurs with Get Element Count xpath as compared to Go To?
  • Is it always with the same element or elements with xpath count and the same page with go to? Or does it seems random?
  • Do you see any difference in the amount of time one the webdriver request and response between times it fails and when it passes?

@emanlove
Copy link
Member

Are you seeing this on one browser versus another? Are you primarily using/testing against one browser?

@eravion
Copy link
Author

eravion commented Oct 25, 2022

Hello,

We have the issue on both Firefox / Chrome.

Some stats on the last 7 days, focus on 4 scenarios.

7 days stats      
Function Nb error Nb usage % error
Click Element 187 26208 0,71
Wait Until Page Contains Element 1 12096 0,01
Wait Until Element Is Visible 74 19152 0,39
Input Text 81 15120 0,54
Get Text 105 2016 5,21
Get Element Count 85 1008 8,43
Input Text 81 15120 0,54
Click Button 1 5040 0,02
Go To 1 4032 0,02
Wait Until Page Contains 1 31248 0,00
Wait Until Page Contains Element 1 12096 0,01

Stats on the last 7 days, focus on 2 other scenarios.

Function Nb error Nb usage % error
Get Element Count 83 2016 4,12
Run Keyword If 82 3360 2,44
Input Text 37 4032 0,92

Answer to your question :

  • Is it always with the same element or elements with xpath count and the same page with go to? Or does it seems random?

Not the same for a scenario. I think it is Random.

  • Do you see any difference in the amount of time one the webdriver request and response between times it fails and when it passes?

Not really.

When it fail, sometime it is quick
Exemple on one fail : Wait Until Page Contains DevEUI, timeout=10 00:00:00.285
Same which is OK: Wait Until Page Contains DevEUI, timeout=10 00:00:01.074

Sometime not
Exemple on one fail : Wait Until Element Is Visible xpath=//[@id="dropdown-languages-btn"] 00:00:01.305
Same which is OK : Wait Until Element Is Visible xpath=//
[@id="dropdown-languages-btn"] 00:00:00.226

For information, in any cases, the screenshot is ok, after the error message, I have a screenshot automatically done by the scenario.

@emanlove
Copy link
Member

Some background on my question. You noted this does not occur with selenium 2.53 but does so with 4.x versions so I diff'd the py branch and in particular the remote_connection.py file (with,

git diff selenium-2.53.1 HEAD -- py/selenium/webdriver/remote/remote_connection.py

There was, as expected, some changes there. I was tracing out some perceived differences in the headers and in how Chrome setup its headers. Thus the question about Chrome (and Firefox). But hearing your response and looking thorough that code that seems a bit of a dead end.

And the Get Element Count I thought might be a time sink in that it could take a fair amount of time. Was curious as to whether or not they all took time and some just too much time. But your stats show it is a bit more random in the keywords and in the length of time it takes to fail.

A few more questions ..

  • When you are running selenium version 2.53 which version of Robot Framework and SeleniumLibrary are you running?
  • Do you execute the tests with loglevel set to TRACE and if so anything interesting with the selenium webdriver network traffic?

@emanlove
Copy link
Member

One more question .. are you able to run, maybe a subset of tests, without the grid? Just trying to eliminate where the issues might be coming from ..

@eravion
Copy link
Author

eravion commented Oct 25, 2022

About Robot Framework 2.53:
robotframework (3.0)
robotframework-appiumlibrary (1.4.6)
robotframework-browsermobproxylibrary (0.1.3)
robotframework-requests (0.7.2)
robotframework-ride (1.5.2.1)
robotframework-selenium2library (1.8.0)
selenium (2.53.6)

I added " Set Log Level TRACE" on Suite Setup on a scenario, let's wait for an error.

I cannot run a test without grid.
For security reason, robot are open only through the grid.

About selenium 4, we updated Python lib
robotframework 6.0
robotframework-appiumlibrary 1.6.3
robotframework-pythonlibcore 3.0.0
robotframework-requests 0.9.2
robotframework-seleniumlibrary 6.0.0
selenium 4.5.0

Same issue.

@eravion
Copy link
Author

eravion commented Oct 25, 2022

20221025 13:59:10.836 - INFO - +-- START KEYWORD: SeleniumLibrary.Click Element [ xpath=//button[@Class='btn try-out__btn'] ]
20221025 13:59:10.837 - INFO - Clicking element 'xpath=//button[@Class='btn try-out__btn']'.
20221025 13:59:10.837 - DEBUG - POST http://X.X.X.X/wd/hub/session/d20e7b23-9ca2-485b-ad7a-395ad1cb7fd6/elements {"using": "xpath", "value": "//button[@Class='btn try-out__btn']"}
20221025 13:59:10.839 - DEBUG - GET http://X.X.X.X/wd/hub/session/d20e7b23-9ca2-485b-ad7a-395ad1cb7fd6/screenshot {}
20221025 13:59:10.839 - DEBUG - Starting new HTTP connection (2): X.X.X.X:80
20221025 13:59:11.018 - DEBUG - http://X.X.X.X:80 "GET /wd/hub/session/d20e7b23-9ca2-485b-ad7a-395ad1cb7fd6/screenshot HTTP/1.1" 200 183076
20221025 13:59:11.094 - DEBUG - Remote response: status=200 | data={"value":" XXXXX Dump of Screenshot XXXXX"} | headers=HTTPHeaderDict({'Date': 'Tue, 25 Oct 2022 13:59:10 GMT', 'Se
rver': 'Apache', 'content-length': '183076', 'content-type': 'application/json; charset=utf-8', 'cache-control': 'no-cache', 'X-Frame-Options': 'sameorigin', 'Keep-Alive': 'timeout=5, max=100', 'Connect
ion': 'Keep-Alive'})
20221025 13:59:11.097 - DEBUG - Finished Request
20221025 13:59:11.098 - INFO - XXXX Link to screenshot XXXX
20221025 13:59:11.099 - FAIL - ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
20221025 13:59:11.102 - DEBUG - Traceback (most recent call last):
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "", line 3, in raise_from
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/http/client.py", line 1322, in getresponse
response.begin()
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/http/client.py", line 303, in begin
version, status, reason = self._read_status()
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/http/client.py", line 272, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/SeleniumLibrary/init.py", line 496, in run_keyword
return DynamicCore.run_keyword(self, name, args, kwargs)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/robotlibcore.py", line 94, in run_keyword
return self.keywords[name](*args, **(kwargs or {}))
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/SeleniumLibrary/keywords/element.py", line 658, in click_element
self.find_element(locator).click()
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/SeleniumLibrary/base/context.py", line 82, in find_element
return self.element_finder.find(locator, tag, True, required, parent)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/SeleniumLibrary/locators/elementfinder.py", line 93, in find
return self._find(locators[-1], tag, first_only, required, element)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/SeleniumLibrary/locators/elementfinder.py", line 123, in _find
elements = strategy(criteria, tag, constraints, parent=parent or self.driver)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/SeleniumLibrary/locators/elementfinder.py", line 181, in _find_by_xpath
parent.find_elements(By.XPATH, criteria), tag, constraints
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 889, in find_elements
return self.execute(Command.FIND_ELEMENTS, {
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 427, in execute
response = self.command_executor.execute(driver_command, params)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/remote_connection.py", line 344, in execute
return self._request(command_info[0], url, body=data)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/remote_connection.py", line 366, in _request
response = self._conn.request(method, url, body=body, headers=headers)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/request.py", line 78, in request
return self.request_encode_body(
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/request.py", line 170, in request_encode_body
return self.urlopen(method, url, **extra_kw)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/poolmanager.py", line 375, in urlopen
response = conn.urlopen(method, u.request_uri, **kw)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 532, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "", line 3, in raise_from
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/http/client.py", line 1322, in getresponse
response.begin()
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/http/client.py", line 303, in begin
version, status, reason = self._read_status()
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/http/client.py", line 272, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
20221025 13:59:11.102 - INFO - +-- END KEYWORD: SeleniumLibrary.Click Element (266)

@emanlove
Copy link
Member

I recall you mentioning the Screenshots on failure and raised the question whether this could be the problem. I don't think it is. There's a strange, to me at least, ordering with the screenshots and the last error. For some reason which I don't recall the screenshots appear before the last error log / traceback.. So it has always appeared to me that the screen shots occur before the for which they do not. That is just an artifact of the logging order.

Now with that said I would be curious to see what happens in this case if you turn them off on library import with something like,

*** Settings ***
Library  SeleniumLibrary  run_on_failure=None

Again I don't think this is the cause but I would think that it doesn't help trying to grab this fairly large request. What is noticeable in the TRACE level output is that there are two calls for the screenshot because the first one fails. Marking up that input I see

Element find where     20221025 13:59:10.837 - DEBUG - POST http://X.X.X.X/wd/hub/session/d20e7b23-9ca2-485b-ad7a-395ad1cb7fd6/elements {"using": "xpath", "value": "//button[@Class='btn try-out__btn']"}
failure occurs
First screenshot on    20221025 13:59:10.839 - DEBUG - GET http://X.X.X.X/wd/hub/session/d20e7b23-9ca2-485b-ad7a-395ad1cb7fd6/screenshot {}
failure
??? notices connection 20221025 13:59:10.839 - DEBUG - Starting new HTTP connection (2): X.X.X.X:80
lost so retries
Second screenshot on   20221025 13:59:11.018 - DEBUG - http://X.X.X.X:80 "GET /wd/hub/session/d20e7b23-9ca2-485b-ad7a-395ad1cb7fd6/screenshot HTTP/1.1" 200 183076
failure
Screenshot response    20221025 13:59:11.094 - DEBUG - Remote response: status=200 | data={"value":" XXXXX Dump of Screenshot XXXXX"} | headers=HTTPHeaderDict({'Date': 'Tue, 25 Oct 2022 13:59:10 GMT', 'Server': 'Apache', 'content-length': '183076', 'content-type': 'application/json; charset=utf-8', 'cache-control': 'no-cache', 'X-Frame-Options': 'sameorigin', 'Keep-Alive': 'timeout=5, max=100', 'Connection': 'Keep-Alive'})
Screenshot response    20221025 13:59:11.097 - DEBUG - Finished Request
Robot logging the      20221025 13:59:11.098 - INFO - XXXX Link to screenshot XXXX
screenshot
The connection dropped 20221025 13:59:11.099 - FAIL - ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
error (with odd robot
log placement after
screenshot log)	    

So the screenshot also notices the dropped connection. Now I am not sure who is retrying that one and by what mechanism .. because looking into the traceback I see the element find also does a retry .. here,

File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/request.py", line 78, in request
return self.request_encode_body(
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/request.py", line 170, in request_encode_body
return self.urlopen(method, url, **extra_kw)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/poolmanager.py", line 375, in urlopen
response = conn.urlopen(method, u.request_uri, **kw)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
retries = retries.increment(

with it eventually in that traceback

File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()

which on that response it fails when trying to read the response

File "/opt/rh/rh-python38/root/usr/lib64/python3.8/http/client.py", line 1322, in getresponse
response.begin()
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/http/client.py", line 303, in begin
version, status, reason = self._read_status()
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/http/client.py", line 272, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

I'm curious as to these two different retries. In the screenshot one it seems to be a somewhat higher level method, guessing within selenium webdriver "protocols", to retry. Whereas the first (which appears second due to that odd logging order of screenshot before last error) seems a low level communication protocol / urllib retry. I do notice with the second screenshot webdriver call the session is still the same. I was wondering if the difference between the two is the screenshot retry is doing something in addition to just sending a webdriver call .. although i would expect to see that .. ???

One thing I might try in this situation given I could do this cleanly without breaking my build system is to patch selenium to turnoff connection manager's retries [code]. I have not tried this code but I was thinking something like

         pool_manager_init_args = {"timeout": self.get_timeout(), "retries": 0}

!!! Caution !!! Again I warn you that this is not my system so be very careful if you do try this and know how to change it back if you do experiment with this. I don't expect this to totally fix this as I would think the connection would still break. Its just that low level retry would not occur. And what I would want it to do is the same as the screenshot in this example is successfully retry ..

@eravion
Copy link
Author

eravion commented Oct 26, 2022

I disable screenshot, here is a trace :

20221026 06:59:05.578 - INFO - +-- START KEYWORD: SeleniumLibrary.Click Element [ xpath=//button[@Class='btn try-out__btn'] ]
20221026 06:59:05.579 - INFO - Clicking element 'xpath=//button[@Class='btn try-out__btn']'.
20221026 06:59:05.579 - DEBUG - POST http://X.X.X.X/wd/hub/session/9fd69a97-71ff-4d00-a8a5-2a5476e7762a/elements {"using": "xpath", "value": "//button[@Class='btn try-out__btn']"}
20221026 06:59:05.580 - FAIL - ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
20221026 06:59:05.583 - DEBUG - Traceback (most recent call last):
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "", line 3, in raise_from
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/http/client.py", line 1322, in getresponse
response.begin()
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/http/client.py", line 303, in begin
version, status, reason = self._read_status()
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/http/client.py", line 272, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/SeleniumLibrary/init.py", line 496, in run_keyword
return DynamicCore.run_keyword(self, name, args, kwargs)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/robotlibcore.py", line 94, in run_keyword
return self.keywords[name](*args, **(kwargs or {}))
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/SeleniumLibrary/keywords/element.py", line 658, in click_element
self.find_element(locator).click()
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/SeleniumLibrary/base/context.py", line 82, in find_element
return self.element_finder.find(locator, tag, True, required, parent)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/SeleniumLibrary/locators/elementfinder.py", line 93, in find
return self._find(locators[-1], tag, first_only, required, element)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/SeleniumLibrary/locators/elementfinder.py", line 123, in _find
elements = strategy(criteria, tag, constraints, parent=parent or self.driver)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/SeleniumLibrary/locators/elementfinder.py", line 181, in _find_by_xpath
parent.find_elements(By.XPATH, criteria), tag, constraints
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 889, in find_elements
return self.execute(Command.FIND_ELEMENTS, {
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 427, in execute
response = self.command_executor.execute(driver_command, params)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/remote_connection.py", line 344, in execute
return self._request(command_info[0], url, body=data)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/remote_connection.py", line 366, in _request
response = self._conn.request(method, url, body=body, headers=headers)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/request.py", line 78, in request
return self.request_encode_body(
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/request.py", line 170, in request_encode_body
return self.urlopen(method, url, **extra_kw)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/poolmanager.py", line 375, in urlopen
response = conn.urlopen(method, u.request_uri, **kw)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 532, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "", line 3, in raise_from
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/http/client.py", line 1322, in getresponse
response.begin()
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/http/client.py", line 303, in begin
version, status, reason = self._read_status()
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/http/client.py", line 272, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
20221026 06:59:05.583 - INFO - +-- END KEYWORD: SeleniumLibrary.Click Element (5)

@emanlove
Copy link
Member

That is pretty much what I had expected, no effect.

If I get a chance, and it might not be today, I am curious as to how the screenshot request "recovers" from the dropped connection. From what I see these webdriver requests are retrying but failing. Also note that to me it does look like issue is on the selenium/grid/webdriver side.

@emanlove
Copy link
Member

If there is some grid logging as well as browser side webdriver and/or browser logging, I would suggest diving into that too. There might be a clue in there.

@eravion
Copy link
Author

eravion commented Oct 26, 2022

The trace while opening the session on the test and trace from the hub.

20221026 06:58:36.656 - DEBUG - Starting new HTTP connection (1): X.X.X.X:80
20221026 06:58:53.493 - DEBUG - http://X.X.X.X:80 "POST /wd/hub/session HTTP/1.1" 200 2951
20221026 06:58:53.493 - DEBUG - Remote response: status=200 | data={
"value": {
"sessionId": "9fd69a97-71ff-4d00-a8a5-2a5476e7762a",
"capabilities": {
"acceptInsecureCerts": false,
"browserName": "firefox",
"browserVersion": "104.0.1",
"moz:accessibilityChecks": false,
"moz:buildID": "20220829141339",
"moz:firefoxOptions": {
"profile": "UEsDBBQAAAAIAFI3WlVjabclngMAAP0MAAAHAAAAdXNlci5qc6VWy47bNhTd9ysCrxqgJjyTZtOuJpMpUCDoFDEGWRIUeSVxTJEseWmN\u002f76XkpW4Hr2S7vQ4h\u002fd9LlOEwH2A8udNEVxLb0y5hrXaKtcylRrPwIrCgNr88gZDgre\u002f\u002f5Rec4xACCHZC3ApTJxAW2hRFCwFQ7iNKFzC3woj7GEzh\u002feignXnRxQBk2eZQcjdPKp2DZyRK3y5ZnF3pMi1AtZEdLY7RVfWBZg4gD6pR\u002fulS3C82TFtMbinz5\u002fmzeeiNOKFyzqQYR5l0B45ZZyjbrLRd7txwjpkAO8CcmEMf44cXiR41M7G0aI\u002fi6Poj2Wuh7FYu\u002fZPe0+PzsAoyQK2LhyYFEQ5wjbbE2ZLgKOWC3WNIFPQeGIyDu04AbwuTgtG5nwtttockwmldI5TLBzSQqECBRe4kDmDPFFtU0RQXELA8WxekGJM2eJXkqYPEEZZ1DXkV2+FPKkS+ZzP35ThyqkKLARhct8FAlkq2rljM\u002fyeLLlme5d\u002fXBGF9yx5RZPNREI3nvEL0KrRVDp6I06gHm52+78cUu2J8OsE2LXWOKGokS25Hro2+1KD3edqaVvN23p4+nR3Gey0fHXId+d8Lkmdtgfqe7A0JiR6fVPcLmFJwnpNnQa7siT8RGcPoABC5cwAgiRP\u002f9Ah4h3lQk6M3VfZEiV0z5S1lSJ6yWiEaUVYK78ggqzPbbEENWVMPovBsA6mdBdizFKDpKyUhW5SStJCLoOI9YKVGoxhsgZ5+AilSAY\u002f9H\u002fmabRzIqOw7aO9Ny4uhHKBfqSKj4MpI6LX2pzVGoTBun9nq5Rwhp98HpWHH2QP1svcUJ\u002fT97tvXFXlj7LfAd\u002fhh3dGyxPLH\u002fepaHRX5\u002f\u002fL\u002f7v7ePdtoMeOgSM6ZyIjhXDh7PjsdSevSlKwDODdWCvaS7biOfPjNrJG2K5xOxH92JP3kthZrm92k\u002fDCOHkwOuJ8R1wwuga\u002fd40XqAtt8sK0uqrRnBapayT8NdySfpenp8lBurgozGV1uBr0Kr896+CWVjKmOH7yQKkR6ZpX61iftvm3px26NWArzJJw+\u002f79f2nD0bS1qB7GuJYXJ656TRj1zZPEeIx5unnjFO3RdTeVUhugJ+aCprHgfY\u002fmy9cCPmLQEvk1bcxUvmVaksIGmgLCvnudSFdu84NGdk7bsE7PrTwa+EBBMGQBw4n1qegiv13CzibpNTzAc7fORj158Zpkn6rFcsjUE6T+xPgn6TBl4RulrTWZyVM0wvgXUEsBAhQDFAAAAAgAUjdaVWNptyWeAwAA\u002fQwAAAcAAAAAAAAAAAAAALSBAAAAAHVzZXIuanNQSwUGAAAAAAEAAQA1AAAAwwMAAAAA"
},
"moz:geckodriverVersion": "0.31.0",
"moz:headless": false,
"moz:platformVersion": "5.10.0-9-amd64",
"moz:processID": 231382,
"moz:profile": "\u002ftmp\u002frust_mozprofileuXzw6W",
"moz:shutdownTimeout": 60000,
"moz:useNonSpecCompliantPointerOrigin": false,
"moz:webdriverClick": true,
"moz:windowless": false,
"networkname:applicationName": "XXXX",
"pageLoadStrategy": "normal",
"platformName": "LINUX",
"proxy": {
},
"se:cdp": "ws:\u002f\u002f172.17.0.2:4444\u002fsession\u002f9fd69a97-71ff-4d00-a8a5-2a5476e7762a\u002fse\u002fcdp",
"se:noVncPort": 7900,
"se:vnc": "ws:\u002f\u002f172.17.0.2:4444\u002fsession\u002f9fd69a97-71ff-4d00-a8a5-2a5476e7762a\u002fse\u002fvnc",
"se:vncEnabled": true,
"se:vncLocalAddress": "ws:\u002f\u002f172.17.0.2:7900",
"setWindowRect": true,
"strictFileInteractability": false,
"timeouts": {
"implicit": 0,
"pageLoad": 300000,
"script": 30000
},
"unhandledPromptBehavior": "dismiss and notify"
}
}
} | headers=HTTPHeaderDict({'Date': 'Wed, 26 Oct 2022 06:58:36 GMT', 'Server': 'Apache', 'content-length': '2951', 'Cache-Control': 'no-cache', 'Content-Type': 'application/json; charset=utf-8', 'X-Frame-Options': 'sameorigin', 'Keep-Alive': 'timeout=5, max=100', 'Connection': 'Keep-Alive'})
20221026 06:58:53.493 - DEBUG - Finished Request

:~$ sudo docker logs selenium-hub | grep --color 9fd69a97-71ff-4d00-a8a5-2a5476e7762a
06:58:53.469 INFO [LocalDistributor.newSession] - Session created by the Distributor. Id: 9fd69a97-71ff-4d00-a8a5-2a5476e7762a
06:59:06.351 INFO [GridModel.release] - Releasing slot for session id 9fd69a97-71ff-4d00-a8a5-2a5476e7762a
06:59:06.351 INFO [LocalSessionMap.lambda$new$0] - Deleted session from local Session Map, Id: 9fd69a97-71ff-4d00-a8a5-2a5476e7762a

I also added a timeout in the load of the lib :
Library SeleniumLibrary run_on_failure=None timeout=600s

I'll check in the futur if it help or not and let you know.

@eravion
Copy link
Author

eravion commented Oct 26, 2022

And we are using docker for node / hub.
Maybe it can be linked to this part?

@emanlove
Copy link
Member

Could be .. I once had a server running some service virtualization tool. The servers and their setup and configuration were under a highly skilled team and I was just using the software. I started to have a problem that the virtualization would fail and on immediate retry pass. Then if left alone for a few minutes again fail on the first try and succeed every time afterwards till we were quiet for a while. Repeat. Turns out there was a load balancer in the middle which was shutting down and the service virtualization tool was not picking up the dropped connection/fixing it till my retry.

From all I can see it looks like somewhere the communication channel from selenium through the grid to the webdriver. And from the logs it appears to be either grid or webdriver that initially drops the channel but selenium's retry fails. So grid is definitely in play as well as the docker instances that contain those parts. The drop channel could be from a crash (maybe unpreventable) or from some part of the communication chain breaking down. And there is some retry but that fails. So I would be examining in how the communication chain works and any logs from it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants