Empty pool of VM for setup Capabilities #296
Hi @thorsteneckel, You can check the status endpoint. The response looks like this:

```json
{
  "status": 0,
  "value": {
    "ready": true,
    "message": "Hub has capacity",
    "build": {
      "revision": "aacccce0",
      "time": "2018-08-02T20:13:22.693Z",
      "version": "3.14.0"
    },
    "os": {
      "arch": "amd64",
      "name": "Linux",
      "version": "4.9.93-linuxkit-aufs"
    },
    "java": {
      "version": "1.8.0_181"
    }
  }
}
```

When `"ready"` equals `true`, the Grid has nodes registered to handle incoming requests.
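For illustration, here is a minimal sketch of how that payload can be interpreted in ruby; `grid_ready?` is a hypothetical helper name, not part of selenium-webdriver:

```ruby
require 'json'

# Hypothetical helper: decide readiness from the raw /status response body.
# The JSON shape matches the example response above (Selenium Grid 3.x).
def grid_ready?(body)
  payload = JSON.parse(body)
  # status 0 plus value.ready == true means the Grid can accept new sessions
  payload['status'] == 0 && payload.dig('value', 'ready') == true
end

sample = '{"status": 0, "value": {"ready": true, "message": "Hub has capacity"}}'
grid_ready?(sample)  # => true
```

A polling loop can call this until it returns true, as the script later in this thread does.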
Hi @diemol - thanks for your fast reply. That's great to know! I wrote a small test script that requests a new session (via …). I get this exception (from the ruby context) while the server hasn't started yet:

The next exception I get is the one described in my first comment (nice!):

This is where the Selenium server usually gets stuck in my scenario. However, as opposed to the situation described in the first comment, my local test gets through after a few seconds and successfully creates a session 🤔 I'll now try to fetch the …
So I did as described and wrote a little test script (see below). Unfortunately the … Any thoughts on what to try/debug next? Help is greatly appreciated. Here is the script:

```ruby
require 'uri'
require 'timeout'
require 'selenium-webdriver'
require 'active_support/core_ext/integer/time' # for 10.minutes

http_client = Selenium::WebDriver::Remote::Http::Default.new(
  open_timeout: 120,
  read_timeout: 120
)
uri = URI.parse(ENV['REMOTE_URL'])
uri.path += '/' unless uri.path.end_with?('/')
http_client.server_url = uri

Timeout.timeout(10.minutes) do
  begin
    loop do
      response = http_client.call(:get, 'status', {})
      unless response['status'].zero?
        puts "Got unexpected response: #{response}"
        next
      end
      break if response['value']['ready']
      puts "Not ready yet: #{response['value']['message']}"
    end
  rescue EOFError
    puts "Server started but status endpoint isn't ready yet"
    retry
  rescue Errno::ECONNREFUSED
    puts 'Server is down'
    retry
  end
end
puts 'Ready to roll 🚀'
```

OT: @diemol - if you are into stickers, make sure to write a mail to contact@zammad.com and refer to me. We will send you some!
That's strange; we use that approach widely as a smoke test to see that the Grid is up. We documented it a bit more in the official images here. We just normally start the images (via docker-compose or plain docker commands) and then use the script to wait for them. Thanks for offering the stickers! I see you are in Berlin; I am also helping to organise the Berlin Selenium Meetup, and I'd like to do one in February 2019... Maybe you would like to show how you do testing at Zammad? (Or any topic related to testing that you might want to show :))
I totally agree. There's nothing special about the setup, and 99% of the jobs run smoothly, perfectly fine. However, if there is anything I can do to debug this further (by myself), please let me know. I don't think I mentioned it earlier, but I have access to the container in the broken state and can execute commands and everything. Wow - thanks for the invitation. I'd love to! Actually we just finished a pretty big migration (by our standards) of our GitLab CI SSH runners to docker-powered containerization, and there's always some knowledge to share. Our comprehensive Selenium/browser test suite played a big part in it. Don't hesitate to get in touch via contact@zammad.com and refer to me.
Guys, I am experiencing the same issue as described here. The problem is that, for some reason, the chrome node starts to connect and gets stuck.

good scenario:

I am experiencing this issue with the latest docker-selenium bundle - selenium:3.141.59. What can I do to debug this issue?
If it's of any interest - we're using a custom TZ (…). @andrejska - no new information/progress from my side. Still happening in about 1% of the runs. It would be interesting to know which processes are not starting correctly. Maybe some …
Hi again, sorry for the late reply (holidays and family matters during these weeks).
Side note @thorsteneckel, just sent an email about the Selenium Meetup to contact@zammad.com :)
Hi @diemol - no need to feel sorry. There is nothing we can ask from you. I hope you had a good time.

In our case it's done automatically by the GitLab CI runner. However, I could verify the steps @andrejska posted locally. It took me about 20 tries.

Sure, you can find everything from the …

I can reproduce it on Linux (CoreOS, CentOS, Ubuntu) and Mac, given enough tries. Thanks for your help and email! I already replied and noted in my head that …
How do you reproduce it on a Mac? I was trying but I haven't bumped into it... Could you share the command you used to reproduce?
I wrote a little ruby script (tested with …). The output looks like this:

As stated in the output, you can enter the container via … You can set the following ENVs if you need to use other commands/URLs etc. I added the default values. The used values get printed in the header output:
I noticed this time that the container has a pretty high load and seems to be in an endless loop. See the …

Let me know if you need anything further.
Hi everyone, just reporting here again: currently I am trying @thorsteneckel's ruby script and will report back with my findings during the day.
OK, issue reproduced; now adding …
Sadly the debug log does not reveal anything special; it just shows the two registration requests from the nodes and then, as @andrejska mentions, the CPU usage goes through the roof. I'll try to debug a bit more to see if I find something particular.
Do you guys know when this started to happen? Is it possible for you to pin a specific release? @thorsteneckel @andrejska
Hi @diemol - thanks for your efforts so far! Unfortunately I can't pin it down. It started while migrating our CI to use docker instead of a local Selenium installation. This was in November, maybe October. So we "always" had this issue. My colleagues have been using the docker image in other, smaller projects from time to time for quite a while, but I never heard of it before. It may have never happened on that smaller scale. I was looking around and thinking of a way to help. There is a great profiler called rbspy for ruby that gives you a flame graph / stack trace of a running ruby script. Unfortunately my Java days are long gone and I wasn't able to find anything similar for Java :/ So I'm no help here. If there's anything I can contribute, please let me know.
I think the root cause is this issue: SeleniumHQ/selenium#6918. But we have to see what we can do, because ideally we are not going to release anything else after 3.141.59.
It would be fine for me to use an older version of …
After having a look at the version information and tags of this repo/image, it's pretty obvious that they match the underlying Selenium version. I'll switch our CI over to version …
I had a quick read through this ticket and it does in fact seem like our issue described in 6918 is probably the root cause of this issue. We injected a quick patch into our grid and the problem went away. The "proper" fix was a bit more involved and a PR was just submitted for review. It's also worth mentioning that when we had throwOnCapabilityNotPresent set to true, we saw the "Empty pool of VM for setup ..." error message. After flipping that flag to false, everything hung because the nodes would never finish registering and the hub would just hold onto the new session requests indefinitely.
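For readers unfamiliar with the flag: in Selenium Grid 3, `throwOnCapabilityNotPresent` is a hub-side setting that can be passed in the hub's JSON config. A minimal sketch (the port value and file layout are illustrative, not taken from this thread):

```json
{
  "port": 4444,
  "throwOnCapabilityNotPresent": true
}
```

With `true`, a new-session request fails fast when no registered node offers the requested capability; with `false`, the hub queues the request until a matching node appears, which is why the hang described above only surfaced after flipping the flag.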
@diemol I am quite sure that it started with … What is your plan to deal with this until Selenium fixes these issues? The latest docker-selenium releases are not usable and we are quite behind the latest Chrome version :(
Rollback to 3.14.0 due to #296 until: SeleniumHQ/selenium#6918 SeleniumHQ/selenium#6924
all: thank you so much for nailing this down! Rolled back to the last working version via #305 36f32a6 for now, until Selenium fixes upstream. The latest release now has the latest Chrome, the latest Firefox, and the last working Selenium version:
…g CI jobs until elgalu/docker-selenium#296 is resolved.
Closed via #319
Reopening, wasn't really included in the release 🤦♂ |
Hi there 👋 Thanks a lot for this great docker image 🐳

We over at Zammad use your image in our GitLab CI env. In a bit more than 1% of all runs, jobs fail with the following error message (in `/var/log/cont/selenium-hub-stderr.log`):

I noticed a difference between a regular run and a failing one in (and only in) the `selenium-hub-stderr.log`: the two `Registered a node` lines are missing. The timestamps look as if there might be a race condition between registering the nodes and requesting a new session. However, I started debugging and found out that (after the failing "new session" request) no new sessions can be created - neither firefox nor chrome - even after 10 minutes and more.

After googling around I found issue #64 - which is now more than 2 years old. Back in the days @elgalu wrote to use `docker exec selenium wait_all_done 30s`, which now seems to be deprecated. GitLab CI uses a HEALTHCHECK (which shouldn't be available in the first place?) to check the containers, but logs the following for selenium on every run (successful and unsuccessful):

However, I was wondering how to approach this these days? How can I resolve this? I have collected all the logs I could get but found no valuable lines (in my eyes) except the ones I posted. Please let me know if you need any of the others.

Looking forward to reading from you 👋
Operating System
Darwin MacBookPro 2016 18.0.0 Darwin Kernel Version 18.0.0: Wed Aug 22 20:13:40 PDT 2018; root:xnu-4903.201.2~1/RELEASE_X86_64 x86_64
Linux CI ENV 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Image version
`docker pull elgalu/selenium`
Docker version
`docker --version`: Docker version 18.09.0, build 4d60db4