Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intermittent "session is already started" error for bluetooth tests #22619

Closed
jdm opened this issue Jan 5, 2019 · 17 comments
Closed

Intermittent "session is already started" error for bluetooth tests #22619

jdm opened this issue Jan 5, 2019 · 17 comments

Comments

@jdm
Copy link
Member

@jdm jdm commented Jan 5, 2019

Traceback (most recent call last):
  File "/Users/servo/buildbot/slave/mac-rel-wpt4/build/tests/wpt/web-platform-tests/tools/wptrunner/wptrunner/executors/protocol.py", line 48, in setup
    self.connect()
  File "/Users/servo/buildbot/slave/mac-rel-wpt4/build/tests/wpt/web-platform-tests/tools/wptrunner/wptrunner/executors/executorservodriver.py", line 100, in connect
    self.session.start()
  File "/Users/servo/buildbot/slave/mac-rel-wpt4/build/tests/wpt/web-platform-tests/tools/webdriver/webdriver/client.py", line 421, in start
    value = self.send_command("POST", "session", body=body)
  File "/Users/servo/buildbot/slave/mac-rel-wpt4/build/tests/wpt/web-platform-tests/tools/webdriver/webdriver/client.py", line 472, in send_command
    raise err
SessionNotCreatedException: session not created (500): Session is already started


  ▶ ERROR [expected OK] /bluetooth/characteristic/getDescriptors/gen-characteristic-is-removed.https.html
  └   → ''

I recall seeing this in #21370 as well.

@CYBAI CYBAI mentioned this issue Jan 8, 2019
4 of 4 tasks complete
@jdm jdm mentioned this issue Jan 8, 2019
4 of 5 tasks complete
bors-servo added a commit that referenced this issue Jan 9, 2019
Reduce bluetooth test parallelism.

This might work around #22619.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/22662)
<!-- Reviewable:end -->
bors-servo added a commit that referenced this issue Jan 9, 2019
Reduce bluetooth test parallelism.

This might work around #22619.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/22662)
<!-- Reviewable:end -->
bors-servo added a commit that referenced this issue Jan 9, 2019
Reduce bluetooth test parallelism.

This might work around #22619.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/22662)
<!-- Reviewable:end -->
bors-servo added a commit that referenced this issue Jan 10, 2019
Reduce bluetooth test parallelism.

This might work around #22619.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/22662)
<!-- Reviewable:end -->
bors-servo added a commit that referenced this issue Jan 10, 2019
Reduce bluetooth test parallelism.

This might work around #22619.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/22662)
<!-- Reviewable:end -->
@aditj
Copy link
Contributor

@aditj aditj commented Mar 9, 2019

Hey @jdm I was looking at the issue, have you figured out why this is being caused? Why is a new session being created when a session already exists?
Is this because this check is not working ? or some other issue perhaps?
Also is the webdriver ready for multiple sessions?

@jdm
Copy link
Member Author

@jdm jdm commented Mar 9, 2019

I have not figured out why yet. I believe it should be possible to add support for multiple sessions to the server, though.

@georgeroman
Copy link
Contributor

@georgeroman georgeroman commented Mar 19, 2019

I tried to replicate this error locally but instead of the SessionNotCreatedException here is what I've got:

[ERROR script::dom::bindings::error] Error at https://web-platform.test:8443/bluetooth/resources/bluetooth-helpers.js:177:10 Illegal constructor.
[ERROR script::dom::bindings::error] Error at https://web-platform.test:8443/bluetooth/characteristic/getDescriptors/gen-characteristic-is-removed.https.html:11:18 Illegal constructor.
@jdm
Copy link
Member Author

@jdm jdm commented Mar 19, 2019

It should be possible to reproduce by running ./mach test-wpt --processes 8 --headless --product=servodriver bluetooth. We should file a separate issue for the Illegal constructor exceptions triggered by https://github.com/web-platform-tests/wpt/blob/e45bda7edddd2cffb4169efe2f61812537b09abd/bluetooth/resources/bluetooth-helpers.js#L177.

@aditj
Copy link
Contributor

@aditj aditj commented Mar 20, 2019

@jdm , i couldnt reproduce this error all my tests ran as expected , can this error be system specific?

@aditj
Copy link
Contributor

@aditj aditj commented Mar 20, 2019

@aditj Try modifying the test's expectation to PASS in https://github.com/servo/servo/blob/master/tests/wpt/metadata/bluetooth/characteristic/getDescriptors/gen-characteristic-is-removed.https.html.ini.

Oh !I am now getting the Illegal constructor error but not the SessionNotCreatedException

@jgraham
Copy link
Contributor

@jgraham jgraham commented Apr 30, 2019

I have not figured out why yet. I believe it should be possible to add support for multiple sessions to the server, though.

Adding multiple sessions to the server probably isn't that easy in the sense that the webdriver crate isn't designed to have multiple clients at a time, and ensuring the sessions are isolated on the servo side seems like it would be a lot of work since you need to maintain a map between session id and all the resources associated with that session.

Possibly this just means "shutting down a session and starting a new one without quitting servo", which I think is much more reasonable. However in this case I guess there's some race condition in shutting down the session which allows us to try to start a new one before the previous one is finished.

@hundredeir
Copy link
Contributor

@hundredeir hundredeir commented May 7, 2019

As I can see whenever any test function runs it calls a fixture which tries to create a new session and if gets SessionNotCreated error then it raises the exception. And between two test functions there isn't any cleanup/teardown happening.
So, when the next function is called the webdriver doesn't even let the message reach servo and itself returns an error.

@georgeroman
Copy link
Contributor

@georgeroman georgeroman commented May 8, 2019

I managed to reproduce the error locally and I noticed that two different processes attempted to start WebDriver on the same port:
servo1

A few lines down in the log there are two attempts to starting a WebDriver session on the same port:
servo2

Next, the error was raised. From what I investigated, the ports are chosen here and that code is vulnerable to race conditions.

@jdm
Copy link
Member Author

@jdm jdm commented May 8, 2019

That looks extremely suspicious!

bors-servo added a commit that referenced this issue May 23, 2019
…th_tests, r=jdm

Reintroduce parallelism in the bluetooth tests

<!-- Please describe your changes on the following line: -->
With web-platform-tests/wpt#16819 in place, #22619 should be fixed, so the changes introduced by #22662 can be reverted.

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `___` with appropriate data: -->
- [X] `./mach build -d` does not report any errors
- [X] `./mach test-tidy` does not report any errors

<!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.-->

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/23407)
<!-- Reviewable:end -->
@atouchet
Copy link
Contributor

@atouchet atouchet commented May 23, 2019

Is this issue fixed now?

@georgeroman
Copy link
Contributor

@georgeroman georgeroman commented May 24, 2019

The issue is fixed for the bluetooth tests, but it's still there in the new_session wdspec tests.

@georgeroman
Copy link
Contributor

@georgeroman georgeroman commented May 24, 2019

So, I further investigated what causes the Session is already started error in the new_session wdspec tests and I found out that the WebDriver instances that get assigned to each process each have a port of 4444. Here are some debug prints illustrating this:
servo

Although the port retrieval in _find_next_free_port is protected with a lock, it must be the case that each process has a different copy of the WebDriverServer class and in this case the lock is irrelevant.

@georgeroman
Copy link
Contributor

@georgeroman georgeroman commented May 27, 2019

So, after lots of debugging, I finally managed to understand the real cause of the error in the wdspec new_session tests.

When running the wdspec tests in parallel, wpt starts new processes having as target the start_runner function. Here, the executor is ServoWdspecExecutor, which has an associated ServoDriverServer. So, the port initialization occurs for each instance of ServoDriverServer in a different process. Although _used_ports is a class variable and is protected by a lock from concurrent access, since we are in different processes, each process will have a different copy of it. In other words, _used_ports is not shared between the processes (the changes made by one process to it are not visible by the other processes) and the lock that protects it is useless since there is only one process that modifies a particular copy of _used_ports. Because the port initialization is not synchronized we have multiple processes trying to create new sessions on the same WebDriver port and so we get those Session is already started errors.

Why didn't this also happen in the other occurrences of get_free_port (here for example)? Because there the initialization occurs in threads, not processes, and each thread shares the same used_ports variable. (I've used multiprocessing.Lock for protecting used_ports, but threading.Lock would have also been good enough in this case).

I thought about several possible solutions:

  • run the new_session tests in a single process
  • find a way to share _used_ports between processes
  • modify wptrunner to do the port initialization in threads instead of processes
  • support multiple sessions in WebDriver
@georgeroman
Copy link
Contributor

@georgeroman georgeroman commented May 31, 2019

Good news! Once we update web-platform-tests to include this commit, this issue will be fixed! (As noted here there is still a small chance of getting this error, but the probability of it happening is quite low)

@georgeroman
Copy link
Contributor

@georgeroman georgeroman commented Jun 5, 2019

This can be closed now!

@jdm jdm closed this Jun 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants
You can’t perform that action at this time.