
Selenium tests are flaky #625

Closed
mythmon opened this Issue Mar 22, 2017 · 6 comments

@mythmon
Member

mythmon commented Mar 22, 2017

We are often getting test failures due to Selenium timing out finding elements, being unable to connect to the test server, or simply failing for reasons I can't figure out. Retrying the CI build generally makes the problem go away, but it reduces our confidence in CI and slows us down.

We should make our Selenium setup more robust, possibly by increasing timeouts, waiting more reliably for the server to be ready, or intelligently retrying when we hit errors.

@Osmose


Member

Osmose commented Mar 22, 2017

@chartjes @stephendonner Any tribal knowledge on how we historically deal with issues like this in Selenium automation?

@Osmose


Member

Osmose commented Mar 23, 2017

For context, an overview of our current test setup:

  • Selenium tests are in the functional_tests directory
  • runtests.sh is where most of our setup is scripted. We run the site with docker-compose and use the public selenium/standalone-firefox:3.0.1-fermium Docker image to run Selenium.
  • All of this happens on CircleCI against a local server. Our tests cannot (yet) be targeted at a live server.
  • Here's an example of one of our failures, in this case a failure to connect to the Selenium server (I think): https://circleci.com/gh/mozilla/normandy/1892. But, as mythmon mentioned above, we've seen several types of failures with our Selenium tests, including elements simply not being found on the page.
  • The tests themselves are in test_basic.py, supported by my page object implementation in utils.py. The admin interface we're testing is a single-page JS app.

@davehunt On IRC, @stephendonner mentioned you might have some ideas on common stability issues with running Selenium in the way that we do. Any thoughts?

@davehunt


Member

davehunt commented Mar 23, 2017

Your example of https://circleci.com/gh/mozilla/normandy/1892 does indeed look like the Selenium standalone server is not running when the test tries to create a new session. When starting the selenium/standalone-firefox:3.0.1-fermium container, you may want to wait for the log to contain "Selenium Server is up and running", or until you can connect to it on the port mapped to 4444.
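
For example, something along these lines could run before the suite starts (an untested sketch; the URL, port mapping, and timeout are placeholders, not taken from runtests.sh):

```python
# wait_for_selenium.py -- block until the standalone server answers its status
# endpoint. Assumes the container's port 4444 is mapped to localhost:4444.
import time
import urllib.error
import urllib.request


def wait_for_selenium(url="http://localhost:4444/wd/hub/status", timeout=60):
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200:
                    return
        except (urllib.error.URLError, OSError):
            pass  # server not listening yet; keep polling
        time.sleep(1)
    raise RuntimeError("Selenium server not ready after %s seconds" % timeout)


if __name__ == "__main__":
    wait_for_selenium()
```

runtests.sh could call something like this between starting the container and invoking pytest.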

For elements not being found: looking at your page object implementation, you may need to tweak your waits so that they wait not only for elements to be present but also for them to be visible. You may be hitting a race condition where elements exist but are not yet in a state where they can be interacted with. FYI, we use a page object package named PyPOM to build our page objects and regions, though with a simple single-page JS app you might consider this overkill.
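
As a rough illustration (the selector and timeout are made up, not taken from utils.py), the difference is between waiting for presence and waiting for visibility:

```python
# Wait for an element to be visible rather than merely present in the DOM.
# The CSS selector and timeout are placeholders.
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


def wait_for_visible(driver, selector, timeout=10):
    # presence_of_element_located succeeds as soon as the element exists in
    # the DOM; visibility_of_element_located also requires it to be displayed,
    # which matters when a JS app inserts elements before rendering them.
    return WebDriverWait(driver, timeout).until(
        EC.visibility_of_element_located((By.CSS_SELECTOR, selector))
    )
```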

For the waits, there are also several expected conditions provided in a support module within the Selenium package. I'm not a big fan of these, but they can be useful even if just as a model for writing your own. Sometimes you need to be specific about what you're waiting for. I also always advise an explicit wait after any interaction rather than before an interaction. For example, I click a checkbox and then wait for a dropdown to be populated.
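
To sketch what that looks like (the selectors and function here are hypothetical, not from test_basic.py):

```python
# Explicit wait *after* an interaction: click a checkbox, then wait until the
# dependent dropdown has been populated. Selectors are placeholders.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait


def enable_filter_and_wait(driver, timeout=10):
    driver.find_element(By.CSS_SELECTOR, "input.enable-filter").click()
    # Custom condition: the dropdown counts as populated once it holds more
    # than its placeholder option.
    WebDriverWait(driver, timeout).until(
        lambda d: len(d.find_elements(By.CSS_SELECTOR, "select.channel option")) > 1
    )
```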

As for intelligently retrying tests, this is possible, but I would advise caution. There is a pytest-rerunfailures plugin that will rerun either marked flaky tests or all failing tests up to a maximum number of times. This will of course add to your test execution time, and may even mask real intermittent issues. If you do enable this, I highly recommend starting with a single rerun so that you're alerted if something fails twice in a suite. My recommendation with intermittent failures is to fix them or delete them; otherwise you lose trust in your suite and start to assume failures are just flaky tests.
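
If you do go this route, opting individual tests in looks roughly like this (the test and fixture names are made up); a single rerun of every failing test can instead be enabled by passing --reruns 1 to pytest:

```python
# Mark one known-flaky test for a single automatic rerun with
# pytest-rerunfailures. The test and fixture names are hypothetical.
import pytest


@pytest.mark.flaky(reruns=1)
def test_recipe_listing_loads(selenium_driver):
    ...
```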

Additionally, something we're doing to battle flaky tests is sending all of our test results to ActiveData. This allows us to query, analyse, and create visualisations for results. If we have a way to track intermittent failures, and the impact on suite duration when introducing or increasing automatic reruns, I'm much more comfortable with this feature being used. I've just seen too many cases of increasing timeouts and automatic reruns causing suites to take hours when a critical regression is introduced. Reporting to ActiveData is simply a case of publishing a raw structured log to S3. You can read more about this here, and it's not limited to functional UI tests.
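
The publishing step itself can be as small as this (a very rough sketch; the bucket and key are invented, and the structured-log format ActiveData expects isn't shown here):

```python
# Upload a raw structured log produced by the test run to S3, where it can be
# picked up for ActiveData. Bucket and key are placeholders.
import boto3


def publish_results(log_path, bucket="example-test-results", key="normandy/raw.log"):
    boto3.client("s3").upload_file(log_path, bucket, key)
```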

@davehunt


Member

davehunt commented Mar 24, 2017

Another suggestion for weeding out intermittent failures is to use the pytest-repeat plugin. With this you can run the collected tests multiple times to ensure they're reasonably robust. Combine it with -x to exit on the first failure if you want to walk away from your machine while it runs, or even --pdb to enter an interactive debug session when a failure occurs.
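
For instance (the count is arbitrary and the test/fixture names are made up):

```python
# Repeat a suspect test many times with pytest-repeat to flush out
# intermittent failures. The whole suite can be repeated from the command
# line instead, e.g.: pytest --count=20 -x functional_tests/
import pytest


@pytest.mark.repeat(20)
def test_suspected_flaky(selenium_driver):
    ...
```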

@mythmon


Member

mythmon commented Mar 27, 2017

Here is a log of when Selenium failed to connect: https://gist.github.com/mythmon/f4c4802d9f691b239a6ce34ffab16c9d

@mythmon mythmon closed this Mar 27, 2017

@mythmon mythmon reopened this Mar 27, 2017

@mythmon


Member

mythmon commented Oct 18, 2017

The functional tests that this issue applied to have been removed.

@mythmon mythmon closed this Oct 18, 2017
