fixtures with class scope not working as expected with --dist=loadscope option #501

Closed
fpiccione opened this issue Feb 9, 2020 · 4 comments

@fpiccione

I need to run about 10 tests that depend on a setup done by a fixture. It all works fine without parallelism, using a class-scoped fixture and tests within a class.
Once I add xdist parallelism, I experience the following:
When I do not use loadscope, the tests are spread across different workers. I do not want this because I would like to build the fixture only once and use it for all related class tests.

When I use loadscope, all the tests are executed on gw0 and I am not getting any parallelism.

This is exactly the same as what someone else experienced in this thread:
https://stackoverflow.com/questions/51756594/pytest-xdist-indirect-fixtures-with-class-scope

Is there another syntax or way of achieving a class-level group of tests (that depends on one session-level fixture) to make sure only one worker will go through it?
The specific use case is having to process each file in a session fixture list and, based on the output of this process, run 10 tests. I can't afford for xdist to give some of these 10 tests to one worker and some to another, as both workers would end up processing the same file again. I also cannot afford to move this to conftest and make it session-scoped, as each worker would pay the penalty of processing each file.

The wiki says that tests are grouped by class scope when using the loadscope option, so I fail to understand why this is not working. Is there an issue/collision because the class refers to another session-level fixture (i.e. it is not standalone)?

@SalmonMode

SalmonMode commented Feb 9, 2020

@fpiccione it's the other way around. Running all the tests in a single test class on a single worker is the intended use case for the loadscope distribution. For tests that aren't in a test class, they are grouped based on the module they're defined in.
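
As a rough illustration of the grouping described above, the sketch below buckets pytest node IDs by everything before the last `::`, which puts the methods of one class together and groups plain functions by module. The node IDs and the helper names `scope_of` and `group_by_scope` are hypothetical, and this is a simplified model of the idea rather than the plugin's actual code.

```python
from collections import defaultdict


def scope_of(node_id: str) -> str:
    """Return the class (or module) part of a pytest node ID."""
    # "mod.py::TestA::test_x" -> "mod.py::TestA"
    # "mod.py::test_y"        -> "mod.py"
    return node_id.rsplit("::", 1)[0]


def group_by_scope(node_ids):
    """Bucket node IDs into work groups keyed by their scope."""
    groups = defaultdict(list)
    for node_id in node_ids:
        groups[scope_of(node_id)].append(node_id)
    return dict(groups)


# Hypothetical node IDs, for illustration only.
node_ids = [
    "test_files.py::TestFileA::test_size",
    "test_files.py::TestFileA::test_header",
    "test_files.py::TestFileB::test_size",
    "test_files.py::test_standalone",
]
groups = group_by_scope(node_ids)
```

Under this model, parametrized variants such as `TestFileA::test_size[a.csv]` would still fall under the same class scope, which would be consistent with every test of a single class landing on one worker.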

The parallelism provided by this plugin is done by grouping chunks of tests into "test/work groups", where the tests in a group are effectively run together, but in complete isolation from any other test/work group, at least in regards to runtime. Even if 2 groups run on the same worker, it should be assumed that the previous group has torn down anything it set up, and that all cached fixture values were destroyed. So if one group executes a fixture, no other group would have access to its cached value, meaning those other groups would view that fixture as not having been run.

The idea is that a single group consists of tests that share a resource that was expensive to set up. The other groups have different expensive resources, so lumping all the tests for this group onto the one worker that did their expensive setup frees up the other workers to do the expensive setups for those groups in the same manner. The work that comes after the expensive setup is relatively cheap, so spreading the cheap work of a single group over multiple workers would only delay the expensive setups for the other groups. This means that breaking things up this way is optimal.

For example, running 1000 asserts is significantly less expensive than launching a browser and doing a bunch of stuff for a single e2e test.

Even if the resource isn't that expensive to set up, grouping tests that perform the same steps is a good idea so that steps aren't repeated when they don't need to be.

However, if you have an expensive-to-set-up resource that you want to share across multiple groups, then I would recommend either not having that resource's setup be tied to the tests (e.g. setting up the database for the backend of your service), or leveraging lock files as a means of communication between workers so they can know what has and hasn't been done, and what is currently being done.
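
One way the lock-file idea could look is sketched below, using only the standard library: each work item gets a marker file, and creating that marker atomically decides which process owns the item. The helper names (`try_claim`, `mark_done`, `state_of`), the marker suffixes, and the item name `a.csv` are all hypothetical; this is an illustration under those assumptions, not something pytest-xdist provides.

```python
import os
import tempfile
from pathlib import Path


def try_claim(marker_dir: Path, item: str) -> bool:
    """Atomically claim `item`; return True only for the first caller."""
    marker = marker_dir / f"{item}.claimed"
    try:
        # O_EXCL makes creation fail if the marker already exists,
        # so only one process can win the claim.
        fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def mark_done(marker_dir: Path, item: str) -> None:
    """Record that a claimed item has been fully processed."""
    (marker_dir / f"{item}.done").touch()


def state_of(marker_dir: Path, item: str) -> str:
    """Report what has been done, what is being done, and what is untouched."""
    if (marker_dir / f"{item}.done").exists():
        return "done"
    if (marker_dir / f"{item}.claimed").exists():
        return "in progress"
    return "pending"


with tempfile.TemporaryDirectory() as tmp:
    markers = Path(tmp)
    first = try_claim(markers, "a.csv")   # this caller wins the claim
    second = try_claim(markers, "a.csv")  # a later attempt is refused
    mark_done(markers, "a.csv")
    final_state = state_of(markers, "a.csv")
```

In a real run the marker directory would need to be a location all workers can see, and a crashed worker would leave its item stuck "in progress", so some form of cleanup or timeout would be needed.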

@fpiccione
Author

fpiccione commented Feb 9, 2020

Thanks for the quick reply @SalmonMode !

Am I right in understanding that the current implementation only works if you have say 3x workers and 3x separately defined classes, so gw0 picks class1, gw1 picks class2, gw2 picks class3?
So basically use of fixtures within classes are not considered currently to parallelise?

I'm new to xdist so maybe I'm just implementing this wrongly.

My high level requirement is:

  • Obtain a list of files
  • Execute a setup/process on each one of the files. Each file has to be processed just once as it's expensive. This is where I really need parallelism :)
  • Run 10x tests based on the output of the process executed against each file (same worker that does set up needs to run these tests, this setup is no longer required for any other tests/other workers shouldn't need it)

My current implementation is:

  • 1x session scope fixture obtaining a list of files
  • 1x class using that session scope fixture
  • 1x class-scope fixture outside the class that executes something against each one of the files in the session fixture.
  • 10x tests inside that class that require the output of the class-scope fixture.

Other options I tried resulting in similar behavior were to:

  • run the process inside a setup_class within class instead of fixture
  • moved class scope fixture inside the class

Is this doable today to be run in parallel with xdist? Do I have to resort to similar session scope techniques such as locks/semaphores or writing to files using them as flags?

Similar code is shared in the link in my post above.
Thanks again

@SalmonMode

> Am I right in understanding that the current implementation only works if you have say 3x workers and 3x separately defined classes, so gw0 picks class1, gw1 picks class2, gw2 picks class3?

Almost. There would be 3 groups, yes. But which worker gets which group is decided in a round-robin fashion, i.e. first come, first served. When a worker has no group to process, it goes to the master process and has it pop a group off the work queue so the worker can then go and run that group of tests. So which worker gets which group is not decided beforehand.
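
The queue behaviour described above can be mimicked with a toy simulation: whichever worker becomes free first takes the next group off the queue. The group names, the durations, and the `simulate` helper are made up for illustration, and this is a simplified model, not xdist's scheduler.

```python
import heapq
from collections import deque


def simulate(groups, worker_names):
    """Hand each group to whichever worker becomes free first.

    groups: list of (group_name, duration) pairs, in queue order.
    Returns a dict mapping worker name -> list of group names it ran.
    """
    queue = deque(groups)
    # Heap of (time_when_free, worker_name); all workers start idle at t=0.
    idle = [(0, name) for name in worker_names]
    heapq.heapify(idle)
    assignments = {name: [] for name in worker_names}
    while queue:
        free_at, worker = heapq.heappop(idle)
        group, duration = queue.popleft()
        assignments[worker].append(group)
        heapq.heappush(idle, (free_at + duration, worker))
    return assignments


workers = ["gw0", "gw1", "gw2"]
# Four classes: the worker stuck on the slow group does not get a second one.
several_classes = simulate(
    [("TestA", 5), ("TestB", 1), ("TestC", 1), ("TestD", 1)], workers
)
# A single class is a single group, so only one worker ever has work.
one_class = simulate([("TestOnlyClass", 10)], workers)
```

In this model a test suite with only one class produces only one group, so two of the three workers stay idle, which matches the "everything runs on gw0" observation from the original report.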

> So basically use of fixtures within classes are not considered currently to parallelise?

It doesn't have to do with the scope of the fixtures, and it won't be considered in the future by default. When a worker gets a group of tests, it runs the first test function/method in the group. In order to run it, it needs to look at what fixtures should apply to it and run those first. Once those are all done, it runs the test, and then moves on to the next test function/method.

The process is repeated here, except fixture caches are now available (this is where scope comes in), so pytest factors those in when deciding what to keep, and what to teardown and re-execute.

When all the test functions/methods in that group are finished, all the work done is effectively tossed out the window. If this weren't done, it could cause some nasty race conditions. So by the time that worker starts running the tests in the next group, everything from the previous group it ran would be torn down. This means that even session, package, or module scoped fixtures are torn down between groups run by the same worker (and other workers), and, even if running in the same worker, the next group will have to do everything as if it was called on its own.

It sounds like you're trying to go backwards with this a bit. You want to parallelize your code execution, not parallelize test groups in isolation from each other. The tests that come after the expensive setup don't need to be split up over multiple workers, as I mentioned in my previous comment. Pytest-xdist isn't meant to parallelize your code for you. It just takes groups of tests and runs them in isolation from each other at the same time, as if they had been individually invoked from multiple terminals at the same time, e.g. calling pytest on mytest.py::TestSomething in one terminal and then calling pytest on mytest.py::TestSomethingElse in another terminal at the same time. This is significantly easier to do than parallelizing your code for you, because every test should be capable of running on its own, and calling pytest on individual things in multiple processes is effectively what the plugin is doing.

I think what you want to do is build your own multi-process utility that can parse those files in parallel. This is not something this plugin can do for you, and making your own utility would eliminate the need to use this plugin (at least in regards to these specific tests, as they could all just go in the same group).
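
A minimal sketch of such a utility, assuming the per-file work can be expressed as a plain function, might use the standard library's `concurrent.futures.ProcessPoolExecutor`. The functions `process_file` and `process_all` are hypothetical names, and counting lines merely stands in for the real, expensive processing.

```python
from concurrent.futures import ProcessPoolExecutor


def process_file(path: str) -> int:
    """Stand-in for the expensive per-file step: count the lines."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def process_all(paths, max_workers=4):
    """Process each file in a pool of worker processes.

    Returns a dict mapping each path to its result. Each file is handled
    exactly once, regardless of how many worker processes are used.
    """
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs paths with results.
        return dict(zip(paths, pool.map(process_file, paths)))
```

`process_all` should be called from code guarded by `if __name__ == "__main__":` (or from inside a function), since some platforms re-import the main module when starting worker processes. The resulting mapping could then be handed to the tests, which would no longer need to be split across xdist workers for the sake of the setup.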

@fpiccione
Author

Thanks again for your speedy reply.
Taking the 'setup' process outside pytest/xdist is not something I really want to do at this stage.

I will play around with this alternative to implement a lock mechanism for each file so that each worker running a session fixture will pick a new file not being processed/already processed.
I can then split the tests within the class outside to enable them to run in parallel as well.

making session scoped fixtures execute only once
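
The "execute only once" idea named above can be sketched without pytest: whichever process takes the lock first computes the data and writes it to a shared file, and every later process reads that file instead of recomputing. The helper `get_or_compute` and its parameters are hypothetical, and the hand-rolled lock file used here is a substitute chosen to keep the example self-contained; a dedicated file-locking library would be more robust.

```python
import json
import os
import tempfile
import time
from pathlib import Path


def get_or_compute(cache_file: Path, compute, poll_seconds=0.05):
    """Return cached data, running `compute` only if no one has yet."""
    lock_file = cache_file.with_name(cache_file.name + ".lock")
    # Spin until this process manages to create the lock file.
    # Caveat: a crashed holder leaves a stale lock and this loop never ends.
    while True:
        try:
            fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            time.sleep(poll_seconds)
    os.close(fd)
    try:
        if cache_file.is_file():
            # Someone already did the expensive work; reuse the result.
            return json.loads(cache_file.read_text())
        data = compute()
        cache_file.write_text(json.dumps(data))
        return data
    finally:
        os.remove(lock_file)  # release the lock for the next process


calls = []


def expensive():
    """Hypothetical expensive step; records each invocation."""
    calls.append(1)
    return {"files": ["a.csv", "b.csv"]}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "data.json"
    first = get_or_compute(cache, expensive)
    second = get_or_compute(cache, expensive)  # served from the cache file
```

Inside a test suite, the cache file would have to live in a directory shared by all workers, and the data must be serialisable (JSON in this sketch).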

I will close this issue as I misunderstood the capabilities of the current implementation.
Thanks for your help clarifying.
