-
-
Notifications
You must be signed in to change notification settings - Fork 765
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rethinking ordering API and use cases #2251
Comments
Thanks for the feedback. It's going to take me a while to respond to all of it, so I'm going to start responding in this first comment, and circle back around in later comments for other bits. First off, it's important to realize that RSpec generally has no concept of files as it relates to ordering. RSpec's ordering logic operates only on example groups and examples. RSpec interacts with your spec files only to load the specs, but the order the files are loaded is not necessarily the order the specs run in. RSpec uses a multi-pass approach:
So the order that RSpec loads your files doesn't directly determine the order your specs run in. However, if you use the As for the reason we sort the files before loading them...there's an interesting history here (part of which I had forgotten until digging into it just now).
So, I think we can revert the "sort the entire list of files" change from #717. What I'd like to do is only apply alphabetical sorting to the file glob calls (to ensure determinism) while respecting the list of file/directory arguments you pass at the command line, so that if you run @e2, if we implement that, would it go most of the way towards meeting your needs? |
I'm not sure what out-of-the-box support for unit+acceptance testing would look like, or what you even mean by this. Is it just wanting to run unit specs before acceptance specs? |
I don't understand this point at all. In every programming language I've ever used, "random" is really just pseudo-random, with a seed that fixes the randomness. In what way do you find it unintuitive that the seed makes it reproducible instead of truly random? Also, the entire point of the seed system is to have a way to reproduce a particular ordering -- which would be impossible if not for the fact that the seed fixes the ordering.
I don't understand what seems magical about this. The idea is that you may have needs for a particular example group to be ordered differently from the overall ordering you are using for the suite. This is commonly the case when you are converting an existing suite from defined ordering (the old standard) to random ordering (what we recommend now). When you are converting it can be useful to go piece meal and to convert one group at a time to random ordering and/or to enable random ordering overall but configure particular groups to still run in defined order until you fix the issue. Is the name "global" part of the confusion? FWIW, we discussed the name at length in #1025. We were going to call it "default" (as it is the ordering strategy that is used by default, unless you set
Can you explain this point more? I don't understand what you mean. |
I completely disagree with this entire bullet point. Random ordering has proven useful to me (and teams I've worked on) on many occasions. The feedback I've gotten from the community (particularly about the The point of random ordering is to surface ordering dependencies as soon as possible (ideally while you are developing the code changes that caused them, while it is cheap to fix). Random ordering only on CI doesn't work very well to quickly surface ordering dependencies because you need many runs to have a high likelihood of exposing the issue. Locally, you may run the suite 10s or 100s of times as part of preparing a PR, but on CI it may just once.
It sounds like you use RSpec for very different types of projects than the ones I've seen it used on, because I have seen many, many situations where there are ordering dependencies:
Random ordering was definitely the impetus for us supporting ordering strategies at all and remains, IMO, the main use case. |
FWIW, we originally had separate APIs for sorting examples and groups (see #547 and some of the attached commits for what we originally added). However, in #1025 we decided to consolidate on one strategy instead of 2 after a bunch of discussion (including involving some people who had used RSpec 2's ordering API). Please read #1025 for background on this. Given that we used to have separate ordering strategy for examples vs groups, and then simplified to a single API based on open feedback and discussion (at which time no one voiced an opinion in favor of keeping it separate), I think it's unlikely we'd go back to splitting them again, without a number of voices from the community explaining why they'd like to see it change back. |
I wish I knew that sooner. Thinking about RSpec like a "database" really helps.
For a CI, yes. For a development workflow, it can be really annoying. And for me, "alphabetical" is pretty much the same as "deterministically random" anyway. I agree users shouldn't attempt to order tests, except for workflow convenience. You can change the order using an ordering strategy, though many use cases just need "temporary storting". I mean that 3-5 files need to be tested in some order when making changes across files. Once that works, that order is never again needed. So using it as a configuration option doesn't make sense, since that order isn't something you'd commit into a repository.
Basically yes. Unless running multiple RSpec processes is recommended instead (via commandline or via multiple runners through configuration). Thanks for the explanation - I'm definitely going to need it. Recursion is a bit of a surprise, but it makes sense because it fits into how things are processed. It explains the sorting calls as well.
That's interesting. I was rather convinced it had something to do with shell argument expansion. E.g. if I pass files in a given order, I expect that order to be respected, even if the files in the dirs (e.g. files = runner_args.map do |arg|
arg.dir? ? Dir["#{arg}/**/*_spec.rb"].sort.uniq : arg
end.flatten # don't sort the whole result Which would sort files when globs are used, but preserve the overall order I'd pass on the command-line. Parameters have to be consistent anyway for a seed to produce deterministic results. Files are usually independent, so there's little gain in sorting them unless they use global variables or some common resource. (Though that might happen). It should be up to the Random strategy to shuffle files according to the seed. There's no need to do that on a full list, though. E.g. Most was mentioned in the links you gave so, so thanks. The idea of not naming it "alphabetical" make sense, since "alphabetical" is simply just a "hashing" algorithm where each file obviously has a unique name where it can be sorted/shuffled consistently. I see 3 types of "ordering" being used/mixed here:
This way the only time you might get a different failure is when you both leave out a parameter and have tests that are order-dependent. The drawback is that this would introduce a "file group", where each parameter ( The problem with Array#shuffle is: it doesn't do what the intention of using it needed. It should be encapsulated in a method that sorts the input before shuffling it. That way you don't have to sort file lists anywhere else in RSpec. That way the whole "alphabetical" concept would be buried deep inside this To be clear, yes, I'd revert #717 change, except I'd sort directory contents instead: File.directory?(path) ? gather_directories(path, patterns).sort : extract_location(path) in here: paths.map do |path|
files = paths.map do |path|
# (...)
File.directory?(path) ? gather_directories(path, patterns) : extract_location(path)
end.flatten # no sorting here
end This way you keep consistency stays within the argument/directory instead of being global across all files found. I'm not sure about distinguishing between user-supplied arguments and e.g. configuration. An array of files/dirs should be preserved I think.
Yes, absolutely! That would help a lot, because randomizing pretty much gets in the way if the user supplies custom paths. If I run the full tests, expecting them to pass, I don't care whether the order is random or not (so random order is useful when full suite is run). I'm guessing a lot of users are expecting custom-supplied arguments to be honored with their order. Especially if failures are expected. |
Yes. It's about knowing how to achieve that in a recommended way. If test order randomizing is important, there should be a common setup available for describing this. I'm thinking Order randomizing is recommended, but there's no "official recommendation" about how to design a workflow (e.g. slow/acceptance tests last). It's a "common" use case with no easily available cheatsheet to reference to achieve it. That's what I mean by "out of the box support" - even an "official recommendation" with a complex custom sort strategy is better than "guessing" how to achieve such a setup. (Or wasting time setting up scripts to run multiple invocations of RSpec).
|
We'd only every not want to |
It's unreasonable for RSpec to specially treat a I mentioned somewhere that likely many people are using RSpec + Cucumber simply as an alternative to not knowing how to achieve unit tests + acceptances tests within one process. rspec-rails is a lot less useful when integration and unit tests are mixed up together. Running each manually is no fun either.
I agree. It is best to "deal with it as soon as possible". But it also should be possible to "opt-out" of it you're dealing with failures across a whole stack (unit <-> functional <-> acceptance). That's also what a seed is for: reproducing errors. And if I have multiple errors, I expect to get the same failures in the same order. If there's a order dependency issue, sure I want to fix that immediately. But if there are no dependency issues, I don't want to get a different order of failures every time. I can set the seed to a fixed value. But then I'll most likely forget, effectively turning off the dependency order checking anyway. In short, it's like I want the ordering fixed, but not by a seed+alphabetical combination. Sometimes my custom order will be permanent (unit vs acceptance), sometimes it's temporary (particular bug across layers in a stack). I don't want to reproduce dependency order failures in this case. I want to reproduce my own set of failures. During TDD I'm mostly dealing with fixing failures. On a CI, I want failures to occur. Before a push, I don't mind even running tests 3 times: once in defined order, second in reverse, and a bonus "random" order too. I'm a bit torn here, because my needs are tied to a specific workflow, and not something generic. But defining such a workflow within a single order strategy definition in a configuration spec ... that seems like a stack of hacks using ordering for something it wasn't designed for.
I've always understood the concept of random and seeds in the context of RSpec. What I never understood was: why am I getting different failures in a different order every time? I can set the seed, but then the order is "fixed" in technical terms, but as a software developer looking at the screen, it's still looks random every time. If tests were always ordered, I'd get a "feel" for which failure maps to which file. With the progress formatter, I could "learn" that "the fifth dot" is spec/foo_spec.rb. And that if I see about 30 lines of document formatter output, I know that the failure is at 3/4 of the spec file. But with shuffled examples, it's a bit like I'm working a slightly different project every time. It might make sense to have 2 rspec instances running. One "ordered" for development and one for "finding order dependency problems" - which can work in the background while I'm fixing the development issues. But of course RSpec has no way of distinguishing dependency order failures from everything else. I'm just saying there's a perception/focus cost to having tests randomized. Especially when failures occur.
I agree, but there may simply be better ways of shuffling examples to increase the likelyhood of detecting issues. e.g. If you have
The examples seem right. It's also probably why I use mocking a lot more extensively - and it's also why I rely on acceptance/integration tests to prevent the issue of mocking "too much". That's why I want to always get the mocking working first - until then, acceptance tests are broken by definition. Mocks isolate so much already, that randomizing just gets in the way. Still, databases are hard to mock without them hiding bugs, and filesystem touching tests are notoriously time-consuming to get right. Instead of "testing for order dependency issues" I'd much rather be "protected" by them in some way. I agree that detecting them early is crucial though. Sadly, that doesn't come free. The best thing I can think of right now - to get the best of both worlds - is: "turning ordering off for selected tests". E.g. unit tests with mocks (which are independent by default) probably aren't going to benefit from test shuffling. Integration tests are probably going to benefit the most. I'm more trending towards simply avoiding dependencies rather than trying to detect them. A poorly written test doesn't mean the code is going to fail in production, while running each acceptance test in e.g. Docker for guaranteed isolation. Kind of like mocks, because they don't use host resources either. So probably there are some niche cases like mine where shuffling is just cost, overhead and little gain. |
Cargo culting is one of the biggest problems in Ruby development, it helps no-one; our aim should be to educate users how to use our APIs, not give them copy and pasteble snippets to bootstap faster.
You should get the same failures in the same order with
Then you've not understood the concept of random.
For you, I've certainly never found this to be the case, where the failures occur in the test run is usually unimportant, after finding failures most people will run just the failures until they're fixed. This is why we introduced
Randomisation is a way to check you're isolated.
You can set ordering per example group, and given it's just ruby you could even turn that on and off as you saw fit. e.g.
If you successfully avoid dependencies randomisation causes you no issues, but running each test truely isolated is beyond the scope of RSpec. |
If I could travel through time (to 2012), I'd suggest a few things:
If you ever invent a time machine, let me know... ;) As for the
To summarize, I just want to run some tests before others. Ideally in a generic way that doesn't require adding a complex, brittle block to every project. Ideally something I can unit test, so even if the behavior ever changes (RSpec 4?), I can update it in one place, without going through multiple projects and spec_helper files to fix it. My options until now have been:
Preserving the file argument order would be awesome, because it's intuitive and at least gives me some execution consistency - especially during "one-off" situations. The discussions linked were enlightening - thanks for looking for them. I never would've thought many of the ideas would've been recurring today still. I appreciate the work and thought that was put into this more. Actually it's part of the reason I struggled with this - I assumed I was doing something wrong, since RSpec has such a long history and widespread usage. |
You can pass any instance of any object that responds to config.order :global, MyOrderingStratergy.new(config.seed) |
Suggestions for "applies globally as the default ordering strategy" then, remembering that we have no control of the implementation. The name |
If your block is more complex than you'd like write a gem to do this and pass in as a callable object, or delegate to it within the block. Your gem is free to register this as a specific named ordering strategy so you can use |
The problem with the sorting strategy was that an example would be much better education for me. It's not obvious that the shuffling was shared - I would've needed to dive into the codebase or issue archives to find that out. I'm not into remembering methods or trying to recall if It's not like typing a few hundreds of characters from memory every time is a win, either. The idea to copy (for me at least) is not to avoid reading, but to avoid making mistakes and spending lots of time figuring out what I got wrong. If I can paste something that works, I can get on with my life. Ideally though, I wouldn't have to paste anything and I'd just get the order I want in some intuitive way. Like changing one setting from x to y. (And not registering a whole custom strategy for something I believe is "typical").
Yes, but this has 2 problems:
I meant that the failures in the same files with different seeds results in a different order of failures. Yes, this is how it works, but it's annoying to be debugging and getting the same failures in a different order every time.
Yes, but the actual order of those failures is randomized based on seed. E.g. what if I'd like to first see the failures in the Rails models first? I can set the seed to a fixed value, but then I've gotten rid of randomizing altogether. (Until I remember to unset it or after a fresh checkout). This kind of shows how useless the randomization is. If it gets in the way I can disable it. I don't want to manually have to set the seed and select a different set of files every time there's a failure during refactoring. If I was to do that, I might as well just create a script to run each file one by one in a separate process with the order I want. But that would negate using RSpec as a runner.
I don't want to "check" if I'm isolated. I know I'm isolated when using mocks. That's how mocks work. If I don't use instance variables or globals or include modules, it's not like Ruby scope will leak by accident. The point is that randomization is so hard-backed in, I can't seem to have a sane way of avoiding it if I write fully isolated tests.
By "tests" I meant "groups", not examples. Having examples shuffled is annoying too (when my using doc formatter to check if I've covered all the edge cases), but much less so.
For such isolated tests in the past I've used something like My point here: if isolation is important, there are more robust and powerful ways. If it's just an "addon bonus", it would be nice to be able to opt-out if I'm not interested. A custom sort strategy doesn't help unless I work out how to access the list of specs/spec dirs passed to RSpec, partition examples by those file/dir arguments and sort examples by the order defined, while possibly including any example-specific ordering defined by metadata. That's an extremely complex way of "run these tests in the order I pass on the command-line". A multi-process approach is the best I can do now until command-line argument order is honored. (It would be AMAZING if it was!). |
I never even got through responding to your initial comment, and now there are like 4 pages more to read through. I'm sorry, but I don't have the time to read through everything you've posted. If you could ask for one specific thing it would be easier to respond and hopefully meet your needs than when you have multiple comments that are hundreds or thousands of words each. Anyhow, I've started working on a fix locally (and will hopefully have a PR up in the next day or two) to respect the ordering of the files/directories passed at the CLI when RSpec loads spec files. I hope this will meet your needs. |
No problem. Respecting the ordering solves everything for me. I'd much rather not take your focus away. Thanks for everything you do! |
BTW, I will respond to one thing: "why global?" Individual example groups can choose how to order themselves via Sorry if it's confusing. |
@JonRowe -
That should be documented somewhere. I thought you could only pass a symbol. Looks like a great starting point to insert a plugin. Thanks!
There really should be separate examples for:
But I'm glad I've gathered all this info, so thanks. |
The best short explanation I can think of is "not-from-metadata" or "sort regardless of metadata". What I feel I'm doing here that's "weird" is: I'm considering using ordering for defining a custom workflow with RSpec. E.g. I'd like failures to impact which tests are run and possibly detect dependencies between files to impact the order of failures. I'm saying "weird" here, because according to the discussions, ordering was supposed to be only used for outdated order-dependent tests. And not for "workflow optimizing" like I feel I need. Timeout for me. I hope I can repay you guys for the attention, answers and insights. |
I wouldn't say that's accurate at all. The main use cases we were solving was supporting random ordering, and being able to control ordering on a per-example group level with metadata, but we did not want to limit ordering to just those use cases -- so we exposed an API that allows you to configure ordering with a block of arbitrary logic. |
@myronmarston - works great. One issue though is: if you pass the file multiple times on the command-line, it's run multiple times. E.g. So But it works beautifully otherwise. If I don't specify an order, it works and also doesn't shuffle examples (no seed shown). |
Is that an observed result? The |
I tested it locally and observed that the spec file is loaded twice. The problem is that spec files from the dir glob get expanded but the files passed as CLI were left as is, so the I've fixed it with my latest push. |
One really cool thing is: putting a slow failing integration feature first with fail-fast ... and if it passes the rest of the suit is run minus that already-passing spec. No need to "rerun everything" after a fix makes things green. Others may complain that this is all file/dir based, but as is it couldn't make me happier :) |
Sounds like your use case has been addressed so I'm going to close this. |
…e CLI. This fixes rspec#2247 and addresses some of the feedback from rspec#2251.
TL;DR - Summary
The way ordering is configured and handled in RSpec is far from intuitive and the use cases aren't compelling or practical IMHO, even though they're potentially extremely useful if properly understood and conveniently implemented.
Goal
Ideally, learning from experiences with existing/previous API and current potential use cases would help design a flexible, maintainable, clear, configurable, extensible and simple new API.
So first I'd like to just collect all the issues and useful use cases in one place, so that any new API proposal would have to take all or most into account.
Earlier discussions
I started with a need to run specs in a different order than "alphabetical":
Then I unintentionally started a long discussion in the wrong place (PR comments) here: #2248 (comment)
List of issues with current/past issues with ordering API
1. Invisible algorithms.
It's not clear to the user (no output, insufficient docs) how things are being ordered within RSpec and why. E.g. it's not clear files are sorted alphabetically for the sake of providing consistent input for seeding. There's no doc describing even how the ordering works, e.g. that:
2. Cmdline order of files not respected.
This is counter-intuitive, because it's alphabetical, even if the order mode is
:random
. Even if it "makes sense" to make randomized orders deterministic, that's already too much "know how".3. Not out-of-the-box support for unit+acceptance testing
No useful, intuitive example for a folder named 'spec/acceptance` where the user expects tests there to be run last.
4. Actual options aren't intuitive in how they work.
E.g. Random with same seed is not random. Files are sorted alphabetically (though this isn't mentioned anywher).
:global
order strategy probably does some magic, depending on whether individual examples have an:order
tag or not. Strategy is reused for sorting both groups and their examples, except by default. (files are alphabetical, examples are randomized).5. Bad practice supported by default.
It seems like the basic theme behind ordering is to avoid having order-dependent examples. This feels like more of a half-baked CI feature - as opposed to running each example in isolation. The drawback of fully isolated examples (e.g. one per RSpec process) is: they're extremely slow. But on a CI, a fast build time is less of a priority than avoiding bugs.
The only "order related" issues I've ever run into were with:
require
.The first needs threads to be joined/killed before the next tests runs, so ordering can't help here much, since it's not deterministic anyway. The second needs running each spec in isolation, for which random ordering also isn't a good replacement for.
I personally think the idea of randomizing tests if overemphasized so much, it's been only detrimental with little pay-off. Rerunning tests simply in normal and then reverse order would likely catch even more issues than a "random" set.
So I'd argue that "reverse" would actually be more useful in practice than "random".
6. Ordering gets in the way of workflows.
This is my biggest issue. A "workflow" can mean e.g. "running files in the order I want". E.g. I want to run just the unit test, then the functional test, then the acceptance test for one given feature/bug. I don't have the confidence I can achieve this quickly and without making mistakes. It's actually easier to just run 3 separate RSpec processes in sequence, which pretty much means ordering is useless. (Because there's no "ordering" if you have only one object).
Also, with randomizing, fail-fast will return a different result every time. Too many times I thought I fixed something, only to realize it's still buggy a few rspec runs later (once I've forgotten the context already). The "workaround" is to set a fixed seed in
.rspec-local
, but if you forget to remove it afterwards, it renders the order randomizing feature useless.It's like the only convenient "choice" I have is: "specific random order" and "random random order". Not very good for any intelligent/useful workflow.
7. Ordering strategy doesn't protect from removing/duplicating entries.
The idea of passing a list to a sorting strategy (and expecting a sorted list to be returned) seems flawed. It's should be sufficient to have a
compare
method that takes either two groups or two examples and returns the result of comparing, much like<=>
. Kind of like Ruby'ssort
works.This would make configuration simpler, because reverse alphabetical sorting of groups can be done by just switching arguments, e.g. when using
sort
:And the resulting array is guaranteed to have the same count of items as the input array. Meanwhile, accidentally modifying the given list can have all sorts of hard-to-debug side effects.
It's also much easier to reason about whether one example should come before another - than to work with partitions or use sort a whole list anyway with a custom compare method (lots more code in configuration).
And there's a lot less testing too (as opposed to having lists to work with).
Additionally, if examples define a custom order themselves, then the
compare
would be skipped for those, so custom:order
metadata would always work, regardless of the sorting strategy. (So the strategy wouldn't need to take those into account at all and tests would be safe from breakage).8. Reusing the same strategy for sorting examples vs groups.
Previously, these were separate and ordering become example-specific. This doesn't make sense semantically, because examples and groups are different types and potentially need different sorting strategies.
So with a custom strategy, it's easy to mess up one set (e.g. examples) while tweaking the sorting of another (e.g. groups). It heavily blurs the distinction between files, example groups and examples.
E.g. I'm less likely to have "acceptance examples" within a group. I'm more likely to have "acceptance test" defined on groups.
9. I understand SQL, but I don't understand RSpec.
It should be easier to relate one to the other. I mentioned in another post that RSpec is much like a database client app, because you can expect needs like those that can be expressed in pseudo-SQL, e.g:
There's no reason why selecting and filtering in RSpec shouldn't be as understandable, simple and flexible with the right API. A simple custom
compare
method (or block) with 2 arguments would handle ordering, and a simplematch?
method with 1 argument would handle filtering.There's also no reason why filtering couldn't be as generic as well. More filtering options could be handled with just one command-line option, e.g.:
Of course the goal isn't to implement SQL or Arel in RSpec, but just to allow an internal API where this can be implemented reliably and usefully if needed - with minimal trade-offs.
The text was updated successfully, but these errors were encountered: