
Enhance performance of selecting tests using --include and --exclude #3579

Closed · vrpolakatcisco opened this issue May 22, 2020 · 12 comments

@vrpolakatcisco

Hello.

In the FD.io CSIT project we run many Robot Framework tests. Imagine a single robot invocation selecting hundreds out of tens of thousands of test cases. Due to both historical inertia and a desire for clarity, we ended up with a scheme where we construct really long tag expressions: basically, each test case is selected by a separate --include option, each ANDing ~8 single tags.

The downside is that robot takes too long to figure out which tests are selected.
Here is an example console output [0] with all the gory details (using robotframework==3.1.2 and CPython 3.6.9 64-bit, in case that matters).
See the time gap between
14:04:31 +++ robot
and
15:07:45 ===

In the medium term we are working on a solution that uses --test instead of --include.
But I think in the long run it would be good if Robot Framework were also faster when dealing with tags.

[0] https://logs.fd.io/production/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-2n-skx/898/console-timestamp.log.gz
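To make the cost concrete, here is a minimal sketch (my own illustration, not Robot Framework's actual implementation, with made-up tag names) of how an ANDed --include expression is conceptually evaluated: every sub-pattern must match against the test's tag set, so the work grows with patterns × tests × tags.

```python
from fnmatch import fnmatchcase

def normalize(text):
    # Robot tags are case-, space- and underscore-insensitive.
    return text.replace(' ', '').replace('_', '').lower()

def matches_and_pattern(pattern, tags):
    """True if every ANDed sub-pattern matches at least one of the tags."""
    norm_tags = [normalize(t) for t in tags]
    for part in pattern.split('AND'):
        part = normalize(part)
        if not any(fnmatchcase(tag, part) for tag in norm_tags):
            return False
    return True

# A test is selected if any --include expression matches its tag set.
includes = ['2n-skxANDmrrAND64bANDip4fwd', 'dot1qAND1cANDmrr']
tags = ['2n-skx', 'mrr', '64b', 'ip4fwd', '1c']
selected = any(matches_and_pattern(p, tags) for p in includes)
```

With hundreds of such --include options, this inner loop runs for every tag of every test case, which is where the time goes.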

@vrpolakatcisco (Author)

I have some ideas about how to speed things up. I will upload a draft solution as a pull request, but I doubt I will have the skill (and time) to see it through the review process.

vrpolakatcisco added a commit to vrpolakatcisco/robotframework that referenced this issue May 22, 2020
Employ memoization to speed up tag filtering.

- This is a first draft.
- Unit tests not edited at all.
- Not sure about storing to self._memo everywhere.
- Not sure if normalization is applied everywhere needed.
+ Normalization applied in most places.
+ Memo dictionary is passed from callers, created empty if needed.
+ Memo dictionary uses normalized pattern or tag as the key.
+ Multiple copies of the same pattern for SingleTagPattern do not duplicate matchers.
+ Each such matcher has memo dictionary to store (normalized) tag match results.
+ Calls to the same matcher with the same tag return the memoized result.
- Matcher memo should probably have different name than result memo.

Signed-off-by: Vratko Polak <vrpolak@cisco.com>
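The memoization idea from the commit message above can be sketched roughly like this (hypothetical names and structure, not the actual PR code): the match result is cached per normalized tag, so repeated calls with the same tag skip the matching work.

```python
from fnmatch import fnmatchcase

def normalize(text):
    return text.replace(' ', '').replace('_', '').lower()

class SingleTagPattern:
    """One tag pattern with a memo dict keyed by the normalized tag."""

    def __init__(self, pattern, memo=None):
        self._pattern = normalize(pattern)
        # Memo dictionary is passed from callers, created empty if needed.
        self._memo = {} if memo is None else memo

    def match(self, tag):
        key = normalize(tag)
        if key not in self._memo:
            # Only the first call with a given tag pays the matching cost.
            self._memo[key] = fnmatchcase(key, self._pattern)
        return self._memo[key]
```

Sharing the memo dictionary between copies of the same pattern is what avoids duplicated matchers, at the price of the cache bookkeeping discussed later in the thread.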
@vrpolakatcisco (Author)

So I managed to create #3580
I forgot about the Markdown rendering of commit messages, so you cannot see which items have a plus sign (improvements) and which have a minus sign (missing features or possible bugs).

@vrpolakatcisco (Author)

My main question is whether this attempt goes in a good direction, or whether some other approach is preferred.

@pekkaklarck (Member)

I don't have time to look at this in more detail right now, so only quick comments:

  • I'm really surprised that selecting tests is so slow that it causes problems.
  • It would be nice if you could explain the problem in a bit more detail so that we don't need to open an external log file. Most importantly, how much time is actually spent on selecting tests?
  • If you already know where the performance issue is, please share that as well.
  • I took a quick look at the draft PR #3580 (Issue #3579: Incomplete draft of a solution) and was surprised how many changes were needed. I hope we can come up with something simpler.
  • RF 3.2.2 is planned for early July, so there's still plenty of time to get this fixed and included in it.

@skhomuti

We have the same problem. It takes about a minute to select even one test by a unique tag. Choosing a test by name did not improve the situation at all, not even when using the full test name.
However, if you select tests by suite, selection is instant. We now use a compromise: combining --suite and --test, so that the search for a test runs quickly within a specific suite.
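The compromise described above looks roughly like this (suite and test names are made up for illustration):

```shell
# Hypothetical names: narrowing the search with --suite first makes
# the subsequent --test name match fast.
robot --suite Tests.Smoke --test "Login With Valid Credentials" tests/
```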

@pekkaklarck (Member)

Any updates on this, @vrpolakatcisco? Based on the comment by @skhomuti, others have encountered this issue too, so it would be nice to fix it. I just need to know a bit more about the situation to be able to reproduce it myself.

@vrpolakatcisco (Author)

Ah, sorry for dropping this.
Basically, my attempt at a fix did not work at all, I got busy with other tasks, and this never got high enough on my todo list again.

> Now we use a compromise - combining --suite and --test

In our case --test was sufficient (although we have one --suite matching a whole tree) for us to start looking elsewhere when trying to speed up the overall job time.

> to be able to reproduce it myself.

I created a quick-and-dirty reproducer: https://gerrit.fd.io/r/c/csit/+/31492

Hopefully it is self-explanatory enough.
It makes robot think for almost 4 minutes (on my machine) before realizing that no test case matches.
I can also create a version that takes closer to an hour (and actually matches some test cases).

@pekkaklarck (Member)

I adapted the example at https://gerrit.fd.io/r/c/csit/+/31492 so that I could run it on my machine, and it took only 0.3 seconds. Could someone create a generic example demonstrating the issue that I could try running? If this isn't a problem for anyone anymore, we can also just close this issue.

@vrpolakatcisco (Author)

The original example is still slow for me, although the slowness is less severe (better hardware, newer Python and Robot Framework versions). Less severe means 40-50 seconds.

Anyway, I created a more scalable reproducer with synthetic suites (no git clone needed), still at https://gerrit.fd.io/r/c/csit/+/31492, but now patch set #4.
The script takes two arguments: one controls the number of tests generated, the other the number of --include options.

For 16k test cases (one test case per suite file, plus the same number of empty init.robot files, in case that matters), in my setup robot takes around 10 seconds to go through the suites, plus roughly 10 seconds per 40 --include options.

The worst case in our current production would be selecting 300 test cases (out of 400k possibilities) using more complicated tag expressions.
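A minimal sketch of such a generator (my own illustration of the idea, not the actual Gerrit change; all names are made up): each synthetic suite file holds one tagged test, so the test count and the --include count can be scaled independently.

```python
def suite_source(index, n_tags=8):
    """Return the text of one synthetic suite file with a single tagged test."""
    tags = '    '.join(f'tag{index}-{i}' for i in range(n_tags))
    return (
        '*** Test Cases ***\n'
        f'Test {index}\n'
        f'    [Tags]    {tags}\n'
        '    No Operation\n'
    )

def include_option(index, n_tags=8):
    """Return one --include option ANDing all tags of test `index`."""
    return '--include ' + 'AND'.join(f'tag{index}-{i}' for i in range(n_tags))

# The two knobs of the real script: number of tests and number of includes.
sources = [suite_source(i) for i in range(16)]
command = 'robot ' + ' '.join(include_option(i) for i in range(4)) + ' t/'
```

Writing each source string to its own .robot file under a t directory reproduces the one-test-per-suite layout described above.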

@pekkaklarck (Member)

The new script is nice, but it could have had slightly better usage instructions. I initially tried it with the values 200 100, but generating all the needed files took too long. After some trial and error I got meaningful results with 13 80, which generated ~16k tests. Because the issue is in filtering tests, I also believe the script didn't need to create that many individual files; the same number of tests in fewer files would likely mean faster generation and parsing. Finally, having a way to generate the robot command without needing to recreate the tests would be convenient.

After generating the tests and the robot command with the values 13 80, I ran the command and it took ~20s altogether. Some debugging revealed that parsing took ~7s, so selecting the tests to run took ~13s. That's a pretty long time, but there were a lot of tests and altogether 80 --include options, each with a somewhat complicated AND pattern. The overall code responsible for handling these tag patterns is pretty good, but I noticed one issue.

Our tags are case, space and underscore insensitive, and matching them requires normalizing both patterns and tags. At the moment normalization is done each time tags are matched, which means that with multiple patterns, normalization is done multiple times. I changed the code so that normalization is done only once, and that dropped the execution time to ~12s. Selecting tests now took ~5s, which is quite a bit faster than the earlier ~13s.
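The normalize-once idea can be sketched like this (hypothetical structure, not the actual commit): tags are normalized a single time when the tag container is built, and every pattern lookup afterwards reuses the precomputed set.

```python
def normalize(text):
    return text.replace(' ', '').replace('_', '').lower()

class NormalizedTags:
    """Tags normalized once at construction time and reused by every pattern."""

    def __init__(self, tags):
        # Pay the normalization cost once per test, not once per pattern match.
        self._normalized = frozenset(normalize(t) for t in tags)

    def __contains__(self, tag):
        return normalize(tag) in self._normalized

tags = NormalizedTags(['Smoke Test', 'regression'])
# Every lookup below reuses the precomputed set:
found = 'smoke_test' in tags and 'REGRESSION' in tags
```

With many --include patterns, moving the normalization out of the per-pattern loop is exactly what turns repeated work into a one-time cost.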

The change explained above is relatively simple and all our acceptance tests pass with it, so there's no reason not to commit it. Selecting tests still takes some time when there are a lot of tests and a lot of patterns, but I consider this enhancement good enough to close this issue.

@pekkaklarck pekkaklarck changed the title Filtering tests by tags is not fast enough Enhance performance of selecting tests using --include and --exclude Jun 1, 2023
@pekkaklarck pekkaklarck added this to the v6.1 milestone Jun 1, 2023
@pekkaklarck (Member)

When there are a lot of tests, one way to enhance performance is to use the new functionality to convert the data to JSON (#3902). For example, the t directory created by the script can be converted to a t.rbt file with JSON data like this:

```python
from robot.running import TestSuite

suite = TestSuite.from_file_system('t')
suite.to_json('t.rbt')
```

Running that t.rbt file instead of the t directory dropped the overall execution time to ~7.5s.

vrpolakatcisco added a commit to vrpolakatcisco/robotframework that referenced this issue Jun 2, 2023
Tries to improve upon robotframework#3579 but makes it worse instead.
@vrpolakatcisco (Author)

> The new script is nice, but it could have had a bit better usage instructions.

Sorry about that.

> I changed the code so that normalization was done only once ...
> ... closed this as completed in 704bcf6

I confirm that both the old and the new reproducer now spend roughly half the previous time on tag filtering.

> The overall code that's responsible for handling these tag patterns is pretty good

I also confirm that.
I tried to add caching to Matcher.match, but it made the process slower in all reproducers.
The best I got is [2], still around 10% slower on the older reproducer.
I guess the CPU overhead of my implementation already makes it slower than compiled regexps, not to mention what happens if I let the cache grow to a larger size.

After some thinking I realized that an approach based on --test will always be significantly faster than any approach based on --include. You cannot beat one regexp match per test case per --test item, when --include needs at least one regexp match per test case per test tag per --include item (not to mention ANDed subitems).
Therefore big-scale users will be advised to start using --test regardless of how fast --include gets.

[2] c375381
