
Enhance performance of selecting tests using --include and --exclude #3579

Closed · vrpolakatcisco opened this issue May 22, 2020 · 12 comments

@vrpolakatcisco

Hello.

In the FD.io CSIT project we run many Robot Framework tests. Imagine a single robot invocation selecting hundreds out of tens of thousands of test cases. Due to both historical inertia and a desire for clarity, we ended up with a scheme where we construct really long tag expressions: basically, each test case is selected by a separate --include option, each ANDing ~8 single tags.

The downside is that robot takes too long to figure out which tests are selected.
Here is an example console output [0] with all the gory details (using robotframework==3.1.2 and CPython 3.6.9 64-bit, in case that matters).
See the time gap between
14:04:31 +++ robot
and
15:07:45 ===

In the medium term we are working on a solution that uses --test instead of --include.
But I think in the long run it would be good if Robot Framework were also faster when dealing with tags.

[0] https://logs.fd.io/production/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-2n-skx/898/console-timestamp.log.gz
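To make the cost concrete, here is a minimal sketch (my own illustration, not Robot Framework's actual implementation, with made-up tag names) of how an ANDed --include expression is conceptually evaluated: every sub-pattern must match against the test's tag set, so the work grows with patterns × tests × tags.

```python
from fnmatch import fnmatchcase

def normalize(text):
    # Robot tags are case-, space- and underscore-insensitive.
    return text.replace(' ', '').replace('_', '').lower()

def matches_and_pattern(pattern, tags):
    """True if every ANDed sub-pattern matches at least one of the tags."""
    norm_tags = [normalize(t) for t in tags]
    for part in pattern.split('AND'):
        part = normalize(part)
        if not any(fnmatchcase(tag, part) for tag in norm_tags):
            return False
    return True

# A test is selected if any --include expression matches its tag set.
includes = ['2n-skxANDmrrAND64bANDip4fwd', 'dot1qAND1cANDmrr']
tags = ['2n-skx', 'mrr', '64b', 'ip4fwd', '1c']
selected = any(matches_and_pattern(p, tags) for p in includes)
```

With hundreds of such --include options, this inner loop runs for every tag of every test case, which is where the time goes.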

@vrpolakatcisco (Author)

I have some ideas about how to speed things up. I will upload a draft solution as a pull request, but I doubt I will have the skill (and time) to see it through the review process.

vrpolakatcisco added a commit to vrpolakatcisco/robotframework that referenced this issue May 22, 2020
Employ memoization to speed up tag filtering.

- This is a first draft.
- Unit tests not edited at all.
- Not sure about storing to self._memo everywhere.
- Not sure if normalization is applied everywhere needed.
+ Normalization applied in most places.
+ Memo dictionary is passed from callers, created empty if needed.
+ Memo dictionary uses normalized pattern or tag as the key.
+ Multiple copies of the same pattern for SingleTagPattern do not duplicate matchers.
+ Each such matcher has memo dictionary to store (normalized) tag match results.
+ Calls to the same matcher with the same tag return the memoized result.
- Matcher memo should probably have different name than result memo.

Signed-off-by: Vratko Polak <vrpolak@cisco.com>
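The memoization idea from the commit message above can be sketched roughly like this (hypothetical names and structure, not the actual PR code): the match result is cached per normalized tag, so repeated calls with the same tag skip the matching work.

```python
from fnmatch import fnmatchcase

def normalize(text):
    return text.replace(' ', '').replace('_', '').lower()

class SingleTagPattern:
    """One tag pattern with a memo dict keyed by the normalized tag."""

    def __init__(self, pattern, memo=None):
        self._pattern = normalize(pattern)
        # Memo dictionary is passed from callers, created empty if needed.
        self._memo = {} if memo is None else memo

    def match(self, tag):
        key = normalize(tag)
        if key not in self._memo:
            # Only the first call with a given tag pays the matching cost.
            self._memo[key] = fnmatchcase(key, self._pattern)
        return self._memo[key]
```

Sharing the memo dictionary between copies of the same pattern is what avoids duplicated matchers, at the price of the cache bookkeeping discussed later in the thread.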
@vrpolakatcisco (Author)

So I managed to create #3580
I forgot about the Markdown rendering of commit messages, so you cannot see which items have a plus sign (improvements) and which have a minus sign (missing features or possible bugs).

@vrpolakatcisco (Author)

My main question is whether this attempt goes in a good direction, or whether some other approach is preferred.

@pekkaklarck (Member)

I don't have time to look at this in more detail right now, so only quick comments:

  • I'm really surprised that selecting tests is so slow that it causes problems.
  • It would be nice if you could explain the problem in a bit more detail so that we don't need to open an external log file. Most importantly, how much time is actually spent on selecting tests?
  • If you already know where the performance issue is, please share that as well.
  • I took a quick look at the draft PR #3580 (Issue #3579: Incomplete draft of a solution) and was surprised how many changes were needed. I hope we can come up with something simpler.
  • RF 3.2.2 is planned for early July, so there's still plenty of time to get this fixed and included in it.

@skhomuti

We have the same problem. It takes about a minute to select even one test by a unique tag. Choosing a test by name did not improve the situation at all, not even when using the full test name.
However, if you select tests by suite, selection is instant. We now use a compromise: combining --suite and --test, so that the search for a test runs quickly within a specific suite.
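The compromise described above looks roughly like this (suite and test names are made up for illustration):

```shell
# Hypothetical names: narrowing the search with --suite first makes
# the subsequent --test name match fast.
robot --suite Tests.Smoke --test "Login With Valid Credentials" tests/
```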

@pekkaklarck (Member)

Any updates on this, @vrpolakatcisco? Based on the comment by @skhomuti, others have encountered this issue too, so it would be nice to fix it. I just need to know a bit more about the situation to be able to reproduce it myself.

@vrpolakatcisco (Author)

Ah, sorry for dropping this.
Basically, my attempt at a fix did not work at all, I got busy with other tasks, and this never got high enough on my todo list again.

> Now we use a compromise - combining --suite and --test

In our case --test was sufficient (although we have one --suite matching a whole tree) for us to start looking elsewhere when trying to speed up the overall job time.

> to be able to reproduce it myself.

I created a quick-and-dirty reproducer: https://gerrit.fd.io/r/c/csit/+/31492

Hopefully it is self-explanatory enough.
It makes robot think for almost 4 minutes (on my machine) before realizing that no test case matches.
I can also create a version that takes closer to an hour (and actually matches some test cases).

@pekkaklarck (Member)

I adapted the example at https://gerrit.fd.io/r/c/csit/+/31492 so that I could run it on my machine, and it took only 0.3 seconds. Could someone create a generic example demonstrating the issue that I could try running? If this isn't a problem for anyone anymore, we can also just close this issue.

@vrpolakatcisco (Author)

The original example is still slow for me, although the slowness is less severe (better hardware, newer Python and Robot Framework versions). Less severe means 40-50 seconds.

Anyway, I created a more scalable reproducer with synthetic suites (no git clone needed), still at https://gerrit.fd.io/r/c/csit/+/31492, but now patch set #4.
The script takes two arguments: one controls the number of tests generated, the other the number of --include options.

For 16k test cases (one test case per suite file, plus the same number of empty init.robot files, in case that matters), in my setup robot takes around 10 seconds to go through the suites, plus roughly 10 seconds per 40 --include options.

The worst case in our current production would be selecting 300 test cases (out of 400k possibilities) using more complicated tag expressions.
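A minimal sketch of such a generator (my own illustration of the idea, not the actual Gerrit change; all names are made up): each synthetic suite file holds one tagged test, so the test count and the --include count can be scaled independently.

```python
def suite_source(index, n_tags=8):
    """Return the text of one synthetic suite file with a single tagged test."""
    tags = '    '.join(f'tag{index}-{i}' for i in range(n_tags))
    return (
        '*** Test Cases ***\n'
        f'Test {index}\n'
        f'    [Tags]    {tags}\n'
        '    No Operation\n'
    )

def include_option(index, n_tags=8):
    """Return one --include option ANDing all tags of test `index`."""
    return '--include ' + 'AND'.join(f'tag{index}-{i}' for i in range(n_tags))

# The two knobs of the real script: number of tests and number of includes.
sources = [suite_source(i) for i in range(16)]
command = 'robot ' + ' '.join(include_option(i) for i in range(4)) + ' t/'
```

Writing each source string to its own .robot file under a t directory reproduces the one-test-per-suite layout described above.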

@pekkaklarck (Member)

The new script is nice, but it could have had slightly better usage instructions. I initially tried it with the values 200 100, but generating all the needed files took too long. After some trial and error I got meaningful results with 13 80, which generated ~16k tests. Because the issue is in filtering tests, I also believe the script didn't need to create that many individual files; the same number of tests in fewer files would likely mean faster generation and parsing. Finally, having a way to generate the robot command without needing to recreate the tests would be convenient.

After generating the tests and the robot command with the values 13 80, I ran the command and it took ~20s altogether. Some debugging revealed that parsing took ~7s, so selecting the tests to run took ~13s. That's a pretty long time, but there were a lot of tests and altogether 80 --include options, each with a somewhat complicated AND pattern. The overall code responsible for handling these tag patterns is pretty good, but I noticed one issue.

Our tags are case, space and underscore insensitive, and matching them requires normalizing both patterns and tags. At the moment normalization is done each time tags are matched, which means that with multiple patterns, normalization is done multiple times. I changed the code so that normalization is done only once, and that dropped the execution time to ~12s. Selecting tests now took ~5s, which is quite a bit faster than the earlier ~13s.
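The normalize-once idea can be sketched like this (hypothetical structure, not the actual commit): tags are normalized a single time when the tag container is built, and every pattern lookup afterwards reuses the precomputed set.

```python
def normalize(text):
    return text.replace(' ', '').replace('_', '').lower()

class NormalizedTags:
    """Tags normalized once at construction time and reused by every pattern."""

    def __init__(self, tags):
        # Pay the normalization cost once per test, not once per pattern match.
        self._normalized = frozenset(normalize(t) for t in tags)

    def __contains__(self, tag):
        return normalize(tag) in self._normalized

tags = NormalizedTags(['Smoke Test', 'regression'])
# Every lookup below reuses the precomputed set:
found = 'smoke_test' in tags and 'REGRESSION' in tags
```

With many --include patterns, moving the normalization out of the per-pattern loop is exactly what turns repeated work into a one-time cost.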

The change explained above is relatively simple and all our acceptance tests pass with it, so there's no reason not to commit it. Selecting tests still takes some time when there are a lot of tests and a lot of patterns, but I consider this enhancement good enough to close this issue.

@pekkaklarck pekkaklarck changed the title Filtering tests by tags is not fast enough Enhance performance of selecting tests using --include and --exclude Jun 1, 2023
@pekkaklarck pekkaklarck added this to the v6.1 milestone Jun 1, 2023
@pekkaklarck (Member)

When there are a lot of tests, one way to enhance performance is to use the new functionality to convert the data to JSON (#3902). For example, the t directory created by the script can be converted to a t.rbt file with JSON data like this:

```python
from robot.running import TestSuite

suite = TestSuite.from_file_system('t')
suite.to_json('t.rbt')
```

Running that t.rbt file instead of the t directory dropped the overall execution time to ~7.5s.

vrpolakatcisco added a commit to vrpolakatcisco/robotframework that referenced this issue Jun 2, 2023
Tries to improve upon robotframework#3579 but makes it worse instead.
@vrpolakatcisco (Author)

> The new script is nice, but it could have had a bit better usage instructions.

Sorry about that.

> I changed the code so that normalization was done only once ...
> ... closed this as completed in 704bcf6

I confirm that both the old and the new reproducer now spend roughly half the previous time on tag filtering.

> The overall code that's responsible for handling these tag patterns is pretty good

I also confirm that.
I tried to add caching to Matcher.match, but it made the process slower in all reproducers.
The best I got is [2], still around 10% slower on the older reproducer.
I guess the CPU overhead of my implementation already makes it slower than compiled regexps, not to mention what happens if I let the cache grow to a larger size.

After some thinking I realized that an approach based on --test will always be significantly faster than any approach based on --include. You cannot beat one regexp match per test case per --test item, when --include needs at least one regexp match per test case per test tag per --include item (not to mention ANDed subitems).
Therefore big-scale users will be advised to start using --test regardless of how fast --include gets.

[2] c375381
