Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression when using Run Keyword and keyword name contains a variable #4659

Closed
dinualx opened this issue Feb 15, 2023 · 17 comments
Labels
acknowledge To be acknowledged in release notes alpha 1 bug priority: high
Milestone

Comments

@dinualx
Copy link

dinualx commented Feb 15, 2023

We are using RF 3.2.2 in production and we have decided to upgrade to the latest RF, 6.0.2.
But we encountered the situation that the execution time for our test cases increased extremely.

We started to downgrade incrementally until we reached RF 5.0.1, with which we no longer had this problem. So, starting with RF 6.0, this problem appears. With RF 5.0.1 the results are similar to what we have in production, RF 3.2.2, there are no differences.

As a concrete example, for a certain testSetup keyword, in production (RF 3.2.2) it takes about 1 minute, and with RF>6.0 it takes 14 minutes. In order to isolate the problem and see why this increase in time execution occurs, we moved all the keywords from the robot resource files that we were using (2 resource files), inside the test suite using the respective keywords without importing the robot resource files. In this way, the running time of this keyword decreased from 14 minutes to approximately 7 minutes. It's an obvious decrease, but it still hasn't reached the value we have in production, of 1 minute. We think that there's still another problem here and if we'll manage to isolate it further we will log a separate bug for it.

I added a screenshot showing these differences. We removed from the report the names of the respective keywords specific to our project for reasons of confidentiality.

comparison

@pekkaklarck
Copy link
Member

This looks like a rather serious performance regression. Because nobody else has reported such a problem, there likely is something special with your tests, libraries or environment.

Can you create a simple example demonstrating the problem that we could run locally? Without such an example, it's pretty much impossible for us to investigate this.

@dinualx
Copy link
Author

dinualx commented Feb 16, 2023

The problem with the increased time seems to occur when we are performing operations which have as purpose adding the necessary elements for our config in the namespace. To populate the necessary objects we use some information stored in a dictionary, we create an instance in robot for a certain object, we populate that instance with all its properties using multiple python methods and we do this for a lot of elements that are needed in the test configuration.

To call these python methods and be able to apply them efficiently, we use different keywords with several arguments, and under these keywords it is necessary to use the run keyword syntax, they cannot be called otherwise. These keywords are placed in robot resource files which are imported into the test suites. Almost all these keywords are placed under one big keyword that includes them all, that of testSetup.

Keyword example of adding properties to an instance:

SetProperties
    [Arguments]  ${non dict property1}  ${name}  ${non dict property2}  ${propertyDict}  
    Run Keyword  ${name}.setmethod1 ${ non_dict_property1}
    Run Keyword  ${name}.setmethod2  ${propertyDict['${name}']}
    Run Keyword  ${name}.setmethod3  ${propertyDict['${name}']}
    Run Keyword  ${name}.setmethod4  ${ non_dict_property2}
    Run Keyword  ${name}.setmethod5  ${propertyDict['${name}']}
    Run Keyword  ${name}.setmethod6  ${propertyDict['${name}']}

All these methods called with the run keyword are affected by the increase in execution time. I was saying that the keyword with which we do the setup for most tests, and which adds in the namespace everything necessary for the test configuration takes 1 minute in production, after upgrading to RF >6.0 it takes 14 minutes. I tried the option not to import resource robot files and it decreased from 14 minutes to 7 minutes, still a very big difference. We suspected that it is because we use this syntax of the run keyword to call the python methods. To confirm this, we have tried to create each instance individually and populate it each one with the related properties without using the run keyword to see if we reach the execution time we have in production (avoiding variable substitution).
So, continuing to not import any resource file, we did what was done under a certain keyword that populated the instance of an object with several properties without using the run keyword syntax and calling directly the python methods for an object in the test setup.

The results are as follows:

  • in production it takes 0.122 seconds,
  • in the environment where we did not make any changes to the framework but only after upgrading to RF>6.0 it takes 11.523 seconds,
  • in the environment where we no longer import any resource file but use the run keyword syntax, it takes around 2.5~2.8 seconds
  • in the environment where we no longer import any resource file and don’t use the run keyword syntax, it takes around 0.948 seconds

What are your thoughts on this so far? Do you spot anything that might lead to this problem and that we might improve?

Can we possibly make a call to show you how we use what I explained above?

@pekkaklarck
Copy link
Member

pekkaklarck commented Feb 16, 2023

There clearly has been some change between RF 5 and RF 6 that affects your particular setup. Unfortunately I don't have any idea what that change could be. Anyway, I believe your measurements and there's no reason to have a call for you to show it.

As I wrote earlier, it would be easiest for me to investigate this if you could create an example I could run locally. I could then debug it and also use git bisect to find out which commit caused the issue. If you cannot provide such an example, you can also use git bisect with your real tests and tell what commit is the culprit.

@pekkaklarck
Copy link
Member

If you don't understand why I'd need that example, or how you could create it, see http://www.sscce.org/.

@yahman72
Copy link
Contributor

my two cents here = possibly not related, but could be:
We had similar issue a while back - don't remember if noticed it with RF4.x or RF5.x.

The root cause was the amount of robot variables e.g. one suite had a variable file with ~10K variables and another one had ~2K variables defined. When running the suite with 10K variables variable operations were much slower, e.g. Set Test Variable took ~10x longer to execute than with the suite with 2K variables.

For example we had a test case that was executing in 40s - after defining additional 10K dummy variables (they were just imported, not used) the same test case execution took 120s

@dinualx
Copy link
Author

dinualx commented Feb 20, 2023

We made a small sample project similar to our production project. We ran it with RF 5.0.1, 6.0 and 6.0.2. There is a difference starting with RF 6.0 in the results, but we did not manage to obtain the big differences that appear in our production project. We believe this because our production project is very complex and this small sample project is just a small piece and does not illustrate the complexity.

RF_5 0 1

RF_6 0

RF_6 0 2

We think that even these small differences obtained with the sample project would help in debugging. I will attach the archive with the project and the working environment is this one:

robotbackgroundlogger 1.2
robotframework 5.0.1, 6.0, 6.0.2
robotframework-debuglibrary 2.2.1
robotframework-lint 1.1
robotframework-notifications 1.1.3
robotframework-pythonlibcore 4.0.0
robotframework-reportportal 5.0.4
robotframework-seleniumlibrary 6.0.0
robotframework-sshlibrary 3.8.0
robotframework-winrmlibrary 2.0.0
robotframeworklexer 1.1
Python 3.8.10
Ubuntu 20.04.5 LTS

The project is the .zip archive named "Root_repository folder"
We hope that this sample project will help with debugging.

Root_repository folder.zip

@pekkaklarck
Copy link
Member

Based on the examples above, the issue seems to be somehow related to Run Keyword. Is there a reason you use it so much? Especially usages like

Run Keyword    ${name}.keyword    arg

look strange to me because typically you could use just

Keyword    arg

and would avoid all overhead caused by Run Keyword altogether. Anyway, the performance degradation looks real and worth fixing.

I looked at the provided zip containing an example, but it requires a pretty complicated setup and I cannot run it locally directly. Could you please make a minimal example demonstrating the issue? At least remove all libraries that don't affect the performance so that I don't need to spend time installing them. Ideally the example should be just a single test case file, but if the issue only occurs when resource files or certain libraries are used, they should obviously be included.

@dinualx
Copy link
Author

dinualx commented Feb 24, 2023

Well, why do we use the run keyword? the short answer is because that's the only way to call those methods for the robot objects, when these methods that populate the instance of the respective robot object are under a keyword. So, these are not keywords, these are methods from the python libraries that we call for certain robot objects.

Below is the example I am talking about:

image

Why do we do this? Well, let's say we have 100 objects that we need to instantiate and populate their properties using these python methods for each one, it could be done without using the run keyword syntax, only if we don't do all these instantiations and populate the respective instances under a large keyword, and we do it in each separate test, in the following way exemplified below:

image

And so on to populate all 100 instances of the objects.

But in this way we would end up repeating the same thing in each test, instead of doing it inside a big keyword like testSetup, applied at the level of each test, and at the same time we would end up having a very large amount of text inside each TC.

So, when we want to call these methods under a keyword, with parameters to be more efficient, the syntax is

image

Otherwise it doesn't work, if we try use under that keyword

image

there will be an error that the respective keyword (robot_instance_object.python_method1) is not found

Regarding the sample project file, I will try to resend a version as light as possible.
I hope this explanation helps to understand why we have to use the run keyword syntax.

@dinualx
Copy link
Author

dinualx commented Feb 27, 2023

I have attached a light sample project.

I have removed from the imported libraries into the RobotUtils.robot file, and only these remained:

*** Settings ***
Library ./Root_repository/ProjectRF/frwk/car.py
Library ./Root_repository/ProjectRF/frwk/parking.py
Variables ./Root_repository/ProjectRF/PoolDefs/parking_cars.py

Root_repository_light_zip.zip

Regarding the environment, it is under Ubuntu 20.04.5 LTS with Python 3.8.10.

All the robot framework related dependencies I suppose they come together with robot framework version, I gave this info in a previous comment so that it is known that these are the ones installed in the environment. This project doesn't need to use robotframework-seleniumlibrary since is not a UI scenario.

robotbackgroundlogger 1.2
robotframework-debuglibrary 2.2.1
robotframework-lint 1.1
robotframework-notifications 1.1.3
robotframework-pythonlibcore 4.0.0
robotframework-reportportal 5.0.4
robotframework-seleniumlibrary 6.0.0
robotframework-sshlibrary 3.8.0
robotframework-winrmlibrary 2.0.0
robotframeworklexer 1.1

If there is any info I can provide or any ambiguity that I can help regarding the sample project or the environment, hope to clarify them further.

@pekkaklarck
Copy link
Member

I tried executing the provided example but library and resource imports failed. Did you test locally that you can run the example? Are you sure all library imports in the example are relevant to the problem? If they aren't, please remove them. See http://www.sscce.org for more info related to creating a good example.

@dinualx
Copy link
Author

dinualx commented Feb 28, 2023

Sorry, I didn't run it locally, I saw after your observation that the structure was not ok to be able to run it locally. I modified it now, I ran it locally too, it should work now.

RFProject.zip

@pekkaklarck
Copy link
Member

Thanks, this time I was able to run the example and reproduce the issue. It took ~19ms to run the SetCarProps keyword with the latest code while it took only ~6ms with RF 5.0. git bisect revealed that the commit causing the slowdown is 8301773 that fixed #1595, and looking at the commit message I can see that it certainly can have an effect on performance. That's a pretty good fix and not something we'd like to revert, but I'll investigate could performance be enhanced otherwise.

@pekkaklarck
Copy link
Member

pekkaklarck commented Feb 28, 2023

I have some good and bad news. Let's start from the bad ones.

It seems fixing this performance regression isn't that easy. The root cause of #1595 was that Run Keyword unnecessarily resolved variables when using embedded arguments. To avoid that, Run Keyword nowadays checks does the used keyword accept embedded arguments and that requires first finding the keyword. If no keyword matches the used name or the matching keyword doesn't accept embedded arguments, variables are resolved and then there's another attempt to find a matching keyword. In these cases the keyword is thus searched twice which obviously is slower than searching it only once like earlier.

To avoid unnecessary extra keyword search, the first search is skipped if the name doesn't contain a variable. In your case the name is something like ${name}.setParking, which always initiates two searches.

Unfortunately I don't see any easy way to avoid the initial search in your case. We could somehow explicitly tell Run Keyword that the used keyword cannot accept embedded arguments, but that would require changing the keyword signature which would be backwards incompatible. We could also introduce a new keyword for this purpose but I consider this a too rare use case.

I took a quick look to see could keyword searching be made faster. That would have benefit that it would make overall execution faster as well, but the little tricks I tried didn't seem to have any measurable effect. I may commit them anyway.

As you can see, it's unlikely performance can be enhanced, but I wrote that there are also some good news. The good news is that it seems to be easy to avoid using Run Keyword which avoids all overhead it causes. The trick is to use the Set Library Search Order keyword so that instead of

*** Keywords ***
SetCarProps
    [Arguments]  ${parkingObj}  ${name}  ${mode}  ${carDict}
    Run Keyword    ${name}.setParking    ${parkingObj}
    Run Keyword    ${name}.setCarName    ${name}
    Run Keyword    ${name}.setCarDefaultName    ${carDict['${name}']}

you use

*** Keywords ***
Set Car Props
    [Arguments]  ${parkingObj}  ${name}  ${mode}  ${carDict}
    Set Library Search Order    ${name}
    Set Parking    ${parkingObj}
    Set Car Name    ${name}
    Set Car Default Name    ${carDict}[${name}]

If you need to restore the original search order, you can accomplish it as shown below. You could also set the search order already earlier, before calling the Set Car Props keyword.

*** Keywords ***
Set Car Props
    [Arguments]  ${parkingObj}  ${name}  ${mode}  ${carDict}
    ${orig order}=    Set Library Search Order    ${name}
    Set Parking    ${parkingObj}
    Set Car Name    ${name}
    Set Car Default Name    ${carDict}[${name}]
    [Teardown]    Set Library Search Order    ${orig order}

I tested this approach and execution speed was on the same level as with RF 5.0. It ought to actually be a bit faster because all Run Keyword overhead is avoided, but that difference can probably only been seen with bigger data sets.

@dinualx
Copy link
Author

dinualx commented Mar 1, 2023

Thank you for investigating this, I also tried in RF 6.0.2 the workaround with setting the library search order and I had similar times to RF 5.0.1 without this workaround. But I used only this small sample project for testing this.
I will be able to give an update on the situation for our project after I make this workaround for it.
I am sure that the execution times will improve for our project, it would be great if this workaround would fix all the delay including the time difference that appears if we call the keywords from a robot resource file and not directly from the test suite, I will come back with an update on this.

pekkaklarck added a commit that referenced this issue Mar 1, 2023
The motivaiton was trying to make finding keyword handlers a bit
faster to address #4659. Didn't seem to have any measurable effect,
but I consider this code better anyway.
yanne pushed a commit that referenced this issue Mar 18, 2023
The motivaiton was trying to make finding keyword handlers a bit
faster to address #4659. Didn't seem to have any measurable effect,
but I consider this code better anyway.
@pekkaklarck
Copy link
Member

Great news, @totu had debugged this issue further and found a great fix getting the performance nearly to RF 5.0 level. As I had analyzed earlier, the issues was caused by a fix for #1595 that basically requires finding the keyword to be run twice. I though that alone slowed down execution this much, but @totu figured out that time is actually spent more in error reporting if the first attempt to find a matching keyword fails. More precisely, time was spend in creating a nice error report with recommendations to keywords that have a name close to the one used. These recommendations were never shown to anyone so all that work was wasted. PR #4899 disables recommendations in this case and makes execution a lot faster.

@pekkaklarck pekkaklarck added this to the v7.0 milestone Oct 13, 2023
@pekkaklarck pekkaklarck added bug priority: high acknowledge To be acknowledged in release notes labels Oct 13, 2023
@pekkaklarck pekkaklarck changed the title Starting with RF 6.0 massively increased execution time for the case of importing the robot resource file in the test suite Performance regression when using Run Keyword and keyword name contains a variable Oct 13, 2023
@pekkaklarck
Copy link
Member

pekkaklarck commented Oct 13, 2023

I tested the fix with the following simple example. Before the fix it run in ~1.4s on my machine and the fix took it down to ~0.8s. It runs in ~0.7s with RF 5.0 so there's still a measurable difference, but the difference isn't as massive as it used to be.

*** Variables ***
${KW}     No Operation

*** Test Cases ***
Example
    FOR    ${i}    IN RANGE    1000
        Run Keyword    ${KW}
    END

If I change Run Keyword ${KW} to No Operation i.e. avoid Run Keyword altogether, execution time drops to ~0.5s. As we can see, there's some overhead with Run Keyword. In addition to extra work finding keywords (including Run Keyword itself), it causes twice as many <kw> elements to be written to output.xml which also slows down generating the log file. Using library search order instead of Run Keyword ${lib}.Keyword Name as I recommended earlier is typically a good idea. Anyway, it's awesome that Run Keyword isn't as slow as it used to be!

@dinualx
Copy link
Author

dinualx commented Oct 16, 2023

It seems that the root cause has been fixed.
Time execution is close to the 5.0.1 level, as you said.
We ran a test suite with RF 6.1.1 to see if the problem still occurs and there isn't that big increase in time execution, compared to RF 5.0.1.
But it's like you said, more than 5.0.1 in time execution, but close.
And to draw a conclusion for our project we need to run several suites or the entire regression and we'll see the impact in 6.1.1.
I'll leave some results that we have for this suite available at the moment, some are for the last 5 releases, a comparison between what we have in production - RF 3.2.2 (we haven't upgraded yet :), big project...), RF 6.1.1 - which has the fix included, RF 5.0.1 - the one we decided to upgrade to and RF 6.0 - the one from which the bug started to happen.
Results are in minutes and seconds.

TEST SUITE1 (one test case included) - RF 3.2.2

  • RELEASE1 - 14:20
  • RELEASE2 - 12:51
  • RELEASE3 - 13:26
  • RELEASE4 - 13:51
  • RELEASE5 - 13:45
    average = 13.64 minutes

TEST SUITE1 (one test case included) - RF 5.0.1

  • RELEASE1 - 13:01

TEST SUITE1 (one test case included) - RF 6.0 ( RF version from where bug started to happen)

  • RELEASE1 - 23:02

TEST SUITE1 (one test case included) - RF 6.1.1 (fix included)

  • RELEASE1 - 14:03
  • RELEASE2 - 13:55
  • RELEASE3 - 14:05
  • RELEASE4 - 14:10
  • RELEASE5 - 14:08
    average = 14.07 minutes

The results speak for themselves and it is obvious that the problem has been fixed, which is wonderful news :). Good work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acknowledge To be acknowledged in release notes alpha 1 bug priority: high
Projects
None yet
Development

No branches or pull requests

3 participants