Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix sporadic errors in test_results due to tutorial popup #10435

Merged
merged 1 commit into from Jun 19, 2020

Conversation

asdil12
Copy link
Member

@asdil12 asdil12 commented Jun 3, 2020

@okurz
Copy link
Member

okurz commented Jun 3, 2020

If you want to fix sporadic results I recommend you test more than one run – and the one you referenced failed anyway.

Could we please try to find a fix that is not going back to the old way of checking screen multiple times and relying on timing behaviour?

@okurz
Copy link
Member

okurz commented Jun 3, 2020

https://openqa.opensuse.org/tests/1285301#step/test_results/2 shows that "All Tests" is clicked but in the same screen "dont-notify-me" is detected. Either we add a wait_still_screen there or – probably better – include the "reload" button from the browser bar or the favicon area in the needle to not match while a screen is loading. So likely the problem can be fixed without any code changes, just proper needles

@asdil12 asdil12 force-pushed the go_away_tutorial branch 2 times, most recently from edb4d2b to c08cfa8 Compare June 4, 2020 10:16
@asdil12
Copy link
Member Author

asdil12 commented Jun 4, 2020

Detecting the browser loading indicatior doesn't sound very clean to me and could easily break.
I would rather waste the worker's time than the time of a human reviewer.

@okurz
Copy link
Member

okurz commented Jun 4, 2020

I would rather waste the worker's time than the time of a human reviewer.

No doubt about that. I am not sure your approach will be more stable though. But it's ok this way and we can try

LGTM

@asdil12
Copy link
Member Author

asdil12 commented Jun 4, 2020

@asdil12
Copy link
Member Author

asdil12 commented Jun 4, 2020

It looks much better now (I also fixed some needle problem).

@asdil12 asdil12 requested a review from foursixnine June 5, 2020 08:46
@@ -38,6 +38,12 @@ sub upload_autoinst_log {

sub handle_notify_popup {
return undef unless match_has_tag 'openqa-dont-notify-me';
for my $i (1 .. 5) {
assert_and_click 'openqa-dont-notify-me';
if (check_screen('openqa-tutorial-confirm')) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check screen also for openqa-dont-notify-toggled, and if so, click the confirm? https://openqa.opensuse.org/tests/1288011#step/test_results/7

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The confirm button is only visible when the checkbox is already toggled.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but in the test that I reference, the button is shown already :P, and openQA is looking for the unchecked box...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah - seems that check_screen doesn't have any timeout which leads to thes race condition.
Adding a timeout should do it.

Copy link
Member

@okurz okurz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was fine with the check_screen without timeout but now again you add back a rather inefficient approach wasting lot of time waiting for nothing to happen. Please use proper needles which exclude each other and multi-tag assert_screen calls

@asdil12
Copy link
Member Author

asdil12 commented Jun 5, 2020

I was fine with the check_screen without timeout but now again you add back a rather inefficient approach wasting lot of time waiting for nothing to happen. Please use proper needles which exclude each other and multi-tag assert_screen calls

That might still lead to a race condition if the screen is not updated fast enough (which is exactly the reason of the current race condition).
Waiting a bit is exactly what needs to be done here and I honestly don't want to discuss more than 30 seconds about 30 seconds of waiting on a worker.

@okurz
Copy link
Member

okurz commented Jun 5, 2020

It's not just 30s if you consider TIMEOUT_SCALE and a for loop doing up to 5x check_screen. And there can be cases where even a waiting time of 30s is not enough, if the server is a bit slow to respond so to put it to extreme you would need to extend the waiting time even more which is not what I suggest to do. I would be ok if you put in a wait_still_screen call which I have proposed. Your approach shows bad practice in general software development: When not knowing what exactly to wait for you just wait (and waste) an arbitrary time which sums up and is also seen in other parts of this repo. If you want to invest your time discussing it or not is your choice. I already gave specific suggestions in code and regarding needle choices what to do.

@asdil12
Copy link
Member Author

asdil12 commented Jun 5, 2020

It's not just 30s if you consider TIMEOUT_SCALE and a for loop doing up to 5x check_screen. And there can be cases where even a waiting time of 30s is not enough, if the server is a bit slow to respond so to put it to extreme you would need to extend the waiting time even more which is not what I suggest to do. I would be ok if you put in a wait_still_screen call which I have proposed. Your approach shows bad practice in general software development: When not knowing what exactly to wait for you just wait (and waste) an arbitrary time which sums up and is also seen in other parts of this repo. If you want to invest your time discussing it or not is your choice. I already gave specific suggestions in code and regarding needle choices what to do.

I don't see any advantage in wait_still_screen as it would just waste time as well and I would still need to check for the needle afterwards.
As only ~5% of my verification runs even ran into this race condition, only in these cases there would even be a larger abount of the available timeout actually consumed.
Also the server doesn't play a role here as this checkbox doesn't trigger an XHR AFAIK.

It might make sence to reduce the timeout to 15 seconds as that might be sufficient, though.

@asdil12
Copy link
Member Author

asdil12 commented Jun 5, 2020

I wonder, how https://openqa.opensuse.org/tests/1289107#step/test_results/18 can happen.
The needle has an exclude area above the differing spot but is still not getting 100%.

@okurz
Copy link
Member

okurz commented Jun 5, 2020

hm, I also don't see any obvious difference in https://openqa.opensuse.org/tests/1289107#step/test_results/18 . Maybe there is very subtle change in shading. You might be able to avoid the exclude area with just covering the test module name and the green color of "passed" without the text

@SergioAtSUSE SergioAtSUSE added this to In progress in QE-Core Jun 9, 2020
@asdil12
Copy link
Member Author

asdil12 commented Jun 16, 2020

@asdil12 asdil12 added the WIP Work in progress label Jun 18, 2020
@asdil12
Copy link
Member Author

asdil12 commented Jun 18, 2020

@asdil12
Copy link
Member Author

asdil12 commented Jun 19, 2020

@asdil12 asdil12 removed the WIP Work in progress label Jun 19, 2020
@asdil12 asdil12 merged commit 85c83c1 into os-autoinst:master Jun 19, 2020
QE-Core automation moved this from In progress to Done Jun 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
QE-Core
  
Done
3 participants