Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Adding send_key to ensure_unlock_desktop #10064

Closed
wants to merge 1 commit into from

Conversation

DrMullings
Copy link
Contributor

@DrMullings DrMullings commented Apr 20, 2020

Sometimes the screenbuffer does not refresh, to workaround this
issue we now send 'Shift' to the SUT

Please give Feedback and Input

lib/x11utils.pm Outdated
@@ -123,6 +123,7 @@ sub ensure_unlocked_desktop {
send_key 'esc'; # end screenlock
};
}
send_key 'shift';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be handled as a product bug. Please, find https://progress.opensuse.org/issues/64812#Suggestions

It is a security issue that the screen lock doesn't hide the previous screen.
We should have a conditional record_soft_fail if this issue is detected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at the ticket, the bug is already open.
I would add a conditional soft_failure if I knew how to reliably detect that bug

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this branch: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/10064/files#diff-f7cb81ed3ad3075100bf95d110f7e8adR75

It is detecting display-manager, in this branch it could check if the screen buffer is outdated by

  1. sending key
  2. waiting still screen
  3. then looping again
if (match_has_tag 'displaymanager') {
    send_key 'shift'; #<---------------------------------- here                                  
    wait_still_screen; #<--------------------------------- here
    next if (assert_screen('displaymanager', 0) != 0); #<- here

    if (check_var('DESKTOP', 'minimalx')) {
        type_string "$username";
        save_screenshot;
    }
    # Always select user if DM_NEEDS_USERNAME is set
    if ((!check_var('DESKTOP', 'gnome') || (is_sle('<15') || is_leap('<15.0'))) && !get_var('DM_NEEDS_USERNAME')) {
        send_key 'ret';
    }
    # On gnome, user may not be selected and using 'ret' is not enough in this case
    else {
        select_user_gnome($username);
    }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the screenbuffer frozen we do not know what will be asserted or did I get your point wrong?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, so maybe we need a handling before the loop, instead of handling for each if-branch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which is basically what I am doing with the press of shift.
Of course we could add a wait_screen_change, but that would slow us down for the cases where everything works fine

Copy link
Member

@SergioAtSUSE SergioAtSUSE Apr 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what the shift is doing, but, unconditionally, and we need a condition to detect if it happens to record a soft fail

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which leads as back to the problem, that it could be basically any screen
My best guess would be 'if not matched any other needle, assume softfail'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

go for it, I was just giving suggestions and requirements

@DrMullings
Copy link
Contributor Author

@SergioAtSUSE @foursixnine I updated it as we discussed, please have a look

lib/x11utils.pm Outdated
}


while ($counter--) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something is fishy with the indention here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tidy might be messing around again, will update my packages soonish

lib/x11utils.pm Outdated
Comment on lines 73 to 78
if check_screen 'screenlock' {
send_key 'shift';
if check_screen 'screenlock' {
record_soft_failure 'bsc#1168979 screenbuffer containing old data';
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt this can work reliably as without any waiting time for the effect of the key press this is likely to cause false alarms. Also I doubt we have any new regression for "screenbuffer containing old data" here unless you can back this assumption with very good statistics proofing that any related update in the build really causes a problem now that was not there in before.

https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/10064/files#diff-f7cb81ed3ad3075100bf95d110f7e8adL103 has more explanation for the rather old issue of "still old screen content" shown. Introducing a check at a second location in code is unlikely to work out correctly. If you insist on keeping this code then I suggest to safe-guard with an additional version check to not trigger likely false-alarms at least on product versions that can not be affected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DrMullings, please always use check_screen 'something', 0; or check_screen('something, 0);` The timeout 0 is important.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the 0 is not important. It's the default since some months

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/10064/files#diff-f7cb81ed3ad3075100bf95d110f7e8adL103 has more explanation for the rather old issue of "still old screen content" shown. Introducing a check at a second location in code is unlikely to work out correctly. If you insist on keeping this code then I suggest to safe-guard with an additional version check to not trigger likely false-alarms at least on product versions that can not be affected.

Actually this reminds me somehow of a bug on aarch64 regarding virto-gpu not being able to properly switch back and forth on aarch64. So I think it's more of an architecture + version than just the version alone.

Copy link
Member

@SergioAtSUSE SergioAtSUSE May 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I state in the bug, it is reproducible on x86_64 ignore this. This comment is for another PR

lib/x11utils.pm Outdated
while ($counter--) {

# check if the screenbuffer was not updated and record a softfailure in that case
if check_screen 'screenlock' {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Send the shift key if no screenlock is shown. (possible outdated screenbuffer)
I also realized that we should only apply the workaround if desktop is gnome.

Suggested change
if check_screen 'screenlock' {
if (check_var('DESKTOP', 'gnome') && !check_screen('screenlock', 0) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems not logic to me, if we have a lockscreen and press shift we should not get another lockscreen

lib/x11utils.pm Outdated
# check if the screenbuffer was not updated and record a softfailure in that case
if check_screen 'screenlock' {
send_key 'shift';
if check_screen 'screenlock' {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if check_screen 'screenlock' {
wait_still_screen; # redrawing, turning on guest display or showing animation may take some seconds
if (check_screen 'screenlock', 0) {

lib/x11utils.pm Outdated
Comment on lines 73 to 78
if check_screen 'screenlock' {
send_key 'shift';
if check_screen 'screenlock' {
record_soft_failure 'bsc#1168979 screenbuffer containing old data';
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DrMullings DrMullings force-pushed the screen_buffer branch 2 times, most recently from f6f55d8 to b234d76 Compare May 6, 2020 11:04
lib/x11utils.pm Outdated
@@ -68,6 +68,17 @@ all possible options should be handled within loop to get unlocked desktop
=cut
sub ensure_unlocked_desktop {
my $counter = 10;

# check if the screenbuffer was not updated and record a softfailure in that case
if (check_var('DESKTOP', 'gnome') && check_screen('screenlock')) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The negation is still missing:

Suggested change
if (check_var('DESKTOP', 'gnome') && check_screen('screenlock')) {
if (check_var('DESKTOP', 'gnome') && !check_screen('screenlock')) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned above I cannot follow that logic

Copy link
Member

@SergioAtSUSE SergioAtSUSE May 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems not logic to me, if we have a lockscreen and press shift we should not get another lockscreen

check_screen('screenlock') will return "true" if the screenlock is found.
If screenlock is found, the bug didn't happen. -> nothing to do

If screenlock is not found, it may be the bug. -> workaround

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And of course, if you have lockscreen and press shift (you can try yourself on your laptop/PC) you still get the lockscreen.

@DrMullings
Copy link
Contributor Author

Still missing the syntax error, but updated according to suggestions

Copy link
Member

@okurz okurz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if you think I am picky: Overall your change seems to be well intended but I think by doing it in the way you suggested you try to address a problem that we have in a slightly different way already in other context within the while loop. If you insist on adding it in this way then I recommend to guard it as @foursixnine suggested with a more specific version/arch/machine check. I also strongly recommend to rephrase the git commit message to make the reason/benefit obvious in the git commit message subject line and not state what you are doing.
Also please reference the ticket in the git commit message itself. I assume you are already aware of the external references but for completeness in case you haven't seen it yet: See https://commit.style/ or http://chris.beams.io/posts/git-commit/ as a helpful guide how to write good commit messages.

lib/x11utils.pm Outdated
# check if the screenbuffer was not updated and record a softfailure in that case
if (check_var('DESKTOP', 'gnome') && check_screen('screenlock')) {
send_key 'shift';
wait_still_screen; # redrawing, turning on guest display or showing animation may take some seconds
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please do not do that. wait_still_screen without explicit still time value is a big time waste which you call here without checking for any error condition in before. Still my old comment still stays and I did not see a response (sorry if I missed it): You are adding another layer of complexity by doing some checks outside the while loop for an issue that is already addressed in a slightly different way within the following while-loop as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my old comment still stays and I did not see a response (sorry if I missed it)

My first suggestion was to only include the workaround in the general-desktop if-branch, but:
#10064 (comment)

You are adding another layer of complexity by doing some checks outside the while loop for an issue that is already addressed in a slightly different way within the following while-loop as well.

There are 5 if-branches. Only one of them rules out the bug (screenlock), the other four could reproduce the bug. That means that the check outside of the loop would need to be done inside each of those 4 if-branches, duplicating code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Well, let's try it your way then :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought about this again and have commented in #10064 (comment) so I am staying with my original assessment: Please do not add the additional logic outside the loop and no wait_still_screen with default stilltime.

Copy link
Member

@foursixnine foursixnine May 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm siding with @okurz here. As I mentioned to @DrMullings earlier today: Initial suggestions to this ticket, were a bit hasty. In any case, I'm pairing with Joachim on this, to get it in a mergeable state

lib/x11utils.pm Outdated
# check if the screenbuffer was not updated and record a softfailure in that case
if (check_var('DESKTOP', 'gnome') && check_screen('screenlock')) {
send_key 'shift';
wait_still_screen; # redrawing, turning on guest display or showing animation may take some seconds
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
wait_still_screen; # redrawing, turning on guest display or showing animation may take some seconds
wait_still_screen 30; # redrawing, turning on guest display or showing animation may take some seconds

This should be more than enough

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be a sleep of time of at least 30s, that's the opposite of what I would do

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be a sleep of time of at least 30s, that's the opposite of what I would do

Then be more helpful and state what you would instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would start with 2 seconds and increase it later if the bug is reproduced again and 2 seconds are not enough

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry if I was not clear. The problem of wait_still_screen is that it can lead to huge waste of time in non-obvious ways. Take a look in #10163 for a recent example of DimStar77 removing one innocent wait_still_screen 3 and replacing it with a properly designed needle. The result was a speedup of the yast2_control_center test module runtime from 39 minutes to just 3 minutes (!). My common observation is that wait_screen_change and wait_still_screen is added for two reasons: 1. people are afraid of bad performance due to new needles when it is actually the opposite, 2. people do not know how a proper specific needle check can look like

Here I think it is the second case. But unfortunately I myself did not understand the problem enough to be able to suggest you something more useful.

@DrMullings DrMullings force-pushed the screen_buffer branch 5 times, most recently from 1841ef1 to 5dc5866 Compare May 7, 2020 13:21
Sometimes the screenbuffer does not refresh, to workaround this
issue we now send 'Shift' to the SUT
@okurz
Copy link
Member

okurz commented May 7, 2020

Also please be reminded that the referenced bug has a severity specified as "normal" and no blocker flag set. You might want to change that.

@okurz
Copy link
Member

okurz commented May 7, 2020

@SergioAtSUSE because you mentioned the job link: In there I find https://openqa.suse.de/tests/4124163#step/shutdown/4 which looks exactly like why the "blackscreen" tag was introduced in #9610 , btw. only for aarch64 . Overall the test flow in https://openqa.suse.de/tests/4124163 looks like nothing special, just aarch64 behaving a bit different but the same happened in openSUSE already in before. So in https://openqa.suse.de/tests/4124163#step/shutdown/1 there is "old framebuffer content". This is why ensure_unlock_desktop does its magic and tries to start the "desktop runner" to check if the session is actually unlocked or not. https://openqa.suse.de/tests/4124163#step/shutdown/4 should have matched the screenlock/blackscreen but does not because "blackscreen" is not (yet) covered for your specific scenario. But in https://openqa.suse.de/tests/4124163#step/shutdown/7 you can clearly see that the system woke up again, probably as a result to the key presses in the post fail hook. So still I am convinced that you do not need to add any additional logic outside that isn't there already.

Now, why is this no problem on openSUSE gnome aarch64? Probably only because in https://openqa.opensuse.org/tests/1254487#step/updates_packagekit_gpk/10 we disable the screensaver in gnome for good. Another example where https://progress.opensuse.org/issues/40490 should help.

@foursixnine
Copy link
Member

So still I am convinced that you do not need to add any additional logic outside that isn't there already.

Indeed, it can all be handled within the loop, I was discussing this earlier today. In any case, thanks for the useful insight, specially the screensaver part, we also when discussing this ticket previously also came to the ticket you pointed at, but didn't do anything further :)

@SergioAtSUSE
Copy link
Member

Also please be reminded that the referenced bug has a severity specified as "normal" and no blocker flag set. You might want to change that.

I thought the severity is set by developers when they confirm the bug.
I am very cautious of setting flags since I always set them wrong and get complains later.

@SergioAtSUSE
Copy link
Member

I am convinced that you do not need to add any additional logic outside that isn't there already.

And I am convinced of the opposite. The only way to be sure that the bug is reproduced, is with

if GNOME and not screenlock
  shift
  if screenlock then soft-fail

no problem on openSUSE gnome aarch64?
we disable the screensaver in gnome for good. Another example where https://progress.opensuse.org/issues/40490 should help.

I agree to that, we should disable the screensaver.

But, this is not resolving the problem of detecting the bug to record a soft fail. I think you agree with me that a soft-fail should only be recorded when the bug is detected, not also in cases where it isn't.
Handling the black screen as a prove for the bug will also cause soft-failures when the bug is not present.

So, I come back to my initial suggestion to include the workaround (only) in the generic-desktop branch.

@foursixnine
Copy link
Member

Please see #10217

@okurz
Copy link
Member

okurz commented May 11, 2020

@SergioAtSUSE regarding:

#10064 (comment)

Better read https://bugzilla.opensuse.org/page.cgi?id=importance_matrix.html and https://wiki.suse.net/index.php?title=RD-OPS_QA/HowTos/Bug_Reporting#Severity . If you never set Severity in all those years then this is an interesting learning :)

I wonder, who do you expect to look into https://bugzilla.suse.com/show_bug.cgi?id=1168979 ? The bug is of very low quality and like this will very likely never be looked at properly by anyone next to you. Keep in mind that there are mere mortals on the "other side" of bugzilla as well :) If you don't give more details for reproduction, since when this really helps, statistics to prove your claim … – good luck :)

@SergioAtSUSE SergioAtSUSE marked this pull request as draft May 11, 2020 13:08
@foursixnine
Copy link
Member

foursixnine commented May 11, 2020

I think that this pr's name should be changed or closed altogether as it was superseded by #10217

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants