Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OS detection fails for Debian Bookworm #1606

Closed
dboehmer opened this issue Aug 24, 2023 · 5 comments
Closed

OS detection fails for Debian Bookworm #1606

dboehmer opened this issue Aug 24, 2023 · 5 comments

Comments

@dboehmer
Copy link

dboehmer commented Aug 24, 2023

Describe the bug

I upgraded a server of mine to Debian 12 Bookworm and started my Rexfile. After some steps Rex fails because it does not correctly detect Debian 12.

Output of test file from below (redacted):

$ rex -f test -u root -H bookworm.example test
[2023-08-24 07:57:17] INFO - Running task test on bookworm.example
[2023-08-24 07:57:18] INFO - OS not supported (No LSB modules are available.
Debian)
[2023-08-24 07:57:18] ERROR - Error executing task:
[2023-08-24 07:57:18] ERROR - OS/Provider not supported at .../lib/site_perl/5.34.0/Rex/Pkg.pm line 72, <> line 80.

[2023-08-24 07:57:18] ERROR - 1 out of 1 task(s) failed:
[2023-08-24 07:57:18] ERROR -   test failed on bookworm.example
[2023-08-24 07:57:18] ERROR -           OS/Provider not supported at .../lib/site_perl/5.34.0/Rex/Pkg.pm line 72, <> line 80.

Expected behavior

Old behavior with Debian 11 Bullseye (output redacted):

$ rex -f test -u root -H bullseye.example test
[2023-08-24 07:57:28] INFO - Running task test on bullseye.example
[2023-08-24 07:57:29] INFO - Installing sdfasdfasdfasfdafd-doesnt-exist.
[2023-08-24 07:57:29] WARN - Error installing sdfasdfasdfasfdafd-doesnt-exist.
[2023-08-24 07:57:29] ERROR - Error executing task:
[2023-08-24 07:57:29] ERROR - Error installing sdfasdfasdfasfdafd-doesnt-exist at .../lib/site_perl/5.34.0/Rex/Pkg/Base.pm line 94, <> line 80.

[2023-08-24 07:57:29] ERROR - 1 out of 1 task(s) failed:
[2023-08-24 07:57:29] ERROR -   test failed on bullseye.example
[2023-08-24 07:57:29] ERROR -           Error installing sdfasdfasdfasfdafd-doesnt-exist at .../lib/site_perl/5.34.0/Rex/Pkg/Base.pm line 94, <> line 80.

How to reproduce it

  1. have Debian 12 Bookworm server
  2. run Rexfile to manage that including package install

Code example

Test Rexfile:

task test => sub {
    install 'sdfasdfasdfasfdafd-doesnt-exist';
};

Additional context

if ( my $ret = i_run "lsb_release -s -i" ) {
looks like Rex asks lsb_release -s -i for detecting the OS. Output via SSH is the same Debian for both host OS versions. But when run in an interactive terminal the behavior changed:

$ ssh root@bullseye.example
# lsb_release -s -i
Debian
# exit
logout
Connection to bullseye.example closed.
$ ssh root@bookworm.example
# lsb_release -s -i
No LSB modules are available.
Debian
# exit
logout
Connection to bookworm.example closed.

The method is_debian() in lib/Rex/Commands/Gather.pm receives the string No LSB modules are available.\nDebian with a literal newline character as argument.

As a workaround I monkeypatched the method to always return true. That way I was able to successfully run my Rexfile.

Rex version

(R)?ex 1.14.3

Perl version

5.34.0

Operating system running rex

Debian 11 Bullseye

Operating system managed by rex

Debian 12 Bookworm

How rex was installed?

cpan client

@dboehmer dboehmer added the triage needed A potential bug that needs to be reproduced and understood label Aug 24, 2023
@dboehmer
Copy link
Author

See https://serverfault.com/questions/817005/debian-8-no-lsb-modules-are-available for why lsb_release might output that message. There is no package lsb-core available in Bookworm and I have lsb-release already installed.

@numberwhun
Copy link

This seems to be an issue with some systems (debian in this case specifically), not suppressing the error message regarding "No LSB modules are available". For instance on Ubuntu, that message is output, but if you specify the '-s -i' options, the message is suppressed.
Since this is an error message, I have submitted PR #1608 to fix this issue. Once reviewed and merged, it should fix this issue.

@ferki
Copy link
Member

ferki commented Oct 12, 2023

Thanks @dboehmer for the report, and @numberwhun for an initial triaging!

I finally could get around to obtaining an official Debian 12 image and reproduce the issue too at first. After much investigation, though, I don't believe it's a bug, but a missing feature flag for modern Rex features.

TL;DR

I only could reproduce the situation with use Rex; in the Rexfile, but not with my usual use Rex -feature => ['1.4', 'exec_autodie']; (which I recommend for everyone anyway). Apparently the provided samples did not include a full Rexfile or debug output which could have made it easier to track down 😓

For SSH connections, the original/default behavior is to allocate a tty, but that makes the distinction between remote STDOUT and STDERR streams impossible. Due to all the problems this causes, since Rex-1.0.0 the default is to don't allocate a tty.

This is not a backwards compatible change in behavior therefore one must opt-in for it via either a feature flag or feature bundle (much like how Perl features work too).

Please try the same Rex task with one of the following Rex imports:

use Rex -feature => ['1.4'];    # enable 1.4 feature bundle
use Rex -feature => ['no_tty']; # disable only tty allocation

and let me know how that works.

Also feel free to reach out on other support channels (Matrix/IRC, open source office hours, etc.) for possibly faster feedback loops.

Long version

First impressions

The problem is indeed about lsb_release returning No LSB modules are available. (followed by a newline character) on STDERR before returning the additional expected output like Debian.

Rex currently takes the output of this command literally without distinction between STDERR and STDOUT. The returned values are used to choose the matching package- and service-management provider modules from the Rex::Pkg and Rex::Service namespaces, respectively.

For example, in case of the example here it literally looks for a module named Rex::Pkg::No LSB modules are available.\nDebian, which of course doesn't exist, and thus get reported as unsupported OS for package management calls.

I'm not yet sure how to write a failing test for this case yet, but I expect one or more of the following would be interesting to check for a solution:

  • use no_stderr for the underlying i_run call to suppress STDERR output
  • use valid_retval and/or fail_ok options for the underlying i_run call depending on the return code associated with the error (I get 0/success, despite the error message, though)

What I don't entirely like about suppressing the STDERR output as a whole, is that it may mask other errors on every other OSes too, not just No LSB modules are available. on Debian 12.

Perhaps a viable alternative would be to specifically filter out this known-to-be-accepted error message from the output instead. It feels a bit dirty too at first, but at least it would allow meaningful failures through for other cases.

Deeper look

There is a an stderr_to_stdout flag for i_run too, so I wonder why do we get a combined STDERR+STDOUT result returned when we don't ask for it 🤔

i_run mixing STDERR and STDOUT might be the real underlying question to solve, instead of how we call lsb_release. It might be due to a bug, or some details in how exactly the SSH connection is made, and what parameters and feature flags are in effect.

The actual problem

Thinking more about the previous point, I noticed the provided example did not include a full Rexfile, but only a sample task. That means, I had to make a guess about the exact use Rex; line in effect for the failing cases.

With the vanilla use Rex;, I could reproduce the error. With 'use Rex -feature => ['1.4'];` it was working as expected.

Looking at the list of available feature flags, I was sure something between 0.0.0 and 1.4.0 causes the different behavior. Since 1.0, part of the default feature set is to don't allocate a tty for the SSH connections.

With tty enabled on the underlying connection based on Net::SSH2 or Net::OpenSSH, always the combined STDOUT+STDERR output is returned without any way to make a distinction between the separate streams. Hence the new default from 1.0 onwards to disable tty on SSH connections.

Due to preserving backwards compatibility, these improvements are always opt-ins via feature flags or feature bundles (much like how use feature and use v5.x works in Perl itself).

@ferki ferki added info needed Further information is needed before continuing and removed triage needed A potential bug that needs to be reproduced and understood labels Oct 12, 2023
@dboehmer
Copy link
Author

I confirmed that

use Rex -feature => ['1.4'];

solved the issue.

Sorry for not including the boilerplate in my test code. I didn’t think of it that this might have an effect on the issue.

@ferki
Copy link
Member

ferki commented Oct 20, 2023

I confirmed that

use Rex -feature => ['1.4'];

solved the issue.

Great! 🎉 Thanks for confirming!

Sorry for not including the boilerplate in my test code. I didn’t think of it that this might have an effect on the issue.

No worries, it was still a valuable learning experience to track down the root cause despite the less-than-ideal starting point. In case we bump into this situation more often, we might want to update the issue template to ask for more like a "fully copy-pasteable example to run" (like a minimal Rexfile), instead of the "shortest possible" one (which sometimes might be considered only a task without further context).

There are already tests for the no_tty behavior in t/no_tty.t, so I think we can close this issue without further changes after all.

@ferki ferki closed this as completed Oct 20, 2023
@ferki ferki removed the info needed Further information is needed before continuing label Oct 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants