Add a retry on webui service for making sure it will boot up always #1884

Conversation

wjn740

@wjn740 wjn740 commented Nov 22, 2018

On my test box, the openqa-webui service fails to start at system boot,
even after running 'systemctl enable openqa-webui'.
I saw the comments saying our API is a little bit expensive, so I guess
something might not be ready on the first attempt.
With this retry added, the service starts up successfully.
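
The diff itself is not reproduced in this thread. As a rough sketch only, assuming the retry is done through systemd's restart handling (the `Restart=on-failure` setting comes up later in the discussion), a drop-in could look like the following. The drop-in path and the `RestartSec` value are illustrative assumptions, not taken from the pull request.

```ini
# Hypothetical drop-in: /etc/systemd/system/openqa-webui.service.d/restart.conf
# (path and values are assumptions for illustration)
[Service]
# Start the service again if the main process exits with a non-zero status
Restart=on-failure
# Wait a few seconds between attempts so early-boot dependencies can settle
RestartSec=5
```

After adding a drop-in, `systemctl daemon-reload` makes systemd pick it up.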

@coolo
Contributor

coolo commented Nov 22, 2018

Why is it not starting up? It shouldn't fail to start up to begin with.

@wjn740
Author

wjn740 commented Nov 22, 2018

openqa:~ # systemctl -l status openqa-webui
● openqa-webui.service - The openQA web UI
   Loaded: loaded (/usr/lib/systemd/system/openqa-webui.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2018-11-22 18:12:15 CST; 1min 35s ago
  Process: 1508 ExecStart=/usr/share/openqa/script/openqa prefork -m production --proxy -i 100 -H 400 -w 20 (code=exited, status=22)
 Main PID: 1508 (code=exited, status=22)

Nov 22 18:12:15 linux openqa[1508]:         Mojo::Server::Daemon::start('Mojo::Server::Prefork=HASH(0x8420f50)') called at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/Server/Prefork.pm line 80
Nov 22 18:12:15 linux openqa[1508]:         Mojo::Server::Prefork::run('Mojo::Server::Prefork=HASH(0x8420f50)') called at /usr/lib/perl5/vendor_perl/5.18.2/Mojolicious/Command/prefork.pm line 30
Nov 22 18:12:15 linux openqa[1508]:         Mojolicious::Command::prefork::run('Mojolicious::Command::prefork=HASH(0x841d3b0)', '--proxy', '-i', 100, '-H', 400, '-w', 20) called at /usr/lib/perl5/vendor_perl/5.18.2/Mojolicious/Commands.pm line 54
Nov 22 18:12:15 linux openqa[1508]:         Mojolicious::Commands::run('Mojolicious::Commands=HASH(0x81257d8)', 'prefork', '-m', 'production', '--proxy', '-i', 100, '-H', 400, ...) called at /usr/lib/perl5/vendor_perl/5.18.2/Mojolicious.pm line 192
Nov 22 18:12:15 linux openqa[1508]:         Mojolicious::start('OpenQA::WebAPI=HASH(0x12dfbd8)') called at /usr/lib/perl5/vendor_perl/5.18.2/Mojolicious/Commands.pm line 71
Nov 22 18:12:15 linux openqa[1508]:         Mojolicious::Commands::start_app('Mojolicious::Commands', 'OpenQA::WebAPI') called at /usr/share/openqa/script/../lib/OpenQA/WebAPI.pm line 539
Nov 22 18:12:15 linux openqa[1508]:         OpenQA::WebAPI::run() called at /usr/share/openqa/script/openqa line 34
Nov 22 18:12:15 linux systemd[1]: openqa-webui.service: Main process exited, code=exited, status=22/n/a
Nov 22 18:12:15 linux systemd[1]: openqa-webui.service: Unit entered failed state.
Nov 22 18:12:15 linux systemd[1]: openqa-webui.service: Failed with result 'exit-code'.

@coolo
Contributor

coolo commented Nov 22, 2018

This is on boot, right?

@wjn740
Author

wjn740 commented Nov 22, 2018

Yes, it fails on boot.
If I start it again manually, it comes up.

@codecov

codecov bot commented Nov 22, 2018

Codecov Report

Merging #1884 into master will decrease coverage by 0.06%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #1884      +/-   ##
==========================================
- Coverage   90.58%   90.51%   -0.07%     
==========================================
  Files         148      148              
  Lines       10247    10247              
==========================================
- Hits         9282     9275       -7     
- Misses        965      972       +7

| Impacted Files | Coverage Δ | |
|---|---|---|
| lib/OpenQA/Worker/Common.pm | 81.15% <0%> (-2.18%) | ⬇️ |
| lib/OpenQA/Utils.pm | 91.89% <0%> (-0.19%) | ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 823c975...6b04649.

@Martchus
Contributor

Can you show us the full log in the case it fails to start? (journalctl -u openqa-webui)

@wjn740
Author

wjn740 commented Nov 22, 2018

> Can you show us the full log in the case it fails to start? (journalctl -u openqa-webui)

Sorry, I used the wrong command line. This is the full log:

-- Reboot --
Nov 22 18:12:14 linux systemd[1]: Started The openQA web UI.
Nov 22 18:12:15 linux openqa[1508]: Uncaught exception from user code:
Nov 22 18:12:15 linux openqa[1508]:         Can't create listen socket: Name or service not known at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/IOLoop.pm line 126.
Nov 22 18:12:15 linux openqa[1508]:         Mojo::IOLoop::Server::listen('Mojo::IOLoop::Server=HASH(0x84343c0)', 'HASH(0x8434618)') called at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/IOLoop.pm line 126
Nov 22 18:12:15 linux openqa[1508]:         Mojo::IOLoop::server('Mojo::IOLoop=HASH(0x272c4d8)', 'HASH(0x8434618)') called at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/Server/Daemon.pm line 209
Nov 22 18:12:15 linux openqa[1508]:         Mojo::Server::Daemon::_listen('Mojo::Server::Prefork=HASH(0x8420f50)', 'http://localhost:9526/') called at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/Server/Daemon.pm line 63
Nov 22 18:12:15 linux openqa[1508]:         Mojo::Server::Daemon::start('Mojo::Server::Prefork=HASH(0x8420f50)') called at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/Server/Prefork.pm line 80
Nov 22 18:12:15 linux openqa[1508]:         Mojo::Server::Prefork::run('Mojo::Server::Prefork=HASH(0x8420f50)') called at /usr/lib/perl5/vendor_perl/5.18.2/Mojolicious/Command/prefork.pm line 30
Nov 22 18:12:15 linux openqa[1508]:         Mojolicious::Command::prefork::run('Mojolicious::Command::prefork=HASH(0x841d3b0)', '--proxy', '-i', 100, '-H', 400, '-w', 20) called at /usr/lib/perl5/vendor_perl/5.18.2/Mojolicious/Commands.pm line 54
Nov 22 18:12:15 linux openqa[1508]:         Mojolicious::Commands::run('Mojolicious::Commands=HASH(0x81257d8)', 'prefork', '-m', 'production', '--proxy', '-i', 100, '-H', 400, ...) called at /usr/lib/perl5/vendor_perl/5.18.2/Mojolicious.pm line 192
Nov 22 18:12:15 linux openqa[1508]:         Mojolicious::start('OpenQA::WebAPI=HASH(0x12dfbd8)') called at /usr/lib/perl5/vendor_perl/5.18.2/Mojolicious/Commands.pm line 71
Nov 22 18:12:15 linux openqa[1508]:         Mojolicious::Commands::start_app('Mojolicious::Commands', 'OpenQA::WebAPI') called at /usr/share/openqa/script/../lib/OpenQA/WebAPI.pm line 539
Nov 22 18:12:15 linux openqa[1508]:         OpenQA::WebAPI::run() called at /usr/share/openqa/script/openqa line 34
Nov 22 18:12:15 linux systemd[1]: openqa-webui.service: Main process exited, code=exited, status=22/n/a
Nov 22 18:12:15 linux systemd[1]: openqa-webui.service: Unit entered failed state.
Nov 22 18:12:15 linux systemd[1]: openqa-webui.service: Failed with result 'exit-code'.
Nov 22 18:28:56 openqa systemd[1]: Stopped The openQA web UI.
Nov 22 18:28:57 openqa systemd[1]: Started The openQA web UI.
Nov 22 18:28:58 openqa openqa[1980]: Archive::Extract will be removed from the Perl core distribution in the next major release. Please install it from CPAN. It is being used at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Assets.pm, line 24.
Nov 22 18:28:58 openqa openqa[1980]: [1980:info] Listening at "http://localhost:9526/"
Nov 22 18:28:58 openqa openqa[1980]: [1980:info] Manager 1980 started
Nov 22 18:28:58 openqa openqa[1980]: Server available at http://localhost:9526/
Nov 22 18:28:58 openqa openqa[1980]: [1980:info] Creating process id file "/tmp/prefork.pid"
Nov 22 18:56:19 openqa openqa[1980]: [2003:warn] no products found, retrying version wildcard
Nov 22 18:56:19 openqa openqa[1980]: no products found for opneSUSE-Leap-42.3-DVD-x86_64 at /usr/lib/perl5/vendor_perl/5.18.2/Mojolicious.pm line 138.
Nov 22 18:56:50 openqa openqa[1980]: [1986:warn] no products found, retrying version wildcard
Nov 22 18:56:50 openqa openqa[1980]: no products found for opneSUSE-Leap-42.3-DVD-x86_64 at /usr/lib/perl5/vendor_perl/5.18.2/Mojolicious.pm line 138.
Nov 22 18:58:41 openqa openqa[1980]: [2000:warn] no products found, retrying version wildcard
Nov 22 18:58:41 openqa openqa[1980]: no products found for opneSUSE-Leap-42.3-DVD-x86_64 at /usr/lib/perl5/vendor_perl/5.18.2/Mojolicious.pm line 138.

@wjn740
Author

wjn740 commented Nov 22, 2018

Is this a service dependency issue?

@wjn740
Author

wjn740 commented Nov 23, 2018

Hi, I upgraded all openQA and Perl packages to the latest development version. The issue is fixed.


2  | devel-openQA-perl-modules       | devel-openQA-perl-modules                          | Yes     | (r ) Yes  | Yes     | rpm-md | https://download.opensuse.org/repositories/devel:/openQA:/Leap:/42.3/openSUSE_Leap_42.3
3  | devel_languages_perl            | perl modules (openSUSE_Leap_42.3)                  | Yes     | (r ) Yes  | No      | rpm-md | http://download.opensuse.org/repositories/devel:/languages:/perl/openSUSE_Leap_42.3/   
4  | devel_openQA                    | Providing openQA dependencies (openSUSE_Leap_42.3) | Yes     | (r ) Yes  | No      | rpm-md | http://download.opensuse.org/repositories/devel:/openQA/openSUSE_Leap_42.3/            

@okurz
Member

okurz commented Nov 23, 2018

If the issue is fixed for you that's good.

@coolo, might restart on failure still be a good idea?

Also, does multi-user.target ensure the network is up? This sounds like the same issue we had on openqaworker4@o3.

@coolo
Contributor

coolo commented Nov 23, 2018

I find Restart=on-failure a workaround for things we should fix, so I'm kind of against it.

multi-user.target ensures the network is up, but we're part of it (WantedBy=), not ordered after it.
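
To illustrate the distinction drawn here: `WantedBy=` in the `[Install]` section only makes the target pull the unit in once it is enabled; it does not order the unit after whatever else the target pulls in. A minimal sketch of the relevant fragment, assuming the unit is installed into `multi-user.target` as described above (the rest of the real unit file is not shown in this thread):

```ini
[Install]
# "systemctl enable" creates a symlink under multi-user.target.wants/,
# so the target pulls this unit in; this alone defines no start ordering
# relative to network units.
WantedBy=multi-user.target
```

Ordering against the network has to be declared separately with `After=` in the `[Unit]` section.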

@okurz
Member

okurz commented Nov 23, 2018

Sure, so the really strict requirements would be, IIUC:

After=network-online.target
Wants=network-online.target

but I think this would delay the boot further while waiting for DHCP to get addresses from outside, etc. As you said: the problem should be fixed in the code itself so that it is resilient.
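
As a sketch, the two directives above would sit in the `[Unit]` section, for example in a drop-in so that the packaged unit file stays untouched. The drop-in path is an assumption for illustration.

```ini
# Hypothetical drop-in: /etc/systemd/system/openqa-webui.service.d/network.conf
[Unit]
# Pull in network-online.target so that something actually waits for the network
Wants=network-online.target
# Order this service after that target has been reached
After=network-online.target
```

`network-online.target` only delays startup if a wait-online service for the network manager in use is enabled, for example `NetworkManager-wait-online.service` or `systemd-networkd-wait-online.service`.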

@coolo
Contributor

coolo commented Nov 23, 2018

And what's the difference between systemd waiting for the network and openqa waiting for the network?

@okurz
Member

okurz commented Nov 24, 2018

I guess there is no difference for a server, but for a desktop system I am not sure: would that mean the user can only work after network -> openQA -> (some other target)?

@coolo
Contributor

coolo commented Nov 24, 2018

Desktop users can work if the desktop is up - if they want to work with openqa, they will have to wait for it to be up, no matter where the delay is.

@coolo
Contributor

coolo commented Dec 7, 2018

I don't see that as wanted, especially not as a boot order workaround.

@coolo coolo closed this Dec 7, 2018