Rebooting an instance fails with Mitogen #370

Closed
mnaser opened this issue Sep 16, 2018 · 7 comments

mnaser commented Sep 16, 2018

Following some of these instructions:

https://www.jeffgeerling.com/blog/2018/reboot-and-wait-reboot-complete-ansible-playbook

I ran into the following after the reboot. Note that this is on master, using a stacked configuration (a jump host for SSH) with a simple mitogen_via.

TASK [Reboot server] *********************************************************************************************************************************************************************************************
Sunday 16 September 2018  19:11:02 -0400 (0:00:00.110)       0:00:10.608 ****** 
changed: [<HOSTNAME>]

TASK [Wait for the reboot to complete.] **************************************************************************************************************************************************************************
Sunday 16 September 2018  19:11:02 -0400 (0:00:00.397)       0:00:11.006 ****** 
ERROR! [pid 86864] 19:11:08.020708 E mitogen.ctx.ssh.<MITOGEN_VIA>: mitogen: ModuleForwarder(Router(Broker(0x7f82c8cca410))): dropping FORWARD_MODULE(1007, u'ansible'): no route to child
ERROR! [pid 86864] 19:11:08.022572 E mitogen.ctx.ssh.<MITOGEN_VIA>: mitogen: ModuleForwarder(Router(Broker(0x7f82c8cca410))): dropping FORWARD_MODULE(1007, u'ansible.module_utils'): no route to child
ERROR! [pid 86864] 19:11:08.023217 E mitogen.ctx.ssh.<MITOGEN_VIA>: mitogen: ModuleForwarder(Router(Broker(0x7f82c8cca410))): dropping FORWARD_MODULE(1007, u'ansible.module_utils._text'): no route to child
ERROR! [pid 86864] 19:11:08.023905 E mitogen.ctx.ssh.<MITOGEN_VIA>: mitogen: ModuleForwarder(Router(Broker(0x7f82c8cca410))): dropping FORWARD_MODULE(1007, u'ansible'): no route to child
ERROR! [pid 86864] 19:11:08.024471 E mitogen.ctx.ssh.<MITOGEN_VIA>: mitogen: ModuleForwarder(Router(Broker(0x7f82c8cca410))): dropping FORWARD_MODULE(1007, u'ansible.module_utils'): no route to child
ERROR! [pid 86864] 19:11:08.024967 E mitogen.ctx.ssh.<MITOGEN_VIA>: mitogen: ModuleForwarder(Router(Broker(0x7f82c8cca410))): dropping FORWARD_MODULE(1007, u'ansible.module_utils.basic'): no route to child
ERROR! [pid 86864] 19:11:08.025556 E mitogen.ctx.ssh.<MITOGEN_VIA>: mitogen: ModuleForwarder(Router(Broker(0x7f82c8cca410))): dropping FORWARD_MODULE(1007, u'ansible'): no route to child
ERROR! [pid 86864] 19:11:08.026097 E mitogen.ctx.ssh.<MITOGEN_VIA>: mitogen: ModuleForwarder(Router(Broker(0x7f82c8cca410))): dropping FORWARD_MODULE(1007, u'ansible.module_utils'): no route to child
ERROR! [pid 86864] 19:11:08.030466 E mitogen.ctx.ssh.<MITOGEN_VIA>: mitogen: ModuleForwarder(Router(Broker(0x7f82c8cca410))): dropping FORWARD_MODULE(1007, u'ansible.module_utils.parsing'): no route to child
ERROR! [pid 86864] 19:11:08.031423 E mitogen.ctx.ssh.<MITOGEN_VIA>: mitogen: ModuleForwarder(Router(Broker(0x7f82c8cca410))): dropping FORWARD_MODULE(1007, u'ansible'): no route to child
ERROR! [pid 86864] 19:11:08.032651 E mitogen.ctx.ssh.<MITOGEN_VIA>: mitogen: ModuleForwarder(Router(Broker(0x7f82c8cca410))): dropping FORWARD_MODULE(1007, u'ansible.module_utils'): no route to child
ERROR! [pid 86864] 19:11:08.033647 E mitogen.ctx.ssh.<MITOGEN_VIA>: mitogen: ModuleForwarder(Router(Broker(0x7f82c8cca410))): dropping FORWARD_MODULE(1007, u'ansible.module_utils.parsing'): no route to child
ERROR! [pid 86864] 19:11:08.034167 E mitogen.ctx.ssh.<MITOGEN_VIA>: mitogen: ModuleForwarder(Router(Broker(0x7f82c8cca410))): dropping FORWARD_MODULE(1007, u'ansible.module_utils.parsing.convert_bool'): no route to child
ERROR! [pid 86864] 19:11:08.034578 E mitogen.ctx.ssh.<MITOGEN_VIA>: mitogen: ModuleForwarder(Router(Broker(0x7f82c8cca410))): dropping FORWARD_MODULE(1007, u'ansible'): no route to child
ERROR! [pid 86864] 19:11:08.034997 E mitogen.ctx.ssh.<MITOGEN_VIA>: mitogen: ModuleForwarder(Router(Broker(0x7f82c8cca410))): dropping FORWARD_MODULE(1007, u'ansible.module_utils'): no route to child
ERROR! [pid 86864] 19:11:08.035371 E mitogen.ctx.ssh.<MITOGEN_VIA>: mitogen: ModuleForwarder(Router(Broker(0x7f82c8cca410))): dropping FORWARD_MODULE(1007, u'ansible.module_utils.pycompat24'): no route to child
ERROR! [pid 86864] 19:11:08.035779 E mitogen.ctx.ssh.<MITOGEN_VIA>: mitogen: ModuleForwarder(Router(Broker(0x7f82c8cca410))): dropping FORWARD_MODULE(1007, u'ansible'): no route to child
ERROR! [pid 86864] 19:11:08.036139 E mitogen.ctx.ssh.<MITOGEN_VIA>: mitogen: ModuleForwarder(Router(Broker(0x7f82c8cca410))): dropping FORWARD_MODULE(1007, u'ansible.module_utils'): no route to child
ERROR! [pid 86864] 19:11:08.036544 E mitogen.ctx.ssh.<MITOGEN_VIA>: mitogen: ModuleForwarder(Router(Broker(0x7f82c8cca410))): dropping FORWARD_MODULE(1007, u'ansible.module_utils.six'): no route to child
ERROR! [pid 86864] 19:11:08.036892 E mitogen: Router(Broker(0x105d2c630)): bad auth_id: got 0 via mitogen.ssh.Stream('ssh.<MITOGEN_VIA>'), not None: Message(1007, 4008, 0, 101, 1003, b'\x80\x02(X.\x00\x00\x00Mohammeds-MBP-86878-7fff94991380-576052bf4'..1429)
ERROR! [pid 86864] 19:11:08.119630 E mitogen.ctx.ssh.<MITOGEN_VIA>: mitogen: Router(Broker(0x7f82c8cca410)): invalid handle: Message(1005, 0, 0, 1004, 0, '\x80\x02cmitogen.core\n_unpickle_call_error\nq\x00Xc\x00\x00\x00Caller'..152)

dw (Member) commented Oct 23, 2018

Hi Mohammed, thanks for reporting this! I should hopefully have a fix for it shortly.

dw added the user-reported label Oct 23, 2018

dw (Member) commented Oct 24, 2018

All the reboot action does is:

  1. Run 'reboot' via _low_level_execute_command()
  2. In a loop,
    1. Call Connection.set_option('..timeout..')
    2. Call Connection.reset() (#369)
    3. Call _low_level_execute_command()

So this is basically about implementing .reset() and ensuring that errors propagate reliably when an exception occurs.
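
The loop described above can be sketched in Python as follows. This is a hypothetical, simplified model, not Ansible's reboot action or Mitogen's code: `FakeConnection`, `reboot_and_wait`, the `whoami` probe and the `hypothetical_timeout` option name are all invented for illustration (the real option name is elided in the comment above). It shows why both points matter: each polling attempt must call `reset()` to discard the dead session, and a failed probe must surface as an exception so the loop can retry.

```python
import time


class FakeConnection:
    """Hypothetical stand-in for a connection plus its command runner.

    In Ansible, _low_level_execute_command() lives on the action plugin;
    it is folded into this fake for brevity. The host refuses the first
    `down_for` probe commands sent after the reboot, then answers.
    """

    def __init__(self, down_for=2):
        self.down_for = down_for
        self.options = {}
        self.resets = 0
        self.probes = 0

    def set_option(self, name, value):
        self.options[name] = value

    def reset(self):
        # A real plugin would drop its transport here so the next
        # command is forced to open a fresh connection.
        self.resets += 1

    def low_level_execute_command(self, cmd):
        if cmd == "reboot":
            return {"rc": 0, "stdout": ""}
        self.probes += 1
        if self.probes <= self.down_for:
            # Errors must propagate as exceptions, or the loop below hangs.
            raise ConnectionError("host is still rebooting")
        return {"rc": 0, "stdout": "root"}


def reboot_and_wait(conn, timeout=5.0, delay=0.01, probe="whoami"):
    # Step 1: run the reboot command itself.
    conn.low_level_execute_command("reboot")
    deadline = time.monotonic() + timeout
    # Step 2: loop until a probe command succeeds or time runs out.
    while time.monotonic() < deadline:
        # Placeholder option name; the real one is elided ('..timeout..').
        conn.set_option("hypothetical_timeout", 1)
        # Force a reconnect rather than reusing the dead session.
        conn.reset()
        try:
            return conn.low_level_execute_command(probe)
        except ConnectionError:
            time.sleep(delay)
    raise TimeoutError("host did not come back before the deadline")


conn = FakeConnection(down_for=2)
print(reboot_and_wait(conn)["stdout"], conn.resets)  # root 3
```

With `down_for=2`, the first two probes fail and the third succeeds, so `reset()` runs three times before the host is considered back.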

dw (Member) commented Oct 24, 2018

Hrm :) On the one VM I chose to try this against, the reboot module does not seem to work even with vanilla Ansible.

This one looks fun!

dw (Member) commented Oct 24, 2018

There are a bunch of issues here:

  • Mitogen notices the connection has gone away, but does not mark the Connection object in the worker process as 'closed'. This is the source of the hang you are witnessing.
  • Mitogen does not support Connection.reset(), which would allow the reboot action to explicitly mark the connection for a reconnect.
  • The OS scheduler prevents Mitogen from reading the reboot command's output before SSH is torn down. A similar problem affects vanilla Ansible to a lesser degree. Running with a higher process priority "fixes" it.
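
The first issue (a dead context never marking the worker's connection as closed) is a classic observer problem. The sketch below is a hypothetical, self-contained illustration of the fix's general shape, not Mitogen's code: `listen`, `fire`, `Context`, `WorkerConnection` and `get_context` are invented names. A later commit on this issue describes listening for a "disconnect" signal; here the connection subscribes to its context's "disconnect" signal, flips `closed` when it fires, and reconnects lazily on next use. Binding the specific context in the callback means a late signal from an old context cannot close a fresh one.

```python
def listen(obj, name, func):
    """Register func to run when signal `name` is fired on obj."""
    vars(obj).setdefault("_signals", {}).setdefault(name, []).append(func)


def fire(obj, name, *args):
    """Call every listener registered for signal `name` on obj."""
    for func in list(vars(obj).get("_signals", {}).get(name, [])):
        func(*args)


class Context:
    """Hypothetical handle for a remote interpreter."""

    def __init__(self, name):
        self.name = name


class WorkerConnection:
    """Hypothetical worker-side connection that tracks liveness."""

    def __init__(self, context):
        self.context = None
        self.closed = True
        self._attach(context)

    def _attach(self, context):
        self.context = context
        self.closed = False
        # Bind this context so a late signal from an old one is ignored.
        listen(context, "disconnect", lambda: self._on_disconnect(context))

    def _on_disconnect(self, context):
        if context is self.context:
            # Mark closed so the next task reconnects instead of
            # waiting on a context that will never reply.
            self.closed = True
            self.context = None

    def get_context(self, connect):
        """Return a live context, calling connect() if ours was lost."""
        if self.closed:
            self._attach(connect())
        return self.context


old = Context("ssh.target")
conn = WorkerConnection(old)
fire(old, "disconnect")  # e.g. the host went down for a reboot
print(conn.closed)  # True
new = conn.get_context(lambda: Context("ssh.target"))
fire(old, "disconnect")  # stale signal from the old context; ignored
print(conn.closed, new is conn.context)  # False True
```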

mnaser (Author) commented Nov 1, 2018

For what it's worth, I've kinda just worked around it by running the play that involves the reboot without Mitogen, but that's not always possible.

dw (Member) commented Nov 1, 2018

I don't want to tell you to re-test yet, but I /think/ this one is done on master, modulo a bunch of annoying new soft errors that get printed (still WIP).

I'm in the middle of migrating from Mac to PC, and thought it'd be a great idea to shred all my old VMs. So I haven't been able to test this just yet.

In any case, Mitogen may need a mandatory workaround: a reboot with no delay has essentially no guarantee of success even upstream, and with Mitogen the problem is worse because the reboot command's exit status must pass through 2 more processes / 4 more threads before it makes it off the box, while systemd is simultaneously tearing the machine apart. That doesn't seem fixable with the reboot module's current approach, but I'm not done looking yet :)

dw added a commit that referenced this issue Nov 1, 2018
dw added a commit that referenced this issue Nov 1, 2018
Simply listen to RouteMonitor's Context "disconnect" and forget
contexts according to RouteMonitor's rules, rather than duplicate them
(and screw it up).
dw added a commit that referenced this issue Nov 1, 2018
dw added a commit that referenced this issue Nov 1, 2018
- Context serialization fix
- #370: functioning reboot module.

dw (Member) commented Nov 1, 2018

Hi Mohammed,

This appears to work now. There are lingering soft errors printed to the console that will be fixed before the next release, and a pre_reboot_delay is required on systemd machines to ensure the exit status has time to be reported (this has been recorded in the release notes). Otherwise, feel free to kick the tires on master and let me know if you have any more problems. :)

This is now on the master branch and will make it into the next release. To be updated when a new release is made, subscribe to https://networkgenomics.com/mail/mitogen-announce/

Thanks for reporting this!

dw closed this Nov 1, 2018