run() after reboot() fails #329

Closed
bitprophet opened this Issue Aug 19, 2011 · 8 comments

bitprophet commented Aug 19, 2011

Description

Even after waiting long enough for the box to come up, the next command after reboot fails like:

[somehost.example.com] Waiting for reboot: ........................done.
[somehost.example.com] run: rm thatfile.txt
Traceback (most recent call last):
  File "/Users/brainsik/.virtualenvs/sauce/lib/python2.7/site-packages/fabric/main.py", line 546, in main
    commands[name](*args, **kwargs)
  File "/Users/brainsik/src/sauce/admin/fabfile.py", line 571, in pre_install_node
    run("rm pre-install-node.sh")
  File "/Users/brainsik/.virtualenvs/sauce/lib/python2.7/site-packages/fabric/network.py", line 303, in host_prompting_wrapper
    return func(*args, **kwargs)
  File "/Users/brainsik/.virtualenvs/sauce/lib/python2.7/site-packages/fabric/operations.py", line 918, in run
    return _run_command(command, shell, pty, combine_stderr)
  File "/Users/brainsik/.virtualenvs/sauce/lib/python2.7/site-packages/fabric/operations.py", line 841, in _run_command
    stdout, stderr, status = _execute(default_channel(), wrapped_command, pty,
  File "/Users/brainsik/.virtualenvs/sauce/lib/python2.7/site-packages/fabric/state.py", line 274, in default_channel
    return connections[env.host_string].get_transport().open_session()
AttributeError: 'NoneType' object has no attribute 'open_session'
Disconnecting from somehost.example.com... done.
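
For readers following the traceback: the failing line chains get_transport().open_session(), so the error means get_transport() returned None. A rough sketch of how a client that has been closed but is still being reused would produce exactly this message is below. FakeTransport, FakeClient and open_session_error are hypothetical stand-ins written for illustration; they are not Fabric's or paramiko's classes.

```python
class FakeTransport(object):
    """Hypothetical stand-in for a live SSH transport."""

    def open_session(self):
        return "session"


class FakeClient(object):
    """Hypothetical stand-in for an SSH client object held in a cache."""

    def __init__(self):
        self._transport = FakeTransport()

    def close(self):
        # Closing drops the transport; the client object itself lives on.
        self._transport = None

    def get_transport(self):
        return self._transport


def open_session_error(client):
    """Return the AttributeError message from the chained call, or None."""
    try:
        client.get_transport().open_session()
    except AttributeError as exc:
        return str(exc)
    return None
```

Calling open_session_error() on a fresh FakeClient returns None; after client.close() it returns an AttributeError message about 'NoneType', matching the last line of the traceback.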

Originally submitted by jeremy avnet / @brainsik (brainsik) on 2011-03-28 at 04:49pm EDT


Closed as Done on 2011-06-23 at 09:19pm EDT

@ghost ghost assigned bitprophet Aug 19, 2011

bitprophet commented Aug 19, 2011

Miquel Torres (tobami) posted:


I can confirm this bug with version 1.0.1 (and python 2.6)


on 2011-03-31 at 05:27am EDT

bitprophet commented Aug 19, 2011

Kirit Sælensminde (KayEss) posted:


This is still in Fabric 1.0.1. The workaround is to delete the connection from the connection cache after doing the reboot -- you'll need the hostname for that, which I guess should be in env:

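from fabric.api import env, reboot
from fabric.state import connections

def do_reboot():
    reboot(30)
    # Fabric's env exposes the current host as env.host_string.
    # Drop the cached connection so the next run() opens a fresh one
    # (assumes env.host_string matches the key stored in the cache).
    del connections[env.host_string]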

on 2011-06-22 at 03:55am EDT

bitprophet commented Aug 19, 2011

Jeff Forcier (bitprophet) posted:


reboot already does this (see here). However, something is clearly amiss, because a handful of folks have reported that it's not actually clearing out the connection.

So either the connection cache class is broken and returning None when it should be establishing a new connection, or somehow the value for that host string is getting overwritten with None in some other fashion.

If one of you is able to consistently reproduce this, I'd appreciate some debugging around the connection cache class and its global instance (connections) to see what is going on. Last time I tried I was unable to reproduce this problem, though I'll try again real quick now.


on 2011-06-22 at 12:22pm EDT

bitprophet commented Aug 19, 2011

Jeff Forcier (bitprophet) posted:


Oh good, I can replicate it now on a VM (on both 0.9.6 and 1.0 release branch). Will see what I can find.


on 2011-06-22 at 12:34pm EDT

bitprophet commented Aug 19, 2011

Jeff Forcier (bitprophet) posted:


It's pretty dumb and simple: the host_string value used as the key in the connections mapping differs from the one in effect at the time one calls reboot. Looking now to see why this discrepancy exists and how best to fix it, hopefully without breaking other stuff.


on 2011-06-22 at 02:10pm EDT

bitprophet commented Aug 19, 2011

Jeff Forcier (bitprophet) posted:


My bad, that's not entirely accurate: the connection cache itself normalizes the key given to it in __getitem__, fleshing it out via normalize().

So omitting user and/or port in your input host string (e.g. fab -H foo mytask or @hosts('foo')) results in env.host_string = "foo" but state.connections.keys() returning e.g. ["jforcier@foo:22"] (in my case).
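
To make the "foo" versus "jforcier@foo:22" mismatch concrete, here is a minimal sketch of what host-string normalization amounts to. The helper names normalize_host_string and to_key and their default arguments are assumptions made for this example, not Fabric's normalize(); the sketch also ignores IPv6 addresses.

```python
import getpass


def normalize_host_string(host_string, default_user=None, default_port="22"):
    """Hypothetical helper: split 'user@host:port' and fill in missing parts."""
    user = default_user or getpass.getuser()
    host = host_string
    port = default_port
    if "@" in host:
        user, host = host.split("@", 1)
    if ":" in host:
        host, port = host.rsplit(":", 1)
    return user, host, port


def to_key(host_string, default_user=None, default_port="22"):
    """Hypothetical helper: rebuild the full 'user@host:port' string."""
    return "%s@%s:%s" % normalize_host_string(
        host_string, default_user, default_port
    )


print(to_key("foo", default_user="jforcier"))  # jforcier@foo:22
```

A fully specified string such as "bob@foo:2222" comes back unchanged, which is consistent with the case where all three components are given.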

This is/was not a large problem because any later requests for client objects (such as when the SFTP code grabs the connection object) pass through the normalization process, since they use __getitem__; only in reboot, where we actively try to change the contents of the connection cache (__delitem__?), does the mismatch arise.

And any situation where you are specifying all 3 components would also work OK -- which is probably how I was testing the original reboot implementation.

The sensible solution seems to be to update HostConnectionCache so it performs normalization on all key accesses, or at least on deletes, or at least update reboot to call normalize, in order of long-term usefulness vs speed of implementation.
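
A rough sketch of the first option (normalizing on every key access) next to the behaviour described above. GetOnlyCache, NormalizingCache and normalize_key are hypothetical names for illustration, the "deploy" default user is an assumption, and a plain string stands in for a real connection object. This is not the code from the actual changeset.

```python
def normalize_key(host_string, user="deploy", port="22"):
    """Hypothetical stand-in for normalization: fill in missing user and port."""
    host = host_string
    if "@" in host:
        user, host = host.split("@", 1)
    if ":" in host:
        host, port = host.rsplit(":", 1)
    return "%s@%s:%s" % (user, host, port)


class GetOnlyCache(dict):
    """Normalizes keys in __getitem__ only, as in the behaviour described above."""

    def __getitem__(self, key):
        real_key = normalize_key(key)
        if real_key not in self:
            # Stand-in for opening a new connection on a cache miss.
            dict.__setitem__(self, real_key, "connection to " + real_key)
        return dict.__getitem__(self, real_key)


class NormalizingCache(GetOnlyCache):
    """Also normalizes on set, delete, and membership tests."""

    def __setitem__(self, key, value):
        dict.__setitem__(self, normalize_key(key), value)

    def __delitem__(self, key):
        dict.__delitem__(self, normalize_key(key))

    def __contains__(self, key):
        return dict.__contains__(self, normalize_key(key))
```

With GetOnlyCache, looking up "foo" stores the entry under "deploy@foo:22", so a later "foo" in cache check is False and del cache["foo"] raises KeyError, leaving the stale entry in place. With NormalizingCache the same check and delete hit the stored entry.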


on 2011-06-23 at 07:04pm EDT

bitprophet commented Aug 19, 2011

Jeff Forcier (bitprophet) posted:


Applied in changeset commit:3f5503e4e21e5e88c0b39fb159fff315bf41dc98.


on 2011-06-23 at 09:19pm EDT

@bitprophet bitprophet closed this Aug 19, 2011

ramonvanalteren pushed a commit to ramonvanalteren/fabric that referenced this issue Aug 31, 2011

Add test re fabric#329
Conflicts:

	tests/test_network.py

ramonvanalteren added a commit to ramonvanalteren/fabric that referenced this issue Aug 31, 2011

Merge remote-tracking branch 'upstream/master' into logging
* upstream/master: (123 commits)
  Remove confusing, extraneous note re: example
  Fix main loop to look for Task.run()
  Fix up docs.push
  Update tag list for manually generated docs
  Task decorator must be first
  Enhance docs on Task subclass usage
  Dev version for 1.2
  Dev version for 1.1.1
  Version bump for 1.0.2
  Silly/shitty little sanity-test runner
  Fixes fabric#345, contains() returns boolean, not retval.
  Fix I/O race condition
  Formatting
  Add test re fabric#329
  Add 1.0.2 note to 1.1 release docs
  Note that 1.0.2 will contain 0.9.7
  Fixes fabric#329: reboot() broken for partial host strings
  Dogfooding: use new-style tasks, namespaces in core fabfile
  Re fabric#56, don't allow leaf classic modules to pollute new-style trees
  Use FabricTest to isolate namespace tests
  ...

Conflicts:
	fabric/main.py
	fabric/network.py
	fabric/operations.py

mungayree commented Jun 14, 2016

I am trying the code below on the latest Fabric and it fails. The systems are super slow and take 500 seconds for SSH to come up.

from fabric.api import reboot, run, task

@task
def rebootandwait():
    reboot(wait=500)
    run('date')
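
The thread does not say why this case fails. When a fixed wait is hard to pick for slow machines, one possible approach is to poll the SSH port until it accepts TCP connections instead of sleeping for a set time. The sketch below uses only the standard library; wait_for_port and its parameters are hypothetical and not part of Fabric, and an open port does not guarantee that sshd is ready to authenticate.

```python
import socket
import time


def wait_for_port(host, port=22, timeout=600.0, interval=5.0):
    """Poll until a TCP connection to host:port succeeds.

    Returns True once a connection is accepted, False if `timeout`
    seconds pass without one.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            sock = socket.create_connection((host, port), timeout=interval)
        except socket.error:
            # Refused or timed out: the host is not up yet, so retry later.
            time.sleep(interval)
        else:
            sock.close()
            return True
    return False
```

A task could call wait_for_port(env.host) after issuing the reboot and only continue with run() when it returns True, assuming the host name in env.host is reachable from the machine running the task.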