Bad packet length #442

Open
dcramer opened this issue Nov 18, 2014 · 3 comments

Comments

dcramer commented Nov 18, 2014

Hitting this with current (master) of Fabric + paramiko.

Not sure what details are useful, but we're using a gateway, and this reproduces itself under a specific case 100% of the time.

Nov 18 10:57:05 crimson sshd[21873]: pam_unix(sshd:session): session opened for user build by (uid=0)
Nov 18 10:57:17 crimson sshd[22008]: Bad packet length 3588285689.
Nov 18 10:57:17 crimson sshd[22008]: Disconnecting: Packet corrupt
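
For context on that log line: in the SSH binary packet protocol (RFC 4253, section 6), every packet starts with a 4-byte big-endian `packet_length` field. The sketch below decodes such a field and checks it against an upper bound. The 256 KiB bound and the helper names are assumptions for illustration, not values taken from this sshd build. A length of roughly 3.5 GB is far outside any sane range, which usually suggests the server read bytes that were not the start of a packet (for example, the two sides falling out of sync), though the log alone does not say why.

```python
import struct

# Assumed upper bound, for illustration only; the limit enforced by a given
# sshd build may differ.
ASSUMED_MAX_PACKET = 256 * 1024


def decode_packet_length(header):
    """Read the first four bytes as a big-endian unsigned 32-bit integer."""
    (length,) = struct.unpack(">I", header[:4])
    return length


def looks_plausible(length, max_packet=ASSUMED_MAX_PACKET):
    """A packet needs at least a padding-length byte plus minimum padding."""
    return 5 <= length <= max_packet


# Rebuild the four header bytes that would produce the value from the log.
header = struct.pack(">I", 3588285689)
length = decode_packet_length(header)
print(length, looks_plausible(length))
```
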

Strangely, it only happens on this case (a push to a staging node) and not others (i.e. our production push).

From the Fabric side of things this is a run command:

[stage] Executing task 'release_non_web_services'
[stage] Executing task 'release_web_services'
[stage] sudo: supervisorctl stop all
Traceback (most recent call last):
  File "/Users/dcramer/.virtualenvs/getsentry/lib/python2.7/site-packages/fabric/main.py", line 743, in main
    *args, **kwargs
  File "/Users/dcramer/.virtualenvs/getsentry/lib/python2.7/site-packages/fabric/tasks.py", line 384, in execute
    multiprocessing
  File "/Users/dcramer/.virtualenvs/getsentry/lib/python2.7/site-packages/fabric/tasks.py", line 274, in _execute
    return task.run(*args, **kwargs)
  File "/Users/dcramer/.virtualenvs/getsentry/lib/python2.7/site-packages/fabric/tasks.py", line 174, in run
    return self.wrapped(*args, **kwargs)
  File "/Users/dcramer/Development/getsentry/fabric_support/decorators.py", line 47, in wrapped
    return func(*args, **kwargs)
  File "/Users/dcramer/Development/getsentry/fabfile.py", line 296, in deploy
    execute(release_web_services, version)
  File "/Users/dcramer/Development/getsentry/fabric_support/utils.py", line 31, in execute
    return execute_func(*args, **kwargs)
  File "/Users/dcramer/.virtualenvs/getsentry/lib/python2.7/site-packages/fabric/tasks.py", line 384, in execute
    multiprocessing
  File "/Users/dcramer/.virtualenvs/getsentry/lib/python2.7/site-packages/fabric/tasks.py", line 274, in _execute
    return task.run(*args, **kwargs)
  File "/Users/dcramer/.virtualenvs/getsentry/lib/python2.7/site-packages/fabric/tasks.py", line 174, in run
    return self.wrapped(*args, **kwargs)
  File "/Users/dcramer/.virtualenvs/getsentry/lib/python2.7/site-packages/fabric/decorators.py", line 53, in inner_decorator
    return func(*args, **kwargs)
  File "/Users/dcramer/Development/getsentry/fabric_support/decorators.py", line 29, in wrapped
    return func(*args, **kwargs)
  File "/Users/dcramer/Development/getsentry/fabfile.py", line 386, in release_web_services
    stop_services()
  File "/Users/dcramer/.virtualenvs/getsentry/lib/python2.7/site-packages/fabric/tasks.py", line 171, in __call__
    return self.run(*args, **kwargs)
  File "/Users/dcramer/.virtualenvs/getsentry/lib/python2.7/site-packages/fabric/tasks.py", line 174, in run
    return self.wrapped(*args, **kwargs)
  File "/Users/dcramer/.virtualenvs/getsentry/lib/python2.7/site-packages/fabric/decorators.py", line 53, in inner_decorator
    return func(*args, **kwargs)
  File "/Users/dcramer/Development/getsentry/fabric_support/decorators.py", line 29, in wrapped
    return func(*args, **kwargs)
  File "/Users/dcramer/Development/getsentry/fabfile.py", line 420, in stop_services
    sudo('supervisorctl stop %s' % services)
  File "/Users/dcramer/.virtualenvs/getsentry/lib/python2.7/site-packages/fabric/network.py", line 647, in host_prompting_wrapper
    return func(*args, **kwargs)
  File "/Users/dcramer/.virtualenvs/getsentry/lib/python2.7/site-packages/fabric/operations.py", line 1101, in sudo
    stderr=stderr, timeout=timeout, shell_escape=shell_escape,
  File "/Users/dcramer/.virtualenvs/getsentry/lib/python2.7/site-packages/fabric/operations.py", line 915, in _run_command
    channel=default_channel(), command=wrapped_command, pty=pty,
  File "/Users/dcramer/.virtualenvs/getsentry/lib/python2.7/site-packages/fabric/state.py", line 397, in default_channel
    chan = _open_session()
  File "/Users/dcramer/.virtualenvs/getsentry/lib/python2.7/site-packages/fabric/state.py", line 389, in _open_session
    return connections[env.host_string].get_transport().open_session()
  File "/Users/dcramer/.virtualenvs/getsentry/lib/python2.7/site-packages/paramiko/transport.py", line 615, in open_session
    max_packet_size=max_packet_size)
  File "/Users/dcramer/.virtualenvs/getsentry/lib/python2.7/site-packages/paramiko/transport.py", line 731, in open_channel
    raise e
EOFError
Disconnecting from stage... done.
Disconnecting from bastion... done.
@bitprophet
Member

> Strangely, it only happens on this case (a push to a staging node) and not others (i.e. our production push).

Are there any other differences between those two cases? Configs or other fabfile related data differing, different network path, etc?

Otherwise, this is new to me; I'd be curious whether you can reproduce it with older Paramiko versions. For example, it might be due to the changes in #372 (released in 1.15.0), so if you can try 1.14.x or 1.13.x, that'd be a great start.

Author

dcramer commented Nov 19, 2014

There's really nothing different about the servers. They're basically identical. Probably different subnets, but nothing more than that. This actually worked fine at some point too, so I'm not really sure what started causing this.

Also confirmed the issue happens with both Paramiko 1.14.x and 1.13.x, as well as with Fabric 1.9.1 and 1.8.5.

It's also not something that's local to my system, as it happens on both our build server and my local OS X machine.

If it helps the code looks like this:

@runs_once
def deploy():
   execute(foo)
   execute(bar)
   execute(baz)

Some of the execute tasks are @parallel and some are not; the one it fails on is @serial. The runs_once stuff isn't the built-in code, but that shouldn't matter. In this case, the "bar" command doesn't actually run, as the node isn't in the list of roles, and the command it fails on is "baz".

When I removed @serial from it, it actually worked fine (I switched it to @parallel), so maybe it's a bug in Fabric somehow?

As a shitty workaround I'm going to use @parallel(pool_size=1).

@bitprophet
Member

If that parallel trick works I suspect state shenanigans, since moving to parallel should be recreating the connection/client (vs serial, where it caches a single Paramiko client object and reuses it - which normally works fine but if something is getting hinky at some point...).
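
To make the caching point concrete, here is a toy model of a per-host connection cache. It is only a sketch: `FakeClient` and `ConnectionCache` are hypothetical stand-ins, not Fabric or Paramiko classes, and the only thing borrowed from the traceback is the `connections[host_string]` lookup pattern. It shows that repeated lookups hand back the same object until the cache is cleared.

```python
class FakeClient(object):
    """Hypothetical stand-in for a connected SSH client (not Paramiko)."""

    def __init__(self, host):
        self.host = host


class ConnectionCache(dict):
    """Toy cache: create a client on first lookup, reuse it afterwards."""

    def __missing__(self, host):
        client = FakeClient(host)
        self[host] = client
        return client


def disconnect_all(cache):
    """Toy equivalent of dropping every cached connection."""
    cache.clear()


connections = ConnectionCache()
first = connections["stage"]
second = connections["stage"]   # serial-style reuse: same cached object
disconnect_all(connections)
third = connections["stage"]    # fresh object once the cache is cleared

print(first is second, third is first)
```

If the cached object carries bad state, every later lookup inherits it, which is why forcing a fresh object is a useful experiment.
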

I bet you'd see the same 'it suddenly works now' if you went back to the @serial, but added an explicit from fabric.network import disconnect_all; disconnect_all() prior to your execute(baz). Doing so would have roughly the same effect - a new Paramiko level transport/client would get set up for baz to use.

This would prove that something odd is going on re: the client object inside foo (or inside bar if it's running, though you say it isn't.)
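
A minimal sketch of that experiment, assuming a Fabric 1.x fabfile shaped like the one posted above. The bodies of `foo`, `bar`, and `baz` are placeholders, and the `except ImportError` branch defines no-op stand-ins (an assumption for illustration, not Fabric code) purely so the snippet can be run without Fabric 1.x installed.

```python
try:
    from fabric.api import execute, runs_once
    from fabric.network import disconnect_all
except ImportError:
    # Stand-ins so the sketch runs without Fabric 1.x; they are NOT Fabric's
    # implementations and open no connections.
    def execute(task, *args, **kwargs):
        return task(*args, **kwargs)

    def runs_once(func):
        return func

    def disconnect_all():
        pass

calls = []  # records the order in which the placeholder tasks ran


def foo():
    calls.append("foo")


def bar():
    calls.append("bar")


def baz():
    calls.append("baz")


@runs_once
def deploy():
    execute(foo)
    execute(bar)
    # Drop every cached connection so baz has to open a new one.
    disconnect_all()
    execute(baz)
```

If `baz` succeeds with the `disconnect_all()` call in place and fails without it, that points at the state of the connection reused from the earlier tasks.
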

If you can share what foo does, that might offer a clue.
