NX-OS device terminating connection in check_config_mode #1759

Closed
lazhao123 opened this issue May 26, 2020 · 41 comments

Comments

@lazhao123

Hello,

I get a connection-closed exception when I send many interface commands to a Cisco Nexus device.
Entered by hand on the CLI, the device handles this many commands, but when a script sends them it raises an exception.

If I set keepalive, why is the connection still closed by the remote host?

Socket exception: An existing connection was forcibly closed by the remote host (10054)
Traceback (most recent call last):
File "d:.......\Python\Python37\lib\site-packages\netmiko\base_connection.py", line 1739, in send_config_set
output += self.exit_config_mode()
File "d:.......\Python\Python37\lib\site-packages\netmiko\cisco_base_connection.py", line 44, in exit_config_mode
return super().exit_config_mode(exit_config=exit_config, pattern=pattern)
File "d:.......\Python\Python37\lib\site-packages\netmiko\base_connection.py", line 1615, in exit_config_mode
if self.check_config_mode():
File "d:.......\Python\Python37\lib\site-packages\netmiko\cisco\cisco_nxos_ssh.py", line 27, in check_config_mode
return super().check_config_mode(check_string=check_string, pattern=pattern)
File "d:.......\Python\Python37\lib\site-packages\netmiko\cisco_base_connection.py", line 30, in check_config_mode
return super().check_config_mode(check_string=check_string, pattern=pattern)
File "d:.......\Python\Python37\lib\site-packages\netmiko\base_connection.py", line 1582, in check_config_mode
output = self.read_until_pattern(pattern=pattern)
File "d:.......\Python\Python37\lib\site-packages\netmiko\base_connection.py", line 618, in read_until_pattern
return self._read_channel_expect(*args, **kwargs)
File "d:.......\Python\Python37\lib\site-packages\netmiko\base_connection.py", line 543, in _read_channel_expect
raise EOFError("Channel stream closed by remote device.")
EOFError: Channel stream closed by remote device.

['interface Ethernet1/1',
'no switchport',
'no description',
'no speed auto',
... omit ...
'interface Ethernet1/60',
'no switchport',
'no description',
'no speed auto']

net = ConnectHandler(device_type=self.deviceType, ip=self.host, username=self.username,
                     password=self.password, global_delay_factor=3, keepalive=10)
output = net.send_config_set(commands, cmd_verify=False)

@ktbyers
Owner

ktbyers commented May 26, 2020

@lazhao123
Author

lazhao123 commented May 26, 2020 via email

@ktbyers
Owner

ktbyers commented May 26, 2020

@lazhao123 I don't see any log-file?

@lazhao123
Author

I replied by email with the log attached, but the attachment did not come through. I have attached it now.

@lazhao123
Author

Re-attaching the log in case you did not get it.
test.log

@lazhao123
Author

Hi Kirk,

Do you need more info on this issue?

Thanks in advance for any advice

@ktbyers ktbyers changed the title from "get exception of connection closed by remote host" to "NX-OS device terminating connection in check_config_mode" May 27, 2020
@ktbyers
Owner

ktbyers commented May 27, 2020

What happens with cmd_verify=True?

@lazhao123
Author

lazhao123 commented May 27, 2020 via email

@lazhao123
Author

Sorry to keep asking for advice.
Is this a Netmiko bug? Is there any workaround?

@ktbyers
Owner

ktbyers commented Jun 2, 2020

I need the full exception stack trace and Netmiko-log for cmd_verify=True.

I think you are in a context where you probably require cmd_verify (i.e. a very large number of configuration commands, where you are going to run into input buffer issues without cmd_verify).

That being said, I didn't really see anything all that useful in the log file.

Regards, Kirk

@lazhao123
Author

============ stack trace:
Traceback (most recent call last):
File "C:\Users\lzhao\AppData\Local\Programs\Python\Python37\lib\site-packages\paramiko\channel.py", line 699, in recv
out = self.in_buffer.read(nbytes, self.timeout)
File "C:\Users\lzhao\AppData\Local\Programs\Python\Python37\lib\site-packages\paramiko\buffered_pipe.py", line 164, in read
raise PipeTimeout()
paramiko.buffered_pipe.PipeTimeout

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\lzhao\AppData\Local\Programs\Python\Python37\lib\site-packages\netmiko\base_connection.py", line 541, in _read_channel_expect
new_data = self.remote_conn.recv(MAX_BUFFER)
File "C:\Users\lzhao\AppData\Local\Programs\Python\Python37\lib\site-packages\paramiko\channel.py", line 701, in recv
raise socket.timeout()
socket.timeout

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Switch\WinPython-64bit-3.6.3.0Zero\scripts\switches.py", line 65, in runconfig
output = net_connect.send_config_set(commands, cmd_verify=True)
File "C:\Users\lzhao\AppData\Local\Programs\Python\Python37\lib\site-packages\netmiko\base_connection.py", line 1735, in send_config_set
new_output = self.read_until_pattern(pattern=pattern)
File "C:\Users\lzhao\AppData\Local\Programs\Python\Python37\lib\site-packages\netmiko\base_connection.py", line 618, in read_until_pattern
return self._read_channel_expect(*args, **kwargs)
File "C:\Users\lzhao\AppData\Local\Programs\Python\Python37\lib\site-packages\netmiko\base_connection.py", line 552, in _read_channel_expect
"Timed-out reading channel, data not available."
netmiko.ssh_exception.NetmikoTimeoutException: Timed-out reading channel, data not available.

============ The source code is:
net_connect = ConnectHandler(device_type=self.deviceType, ip=self.host, username=self.user, password=self.pwd)
output = net_connect.send_config_set(commands, cmd_verify=True)

============ The CLI commands are:
[ 'interface Ethernet1/1',
'no channel-group',
'no switchport',
'no description',
'switchport',
----- omit ---
'interface Ethernet1/54',
'no channel-group',
'no switchport',
'no description',
'switchport' ]

============ Netmiko-log file attached.
test.log

@ktbyers
Owner

ktbyers commented Jun 3, 2020

There are really no clues at the end of the log. The end of the log looks odd from a Netmiko perspective (i.e. there is nothing there of significance).

The log file doesn't match your code (are you just appending to the end of a log file that has other things in it)?

How many total lines of configuration changes are you trying to make?

@lazhao123
Author

Hi Kirk,
test.log is 7.8 MB. I think it was truncated.
I re-ran the script; please find test.log and console.log attached. console.log has the stack trace, but test.log is still truncated.
In the script, I want to reset around 52 interfaces. The commands can be found in console.log at line 9405.
console.log
test.log

Thank you for the help.

@ktbyers
Owner

ktbyers commented Jun 3, 2020

Have you tried global_delay_factor=2 (or 4)?

@lazhao123
Author

EOFError: Channel stream closed by remote device.
2 files attached.
console.log
test.log

============= The code is:
net_connect = ConnectHandler(device_type=self.device_type, ip=self.host, username=self.username,
                             password=self.password, global_delay_factor=4)
output = net_connect.send_config_set(commands, cmd_verify=True)

@ktbyers
Owner

ktbyers commented Jun 4, 2020

ERROR:paramiko.transport:Socket exception: An existing connection was forcibly closed by the remote host (10054)

Is there any intermediate firewall that might be terminating the connection?

Are there any log messages on the network device indicating it is closing the connection?

@lazhao123
Author

The switch and the Windows server are in the same rack with a direct connection. There is no firewall in between. The device configuration is mostly default.

dump-cap.log is a packet capture taken on the device mgmt port; please rename it and open it in Wireshark.
I found "192.168.0.50 -> 192.168.0.100 SSH [TCP Retransmission] Encrypted response packet len=1460". That is where the connection closed.
192.168.0.50 is the server IP and 192.168.0.100 is the switch mgmt IP.

dump-cap.log
console.log

console log also attached.

@ktbyers
Owner

ktbyers commented Jun 4, 2020

Okay, I am a bit at a loss here...the evidence indicates the network device is closing the connection.

Have you tried setting the VTY timeout to a large value?

I guess another option is to use a NAPALM-style file merge for your large configuration set (that uses Netmiko under the hood).

@lazhao123
Author

The default inactive VTY timeout on Nexus is 30 minutes. That is enough time to complete the operation.

My script sends a few commands at a time and is driven by the user, so I can't use NAPALM. The script works fine except for this issue.

It happens when I send many config commands to the device.

Googling "Socket exception: An existing connection was forcibly closed by the remote host (10054)" returns many links about this error.

One is: https://community.developers.refinitiv.com/questions/38738/connection-aborted-connectionreseterror10054.html

I use "net_connect.disconnect()" to end the session after the script finishes sending config. Does Netmiko close the session gracefully? I also tried keepalive=10, with the same result.

@ktbyers
Owner

ktbyers commented Jun 5, 2020

How about increasing blocking_timeout to 40? This is an argument to ConnectHandler.

I keep circling back to the conclusion that the network device, or something in between, is closing the connection for some reason (as that is what the data/log messages indicate).

So does it work reliably for smaller configuration changes?

Is there any packet loss between you and the network device (i.e. are there reliability issues on the connection)?

@lazhao123
Author

I added blocking_timeout=40 to ConnectHandler. I still get the exception.

The server is directly connected to the switch; the switch is new and has a basic configuration. It is the only server attached to the switch.
The script works for smaller sets of configuration commands.

Is it possible to add timestamps to the Netmiko log? I modified __init__.py, but two streams now write to the console, so the console shows duplicate output.

Watching the console, I noticed that the script pauses three times and the stream is finally closed by the remote device.

2020-06-05 21:00:48,192: Pattern is: (?:R1-93180YC-EX-A|#)
it paused for 31 sec, then continued
2020-06-05 21:01:24,317: Pattern is: (?:R1-93180YC-EX-A|#)
it paused for 31 sec, then continued
2020-06-05 21:02:34,333: Pattern is: no\ switchport
it paused, then raised the exception

The script ran from: 2020-06-05 20:58:21,567: read_channel:
and stopped at: 2020-06-05 21:02:34,333: Pattern is: no\ switchport
Total script time: about 4m13s

When the script sends a few commands, waits for some time, and sends again, it is fine.
test.log
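
On the timestamp question in this comment, here is a minimal sketch that uses only the standard logging module and leaves the installed package untouched. It assumes Netmiko emits its records under the logger name "netmiko"; the helper name enable_netmiko_log is hypothetical. Replacing the logger's handlers and turning off propagation is one way to avoid each record being written twice.

```python
import io
import logging


def enable_netmiko_log(stream, level=logging.DEBUG):
    """Attach a single timestamped handler to the "netmiko" logger."""
    logger = logging.getLogger("netmiko")  # assumed logger name
    logger.setLevel(level)

    handler = logging.StreamHandler(stream)
    # %(asctime)s yields timestamps such as 2020-06-05 21:00:48,192
    handler.setFormatter(logging.Formatter("%(asctime)s: %(message)s"))

    # Keep exactly one handler and stop records from also reaching the
    # root logger, so every message is written once.
    logger.handlers = [handler]
    logger.propagate = False
    return logger


if __name__ == "__main__":
    buffer = io.StringIO()
    log = enable_netmiko_log(buffer)
    log.debug("Pattern is: no\\ switchport")
    print(buffer.getvalue(), end="")
```

To write to a file instead, pass an open file object, for example enable_netmiko_log(open("test.log", "a")), before calling ConnectHandler.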

@ktbyers
Owner

ktbyers commented Jun 6, 2020

Can you post the exact code you are using to test with (the failing case)?

You can obfuscate any passwords/confidential information.

@lazhao123
Author

To reproduce it, I send the config in a loop. The interfaces run from 1/1 to 1/54.

commands = ['interface Ethernet1/1',
'no channel-group',
'no switchport',
'no description',
'switchport',
.....
'interface Ethernet1/54',
'no channel-group',
'no switchport',
'no description',
'switchport',
]
for i in range(5):
    net_connect = ConnectHandler(device_type='cisco_nxos_ssh', ip=xxx, username=xxx,
                                 password=xxx, blocking_timeout=40)
    output = net_connect.send_config_set(commands, cmd_verify=True)
    net_connect.disconnect()

@lazhao123
Author

Hi Kirk,

Have you reproduced it?

TIA

@ktbyers
Owner

ktbyers commented Jun 16, 2020

I haven't got to this...

@ktbyers
Owner

ktbyers commented Jun 19, 2020

Okay, I did this test, configuring roughly 50 interfaces repeated five times on one of my NX-OS switches and didn't see an issue.

@lazhao123
Author

Cool, can I test the fix?

@ktbyers
Owner

ktbyers commented Jun 19, 2020

There is no fix...I didn't see any issue. I was just using the develop branch (or something very similar to it).

@ktbyers
Owner

ktbyers commented Jun 19, 2020

i.e. I couldn't reproduce the problem.

@ktbyers
Owner

ktbyers commented Jun 19, 2020

But you might want to test with the develop branch (if you aren't already) and see if the issue still persists there.

@lazhao123
Author

lazhao123 commented Jun 20, 2020

I have narrowed down the issue.
When the port is up, sending 'switchport' to configure the port causes the remote device to disconnect. A port that is down doesn't have the issue.

Here is the port status; it is the default configuration.

================================================================================
Ethernet      VLAN    Type Mode   Status  Reason                   Speed    Port
Interface                                                                   Ch #
================================================================================
Eth1/41       1       eth  access up      none                     10G(D)   --

Here is the code:

from netmiko import ConnectHandler

def run_debug_config(config_commands):
    net_connect = ConnectHandler(device_type='cisco_nxos_ssh', ip='xxx', username='admin',
                                 password='xxx', global_delay_factor=2)
    output = net_connect.send_config_set(config_commands, cmd_verify=False)
    net_connect.disconnect()

config_commands = ['int Ethernet1/41', 'no switchport', 'switchport']
run_debug_config(config_commands)

Socket exception: An existing connection was forcibly closed by the remote host (10054)
Traceback (most recent call last):
File "C:\Users\lzhao\AppData\Local\Programs\Python\Python37\lib\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 10, in
run_debug_config(config_commands)
File "", line 6, in run_debug_config
output = net_connect.send_config_set(config_commands, cmd_verify=False)
File "C:\Users\lzhao\AppData\Local\Programs\Python\Python37\lib\site-packages\netmiko\base_connection.py", line 1770, in send_config_set
output += self.exit_config_mode()
File "C:\Users\lzhao\AppData\Local\Programs\Python\Python37\lib\site-packages\netmiko\cisco_base_connection.py", line 44, in exit_config_mode
return super().exit_config_mode(exit_config=exit_config, pattern=pattern)
File "C:\Users\lzhao\AppData\Local\Programs\Python\Python37\lib\site-packages\netmiko\base_connection.py", line 1642, in exit_config_mode
if self.check_config_mode():
File "C:\Users\lzhao\AppData\Local\Programs\Python\Python37\lib\site-packages\netmiko\cisco\cisco_nxos_ssh.py", line 28, in check_config_mode
return super().check_config_mode(check_string=check_string, pattern=pattern)
File "C:\Users\lzhao\AppData\Local\Programs\Python\Python37\lib\site-packages\netmiko\cisco_base_connection.py", line 30, in check_config_mode
return super().check_config_mode(check_string=check_string, pattern=pattern)
File "C:\Users\lzhao\AppData\Local\Programs\Python\Python37\lib\site-packages\netmiko\base_connection.py", line 1606, in check_config_mode
output = self.read_until_pattern(pattern=pattern)
File "C:\Users\lzhao\AppData\Local\Programs\Python\Python37\lib\site-packages\netmiko\base_connection.py", line 627, in read_until_pattern
return self._read_channel_expect(*args, **kwargs)
File "C:\Users\lzhao\AppData\Local\Programs\Python\Python37\lib\site-packages\netmiko\base_connection.py", line 552, in _read_channel_expect
raise EOFError("Channel stream closed by remote device.")
EOFError: Channel stream closed by remote device.
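
Since the disconnect was observed only on ports whose status is "up", a script could find those ports before pushing config and handle them separately. The following is a rough sketch, assuming the status text has the same column order as the Eth1/41 row shown in this comment (name, VLAN, type, mode, status); the function name up_interfaces is hypothetical, and how the text is collected from the device is left out.

```python
def up_interfaces(show_int_brief):
    """Return the Ethernet interface names whose Status column reads 'up'."""
    up = []
    for line in show_int_brief.splitlines():
        fields = line.split()
        # Data rows look like: Eth1/41 1 eth access up none 10G(D) --
        # Header and separator rows fail one of the checks below.
        if len(fields) >= 5 and fields[0].startswith("Eth") and fields[4] == "up":
            up.append(fields[0])
    return up


if __name__ == "__main__":
    sample = "Eth1/41 1 eth access up none 10G(D) --"
    print(up_interfaces(sample))
```

The parser only reads whitespace-separated columns, so it would need adjusting if the device prints a different layout.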

@ktbyers
Owner

ktbyers commented Jun 20, 2020

Why are you toggling "switchport -> no switchport"?

What model Nexus is this?

Kirk

@lazhao123
Author

I use "no switchport" followed by "switchport" to reset the port. The "default" command has the same issue.
I tested on Nexus 3048, 93180YC, and 93180LC. The OS is up to date.

@lazhao123
Author

lazhao123 commented Jun 23, 2020

Hi Kirk,
Is there any workaround that you can suggest? Sending "[no] switchport" on an active interface frequently causes a disconnect. I can't see how it affects Netmiko, since the connection is through the mgmt port.

@ktbyers
Owner

ktbyers commented Jun 23, 2020

What happens if you try this?

net_connect = ConnectHandler(
  device_type='cisco_nxos_ssh', 
  ip=xxx, 
  username=xxx,
  password=xxx, 
  global_delay_factor=4,
  blocking_timeout=40
)

@lazhao123
Author

Tried, still same error. Thank you for helping me on this.

log.txt

@lazhao123
Author

Are you able to reproduce the issue?

@ktbyers
Owner

ktbyers commented Jun 24, 2020

No, I couldn't reproduce it.

It looks like an NX-OS issue and not a Netmiko issue.

I would probably work-around it by transferring the config file via Secure Copy and then copying the changes into running-config (but I know you said that wouldn't work for you).
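
The Secure Copy approach described above could look roughly like the sketch below. It assumes Netmiko's file_transfer helper, an SCP server enabled on the switch, and bootflash: as the target file system; the function name merge_config_file and the file names are hypothetical. Whether the session survives the merge on the affected ports was not tested in this thread.

```python
def merge_config_file(conn, local_path, remote_name, transfer=None):
    """Copy a config snippet to the switch, then merge it into running-config.

    conn:     an established Netmiko connection (or any object that
              provides send_command()).
    transfer: callable that performs the copy; defaults to Netmiko's
              file_transfer, imported lazily.
    """
    if transfer is None:
        from netmiko import file_transfer as transfer

    result = transfer(
        conn,
        source_file=local_path,
        dest_file=remote_name,
        file_system="bootflash:",  # assumption: default NX-OS file system
        direction="put",
        overwrite_file=True,
    )
    if not result.get("file_verified"):
        raise RuntimeError(f"transfer of {local_path} could not be verified")

    # Merge the transferred commands into the running configuration.
    output = conn.send_command(f"copy bootflash:{remote_name} running-config")
    return result, output
```

A call might look like merge_config_file(net_connect, "reset_ports.cfg", "reset_ports.cfg"), where the local file holds the same interface commands that were previously passed to send_config_set.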

@lazhao123
Author

I will try it on an Arista switch sometime and post an update.

@lazhao123
Author

The Arista switch doesn't have the issue, so I need to change my code.
I make the connection once and configure all ports. When one port causes a disconnect and an exception is thrown, is it possible to know which port caused it? I would re-establish the connection and continue with the rest of the ports.
If I connect separately for every port, it will take too much time, since there are more than 50 ports.

@ktbyers
Owner

ktbyers commented Jul 4, 2020

I am going to close this issue as I don't think it is a Netmiko issue.

My solution to the problem would be to use a file merge operation similar to what is done in NAPALM using the nxos_ssh operation. I know you said that solution would not work for you (earlier), but that is how I would likely work around the problem.

You would have to put in some error handling in your code to find where the error happened. This might require you to write your own custom method using write_channel and read_channel. We do have this PR here which might help, but we haven't completed it:

#1609

Regards, Kirk
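
The error handling mentioned above can also be sketched one level higher, without write_channel and read_channel: send one interface stanza per send_config_set call, record the stanza that raised, reconnect, and continue. This is only an illustration under several assumptions. The names group_by_interface and push_with_reconnect are hypothetical, connect is a caller-supplied function that returns a fresh connection (for example a small wrapper around ConnectHandler), and the default errors tuple covers only EOFError and socket-level errors, so NetmikoTimeoutException may need to be added to it.

```python
import time


def group_by_interface(commands):
    """Split a flat command list into one sub-list per interface stanza."""
    groups = []
    for cmd in commands:
        words = cmd.split()
        first = words[0].lower() if words else ""
        # Accept "interface" and abbreviations such as "int".
        starts_stanza = len(first) >= 3 and "interface".startswith(first)
        if starts_stanza or not groups:
            groups.append([])
        groups[-1].append(cmd)
    return groups


def _close_quietly(conn, errors):
    try:
        conn.disconnect()
    except errors:
        pass  # the session is already gone


def push_with_reconnect(connect, commands, errors=(EOFError, OSError), pause=0.0):
    """Send each stanza on a shared session; reconnect after a dropped one.

    Returns (outputs, failed); failed lists the stanzas whose send raised.
    """
    conn = connect()
    outputs, failed = [], []
    for group in group_by_interface(commands):
        try:
            outputs.append(conn.send_config_set(group))
        except errors:
            failed.append(group)        # group[0] names the interface
            _close_quietly(conn, errors)
            conn = connect()            # open a new session and carry on
        time.sleep(pause)               # optional pacing between stanzas
    _close_quietly(conn, errors)
    return outputs, failed
```

A stanza in failed may still have been applied on the device, because the drop reported in this thread occurs after the commands were sent, so its state should be checked rather than assumed. Only one connection is open at a time, and a new one is made only after a failure, which avoids the cost of connecting once per port.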

@ktbyers ktbyers closed this as completed Jul 4, 2020