
pcap_next_ex not returning -1? #217

Closed
gabbiccino opened this issue Aug 4, 2020 · 5 comments

gabbiccino commented Aug 4, 2020

Hi,

We are currently seeing an issue where our application no longer handles a network interface reconnection as well as it used to, and we believe it may be related to a recent change in Npcap.

With Npcap 0.9991, we found that pcap_next_ex returned -1 when a network device was disabled. Our app then exited the loop and created a new handle and capture for the device once it was re-enabled.

With 0.9995 (we've also tried 0.9993 and 0.9994), it seems to return only 0 (timeout) when a device is disconnected.

I've reproduced the same issue using the basic_dump_ex example from the Npcap repo. With 0.9991 installed, the code reaches this line and exits when the monitored device is disabled: https://github.com/nmap/npcap/blob/master/Examples-pcap/basic_dump_ex/basic_dump_ex.c#L127

With 0.9995, the example code keeps running indefinitely.
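For reference, the libpcap documentation says pcap_next_ex() returns 1 when a packet was read, 0 when the packet buffer timeout expired on a live capture, PCAP_ERROR (-1) on an error, and PCAP_ERROR_BREAK (-2) when a savefile has no more packets. The sketch below maps those values onto the recovery strategy described above (leave the loop on -1, then reopen the device once it is back). The enum and function names are hypothetical, and the values are copied locally so the block compiles without <pcap.h>; real code should use the constants from <pcap.h>.

```c
#include <assert.h>

/* Local copies of the pcap_next_ex() return values so this compiles without
 * <pcap.h>. Real code should use PCAP_ERROR and PCAP_ERROR_BREAK instead. */
#define NEXT_EX_PACKET    1
#define NEXT_EX_TIMEOUT   0
#define NEXT_EX_ERROR   (-1) /* PCAP_ERROR */
#define NEXT_EX_EOF     (-2) /* PCAP_ERROR_BREAK, savefiles only */

/* Hypothetical actions a capture loop could take for each return value. */
enum capture_action {
    ACTION_HANDLE_PACKET, /* a packet was read */
    ACTION_KEEP_WAITING,  /* buffer timeout expired with no packet */
    ACTION_REOPEN_HANDLE, /* error: leave the loop, reopen the device later */
    ACTION_STOP_EOF,      /* no more packets in a savefile */
    ACTION_UNKNOWN        /* anything else is unexpected */
};

static enum capture_action classify_next_ex(int res)
{
    switch (res) {
    case NEXT_EX_PACKET:  return ACTION_HANDLE_PACKET;
    case NEXT_EX_TIMEOUT: return ACTION_KEEP_WAITING;
    case NEXT_EX_ERROR:   return ACTION_REOPEN_HANDLE;
    case NEXT_EX_EOF:     return ACTION_STOP_EOF;
    default:              return ACTION_UNKNOWN;
    }
}

/* Usage sketch (needs <pcap.h>; not compiled here):
 *   for (;;) {
 *       int res = pcap_next_ex(adhandle, &header, &pkt_data);
 *       enum capture_action act = classify_next_ex(res);
 *       if (act == ACTION_KEEP_WAITING) continue;
 *       if (act != ACTION_HANDLE_PACKET) break;  then reopen when the device returns
 *       ...process header and pkt_data...
 *   }
 */
```

With the 0.9993 to 0.9995 behaviour reported here, a disabled adapter only ever yields ACTION_KEEP_WAITING, so the reopen branch is never reached.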

Please could someone advise on this issue? It could be related to our setup, but from our testing there appears to be a difference in how the two versions handle disconnected devices.

Many thanks!


guyharris commented Aug 4, 2020

> We are seeing an issue currently where our application is not handling a network interface reconnection as well as it used to, and believe it may be related to a recent change in npcap.

I can state, from bugs filed against libpcap on Linux, that views differ on what is "good" handling of interface outages.

On the *BSDs, if an interface is configured down, BPF does not report an error; only if the interface is removed is an error delivered to clients such as libpcap.

On Linux, if an interface is configured down, PF_PACKET sockets do deliver an error to clients such as libpcap. People filed bugs against libpcap because this causes capture to terminate when an interface is configured down, which can happen briefly during some network events, and they want capture to continue when the interface comes back up. (This is made more painful because no error is reliably delivered when an interface disappears! I need to suggest a patch to the Linux PF_PACKET code to provide different errors for those two conditions.)

> With npcap 0.9991, we found that pcap_next_ex returned -1 when a network device was disabled.

"Disabled" in what sense? Removed, or just the Windows equivalent of a UN*X "ifconfig XXX down"?

> Our app then exited the loop and created a new handle and capture for the device once re-enabled.

Unfortunately, 1) the libpcap API provides no mechanism for reporting "the interface is down" errors as distinct from "the interface disappeared" errors, so applications cannot tell the two apart, and 2) as a result, other applications simply stop capturing on any such error, which is the cause of the bugs filed against libpcap.
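Because the API offers no way to tell "down" from "removed", one application-level heuristic (an assumption, not something libpcap or Npcap provides) is to check, after an error or a long run of timeouts, whether the device name still appears in the list returned by pcap_findalldevs(). The sketch takes the names as a plain array (for example, copies of the name fields of the pcap_if_t entries) so that it compiles without libpcap; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical next step after pcap_next_ex() has signalled trouble. */
enum recovery_step {
    RETRY_OPEN_NOW,  /* name still listed: try pcap_open_live() again */
    WAIT_FOR_DEVICE  /* name gone: poll the device list until it returns */
};

/* Returns 1 if `target` is among the `count` names in `names`, else 0. */
static int device_is_listed(const char *const names[], size_t count,
                            const char *target)
{
    assert(target != NULL);
    for (size_t i = 0; i < count; i++) {
        if (names[i] != NULL && strcmp(names[i], target) == 0)
            return 1;
    }
    return 0;
}

static enum recovery_step next_recovery_step(const char *const names[],
                                             size_t count, const char *target)
{
    return device_is_listed(names, count, target) ? RETRY_OPEN_NOW
                                                  : WAIT_FOR_DEVICE;
}

/* Filling `names` (needs <pcap.h>; not compiled here):
 *   pcap_if_t *alldevs, *d;
 *   if (pcap_findalldevs(&alldevs, errbuf) == 0) {
 *       for (d = alldevs; d != NULL; d = d->next)
 *           ...copy d->name into the array...
 *       pcap_freealldevs(alldevs);
 *   }
 */
```

This only helps when the adapter actually drops out of the device list; an interface that is merely down but still listed looks the same as a healthy one to this check.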

If, for example, an error is delivered when a machine sleeps and is then woken up, that's unacceptable, as evidenced by the number of bugs filed against Wireshark-on-Windows when that happened. That was fixed in Npcap.

gabbiccino (author) commented Aug 5, 2020

Thanks for the response!

> "Disabled" in what sense? Removed, or just the Windows equivalent of a UN*X "ifconfig XXX down"?

The latter: we're specifically focusing on VPN disconnects/reconnects and their effect on packet capture in our app. When I disconnect the VPN I use (Cisco AnyConnect) through the VPN GUI, the interface shows as "Disabled" in the Network Connections window but is still present in the system, so I believe that's closer to an ifconfig XXX down than to a complete removal of the interface.

When capturing, this change in the interface state results in different behaviour in 0.9991 than in later versions. For example, I ran a test in which I added a console log line to the basic_dump_ex example that prints "Timeout elapsed" for each timeout return value from pcap_next_ex (purely to show that we're still in the while loop), disconnected the VPN, and then ran the example again to show the available interfaces.

With 0.9991, the example stops when I disconnect the VPN, and the Cisco connection does not appear at all in the second run of the example app.

With 0.9995, it continues to run after the VPN is disconnected and logs timeouts until cancelled manually, yet the second run of the example app shows that the interface is not present.

I see no issues with the machine going to sleep and not recovering afterwards: capturing continues as normal, because the interface never enters that "disabled" state. If it does enter that "disabled" state, however, we find that no packets are detected after the reconnect. That is the main reason the -1 return value and the exit from the while loop were convenient: they allowed pcap_open to be run once the device was found again, re-establishing the handle.

I'm unsure whether the above is now expected behaviour, given the changes you helpfully detailed, but if it is, is there a suggested method for dealing with these scenarios that we could implement? I believe that with 0.9995 I could exit the while loop by throwing an error after a specified number of timeouts and then attempting to run pcap_open_live again on the device.

Thanks again.

dmiller-nmap (contributor) commented Aug 6, 2020

Thanks for this bug report. I believe this is a bug in the fix to nmap/nmap#2036, which Driver Verifier identified for me just the other day in testing. When we complete an IRP with an error informational status, the IRP handler should also return that status, but we were returning STATUS_SUCCESS instead of STATUS_DEVICE_REMOVED. The fix is in bd0fad0 and will be in the next release, which is undergoing final testing.

gabbiccino (author) commented Aug 6, 2020

Thanks! We'll look forward to the release.

gabbiccino (author) commented Aug 10, 2020

Thanks for all of your support. I can confirm that 0.9996 resolves the issue we were seeing, so I'll close this issue now.

gabbiccino closed this issue on Aug 10, 2020