Npcap 0.993-0.9986 spontaneously stops capturing; all packets go to ps_drop #1891

Open
akontsevoy opened this issue Jan 13, 2020 · 2 comments

@akontsevoy akontsevoy commented Jan 13, 2020

Greetings,

We have an intermittent issue with Npcap running on a customer's VMware Windows Server 2016 machines. Initially Npcap works well and captures traffic as expected, but eventually, even though pcap_dispatch() is called regularly and returns no errors, capture stops and every new packet seen by the adapter is counted in pcap_stat::ps_drop. We have not been able to identify what triggers this problem.

Broadly, our product operates as follows:

  1. A capture thread starts and discovers network adapters with pcap_findalldevs(), filtering out devices that have the PCAP_IF_LOOPBACK flag set.
  2. Each device (in this case only one) is opened with pcap = pcap_create(name, ...), followed by pcap_set_buffer_size(pcap, 3 * 1024 * 1024), pcap_activate(pcap), pcap_set_snaplen(pcap, 0xFFFF), pcap_setnonblock(pcap, 1, ...), applying a BPF filter of the form not host <single-IP>, pcap_setmintocopy(pcap, 8000), and finally event = pcap_getevent(pcap). All of this completes without errors.
  3. Call pcap_dispatch(pcap, -1, handler, NULL) every time WaitForMultipleObjects(events_size, events, FALSE, 200) returns (whether with WAIT_OBJECT_x or with WAIT_TIMEOUT, i.e. either by activity on the handle, or after 200 ms of inactivity). The handler function always returns locally (doesn't throw exceptions). It does grab a mutex, but we've double-checked that it's getting released properly by other threads in the application (i.e. there's no deadlock condition). Again, pcap_dispatch() returns without errors (always a number >= 0), but after a while stops returning data (i.e. handler no longer gets called).
  4. Roughly every 30 seconds, pcap_findalldevs() is called again, without closing the open pcap handles, to see if the network device list has changed. If it has, all pcap handles are closed and reopened; otherwise no action is taken.
  5. Roughly every second, pcap_stats(pcap, &ps) is called; this again returns without errors, but after a while every new packet goes into ps.ps_drop.

A few observations:

  1. The Npcap driver is running (not stopped) whether or not capture is currently affected by the problem.
  2. Closing and reopening the Npcap handle temporarily mitigates the problem (the capture starts again, but after a while the same thing happens).
  3. The problem doesn't reproduce everywhere (only a few of our users are affected).
  4. We've tried Npcap versions 0.993, 0.9985 and 0.9986.
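
Observation 2 suggests a possible stopgap while the root cause is unknown: detect the stall and reopen the handle automatically. The following is a minimal sketch, not code from the product or from Npcap. It assumes the handler passed to pcap_dispatch() increments a running count of delivered packets, and that ps_drop comes from the once-a-second pcap_stats() call already described. The stall_detector type, its fields, and the threshold parameter are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper; not part of Npcap or libpcap. */
typedef struct {
    unsigned int  last_drop;       /* ps_drop at the previous sample */
    unsigned long last_delivered;  /* handler call count at the previous sample */
    int           stalled_samples; /* drop-only samples since the last delivery */
    bool          primed;          /* false until the first sample is stored */
} stall_detector;

void stall_detector_init(stall_detector *d)
{
    assert(d != NULL);
    d->last_drop = 0;
    d->last_delivered = 0;
    d->stalled_samples = 0;
    d->primed = false;
}

/*
 * Feed one sample: ps_drop as reported by pcap_stats() and the running
 * number of handler invocations. Returns true once `threshold` samples
 * have shown new drops with no delivery in between.
 */
bool stall_detector_update(stall_detector *d, unsigned int ps_drop,
                           unsigned long delivered, int threshold)
{
    assert(d != NULL && threshold > 0);

    if (d->primed) {
        /* Unsigned subtraction stays correct if a counter wraps. */
        unsigned int  new_drops     = ps_drop - d->last_drop;
        unsigned long new_delivered = delivered - d->last_delivered;

        if (new_delivered > 0)
            d->stalled_samples = 0;   /* capture is alive */
        else if (new_drops > 0)
            d->stalled_samples++;     /* packets arrived but none delivered */
        /* otherwise the link was idle: leave the count unchanged */
    }

    d->last_drop = ps_drop;
    d->last_delivered = delivered;
    d->primed = true;
    return d->stalled_samples >= threshold;
}
```

With the one-second pcap_stats() cadence from step 5, a threshold of 3 would flag a stall after roughly three seconds of drops without deliveries, at which point the caller could close and reopen the pcap handle. This only automates the temporary mitigation; it does not address whatever causes the stall.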

Output of systeminfo.exe:

Host Name:                 <redacted>
OS Name:                   Microsoft Windows Server 2016 Standard
OS Version:                10.0.14393 N/A Build 14393
OS Manufacturer:           Microsoft Corporation
OS Configuration:          Member Server
OS Build Type:             Multiprocessor Free
Registered Owner:          <redacted>
Registered Organization:   <redacted>
Product ID:                <redacted>
Original Install Date:     10/11/2019, 2:14:57 PM
System Boot Time:          10/14/2019, 10:18:35 AM
System Manufacturer:       VMware, Inc.
System Model:              VMware Virtual Platform
System Type:               x64-based PC
Processor(s):              2 Processor(s) Installed.
                           [01]: Intel64 Family 6 Model 79 Stepping 1 GenuineIntel ~2600 Mhz
                           [02]: Intel64 Family 6 Model 79 Stepping 1 GenuineIntel ~2600 Mhz
BIOS Version:              Phoenix Technologies LTD 6.00, 9/17/2015
Windows Directory:         C:\Windows
System Directory:          C:\Windows\system32
Boot Device:               \Device\HarddiskVolume1
System Locale:             en-us;English (United States)
Input Locale:              en-us;English (United States)
Time Zone:                 (UTC-06:00) Central Time (US & Canada)
Total Physical Memory:     16,384 MB
Available Physical Memory: 13,457 MB
Virtual Memory: Max Size:  19,980 MB
Virtual Memory: Available: 13,769 MB
Virtual Memory: In Use:    6,211 MB
Page File Location(s):     C:\pagefile.sys
Domain:                    <redacted>
Logon Server:              N/A
Hotfix(s):                 4 Hotfix(s) Installed.
                           [01]: KB3199986
                           [02]: KB4033393
                           [03]: KB4521858
                           [04]: KB3200970
Network Card(s):           1 NIC(s) Installed.
                           [01]: vmxnet3 Ethernet Adapter
                                 Connection Name: Ethernet
                                 DHCP Enabled:    No
                                 IP address(es)
                                 [01]: <Private IPv4 redacted>
                                 [02]: <Link-local IPv6 redacted>
Hyper-V Requirements:      A hypervisor has been detected. Features required for Hyper-V will not be displayed.

Relevant output of reg.exe query HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318} /s:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\0001
    DriverDesc    REG_SZ    vmxnet3 Ethernet Adapter
    ProviderName    REG_SZ    VMware, Inc.
    DriverDateData    REG_BINARY    0040D958AC65D401
    DriverDate    REG_SZ    10-17-2018
    DriverVersion    REG_SZ    1.8.10.0
    InfPath    REG_SZ    oem9.inf
    InfSection    REG_SZ    vmxnet3.ndis630.x64.ndi.NT
    IncludedInfs    REG_MULTI_SZ    machine.inf\0pci.inf
    MatchingDeviceId    REG_SZ    PCI\VEN_15AD&DEV_07B0
    EnableMonitorMode    REG_SZ    0
    *IfType    REG_DWORD    0x6
    *MediaType    REG_DWORD    0x0
    *PhysicalMediaType    REG_DWORD    0xe
    BusType    REG_SZ    5
    Characteristics    REG_DWORD    0x84
    *SpeedDuplex    REG_SZ    0
    *PriorityVLANTag    REG_SZ    3
    *JumboPacket    REG_SZ    1514
    *InterruptModeration    REG_SZ    1
    OffloadVlanEncap    REG_SZ    1
    EnableWakeOnLan    REG_SZ    1
    *WakeOnPattern    REG_SZ    1
    *WakeOnMagicPacket    REG_SZ    1
    *IPChecksumOffloadIPv4    REG_SZ    3
    *TCPChecksumOffloadIPv4    REG_SZ    3
    *UDPChecksumOffloadIPv4    REG_SZ    3
    OffloadIpOptions    REG_SZ    1
    OffloadTcpOptions    REG_SZ    1
    EnableAdaptiveRing    REG_SZ    1
    *TCPChecksumOffloadIPv6    REG_SZ    3
    *UDPChecksumOffloadIPv6    REG_SZ    3
    *LsoV1IPv4    REG_SZ    1
    *LsoV2IPv4    REG_SZ    1
    *LsoV2IPv6    REG_SZ    1
    *RSS    REG_SZ    0
    IfTypePreStart    REG_DWORD    0x6
    NetworkInterfaceInstallTimestamp    REG_QWORD    0x1d5806e43d780ff
    InstallTimeStamp    REG_BINARY    E3070A0006000C0002000D000F003700
    DeviceInstanceID    REG_SZ    PCI\VEN_15AD&DEV_07B0&SUBSYS_07B015AD&REV_01\FF290C00DDE810FE00
    ComponentId    REG_SZ    PCI\VEN_15AD&DEV_07B0
    NetCfgInstanceId    REG_SZ    {0C2A9A0D-3001-4519-B5FC-3CF3B8A006A2}
    NetLuidIndex    REG_DWORD    0x8001
    *RscIPv4    REG_SZ    1
    *RscIPv6    REG_SZ    1
    CoalSchemeMode    REG_SZ    4

@fyodor fyodor commented Jan 16, 2020

Hi Alexey. Thanks for the very detailed report. We're investigating now how this could happen and of course what can be done to fix it!

@dmiller-nmap dmiller-nmap commented Jan 21, 2020

Thanks again for the report. I have a few questions to help us narrow down what might be happening, since we have not yet been able to reproduce the issue here:

  1. What kind of network traffic is typical when the issue occurs? For example, is it many small packets, or fewer large packets?
  2. Does WaitForMultipleObjects stop returning WAIT_OBJECT_0? In other words, does the event still get signaled even if pcap_dispatch() does not retrieve any packets, or does it consistently time out or return WAIT_FAILED?
  3. Is the pcap handle (pcap_t) shared between multiple threads or processes?
  4. Are any packets dropped before pcap_dispatch() stops delivering packets? That is, does ps_recv increase without a corresponding increase in ps_drop for any amount of time after the first increase to ps_drop?
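
Question 4 can be answered mechanically if the application logs the counters from each pcap_stats() call. The sketch below is one possible way to do that, under stated assumptions: stat_sample is a hypothetical log record whose two fields mirror ps_recv and ps_drop, and "without a corresponding increase" is interpreted strictly as ps_drop staying unchanged over an interval in which ps_recv grew. Whether ps_recv includes dropped packets can differ between platforms, so the interpretation may need adjusting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical log record; fields mirror pcap_stat's ps_recv and ps_drop. */
typedef struct {
    unsigned int recv;
    unsigned int drop;
} stat_sample;

/*
 * Returns true if, in any interval after ps_drop first became nonzero or
 * first increased, ps_recv increased while ps_drop stayed the same.
 * That would mean capture kept working for a while after the first drops.
 */
bool recovered_after_first_drop(const stat_sample *s, size_t n)
{
    assert(s != NULL || n == 0);

    /* Drops already present in the first sample count as "seen". */
    bool seen_drop = (n > 0 && s[0].drop > 0);

    for (size_t i = 1; i < n; i++) {
        /* Unsigned subtraction stays correct if a counter wraps. */
        unsigned int d_recv = s[i].recv - s[i - 1].recv;
        unsigned int d_drop = s[i].drop - s[i - 1].drop;

        if (seen_drop && d_recv > 0 && d_drop == 0)
            return true;
        if (d_drop > 0)
            seen_drop = true;
    }
    return false;
}
```

A result of false over a long log would suggest that once drops begin they never stop, while true would suggest that ordinary buffer overruns occurred before the permanent stall.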

Thanks for any additional info you can provide.
