Npcap 0.993-0.9986 spontaneously stops capturing; all packets go to ps_drop #119

Open
akontsevoy opened this issue Jan 13, 2020 · 22 comments

@akontsevoy akontsevoy commented Jan 13, 2020

Greetings,

We have an intermittent issue with Npcap running on the customer's VMware Windows Server 2016 machines. Initially everything works well and captures the traffic as expected, but eventually, despite pcap_dispatch() being called regularly and not returning errors, the traffic capture stops and all new packets seen by the adapter end up in pcap_stat::ps_drop. We have not been able to identify what triggers this problem.

Broadly, our product operates as follows:

  1. A capture thread is started, which discovers network adapters with pcap_findalldevs() and filters out devices with the PCAP_IF_LOOPBACK flag.
  2. Open devices (in this case only one) with pcap = pcap_create(name, ...), pcap_set_buffer_size(pcap, 3 * 1024 * 1024), pcap_activate(pcap), pcap_set_snaplen(pcap, 0xFFFF), pcap_setnonblock(pcap, 1, ...), apply BPF (where BPF looks like not host <single-IP>), pcap_setmintocopy(pcap, 8000), and finally event = pcap_getevent(pcap). All of this completes without errors.
  3. Call pcap_dispatch(pcap, -1, handler, NULL) every time WaitForMultipleObjects(events_size, events, FALSE, 200) returns (whether with WAIT_OBJECT_x or with WAIT_TIMEOUT, i.e. either by activity on the handle, or after 200 ms of inactivity). The handler function always returns locally (doesn't throw exceptions). It does grab a mutex, but we've double-checked that it's getting released properly by other threads in the application (i.e. there's no deadlock condition). Again, pcap_dispatch() returns without errors (always a number >= 0), but after a while stops returning data (i.e. handler no longer gets called).
  4. Roughly every 30 seconds, pcap_findalldevs() is called again, without closing the open pcap handles, to see if the network device list has changed. If it has, all pcap handles are closed and reopened; otherwise no action is taken.
  5. Roughly every second, pcap_stats(pcap, &ps) is called; this again returns without errors, but after a while every new packet goes into ps.ps_drop.

A few observations:

  1. Npcap driver is running (not stopped), whether or not Npcap is currently affected by the problem.
  2. Closing and reopening the Npcap handle temporarily mitigates the problem (the capture starts again, but after a while the same thing happens).
  3. Doesn't reproduce everywhere (we have only a few users affected by the issue).
  4. We've tried Npcap versions 0.993, 0.9985 and 0.9986.

Output of systeminfo.exe:

Host Name:                 <redacted>
OS Name:                   Microsoft Windows Server 2016 Standard
OS Version:                10.0.14393 N/A Build 14393
OS Manufacturer:           Microsoft Corporation
OS Configuration:          Member Server
OS Build Type:             Multiprocessor Free
Registered Owner:          <redacted>
Registered Organization:   <redacted>
Product ID:                <redacted>
Original Install Date:     10/11/2019, 2:14:57 PM
System Boot Time:          10/14/2019, 10:18:35 AM
System Manufacturer:       VMware, Inc.
System Model:              VMware Virtual Platform
System Type:               x64-based PC
Processor(s):              2 Processor(s) Installed.
                           [01]: Intel64 Family 6 Model 79 Stepping 1 GenuineIntel ~2600 Mhz
                           [02]: Intel64 Family 6 Model 79 Stepping 1 GenuineIntel ~2600 Mhz
BIOS Version:              Phoenix Technologies LTD 6.00, 9/17/2015
Windows Directory:         C:\Windows
System Directory:          C:\Windows\system32
Boot Device:               \Device\HarddiskVolume1
System Locale:             en-us;English (United States)
Input Locale:              en-us;English (United States)
Time Zone:                 (UTC-06:00) Central Time (US & Canada)
Total Physical Memory:     16,384 MB
Available Physical Memory: 13,457 MB
Virtual Memory: Max Size:  19,980 MB
Virtual Memory: Available: 13,769 MB
Virtual Memory: In Use:    6,211 MB
Page File Location(s):     C:\pagefile.sys
Domain:                    <redacted>
Logon Server:              N/A
Hotfix(s):                 4 Hotfix(s) Installed.
                           [01]: KB3199986
                           [02]: KB4033393
                           [03]: KB4521858
                           [04]: KB3200970
Network Card(s):           1 NIC(s) Installed.
                           [01]: vmxnet3 Ethernet Adapter
                                 Connection Name: Ethernet
                                 DHCP Enabled:    No
                                 IP address(es)
                                 [01]: <Private IPv4 redacted>
                                 [02]: <Link-local IPv6 redacted>
Hyper-V Requirements:      A hypervisor has been detected. Features required for Hyper-V will not be displayed.

Relevant output of reg.exe query HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318} /s:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\0001
    DriverDesc    REG_SZ    vmxnet3 Ethernet Adapter
    ProviderName    REG_SZ    VMware, Inc.
    DriverDateData    REG_BINARY    0040D958AC65D401
    DriverDate    REG_SZ    10-17-2018
    DriverVersion    REG_SZ    1.8.10.0
    InfPath    REG_SZ    oem9.inf
    InfSection    REG_SZ    vmxnet3.ndis630.x64.ndi.NT
    IncludedInfs    REG_MULTI_SZ    machine.inf\0pci.inf
    MatchingDeviceId    REG_SZ    PCI\VEN_15AD&DEV_07B0
    EnableMonitorMode    REG_SZ    0
    *IfType    REG_DWORD    0x6
    *MediaType    REG_DWORD    0x0
    *PhysicalMediaType    REG_DWORD    0xe
    BusType    REG_SZ    5
    Characteristics    REG_DWORD    0x84
    *SpeedDuplex    REG_SZ    0
    *PriorityVLANTag    REG_SZ    3
    *JumboPacket    REG_SZ    1514
    *InterruptModeration    REG_SZ    1
    OffloadVlanEncap    REG_SZ    1
    EnableWakeOnLan    REG_SZ    1
    *WakeOnPattern    REG_SZ    1
    *WakeOnMagicPacket    REG_SZ    1
    *IPChecksumOffloadIPv4    REG_SZ    3
    *TCPChecksumOffloadIPv4    REG_SZ    3
    *UDPChecksumOffloadIPv4    REG_SZ    3
    OffloadIpOptions    REG_SZ    1
    OffloadTcpOptions    REG_SZ    1
    EnableAdaptiveRing    REG_SZ    1
    *TCPChecksumOffloadIPv6    REG_SZ    3
    *UDPChecksumOffloadIPv6    REG_SZ    3
    *LsoV1IPv4    REG_SZ    1
    *LsoV2IPv4    REG_SZ    1
    *LsoV2IPv6    REG_SZ    1
    *RSS    REG_SZ    0
    IfTypePreStart    REG_DWORD    0x6
    NetworkInterfaceInstallTimestamp    REG_QWORD    0x1d5806e43d780ff
    InstallTimeStamp    REG_BINARY    E3070A0006000C0002000D000F003700
    DeviceInstanceID    REG_SZ    PCI\VEN_15AD&DEV_07B0&SUBSYS_07B015AD&REV_01\FF290C00DDE810FE00
    ComponentId    REG_SZ    PCI\VEN_15AD&DEV_07B0
    NetCfgInstanceId    REG_SZ    {0C2A9A0D-3001-4519-B5FC-3CF3B8A006A2}
    NetLuidIndex    REG_DWORD    0x8001
    *RscIPv4    REG_SZ    1
    *RscIPv6    REG_SZ    1
    CoalSchemeMode    REG_SZ    4
Member

@fyodor fyodor commented Jan 16, 2020

Hi Alexey. Thanks for the very detailed report. We're investigating now how this could happen and of course what can be done to fix it!

Contributor

@dmiller-nmap dmiller-nmap commented Jan 21, 2020

Thanks again for the report. I have a few questions to help us narrow down what might be happening, since we have not yet been able to reproduce the issue here:

  1. What kind of network traffic is going on or typical in the cases where the issue occurs? That is, is it many small packets, fewer large packets, etc.?
  2. Does WaitForMultipleObjects stop returning WAIT_OBJECT_0? In other words, does the event still get signaled even if pcap_dispatch() does not retrieve any packets, or does it consistently time out or return WAIT_FAILED?
  3. Is the pcap handle (pcap_t) shared between multiple threads or processes?
  4. Are any packets dropped before pcap_dispatch() stops delivering packets? That is, does ps_recv increase without a corresponding increase in ps_drop for any amount of time after the first increase to ps_drop?

Thanks for any additional info you can provide.

Author

@akontsevoy akontsevoy commented Jan 30, 2020

3] No, the pcap handle is used from only one thread in the process (although it's not the main thread).
2] No, the event keeps getting signalled when new packets arrive and WaitForMultipleObjects() keeps returning WAIT_OBJECT_0, but all subsequent calls to pcap_dispatch() return 0 and don't call the handler. By the way, we never call pcap_breakloop() from the handler.
4] After pcap_dispatch() stops delivering packets, ps_recv keeps increasing despite pcap_dispatch() constantly returning 0; ps_drop is initially 0 at that point. At some point (presumably when the internal buffer fills), ps_drop becomes non-0 and starts increasing.
1] It's hard to tell; the last packets successfully captured are always small packets, from what I can tell -- either 60, 61, or 171, or 200-something bytes -- much less than the MTU; and there is always a bunch of them -- like 4-20 in that last non-0 pcap_dispatch(). But this could just be a coincidence. Yet, if we divide the internal capture buffer size (3000000) by [(ps_recv when ps_drop starts increasing) - (ps_recv when this condition first triggers)] to obtain the average packet size in the full capture buffer -- that average is always over 2000. So maybe it is the jumbo frames that we trip on. However, I've seen frames as large as 66546 bytes (pcap_pkthdr::caplen == pcap_pkthdr::len) successfully captured. Not sure why that happens either, as we explicitly call pcap_set_snaplen(65535) -- so that in theory pcap_pkthdr::caplen should not be larger than 65535 (as far as I understand).
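
The back-of-the-envelope estimate in 1] can be written out as follows. This is only a sketch: the function name is made up for illustration, and it assumes the kernel buffer was empty when delivery stopped and ignores per-packet header overhead.

```c
#include <assert.h>
#include <stddef.h>

/* Estimate the average size of the packets sitting in the kernel buffer.
 * buffer_bytes:  kernel buffer size used in the estimate
 * recv_at_stall: ps_recv when pcap_dispatch() first started returning 0
 * recv_at_drop:  ps_recv when ps_drop first started increasing
 * Assumes the buffer was empty at the stall and ignores per-packet headers. */
static double avg_buffered_packet_size(size_t buffer_bytes,
                                       unsigned int recv_at_stall,
                                       unsigned int recv_at_drop)
{
    /* Unsigned subtraction stays correct even if the counter wrapped. */
    unsigned int buffered = recv_at_drop - recv_at_stall;
    assert(buffered > 0);
    return (double)buffer_bytes / (double)buffered;
}
```

With hypothetical counters, a 3000000-byte buffer that fills after 1400 more packets gives an average above 2000 bytes per packet, which is the kind of result described above.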

Additional observations which may or may not be helpful:
a) Our handler function transports the captured data on the network, using a socket on the same network adapter from which we capture. We use a BPF to prevent the transported packets from being captured again (thus avoiding a feedback loop), but maybe the fact that the handler function potentially calls out into the network driver (albeit indirectly) plays a role here.
b) The capture process is running with low priority; however, we tried resetting it to normal and it did not help.
c) Even on healthy hosts (not subject to this issue), every now and then it happens that the handle event is signalled (WaitForMultipleObjects() returns WAIT_OBJECT_0) but the subsequent pcap_dispatch() returns 0, indicating no data to capture. (But on healthy hosts, subsequent calls to pcap_dispatch() return data.) As far as I understand, this should not be happening. Perhaps the event gets signalled before BPF is applied? This is preventing me from generically applying the workaround I had in mind (if the event is signalled but no data is captured, close and reopen the handle).


@guyharris guyharris commented Jan 30, 2020

> c) Even on healthy hosts (not subject to this issue), every now and then it happens that the handle event is signalled (WaitForMultipleObjects() returns WAIT_OBJECT_0) but the subsequent pcap_dispatch() returns 0, indicating no data to capture. (But on healthy hosts, subsequent calls to pcap_dispatch() return data.) As far as I understand, this should not be happening. Perhaps the event gets signalled before BPF is applied? This is preventing me from generically applying the workaround I had in mind (if the event is signalled but no data is captured, close and reopen the handle).

Capture mechanisms that do buffering of packets, with a timeout to keep packets from remaining buffered for too long (because the incoming packet rate is low so that the buffer takes a long time to fill up), have two sorts of timeout:

  1. timeouts that expire only if there's at least one packet in the buffer (e.g., because the timer starts when the first packet is put in the buffer);

  2. timeouts that expire even if there are no packets in the buffer (e.g., because the timer starts when an attempt is made to read from the buffer).

For capture mechanisms with the second type of timer, pcap_dispatch() may return 0, because the timer timed out when there were no packets in the buffer.

As I remember, the WinPcap and Npcap NPF driver is the second type, so you may get 0 packets from a wakeup because the timer went off and no packets had arrived since the last time the buffer was read.


@guyharris guyharris commented Jan 30, 2020

> 1] ... However, I've seen frames as large as 66546 bytes (pcap_pkthdr::caplen == pcap_pkthdr::len) successfully captured. Not sure why that happens either, as we explicitly call pcap_set_snaplen(65535) -- so that in theory pcap_pkthdr::caplen should not be larger than 65535 (as far as I understand).

At least as I read the bpf_filter() call and the "Add MDLs" loop in NPF_TapExForEachOpen() in packetWin7\npf\npf\Read.c, the code 1) shouldn't copy more bytes than the return value of bpf_filter() and 2) should set the code to the total number of bytes copied; given that the BPF compiler should generate code to return the snapshot length for packets that match the filter, and should always provide a filter program even if it's just a trivial program doing a ret instruction returning the snapshot length, that shouldn't happen.


@guyharris guyharris commented Jan 30, 2020

> 3] No, the pcap handle is used from only one thread in the process (although it's not the main thread).

So in that thread, are you in a loop that just calls pcap_dispatch(), with the pcap_t not being in non-blocking mode, or are you in an event loop that waits for multiple objects, including the event handle for the pcap_t, calling pcap_dispatch() if the event handle was signaled or something such as that?

Contributor

@dmiller-nmap dmiller-nmap commented Jan 30, 2020

There's a lot of information here to go through. This in particular intrigues me:

> 1] It's hard to tell; the last packets successfully captured are always small packets, from what I can tell -- either 60, 61, or 171, or 200-something bytes -- much less than the MTU; and there is always a bunch of them -- like 4-20 in that last non-0 pcap_dispatch(). But this could just be a coincidence.

If there's some condition that is causing the buffer to fill up or the driver to think that the buffer is filling up, then it would start dropping packets that are too large for the remaining space in the buffer. This would lead to smaller and smaller packets being captured until they too fill up the buffer and there is no remaining space.

This is just a hunch at this point. I need to read through the code again with this idea in mind.

Contributor

@dmiller-nmap dmiller-nmap commented Jan 30, 2020

I haven't been able to find any place where we might be losing track of the free space remaining. I'm guessing it has something to do with calculating how much space a packet takes up when writing it to the buffer or how much space to free up when reading a packet out of the buffer. So instead of constantly incrementing and decrementing the free space counter, I'm going to try a change that calculates the free space based on the size of the buffer and the positions of the consumer and producer pointers. We'll have to acquire a lock on the buffer in the Read handler in order to ensure we don't calculate based on an outdated position, but that's probably best anyway to avoid concurrency issues if some software somewhere is sharing an adapter handle between multiple threads.
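
As an illustration of deriving free space from the two positions instead of a running counter, here is a minimal sketch. The struct and function names are hypothetical and do not reflect Npcap's actual data structures. It also assumes one byte is kept unused so that equal positions always mean "empty".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ring-buffer bookkeeping; not Npcap's actual structures. */
struct ring {
    size_t size;      /* total bytes in the buffer */
    size_t producer;  /* next byte the writer will fill, in [0, size) */
    size_t consumer;  /* next byte the reader will drain, in [0, size) */
};

/* Free bytes derived from the two positions rather than from a running
 * counter. One byte is kept unused so that producer == consumer always
 * means "empty" and never "full". */
static size_t ring_free_space(const struct ring *r)
{
    assert(r->size > 0 && r->producer < r->size && r->consumer < r->size);
    size_t used = (r->producer + r->size - r->consumer) % r->size;
    return r->size - 1 - used;
}
```

Because the result depends on both positions at once, a caller would need to hold the buffer lock while reading them, as noted above.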

Please let us know if the issue persists in the next release.

Author

@akontsevoy akontsevoy commented Jan 30, 2020

> There's a lot of information here to go through. This in particular intrigues me:
>
> > 1] It's hard to tell; the last packets successfully captured are always small packets, from what I can tell -- either 60, 61, or 171, or 200-something bytes -- much less than the MTU; and there is always a bunch of them -- like 4-20 in that last non-0 pcap_dispatch(). But this could just be a coincidence.
>
> If there's some condition that is causing the buffer to fill up or the driver to think that the buffer is filling up, then it would start dropping packets that are too large for the remaining space in the buffer. This would lead to smaller and smaller packets being captured until they too fill up the buffer and there is no remaining space.
>
> This is just a hunch at this point. I need to read through the code again with this idea in mind.

But if what you said were the case and we thought the buffer was full, wouldn't ps_drop start increasing immediately after the condition triggers? As it stands, we see ps_recv increase by several hundred or several thousand (3 MB buffer) after the problem appears, and before we see ps_drop increase. That would suggest we are having problems reading from the buffer, not writing to the buffer (why is pcap_dispatch() returning 0 when there are obviously things there we can read?). Without knowing much else, but given the fact that we sometimes capture packets larger than snaplen (which shouldn't be happening), I'd look for potential buffer overwrites or overreads.

Again, without knowing anything else, if you are using a circular buffer as your response seems to suggest, I'd double check the edge/overflow conditions:

  1. Are the cursors adjusted correctly when an incoming packet aligns exactly with the buffer boundaries?
  2. What happens if an incoming packet would write past the current read position (write cursor overruns the read cursor)?
  3. What happens if an incoming packet is larger than the entire buffer (the write cursor would lap both the read cursor and its own starting position)?
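
To make the three cases concrete, here is a minimal sketch of a byte ring whose write path checks each of them. Everything in it (struct, enum, and function names) is hypothetical and is not taken from the Npcap driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical byte ring; names and layout are illustrative only. */
struct pkt_ring {
    unsigned char *data;
    size_t size;   /* capacity in bytes */
    size_t head;   /* write cursor, in [0, size) */
    size_t tail;   /* read cursor, in [0, size) */
    size_t used;   /* bytes currently stored */
};

enum ring_result { RING_OK, RING_FULL, RING_TOO_BIG };

static enum ring_result ring_write(struct pkt_ring *r, const void *pkt, size_t len)
{
    assert(r->size > 0 && r->used <= r->size);
    if (len > r->size)                 /* case 3: can never fit */
        return RING_TOO_BIG;
    if (len > r->size - r->used)       /* case 2: would overrun the read cursor */
        return RING_FULL;
    size_t first = r->size - r->head;  /* bytes available before the wrap point */
    if (first > len)
        first = len;
    memcpy(r->data + r->head, pkt, first);
    memcpy(r->data, (const unsigned char *)pkt + first, len - first);
    r->head = (r->head + len) % r->size;  /* case 1: an exact fit wraps to 0 */
    r->used += len;
    return RING_OK;
}
```

Tracking `used` separately is one way to tell a full ring from an empty one when both cursors are equal; other designs reserve a spare byte instead.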

P.S. As part of troubleshooting I also tried to remove the pcap_setmintocopy() call, but the problem persisted.

Contributor

@dmiller-nmap dmiller-nmap commented Feb 4, 2020

Npcap 0.9987, released today, includes the above commit that may address this issue. Please let us know one way or the other so we know if we need to continue to investigate this issue.

Author

@akontsevoy akontsevoy commented Feb 7, 2020

Instructed one of our users to try the new version, will confirm one way or another.

Author

@akontsevoy akontsevoy commented Apr 6, 2020

Unfortunately, a deployment of version 0.9989 on one of the affected machines indicates that the problem still persists -- with roughly the same symptoms.

Contributor

@dmiller-nmap dmiller-nmap commented Apr 7, 2020

@akontsevoy Thanks for letting us know. I'll take another look.

Contributor

@dmiller-nmap dmiller-nmap commented Apr 7, 2020

Ok, here's an interesting thing: the processors listed appear to be Intel Xeons with 56 cores each. Npcap has some weirdness surrounding the number of processors (nmap/nmap#1967) that might explain the problem here:

The kernel capture buffer for an instance is split into g_NCpu segments, which is the return value from NdisSystemProcessorCount(). This function is documented on another page as only returning "the processor count for processor group 0." For the sake of argument, let's guess that the number is 56, meaning all the cores on processor 0. So Npcap will split the buffer into 56 segments, and whenever a thread is scheduled to deal with incoming packets, it will try to write to the segment corresponding to its processor number.

HOWEVER! In order to determine the processor number, Npcap uses KeGetCurrentProcessorNumberEx(), which returns the systemwide processor index of the current processor. So if the thread is scheduled on core 10 of processor 1, the index would be 55+10=65, which is greater than the number of segments the buffer is split into.

So why do we not see a BSoD due to buffer overrun? Well, each OPEN_INSTANCE has an array of 128 CpuPrivateData structures to keep track of position in the buffer segment and statistics. It is 0-initialized, including the Free counter. So when the thread grabs CpuPrivateData number 65, it's there and it claims that there is no space for the packet, so we record a drop (in a place that won't be checked) and signal the Read event to tell the application to clear out some space. But of course there's nothing that reading more packets will do to solve things as long as the thread remains scheduled on a core without a corresponding buffer segment. This is why you are likely seeing WAIT_OBJECT_0 but not getting any packets: a packet drop happened, but there weren't any new saved packets to clear out. So you can definitely reopen the handle in this case because it's a clear case that something went wrong.

Unfortunately, none of this explains why the ps_drop counter keeps increasing. The stats gathering routines only reference CpuPrivateData structures less than g_NCpu, so the count should only increase when there's a valid buffer for that thread. I'm still thinking about that one, and I don't see any bugs in the way the sequence numbers are written/read from the buffer.
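
The mismatch described above can be modeled in a few lines. This is a simplified sketch under stated assumptions: the struct layout and function names are hypothetical, and the 56 and 65 used in the usage note are the "for the sake of argument" numbers from this comment, not measured values.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CPU_SLOTS 128  /* size of the per-instance array described above */

/* Simplified stand-in for the per-CPU bookkeeping; field names are hypothetical. */
struct cpu_slot {
    size_t free;        /* free bytes in this CPU's buffer segment */
    unsigned accepted;
    unsigned dropped;
};

struct open_instance {
    unsigned n_cpu;                       /* segments actually allocated */
    struct cpu_slot slot[MAX_CPU_SLOTS];  /* zero-initialized */
};

static void instance_init(struct open_instance *o, unsigned n_cpu, size_t buffer_bytes)
{
    assert(n_cpu > 0 && n_cpu <= MAX_CPU_SLOTS);
    *o = (struct open_instance){ .n_cpu = n_cpu };
    for (unsigned i = 0; i < n_cpu; i++)
        o->slot[i].free = buffer_bytes / n_cpu;  /* only these slots get space */
}

/* Called with the system-wide index of the CPU handling the packet. */
static int tap_packet(struct open_instance *o, unsigned cpu_index, size_t len)
{
    assert(cpu_index < MAX_CPU_SLOTS);
    struct cpu_slot *s = &o->slot[cpu_index];
    if (len > s->free) {  /* true for any non-empty packet on slots >= n_cpu */
        s->dropped++;
        return 0;
    }
    s->free -= len;
    s->accepted++;
    return 1;
}

/* Stats only walk the slots that have a segment. */
static unsigned visible_drops(const struct open_instance *o)
{
    unsigned total = 0;
    for (unsigned i = 0; i < o->n_cpu; i++)
        total += o->slot[i].dropped;
    return total;
}
```

With 56 segments, a packet tapped on index 10 is stored, while one tapped on index 65 is dropped and the drop never shows up in `visible_drops()`, which matches the puzzle in the last paragraph: this path alone would not make ps_drop grow.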

Author

@akontsevoy akontsevoy commented Apr 8, 2020

@dmiller-nmap Yes, I too thought that issue might be related, but it doesn't appear to be. Here's the output of wmic.exe cpu get from an affected system:

AddressWidth  Architecture  AssetTag  Availability  Caption                               Characteristics  ConfigManagerErrorCode  ConfigManagerUserConfig  CpuStatus  CreationClassName  CurrentClockSpeed  CurrentVoltage  DataWidth  Description                           DeviceID  ErrorCleared  ErrorDescription  ExtClock  Family  InstallDate  L2CacheSize  L2CacheSpeed  L3CacheSize  L3CacheSpeed  LastErrorCode  Level  LoadPercentage  Manufacturer  MaxClockSpeed  Name                                       NumberOfCores  NumberOfEnabledCore  NumberOfLogicalProcessors  OtherFamilyDescription  PartNumber  PNPDeviceID  PowerManagementCapabilities  PowerManagementSupported  ProcessorId       ProcessorType  Revision  Role  SecondLevelAddressTranslationExtensions  SerialNumber  SocketDesignation  Status  StatusInfo  Stepping  SystemCreationClassName  SystemName  ThreadCount  UniqueId  UpgradeMethod  Version  VirtualizationFirmwareEnabled  VMMonitorModeExtensions  VoltageCaps
64            9                       3             Intel64 Family 6 Model 79 Stepping 1                                                                    1          Win32_Processor    2600               33              64         Intel64 Family 6 Model 79 Stepping 1  CPU0                                                2                    0                          0            0                            6      1               GenuineIntel  2600           Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz  3                                   3                                                                                                        FALSE                     1FABFBFF000406F1  3              20225     CPU   FALSE                                                  CPU socket #0      OK      3                     Win32_ComputerSystem     <redacted>                         4                       FALSE                          FALSE                    2
64            9                       3             Intel64 Family 6 Model 79 Stepping 1                                                                    1          Win32_Processor    2600               33              64         Intel64 Family 6 Model 79 Stepping 1  CPU1                                                2                    0                          0            0                            6      4               GenuineIntel  2600           Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz  3                                   3                                                                                                        FALSE                     1FABFBFF000006F1  3              20225     CPU   FALSE                                                  CPU socket #1      OK      3                     Win32_ComputerSystem     <redacted>                         4                       FALSE                          FALSE                    2

This is running on VMware as far as I know. The underlying hardware may well have 56 logical CPUs, but this particular VM has only 6 logical CPUs over 2 virtual sockets -- nothing extraordinary.

Besides, if what you say were the case, the issue would be intermittent and result in a fixed percentage of capture loss -- after the capture stops because the thread was scheduled on a high-numbered logical CPU, that thread would eventually be scheduled on a low-numbered CPU again, and the capture would resume. That never happens -- once the capture stops, it does not resume until I close and reopen the pcap handle.

I'm not denying this is a problem, but it doesn't seem to be the problem. On a side note, old WinPcap is known to BSOD on AWS machines with more than 16 or 32 cores. I wonder if that problem is analogous to what you're describing here (memory arena optimization gone sideways).

In your place, the next thing I'd turn my attention to is the fact that sometimes pcap_dispatch() receives frames whose caplen is greater than the configured snaplen (in my case 65535). From what I understand, this should never happen -- yet it does. So I would still suspect a buffer overwrite or overread here somewhere.

As for the suggested workaround, as I mentioned before it doesn't work reliably -- even on healthy systems, sometimes the event gets set, yet pcap_dispatch() returns 0 despite never calling pcap_breakloop() or throwing exceptions. And by the way, with old WinPcap (which we still have to support because our software needs to support Windows versions prior to 7), this happens even more frequently than with Npcap. The only workaround I found that seems to work more or less reliably (avoids false positives) is, after the event fires and pcap_dispatch() returns 0, to also call pcap_stats() and check if ps_drop has increased since the prior call of pcap_dispatch(). If it did (that is, we dropped packets but the last call to pcap_dispatch() delivered nothing), then we can be sure something is wrong.
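
The check described above boils down to a small predicate. Here is a minimal sketch of it; the struct and function names are hypothetical, and the caller is assumed to pass in the result of the wait, the return value of pcap_dispatch(), and ps_drop from a pcap_stats() call made right afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-handle watchdog state kept by the application. */
struct stall_detector {
    unsigned int last_drop;  /* ps_drop seen at the previous check */
};

/* event_signaled: the wait returned because the pcap event fired
 * dispatched:     return value of the pcap_dispatch() call that followed
 * ps_drop:        ps_drop from a pcap_stats() call made right after it */
static bool capture_stalled(struct stall_detector *d, bool event_signaled,
                            int dispatched, unsigned int ps_drop)
{
    assert(d != NULL);
    /* "!=" rather than ">" so a wrapped counter still counts as a change. */
    bool drops_grew = ps_drop != d->last_drop;
    d->last_drop = ps_drop;
    return event_signaled && dispatched == 0 && drops_grew;
}
```

A caller would close and reopen the pcap handle when this returns true. An event with an empty read and no new drops is treated as harmless, which avoids the false positives mentioned above.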

Contributor

@dmiller-nmap dmiller-nmap commented Apr 8, 2020

Thanks for the additional insight. I'm going ahead with a fix for nmap/nmap#1967 that reorganizes a lot of the internals of the ring buffer, and I'm adding some additional assertions for our testing to check for boundary conditions reading and writing from the ring buffer. Between the two, I think we'll eliminate this issue. One thing to note about my earlier problem description is that there is a difference between the maximum number of processors and the active number of processors on systems that support hot-add of CPU cores, like some VMware systems. There are just too many problems with the current mishmash of Ke* and Ndis* functions in use to determine CPU core numbers.

The new approach is pretty exciting, as I've found several ways to reduce the time spinlocks are held and to improve the utilization of the ring buffer. Still working on it, and of course it'll require significant testing.

Contributor

@dmiller-nmap dmiller-nmap commented May 5, 2020

Npcap 0.9991 changes most of the code affecting this issue. We no longer keep per-CPU state, so the number of CPU cores should not have any bearing on how Npcap functions. Only a single thread is responsible for writing to the buffer, and free space is updated using interlocked-exchange functions, so there should be no reason for it to drift. Please let us know if the problem appears fixed so that we can close this issue.
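
For readers unfamiliar with the pattern, here is a portable sketch of a free-space counter updated atomically. It uses C11 atomics as a stand-in for the Windows interlocked functions, and all names are hypothetical. It is an analogy for the approach described above, not the driver's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical free-space counter shared by one writer and one reader. */
struct space_counter {
    size_t capacity;
    atomic_size_t free;
};

static void space_init(struct space_counter *c, size_t capacity)
{
    c->capacity = capacity;
    atomic_init(&c->free, capacity);
}

/* Writer side: reserve len bytes, or fail without changing the counter. */
static bool space_reserve(struct space_counter *c, size_t len)
{
    size_t cur = atomic_load(&c->free);
    while (cur >= len) {
        /* On failure cur is reloaded with the latest value and we retry. */
        if (atomic_compare_exchange_weak(&c->free, &cur, cur - len))
            return true;
    }
    return false;
}

/* Reader side: give back len bytes after copying a packet out. */
static void space_release(struct space_counter *c, size_t len)
{
    size_t prev = atomic_fetch_add(&c->free, len);
    assert(prev + len <= c->capacity);  /* releasing more than was reserved is a bug */
    (void)prev;
}
```

Since every change goes through a single atomic read-modify-write, a concurrent reserve and release cannot lose an update, which is the property the comment above relies on.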

@fyodor fyodor transferred this issue from nmap/nmap May 20, 2020
Contributor

@dmiller-nmap dmiller-nmap commented Jun 15, 2020

Npcap 0.9991 had some problems, but Npcap 0.9994 seems very stable and fast, and more importantly for this issue it changes all the code related to counting drops and free space. Any misbehavior related to capture stats in newer versions will be a completely separate issue, so we will close this one. Please let us know if there are any further problems.

Contributor

@dmiller-nmap dmiller-nmap commented Oct 14, 2020

We received a private report of this issue also affecting Npcap 0.9997 (and by extension Npcap 1.00, since the driver code is essentially the same between the two). We have not received any confirmation of a fix, so I am copying the diagnostic questionnaire I provided via email here. If anyone is still experiencing this issue in Npcap 0.9997 or newer, please fill out as much information as possible:

It is difficult to begin diagnosing this without more complete information about the capture. Specifically, to model the behavior of Npcap and identify a bug, I need to know the following:

Capture parameters:

  • Kernel buffer size (parameter to pcap_set_buffer_size() or non-portable pcap_setbuff())
  • MinToCopy value (parameter to pcap_setmintocopy())
  • User buffer size (parameter to pcap_setuserbuffer())
  • Read timeout (parameter to pcap_set_timeout())
  • Blocking/Non-blocking mode (parameter to pcap_setnonblock())
  • Immediate mode (parameter to pcap_set_immediate_mode())
  • BPF filter (parameter to pcap_setfilter())

Information about how captured packets are processed:

  • Are you using pcap_next(), pcap_next_ex(), pcap_dispatch(), or pcap_loop()? What parameters?
  • Are you sharing the capture handle (pcap_t) between multiple threads or processes? None of the above functions are reentrant.
  • Are you modifying the captured packet data in-place, including modifications to the bpf_hdr or writes past the end of the captured data?
  • Are there any other capture handles running at the time, such as Wireshark or Nmap or other software in use concurrently? You can run "NPFInstall.exe -check_dll" to determine this.

Information about the packets being captured:

  • Can you compare the number of received packets (ps_recv) minus the number of dropped packets (ps_drop) to the number of packets processed by your application? If they are diverging, your application is not keeping up with the network traffic rate and dropped packets are inevitable.
  • What are the packets like immediately before they begin to be dropped? Are they larger than normal, smaller than normal, or ordinary? Is some traffic conspicuously absent even if pcap_stats() does not report any dropped packets yet?

Information about the system state when packets are being dropped:

  • When packets are being dropped, what is the return value of the pcap_next/dispatch/loop function? What is returned by pcap_geterr()?
  • What is the system memory and CPU usage?
  • Is there any observable network interruption or delay during this time?

Troubleshooting steps:

  • If you reapply the current BPF filter with pcap_setfilter() without restarting the capture, do packets begin to be received again?
  • If you begin another capture with the same parameters without closing the first one, does the second capture receive packets or are they dropped?
  • If you have a consistent reproducible bug on a non-production system, enable Driver Verifier for npcap.sys, reboot, and try again. This can provoke a bugcheck (BSoD) in some cases, which makes finding the error much easier. Standard settings are sufficient.

Thanks for any information you can provide.

@dmiller-nmap dmiller-nmap commented Oct 27, 2020

I believe this issue may not have been related to the kernel driver code that was rewritten, but may instead be a bad interaction between the snaplen and user buffer size. I've opened the-tcpdump-group/libpcap#975 to address part of the issue, but we will need to investigate solutions here. Meanwhile, here is my description of the problem as I see it, along with some workarounds for programs using the Npcap API:

  1. libpcap (wpcap.dll) enforces a maximum snaplen of MAXIMUM_SNAPLEN, defined in pcap-int.h as 262144 bytes (0x40000).
  2. libpcap also sets a default size of 256000 bytes for the user buffer (used to communicate between Packet.dll and the npcap.sys driver).
  3. The default kernel buffer size is 1MB.
  4. A packet arrives that is 255983 or more bytes. This packet plus the struct bpf_hdr of 18 bytes will require 256001 bytes in each buffer.
  5. There is room in the kernel buffer, so the packet is added to that buffer.
  6. When pcap_next_ex() is called, it in turn calls PacketReceivePacket() with a buffer of 256000 bytes.
  7. The driver's Read handler retrieves the packet from the kernel buffer, compares its length to the user-supplied buffer, determines it will not fit, and returns it to the kernel buffer.
  8. All subsequent read requests will fail in the same way, because there is no way for the packet to leave the kernel buffer.
  9. Since the kernel buffer is a queue and the first packet cannot be dequeued, it fills up and packets are dropped.
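
The sequence above can be reproduced with a toy model. This is not the npcap.sys code: `kernel_queue`, `kq_init`, `kq_arrive`, and `kq_read` are hypothetical names, and the model only tracks packet lengths in a FIFO with a byte budget, assuming the read handler refuses to deliver a packet that does not fit in the caller's buffer.

```c
#include <stddef.h>

#define QUEUE_SLOTS 1024

/* Toy kernel buffer: FIFO of stored sizes (header + data) with a byte limit. */
struct kernel_queue {
    size_t lens[QUEUE_SLOTS];
    size_t head, count;
    size_t bytes_used, capacity;
    unsigned long dropped;
};

void kq_init(struct kernel_queue *q, size_t capacity)
{
    q->head = q->count = 0;
    q->bytes_used = 0;
    q->capacity = capacity;
    q->dropped = 0;
}

/* Driver side: store the packet if there is room, otherwise count a drop. */
void kq_arrive(struct kernel_queue *q, size_t caplen, size_t hdr_len)
{
    size_t need = caplen + hdr_len;
    if (q->count == QUEUE_SLOTS || q->bytes_used + need > q->capacity) {
        q->dropped++;
        return;
    }
    q->lens[(q->head + q->count) % QUEUE_SLOTS] = need;
    q->count++;
    q->bytes_used += need;
}

/* Read side: hand over whole packets until the next one does not fit.
 * Returns the number of bytes delivered to the user buffer. */
size_t kq_read(struct kernel_queue *q, size_t user_buf_size)
{
    size_t copied = 0;
    while (q->count > 0) {
        size_t need = q->lens[q->head];
        if (copied + need > user_buf_size)
            break; /* the packet stays at the head of the queue */
        copied += need;
        q->bytes_used -= need;
        q->head = (q->head + 1) % QUEUE_SLOTS;
        q->count--;
    }
    return copied;
}
```

With a capacity of 1000000 bytes, one arrival of 255983 bytes plus an 18-byte header, and reads of 256000 bytes, every `kq_read()` call in this model returns 0 and later arrivals are counted in `dropped` once the byte budget is used up.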

Potential solutions:

  1. Npcap driver checks whether the first packet exceeds the size of the user buffer and truncates it to fit. Pro: no workaround needed. Cons: modifies captured traffic. May result in a short read just prior to such a packet being read.
  2. libpcap silently increases user buffer size to accommodate maximum storage for the current snaplen. Pro: works with any version of npcap driver. Con: minor (6126 bytes) increase in application memory use per pcap handle.
  3. libpcap silently decreases the current snaplen to fit within the user buffer size. Cons: silent modification of capture behavior. Incompatibility with behavior on other platforms.
  4. Warn or error in pcap_activate_npf() if user buffer size cannot accommodate the current snaplen. Pro: no silent change in behavior. Con: requires programmers to encounter the error and modify their code.

Potential workarounds:

  1. Set user buffer using pcap_setuserbuffer() to Packet_WORDALIGN(snaplen + sizeof(struct bpf_hdr));
  2. Set snaplen to a smaller value such as 65535 or at most user buffer size - sizeof(struct bpf_hdr) - sizeof(int);
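
The arithmetic behind both workarounds can be sketched as follows. The names `CAPTURE_HDR_LEN`, `WORDALIGN`, `min_user_buffer_for`, and `max_snaplen_for` are hypothetical; the 18-byte header length is taken from the analysis above, and real code would presumably use `sizeof(struct bpf_hdr)` and `Packet_WORDALIGN()` from the Npcap headers instead.

```c
#include <stddef.h>

/* Assumption: header length as stated in the analysis above. */
#define CAPTURE_HDR_LEN ((size_t)18)

/* Round up to a multiple of sizeof(int), standing in for Packet_WORDALIGN(). */
#define WORDALIGN(x) (((x) + (sizeof(int) - 1)) & ~(sizeof(int) - 1))

/* Workaround 1: smallest user buffer that can hold one packet of snaplen
 * bytes. The result is what would be passed to pcap_setuserbuffer(). */
size_t min_user_buffer_for(size_t snaplen)
{
    return WORDALIGN(snaplen + CAPTURE_HDR_LEN);
}

/* Workaround 2: largest snaplen that still fits a given user buffer,
 * leaving sizeof(int) bytes of slack for alignment padding. */
size_t max_snaplen_for(size_t user_buf_size)
{
    if (user_buf_size < CAPTURE_HDR_LEN + sizeof(int))
        return 0;
    return user_buf_size - CAPTURE_HDR_LEN - sizeof(int);
}
```

Either result only helps if it is applied consistently, so an application that changes its snaplen later would need to recompute the buffer size as well.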

@dmiller-nmap dmiller-nmap reopened this Oct 27, 2020
@dmiller-nmap dmiller-nmap commented Oct 27, 2020

Further information from the user reporting this issue in Npcap 0.9997: the issue appeared after receiving 2 frames of 65775 and 65859 bytes. This contradicts my analysis above, though I still believe that to be an issue. We will continue to investigate.

@guyharris guyharris commented Oct 27, 2020

libpcap silently increases user buffer size to accommodate maximum storage for the current snaplen. Pro: works with any version of npcap driver. Con: minor (6126 bytes) increase in application memory use per pcap handle.

$ sw_vers
ProductName:	Mac OS X
ProductVersion:	10.15.7
BuildVersion:	19H2
$ sysctl debug.bpf_maxbufsize
debug.bpf_maxbufsize: 524288

And pcap-bpf.c 1) defaults to the maximum BPF buffer size and 2) makes the user buffer the same size as the kernel buffer, so we already have user buffers bigger than 256000. An extra 6KB isn't worth worrying about here.

(It used to use, as I remember, the default kernel buffer size:

$ sysctl debug.bpf_bufsize
debug.bpf_bufsize: 4096

That made sense in the early 1990's, when BPF was first introduced - back in 1989, I seem to remember that Sun debated whether 4MB or 8MB was the right minimum memory size for a SPARCstation 1 - but memory sizes have gotten a LOT bigger since then. I cranked it up when a coworker in the remote file system group at Apple got Apple to set the default snaplen for tcpdump to the maximum - 65535 at the time - so that captures without -s would get the full packet by default, which we needed for AFP/SMB/NFS, and then somebody else at Apple complained that this caused too many dropped packets; I took a look at the code and decided that 1992 called and they wanted their memory sizes back. :-))

Is there any reason not to have the user buffer be >= the kernel buffer in size? Is there any reason to have it be > the kernel buffer in size? Should it be a (fixed?) multiple of the kernel buffer size?

4 participants