Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Npcap 0.9995 leaks NonPagedPool, causes system hang #213

Closed
dmiller-nmap opened this issue Jul 24, 2020 · 4 comments
Closed

Npcap 0.9995 leaks NonPagedPool, causes system hang #213

dmiller-nmap opened this issue Jul 24, 2020 · 4 comments
Labels
bug current

Comments

@dmiller-nmap
Copy link
Contributor

@dmiller-nmap dmiller-nmap commented Jul 24, 2020

The memory tag "4321" (represented in code as '1234', a.k.a. NPF_ALLOC_TAG) is leaking about 5% of allocations, which leads to an eventual out-of-memory system hang in high-traffic or long-running captures. We have had one private report, and it is also mentioned in this comment: #206 (comment)

The memory tag in question is used in 4 places:

  1. For the NetBufferListPool NPCAP_FILTER_MODULE::PacketPool, which is used for packet injection and loopback capture.
  2. For the NetBufferPool NPCAP_FILTER_MODULE::TapNBPool, which is used to copy captured packet data.
  3. For cloning OID requests that pass through NPF_OidRequest() LWF callback.
  4. For allocating extra buffer space for packets that are larger than will fit into the default NetBuffers from TapNBPool (NPF_NBCOPY_INITIAL_DATA_SIZE, currently 256 bytes).

Code for cases 1 and 3 have not changed in recent releases, so they can most likely be ruled out. Allocations for 2 and 4 are both freed in NPF_FreeNBCopies(), which is a callback invoked from NPF_ObjectPoolReturn(), and which is implicated in the crash reported in #212, which currently looks like a double-free. This function is called based on a reference-counting system that is intended to be safe from concurrency problems, but obviously something is wrong.

@dmiller-nmap dmiller-nmap added bug current labels Jul 24, 2020
@dmiller-nmap
Copy link
Contributor Author

@dmiller-nmap dmiller-nmap commented Jul 24, 2020

My current plan is to redefine the allocation tags to be more unique so we can better understand which part is leaking. I have already added additional assertions and will be making a debug build to hopefully catch an inconsistency earlier. Other changes have already been put in place such as a memory barrier to avoid possible concurrency issues due to instruction reordering (cc788c9). I will also be testing on a multi-core system to better exercise the concurrency controls, since prior to this our testing has been done on single-core virtual machines.

@dmiller-nmap
Copy link
Contributor Author

@dmiller-nmap dmiller-nmap commented Jul 24, 2020

Private report of pool tracking shows we are looking at case 4.

@dmiller-nmap
Copy link
Contributor Author

@dmiller-nmap dmiller-nmap commented Jul 24, 2020

Unsure if this is the underlying case, but it certainly looks suspicious. This allocation is always accompanied by an MDL allocation, but only the buffer seems to be leaking. If under low-resources conditions we fail to map the system address for this buffer, we are unable to free it, which could lead to a runaway out-of-resources condition.

dmiller-nmap pushed a commit that referenced this issue Jul 29, 2020
For packets less than 256B, NDIS's NetBufferPool is efficient enough.
For larger ones, we need to allocate more buffer space and MDLs, and we
were freeing those as soon as we were done; very inefficient. Instead,
we'll put them on a stack and pull them back off when we need them. A
separate thread will clear the stack when all capture handles are
closed.
Related to #213	because by avoiding frequent small (257 to 1258 bytes)
allocations we reduce the chance of heap fragmentation. Also, this
changes pretty much all the code implicated in that issue, so if it
persists it's something new.
@dmiller-nmap
Copy link
Contributor Author

@dmiller-nmap dmiller-nmap commented Aug 26, 2020

This issue was partially addressed in Npcap 0.9996 and completely resolved in Npcap 0.9997.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug current
Projects
None yet
Development

No branches or pull requests

1 participant