Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ncat transfer rate incredibly slow #1026

Open
Vhati opened this issue Oct 4, 2017 · 38 comments
Open

ncat transfer rate incredibly slow #1026

Vhati opened this issue Oct 4, 2017 · 38 comments

Comments

@Vhati
Copy link

Vhati commented Oct 4, 2017

I first noticed incredibly slow rates across my LAN a while back, but lazily switched tools and forgot about the bug - assuming it'd be obvious and get fixed eventually. Oops.

I finally decided to run some loopback tests to report it properly.

ncat -l --recv-only 5000 > out.bin
ncat --send-only 127.0.0.1 5000 < in.bin

Transfer times (via a wristwatch) on Windows 7, for an arbitrary a 9MB file, with matching versions of ncat.
6.40 sending to 6.40 - Instantaneous
6.45 sending to 6.45 - Four minutes!
Every released version thereafter* (until 7.50) is also slow.

If I mix versions to tease apart whether sending or receiving is the bottleneck...
6.40 sending to 6.45 - Instantaneous
6.45 sending to 6.40 - Four minutes
So 6.45 can receive quickly, but it is slow to send.

@Vhati
Copy link
Author

Vhati commented Oct 4, 2017

Mixed test for the latest working release.
6.40 sending to 7.50 - Instantaneous
7.50 sending to 6.40 - Four minutes
Same problem, slow to send.

@dmiller-nmap
Copy link

possibly related #1025

@dmiller-nmap
Copy link

It sounds like it only affects the client/connect code, not the server/listen code, right? Can you try running the transfer in the opposite direction and let us know the times when 7.50 is the client and when 6.40 is the client? Example:

ncat -l --send-only 5000 < in.bin
ncat --recv-only 127.0.0.1 5000 > out.bin

I'm going to try it here on Linux and see if I get similar results.

@dmiller-nmap
Copy link

Current ncat on Linux does not have this issue with localhost. Have not tested over the network yet.

@hdoreau The only change in the Changelog between 6.40 and 6.45 that might affect this is r31737 (853aaff), Manage expiration timers via a heap queue. Can you think of any reason this would have bad performance on Windows?

@Vhati
Copy link
Author

Vhati commented Oct 4, 2017

Can you try running the transfer in the opposite direction and let us know the times

ncat750 -l --send-only 5000 < in.bin
ncat640 --recv-only 127.0.0.1 5000 > out.bin
Four minutes

ncat640 -l --send-only 5000 < in.bin
ncat750 --recv-only 127.0.0.1 5000 > out.bin
Instantaneous

@Vhati
Copy link
Author

Vhati commented Oct 4, 2017

Unless I'm mixed up, I think 7.50 is just slow sending, regardless of whether it connected/listened.

@Vhati
Copy link
Author

Vhati commented Oct 4, 2017

I tried piping an endless stream.
python -c "while(True): print 'z'" | ncat750 -l --send-only 5000
ncat640 --recv-only 127.0.0.1 5000 > out.bin
Slow
(spot checked growth of out.bin, ex: being 64k after a second)

Whether it's a filesystem redirect or a pipe, sending is still slow.

@hdoreau
Copy link

hdoreau commented Oct 5, 2017

Hello, I cannot think of anything in the switch from lists to heap in timer management that would cause such a slowdown, esp. since the problem appears to be windows-specific.

I have no windows machine to test on. Vhati, can you reproduce both normal & slow cases with nsock logging level set to the maximum? I believe this may help...

@Vhati
Copy link
Author

Vhati commented Oct 5, 2017

@hdoreau

can you reproduce both normal & slow cases with nsock logging level set to the maximum?

I wasn't sure what you meant, so I used ncat's -vvvv arg.
Logs are zipped and attached below.

Create a 256 KB file.
Still noticably slow at 7 seconds but less log spam.

python -c "print 'z' * (1024*256)" > in.bin

Test 01 (Transfer from 6.40 to 6.40, sender connects, fast):

ncat640 -vvvv -l --recv-only 5000 > out.bin 2>test_01-listen.txt

ncat640 -vvvv --send-only 127.0.0.1 5000 < in.bin 2>test_01-connect.txt

Test 02 (Transfer from 7.50 to 7.50, sender connects, slow):

ncat750 -vvvv -l --recv-only 5000 > out.bin 2>test_02-listen.txt

ncat750 -vvvv --send-only 127.0.0.1 5000 < in.bin 2>test_02-connect.txt

Test 03 (Transfer from 6.40 to 7.50, sender connects, fast):

ncat750 -vvvv -l --recv-only 5000 > out.bin 2>test_03-listen.txt

ncat640 -vvvv --send-only 127.0.0.1 5000 < in.bin 2>test_03-connect.txt

Test 04 (Transfer from 6.40 to 7.50, sender listens, fast):

ncat640 -vvvv -l --send-only 5000 < in.bin 2>test_04-listen.txt

ncat750 -vvvv --recv-only 127.0.0.1 5000 > out.bin 2>test_04-connect.txt

Test 05 (Transfer from 7.50 to 6.40, sender listens, slow):

ncat750 -vvvv -l --send-only 5000 < in.bin 2>test_05-listen.txt

ncat640 -vvvv --recv-only 127.0.0.1 5000 > out.bin 2>test_05-connect.txt

Summary for comparison...
7.50 slow as listening-receiver in test 02.
7.50 fast as listening-receiver in test 03.
7.50 fast as connecting-receiver in test 04.

7.50 was slow as connecting-sender in test 02.
7.50 was slow as listening-sender in test 05.
7.50 didn't have any fast sender log to compare.

ncat_issue_1026_transfer_rate_tests.zip

@hdoreau
Copy link

hdoreau commented Oct 5, 2017

thanks, I would need exactly this but with -vvvvvvv (that's 7 v's) instead so as to get as much information as possible from nsock.

@Vhati
Copy link
Author

Vhati commented Oct 5, 2017

@hdoreau

I would need exactly this but with -vvvvvvv (that's 7 v's) instead

Ha! Okay, here ya go.

ncat_issue_1026_transfer_rate_tests_verbose.zip

@dmiller-nmap
Copy link

My first observation is that timeouts are reported in an odd way. Note that the READ events have very odd timeout values:

libnsock nsock_pool_add_event(): NSE #18: Adding event (timeout in 327100488ms)

This is because in event_new, if the timeout passed in is -1, then the event's timeout is not changed from 0. The nsock_pool_add_event and process_event functions simply subtract the current time from the timeout time. On Linux, this ends up with a large negative value; I'm not sure why it's positive on Windows. The actual timeout check in event_timedout checks for a 0 timeout, so everything works fine, but the debug output is wrong.

Sorry for the distraction, but I hadn't found anything relevant to the transfer speed yet.

@Vhati
Copy link
Author

Vhati commented Feb 18, 2019

ncat 7.70 is affected.
No new info to report. Just confirming the bug hasn't gone away.

I subjected the current version to Test 02. Same OS, same payload.
(Transfer from 7.70 to 7.70, sender connects, slow)

Ncat: 262146 bytes sent, 0 bytes received in 7.36 seconds.

Logs look the same, including having weirdly huge timeouts, through the timeout values varied (29672103ms).

@benrg
Copy link

benrg commented Jul 30, 2019

I am having the same problem sending from a Windows 10 machine to a Windows 7 machine, with very similar speeds. Some observations:

  • Downgrading to version 6.40 did NOT fix the problem. Everything below applies to 7.70 and 6.40 (the only two versions I tested).

  • Replacing ncat on the sending side with a trivial Python script that calls stdin.read and sock.sendall in a loop, while still using ncat on the receiving side, did fix the problem.

  • Using ncat on the sending side, and replacing the receiving side with a Python script that records the time of each received packet, I see that I'm getting exactly one 4608-byte packet every 125ms (±0.3ms). This can't be an accident.

  • 9 MiB / (4608 B / 125 ms) = 4 minutes and 16 seconds, which is suspiciously close to the 4 minutes that Vhati said it took to send that 9 MB file.

@tjoanp
Copy link

tjoanp commented Nov 16, 2019

My experience is that 6.40 on the connecting side works flawlessly, and 6.45 and upwards is slow.

I ran some tests on localhost (Win10 1903) using 6.40, 6.45 and 7.80 to compare. Run 1, 2 and 3 uses 6.40 to connect, all of them transfers a 1 MB file in around 0.15 seconds. All other tests took around 31 seconds.

Please let me know if there are other tests and logs that might help taking this issue to a solution.

ncat_transfer_tests.zip

@tjoanp
Copy link

tjoanp commented Oct 17, 2020

Just tried 7.91, and the issue is still not resolved.

Receiver: ncat -vvvvvvv -l -p 31337 > out 2>run10_ncat791_l_out.txt
Sender: ncat -4 -vvvvvvv --send-only localhost < in 2>run10_ncat791_send-only_in.txt

This is on Win10 2004 (10.0.19041.450)

Logfiles attached.
ncat791_transferlog_1MB_file.zip

@electron4692
Copy link

I have the same problem. PC1: Windows 10 1709. PC2: Windows 10 20H2.

nmap 6.40

PC1: ncat -luv 192.168.1.67 2023 > file
PC2: ncat --send-only -uv 192.168.1.67 2023 < file
0,4 sec for 8549 KByte -> 167 MBit/sec

PC2: ncat -luv 192.168.1.11 2023 > file
PC1: ncat --send-only -uv 192.168.1.11 2023 < file
0,4 sec for 8549 KByte -> 167 MBit/sec

nmap 7.91

PC1: ncat -l -u 192.168.1.66 2023 > file
PC2: ncat --send-only -u 192.168.1.66 2023 < file
42 seconds for 1,31 MB -> 32 KB/sec = 255 KBit/sec

It does not make a difference whether I start ncap as administrator or not. For my use-case the high data rate is necessary.

@pplupo
Copy link

pplupo commented May 31, 2021

I have the same problem here. :-( The only solution was to downgrade to 6.40. I have tried versions 6.45, 6.46, 6.47, 7.00, and 7.91. The problem definitely started after 6.40 and is still happening.

@tjoanp
Copy link

tjoanp commented Aug 26, 2021

Just tried 7.92. Still same slow speed.

$ ncat --send-only -v localhost < file
Ncat: Version 7.92 ( https://nmap.org/ncat )
Ncat: Connected to ::1:31337.
Ncat: 325958 bytes sent, 0 bytes received in 9.30 seconds.

@scriptjunkie
Copy link

I was trying to transfer a bunch of files between Windows boxes with ncat today. It's really slow. This has bugged me for years but usually the transfer just finishes and I go about my life. Today I had a really big transfer though. So while I was waiting for it to run, I started digging. The experiments in this bug report suggest it's a bug on the sending end. I started to look at the ncat source code and saw that rather than use a read/write loop like I'd expect, it's using an event-based loop. But after going like 4 functions down I tabled reading source. @benrg's comment seems very enlightening (fixed buffer every 125ms) so I'm assuming it's blocking on something for 125ms. I first looked at procmon traces but they didn't show me any promising call stacks, so I attached windbg while ncat.exe was still running, assuming breaking all threads when the debugger attaches at essentially a random time would probably land while the relevant thread is blocked and ncat is unnecessarily waiting. There are several threads blocked, so I pick the thread that I saw active in sending/receiving in the procmon events, assuming that's the relevant one. That thread is blocked in a wait call that traces back to a select(). Which is set for 0x1e848 timeout (125000us = 125ms)

Back to the source looking for select takes me to the fselect function in nbase: https://github.com/nmap/nmap/blob/master/nbase/nbase_misc.c#L446

I see it's waiting on reading from the socket, so I try the whole thing again with --send-only but still see the select called. This time there's no FD's in the read set, but the socket is still in the write set and exception set. And the call seems to take the full timeout before returning. Windows can't select on stdin so it's been excluded from the select and select does not quickly return since it does not see waiting data on stdin. Only after the select returns is waiting data on stdin checked, via PeekNamedPipe in win_stdin_ready.

So I opened up the memory window to confirm, and edited the instruction setting the timeout to write 0x00e848 (removing the one), resumed, and the transfer immediately began running at double its previous rate. I broke in again, edited out the next byte so the delay was only 0x48 (72us) and the transfer speeds absolutely skyrocketed, completing the transfer at likely 1000 times as fast as before.

SOOOOOO, as I see it we have two options:

  1. Change the timeout on the select on Windows to something very small, like 100us. CPU's are fast, this loop is pretty tight, and the Windows code isn't going to run on an rpi or something that's as power/load sensitive. Or change your timeout to a less extreme 50ms but make the buffer you read and send 100x bigger. But this still feels dirty.
  2. Move the win_stdin_ready first, before the select. It should not block (assuming another thread isn't doing something nuts with stdin, which they shouldn't be, and even if they did, you'd still end up calling it anyway) and if it returns true, run the select with the timeout to zero or near-zero instead of 125ms.

@fyodor
Copy link
Member

fyodor commented Jan 16, 2022

Thanks @scriptjunkie . That's useful data. We're a bit swamped with important Nmap (and included tools like Ncat) tasks, but I'm bumping up the priority on this one.

scriptjunkie added a commit to scriptjunkie/nmap that referenced this issue Jan 16, 2022
@scriptjunkie
Copy link

scriptjunkie commented Jan 17, 2022

I have a PR in (#2426 ) that fixes the issue, and a second (#2427 ) that includes an additional fix I needed (adding an explicit cast) to get it to build in VS 2019 which I was then able to build and test and verify it fixed the bug. Feel free to merge either, or treat them as 2 successive and separate merges in case you prefer the separation.

@pplupo
Copy link

pplupo commented Sep 8, 2022

Who do we need to reach out to see these PR merged?
@cldrn @fyodor @hsluoyz

@frederik-hoeft
Copy link

I have a PR in (#2426 ) that fixes the issue, and a second (#2427 ) that includes an additional fix I needed (adding an explicit cast) to get it to build in VS 2019 which I was then able to build and test and verify it fixed the bug. Feel free to merge either, or treat them as 2 successive and separate merges in case you prefer the separation.

Both of these PRs have been closed without merging. Is there any update on this? These slow transfer rates have been an issue for years now on windows.

@fyodor
Copy link
Member

fyodor commented Dec 8, 2022

Hi @frederik-hoeft . I think they were merged in commits 8dbefeb and c5a7c60 and thus they should be included in the next Nmap release. Am I missing something?

@frederik-hoeft
Copy link

@fyodor ahh okay nevermind, wasn't familiar to the way you guys merge branches, only saw that the PRs were both closed 🙈
Glad to see these improvements finally coming :)

@KirillYatsenko
Copy link

KirillYatsenko commented Mar 24, 2023

I've build the ncat statically with these commits, and it helped me.
The perfomance drastically improved on Windows 10.
Thanks guys!

@fyodor
Copy link
Member

fyodor commented Jul 20, 2023

Thanks for the update, folks. And especially thanks to everyone who contributed and evaluated and worked on the patches. I'm closing this issue since it seems to be resolved as of the recent Nmap 7.94 release.

@fyodor fyodor closed this as completed Jul 20, 2023
@celan69
Copy link

celan69 commented Jul 28, 2023

I can not confirm that the ncat from Nmap 7.94 on Windows does perform any better. For me, it maxes out at 4-8 MB/sec when I send from Windows to Solaris and I obtain full network speed (~1 GBit/sec) when I send from Solaris to Windows.

I even compiled NMap 7.94 from source and backed out commit ed19887, which was not part of the original suggestion by "scriptjunkie", but found no improvement.

@celan69
Copy link

celan69 commented Jul 28, 2023

In the case Windows->Solaris the packets sent over the wire are way too small (~44% between 40 and 79 bytes, ~40% between 640 and 1279 bytes and ~15% larger than 1280 bytes). The MTU is 1500. I will investigate further.

@celan69
Copy link

celan69 commented Jul 28, 2023

I see the same 125 ms interval between sends as seen before. Most likely this has the same root cause as already analyzed and it has nothing to do with the TCP stack. In fact, sending via UDP exhibits the same behaviour. But why does the provided fix (which is in the code in 7.94) not work?

@roelandschoukens
Copy link

sending data with ncat on Windows is still incredibly slow with 7.95, it is literally hundreds of times slower compared to other tools like scp — only achieving about 200 kB/sec to another computer on the LAN.

@celan69
Copy link

celan69 commented Jul 11, 2024

@roelandschoukens
Have a look at https://github.com/vi/websocat, it performs pretty good on windows and might fight your requirements.

@mweeksaurora
Copy link

sending data with ncat on Windows is still incredibly slow with 7.95, it is literally hundreds of times slower compared to other tools like scp — only achieving about 200 kB/sec to another computer on the LAN.

@roelandschoukens are you getting CPU usage equivalent to about 1 CPU core being pegged on the sender? I'm still seeing 7.95 run as fast as, well, polling with about 8k buffers will run. Ncat will certainly still be significantly slower than tools that do things like optimize with sendfile/TransmitFile functionality on sending end, much larger receive buffers, etc. but to match those that would require some more architectural changes and complexity. I don't know if the maintainers would want to take that on, but I would think it would be more welcome as a well thought out patch or at least a different issue.

@pplupo
Copy link

pplupo commented Jul 12, 2024

sending data with ncat on Windows is still incredibly slow with 7.95, it is literally hundreds of times slower compared to other tools like scp — only achieving about 200 kB/sec to another computer on the LAN.

@roelandschoukens are you getting CPU usage equivalent to about 1 CPU core being pegged on the sender? I'm still seeing 7.95 run as fast as, well, polling with about 8k buffers will run. Ncat will certainly still be significantly slower than tools that do things like optimize with sendfile/TransmitFile functionality on sending end, much larger receive buffers, etc. but to match those that would require some more architectural changes and complexity. I don't know if the maintainers would want to take that on, but I would think it would be more welcome as a well thought out patch or at least a different issue.

The problem is that it is still hundreds of times slower than ncat v. 6.40. The problem was simply not resolved. It's marked as resolved, but any test will show you it's absolutely not.

I would definitely reopen it if I had permission to do it because it's NOT FIXED.

@fyodor
Copy link
Member

fyodor commented Jul 12, 2024

Thanks folks. I'm sorry to hear that people are still experiencing the slow transfer times on Windows. I'm re-opening the issue so we can further investigate.

@fyodor fyodor reopened this Jul 12, 2024
@roelandschoukens
Copy link

roelandschoukens commented Jul 17, 2024

Hi, I don't have a core pegged.

I did some quick investigation, and, since my use case involves reading stdin, it is hitting the 125ms timeout in nbase_misc.c. Making the timeout smaller makes the transfer much faster. The patch above doesn't solve the problem because win_stdin_ready quite often returns false.

@roelandschoukens
Copy link

roelandschoukens commented Jul 17, 2024

After some tests, it seems CreatePipe by default creates a buffer of only a few kilobytes, if I pass a larger buffer size hitting the timeout is almost entirely avoided.

I don't know whether this has other downsides, it seems if input arrives byte per byte it still gets sent.

diff --git a/nbase/nbase_winunix.c b/nbase/nbase_winunix.c
index 3445480aa..f14f4d71e 100644
--- a/nbase/nbase_winunix.c
+++ b/nbase/nbase_winunix.c
@@ -152,7 +152,7 @@ int win_stdin_start_thread(void) {

     /* Create the pipe that win_stdin_thread_func writes to. We reassign the
        read end to be the new stdin that the rest of the program sees. */
-    if (CreatePipe(&stdin_pipe_r, &stdin_pipe_w, NULL, 0) == 0)
+    if (CreatePipe(&stdin_pipe_r, &stdin_pipe_w, NULL, 1024*1024*64) == 0)
         return 0;

     /* Make a copy of the stdin handle to be used by win_stdin_thread_func.  It

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests