Lost timestamp packets at high data rates #91
It shouldn't lose packets in that scenario. I can't reproduce this with the l3-load-latency example, which still works at full line rate as expected. Command line: ./build/MoonGen examples/l3-load-latency.lua 14 15 0 1 60
Can you post the script you are using?
BTW: what you are trying to do is probably not a good idea. Latency at full line rate is often problematic: every small pause causes buffers to fill up, and these buffers can never be emptied because packets keep arriving at the same rate at which you can send them out. After a short time, the latency is just a function of the buffer size.
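To put a rough number on that (the buffer size here is an assumed example, not a measured value): with a 512 KiB receive buffer on a 10 Gbit/s link, a full buffer adds 512 * 1024 * 8 bit / 10^10 bit/s ≈ 0.42 ms, so once the buffer has filled up every packet sees roughly that delay regardless of the actual forwarding latency.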
I do reproduce the bug with the example script:
Did you update to 6d6cc3b or later? Try to update or use the default packet size (124).
Thank you for your answers by the way :)
Whoops, no, I was not at the latest version. It seems to work better now!
I still have problems with my own script but not with the example. I will see what you have changed. Thank you.
My recent changes shouldn't affect your code.
Indeed they do not. I see that you use UDP packets instead of PTP packets in your example. Is there a reason to prefer one over the other? While using measureLatency, I have also noticed that you do not set pktLength in the UDP or PTP packets, resulting in malformed packets that still work but can be misinterpreted by some equipment.
It doesn't really matter here since we don't modify any of the PTP fields. The timestamper uses a PTP packet internally. You are right, the timestamper could set the size, which would avoid problems when someone sets the wrong size (note: my script does that in fillPacket()).
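For illustration, a minimal fill function that sets the length explicitly could look roughly like this (a sketch following the public MoonGen examples; accessor names may differ between versions, and the size, MAC, IP, and port values below are placeholders):

```lua
local PKT_SIZE = 124

-- Sketch: fill a UDP packet and set pktLength so the IP and UDP length fields
-- stay consistent with the actual frame size (all values are placeholders).
local function fillPacket(buf, queue)
	buf:getUdpPacket():fill{
		pktLength = PKT_SIZE,         -- sets the IPv4 total length and UDP length consistently
		ethSrc = queue,               -- use the MAC of the tx queue's port
		ethDst = "10:11:12:13:14:15",
		ip4Src = "10.0.0.1",
		ip4Dst = "10.0.0.2",
		udpSrc = 1234,
		udpDst = 319,                 -- PTP event port, matching UDP timestamping
	}
end
```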
I am trying to find the differences between your script and mine, as mine loses all timestamp packets. I see that you declare 3 TX and 3 RX queues for both devices. Is there a reason why you need more than 2 TX queues (load and timestamp sending) and 1 RX queue (timestamp reception)?
One of the queues is for ARP rx/tx.
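For reference, the queue layout could be configured roughly like this (a sketch; port numbers and the queue assignment are assumptions, and older MoonGen versions take positional arguments in device.config instead of a table):

```lua
local device = require "device"

-- Sketch: 3 tx/rx queues per port, e.g. 0 = load traffic, 1 = ARP, 2 = timestamping.
-- Port numbers are placeholders.
local txDev = device.config{port = 0, rxQueues = 3, txQueues = 3}
local rxDev = device.config{port = 1, rxQueues = 3, txQueues = 3}
device.waitForLinks()
```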
The example l3-load-latency.lua does not report lost timestamps (the number of times measureLatency returns nil). When I add that reporting, I see that it is not 0 (about 5%), even though the counts of sent and received packets are exactly identical (same problem as with my script).
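The reporting I added looks roughly like this (a sketch, not the exact code; module and method names follow the current timestamping API and may differ in older versions):

```lua
local mg = require "moongen" -- older versions: require "dpdk"
local ts = require "timestamping"

-- Sketch: count how often measureLatency() fails to return a latency sample.
local function timerSlave(txQueue, rxQueue)
	local timestamper = ts:newUdpTimestamper(txQueue, rxQueue)
	local total, lost = 0, 0
	while mg.running() do
		local lat = timestamper:measureLatency()
		total = total + 1
		if not lat then
			lost = lost + 1 -- nil means no valid timestamp came back for this probe
		end
	end
	print(("lost timestamps: %d of %d"):format(lost, total))
end
```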
Okay, that means that timestamping fails for some reason. I guess 5% loss of timestamping information at full line rate is okay, since latency measurements at full line rate are usually pointless anyway (see my previous comment). I'll keep this issue open and have a look at the timestamping logic, which uses sequence numbers to determine whether timestamping was successful.
Indeed it is very acceptable. But I do not understand how the filtering works in your example. I believe that packets matching a filter will be sent to the chosen queue, but other packets may also be sent to that queue. In your example, you have no such rule, which should mean that you receive all packets on this core. Moreover, the filter you use (filterTimestamps) seems to match only PTP packets, not PTP/UDP packets, so it should have no effect in your case. Am I misunderstanding something?
All packets go to queue 0 by default. Only filters and RSS can redirect packets. RSS is disabled by default; it has to be enabled explicitly when configuring the device, and you would probably use a different set of queues for RSS. Regarding the timestamp filter: this can actually be improved, yes. It currently just checks the PTP version at a specific offset in IP packets (mask.only_ip_flow) and ignores the L4 protocol.
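Roughly, the steering discussed here looks like this (a sketch with assumptions: whether filterTimestamps is called on the device or on the queue, and its exact arguments, depend on the MoonGen version):

```lua
-- Sketch: send timestamp (PTP) packets to a dedicated rx queue; everything that
-- matches no filter stays on queue 0. Queue index 2 is an assumption.
local tsRxQueue = rxDev:getRxQueue(2)
rxDev:filterTimestamps(tsRxQueue)
```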
Commit 7e68758 changes the timestamp filter to check the L4 protocol.
I cannot reproduce the packet loss at full line rate. What NIC are you using? I tested this with an Intel X540.
I use an Intel 82599ES. I ran the test again and still see lost timestamps.
I cannot reproduce the packet loss at full line rate. But I'm using an Intel X540, which is basically the same NIC (the datasheets are almost identical, same driver), just with 10GBase-T instead of SFP+ and a lot of bug fixes. I would not be surprised if this is just a hardware problem in the 82599 NIC. I've seen some strange problems with that NIC that just don't happen on X540 NICs.
Hi,
I am using MoonGen to fill a 10 Gb/s link with TCP SYN flooding, and I am trying to measure the latency using the measureLatency function.
But most or all timestamp packets are lost when the load is high. I use two ports directly connected by a cable, so the loss is not caused by external equipment.
When counting packets, I see the same number of sent and received packets.
I have set up filters so that the receiving loop only gets PTP packets. This way the load it can sustain gets higher, but I still lose most or all timestamp packets at full link rate.
I do not know if packets are lost during sending or reception.
Do you have an idea of what could cause the problem?
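For reference, the load task is roughly of this shape (a sketch based on the public MoonGen examples rather than my exact script; addresses, ports, and sizes are placeholders, and module names may differ between versions):

```lua
local mg     = require "moongen" -- older versions: require "dpdk"
local memory = require "memory"

-- Sketch of a SYN-flood load slave: pre-fill TCP SYN packets and send them in batches.
local function loadSlave(queue)
	local mempool = memory.createMemPool(function(buf)
		buf:getTcpPacket():fill{
			pktLength = 60,
			ethSrc = queue,               -- MAC of the tx port
			ethDst = "10:11:12:13:14:15",
			ip4Src = "10.0.0.1",
			ip4Dst = "10.0.0.2",
			tcpDst = 80,
			tcpSyn = 1,
		}
	end)
	local bufs = mempool:bufArray()
	while mg.running() do
		bufs:alloc(60)
		bufs:offloadTcpChecksums()
		queue:send(bufs)
	end
end
```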