Add write socket buffer to fix CoPP rx performance issue #1092

Closed
@cytsai0409 (Contributor) commented Oct 30, 2017

Add write socket buffer to fix CoPP rx performance issue

- What I did

Add write socket buffer to fix packet drop issue in ptf_nn_agent.py

- How I did it

Add a set command to the file build_debian.sh

- How to verify it

Run the CoPP tests in the testbed and verify that they pass

- Description for the changelog

Add write socket buffer to fix CoPP rx performance issue

@cytsai0409 changed the title from "Add write socker buffer to fix CoPP rx performance issue" to "Add write socket buffer to fix CoPP rx performance issue" on Oct 31, 2017

@cytsai0409 (Contributor Author)

More detailed discussion:
sonic-mgmt#308

@pavel-shirshov (Contributor)

Hi Jason,

Thank you for your PR.

I'd like to understand the reason for increasing the write buffer.

I increased the read buffer previously because ingress packets arrive at the ASIC line rate, which a Python reader cannot keep up with. So I increased the socket memory buffer to ask the Linux kernel to hold unread packets for ptf_nn_agent.

In your case, by contrast, increasing the write buffer implies that your application writes packets at a rate higher than your NIC/ASIC can handle. But here the writer is Python and the consumer is the Linux kernel plus the ASIC, and as I understand it, Python is much slower than the Linux kernel + ASIC.

Can you please explain how your change helps? I think increasing the read buffer is enough.

Thanks

@pavel-shirshov (Contributor) left a comment

I'm not requesting any changes. I'd just like to understand why these changes were proposed.

@cytsai0409 (Contributor Author)

Hi, Pavel:

Thanks for your review.

In the CoPP tests, ptf_nn_agent sends packets back from the DUT to the test server so that matched packets can be counted, and an insufficient write socket buffer causes ingress packet drops on the DUT.

If ptf_nn_agent did not send packets back from the DUT to the test server, I think increasing the read socket buffer on the DUT would be enough. However, ptf_nn_agent on the test server requires this.

When sending packets back to the testbed server, I think the bottleneck is between Python and the Linux kernel, not on the ASIC.

For example, the CoPP test for DHCP sends 100000 DHCP packets from the test server to the DUT without a CoPP policy applied and expects a low packet loss rate (<10%) on the DUT.

Before increasing the write socket buffer, the CoPP test for DHCP failed because packet loss was higher than expected.

  • Ex: 2017-10-20 01:28:20 : total_rcv_pkt_cnt (83072) > pkt_rx_limit (90000): False

After increasing the socket buffer, the CoPP test for DHCP passed.

  • Ex: 2017-10-20 01:46:07 : total_rcv_pkt_cnt (98733) > pkt_rx_limit (90000): True
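
The two log lines can be turned into loss rates with a short calculation. This is a minimal sketch, assuming the 100000 packets sent and the 90000-packet `pkt_rx_limit` quoted above; `loss_rate` and `passes` are hypothetical helper names for illustration, not functions from the CoPP test.

```python
# Hypothetical helpers; the numbers come from the log lines quoted above.
PKTS_SENT = 100000      # DHCP packets sent by the test server
PKT_RX_LIMIT = 90000    # pkt_rx_limit from the log, i.e. at most 10% loss


def loss_rate(received, sent=PKTS_SENT):
    """Fraction of sent packets that were not counted as received."""
    return (sent - received) / sent


def passes(received, limit=PKT_RX_LIMIT):
    """Mirror the logged check: total_rcv_pkt_cnt > pkt_rx_limit."""
    return received > limit


for label, received in (("before", 83072), ("after", 98733)):
    print(f"{label}: loss={loss_rate(received):.2%} pass={passes(received)}")
```

With these inputs the loss works out to roughly 17% before the change and a little over 1% after it.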

Below is how we increased the write buffer in the Linux kernel and in Python.

To increase the write buffer limit in the Linux kernel, we added the following line to the file /etc/sysctl.conf on the DUT

  • net.core.wmem_max = 2097152

To increase the write buffer in Python, we added the options below to the ptf_nn_agent command on the DUT

  • python ptf_nn_agent.py --device-socket 1@tcp://[DUT_MGMT_IP]:10900 -i 1-3@Ethernet12 --set-nn-rcv-buffer=10000000 --set-iface-rcv-buffer=10000000 --set-nn-snd-buffer=10000000 --set-iface-snd-buffer=10000000

In our testing, the CoPP test passes only when the write buffers in both the Linux kernel and Python are increased.
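
One likely reason both settings matter: on Linux, a send buffer size requested with `SO_SNDBUF` is capped by `net.core.wmem_max`, so a large per-socket request has no effect until the sysctl limit is raised as well. The sketch below is Linux-only and purely illustrative. `read_wmem_max` and `effective_sndbuf` are hypothetical helpers, and whether ptf_nn_agent applies its `--set-*-snd-buffer` options through `SO_SNDBUF` in this way is an assumption.

```python
import socket

WMEM_MAX_PATH = "/proc/sys/net/core/wmem_max"  # Linux-only location


def read_wmem_max(path=WMEM_MAX_PATH):
    """Return net.core.wmem_max in bytes, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def effective_sndbuf(requested):
    """Request a send buffer size and return what the kernel reports back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


if __name__ == "__main__":
    requested = 10000000  # value passed to --set-iface-snd-buffer above
    print("wmem_max:", read_wmem_max())
    print("requested:", requested, "granted:", effective_sndbuf(requested))
```

Linux reports double the stored value from `getsockopt`, so a granted size of about twice `wmem_max` indicates that the request was capped.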

@pavel-shirshov (Contributor)

Honestly, I don't understand how the Linux kernel could be slower than Python.
How does the write buffer help in your case?

@cytsai0409 (Contributor Author)

Fixed this issue by increasing the read socket buffer instead of the write buffer:

  1. sysctl.conf: net.core.rmem_max=109430400
  2. ptf_nn_agent.conf: --set-nn-rcv-buffer=109430400 --set-iface-rcv-buffer=109430400

See more discussion: sonic-mgmt#308
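
The reasoning behind a read-buffer fix (packets arriving faster than a slow reader drains them) can be illustrated with a loopback experiment. This is a rough sketch and not ptf_nn_agent code. `burst_received` is a hypothetical helper, the packet count and sizes are arbitrary, and the buffer size actually granted is still capped by `net.core.rmem_max`.

```python
import socket


def burst_received(rcvbuf, count=500, size=1000):
    """Send a burst of UDP datagrams to a socket that is not being read,
    then count how many the kernel kept in its receive buffer."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Request the receive buffer size before any traffic arrives.
        rx.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
        rx.bind(("127.0.0.1", 0))
        payload = b"x" * size
        for _ in range(count):
            tx.sendto(payload, rx.getsockname())
        # Drain whatever survived; datagrams that did not fit were dropped.
        rx.setblocking(False)
        received = 0
        while True:
            try:
                rx.recv(size)
                received += 1
            except BlockingIOError:
                break
        return received
    finally:
        rx.close()
        tx.close()


if __name__ == "__main__":
    for rcvbuf in (4096, 200000):
        print(f"rcvbuf={rcvbuf}: kept {burst_received(rcvbuf)} of 500")
```

A larger receive buffer should let the unread socket keep more of the burst, which is the effect the `rmem_max` and `--set-*-rcv-buffer` settings rely on.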

@cytsai0409 closed this on Nov 22, 2017
lguohan pushed a commit that referenced this pull request Oct 20, 2019
Sonic-swss-common:

aaa8133 - 2019-10-12 : Add VRF object table in state_db (#312) [Tyler Li]
91aceb1 - 2019-10-11 : [schema] Update schema to support debug counters (#308) [Danny Allen]
9bcd5ca - 2019-09-28 : [multi-DB] fix vs test, should NOT replace old DBConnector API with new DBConnector API since vs test docker has no database_config.josn (#311) [Dong Zhang]
599155a - 2019-09-25 : [multi-DB] Part 2: C++ interface API changes / swsscommon unit test / LOGLEVEL_DB apply new API (#301) [Dong Zhang]
379ac73 - 2019-09-20 : add bulkremove for consumer_table_pops.lua (#306) [Dong Zhang]
6b805d3 - 2019-09-19 : timerfd return 0 with errno =0 - handle as False alarm. (#302) [Renuka Manavalan]
e455891 - 2019-09-03 : Add VLAN_SUB_INTERFACE in CONFIG_DB schema (#284) [Wenda Ni]

Sonic-swss:

731a8f5 - 2019-10-17 : [copporch]: fix the endless loop problem when removing copp table group. (#1038) [wangshengjun]
1623219 - 2019-10-14 : Enable C++ unit test during build (#1092) [Qi Luo]
629c9d3 - 2019-10-14 : [vstest]: Revert back to 2 sec, and check if we got more than expected number of syslogs (#1091) [Prince Sunny]
80b2ace - 2019-10-11 : sonic-swss/orchagent: Add new protocol trap name support (#1087) [jpxjlrldgit]
9f765f7 - 2019-10-11 : [aclorch]: Check for existing mirror table only when creating a new table (#1089) [Danny Allen]
4c10260 - 2019-10-11 : [vstest]: Update Route test to check for added entry (#1088) [Prince Sunny]
e658b64 - 2019-10-11 : [chassisorch]: Add everflow feature for chassis (#1024) [Ze Gan]
5b13387 - 2019-10-10 : [changelog]: Revert changelog that was done for passing VS test. (#1080) [Prince Sunny]
90a690d - 2019-10-10 : [aclorch]: Simplify the TCP flags matching code and support exact value match (#1072) [Shuotian Cheng]
3461710 - 2019-10-09 : Single VRF for ingress and egress flows, skip route replication (#1045) [Prince Sunny]
953474a - 2019-10-03 : [swss]: Do not use namespace in header files (#1081) [Wenda Ni]
bd36751 - 2019-10-03 : Change nexthop key to ip & ifname (#977) [tylerlinp]
fee1aaa - 2019-10-02 : [teamsyncd]: Check if LAG exists before removing (#1069) [Shuotian Cheng]
175f3de - 2019-09-30 : Update ECMP NHopGroup for Port Channel oper down (#1030) [Sumukha Tumkur Vani]
182940d - 2019-09-26 : [mirrororch]: Remove mirror session state after it is remvoed (#1066) [Shuotian Cheng]
d823dd1 - 2019-09-20 : [MirrorOrch]: Mirror Session Retention across Warm Reboot (#1054) [Shuotian Cheng]
a5b6e7c - 2019-09-19 : Ignore link local neighbors (#1065) [Prince Sunny]
0ddaba3 - 2019-09-19 : Adopt to signature change of Selectable::readData, which switched (#1061) [Renuka Manavalan]
543bd98 - 2019-09-18 : [aclorch]: Fix table name in counter table for mirror rules (#1060) [Shuotian Cheng]
12c29b4 - 2019-09-19 : Cannot ping to link-local ipv6 interface address of the switch. (#774) [Kiran Kumar Kella]
4d8e08d - 2019-09-18 : change in fpmsyncd to skip the lookup for the Master device name if the route object table value is zero (#1048) [Arvindsrinivasan Lakshmi narasimhan]
da514f5 - 2019-09-18 : Do not update lag mtu from teamsyncd (netlink) (#1053) [Prince Sunny]
3fb22e1 - 2019-09-16 : Check warmboot flag during initialization (#1057) [Prince Sunny]
d98d1e9 - 2019-09-16 : [aclorch]: Egress mirror action support and action ASIC support check (#963) [Stepan Blyshchak]
313ef5c - 2019-09-09 : Warmboot Vlan neigh restore fix (#1040) [Prince Sunny]
5841e06 - 2019-09-06 : Add dot1p to tc mapping support (#871) [Wenda Ni]
39fe568 - 2019-08-30 : [aclorch]: Revise ACL rule creation/removal logs (#1042) [Shuotian Cheng]
c461911 - 2019-08-27 : [copporch]: Fix the typo - mld_v1_done (#1037) [wangshengjun]
34915de - 2019-08-22 : [portsyncd]: Add default catch block in portsyncd (#1033) [SuvarnaMeenakshi]
dc81a21 - 2019-08-20 : [vnet]: Fix FDB related failure in "vnet_bitmap" virtual switch test (#1034) [Volodymyr Samotiy]
5ae4226 - 2019-08-19 : [test]: Adjust stale timer for warm-reboot neighborsync test cases (#1031) [zhenggen-xu]
65cbd55 - 2019-08-16 : [build]: Fix compiling warnings using ARM 32 bit compiler (#1015) [arheneus@marvell.com]
b611808 - 2019-08-16 : [Orchagent]: Fixbug segmentfault at routeorch (#1025) [Ze Gan]
madhanmellanox pushed a commit to madhanmellanox/sonic-buildimage that referenced this pull request Mar 23, 2020
* Enable test during build
* Exclude `tests` in the deb package
dmytroxshevchuk pushed a commit to dmytroxshevchuk/sonic-buildimage that referenced this pull request Aug 31, 2020
- Update SAI VoQ support (sonic-net#1107) …
- Voq system (sonic-net#1081) …
- [meta] Add support for ignored enum values (sonic-net#1099)
- TPID SAI proposal (sonic-net#1089) …
- ACL GRE key match (sonic-net#1076) …
- Add IPv6 NS and NA Traps (sonic-net#1092) …
- MACsec flow list attribute added in MACsec object (sonic-net#1095) …
- Add Enterprise Number for IPFIX Report Type (sonic-net#1072) …
- Provide TTL and QoS treatment during MPLS encap and decap (sonic-net#1079)
- Create and Set for Tunnel Attributes (sonic-net#1086) …