PM: server: accept subflows #203

Closed
matttbe opened this issue Jun 11, 2021 · 4 comments

matttbe commented Jun 11, 2021

With a simple setup, we can notice that no additional subflow is accepted by the server when the 'subflow' limit is set to 1.

Setup:

ip netns add ns-a
ip netns add ns-b

ip link add ns-a-eth1 netns ns-a type veth peer name ns-b-eth1 netns ns-b
ip link add ns-a-eth2 netns ns-a type veth peer name ns-b-eth2 netns ns-b

ip -net ns-a link set lo up
ip -net ns-b link set lo up

ip -net ns-a addr add 10.0.1.1/24 dev ns-a-eth1
ip -net ns-b addr add 10.0.1.2/24 dev ns-b-eth1
ip -net ns-a addr add 10.0.2.1/24 dev ns-a-eth2
ip -net ns-b addr add 10.0.2.2/24 dev ns-b-eth2

ip -net ns-a link set ns-a-eth1 up
ip -net ns-a link set ns-a-eth2 up
ip -net ns-b link set ns-b-eth1 up
ip -net ns-b link set ns-b-eth2 up

ip -net ns-a mptcp limits set subflow 1 add_addr_accepted 1
ip -net ns-b mptcp limits set subflow 1 add_addr_accepted 1

tc -net ns-a qdisc add dev ns-a-eth1 root netem delay 250ms
tc -net ns-b qdisc add dev ns-b-eth1 root netem delay 250ms
tc -net ns-a qdisc add dev ns-a-eth2 root netem delay 250ms
tc -net ns-b qdisc add dev ns-b-eth2 root netem delay 250ms

ip -net ns-a mptcp endpoint add 10.0.2.1 dev ns-a-eth2 subflow backup
ip -net ns-b mptcp endpoint add 10.0.2.2 dev ns-b-eth2 signal backup

According to the doc, the subflow limit is only for additional subflows:

       SUBFLOW_NR
              specifies the maximum number of additional subflows
              allowed for each MPTCP connection. Additional subflows can
              be created due to: incoming accepted ADD_ADDR option,
              local subflow endpoints, additional subflows started by
              the peer.

       ADD_ADDR_ACCEPTED_NR
              specifies the maximum number of ADD_ADDR suboptions
              accepted for each MPTCP connection. The MPTCP path manager
              will try to create a new subflow for each accepted
              ADD_ADDR option, respecting the SUBFLOW_NR limit.

Here, we observe something different:

The client tries to create one additional subflow, that's good!

# ip netns exec ns-a tcpdump -i any -n -c 40 "tcp and (tcp[tcpflags] & tcp-syn) != 0"  
tcpdump: data link type LINUX_SLL2                                                                                                                                                                                                                                      
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode                                                                                                                                                                                               
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes                                                                                                                                                                                  
08:05:04.289871 ns-a-eth1 Out IP 10.0.1.1.54436 > 10.0.1.2.10000: Flags [S], seq 1743131229, win 64240, options [mss 1460,sackOK,TS val 3872195649 ecr 0,nop,wscale 7,mptcp capable v1], length 0
08:05:04.540173 ns-a-eth1 In  IP 10.0.1.2.10000 > 10.0.1.1.54436: Flags [S.], seq 1723177912, ack 1743131230, win 65160, options [mss 1460,sackOK,TS val 1672054206 ecr 3872195649,nop,wscale 7,mptcp capable v1 {0xfa2e56877b8e3909}], length 0
08:05:05.291249 ns-a-eth2 Out IP 10.0.2.1.40839 > 10.0.2.2.10000: Flags [S], seq 2350443746, win 64240, options [mss 1460,sackOK,TS val 202456646 ecr 0,nop,wscale 7,mptcp join id 0 token 0x7890d4bc nonce 0x702812f1], length 0

But it got rejected:

# ip netns exec ns-a tcpdump -i ns-a-eth2 -n -c 40                                                                                                                            
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode                                                                                                                                                                                               
listening on ns-a-eth2, link-type EN10MB (Ethernet), snapshot length 262144 bytes                                                                                                                                                                                       
07:59:20.853254 IP 10.0.2.1.50923 > 10.0.2.2.10000: Flags [S], seq 1219028326, win 64240, options [mss 1460,sackOK,TS val 202112208 ecr 0,nop,wscale 7,mptcp join id 0 token 0x174dbcbb nonce 0xba84e555], length 0                                                     
07:59:21.103506 IP 10.0.2.2.10000 > 10.0.2.1.50923: Flags [R.], seq 0, ack 1219028327, win 0, length 0

If I remove the second endpoint on the server, I have the same situation:

# ip -net ns-b mptcp endpoint del id 1
# ip netns exec ns-a tcpdump -i ns-a-eth2 -n -c 40
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on ns-a-eth2, link-type EN10MB (Ethernet), snapshot length 262144 bytes
09:01:50.373861 IP 10.0.2.1.49885 > 10.0.1.2.10000: Flags [S], seq 2524051691, win 64240, options [mss 1460,sackOK,TS val 3488003866 ecr 0,nop,wscale 7,mptcp join backup id 1 token 0xac10dea9 nonce 0xded38634], length 0
09:01:50.960790 IP 10.0.1.2.10000 > 10.0.2.1.49885: Flags [R.], seq 0, ack 2524051692, win 0, length 0

If I increase the subflow limit to 2 on the server side, a new subflow is created:

# ip -net ns-b mptcp limits set subflow 2 add_addr_accepted 1
# ip netns exec ns-a tcpdump -i ns-a-eth2 -n -c 40
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on ns-a-eth2, link-type EN10MB (Ethernet), snapshot length 262144 bytes
09:02:34.028751 IP 10.0.2.1.42277 > 10.0.1.2.10000: Flags [S], seq 710352881, win 64240, options [mss 1460,sackOK,TS val 3488047521 ecr 0,nop,wscale 7,mptcp join backup id 1 token 0x4008562d nonce 0xf82d7342], length 0
09:02:34.278974 IP 10.0.1.2.10000 > 10.0.2.1.42277: Flags [S.], seq 472761946, ack 710352882, win 65160, options [mss 1460,sackOK,TS val 2181970087 ecr 3488047521,nop,wscale 7,mptcp join backup id 0 hmac 0xdc985ffce998cc31 nonce 0xb039852c], length 0
09:02:34.529091 IP 10.0.2.1.42277 > 10.0.1.2.10000: Flags [.], ack 1, win 502, options [nop,nop,TS val 3488048021 ecr 2181970087,mptcp join hmac 0x8732d5da3c0bf2591c677994cba9d9c6363c91a6], length 0

If I re-add the endpoint on the server still with the limit of 2 subflows:

# ip -net ns-b mptcp endpoint add 10.0.2.2 dev ns-b-eth2 signal backup
# ip netns exec ns-a tcpdump -i ns-a-eth2 -n -c 40
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on ns-a-eth2, link-type EN10MB (Ethernet), snapshot length 262144 bytes
08:06:34.662066 IP 10.0.2.1.43577 > 10.0.2.2.10000: Flags [S], seq 1752600173, win 64240, options [mss 1460,sackOK,TS val 202546017 ecr 0,nop,wscale 7,mptcp join id 0 token 0xa853124a nonce 0x9d9b6ea8], length 0
08:06:34.911930 ARP, Reply 10.0.1.2 is-at 36:07:4e:35:e1:15, length 28
08:06:34.912175 IP 10.0.2.2.10000 > 10.0.2.1.43577: Flags [R.], seq 0, ack 1752600174, win 0, length 0
08:06:35.162020 IP 10.0.2.1.48795 > 10.0.1.2.10000: Flags [S], seq 2178933091, win 64240, options [mss 1460,sackOK,TS val 3484688154 ecr 0,nop,wscale 7,mptcp join backup id 1 token 0xa853124a nonce 0xad6e6be5], length 0
08:06:35.412370 IP 10.0.1.2.10000 > 10.0.2.1.48795: Flags [S.], seq 1888284745, ack 2178933092, win 65160, options [mss 1460,sackOK,TS val 2178611220 ecr 3484688154,nop,wscale 7,mptcp join backup id 0 hmac 0x5e2fb2b44d51a1c2 nonce 0x3ff21c62], length 0

Note that if I add/remove endpoints, sometimes I have to increase the limit to >2, e.g. with 4:

# ip -net ns-b mptcp limits set subflow 4 add_addr_accepted 1
# ip netns exec ns-a tcpdump -i ns-a-eth2 -n -c 40
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on ns-a-eth2, link-type EN10MB (Ethernet), snapshot length 262144 bytes
09:08:42.811092 IP 10.0.2.1.59115 > 10.0.1.2.10000: Flags [S], seq 1098747445, win 64240, options [mss 1460,sackOK,TS val 3488416303 ecr 0,nop,wscale 7,mptcp join backup id 1 token 0x4c780941 nonce 0x2e31bdbe], length 0
09:08:42.811104 IP 10.0.2.1.51205 > 10.0.2.2.10000: Flags [S], seq 4037540084, win 64240, options [mss 1460,sackOK,TS val 206274166 ecr 0,nop,wscale 7,mptcp join id 0 token 0x4c780941 nonce 0x472dc1a8], length 0
09:08:43.066630 IP 10.0.1.2.10000 > 10.0.2.1.59115: Flags [S.], seq 542777452, ack 1098747446, win 65160, options [mss 1460,sackOK,TS val 2182338869 ecr 3488416303,nop,wscale 7,mptcp join backup id 0 hmac 0x42f909c4f917b3ab nonce 0xac1c3f8c], length 0
09:08:43.066654 IP 10.0.2.2.10000 > 10.0.2.1.51205: Flags [R.], seq 0, ack 4037540085, win 0, length 0
09:08:43.316724 IP 10.0.2.1.59115 > 10.0.1.2.10000: Flags [.], ack 1, win 502, options [nop,nop,TS val 3488416809 ecr 2182338869,mptcp join hmac 0xd3bf98959e72df4b72aad0e31a5114ee682b0ee5], length 0
09:08:43.566872 IP 10.0.1.2.10000 > 10.0.2.1.59115: Flags [.], ack 1, win 510, options [nop,nop,TS val 2182339375 ecr 3488416809,mptcp dss ack 6666903446578750322], length 0

And sometimes, even with higher limits, I don't have any new subflows on the 2nd interface:

# ip -net ns-b mptcp limits set subflow 8 add_addr_accepted 8
# ip netns exec ns-a tcpdump -i ns-a-eth2 -n -c 40
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on ns-a-eth2, link-type EN10MB (Ethernet), snapshot length 262144 bytes
09:10:41.166253 IP 10.0.2.1.58471 > 10.0.2.2.10000: Flags [S], seq 3478336444, win 64240, options [mss 1460,sackOK,TS val 206392521 ecr 0,nop,wscale 7,mptcp join id 0 token 0xa188b163 nonce 0xc8a4974e], length 0
09:10:41.416536 IP 10.0.2.2.10000 > 10.0.2.1.58471: Flags [R.], seq 0, ack 3478336445, win 0, length 0
09:10:46.286161 ARP, Request who-has 10.0.2.2 tell 10.0.2.1, length 28
09:10:46.536271 ARP, Reply 10.0.2.2 is-at 36:07:4e:35:e1:15, length 28
09:10:46.797424 ARP, Request who-has 10.0.2.1 tell 10.0.2.2, length 28
09:10:47.047545 ARP, Reply 10.0.2.1 is-at 6e:2c:26:df:2b:ca, length 28

(maybe here, the client established a second subflow over the other link because I don't see a second SYN on the second interface)
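
One way to check that guess is to list the TCP sockets in the client namespace while the connection is up. This is only a sketch: it assumes MPTCP subflows show up as ordinary TCP sockets in `ss` output, and the `run` and `show_subflows` helpers are hypothetical names that only print the command unless `DRY_RUN=0` is set.

```shell
# Print commands by default; set DRY_RUN=0 to really execute them (needs root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# List TCP sockets in a namespace, numeric (-n) and with extended info (-i),
# to see which local/remote address pairs the subflows are using.
# Assumption: subflows are visible as plain TCP sockets.
show_subflows() {
    ns=$1
    run ip netns exec "$ns" ss -tni
}

show_subflows ns-a
```

A second established socket between 10.0.1.1 and 10.0.1.2 would support the idea that the extra subflow went over the first link.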

@matttbe matttbe added the bug label Jun 11, 2021
@matttbe matttbe added this to Needs triage in MPTCP Bugs via automation Jun 11, 2021

pabeni commented Jun 15, 2021

With a simple setup, we can notice that no additional subflow is accepted by the server when the 'subflow' limit is set to 1.

Setup:

[...]

ip -net ns-a mptcp limits set subflow 1 add_addr_accepted 1
ip -net ns-b mptcp limits set subflow 1 add_addr_accepted 1

The above is buggy. Either use subflow 2 on both ends or remove one of the endpoints: 'subflow 1' means at most a single subflow will be created. With the 2 specified endpoints, the client is asked to create 2 subflows, more or less in random order.

But it got rejected:

# ip netns exec ns-a tcpdump -i ns-a-eth2 -n -c 40                                                                                                                            
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode                                                                                                                                                                                               
listening on ns-a-eth2, link-type EN10MB (Ethernet), snapshot length 262144 bytes                                                                                                                                                                                       
07:59:20.853254 IP 10.0.2.1.50923 > 10.0.2.2.10000: Flags [S], seq 1219028326, win 64240, options [mss 1460,sackOK,TS val 202112208 ecr 0,nop,wscale 7,mptcp join id 0 token 0x174dbcbb nonce 0xba84e555], length 0                                                     
07:59:21.103506 IP 10.0.2.2.10000 > 10.0.2.1.50923: Flags [R.], seq 0, ack 1219028327, win 0, length 0

I get this behavior when using 'nc' for both the server and the client. This behavior is actually the correct/expected one, as 'nc' closes the listening socket just after accepting the first connection. Additional subflows have to hit a narrow race window to be allowed.

If you are using 'nc' too, the correct fix is to append '-k' to the server command line.
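
A minimal sketch of that fix, assuming an OpenBSD-style `nc` (or `ncat`) where `-l` selects listen mode and `-k` keeps the listener open after a connection completes. Port 10000 is taken from the traces above; how `nc` is made to use MPTCP is left out because the thread does not say. The `run` and `start_persistent_listener` helpers are hypothetical and only print the command unless `DRY_RUN=0` is set.

```shell
# Print commands by default; set DRY_RUN=0 to really execute them (needs root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Start the server in ns-b with a listener that stays open, so MP_JOIN SYNs
# arriving after the first accept() still find a listening socket.
start_persistent_listener() {
    port=$1
    run ip netns exec ns-b nc -k -l "$port"
}

start_persistent_listener 10000
```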

matttbe commented Jun 15, 2021

The above is buggy. Either use subflow 2 on both ends or remove one of the endpoints: 'subflow 1' means at most a single subflow will be created.

We should probably update the doc because it says the subflow limit is only for "additional" subflows. So if I want to have 2 subflows in total, I should set 1, no?

This behavior is actually the correct/expected one, as 'nc' closes the listening socket just after accepting the first connection. Additional subflows have to hit a narrow race window to be allowed.

Mmh, that's a bit annoying for apps not built for MPTCP, no? :-/
I was using mptcp_connect

pabeni commented Jun 15, 2021

The above is buggy. Either use subflow 2 on both ends or remove one of the endpoints: 'subflow 1' means at most a single subflow will be created.

We should probably update the doc because it says the subflow limit is only for "additional" subflows. So if I want to have 2 subflows in total, I should set 1, no?

Sorry, I was neither clear nor accurate. 'subflow 1' means at most a single additional subflow will be created. The endpoint configuration attempts to create 2 additional subflows.
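
To make the accounting concrete, here is a toy model (not the kernel's logic) of how the limit interacts with the setup from this issue: the client-side `subflow` endpoint and the server-side `signal` endpoint each trigger one join attempt, so two additional subflows are attempted against a limit of 1. The function name `accepted_joins` is hypothetical, and the model assumes a listener is still open and that attempts beyond the limit are simply refused.

```shell
# Toy model only: how many additional subflows survive a given 'subflow' limit.
# accepted_joins LIMIT ATTEMPTS -> prints min(LIMIT, ATTEMPTS)
accepted_joins() {
    limit=$1
    attempts=$2
    if [ "$attempts" -lt "$limit" ]; then
        echo "$attempts"
    else
        echo "$limit"
    fi
}

# Setup from this issue: 2 join attempts against 'subflow 1'.
# The total includes the initial subflow, which the limit does not count.
extra=$(accepted_joins 1 2)
echo "additional=$extra total=$((extra + 1))"
```

With `subflow 2` on both ends, the same model lets both attempted joins through, which matches the suggestion to raise the limit or drop one endpoint.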

This behavior is actually the correct/expected one, as 'nc' closes the listening socket just after accepting the first connection. Additional subflows have to hit a narrow race window to be allowed.

Mmh, that's a bit annoying for apps not built for MPTCP, no? :-/

Well, it depends on the specific app. Most [all?] applications accepting connections keep the listener socket open. If someone wants to enable MPTCP on an unmodified application that closes the listener socket after the first connection, port-based endpoints come to the rescue.
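
A hedged sketch of the port-based endpoint idea, using the `port` keyword of `ip mptcp endpoint add` on a `signal` endpoint. The port number 10001 is an arbitrary example, not from this thread, and the `run` and `add_port_endpoint` helpers are hypothetical (they only print commands unless `DRY_RUN=0` is set). The assumption is that the announced address then carries that port and joins arrive there instead of on the application's own listening port.

```shell
# Print commands by default; set DRY_RUN=0 to really execute them (needs root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Replace the server's endpoints with a single port-based 'signal' endpoint.
# add_port_endpoint ADDR PORT DEV
add_port_endpoint() {
    addr=$1
    port=$2
    dev=$3
    run ip -net ns-b mptcp endpoint flush
    run ip -net ns-b mptcp endpoint add "$addr" port "$port" dev "$dev" signal
}

add_port_endpoint 10.0.2.2 10001 ns-b-eth2
```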

I was using mptcp_connect

Same problem as with 'nc'. It can be solved by adding '-j' to the command line.

fengguang pushed a commit to 0day-ci/linux that referenced this issue Jan 12, 2022
When ADD_ADDR announcements use the port associated with an
active subflow, this change ensures that a listening socket is
bound to the announced address and port for subsequently
receiving MP_JOINs from the remote end. In case there's
a recorded lsk bound to that address+port, it is reused.
But if a listening socket for this address is already held by the
application then no further action is taken.

When a listening socket is created, it is stored in
struct mptcp_pm_add_entry and released accordingly.

Closes: multipath-tcp/mptcp_net-next#203

v2: fixed formatting

Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
@matttbe matttbe removed this from Needs triage in MPTCP Bugs Feb 1, 2022
@matttbe matttbe added this to To do in MPTCP Next (5.18) via automation Feb 1, 2022
@matttbe matttbe moved this from To do to In progress in MPTCP Next (5.18) Feb 1, 2022
fengguang pushed a commit to 0day-ci/linux that referenced this issue Feb 3, 2022
When ADD_ADDR announcements use the port associated with an
active subflow, this change ensures that a listening socket is bound
to the announced addr+port in the kernel for subsequently receiving
MP_JOINs. But if a listening socket for this address is already held
by the application then no action is taken.

A listening socket is created (when there isn't a listener)
just prior to the addr advertisement. If it is desired to not create
a listening socket in the kernel for an address, then this can be
requested by including the MPTCP_PM_ADDR_FLAG_NO_LISTEN flag
with the address.

When a listening socket is created, it is stored in
struct mptcp_pm_add_entry and released accordingly.

Closes: multipath-tcp/mptcp_net-next#203
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>

matttbe commented Mar 17, 2022

From @mjmartineau

Ok, after this community discussion today, I think we should go with the "userspace (mptcpd) handles the listening sockets" approach. Kishen and Ossama are on board with this too.

In addition to the points just above (complexity, possible hidden issues, flexibility), this will leave the in-kernel PM implementation as-is for now. It also remains possible to change the kernel listening behavior for MPTCP joins at some point in the future, but this approach will unblock the userspace PM.

Kishen is working on modifying the userspace PM patch set to add selftest support for these listening sockets in pm_nl_ctl. Ossama is looking at the mptcpd changes we'll need.

Paolo brought up and advocated this solution today, so it seems likely that he will concur. Considering the active participants in these discussions in recent meetings and email threads, I think we have a good consensus on this approach to the listening sockets.

So closing this ticket. Progress can be followed on: multipath-tcp/mptcpd#223
