Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Copy out-bound data into pre-shared pages #17

Merged
merged 2 commits into from Jan 15, 2015
Merged

Conversation

@talex5
Copy link
Contributor

@talex5 talex5 commented Jan 12, 2015

The original motivation for this is to make TLS support work. Currently, TLS copies the encrypted buffers into a large Io_page, and TCP then sends segment-sized Cstruct views of this buffer to the network. If any of these overlap a page boundary, the send will fail.

With this patch, TLS can send ordinary unaligned buffers to TCP (avoiding a copy). The copy now happens in Netif.

This some other advantages:

  • It avoids splitting requests across multiple grants (before, we sent
    the IP header and payload in separate pages).
  • It avoids the security problem of sharing unrelated data that happens
    to be in the same page. Now, netback will only ever see data
    explicitly sent to it.
  • It allows the sender to reuse pages as soon as Netif.write returns.
    Before, the pages were queued and the application could change them
    even as netback was reading them. Behaviour should now be
    deterministic.
  • It's faster (132 MB/s -> 181 MB/s for an x86_64 unikernel running
    under Xen in VirtualBox on my laptop).
@talex5
Copy link
Contributor Author

@talex5 talex5 commented Jan 13, 2015

Minor correction: it still doesn't allow the sender to reuse the pages because it needs to retransmit them if it gets suspended.

@hannesm
Copy link
Member

@hannesm hannesm commented Jan 13, 2015

👍

@talex5
Copy link
Contributor Author

@talex5 talex5 commented Jan 13, 2015

Performance of UDP benchmark on ARM:

  • With old code (including the wait fix in the first patch) : 13 MB/s
  • With this code : 16 MB/s

Notes:

  • These tests were compiled with Lwt tracing support, which slows things down slightly. But recompiling everything on ARM takes ages and the relative speeds should be OK.
  • I averaged fewer results for the old case, because 2 of the 9 runs caused dom0 Linux kernel panics.

The panic may be because I'm running a patched kernel to work around previous crashes, so I don't think it's worth reporting it. We should upgrade the ARM image builder to a newer stock kernel at some point. I also saw the old code once cause a dom0 kernel panic on x86_64 (which, unfortunately, I didn't capture because serial logging wasn't on), but that's also running a patched kernel.

So, I think the new code should be better for older (and possibly newer) Linux kernels too.

let rest_hd = Cstruct.shift hd first in
(n + avail, rest_hd :: tl)
) in
aux dst 0 src

This comment has been minimized.

@samoht

samoht Jan 13, 2015
Member

could be useful to upstream this into Cstruct (more generally we need more Cstruct functions working on list of Cstructs I think)

This comment has been minimized.

@talex5 talex5 mentioned this pull request Jan 15, 2015
14 of 14 tasks complete
@djs55
Copy link
Member

@djs55 djs55 commented Jan 15, 2015

Finished, 1 target (0 cached) in 00:00:00.
+ /home/travis/.opam/system/bin/ocamlfind ocamlc -c -g -annot -package cstruct -syntax camlp4o -package cstruct.syntax -package ipaddr -syntax camlp4o -package lwt.syntax -package mirage-profile -package mirage-xen -package xen-evtchn -package xen-gnt -w A-27-32-34-37 -I lib -o lib/netif.cmo lib/netif.ml
File "lib/netif.ml", line 165, characters 15-17:
Error: Unbound value |>
Command exited with code 2.
Compilation unsuccessful after building 4 targets (0 cached) in 00:00:01.
E: Failure("Command ''/usr/bin/ocamlbuild' lib/mirage-net-xen.cma lib/mirage-net-xen.cmxa lib/mirage-net-xen.a lib/mirage-net-xen.cmxs -tag debug' terminated with error code 10")
make: *** [build] Error 1
@talex5 talex5 force-pushed the talex5:page-pool branch from b9ee5cd to 7d2777d Jan 15, 2015
@talex5
Copy link
Contributor Author

@talex5 talex5 commented Jan 15, 2015

Are we still supporting OCaml 4.00? If not, I'll just update the Travis tests.

talex5 added 2 commits Jan 9, 2015
The original motivation for this is to make TLS support work. Currently,
TLS copies the encrypted buffers into a large Io_page and TCP sends
segment-sized Cstruct views of this buffer to the network. If any of
these overlap a page boundary, the send will fail.

With this patch, TLS can send ordinary unaligned buffers to TCP
(avoiding a copy). The copy now happens in Netif.

This some other advantages:

- It avoids splitting requests across multiple grants (before, we sent
  the IP header and payload in separate pages).
- It avoids the security problem of sharing unrelated data that happens
  to be in the same page. Now, netback will only ever see data
  explicitly sent to it.
- It allows the sender to reuse pages as soon as Netif.write returns.
  Before, the pages were queued and the application could change them
  even as netback was reading them. Behaviour should now be
  deterministic.
- It's faster (132 MB/s -> 181 MB/s for an x86_64 unikernel running
  under Xen in VirtualBox on my laptop).
@djs55
Copy link
Member

@djs55 djs55 commented Jan 15, 2015

Hm, since

mirage-xen.2.1.2/opam:ocaml-version: [>= "4.01.0" & < "4.02.0"]

I think we can drop 4.00.

On Thu, Jan 15, 2015 at 3:11 PM, Thomas Leonard notifications@github.com
wrote:

Are we still supporting OCaml 4.00? If not, I'll just update the Travis
tests.


Reply to this email directly or view it on GitHub
#17 (comment).

Dave Scott

@talex5
Copy link
Contributor Author

@talex5 talex5 commented Jan 15, 2015

OK, dropped 4.00. Travis is now passing!

@djs55
Copy link
Member

@djs55 djs55 commented Jan 15, 2015

One day we'll have a nice set of unit tests for this stuff (should be easier when I've completed functorising it like vchan)... One day

djs55 added a commit that referenced this pull request Jan 15, 2015
Copy out-bound data into pre-shared pages
@djs55 djs55 merged commit dfa574a into mirage:master Jan 15, 2015
1 check passed
1 check passed
continuous-integration/travis-ci The Travis CI build passed
Details
@talex5 talex5 deleted the talex5:page-pool branch Jan 15, 2015
@djs55
Copy link
Member

@djs55 djs55 commented Jan 15, 2015

Would a quick release of this be useful?

On Thu, Jan 15, 2015 at 4:11 PM, Thomas Leonard notifications@github.com
wrote:

OK, dropped 4.00. Travis is now passing!


Reply to this email directly or view it on GitHub
#17 (comment).

Dave Scott

@talex5
Copy link
Contributor Author

@talex5 talex5 commented Jan 15, 2015

Yeah, being able to unit-test ring code would be pretty useful! I'd hold off doing a release until #15 is fixed too (looks trivial, but someone else should confirm).

avsm added a commit to avsm/ocaml-cstruct that referenced this pull request Jan 23, 2015
```
val blitv: t list -> t -> int * t list
(** [blitv src dst] will copy the list of [src] buffers into [dst] and
    return a tuple of the total number of bytes written and a list of
    any uncopied buffers.  [blitv] will never raise an exception, since
    it returns any uncopied bytes if the [dst] buffer is not big enough
    to fit the full list of [src] buffers. *)
```

via @talex5 in mirage/mirage-net-xen#17
)
(** Copy from src to dst until src is exhausted or dst is full.
* Returns the number of bytes copied and the remaining data from src, if any. *)
(* TODO: replace this with Cstruct.buffer once that's released. *)

This comment has been minimized.

@avsm avsm mentioned this pull request Jan 24, 2015
@samoht

This comment has been minimized.

Copy link
Member

@samoht samoht commented on 7d2777d Feb 8, 2015

I think this code contains a deadlock/race. Not sure exactly where yet, but with @MagnusS we managed to bisect the 30% SYNACK losses in synjitsu to that commit (or the one later which tries to fix the deadlock, not totally sure).

A working trace:

Netif: add resume hook
Netif.connect 0
Netfront.create: id=0 domid=0
MAC: c0:ff:ee:c0:ff:ee
 sg:true gso_tcpv4:true rx_copy:true rx_flip:false smart_poll:false
Netfif.poll_thread:loop
Netif.rx_poll
Netif.notify
Netif.tx_poll
Manager: connect
Attempt to open(/dev/urandom)!
Manager: configuring
Netif.listen
Manager: Interface to 192.168.2.140 nm 255.255.255.0 gw [192.168.2.1]

ARP: sending gratuitous from 192.168.2.140
Netif.writev
writev-no-retry "<FF><FF><FF><FF><FF><FF><C0><FF><EE><C0><FF><EE>\b\006\000\001\b\000\006\004\000\002
<C0><FF><EE><C0><FF><EE><C0><A8>\002\140<FF><FF><FF><FF><FF><FF>\000\000\000\000"
[1] available devices = [ 0 ]
Shared_page_pool.use
Shared_page_pool.alloc
[1] write-request len=42 size=42
Netif.notify
Front.push done
Manager: configuration done
FAST-START mode. Watching xenstore for incoming SYN!
Ports: 80
Dest-ip: 192.168.2.1
Dest-ports: 56581
Found 1 started connections.
tx=65535 sequence=173859064 options=(SACK_ok(Timestamp 2098518192 0)Noop Noop(Window_size_shift 5)Noo
p(MSS 1460)) tx_isn=452892043 rx_wnd=65535 rx_wnd_sca=2
Found SYN cookies.
Segment.output time=0.13 xmit=true rexmit=true
Wire.xmit checksum 83fc 192.168.2.140.80->192.168.2.1.56581 rst false syn true fin false psh false seq
         452892043 ack 173859065 [ MSS=1460; Window>>2 ] datalen 0 datafrag 0 dataoff 7 olen 8
Netif.writev
writev-no-retry "^<F9>8<F8><F1>d<C0><FF><EE><C0><FF><EE>\b\000E\000\0000<B4>\011\000\000&\006Z<DF>
<C0><A8>\002\140<C0><A8>\002\001\000P<DD>\005\026<FE>\149\139\n\\<E0><F9>p\018<FF><FF>\131<FC>\000\000\002\004\005<B4>\003\003\002\000"
Netfif.poll_thread:loop
Netif.rx_poll
Netif.tx_poll
[1] reply done
[2] available devices = [ 0 ]
Shared_page_pool.use
[2] write-request len=62 size=62
Netif.notify
Front.push done
Netfif.poll_thread:loop
Netif.rx_poll
Netif.tx_poll
[2] reply done
Netfif.poll_thread:loop
Netif.rx_poll
Segment.input
Segment.input
Netif.tx_poll
Segment.output time=0.15 xmit=true rexmit=true HTTP/1.1 200 OK^M

A failing trace:

Netif: add resume hook
Netif.connect 0
Netfront.create: id=0 domid=0
MAC: c0:ff:ee:c0:ff:ee
 sg:true gso_tcpv4:true rx_copy:true rx_flip:false smart_poll:false
Netfif.poll_thread:loop
Netif.rx_poll
Netif.notify
Netif.tx_poll
Manager: connect
Attempt to open(/dev/urandom)!
Manager: configuring
Netif.listen
Manager: Interface to 192.168.2.140 nm 255.255.255.0 gw [192.168.2.1]

ARP: sending gratuitous from 192.168.2.140
Netif.writev
writev-no-retry "<FF><FF><FF><FF><FF><FF><C0><FF><EE><C0><FF><EE>\b\006\000\001\b\000\006\004\000\00
2<C0><FF><EE><C0><FF><EE><C0><A8>\002\140<FF><FF><FF><FF><FF><FF>\000\000\000\000"
[1] available devices = [ 0 ]
Shared_page_pool.use
Shared_page_pool.alloc
[1] write-request len=42 size=42
Netif.notify
Front.push done
Manager: configuration done
FAST-START mode. Watching xenstore for incoming SYN!
Ports: 80
Dest-ip: 192.168.2.1
Dest-ports: 56602
Found 1 started connections.
tx=65535 sequence=-1072429600 options=(SACK_ok(Timestamp 2098543247 0)Noop Noop(Window_size_shift 5)
Noop(MSS 1460)) tx_isn=452869351 rx_wnd=65535 rx_wnd_sca=2
Found SYN cookies.
Segment.output time=0.11 xmit=true rexmit=true
Wire.xmit checksum 01ec 192.168.2.140.80->192.168.2.1.56602 rst false syn true fin false psh false seq
         452869351 ack 3222537697 [ MSS=1460; Window>>2 ] datalen 0 datafrag 0 dataoff 7 olen 8
Netif.writev
writev-no-retry "^<F9>8<F8><F1>d<C0><FF><EE><C0><FF><EE>\b\000E\000\0000<CB><E9>\000\000&\006C\001
<C0><A8>\002\140<C0><A8>\002\001\000P<DD>\026\026<FE><<E7><C0>\020\005<E1>p\018<FF><FF>\001<EC>\000\000\002\004\005<B4>\003\003\002\000"
[2] available devices = [ 0 ]
Share_page_pool.use
[2] write-request len=62 size=62
Front.push done
Netfif.poll_thread:loop
Netif.rx_poll
Netif.tx_poll
[2] reply done
[1] reply done
Netfif.poll_thread:loop
Netif.rx_poll
Netif.tx_poll
Netfif.poll_thread:loop
Netif.rx_poll
Netif.tx_poll
Netfif.poll_thread:loop
...
Netif.tx_poll
Netfif.poll_thread:loop
Netif.rx_poll
Netif.tx_poll
Netfif.poll_thread:loop
Netif.rx_poll
Netif.tx_poll
Netfif.poll_thread:loop
Netif.rx_poll
Segment.output time=1.02 xmit=true rexmit=true

The interesting bits being the [1] write-request len=42 size=42 and [1] reply done (and same for [2]). So it seems that in the failing case, the ARP packet write is signalled after the SYNACK packet write, which actually never appears on the bridge.

This comment has been minimized.

Copy link
Member

@avsm avsm replied Feb 24, 2015

This one's a high priority to fix, as it's difficult to back out due to the dependency on the new semantics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants
You can’t perform that action at this time.