DHCP page fault #80

Closed
MagnusS opened this Issue Nov 10, 2014 · 13 comments

@MagnusS
Member

MagnusS commented Nov 10, 2014

When I run the static-website or stackv4 examples from mirage-skeleton under Xen they page fault with DHCP. Static IP seems to work fine. I use Xen 4.4 in Virtualbox (with Ubuntu 14.10 Server) and Mirage 2.0 from the main opam repo.

$ sudo xl create www.xl -c
Parsing config from www.xl
Xen Minimal OS!
  start_info: 0000000000322000(VA)
    nr_pages: 0x10000
  shared_inf: 0x40a67000(MA)
     pt_base: 0000000000325000(VA)
nr_pt_frames: 0x5
    mfn_list: 00000000002a2000(VA)
   mod_start: 0x0(VA)
     mod_len: 0
       flags: 0x0
    cmd_line:
       stack: 0000000000260800-0000000000280800
Mirage: start_kernel
MM: Init
      _text: 0000000000000000(VA)
     _etext: 0000000000151fde(VA)
   _erodata: 0000000000190000(VA)
     _edata: 0000000000247a10(VA)
stack start: 0000000000260800(VA)
       _end: 00000000002a12dc(VA)
  start_pfn: 32d
    max_pfn: 10000
Mapping memory range 0x400000 - 0x10000000
setting 0000000000000000-0000000000190000 readonly
skipped 1000
MM: Initialise page allocator for 3ab000(3ab000)-10000000(10000000)
MM: done
Demand map pfns at 10001000-0000002010001000.
Initialising timer interface
Initialising console ... done.
gnttab_table mapped at 0000000010001000.
xencaml: app_main_thread
getenv(OCAMLRUNPARAM) -> null
getenv(CAMLRUNPARAM) -> null
Unsupported function lseek called in Mini-OS kernel
Unsupported function lseek called in Mini-OS kernel
Unsupported function lseek called in Mini-OS kernel
getenv(OCAMLRUNPARAM) -> null
getenv(CAMLRUNPARAM) -> null
getenv(TMPDIR) -> null
getenv(TEMP) -> null
Netif: add resume hook
Netif.connect 0
Netfront.create: id=0 domid=0
MAC: c0:ff:ee:c0:ff:ee
Manager: connect
Attempt to open(/dev/urandom)!
Manager: configuring
DHCP: start discovery

Sending DHCP broadcast len 552
Page fault at linear address 28, rip 151b17, regs 000000000027fc48, sp 27fcf0, our_sp 000000000027fc10, code 0
Page fault in pagetable walk (access to invalid memory?).
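
For anyone reading the last two log lines: the `code` field looks like the x86 page-fault error code. Below is a minimal sketch of how its low bits are defined by the architecture, assuming Mini-OS prints the raw hardware value. The names `pf_info` and `decode_pf_error` are made up for illustration and are not part of Mini-OS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low bits of the x86 page-fault error code (architectural definition). */
#define PF_PRESENT  (1u << 0) /* 0: page not present, 1: protection violation */
#define PF_WRITE    (1u << 1) /* 0: read access,      1: write access */
#define PF_USER     (1u << 2) /* 0: supervisor mode,  1: user mode */
#define PF_RESERVED (1u << 3) /* reserved bit set in a paging-structure entry */
#define PF_INSTR    (1u << 4) /* fault caused by an instruction fetch */

/* Hypothetical helper type: one flag per decoded bit. */
struct pf_info {
    bool present;
    bool write;
    bool user;
    bool reserved;
    bool instr_fetch;
};

/* Split a raw error code into its individual flags. */
struct pf_info decode_pf_error(uint32_t code)
{
    struct pf_info info;
    info.present     = (code & PF_PRESENT)  != 0;
    info.write       = (code & PF_WRITE)    != 0;
    info.user        = (code & PF_USER)     != 0;
    info.reserved    = (code & PF_RESERVED) != 0;
    info.instr_fetch = (code & PF_INSTR)    != 0;
    return info;
}
```

Under that assumption, `code 0` means a read of a not-present page from supervisor mode. Together with a fault address as small as 28, that would fit a load through a base pointer that is zero or nearly zero.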
@avsm

Member

avsm commented Nov 10, 2014

What's the output of opam list -i ?

@MagnusS

Member

MagnusS commented Nov 10, 2014

$ opam list -i
# Installed packages for system:
base-bigarray             base  Bigarray library distributed with the OCaml compiler
base-bytes              legacy  Bytes compatibility library distributed with ocamlfind
base-no-ppx               base  A pseudo-library to indicate lack of extension points support
base-threads              base  Threads library distributed with the OCaml compiler
base-unix                 base  Unix library distributed with the OCaml compiler
base64                   1.0.0  Base64 encoding and decoding library
camlp4                  4.01.0  Camlp4 is a system for writing extensible parsers for programming languages
cmdliner                 0.9.5  Declarative definition of command line interfaces for OCaml
cohttp                  0.12.0  HTTP library for Lwt, Async and Mirage
conduit                  0.6.1  Network connection library for TCP and SSL
conf-pkg-config            1.0  Virtual package relying on pkg-config installation.
crunch                   1.3.0  Convert a filesystem into a static OCaml module
cstruct                  1.4.0  access C structures via a camlp4 extension
dns                     0.11.0  DNS client and server implementation
fieldslib            109.20.03  Syntax extension to define first class values representing record fields, to get and set record fields, iterate and fold over
io-page                  1.1.1  Allocate memory pages suitable for aligned I/O
ipaddr                   2.5.0  IP (and MAC) address representation library
lwt                      2.4.6  A cooperative threads library for OCaml
mirage                   2.0.0  The Mirage library operating system
mirage-clock-unix        1.0.0  A Mirage-compatible Clock library for Unix
mirage-clock-xen         1.0.0  A Mirage-compatible Clock library for Xen
mirage-conduit           2.0.0  Virtual package for the Mirage Conduit transports
mirage-console           2.0.0  A Mirage-compatible Console library for Xen and Unix
mirage-dns               2.0.0  Virtual package for the Mirage DNS transports
mirage-http              2.0.0  Mirage HTTP client and server driver for Unix
mirage-net-unix          1.1.1  Ethernet network driver for Mirage, using tuntap
mirage-net-xen           1.1.3  Ethernet network device driver for Mirage/Xen
mirage-types             2.0.0  Module type definitions for Mirage-compatible applications
mirage-types-lwt         2.0.0  Lwt module type definitions for Mirage-compatible applications
mirage-unix              2.0.0  Mirage OS library for Unix compilation
mirage-xen               2.0.0  Mirage OS library for Xen compilation
mirage-xen-minios        0.4.1  Xen MiniOS guest operating system library
oasis                    0.4.5  Architecture for building OCaml libraries and applications
ocaml-data-notation     0.0.11  Store data using OCaml notation
ocamlfind                1.5.5  A library manager for OCaml
ocamlify                 0.0.1  Include files in OCaml code
ocamlmod                 0.0.7  Generate OCaml modules from source files
ocplib-endian              0.7  Optimised functions to read and write int16/32/64 from strings and bigarrays, based on new primitives added in version 4.01.
optcomp                    1.6  Optional compilation with cpp-like directives
ounit                    2.0.0  Unit testing framework loosely based on HUnit. It is similar to JUnit, and other XUnit testing frameworks
re                       1.2.2  RE is a regular expression library for OCaml
sexplib              111.13.00  Library for serializing OCaml values to and from S-expressions
shared-memory-ring       1.1.0  Shared memory rings for RPC and bytestream communications.
ssl                      0.4.7  Bindings for OpenSSL
stringext                1.0.0  Extra string functions for OCaml
tcpip                    2.0.1  Userlevel TCP/IP stack
tuntap                   1.0.0  TUN/TAP bindings
type_conv            111.13.00  Library for building type-driven syntax extensions
uri                      1.7.2  RFC3986 URI/URL parsing library
vchan                    2.0.0  Xen Vchan implementation
xen-evtchn               1.0.5  Xen event channel bindings.
xen-gnt                  2.0.0  Xen grant table bindings
xenstore                 1.2.5  Xenstore protocol clients and server
xenstore_transport       0.9.4  Low-level libraries for connecting to a xenstore service on a xen host.
@avsm

Member

avsm commented Nov 10, 2014

Could you run 'gdb ' and 'disas 0x151b17' to find out where it faulted? (That's the RIP, the instruction pointer.)

@MagnusS

Member

MagnusS commented Nov 11, 2014

disas says 0x151b17 is in memmove

@avsm

Member

avsm commented Nov 11, 2014

Time for some printf debugging to narrow down where the fault is occurring. It's probably in the DHCP code in mirage-tcpip.

@MagnusS

Member

MagnusS commented Nov 12, 2014

After doing some more testing it turns out that static IP doesn't work either. Interestingly, the static IP kernel only seems to crash after it has received (or tried to reply to) two IP packets. It crashes with TCP SYNs on closed and open ports and with ICMP packets. ARP seems to work fine.

gdb disas reports that the page faults are in caml_tcpip_ones_complement (ICMP) and caml_tcpip_ones_complement_list (TCP SYN).

If I edit lib/tcpip_checksums.ml to use caml_ones_complement and caml_ones_complement_list (not caml_tcpip_*) that fixes the problem.

If I replace mirage-tcpip/lib/checksums_stubs.c with mirage-platform/xen/runtime/xencaml/checksum_stubs.c and rename the C functions to caml_tcpip_* the kernel still crashes.
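
For context, the stubs in question compute the Internet ones-complement checksum used by IP, ICMP and TCP. Below is a minimal, portable sketch of that algorithm as described in RFC 1071. It is not the code from checksum_stubs.c, and the function name `ones_complement_checksum` is chosen only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum over a byte buffer (RFC 1071 style).
 * Bytes are summed as big-endian 16-bit words, carries are folded back
 * into the low 16 bits, and the ones complement of the result is returned. */
uint16_t ones_complement_checksum(const uint8_t *data, size_t len)
{
    uint64_t sum = 0; /* wide accumulator so long buffers cannot overflow */

    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }

    /* An odd trailing byte is treated as the high byte of a final word. */
    if (len == 1)
        sum += (uint32_t)data[0] << 8;

    /* Fold the carries back in until the sum fits in 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

A useful property when debugging: running the function over a buffer that already contains its own correct checksum returns 0.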

@avsm

Member

avsm commented Nov 12, 2014

It would be good to take this binary image and run it on a real Xen box to determine if it's a vbox specific problem or not.

@talex5

Contributor

talex5 commented Nov 12, 2014

@MagnusS what's the difference in the disassembly of the two versions of ones_complement?

@MagnusS

Member

MagnusS commented Nov 17, 2014

I don't have access to a real Xen server at the moment, but I installed the older Ubuntu 14.04 with Xen in vbox and ran the same examples. The gcc in 14.04 is older: 4.8 vs 4.9 in 14.10. The unikernels compiled in Ubuntu 14.04 work without page faults in both 14.04 and 14.10.

As caml_ones_complement_checksum (which works) is from libxencaml.a and caml_tcpip_ones_complement_checksum (which doesn't work) is from libtcpip_stubs.a, I checked if there were differences in how the libraries were compiled. The only flag used to compile checksum_stubs.c in libtcpip_stubs.a is -O2. The flags used for libxencaml.a are (without -D/U/I/W etc) -O3 -mno-red-zone -fno-tree-loop-distribute-patterns -fno-stack-protector -fno-reorder-blocks -fstrict-aliasing -m64 -fno-asynchronous-unwind-tables -momit-leaf-frame-pointer -mfancy-math-387.

I compiled libtcpip_stubs.a with the flags above in Ubuntu 14.10 and the DHCP and static IP versions of static_website now seem to work without page fault.

@talex5

Contributor

talex5 commented Nov 17, 2014

I guess the -mno-red-zone is the most likely cause (I'm not sure how Mini-OS on x86 handles the stack).

@talex5

Contributor

talex5 commented Nov 17, 2014

@avsm what prevents normal OCaml code from assuming a red zone? Do we just hope that ocamlopt doesn't do that?

@avsm

Member

avsm commented Nov 17, 2014

Yes, we absolutely must compile with no red zone on MiniOS/x86_64, since it doesn't work when the whole application is running in a privileged ring.
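
Some background on why this matters: the System V x86-64 ABI lets a function use the 128 bytes below the stack pointer (the red zone) without moving the stack pointer first. That is safe in user space, but in code running in kernel context an interrupt or event frame can be pushed onto the same stack and land on top of those bytes. The sketch below shows the kind of leaf function where a compiler may place locals in the red zone. Whether it actually does so depends on the compiler version and flags, and `fold16` is a hypothetical example, not code from mirage-tcpip.

```c
#include <stddef.h>
#include <stdint.h>

/* Leaf function: it calls nothing else, so a compiler targeting the
 * System V x86-64 ABI is allowed to keep `lanes` below %rsp (in the red
 * zone) instead of reserving stack space. Building with -mno-red-zone
 * forbids that and forces an explicit stack adjustment. */
uint32_t fold16(const uint16_t *words, size_t n)
{
    uint32_t lanes[4] = {0, 0, 0, 0}; /* small local array, red-zone candidate */

    for (size_t i = 0; i < n; i++)
        lanes[i & 3] += words[i];

    uint32_t sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];

    /* Fold carries back into 16 bits, as a checksum routine would. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return sum;
}
```

Comparing the disassembly of such a function built with and without -mno-red-zone is one way to see whether the compiler relied on the red zone.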

talex5 added a commit to talex5/mirage-tcpip that referenced this issue Nov 18, 2014

Build with -mno-red-zone on x86_64
Otherwise, an interrupt may overwrite part of the stack if we're in
checksum_stubs.c at the time. Should help with mirage#80.

Also added -fno-stack-protector in case that's on by default somewhere.

@MagnusS

Member

MagnusS commented Nov 21, 2014

I can confirm that the page fault is fixed in 14.10 with -mno-red-zone and -fno-stack-protector. Ubuntu patches gcc to enable stack protector by default: https://wiki.ubuntu.com/Security/Features
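
For readers unfamiliar with the stack protector, here is a rough emulation of what the compiler-inserted check does: it stores a guard value next to the local buffers on entry and compares it again before returning. Everything in this sketch is hypothetical (`demo_guard`, `copy_with_canary`, the explicit struct layout); the real mechanism is generated by the compiler, not written by hand.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-process guard value. */
static const uint64_t demo_guard = 0x595e9fbd94fda766ULL;

/* Model of a protected stack frame: a buffer followed by the canary. */
struct demo_frame {
    char     buf[16];
    uint64_t canary;
};

/* Copies len bytes into the modelled frame.
 * Returns 0 if the canary survived, -1 if it was overwritten
 * (or if len does not fit in the modelled frame at all). */
int copy_with_canary(const char *src, size_t len)
{
    struct demo_frame frame;

    if (len > sizeof frame)
        return -1;

    frame.canary = demo_guard;  /* "prologue": load the guard into the frame */
    memcpy(&frame, src, len);   /* writes past buf reach the canary */

    /* "epilogue": compare the stored canary with the guard again */
    return frame.canary == demo_guard ? 0 : -1;
}
```

One inference that the thread does not confirm: on x86-64 Linux targets GCC loads the guard from offset 0x28 of the %fs segment. If the faulting address 28 in the log is hexadecimal and the FS base was zero inside the unikernel, that load would fault at exactly that address.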

@avsm avsm closed this Dec 31, 2014

@avsm avsm reopened this Dec 31, 2014

@samoht samoht closed this Jun 10, 2015

samoht pushed a commit to samoht/mirage-tcpip that referenced this issue Apr 4, 2017

Build with -mno-red-zone on x86_64
Otherwise, an interrupt may overwrite part of the stack if we're in
checksum_stubs.c at the time. Should help with mirage#80.

Also added -fno-stack-protector in case that's on by default somewhere.