New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regression: exception End_of_file #39

Closed
hannesm opened this Issue Apr 14, 2016 · 14 comments

Comments

Projects
None yet
4 participants
@hannesm
Member

hannesm commented Apr 14, 2016

while mirage-net-xen=1.4.1 ouputs:

ARP: updating 198.167.222.200 -> 00:30:48:9d:04:a0
RX exn End_of_file
RX exn End_of_file
RX exn End_of_file
RX exn End_of_file

mirage-net-xen=1.6.0 (and likely 1.5.0) outputs:

ARP: updating 198.167.222.200 -> 00:30:48:9d:04:a0
Fatal error: exception End_of_file
Raised at file "bytes.ml", line 220, characters 25-34
Called from file "src/pre_sexp.ml", line 86, characters 11-45
Mirage exiting with status 2
Do_exit called!

and kills the unikernel hard... to reproduce, this is a unikernel serving a HTTP website and force-reloading it from the browser...

@yomimono

This comment has been minimized.

Member

yomimono commented Apr 19, 2016

I'm able to trigger this with a simple invocation of curl.

@hannesm

This comment has been minimized.

Member

hannesm commented Apr 19, 2016

nice! so how did that pass QA ;)

@yomimono

This comment has been minimized.

Member

yomimono commented Apr 19, 2016

I notice there don't seem to be any tests in this repo...

@talex5

This comment has been minimized.

Contributor

talex5 commented Apr 20, 2016

I guess the error from 1.4.1 is coming from:

let rx_poll nf (fn: Cstruct.t -> unit Lwt.t) =
  MProf.Trace.label "Netif.rx_poll";
  Ring.Rpc.Front.ack_responses nf.rx_fring (fun slot ->
      let id,(offset,flags,status) = RX.Proto_64.read slot in
      let gref, page = Hashtbl.find nf.rx_map id in
      Hashtbl.remove nf.rx_map id;
      Gnt.Gntshr.end_access gref;
      Gnt.Gntshr.put gref;
      match status with
      |sz when status > 0 ->
        let packet = Cstruct.sub (Io_page.to_cstruct page) 0 sz in
        nf.stats.rx_pkts <- Int32.succ nf.stats.rx_pkts;
        nf.stats.rx_bytes <- Int64.add nf.stats.rx_bytes (Int64.of_int sz);
        ignore_result 
          (try_lwt fn packet
           with exn -> return (printf "RX exn %s\n%!" (Printexc.to_string exn)))
      |err -> printf "RX error %d\n%!" err
    )

The corresponding code in 1.6.0 is:

          Lwt.async (fun () ->
            Stats.rx nf.stats (Int64.of_int (Cstruct.len data));
            fn data
          )

Arguably, the new code is correct. An unhandled async exception from the application should be handled by the application's handler (Lwt default: crash with stack trace), not hard-coded as a print statement in mirage-net-xen.

@hannesm

This comment has been minimized.

Member

hannesm commented Apr 20, 2016

@talex5 I agree -- but unless someone digs into whatever causes the exception, I'd prefer to retain the behaviour, at least between minor versions. From a deployment point of view this makes a huge difference.

@talex5

This comment has been minimized.

Contributor

talex5 commented Apr 20, 2016

As a work-around, you can set Lwt.async_exception_hook to a handler that logs but continues.

@talex5

This comment has been minimized.

Contributor

talex5 commented Apr 20, 2016

On balance, I think it would be better to log these errors in tcpip. That can provide more context about the problem (e.g. End_of_file handling input from TCP stream [remote=192.168.1.3:12345]). Possibly we should log here too (once we have logs support), but only as a last resort.

@samoht

This comment has been minimized.

Member

samoht commented May 3, 2016

Note: this issue made the deployement of mirage-www impossible with latest versions of mirage-net-xen. I'm pinning to 1.4.2.

@samoht samoht referenced this issue May 3, 2016

Closed

Deployment issue of 04/05/2016 #448

1 of 4 tasks complete
@hannesm

This comment has been minimized.

Member

hannesm commented May 4, 2016

it is likely be mirage-http which throws

@talex5

This comment has been minimized.

Contributor

talex5 commented May 4, 2016

I'll change it to log then.

talex5 added a commit to talex5/mirage-net-xen that referenced this issue May 4, 2016

Log exceptions from the listen callback
Since version 1.5 we've been passing exceptions to the user's Lwt
exception handler rather than printing to the console. It seems that
several users decided to stay with 1.4 rather than configure a handler.
This patch uses the new logging support to log an error and continue.
In addition, it also dumps out the frame that caused the crash.

Closes mirage#39
@hannesm

This comment has been minimized.

Member

hannesm commented May 4, 2016

my opinion on this is that the catch shouldn't be there... but in order to do that we should fix up all libraries up the stack which we have under control, then make releases (and at the same time, adjust [mirage-net-unix])https://github.com/mirage/mirage-net-unix/blob/ddb6d03893145309fa49f58a65298eb8c4fb1f24/lib/netif.ml#L126) to behave in the same way). since tcpip etc. do not directly depend on mirage-net-xen, I believe the safest is to use a mirage-types release for that (maybe the one where we switch to proper error handling?) [and related to the errors in https://github.com/mirage/mirage-net-xen/commit/6b1687c866eed9c23a239b2cdc7c32a4bc01bc33]...

@talex5

This comment has been minimized.

Contributor

talex5 commented May 4, 2016

I don't think changing types helps here. The listen callback isn't supposed to return an error, so the type is already correct.

The only choice is how to handle a bug in the callback function:

  • 1.4: always print and continue.
  • 1.5: crash by default; allow the user to print and continue for all async exceptions if desired.
  • this PR: log by default; allow the user to crash on Log.err if desired, and allow them to filter based on which component sent the error.

I'm happy with the behaviour in this PR.

@hannesm

This comment has been minimized.

Member

hannesm commented May 4, 2016

the type is correct, yes. I was thinking of using the mirage-types version to denote conflicts (for changes in behaviour)

EDIT: I'd prefer to not have any try in the codebase in the end. logging and errors shouldn't be mixed IMHO.

@talex5 talex5 closed this in #44 May 4, 2016

@talex5

This comment has been minimized.

Contributor

talex5 commented May 4, 2016

try is needed if we want to provide extra information when something goes wrong. I don't know what you mean about not mixing logging and errors. We have Log.err for a reason...

talex5 added a commit to talex5/mirage-tcpip that referenced this issue May 4, 2016

Log exceptions accepting TCP connections
Before, these were passed back to the network driver. However, tcpip is
in a better position to report useful information. Example:

    [ERROR] uncaught exception accepting new TCP flow on port 80:
            (Failure "Test exception")

See: mirage/mirage-net-xen#39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment