Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regression: exception End_of_file #39

Closed
hannesm opened this issue Apr 14, 2016 · 14 comments
Closed

regression: exception End_of_file #39

hannesm opened this issue Apr 14, 2016 · 14 comments

Comments

@hannesm
Copy link
Member

@hannesm hannesm commented Apr 14, 2016

while mirage-net-xen=1.4.1 ouputs:

ARP: updating 198.167.222.200 -> 00:30:48:9d:04:a0
RX exn End_of_file
RX exn End_of_file
RX exn End_of_file
RX exn End_of_file

mirage-net-xen=1.6.0 (and likely 1.5.0) outputs:

ARP: updating 198.167.222.200 -> 00:30:48:9d:04:a0
Fatal error: exception End_of_file
Raised at file "bytes.ml", line 220, characters 25-34
Called from file "src/pre_sexp.ml", line 86, characters 11-45
Mirage exiting with status 2
Do_exit called!

and kills the unikernel hard... to reproduce, this is a unikernel serving a HTTP website and force-reloading it from the browser...

@yomimono
Copy link
Contributor

@yomimono yomimono commented Apr 19, 2016

I'm able to trigger this with a simple invocation of curl.

@hannesm
Copy link
Member Author

@hannesm hannesm commented Apr 19, 2016

nice! so how did that pass QA ;)

@yomimono
Copy link
Contributor

@yomimono yomimono commented Apr 19, 2016

I notice there don't seem to be any tests in this repo...

@talex5
Copy link
Contributor

@talex5 talex5 commented Apr 20, 2016

I guess the error from 1.4.1 is coming from:

let rx_poll nf (fn: Cstruct.t -> unit Lwt.t) =
  MProf.Trace.label "Netif.rx_poll";
  Ring.Rpc.Front.ack_responses nf.rx_fring (fun slot ->
      let id,(offset,flags,status) = RX.Proto_64.read slot in
      let gref, page = Hashtbl.find nf.rx_map id in
      Hashtbl.remove nf.rx_map id;
      Gnt.Gntshr.end_access gref;
      Gnt.Gntshr.put gref;
      match status with
      |sz when status > 0 ->
        let packet = Cstruct.sub (Io_page.to_cstruct page) 0 sz in
        nf.stats.rx_pkts <- Int32.succ nf.stats.rx_pkts;
        nf.stats.rx_bytes <- Int64.add nf.stats.rx_bytes (Int64.of_int sz);
        ignore_result 
          (try_lwt fn packet
           with exn -> return (printf "RX exn %s\n%!" (Printexc.to_string exn)))
      |err -> printf "RX error %d\n%!" err
    )

The corresponding code in 1.6.0 is:

          Lwt.async (fun () ->
            Stats.rx nf.stats (Int64.of_int (Cstruct.len data));
            fn data
          )

Arguably, the new code is correct. An unhandled async exception from the application should be handled by the application's handler (Lwt default: crash with stack trace), not hard-coded as a print statement in mirage-net-xen.

@hannesm
Copy link
Member Author

@hannesm hannesm commented Apr 20, 2016

@talex5 I agree -- but unless someone digs into whatever causes the exception, I'd prefer to retain the behaviour, at least between minor versions. From a deployment point of view this makes a huge difference.

@talex5
Copy link
Contributor

@talex5 talex5 commented Apr 20, 2016

As a work-around, you can set Lwt.async_exception_hook to a handler that logs but continues.

@talex5
Copy link
Contributor

@talex5 talex5 commented Apr 20, 2016

On balance, I think it would be better to log these errors in tcpip. That can provide more context about the problem (e.g. End_of_file handling input from TCP stream [remote=192.168.1.3:12345]). Possibly we should log here too (once we have logs support), but only as a last resort.

@samoht
Copy link
Member

@samoht samoht commented May 3, 2016

Note: this issue made the deployement of mirage-www impossible with latest versions of mirage-net-xen. I'm pinning to 1.4.2.

@samoht samoht mentioned this issue May 3, 2016
1 of 4 tasks complete
@hannesm
Copy link
Member Author

@hannesm hannesm commented May 4, 2016

it is likely be mirage-http which throws

@talex5
Copy link
Contributor

@talex5 talex5 commented May 4, 2016

I'll change it to log then.

talex5 added a commit to talex5/mirage-net-xen that referenced this issue May 4, 2016
Since version 1.5 we've been passing exceptions to the user's Lwt
exception handler rather than printing to the console. It seems that
several users decided to stay with 1.4 rather than configure a handler.
This patch uses the new logging support to log an error and continue.
In addition, it also dumps out the frame that caused the crash.

Closes mirage#39
@hannesm
Copy link
Member Author

@hannesm hannesm commented May 4, 2016

my opinion on this is that the catch shouldn't be there... but in order to do that we should fix up all libraries up the stack which we have under control, then make releases (and at the same time, adjust [mirage-net-unix])https://github.com/mirage/mirage-net-unix/blob/ddb6d03893145309fa49f58a65298eb8c4fb1f24/lib/netif.ml#L126) to behave in the same way). since tcpip etc. do not directly depend on mirage-net-xen, I believe the safest is to use a mirage-types release for that (maybe the one where we switch to proper error handling?) [and related to the errors in https://github.com/mirage/mirage-net-xen/commit/6b1687c866eed9c23a239b2cdc7c32a4bc01bc33]...

@talex5
Copy link
Contributor

@talex5 talex5 commented May 4, 2016

I don't think changing types helps here. The listen callback isn't supposed to return an error, so the type is already correct.

The only choice is how to handle a bug in the callback function:

  • 1.4: always print and continue.
  • 1.5: crash by default; allow the user to print and continue for all async exceptions if desired.
  • this PR: log by default; allow the user to crash on Log.err if desired, and allow them to filter based on which component sent the error.

I'm happy with the behaviour in this PR.

@hannesm
Copy link
Member Author

@hannesm hannesm commented May 4, 2016

the type is correct, yes. I was thinking of using the mirage-types version to denote conflicts (for changes in behaviour)

EDIT: I'd prefer to not have any try in the codebase in the end. logging and errors shouldn't be mixed IMHO.

@talex5 talex5 closed this in #44 May 4, 2016
@talex5
Copy link
Contributor

@talex5 talex5 commented May 4, 2016

try is needed if we want to provide extra information when something goes wrong. I don't know what you mean about not mixing logging and errors. We have Log.err for a reason...

talex5 added a commit to talex5/mirage-tcpip that referenced this issue May 4, 2016
Before, these were passed back to the network driver. However, tcpip is
in a better position to report useful information. Example:

    [ERROR] uncaught exception accepting new TCP flow on port 80:
            (Failure "Test exception")

See: mirage/mirage-net-xen#39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants