segfault in getservbyname? #136

Closed
samoht opened this issue Mar 15, 2015 · 12 comments

samoht commented Mar 15, 2015

While running some stress tests for cohttp, I got a segfault on OSX 10.10.2:

* thread #1: tid = 0x10f742, 0x00000001001c06b9 foo`caml_alloc_array + 99, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x00000001001c06b9 foo`caml_alloc_array + 99
foo`caml_alloc_array + 99:
-> 0x1001c06b9:  cmpq   $0x0, 0x10(%r13,%rbx,8)
   0x1001c06bf:  leaq   0x1(%rbx), %rbx
   0x1001c06c3:  jne    0x1001c06b9               ; caml_alloc_array + 99
   0x1001c06c5:  cmpq   $-0x1, %rbx
(lldb) bt
* thread #1: tid = 0x10f742, 0x00000001001c06b9 foo`caml_alloc_array + 99, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x00000001001c06b9 foo`caml_alloc_array + 99
    frame #1: 0x00000001001b1c7e foo`alloc_servent + 126
    frame #2: 0x00000001001af099 foo`result_getservbyname + 57
    frame #3: 0x00000001001b0f01 foo`lwt_unix_self_result + 49
    frame #4: 0x00000001001024b8 foo`.L573 + 20

samoht (Author) commented Mar 15, 2015

I've also seen some other weird errors in that stress test:

* thread #1: tid = 0x110b88, 0x00007fff904b0c22 libsystem_kernel.dylib`write + 10, queue = 'com.apple.main-thread', stop reason = signal SIGPIPE
    frame #0: 0x00007fff904b0c22 libsystem_kernel.dylib`write + 10
libsystem_kernel.dylib`write + 10:
-> 0x7fff904b0c22:  jae    0x7fff904b0c2c            ; write + 20
   0x7fff904b0c24:  movq   %rax, %rdi
   0x7fff904b0c27:  jmp    0x7fff904aac78            ; cerror
   0x7fff904b0c2c:  retq
(lldb) bt
* thread #1: tid = 0x110b88, 0x00007fff904b0c22 libsystem_kernel.dylib`write + 10, queue = 'com.apple.main-thread', stop reason = signal SIGPIPE
  * frame #0: 0x00007fff904b0c22 libsystem_kernel.dylib`write + 10
    frame #1: 0x0000000100301597 test.native`lwt_unix_bytes_write + 71
    frame #2: 0x00000001001f72a3 test.native`.L106 + 31

samoht (Author) commented Mar 15, 2015

I might be doing something stupid, but I got another one:

* thread #1: tid = 0x112d44, 0x00007fff8c45c152 libsystem_c.dylib`strlen + 18, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00007fff8c45c152 libsystem_c.dylib`strlen + 18
libsystem_c.dylib`strlen + 18:
-> 0x7fff8c45c152:  pcmpeqb (%rdi), %xmm0
   0x7fff8c45c156:  pmovmskb %xmm0, %esi
   0x7fff8c45c15a:  andq   $0xf, %rcx
   0x7fff8c45c15e:  orq    $-0x1, %rax
(lldb) bt
* thread #1: tid = 0x112d44, 0x00007fff8c45c152 libsystem_c.dylib`strlen + 18, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
  * frame #0: 0x00007fff8c45c152 libsystem_c.dylib`strlen + 18
    frame #1: 0x00000001001c091c client`caml_copy_string + 18
    frame #2: 0x00000001001b1f5d client`alloc_servent + 109
    frame #3: 0x00000001001af389 client`result_getservbyname + 57
    frame #4: 0x00000001001b11f1 client`lwt_unix_self_result + 49
    frame #5: 0x0000000100126af8 client`.L573 + 20

To repro:

(* server.ml *)
module Server = Cohttp_lwt_unix.Server

let respond body =
  Server.respond_string ~status:`OK ~body ()

let listen () =
  let callback (_, conn_id) req body = respond "Hello" in
  let config = Server.make ~callback () in
  Server.create ~mode:(`TCP (`Port 8081)) config

let () =
  Lwt_main.run (listen ())

(* client.ml *)
module Client = Cohttp_lwt_unix.Client

let post uri ?query body =
  let body = Cohttp_lwt_body.of_string body in
  Lwt.bind (Client.post ~body uri) (fun _ -> Lwt.return_unit)

let rec write fn = function
  | 0 -> Lwt.return_unit
  | i -> Lwt.join [fn i; write fn (i-1)]

let test () =
  write (fun i ->
      let uri = Uri.of_string (Printf.sprintf "http://127.0.0.1:8081/%d" i) in
      post uri "hoho"
    ) 1000

let () =
  Lwt_main.run (test ())

Makefile

all: client server

client: client.ml
	ocamlfind ocamlopt -package lwt.unix -package cohttp.lwt \
	  client.ml -o client -linkpkg

server: server.ml
	ocamlfind ocamlopt -package lwt.unix -package cohttp.lwt \
	  server.ml -o server -linkpkg

$ opam list cohttp lwt
# Available packages for fresh:
cohttp  0.15.2  HTTP library for Lwt, Async and Mirage
lwt      2.4.8  A cooperative threads library for OCaml

fdopen (Contributor) commented Mar 16, 2015

Can you post the output of the configure script (src/unix/lwt_config.h) for OSX?

The wrappers look wrong for the case where HAVE_NETDB_REENTRANT is not defined. The data returned by the lookup is not duplicated (the job only stores the pointer: job->entry = FUNC(ARGS_CALL);). If another worker calls getservbyname before result_getservbyname from the previous request is called, memory corruption is possible.
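
To illustrate what "duplicating the data" could look like, here is a minimal sketch in C. It is not taken from the lwt sources: `servent_dup` and `servent_free` are hypothetical helper names, and the sketch only assumes the standard POSIX `struct servent` fields (`s_name`, `s_aliases`, `s_port`, `s_proto`). The idea is that the worker makes a private deep copy right after the lookup, so a later lookup cannot invalidate the strings that the result function reads.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helpers, not part of lwt. */

/* strdup that tolerates a NULL input. */
static char *dup_or_null(const char *s) { return s ? strdup(s) : NULL; }

/* Release a servent previously returned by servent_dup. */
void servent_free(struct servent *e)
{
    if (e == NULL) return;
    if (e->s_aliases != NULL) {
        for (char **a = e->s_aliases; *a != NULL; a++) free(*a);
        free(e->s_aliases);
    }
    free(e->s_name);
    free(e->s_proto);
    free(e);
}

/* Deep-copy src into freshly allocated memory; NULL on allocation failure. */
struct servent *servent_dup(const struct servent *src)
{
    assert(src != NULL);
    struct servent *dst = calloc(1, sizeof *dst);
    if (dst == NULL) return NULL;

    /* Count the NULL-terminated alias list. */
    size_t n = 0;
    if (src->s_aliases != NULL)
        while (src->s_aliases[n] != NULL) n++;

    dst->s_port = src->s_port;
    dst->s_name = dup_or_null(src->s_name);
    dst->s_proto = dup_or_null(src->s_proto);
    dst->s_aliases = calloc(n + 1, sizeof *dst->s_aliases);
    if ((src->s_name && !dst->s_name) || (src->s_proto && !dst->s_proto) ||
        dst->s_aliases == NULL) {
        servent_free(dst);
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        dst->s_aliases[i] = strdup(src->s_aliases[i]);
        if (dst->s_aliases[i] == NULL) {
            servent_free(dst);
            return NULL;
        }
    }
    return dst;
}
```

A copy like this only helps if it is made before any other thread can trigger a new lookup, which is why it would have to be combined with some form of serialization.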

What's the context for the sigpipe error? Isn't this expected behaviour for certain cases?
Signal handlers are usually not registered inside libraries, but at the application level.
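
As a sketch of what handling this at the application level might look like, the C snippet below ignores SIGPIPE so that a write to a closed peer fails with EPIPE instead of terminating the process. The function names `ignore_sigpipe` and `write_to_closed_pipe` are made up for this illustration; only standard POSIX calls are used. In an OCaml program the corresponding call would be `Sys.set_signal Sys.sigpipe Sys.Signal_ignore`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical application-level setup: ignore SIGPIPE process-wide.
   Returns 0 on success, -1 on failure (as sigaction does). */
int ignore_sigpipe(void)
{
    struct sigaction sa;
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGPIPE, &sa, NULL);
}

/* Demonstration helper: write one byte to a pipe whose read end is
   already closed. Returns the errno reported by write, 0 if the write
   unexpectedly succeeded, or -1 if the pipe could not be created. */
int write_to_closed_pipe(void)
{
    int fds[2];
    if (pipe(fds) != 0) return -1;
    close(fds[0]);                      /* no reader left */
    ssize_t n = write(fds[1], "x", 1);  /* would raise SIGPIPE if not ignored */
    int err = (n < 0) ? errno : 0;
    close(fds[1]);
    return err;
}
```

Whether ignoring the signal is the right policy depends on the application, which is the reason it is normally left out of library code.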

vouillon (Member) commented

They look wrong, indeed. I think there should be a lock as well, so that only one worker calls getservbyname at a time.
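
A minimal sketch of such a lock, in C, might look as follows. This is an illustration under assumptions, not the code from the fix: `netdb_lock` and `lookup_service` are hypothetical names, and the sketch copies only the service name and port into caller-owned storage while the mutex is held.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical global lock serializing all non-reentrant netdb lookups. */
static pthread_mutex_t netdb_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical wrapper: look up a service while holding netdb_lock and
   copy the needed fields out before releasing it. The port is returned
   in host byte order. Returns 0 on success, -1 if the service is not
   found or its name does not fit in name_out. */
int lookup_service(const char *name, const char *proto,
                   char *name_out, size_t name_len, int *port_out)
{
    assert(name != NULL && name_out != NULL && port_out != NULL);
    assert(name_len > 0);

    int rc = -1;
    pthread_mutex_lock(&netdb_lock);
    struct servent *e = getservbyname(name, proto);
    if (e != NULL && strlen(e->s_name) < name_len) {
        /* Copy while the lock is held: the library-owned storage behind
           e may be reused by the next lookup. */
        strcpy(name_out, e->s_name);
        *port_out = ntohs((unsigned short)e->s_port);
        rc = 0;
    }
    pthread_mutex_unlock(&netdb_lock);
    return rc;
}
```

The lock only protects callers that go through the wrapper; any code path that calls getservbyname directly would bypass it.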

fdopen (Contributor) commented Mar 17, 2015

I've fixed some of the issues here: https://github.com/fdopen/lwt/tree/fix-136
Could someone try it out on OSX to check whether the changes prevent the segfaults?

samoht (Author) commented Mar 23, 2015

@fdopen it seems that your fix is working fine, thanks!

samoht (Author) commented Mar 23, 2015

And here is the output of src/unix/lwt_config.h on OSX with latest master from upstream:

$ cat src/unix/lwt_config.h
#ifndef __LWT_CONFIG_H
#define __LWT_CONFIG_H
#define HAVE_LIBEV
#define HAVE_PTHREAD
//#define HAVE_EVENTFD
#define HAVE_FD_PASSING
//#define HAVE_GETCPU
//#define HAVE_AFFINITY
//#define HAVE_GET_CREDENTIALS_LINUX
//#define HAVE_GET_CREDENTIALS_NETBSD
//#define HAVE_GET_CREDENTIALS_OPENBSD
//#define HAVE_GET_CREDENTIALS_FREEBSD
#define HAVE_GETPEEREID
#define HAVE_FDATASYNC
//#define HAVE_NETDB_REENTRANT
#if defined(HAVE_GET_CREDENTIALS_LINUX) || defined(HAVE_GET_CREDENTIALS_NETBSD) || defined(HAVE_GET_CREDENTIALS_OPENBSD) || defined(HAVE_GET_CREDENTIALS_FREEBSD) || defined(HAVE_GETPEEREID)
#  define HAVE_GET_CREDENTIALS
#endif
//#define LWT_ON_WINDOWS
#endif

samoht (Author) commented Mar 27, 2015

ping?

avsm (Collaborator) commented Mar 27, 2015

@fdopen the fix in your tree looks good, but it means that the OCaml inliner can't optimise away the checks for a reentrant hostdb (since it's an external C function call). This is pretty constant though:

+CAMLprim value lwt_have_netdb_reentrant(value u){
+  (void)u;
+#ifdef HAVE_NETDB_REENTRANT
+  return Val_int(1);
+#else
+  return Val_int(0);
+#endif
+}

If that were an ML file with a constant true/false generated by discover.ml, then it would be inlined easily.

fdopen mentioned this issue Mar 27, 2015

vbmithr (Member) commented Apr 10, 2015

I'm closing this. Please reopen if the issue appears again.

vbmithr closed this as completed Apr 10, 2015

samoht (Author) commented Apr 24, 2015

Any plan for a minor release with that fix?

samoht (Author) commented Jun 10, 2015

ping?

samoht added a commit to samoht/opam-repository that referenced this issue Aug 4, 2015
The HTTP tests are broken because of ocsigen/lwt#136,
so they need lwt 1.5.0 to pass (but we don't have a release of cohttp compatible
with that version of lwt yet).