
Segmentation fault when parsing many files #65

Closed
mjambon opened this issue Jul 13, 2020 · 3 comments
Labels
bug Something isn't working priority:medium to do, not blocking users tech debt works for now but fragile

Comments

@mjambon
Member

mjambon commented Jul 13, 2020

See #64 for some context and ask @aryx for details on how to reproduce the crash.

keywords: segfault, crash, memory leak

@mjambon mjambon added the bug Something isn't working label Jul 13, 2020
@mjambon mjambon added priority:medium to do, not blocking users tech debt works for now but fragile labels Mar 29, 2021
@mjambon
Member Author

mjambon commented Apr 12, 2021

@aryx can you provide instructions on how to reproduce the crash?

@aryx
Collaborator

aryx commented Apr 13, 2021

Actually, you get either a segfault or a giant memory leak. To get the segfault, which should be easier to debug, you first need to modify semgrep/ocaml-tree-sitter/src/bindings/lib/bindings.c, which currently contains:

static void finalize_tree(value v) {
  tree_W *p;
  p = (tree_W *)Data_custom_val(v);
  //TODO: ts_tree_delete(p->tree);
  // this caused some segfaults, probably during Gc after
  // analyzing many Ruby files. We go around this bug by
  // running the Ruby parser in a separate process so segfaults
  // or here memory leak do not reach the main semgrep-core process.
}

Step 1: uncomment the call to ts_tree_delete(p->tree).

Step 2: many of the semgrep-core/parsing/tree_sitter/Parse_xxx_tree_sitter.ml files contain code like this:

let parse file =
  H.wrap_parser
    (fun () ->
       Parallel.backtrace_when_exn := false;
       Parallel.invoke Tree_sitter_java.Parse.file file ()
    )
    (fun cst ->
       let env = { H.file; conv = H.line_col_to_pos file; extra = () } in
       program env cst
    )

This code uses Parallel.invoke to run Tree_sitter_java.Parse.file in a forked process. If you instead call Tree_sitter_java.Parse.file file directly, without Parallel.invoke, and run semgrep on a large Java repository, you will eventually get a segfault.
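The fork-based workaround described above (run the parser in a child process so that a segfault or a leak never reaches the main semgrep-core process) can be illustrated in C, the language of bindings.c. This is a minimal sketch under stated assumptions, not semgrep's Parallel.invoke: invoke_in_fork, parse_ok and parse_crash are hypothetical names, and the child sends back only a single int over a pipe, whereas a real implementation would have to serialize a whole CST.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical signature for the isolated work, e.g. a parser call. */
typedef int (*task_fn)(const char *input);

typedef enum { TASK_OK, TASK_CRASHED, TASK_FAILED } task_status;

/* Runs fn(input) in a forked child and passes its int result back through
   a pipe. If the child segfaults or leaks, only the child is affected: the
   parent just observes the exit status, and the kernel reclaims all of the
   child's memory when it terminates. */
static task_status invoke_in_fork(task_fn fn, const char *input, int *result) {
    assert(fn != NULL && result != NULL);
    int fds[2];
    if (pipe(fds) != 0)
        return TASK_FAILED;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return TASK_FAILED;
    }

    if (pid == 0) {
        /* Child: run the task, send the result, then exit with _exit so that
           stdio buffers inherited from the parent are not flushed twice. */
        close(fds[0]);
        int r = fn(input);
        ssize_t n = write(fds[1], &r, sizeof r);
        _exit(n == (ssize_t)sizeof r ? 0 : 1);
    }

    /* Parent: read the result (EOF if the child died first), then reap it. */
    close(fds[1]);
    int r = 0;
    ssize_t n = read(fds[0], &r, sizeof r);
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return TASK_FAILED;
    if (WIFSIGNALED(status))
        return TASK_CRASHED; /* e.g. WTERMSIG(status) == SIGSEGV */
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0 || n != (ssize_t)sizeof r)
        return TASK_FAILED;
    *result = r;
    return TASK_OK;
}

/* Hypothetical stand-ins: one task that succeeds, one that segfaults. */
static int parse_ok(const char *input) { return (int)strlen(input); }

static int parse_crash(const char *input) {
    (void)input;
    raise(SIGSEGV); /* simulate a crash inside the parser */
    return -1;
}
```

For example, invoke_in_fork(parse_ok, "class A {}", &r) returns TASK_OK with r set to 10, while invoke_in_fork(parse_crash, "boom", &r) returns TASK_CRASHED and leaves the caller running with r unchanged. The trade-off is one fork per parsed file plus the cost of sending the result back to the parent.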

@mjambon
Member Author

mjambon commented Sep 14, 2021

Fixed by semgrep/ocaml-tree-sitter-core#11

@mjambon mjambon closed this as completed Sep 14, 2021