Broken Backtraces #714
Comments
Thanks for the report. I'm looking into this. It might take more than a few hours, because the solution might be to fundamentally improve Lwt's backtrace handling, not sure yet :)
This commit avoids raising exceptions during Lwt's own usage of Lwt_pqueue. Such exceptions clobber the user's backtrace. This could also be solved by using raise_notrace. However, there may still be users of Lwt_pqueue, so I chose not to risk ruining their backtraces. Part of #714.
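As a rough illustration of the idea in this commit message (not the actual Lwt patch), the sketch below uses a hypothetical sorted-list queue rather than Lwt_pqueue itself. All names in it (`queue`, `add`, `min_exn`, `min_opt`, `next_wakeup`) are made up for the example. `min_exn` signals an empty queue by raising, which touches the runtime's shared backtrace state; `min_opt` reports emptiness through an option, so internal callers never raise at all.

```ocaml
(* Hypothetical stand-in for a priority queue: a list kept sorted in
   ascending order. This is NOT Lwt_pqueue's real representation. *)
type 'a queue = 'a list

let empty : 'a queue = []

(* Insert [x] while keeping the list sorted. *)
let add (x : 'a) (q : 'a queue) : 'a queue =
  List.merge compare [x] q

(* Exception-based lookup: raising here can overwrite a backtrace that a
   user exception recorded a moment earlier. *)
let min_exn (q : 'a queue) : 'a =
  match q with
  | [] -> raise Not_found
  | x :: _ -> x

(* Option-based lookup: an empty queue is an ordinary value, so no
   exception is raised and the recorded backtrace is left alone. *)
let min_opt (q : 'a queue) : 'a option =
  match q with
  | [] -> None
  | x :: _ -> Some x

(* Hypothetical internal caller, e.g. a scheduler asking for the next
   timer deadline. It uses the non-raising lookup. *)
let next_wakeup (timers : float queue) : float option =
  min_opt timers
```

Keeping `min_exn` alongside `min_opt` mirrors the choice described above: external users who rely on the raising function keep their behavior and their backtraces, while internal callers switch to the non-raising path.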
To address the particular problem in the issue, try this change:

 let test () =
   let%lwt () = Lwt_io.printf "Test\n" in
-  let%lwt () = Lwt.fail_with "Test" in
+  let%lwt () = failwith "Test" in
   Lwt.return ()

 let () =
   Printexc.record_backtrace true;
   test () |> Lwt_main.run

This gives me the output:
The issue is that While debugging this, I also found that this still doesn't work:

let test () =
  let%lwt () = Lwt_unix.sleep 0.1 in
  let%lwt () = failwith "Test" in
  Lwt.return ()

let () =
  Printexc.record_backtrace true;
  test () |> Lwt_main.run

It results in the output:
With the first attached commit, it results in:
This second issue is that OCaml stores backtraces in a global variable (thread-local in 4.10, I think, but the result is the same for Lwt). So, if an exception is (1) raised internally by Lwt, or (2) in an interfering Lwt control flow, between when a user exception is raised and its backtrace would be printed, the interfering exception clobbers the backtrace.

Lwt was raising To deal with (2), we probably need to rework Lwt's backtrace handling.

@Drup, can you comment on the history of current backtrace handling in the PPX?

@vog, does this fix the immediate issue in your full project? If so, I'll release this in Lwt 4.3.1, and address backtraces fully later.
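To make the clobbering mechanism more concrete, here is a minimal stdlib-only sketch. It is not how Lwt handles backtraces; `noisy_cleanup` and `run_saving_backtrace` are hypothetical names. The pattern it shows is to copy the backtrace with `Printexc.get_raw_backtrace` immediately at the catch site, before any other code gets a chance to raise and overwrite the shared buffer.

```ocaml
(* Hypothetical helper: code that raises and handles its own exception,
   as a library might do between a user's raise and the point where the
   user's backtrace is finally printed. *)
let noisy_cleanup () =
  try raise Not_found with Not_found -> ()

(* Run [f]. If it raises, save the backtrace first, and only then run
   code that may raise internally. The saved copy is unaffected by
   whatever later exceptions do to the runtime's buffer. *)
let run_saving_backtrace f =
  match f () with
  | v -> Ok v
  | exception exn ->
    let bt = Printexc.get_raw_backtrace () in
    noisy_cleanup ();
    Error (exn, bt)

let () =
  Printexc.record_backtrace true;
  match run_saving_backtrace (fun () -> failwith "Test") with
  | Ok () -> ()
  | Error (exn, bt) ->
    prerr_endline (Printexc.to_string exn);
    (* Prints the saved trace; it is only populated when the program
       is built with debug information. *)
    prerr_string (Printexc.raw_backtrace_to_string bt)
```

If `Printexc.get_raw_backtrace` were called after `noisy_cleanup ()` instead, the trace could describe the internal `Not_found` rather than the user's `Failure`, which is the kind of interference described above.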
I replaced Is there a simple way to test against the latest Or, does it make more sense for you to release Lwt 4.3.1 first, so I'll test against the new release?
Yes, in your switch (in the project directory), run
Once testing with
It still fails after pinning Lwt to master. I'll provide a new minimal example.
Minimal example:

let test () =
  match%lwt Caqti_lwt.connect (Uri.of_string "sqlite3:test.db") with
  | Error _ -> Lwt.return ()
  | Ok (module Db : Caqti_lwt.CONNECTION) ->
    let%lwt _ = Db.find (Caqti_request.find Caqti_type.unit Caqti_type.unit "select 1") () in
    let%lwt () = Lwt_io.printf "Test" in
    failwith "Test"

let () =
  Printexc.record_backtrace true;
  Lwt_main.run @@
  test ()

Dune file:
Output:
Alas, I was unable to get rid of the Caqti dependency. Every further simplification or replacement of the Caqti calls made the bug disappear.
Have you tried simplifying Caqti as well? If you are doing this in a Dune workspace, it should be fairly easy to undo most of Caqti (if you haven't done so yet). This is what I ended up doing earlier to find the
Sorry for asking again perhaps simple questions about Opam and Dune, but: Which steps are needed to edit Caqti? I tried to edit some file in What's the correct way to build with a locally modified version of the Caqti libraries?
Since the code is already pretty simplified, you could just put the repro case directly into a clone of the Caqti repo as a new executable, and build it there, while simplifying Caqti (and the case, hopefully). For a more complex case, you should be able to make a Dune workspace that looks like this:
This structure may be useful if you need to pull in
You can also test modified Caqti with opam, using an opam pin of the Caqti repo, but this is a much slower workflow, because you will have to fully compile and reinstall Caqti after each change. With a Dune workspace, you get incremental builds.
Uh-oh, it seems that
I wonder how OPAM managed to install Caqti in my 4.08.1 switch, where it successfully compiled and worked flawlessly (except for the backtraces). I'll retry with a 4.07 and 4.06 switch, but there must be a better way, as it obviously is possible to compile it with 4.08, I just can't figure out how.
I have met the same problem (or one of a very similar nature) before, see: https://discuss.ocaml.org/t/ocaml-and-backtraces/3542/7
I'm now using an OCaml 4.07.0 switch. The issue is still reproducible, and I can test with a custom Caqti build. However, simplifying and reducing Caqti is not straightforward ...
@vog the difference is that when opam installs a package, it is installed in release mode, where most warnings remain warnings. So, those deprecation warnings are written to a log file, which you never see. When you build a package locally, it is built in development mode by default, in which deprecation warnings are errors.
I opened paurkedal/ocaml-caqti#29 on Caqti about that.
I have not yet reached a minimal example, but I found the following snippet which uses

let poll ?(read = false) ?(write = false) ?timeout fd =
  let choices =
    [] |> (fun acc -> if read then Lwt_unix.wait_read fd :: acc else acc)
       |> (fun acc -> if write then Lwt_unix.wait_write fd :: acc else acc)
       |> Option.fold (fun t acc -> Lwt_unix.timeout t :: acc) timeout in
  if choices = [] then
    invalid_arg "Caqti_lwt.Unix.poll: No operation specified."
  else
    begin
      Lwt.catch
        (fun () -> Lwt.choose choices >|= fun _ -> false)
        (function
         | Lwt_unix.Timeout -> Lwt.return_true
         | exn -> Lwt.fail exn)
    end >>= fun timed_out ->
    Lwt.return (Lwt_unix.readable fd, Lwt_unix.writable fd, timed_out)

Obviously, replacing Perhaps fixing that snippet is sufficient to fix the issue?
If that turns out to be the case, then we need to do the full backtrace handling rework, since there is no small patch to handle that case correctly. One thing you can do is use
That poll function turned out not to be actually used by my minimal example. The current state is that I fully inlined Caqti, removed a lot of code, and I'm down to 722 lines. Due to dynamic stuff, as well as OCaml not checking unused parts of module types (just of modules), going further down is a bit tricky.
That's a lot of minification :) If you get really stuck, please upload your remaining case to a gist, with clear instructions on how to reproduce the environment in which it is built, and I can take over.
Thanks! I'm down to 677 lines. This is the Gist: https://gist.github.com/vog/c0ee4cff2aeff4840b4e6c947f7a7ec2 The "run" shell script contains the instructions.
Another option for this change was to use Map.S.find_opt. However, this requires OCaml 4.05, but Lwt still supports 4.02. The code in this commit avoids having to select different implementations depending on the OCaml version. This code is not performance-critical, so doing two descents into the map should not be a problem. Part of #714.
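The "two descents" approach mentioned in this commit message can be sketched as follows. This is an illustration under assumptions, not the code from the commit: `String_map` and `find_in_map` are hypothetical names, and the real change operates on Lwt's internal map rather than a string-keyed one.

```ocaml
(* Hypothetical map module for the example. *)
module String_map = Map.Make (String)

(* Option-returning lookup that does not need Map.S.find_opt (added in
   OCaml 4.05) and does not raise Not_found for a missing key.
   It walks the map twice: once for [mem], once for [find]. *)
let find_in_map key map =
  if String_map.mem key map then Some (String_map.find key map)
  else None
```

The cost is a second O(log n) traversal per successful lookup, which matches the remark above that this is acceptable because the code is not performance-critical.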
I was able to reduce the case to:

let foo : unit Lwt.key = Lwt.new_key ()

let test () =
  let%lwt () = Lwt_unix.sleep 0.1 in
  Lwt.async begin fun () ->
    let%lwt () = Lwt.pause () in
    ignore (Lwt.get foo);
    Lwt.return_unit
  end;
  failwith "Test"

let () =
  Printexc.record_backtrace true;
  Lwt_main.run @@
  test ()

It can probably be reduced even further, but by this point, I already knew that the culprit is Lines 794 to 802 in 25d31d6
A fix is attached, above. Before the fix, the above code results in
With the fix, it results in
The backtrace could use even more improvement, but it is at least somewhat accurate. @vog, the backtrace in the repro case from the gist you posted is now
Again, not completely accurate, but definitely improved. Could you test your full project again, with the latest Lwt
I just re-tested with my real code and can confirm that the latest Lwt master is a huge improvement! Although there are still missing parts in the backtrace (as you described yourself), the backtrace is now good enough to find most errors immediately. No more guessing around. Thanks!

For further analysis of the remaining issues, I propose the following additional minimal test case which produces a non-optimal backtrace:

let test () =
  let%lwt () = Lwt_unix.sleep 0.1 in
  let%lwt () = Lwt_io.printf "test\n" in
  let%lwt () = failwith "Test" in
  Lwt.return ()

let () =
  Printexc.record_backtrace true;
  test () |> Lwt_main.run
Good :) I opened #720 about "true" backtrace handling by Lwt, which is probably needed to solve remaining issues. That will be a medium-sized project for this repo, and I'm not sure yet if it can be made to work. So, I'm going to put that off for a 4.4.0 release, close this issue, and release the patches from this issue in Lwt 4.3.1 soon. I linked your test case from #720. Thanks for reporting this and working on it!
The following Lwt program produces the expected backtrace:
However, when switching the order of lines 2 and 3, the backtrace is broken:
The expected output would have been:
Is this a bug in Lwt? If not, what am I doing wrong?
I was able to reproduce this issue with lwt 4.3.0 and lwt_ppx 1.2.3 on vanilla ocaml 4.08.1 and used the following dune file for building:

Adding (flags (-g)) makes no difference.