-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Json output for the compiler, part 1: structured metadata for error messages #9979
base: trunk
Are you sure you want to change the base?
Conversation
Thank you for your PR and congrats to @muskangarg21! This should be very helpful to get nicer errors in dune, merlin, and ocaml-lsp! I can't comment on the implementation but wonder if we're planning to have more formal description of produced JSONs (e.g., what values can ...
"lines": [6, 8]
"characters": [3, 18]
... is a little ambiguous (does the first character number pertain to the first line number?) and perhaps could be better represented as ...
"start_pos" : { "line" : 6, "character" : 3 },
"end_pos": { "line" : 8, "character" : 18 },
... Replacing fixed-sized arrays with a json object would also cut down on code such as Anyway, thank you! |
243844b
to
099526d
Compare
Before we bikeshed on the output format: Just like there are DTDs to describe XML documents, I heard there are JSON schema to describe JSON documents. Would it be useful to write down a schema for the JSON data that is produced here? I didn't see it in the PR but may have missed it. |
Relevant link: https://json-schema.org/ |
well, the format I had in mind implied use of json schemas :-) |
It would be probably useful to have a json schema for the output. However, it opens the door to few design questions: do we want a hand-written schema, or one automatically generated? There is also the question of the coupling between the schema and the json printer, do we want to test that the printer is valid according to the schema, or would we rather make the printer valid by construction using a typed representation of the schema? |
What if we saw these questions as opportunities for incremental improvements, rather than choices that you want to make right now? Having a schema written down, even just for documentation, is better than not having one at all. Having a way to check validity (at runtime or compile-time) is better than not. (Among checking strategies, it's not clear that "typed representations" pull their complexity weight, so this is more of an exploration of tradeoffs.) Rather than the perfect/definitive solution, I think that you should go for the amount of effort you think is right for a first PR. (Maybe that's the current state, but then maybe not.) |
My initial request was just to have some reference (this is a good example) that users of the compiler's json output can use and have some discussion on the json schema design choices, so that the incremental changes later don't snowball less desirable choices made at the beginning. |
utils/misc.mli
Outdated
(** Store json fragments for the json output mode *) | ||
type json_fragments = | ||
{ | ||
toplevel_keys : Json.t Stdlib.String.Map.t ref; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be a mutable field instead?
@Octachron pointed out this PR to me when I asked about implementing ocaml/ocaml-lsp#258 This lsp issue requires me to distinguish between "unused value" and "deprecation" warnings. Off the top of my head, I don't see how this PR helps lsp. In lsp, we already have direct access to
Plus a similar change for alerts that would allow me to pick out the ones that correspond to deprecations. Where this PR does help is for reporting build errors. It makes it possible for us to:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I finished a first pass of reviews. There is a fair number of questions and comments, because the diff itself is large. Overall this looks fairly good, and I must say that I am impressed with the breadth of changes to the compiler internals; this is a more ambitious change than I would have guessed. Thanks!
driver/main_args.ml
Outdated
@@ -676,6 +676,11 @@ let mk_error_style f = | |||
\ be set through the OCAML_ERROR_STYLE environment variable." | |||
;; | |||
|
|||
let mk_json f = | |||
"-json", Arg.Unit f, | |||
" Print the compiler messages in json format" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(when supported)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is probably a good point currently.
| `String of string | ||
| `Assoc of (string * t) list | ||
| `List of t list | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose that this was designed to be compatible with the polymorphic-variant representation of other JSON libraries. Maybe this could be maybe explicit with a comment pointing to the definition in those libraries, so that future evolutions of the datatype (if any) can be manually checked to preserve this compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a related note, I know Yojson is the most popular json library in OCaml, but I've come to appreciate how jsonaf handles numbers in json more. It's not a huge deal, but I'd prefer if the compiler followed this convention instead.
utils/misc.mli
Outdated
type json_fragments = | ||
{ | ||
toplevel_keys : Json.t Stdlib.String.Map.t ref; | ||
error_key : Json.t list ref; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what error_key
means here, maybe a comment (or a different name) would help. (toplevel_keys
is clear enough)
utils/misc.mli
Outdated
(** Json mode: we are building a json output, and only print to the | ||
underlying formatter once the logging session ends. *) | ||
|
||
val logf : string -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe the key
parameter could be a keyword?
toplevel/opttoploop.ml
Outdated
Misc.Log.logf "status" log "Interrupted.@."; | ||
Misc.Log.flush log; | ||
false | ||
| x -> Location.report_exception log x; Misc.Log.flush log; false) in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we would want to factorize the three flush
calls here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(we could use a function Location.with_log : formatter -> (unit -> 'a) -> 'a
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have factorized the code with a with_log
function as much as possible (there is at least one case when we want to avoid printing empty object needlessly in the toplevel).
toplevel/opttoploop.ml
Outdated
|
||
let execute_phrase_with_log print_outcome log phr = | ||
Fun.protect (fun () -> execute_phrase print_outcome log phr) | ||
~finally:Warnings.reset_fatal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand what the reset_fatal
business is doing here, but this might be due to an inconsistency between toploop and opttoploop (which I am reviewing first).
I haven't looked at this yet, but as I said to @Octachron in private: please do not merge before I do. And here is something I haven't said to @Octachron in private: I wish there had been more (public?) communication, in particular with merlin's authors, prior to this work being actually done/started; of course I might just have had my head buried in the sand, and missed the discussions when they happened, in which case: my bad. |
Regarding previous communication: there was a first PR (#9538) on May 9th, 2020, and then another (#9758) on July 13th, 2020, and a discussion on caml-devel on the same day. (There was also an older issue, #7123 from 2016, where those PRs were reported.) This being said, I think that the onus is on downstream forkers to send us their stuff, rather than on compiler maintainers to pre-warn them of potential changes to the codebase. |
To be fair, @trefis's complaint is not the first I have heard about a lack of communication for this line of work. There doesn't seem to have been much discussion with the authors of tools that might consume this data (e.g. dune, ocaml-lsp, merlin, etc.). Perhaps we could get a description of some use cases that this work is intended to solve, and then check with the authors of the tools involved that this is what they actually need. |
Thanks for all the bookkeeping @gasche ! I don't want to derail this conversation too much, so let's call this my last message in this PR about communication. (You're of course free to answer my answer if you'd like :p) For the record: #9538 is about a separate tool, so I didn't feel particularly concerned, and didn't have anything to say about it.
Interesting, I'm pretty sure I remember instances of compiler maintainers (myself included) being worried by the potential impact of such or such PR, building opam-repository with it, and sometimes sending patches to users packages. |
@trefis: Let me restate that indeed, the PR will not be merged without your review. Concerning the content of the PR, one important point is that the PR and its second part is more wide than deep and mostly focuses on adding a JSON-structured metadata layer to all compiler messages. I expect most of those messages to be completely orthogonal to Merlin's changes. Similarly, the precise format ought to be discussed. Nevertheless, I thought that for basic metadata that discussion could happen in this very PR or amended later. That was probably an error in term of communication. I am not completely sure what is the best way to fix this mistake? A "Ocaml and tools" meeting maybe? |
We are still waiting for Merlin people to give input on this PR proposal. |
Thanks for the ping, that had dropped of my stack. |
I just had a look, and as @Octachron said this (so far) completely orthogonal to what's been done inside Merlin. Sorry for the delay! |
I discussed the issue of editor integration & compiler errors with the dune team today, and we thought of another use case where structured errors can be useful. For errors displayed in editors like vscode, it's better to display error messages formatted with a much wider margin so that they fit into a single line whenever possible. Thus, it would be good if the Format information was part of the error structure. |
099526d
to
320c88a
Compare
@rgrinberg , I could add the formatter geometry to the data. However, I am not sure how often it will be different from the default value, since there is no easy to change this default when using the compiler executables. |
320c88a
to
ce72559
Compare
For backward compatibility sake
1f0eea1
to
10b1ec6
Compare
I was curious to know what has been done inside merlin, and asked @voodoos. Answer: The |
This PR is the result of @muskangarg21's Outreachy internship.
The aim of this PR is to give a basic machine-readable output to avoid the need of parsing the compiler messages.
It adds a new
-json
flag to ocaml, ocamc, and ocamlopt that turns the compiler messages format to json.For instance trying to compile
with
# ocamlc -json test.ml
yields with this PR:
(the testsuite contains more involved examples).
Note that This PR doesn't alter the content of error messages, and only structures the associated metadata.
The current json scheme (an object with a single "error_report" field) has been chosen to be extensible.
In particular, @muskangarg21 has extended this output mode to all debugging output of the compiler (
-dprofile
,-dparsetree
,-d....
). Since this extension is mostly busy work, I have split it into a upcoming second PR.