Support for Whole Document Parsing #5

ejgallego · 2016-06-11T17:05:56Z

@cpitclaudel and @JasonGross have requested to have whole Coq document parsing. This issue is to design and discuss how such a mechanism would work.

Current status [edit: updated, see below]

SerAPI provides a parsing command that will indicate where the current sentence ends:

(P (Parse "Lemma U n : n + n = n. Proof. intros."))
(Answer P Ack)
(Answer P
 (ObjList
  ((CoqLoc
    ((fname "") (line_nb 1) (bol_pos 0) (line_nb_last 1) (bol_pos_last 0)
     (bp 0) (ep 22))))))
(Answer P Completed)

This was meant as a proof of concept but it is useful in itself.

Proposal 1: extend to lists of positions:

A first step towards improving this would be to support a (Parse Sentence "doc").
Such command would return a list of CoqLoc objects signalling the end of each sentence. It would be up to the ide to split the doc.

If the last sentence is not a valid one, I wonder what to return. We could fail the whole parsing, just ignore it, or return the list of valid positions plus an exception.

[Edit]

Current Status

This feature is mostly implemented, StmAdd now takes a parameter indicating the number of sentences to parse, and will try to parse as many as indicated. I a parse error occurs, it will still add the number of well-parsed sentences.

To close this issue, it remains to:

Push Allow stream parsing from the Stm. [Under revision] coq/coq#204 upstream, adapt to it.
Be sure that IDE implementors can validate the current design.

The text was updated successfully, but these errors were encountered:

JasonGross · 2016-06-11T17:49:48Z

If the last sentence is not valid, at the very least, return the location of the start of the last sentence. I think returning the list of valid positions, together with an exception, would be best; this way I don't have to make an extra round-trip to parse the valid portion of the document if I want to do that.

ejgallego · 2016-06-11T18:22:20Z

I've pushed a bit more smart parsing in 6bb9b53. However, this is going to be hard to do well: As you can see in the below example, Coq parsing is extremely stateful so we must execute the document as we proceed with parsing!

(P (Parse "Lemma U n : n + n = n. Proof. intros n. "))
(Answer P Ack)
(Answer P
 (ObjList
  ((CoqLoc
    ((fname "") (line_nb 1) (bol_pos 0) (line_nb_last 1) (bol_pos_last 0)
     (bp 0) (ep 22)))
   (CoqLoc
    ((fname "") (line_nb 1) (bol_pos 0) (line_nb_last 1) (bol_pos_last 0)
     (bp 23) (ep 29))))))
(Answer P (CoqExn (Stream.Error "illegal begin of vernac")))

Here intros can't be recognized without processing both previous commands :S I'll think more, but this may be hard to realize effectively.

cpitclaudel · 2016-06-11T18:45:43Z

I think a simple enough interface (for IDEs) would be one that just returns the end of the current sentence. This way after guessing a good-ish location, IDEs would send the current sentence. They'd either get an OK, or an error; in the latter case, they'd simply send more, repeating until they've got at least one sentence.

Even that roundtripping might not be the best, though; a version of Add that just processes the first sentence might be nicer.

In any case, kudos on the great work!

ejgallego · 2016-06-11T18:48:02Z

Hi Clément, thanks for the comments! The simple interface should be working now, then, just use (Parse 1 "code").

Another possibility on parse would be to build the STM DAG using Stm.add and inform the IDE about it. Then, the IDE may observe nodes it is interested in.

Thus, in this case (Parse) would return a list of positions and stm object info for each sentence. I'll implement it soon I guess.

ejgallego · 2016-06-11T18:48:45Z

An additional problem with the current Parse is that somehow the input stream doesn't recognized newlines; looking into it...

ejgallego · 2016-06-12T23:28:24Z

StmAdd should gain capability for full document parsing soon, blocked on coq/coq#204

ejgallego · 2016-06-14T02:40:34Z

Preliminary design commited in d100e0e, waiting to resolve this upstream to close the issue.

ejgallego · 2016-06-26T00:38:55Z

A question we have been discussing with Enrico is what should we do on Edit/Cancel. What location information should SerAPI report?

Doing an EditAt may invalidate all the location information for sentences logically after the edited one, so this could imply that the Stm should store relative location info only.

cpitclaudel · 2016-06-26T02:18:06Z

Hmm. I'm not sure I understand the question fully. Do you mean that an Add to a state in the middle document can move things around? I think that's true. Worse, the user might have some unprocessed text in the middle of a proof.

So yes, I think the position information needs to be relative. But that won't be much of a problem on the IDE side, I expect. In most cases positions relative to the start of a sentence/state can be enough; for other cases where we want absolute positions (e.g. line and column numbers or absolute offsets in a file, say for the debugger), the IDE could send a list of pairs (state-id, position-of-beginning, position-of-end) to Coq to help it resync.

In fact, I don't think there's a way around this resyncing process: we want to allow users to add comments, spaces, and contents inside comments without canceling (PG already supports editing comments without cancelling, for example). We also want to make meaning-preserving changes without canceling (for example fully qualifying module names, which Company-coq can now do).

An alternative would of course be to add a HotSwap command that would replace a sentence with another one, but then Coq would need to analyze whether the whole thing is meaning-preserving, and that sounds painfully complex (if we go that route, then the whole document back and forth model would probably work just as well)

ejgallego · 2016-06-26T08:54:57Z

Indeed that is what I meant. Coq reports locations in several ways but it is not 100% clear when to report a relative vs absolute position, specially if we allow "whole document parsing".

I have done some tests with whole-document back and forth (with SerAPI caching). It is interesting but with the current edit model is just doesn't seem too effective.

ejgallego · 2017-03-23T02:31:08Z

I am close to solving this issue. Reliable full document parsing is implemented in the stm branch. The final bit remaining is to have a compounded definition of "document span". For this purpose, we will:

Add will take an "span id" plus a document possibly containing many sentences.
Span ids have an optional "part" number, which in the case of parse and feedback information will be used.
Cancel can take either an span id, in which case it will invalidate all the sentences in the span, or a pair of span id and subpart.
The workflow for Add is then, the client submits and add with its particularly chosen span id. It will receive location information first. If a cancel needs to be submitted, then there are two possibilities: the client has precise information about the cancel, then it just sends the subspan id. Client doesn't yet have the precise information, it must cancel the whole span.

We adapt to Coq changes motivate by SerAPI. We now have true and reliable full document "parsing" (in the Coq sense of stateful parsing). However to be able to close ejgallego#5 we still need to enrich our `Add` protocol with the notion of sub-document.

ejgallego mentioned this issue Jun 11, 2016

Define protocol and batching options for control commands [Was: Question about Acks] #2

Closed

ejgallego changed the title ~~Document parse support~~ Support for Whole Document Parsing Jun 12, 2016

ejgallego added this to the 0.03 milestone Jun 12, 2016

ejgallego self-assigned this Jun 12, 2016

ejgallego added the kind: upstream label Jun 14, 2016

ejgallego modified the milestones: 0.1, 0.03 Jun 15, 2016

ejgallego added the in-progress label Jun 15, 2016

ejgallego mentioned this issue Jan 19, 2017

Coquille support #31

Closed

2 tasks

ejgallego closed this as completed in fbb9176 Apr 13, 2017

This was referenced May 27, 2020

[serapi] New query Comments to return all comments in a document #205

Merged

Feature request: split comments, not just code ejgallego/coq-lsp#353

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support for Whole Document Parsing #5

Support for Whole Document Parsing #5

ejgallego commented Jun 11, 2016 •

edited

JasonGross commented Jun 11, 2016

ejgallego commented Jun 11, 2016 •

edited

cpitclaudel commented Jun 11, 2016

ejgallego commented Jun 11, 2016

ejgallego commented Jun 11, 2016

ejgallego commented Jun 12, 2016 •

edited

ejgallego commented Jun 14, 2016

ejgallego commented Jun 26, 2016

cpitclaudel commented Jun 26, 2016

ejgallego commented Jun 26, 2016

ejgallego commented Mar 23, 2017 •

edited

Support for Whole Document Parsing #5

Support for Whole Document Parsing #5

Comments

ejgallego commented Jun 11, 2016 • edited

Current status [edit: updated, see below]

Proposal 1: extend to lists of positions:

Current Status

JasonGross commented Jun 11, 2016

ejgallego commented Jun 11, 2016 • edited

cpitclaudel commented Jun 11, 2016

ejgallego commented Jun 11, 2016

ejgallego commented Jun 11, 2016

ejgallego commented Jun 12, 2016 • edited

ejgallego commented Jun 14, 2016

ejgallego commented Jun 26, 2016

cpitclaudel commented Jun 26, 2016

ejgallego commented Jun 26, 2016

ejgallego commented Mar 23, 2017 • edited

ejgallego commented Jun 11, 2016 •

edited

ejgallego commented Jun 11, 2016 •

edited

ejgallego commented Jun 12, 2016 •

edited

ejgallego commented Mar 23, 2017 •

edited