Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Whole Document Parsing #5

Closed
1 of 2 tasks
ejgallego opened this issue Jun 11, 2016 · 11 comments
Closed
1 of 2 tasks

Support for Whole Document Parsing #5

ejgallego opened this issue Jun 11, 2016 · 11 comments
Assignees
Milestone

Comments

@ejgallego
Copy link
Owner

ejgallego commented Jun 11, 2016

@cpitclaudel and @JasonGross have requested to have whole Coq document parsing. This issue is to design and discuss how such a mechanism would work.

Current status [edit: updated, see below]

SerAPI provides a parsing command that will indicate where the current sentence ends:

(P (Parse "Lemma U n : n + n = n. Proof. intros."))
(Answer P Ack)
(Answer P
 (ObjList
  ((CoqLoc
    ((fname "") (line_nb 1) (bol_pos 0) (line_nb_last 1) (bol_pos_last 0)
     (bp 0) (ep 22))))))
(Answer P Completed)

This was meant as a proof of concept but it is useful in itself.

Proposal 1: extend to lists of positions:

A first step towards improving this would be to support a (Parse Sentence "doc").
Such command would return a list of CoqLoc objects signalling the end of each sentence. It would be up to the ide to split the doc.

If the last sentence is not a valid one, I wonder what to return. We could fail the whole parsing, just ignore it, or return the list of valid positions plus an exception.

[Edit]

Current Status

This feature is mostly implemented, StmAdd now takes a parameter indicating the number of sentences to parse, and will try to parse as many as indicated. I a parse error occurs, it will still add the number of well-parsed sentences.

To close this issue, it remains to:

@JasonGross
Copy link
Collaborator

If the last sentence is not valid, at the very least, return the location of the start of the last sentence. I think returning the list of valid positions, together with an exception, would be best; this way I don't have to make an extra round-trip to parse the valid portion of the document if I want to do that.

@ejgallego
Copy link
Owner Author

ejgallego commented Jun 11, 2016

I've pushed a bit more smart parsing in 6bb9b53. However, this is going to be hard to do well: As you can see in the below example, Coq parsing is extremely stateful so we must execute the document as we proceed with parsing!

(P (Parse "Lemma U n : n + n = n. Proof. intros n. "))
(Answer P Ack)
(Answer P
 (ObjList
  ((CoqLoc
    ((fname "") (line_nb 1) (bol_pos 0) (line_nb_last 1) (bol_pos_last 0)
     (bp 0) (ep 22)))
   (CoqLoc
    ((fname "") (line_nb 1) (bol_pos 0) (line_nb_last 1) (bol_pos_last 0)
     (bp 23) (ep 29))))))
(Answer P (CoqExn (Stream.Error "illegal begin of vernac")))

Here intros can't be recognized without processing both previous commands :S I'll think more, but this may be hard to realize effectively.

@cpitclaudel
Copy link
Collaborator

I think a simple enough interface (for IDEs) would be one that just returns the end of the current sentence. This way after guessing a good-ish location, IDEs would send the current sentence. They'd either get an OK, or an error; in the latter case, they'd simply send more, repeating until they've got at least one sentence.

Even that roundtripping might not be the best, though; a version of Add that just processes the first sentence might be nicer.

In any case, kudos on the great work!

@ejgallego
Copy link
Owner Author

Hi Clément, thanks for the comments! The simple interface should be working now, then, just use (Parse 1 "code").

Another possibility on parse would be to build the STM DAG using Stm.add and inform the IDE about it. Then, the IDE may observe nodes it is interested in.

Thus, in this case (Parse) would return a list of positions and stm object info for each sentence. I'll implement it soon I guess.

@ejgallego
Copy link
Owner Author

An additional problem with the current Parse is that somehow the input stream doesn't recognized newlines; looking into it...

@ejgallego ejgallego changed the title Document parse support Support for Whole Document Parsing Jun 12, 2016
@ejgallego ejgallego added this to the 0.03 milestone Jun 12, 2016
@ejgallego ejgallego self-assigned this Jun 12, 2016
@ejgallego
Copy link
Owner Author

ejgallego commented Jun 12, 2016

StmAdd should gain capability for full document parsing soon, blocked on coq/coq#204

@ejgallego
Copy link
Owner Author

Preliminary design commited in d100e0e, waiting to resolve this upstream to close the issue.

@ejgallego
Copy link
Owner Author

A question we have been discussing with Enrico is what should we do on Edit/Cancel. What location information should SerAPI report?

Doing an EditAt may invalidate all the location information for sentences logically after the edited one, so this could imply that the Stm should store relative location info only.

@cpitclaudel
Copy link
Collaborator

Hmm. I'm not sure I understand the question fully. Do you mean that an Add to a state in the middle document can move things around? I think that's true. Worse, the user might have some unprocessed text in the middle of a proof.

So yes, I think the position information needs to be relative. But that won't be much of a problem on the IDE side, I expect. In most cases positions relative to the start of a sentence/state can be enough; for other cases where we want absolute positions (e.g. line and column numbers or absolute offsets in a file, say for the debugger), the IDE could send a list of pairs (state-id, position-of-beginning, position-of-end) to Coq to help it resync.

In fact, I don't think there's a way around this resyncing process: we want to allow users to add comments, spaces, and contents inside comments without canceling (PG already supports editing comments without cancelling, for example). We also want to make meaning-preserving changes without canceling (for example fully qualifying module names, which Company-coq can now do).

An alternative would of course be to add a HotSwap command that would replace a sentence with another one, but then Coq would need to analyze whether the whole thing is meaning-preserving, and that sounds painfully complex (if we go that route, then the whole document back and forth model would probably work just as well)

@ejgallego
Copy link
Owner Author

Indeed that is what I meant. Coq reports locations in several ways but it is not 100% clear when to report a relative vs absolute position, specially if we allow "whole document parsing".

I have done some tests with whole-document back and forth (with SerAPI caching). It is interesting but with the current edit model is just doesn't seem too effective.

@ejgallego ejgallego mentioned this issue Jan 19, 2017
2 tasks
@ejgallego
Copy link
Owner Author

ejgallego commented Mar 23, 2017

I am close to solving this issue. Reliable full document parsing is implemented in the stm branch. The final bit remaining is to have a compounded definition of "document span". For this purpose, we will:

  • Add will take an "span id" plus a document possibly containing many sentences.
  • Span ids have an optional "part" number, which in the case of parse and feedback information will be used.
  • Cancel can take either an span id, in which case it will invalidate all the sentences in the span, or a pair of span id and subpart.
  • The workflow for Add is then, the client submits and add with its particularly chosen span id. It will receive location information first. If a cancel needs to be submitted, then there are two possibilities: the client has precise information about the cancel, then it just sends the subspan id. Client doesn't yet have the precise information, it must cancel the whole span.

Ptival pushed a commit to Ptival/coq-serapi that referenced this issue Jul 14, 2017
We adapt to Coq changes motivate by SerAPI. We now have true and
reliable full document "parsing" (in the Coq sense of stateful
parsing).

However to be able to close ejgallego#5 we still need to enrich our `Add`
protocol with the notion of sub-document.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants