Parse errors are not user friendly... #26

dimitri · 2014-06-06T10:08:11Z

As seen in dimitri/pgloader#56 we have problems with users trying to decipher error messages. Is it possible to do something about it? Do you have plans to improve error reporting?

scymtym · 2014-06-06T18:59:00Z

Hi,

note that the esrap repository from which quicklisp fetches is scymtym/esrap.

Regarding your question: I have been working on parse error reports like

At

  t a { a b:double a b { a@string a b } { a b > } 
                         ^ (Line 1, Column 30, Position 30)

Could not parse NAME. Expected:

     any character satisfying ALPHANUMERICP
  or the character { (LEFT_CURLY_BRACKET)

Reached via

     STATEMENT
  -> (+ (OR SCOPE DECLARATION USE))
  -> (OR SCOPE DECLARATION USE)
  -> USE
  -> NAME
  -> (+ (ALPHANUMERICP CHARACTER))
  -> (ALPHANUMERICP CHARACTER)

Do you think that would help?

@nikodemus: would do you think about this?

dimitri · 2014-06-06T23:24:46Z

It would help tremendously, yes!

scymtym · 2014-06-07T12:39:41Z

OK, the code will need some work before it can go into master, but I will try to get it done as soon as possible.

scymtym · 2014-07-10T13:47:26Z

@dimitri, I pushed a draft of the improved parse error reporting to scymtym/esrap:wip-better-errors. In addition to the error position, parse errors report

the most specific rule that failed to apply
a list of characters, strings, etc. expected at the failure position
a derivation explaining how the failing rule was reached

I tested the code by hacking an Emacs completion backend that displays the expected strings or characters:

Could you test whether it would improve the situation for you? I would be interested to hear whether it works for you and whether error messages are too verbose/too terse/just right.

After some cleanup, the code could go into esrap master, I think.

PS: The completion hack above uncovered the following potential bug in the pgloader parser: the unix: scheme visible in the screenshot leads to the following error:

:UNKNOWN fell through ECASE expression.
Wanted one of (:MYSQL :POSTGRESQL).
   [Condition of type SB-KERNEL:CASE-FAILURE]

Restarts:
 0: [*ABORT] Return to SLIME's top level.
 1: [ABORT] Abort thread (#<THREAD "worker" RUNNING {128E6B01}>)

Backtrace:
  0: (SB-KERNEL:CASE-FAILURE ECASE :UNKNOWN (:MYSQL :POSTGRESQL))
  1: ((LAMBDA (PGLOADER.PARSER::URI #:START398 #:END399) :IN "/home/jmoringe/.local/share/common-lisp/quicklisp/dists/quicklisp/software/pgloader-3.0.99/src/parser.lisp") ((:TYPE :UNKNOWN) (:USER "a" :PASS..
      Locals:
        PGLOADER.PARSER::DBNAME = "bla"
        PGLOADER.PARSER::HOST = (:HOST "c")
        PGLOADER.PARSER::PASSWORD = "b"
        PGLOADER.PARSER::PORT = NIL
        PGLOADER.PARSER::TABLE-NAME = NIL
        TYPE#1 = :UNKNOWN
        PGLOADER.PARSER::URI = ((:TYPE :UNKNOWN) (:USER "a" :PASSWORD "b") (:HOST (:HOST "c") :PORT NIL) (:DBNAME "bla") NIL)
        PGLOADER.PARSER::USER = "a"
        PGLOADER.PARSER::USER#1 = "a"
        #:WHOLE400 = (:TYPE :UNKNOWN :USER "a" :PASSWORD "b" ...)
  2: ((LAMBDA NIL :IN ESRAP::COMPILE-RULE))
  3: (ESRAP::RESULT-PRODUCTION #S(ESRAP::RESULT :%PRODUCTION #<CLOSURE (LAMBDA NIL :IN ESRAP::COMPILE-RULE) {12997B55}> :POSITION 46))
  4: ((LAMBDA NIL :IN ESRAP::COMPILE-SEQUENCE))

The case failure is in line 342 of parser.lisp (Quicklisp version of pgloader).

dimitri · 2014-07-11T11:17:38Z

Hi,

Jan Moringen notifications@github.com writes:

@dimitri, I pushed a draft of the improved parse error reporting to
scymtym/esrap:wip-better-errors. In addition to the error position,
parse errors report

Thanks a lot for your work on that, it's going to be awesome!

I did git clone your project and branch and am trying it by injecting
random errors in my test load files and seeing what happens with the
following command:

(pgloader::with-monitor ()
(pgloader.parser::parse-commands-from-file
"/Users/dim/dev/pgloader/test/csv.load"))

As far as the error message goes, it's already way better, but as far as
understanding what's wrong in the source file goes, I think we need more
improvements.

Could you please try removing a comma separator in the file (I tested
just after the truncate keyword), or maybe removing the d letter from
the “enclosed” keyword, etc, and see what happens?

It seems that all I can get from those tests is the following error
message:

At end of input


  ^ (Line 51, Column 1, Position 1400)

Could not parse DOUBLED-COLON. Expected:

  the string "::"

Reached via

     DOUBLED-COLON
  -> (AND "::")
  -> "::"

Could not parse DOUBLED-AT-SIGN. Expected:

  the string "@@"

Reached via

     DOUBLED-AT-SIGN
  -> (AND "@@")
  -> "@@"

Could not parse DSN-USER-PASSWORD. Expected:

  the character @ (COMMERCIAL_AT)

Reached via

     DSN-USER-PASSWORD
  -> (AND USERNAME (? (AND ":" (? PASSWORD))) "@")
  -> "@"

It might that my esrap grammar is just confused, tho.

After some cleanup, the code could go into esrap master, I think.

Yes please! ;-)

PS: The completion hack above uncovered the following potential bug in
the pgloader parser: the unix: scheme visible in the screenshot
leads to the following error:

I will see about that, thanks for reporting!

:UNKNOWN fell through ECASE expression.

My understanding is that using ecase prevents the parser to do proper
backtracking here, and so we won't get to your nicer error messages in
case of typo, right?

Regards,

dim

scymtym · 2014-07-16T14:26:58Z

Could you please try removing a comma separator in the file (I tested just after the truncate keyword), or maybe removing the d letter from the “enclosed” keyword, etc, and see what happens?
[...]
It might that my esrap grammar is just confused, tho.

This is indeed caused by a problem in the grammar. I debugged this by doing (esrap:trace-rule 'commands :recursive t). Among many other things, the output contained

7: PGLOADER.PARSER::USERNAME 440-1367 -> "/pgloader?csv (a, b, d, c)

     WITH truncate,
          skip header = 1,
          fields optionally enclosed by '\"',
          fields escaped by double-quote,
          fields terminated by ','

      SET client_encoding to 'latin1',
          work_mem to '12MB'
          standard_conforming_strings to 'on'

   BEFORE LOAD DO
    $$ drop table if exists csv; $$,
    $$ create table csv (
        a bigint,
        b bigint,
        c char(2),
        d text
       );
    $$;



Stupid useless header with a © sign
\"2.6.190.56\",\"2.6.190.63\",\"33996344\",\"33996351\",\"GB\",\"United Kingdom\"
\"3.0.0.0\",\"4.17.135.31\",\"50331648\",\"68257567\",\"US\",\"United States\"
\"4.17.135.32\",\"4.17.135.63\",\"68257568\",\"68257599\",\"CA\",\"Canada\"
\"4.17.135.64\",\"4.17.142.255\",\"68257600\",\"68259583\",\"US\",\"United States\"
\"4.17.143.0\",\"4.17.143.15\",\"68259584\",\"68259599\",\"CA\",\"Canada\"
\"4.17.143.16\",\"4.18.32.71\",\"68259600\",\"68296775\",\"US\",\"United States\"
"

i.e. the username rule consumed all remaining input (that's why your report mentions failing at "end of input"). The username rule is defined as

(defrule username (+ (or (not (or ":" "@" )) doubled-at-sign doubled-colon))
  (:text t))

meaning that it can consume anything as long as it is neither ":" nor "@". Restricting username (e.g. disallowing whitespace or whatever is appropriate) should solve the problem.

As far as the error message goes, it's already way better, but as far as understanding what's wrong in the source file goes, I think we need more improvements.

Fixing the above problem should make the error messages much more useful. One problem will probably remain, though: in typical grammars, whitespace and comments are permitted almost everywhere. However, it is not helpful if virtually every error messages contains "you know, you could just insert whitespace or a comment at the problematic position". So I suppose we need a mechanism for excluding certain rules from error messages.

My understanding is that using ecase prevents the parser to do proper backtracking here, and so we won't get to your nicer error messages in case of typo, right?

Kind of. "Normally" the parser first applies rules without calling result producing code (:lambda, :destructure, etc.). In case the parse succeeds, result producing code is called to, well, produce parse results. This implies that it is not possible to influence the parse from within :lambda, :destructure, etc. Signaling an error in one of these will simply signal an error in the result production phase, but only after a successful parse.

The parse vs. result production phase separation is not absolute, though. "Semantic predicates" can be used to steer the parse based on intermediate results. A simple example would be

(defrule integer ...) ; returns parsed integer
(defrule odd-integer (oddp integer))

When odd-integer is tried, oddp is called with the production of the integer rule and odd-integer succeeds if oddp returns true.

dimitri · 2014-07-16T21:23:47Z

I did as you propose in dimitri/pgloader@0f3103d and I really like the error messages I get now. It's a huge improvement, thanks a lot for it!

Please consider merging that into main esrap already, even if some improvements can be think of for later revisions, such as avoiding mentioning some parts of the syntax tree such as the comments.

As per grammar review at nikodemus/esrap#26, improve the situation.

dimitri · 2014-07-16T21:26:50Z

I also fixed (I think) the dsn prefix parsing bits thanks to the patch linked here.

scymtym · 2014-07-18T22:29:27Z

I will try to clean up and merge the code as quickly as I can. No promises though.

dimitri · 2014-07-18T23:06:22Z

Thanks for your efforts and interest (and code) ;-)

scymtym · 2014-08-11T14:42:28Z

@dimitri Just to keep you informed: most of the improved error reporting code is approaching acceptable quality with tests and all. However, there is one design decision (scraping the required result information from the cache, "post-mortem" vs. storing more information in results while parsing) which needs at least benchmarking and potentially more thought.

Sorry for dragging this out like this but I would hate to sacrifice (much) performance for the improved error reporting. I think that is how @nikodemus would feel about this as well.

dimitri · 2014-08-11T16:24:07Z

Thanks a lot for the update, really. I understand that some design decisions are more tricky than others and time is needed. I'm still waiting for debian packaging work to release pgloader 3.1.0, I hope that with some luck we will be able to benefit from the new error reporting as soon as 3.1.0, let's see how it goes ;-)

fade · 2014-10-14T22:07:11Z

The better errors branch has already helped me dig myself out of a PEBCAK problem with pgloader, so thanks for the work on this. IMO, it's very valuable.

scymtym · 2016-01-30T13:50:17Z

A much improved version of this code is in https://github.com/scymtym/esrap master (which is the maintained upstream repository)

The WIP branch about better error messages made its way through the main code, so switch back to the mainline as available directly in Quicklisp. See nikodemus/esrap#26.

dimitri mentioned this issue Jun 24, 2014

Unhandled error condition with no information as to how to fix dimitri/pgloader#88

Closed

dimitri added a commit to dimitri/pgloader that referenced this issue Jul 16, 2014

Move some logic up to the parser.

e4948a3

As per grammar review at nikodemus/esrap#26, improve the situation.

dimitri mentioned this issue Aug 23, 2014

Error: Could not parse subexpression ";" when parsing dimitri/pgloader#109

Closed

dimitri mentioned this issue Mar 26, 2016

Error in parsing CSV load file dimitri/pgloader#376

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parse errors are not user friendly... #26

Parse errors are not user friendly... #26

dimitri commented Jun 6, 2014

scymtym commented Jun 6, 2014

dimitri commented Jun 6, 2014

scymtym commented Jun 7, 2014

scymtym commented Jul 10, 2014

dimitri commented Jul 11, 2014

scymtym commented Jul 16, 2014

dimitri commented Jul 16, 2014

dimitri commented Jul 16, 2014

scymtym commented Jul 18, 2014

dimitri commented Jul 18, 2014

scymtym commented Aug 11, 2014

dimitri commented Aug 11, 2014

fade commented Oct 14, 2014

scymtym commented Jan 30, 2016

Parse errors are not user friendly... #26

Parse errors are not user friendly... #26

Comments

dimitri commented Jun 6, 2014

scymtym commented Jun 6, 2014

dimitri commented Jun 6, 2014

scymtym commented Jun 7, 2014

scymtym commented Jul 10, 2014

dimitri commented Jul 11, 2014

Regards,

scymtym commented Jul 16, 2014

dimitri commented Jul 16, 2014

dimitri commented Jul 16, 2014

scymtym commented Jul 18, 2014

dimitri commented Jul 18, 2014

scymtym commented Aug 11, 2014

dimitri commented Aug 11, 2014

fade commented Oct 14, 2014

scymtym commented Jan 30, 2016