Skip to content

ENH: tdb: ParseException returns correct line numbers#550

Merged
richardotis merged 5 commits intopycalphad:developfrom
richardotis:fix-tdb-error-lineno
Aug 2, 2024
Merged

ENH: tdb: ParseException returns correct line numbers#550
richardotis merged 5 commits intopycalphad:developfrom
richardotis:fix-tdb-error-lineno

Conversation

@richardotis
Copy link
Collaborator

@richardotis richardotis commented Jul 26, 2024

In the TDB parser, the pyparsing grammar only sees one command-delimited (!-delimited) line at a time. In the event of a parser exception, the error reporting metadata incorrectly returns a line number of 1.

This PR modifies the ParseException object raised by the parser to refer to the (almost) original input text. We also take the opportunity to shorten the error message returned by the string representation of the object. To ensure the column number printed in the error message aligns with the input text, we remove some pre-processing of the text where whitespace was stripped (the pyparsing grammar for TDB already skips whitespace).

Example

from pycalphad import Database
Database("""     PARAMETER G(BCC,FE:H;0) 298.15  +GHSERFE+1.5*GHSERHH
        +258000-3170*T+498*T*LN(T)-0.275*T**2; 1811.00  Y
        +232264+82*T+1*GHSERFE+1.5*GHSERHH; 6000.00  N

     PARAMETER G(BCC,FE:VA;0)      298.15 +GHSERFE; 6000 N ZIM !
""")

Current Behavior (develop)

Failed while parsing: PARAMETER G(BCC,FE:H;0) 298.15 +GHSERFE+1.5*GHSERHH +258000-3170*T+498*T*LN(T)-0.275*T**2; 1811.00 Y +232264+82*T+1*GHSERFE+1.5*GHSERHH; 6000.00 N  PARAMETER G(BCC,FE:VA;0) 298.15 +GHSERFE; 6000 N ZIM 
Tokens: None
[omit full traceback]
pyparsing.exceptions.ParseException: Expected {{'ELEMENT' W:(-/A-Za-z){1,2} W:(()-/-:A-Z_a-z) Re:('[-+]?([0-9]+\.(?!([0-9]|[eE])))|([0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)') Re:('[-+]?([0-9]+\.(?!([0-9]|[eE])))|([0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)') Re:('[-+]?([0-9]+\.(?!([0-9]|[eE])))|([0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)') LineEnd} | {'SPECIES' W:(*+--9A-Z_a-z) [Suppress:('%')] Group:({{W:(A-Za-z){1,2} [Re:('[-+]?([0-9]+\.(?!([0-9]|[eE])))|([0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)')]}}...) [Suppress:('/') W:(+-0-9)] LineEnd} | {'TYPE_DEFINITION' Suppress:(<SP><TAB><CR><LF>) !W:( !) SkipTo:(LineEnd)} | {'FUNCTION' W:(()-/-:A-Z_a-z) {{Re:('[-+]?([0-9]+\.(?!([0-9]|[eE])))|([0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)') | [',']...} {{SkipTo:(';') Suppress:(';') [Suppress:(',')]... [Re:('[-+]?([0-9]+\.(?!([0-9]|[eE])))|([0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)')] Suppress:([(Yy)])}}... Suppress:([(Nn)])} [Suppress:(W:(-0-:A-Z_a-z))] LineEnd} | {'ASSESSED_SYSTEMS' SkipTo:(LineEnd)} | {'DEFINE_SYSTEM_DEFAULT' SkipTo:(LineEnd)} | {'DEFAULT_COMMAND' SkipTo:(LineEnd)} | {'DATABASE_INFO' SkipTo:(LineEnd)} | {'VERSION_DATE' SkipTo:(LineEnd)} | {'REFERENCE_FILE' SkipTo:(LineEnd)} | {'ADD_REFERENCES' SkipTo:(LineEnd)} | {'LIST_OF_REFERENCES' SkipTo:(LineEnd)} | {'TEMPERATURE_LIMITS' SkipTo:(LineEnd)} | {'PHASE' W:(()-/-:A-Z_a-z) Suppress:(<SP><TAB><CR><LF>) !W:( !) Suppress:(<SP><TAB><CR><LF>) Suppress:(W:(0-9)) Group:({Re:('[-+]?([0-9]+\.(?!([0-9]|[eE])))|([0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)')}...) Suppress:(SkipTo:(LineEnd))} | {'CONSTITUENT' W:(()-/-:A-Z_a-z) Suppress:(<SP><TAB><CR><LF>) Suppress:(':') Group:(Group:({{[Suppress:(',')] W:(*+--9A-Z_a-z) [Suppress:('%')]}}...) [: Group:({{[Suppress:(',')] W:(*+--9A-Z_a-z) [Suppress:('%')]}}...)]...) Suppress:(':') LineEnd} | {'PARAMETER' {'BMAGN' | 'DF' | 'DQ' | 'ELRS' | 'G' | 'GD' | 'L' | 'MF' | 'MQ' | 'NT' | 'SIGM' | 'TC' | 'THCD' | 'THETA' | 'V0' | 'VA' | 'VC' | 'VISC' | 'VK' | 'VS' | 'XI'} Suppress:('(') W:(()-/-:A-Z_a-z) [Suppress:('&') W:(-/A-Za-z){1,2}] Suppress:(',') Group:(Group:({{[Suppress:(',')] W:(*+--9A-Z_a-z) [Suppress:('%')]}}...) [: Group:({{[Suppress:(',')] W:(*+--9A-Z_a-z) [Suppress:('%')]}}...)]...) [Suppress:(';') W:(0-9)] Suppress:(')') {{Re:('[-+]?([0-9]+\.(?!([0-9]|[eE])))|([0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)') | [',']...} {{SkipTo:(';') Suppress:(';') [Suppress:(',')]... [Re:('[-+]?([0-9]+\.(?!([0-9]|[eE])))|([0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)')] Suppress:([(Yy)])}}... Suppress:([(Nn)])} [Suppress:(W:(-0-:A-Z_a-z))] LineEnd} | {'ZEROVOLUME_SPECIES' SkipTo:(LineEnd)} | {'DIFFUSION' SkipTo:(LineEnd)}}, found 'G'  (at char 158), (line:1, col:159)

New Behavior

[omit full traceback]
pyparsing.exceptions.ParseException: Invalid TDB syntax.
     PARAMETER G(BCC,FE:H;0) 298.15  +GHSERFE+1.5*GHSERHH         +258000-3170*T+498*T*LN(T)-0.275*T**2; 1811.00  Y         +232264+82*T+1*GHSERFE+1.5*GHSERHH; 6000.00  N       PARAMETER G(BCC,FE:VA;0)      298.15 +GHSERFE; 6000 N ZIM 
                                                                                                                                                                                           ^, found 'G'  (at char 187), (line:5, col:16)

Note for this example we might say that returning an error at the end of line 3 would be more intuitive for the user, and that would be correct; for this case the parser cannot distinguish between a missing command delimiter and a reference key designation until encountering the G character (after the space) in the next full line. If we improve the grammar to distinguish these cases, the error message will automatically become more useful.

@codecov
Copy link

codecov bot commented Jul 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.77%. Comparing base (c24d881) to head (fd1ebb9).
Report is 1 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #550      +/-   ##
===========================================
+ Coverage    91.74%   91.77%   +0.02%     
===========================================
  Files           77       77              
  Lines        11981    11996      +15     
===========================================
+ Hits         10992    11009      +17     
+ Misses         989      987       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@richardotis richardotis requested a review from bocklund August 2, 2024 16:10
@richardotis richardotis linked an issue Aug 2, 2024 that may be closed by this pull request
Copy link
Collaborator

@bocklund bocklund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Appears to be performance neutral for successful read times when tested against a large dataset of databases and a really nice quality of life improvement.

@richardotis richardotis merged commit 176909c into pycalphad:develop Aug 2, 2024
@richardotis richardotis deleted the fix-tdb-error-lineno branch August 2, 2024 18:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Preserve line numbers in TDB parser tracebacks?

2 participants