Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Floating-point literals with comma decimal separator gives parsing error #4983

Closed
Tragen opened this issue Jun 3, 2019 · 8 comments · Fixed by #5175
Closed

Floating-point literals with comma decimal separator gives parsing error #4983

Tragen opened this issue Jun 3, 2019 · 8 comments · Fixed by #5175
Assignees
Labels
antlr Issue is easier to resolve with knowledge of Antlr4 bug Identifies work items for known bugs difficulty-03-duck Involves more challenging problems and/or developing within and revising the internals API hacktoberfest Tags issues for Hacktoberfest 2019 localization

Comments

@Tragen
Copy link

Tragen commented Jun 3, 2019

Rubberduck Version:

2019-06-03 12:06:23.8281;INFO-2.4.1.4627;Rubberduck.Common.LogLevelHelper;
	Rubberduck version 2.4.1.4627 loading:
	Operating System: Microsoft Windows NT 6.2.9200.0 x64
	Host Product: Visual Basic x86
	Host Version: 6.00.9782
	Host Executable: VB6.EXE;

Description
There was already a bug with this problem fixed in #1178.
My problem is that our source code is totally mixed with
FontSize = 8,25
and
FontSize = 8.25
also in the same files. VB doesn't have any problems with it, but RD gives many parsing errors.

To Reproduce
Create a form with 2 TextBoxes, manually edit the file and change FontSize.
I think that happened a long time ago with different Windows and VB6 language
versions.

The code looks like this:

_ExtentX        =   9393
_ExtentY        =   5980
FontSize        =   8,25
TextVertical    =   1
TextHorizontal  =   0
PictureVertical =   0
PictureHorizontal=   0
Begin VB.CheckBox Check1
   Appearance      =   0  'Flat
   BackColor       =   &H00E2D3C9&
   Caption         =   "Check1"
   ForeColor       =   &H80000008&
   Height          =   225
   Left            =   705
   TabIndex        =   5
   Top             =   90
   Width           =   3750
End
Begin MSForms.TextBox Text1
   Appearance      =   0  'Flat
   BorderStyle     =   0  'None
   BeginProperty Font
      Name            =   "MS Sans Serif"
      Size            =   8.25
      Charset         =   8
      Weight          =   700
      Underline       =   0  'False
      Italic          =   0  'False
      Strikethrough   =   0  'False
   EndProperty
   Height          =   2535
End

Expected behavior
RD should parse both values correctly

Logfile

2019-06-03 12:06:50.2190;WARN-2.4.1.4627;Rubberduck.Parsing.VBA.Parsing.TokenStreamParserBase;SLL mode failed while parsing the AttributesCode version of module FRM_CHARLOCK at symbol , at L20C28. Retrying using LL.;
2019-06-03 12:06:50.2995;ERROR-2.4.1.4627;Rubberduck.Parsing.VBA.Parsing.ModuleParser;Syntax error; offending token ',' at line 20, column 28 in the AttributesCode version of module FRM_CHARLOCK.;
2019-06-03 12:06:51.0552;WARN-2.4.1.4627;Rubberduck.Parsing.VBA.Parsing.TokenStreamParserBase;SLL mode failed while parsing the AttributesCode version of module FRM_CHARGE at symbol , at L22C28. Retrying using LL.;
@Tragen Tragen added the bug Identifies work items for known bugs label Jun 3, 2019
@Vogel612 Vogel612 added antlr Issue is easier to resolve with knowledge of Antlr4 difficulty-03-duck Involves more challenging problems and/or developing within and revising the internals API localization labels Jun 3, 2019
@Vogel612
Copy link
Member

Vogel612 commented Jun 3, 2019

I took the liberty to reformat this a bit and change the screenshotted code into text. This allows devs to just copy-paste it into their local environment for verification :)

@retailcoder retailcoder changed the title Font size with comma gives parsing error Floating-point literals with comma decimal separator gives parsing error Jun 3, 2019
@Vogel612
Copy link
Member

Vogel612 commented Jun 3, 2019

As far as I can tell, this issue arises from how the lexer defines floating point literals:

FLOATLITERAL :
FLOATINGPOINTLITERAL FLOATINGPOINTTYPESUFFIX?
| DECIMALLITERAL FLOATINGPOINTTYPESUFFIX;
fragment FLOATINGPOINTLITERAL :
DECIMALLITERAL EXPONENT
| DECIMALLITERAL '.' DECIMALLITERAL? EXPONENT?
| '.' DECIMALLITERAL EXPONENT?;

Note the explicit use of '.' as a decimal point.

Considering how "early in the parsing pipeline" this is, any changes to the rule are potentially catastrophic.
A possible patch might be something along the following lines:

-     | DECIMALLITERAL '.' DECIMALLITERAL? EXPONENT?
-     | '.' DECIMALLITERAL EXPONENT?;
+     | DECIMALLITERAL DECIMAL_SEPARATOR DECIMALLITERAL? EXPONENT?
+     | DECIMAL_SEPARATOR DECIMALLITERAL EXPONENT?;
+ fragment DECIMAL_SEPARATOR : [,.];

This would open us up to misparsing different numerical arguments in VBA code, though.
Considering an invocation like:

Foo(123,152)

We need to get an argumentList with two arguments here. The precedence chain in the current grammar dictates parsing for an expression, which falls into literalExpression -> numberLiteral. That numberLiteral rule could then parse this as a floating point literal.

That is obviously incorrect. After all, this is two interger literal arguments.

In my personal opinion a change like this is highly dangerous.
I'd be happy to accept a rigorously tested patch, but I don't think the changes necessary stand in relation to the benefit we gain here.

@retailcoder
Copy link
Member

I agree, we can't just parse a decimal separator interchangeably as a comma or a dot.

I think we might need a separate VBAFormLayoutLexer for this.

@Tragen
Copy link
Author

Tragen commented Jun 3, 2019

Shouldn't this
Foo(123,152)
not be
Foo(123, 152)
The space makes the difference.
But I see your point.

@bclothier
Copy link
Contributor

That would be a pretty-printed version; however, AIUI, Foo(123,152) is grammatically valid and is documented so in MS-VBAL. The parser follows MS-VBAL, rather than the version you see in the IDE editor and thus is more lenient in that respect.

@retailcoder
Copy link
Member

retailcoder commented Jun 3, 2019

The code you read in the VBE in a validated line of code isn't the code you typed in the editor: it's a translation of the compiled instruction - that's why/how the VBE "prettifies" code and "autocorrects" certain constructs; that space is an artifact of this translation, it's nowhere in the language specifications; (see argument list) - having parser logic depend on how the VBE prettifies user code is something we need to avoid whenever possible.

@Tragen
Copy link
Author

Tragen commented Jun 3, 2019

I understand. Thanks.
I still cannot use RD when I fix this errors as I get an OutOfMemory exception also.

@Vogel612 Vogel612 added the hacktoberfest Tags issues for Hacktoberfest 2019 label Sep 20, 2019
@MDoerner
Copy link
Contributor

MDoerner commented Oct 1, 2019

Since this error appears in the form section, which we do not evaluate anyway (We just have to parse it to get to the actual module.), I would like to suggest to just patch up the parser rule for the forms part with an additional alternative INTEGER COMMA INTEGER. That should work around the issue that apparently forms can be saved with different regional settings, in contrast to the actual VBA code.

@MDoerner MDoerner self-assigned this Oct 1, 2019
Semi-automatic bug tracker automation moved this from ToDo to Done Oct 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
antlr Issue is easier to resolve with knowledge of Antlr4 bug Identifies work items for known bugs difficulty-03-duck Involves more challenging problems and/or developing within and revising the internals API hacktoberfest Tags issues for Hacktoberfest 2019 localization
Projects
Development

Successfully merging a pull request may close this issue.

5 participants