Remove column 1 restriction for labels with colons #635

ISSOtm · 2020-12-11T22:15:28Z

Partial fix for #457

What would be left to fully resolve that issue is doing the same for variable definitions (RB, EQU, etc.), which I tried implementing. There was one fundamental (correctness) problem, and a cleanness problem.

Implementation

After reading a (non-local) identifier that wasn't a keyword, the lexer discarded all leading whitespace. Then, without shifting (= using peek with a non-0 argument), it tries to match a symbol-declaring token (keyword or =), and returns T_LABEL if it does.

Correctness problem

If a macro invocation's arguments begins with such a token, which has valid use cases, it would be considered as a label. This is problematic especially for set and rl, which are valid tokens outside of assignments as well.

Cleanness problem

Using peek with an argument larger than 0 is not a good thing. This may expand arguments too early (in some particularly evil code with shift, I believe), and is less performance-efficient than if we had only peek(0). (I want to remove those in the long term).

An alternative would be to perform an exploratory lex, and cache the resulting token for the following lexer call... except that macro args may be lex-able properly, and the lexer reports errors as soon as possible. (I don't think adding a way to suppress errors for such an exploratory lex would be a good idea, either.)

Solutions

The grammar could be changed to have variable declarations use a prefix keyword, like other assemblers (name EQU value → EQU name value). Keep in mind that the root cause of this issue is that an identifier at the beginning of a line, if a macro name, must cause a lexer switch immediately after being lexed, so that macro args can be lexed properly.
A suggested alternative is to deprecate set, since = is a synonym; this would solve the correctness problem, but not the cleanness one. Further, it wouldn't solve the rl issue. (The reason why rl is currently not a problem is that the lexer never returns the token that the parser expects for that rule.)
I found that using {space} with space EQUS " " was a workaround to prevent the exploratory lex from detecting the token while preserving the output argument; however, it's a hack (additionally introducing a leading space in the argument, which might cause problems), and Allow {symbol} interpolation outside of strings #634 would break it by expanding {} pre-peek

ISSOtm · 2020-12-11T23:19:39Z

Note: update RGBASM(5) to mention that the restriction is relaxed for all labels, not just non-local ones, and for macro defs.

Maybe also mention that the colon must stick directly to the label name? This is actually not the case for non-local labels, but mentioning that in the documentation may make it a bit too confusing.

Rangi42 · 2020-12-12T00:10:20Z

For the sake of alternatives: you could deprecate set and rl (and dl). Renamed to rg and dg for long, or rd and dd for double-word.

aaaaaa123456789 · 2020-12-12T00:12:15Z

d for double word seems to be the most common usage. I've never seen g for this purpose.

Rangi42 · 2020-12-12T06:39:45Z

I think that removing set and changing rl/dl to rd/dd would be less disruptive than changing EQU and EQUS to be prefix instead of infix. (As for {space}, as you said, it's a hack and #634 will prevent it anyway.)

ISSOtm · 2020-12-12T10:50:28Z

This still requires some lookahead within the lexer, which is a bad idea for performance and cleanness, and also a maintenance penalty due to having to check specifically for such keywords. Then again, invoking macros with such a token as its first argument becomes impossible, which is merely problem whack-a-mole.

I agree it might be less disruptive, but I don't believe it's a good option. I do not like the latter two options I listed, I only wrote them in case someone had an idea how to resolve their shortcomings. I also realize that option 1 is quite problematic, since for example hardware.inc would have to be rewritten... but the fundamental problem is the grammar.

Rangi42 · 2021-01-11T01:44:54Z

This check:

	if (tokenType == T_ID && lexerState->atLineStart)
		return T_LABEL;

was requiring labels to be at the start of a line, and macro usages to be indented.

The modified check:

	if (tokenType == T_ID && (lexerState->atLineStart || peek(0) == ':'))
		return T_LABEL;

allows labels to be indented or not, but still requires macro usages to be indented, and can't handle mac rl 1 versus newRLLabel rl 1.

How about:

	if (tokenType == T_ID && (!sym || sym->type != SYM_MACRO || peek(0) == ':')
		return T_LABEL;

This should return T_LABEL for:

mac: MACRO
    println "\1"
ENDM
mac rl 1 ; T_ID, prints "rl 1"
    mac rl 1 ; T_ID, prints "rl 1"
foo rl 1 ; T_LABEL, defines foo
    foo rl 1 ; T_LABEL, complains about redefinition

That goes along with disallowing colon-less labels, but should not require reordering EQU/EQUS/SET/=, renaming rl/dl to rd/dd, or a weird hack with {space}.

ISSOtm · 2021-01-11T02:16:53Z

I refuse to make the behavior dependent on whether a symbol has been defined or not, or its type. It requires more context to understand how a given piece of code works, and I would get rid of EQUS's behavior if not for the large breakage that it'd cause.

SYM equs STRRPL("\1", "this", "that")
SYM equ 42
    purge SYM

First natural reaction is certainly a "what!?".

If mac rl 1 looks like a variable declaration, it should bark like a variable declaration. This is why I insist on finding another way to differentiate, if possible.

Rangi42 · 2021-01-11T03:57:33Z

Fair enough. That sort of SYM equ 42 code confuses me too; {SYM} equ 42 is now a valid equivalent.

Regarding the other solutions:

A changed definition syntax (like def SYM equ 42, def x = 10, etc, which would go with user-defined functions like def tiles(n) = n * $10 and with redef s equs strupr("{s}")) would IMO be the most consistent, but it's also very backwards-incompatible.
Deprecating set in favor of =, and renaming rl/dl to rd/dd, would make foo rd 1 distinct from mac rl 1 (foo set ... would already lex foo as a macro usage T_ID since colon-less labels are merged). However, this would make it impossible to pass = 1 or rd 1 as an arg to mac.
The {space} hack no longer works since Allow {symbol} interpolation outside of strings #634 merged.

Overall I'd rather keep the column 1 restriction, until maybe a 0.5.0 major version update when more severe changes might be acceptable.

ISSOtm · 2021-01-14T17:14:38Z

I think that the best solution is the def syntax, which we can introduce while deprecating the old syntaxes. (Labels wouldn't use def, and Lab: def X = 0 would also be allowed)

aaaaaa123456789 · 2021-01-14T17:26:13Z

I think that the best solution is the def syntax, which we can introduce while deprecating the old syntaxes. (Labels wouldn't use def, and Lab: def X = 0 would also be allowed)

def makes no sense for reassigning, though

Rangi42 · 2021-01-14T20:19:27Z

I don't look forward to updating all those definition lines, but if it's a major update that comes along with user-defined functions (which would replace some of our EQUS and macro defs anyway) I guess it's fine.

def makes no sense for reassigning, though

redef could also work for redefining an =.

ISSOtm · 2021-01-14T20:21:46Z

Should be regex-able. Further, this is the kind of deprecation that I would make last a couple versions, given the size of the change. Alternatively, the behavior could be toggled via a CLI option?

Rangi42 · 2021-01-14T20:28:34Z

If definition syntax is going to be changing anyway, and no longer resembling older assemblers so much, how about changing macro syntax? MACRO mac would fit in more than mac: MACRO, and would let users consistently recognize mac: as a label.

Rangi42 · 2021-01-14T20:41:07Z

So, the verbosity of def x = 1, def n equ $42, and so on still bothers me. How about something like the {space} hack but more robust? Allow \ in macro args just like \,, so that x equ 30 is always an assignment, and x\ equ 30 or x \ equ 30 is always passing ' equ 30' as \1 to x. (Currently that would complain "Begun line continuation, but encountered character 'e'"; under this proposal, that would be allowed inside macro args, and \ would only start a line continuation if it's followed solely by whitespace and possibly a comment. This would also more generally allow any of a macro's args to start or end with whitespace, without them always being stripped.)

Rangi42 · 2021-02-01T19:05:52Z

Re: renaming dl/rl, Sjasm also uses dd for 32-bit data. (RASM uses defi.)

Rangi42 · 2021-02-10T22:57:05Z

Should this merged as-is? (After rebasing to master, and updating rgbasm.5 to document that labels don't have to go in column 1.) It fully enables users to define labels, whether topLevel:, .local, or .localWithColon:, even past column 1. It doesn't fully resolve issue #457, but this is IMO a worthwhile addition on its own, for neater formatting of labels inside an if or macro or (upcoming) inline fragment.

Rangi42 · 2021-02-17T04:16:17Z

@ISSOtm You said in the #716 discussion that this isn't ready to merge. I'm not sure why, or at least, I think something mergeable could be made out of this if it doesn't do what I think.

These case ought to be distinguishable, regardless of leading whitespace:

    ColonLabel:
    .colonLocalLabel:
    .localLabelNoColon
    ColonMac: MACRO
    ENDM

The ambiguous case is this:

something stuff...

There something (no leading ., no trailing :) could be a macro invocation or a newly defined constant. Currently they're distinguished by whether something starts in column 1. They could also be distinguished by whether the following stuff is a definition token (set, =, equ, equs, rb, rw, rl) but that has the problems you discussed above and would be fragile anyway. They could also also be distinguished by whether something is already defined, but that's the C-style lexer hack and is rejected.

Anyway, even if macro invocations and definitions still have to depend on their starting column, I think allowing indented labels would be useful. For one thing, it would make indented multi-line inline fragments from #716 less awkward to write.

Partial fix for gbdev#457

ISSOtm · 2021-02-21T02:55:13Z

Unless CI fails, this should be merge-able. It doesn't solve the most wanted part of the issue, though.

As a temporary hack, we could allow indenting specifically for =... it would help slightly, since I believe = is the most common inside blocks.

Rangi42

The test case could also verify that indented mac: MACRO defines a macro.

ISSOtm added this to In progress in Roadmap Dec 24, 2020

Rangi42 mentioned this pull request Feb 12, 2021

Implement [[ inline fragments ]] #716

Draft

Remove column 1 restriction for labels with colons

43dcd01

Partial fix for gbdev#457

ISSOtm force-pushed the column-1 branch from 8b0f399 to 43dcd01 Compare February 21, 2021 02:49

Rangi42 approved these changes Feb 21, 2021

View reviewed changes

Roadmap automation moved this from In progress to Reviewer approved Feb 21, 2021

Roadmap automation moved this from Reviewer approved to Review in progress Feb 21, 2021

Rangi42 requested changes Feb 21, 2021

View reviewed changes

ISSOtm mentioned this pull request Feb 23, 2021

Allow declaration of data structures #98

Open

Rangi42 merged commit dafef5a into gbdev:master Feb 23, 2021

Roadmap automation moved this from Review in progress to Done Feb 23, 2021

Rangi42 mentioned this pull request Feb 25, 2021

[Feature debate] Remove column constraint on assignments and macro calls #457

Closed

ISSOtm removed this from Done in Roadmap Feb 28, 2021

ISSOtm deleted the column-1 branch March 19, 2021 01:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove column 1 restriction for labels with colons #635

Remove column 1 restriction for labels with colons #635

ISSOtm commented Dec 11, 2020

ISSOtm commented Dec 11, 2020

Rangi42 commented Dec 12, 2020

aaaaaa123456789 commented Dec 12, 2020

Rangi42 commented Dec 12, 2020 •

edited

ISSOtm commented Dec 12, 2020

Rangi42 commented Jan 11, 2021

ISSOtm commented Jan 11, 2021

Rangi42 commented Jan 11, 2021

ISSOtm commented Jan 14, 2021

aaaaaa123456789 commented Jan 14, 2021

Rangi42 commented Jan 14, 2021

ISSOtm commented Jan 14, 2021

Rangi42 commented Jan 14, 2021

Rangi42 commented Jan 14, 2021

Rangi42 commented Feb 1, 2021 •

edited

Rangi42 commented Feb 10, 2021 •

edited

Rangi42 commented Feb 17, 2021

ISSOtm commented Feb 21, 2021

Rangi42 left a comment

Remove column 1 restriction for labels with colons #635

Remove column 1 restriction for labels with colons #635

Conversation

ISSOtm commented Dec 11, 2020

Implementation

Correctness problem

Cleanness problem

Solutions

ISSOtm commented Dec 11, 2020

Rangi42 commented Dec 12, 2020

aaaaaa123456789 commented Dec 12, 2020

Rangi42 commented Dec 12, 2020 • edited

ISSOtm commented Dec 12, 2020

Rangi42 commented Jan 11, 2021

ISSOtm commented Jan 11, 2021

Rangi42 commented Jan 11, 2021

ISSOtm commented Jan 14, 2021

aaaaaa123456789 commented Jan 14, 2021

Rangi42 commented Jan 14, 2021

ISSOtm commented Jan 14, 2021

Rangi42 commented Jan 14, 2021

Rangi42 commented Jan 14, 2021

Rangi42 commented Feb 1, 2021 • edited

Rangi42 commented Feb 10, 2021 • edited

Rangi42 commented Feb 17, 2021

ISSOtm commented Feb 21, 2021

Rangi42 left a comment

Choose a reason for hiding this comment

Rangi42 commented Dec 12, 2020 •

edited

Rangi42 commented Feb 1, 2021 •

edited

Rangi42 commented Feb 10, 2021 •

edited