Delimited Text Syntax and Unbalanced Delimiters #99

Open
tajmone opened this issue Feb 24, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@tajmone
Contributor

tajmone commented Feb 24, 2023

@pml-lang, PML 4.0.0 now adopts the Delimited Text Syntax as the proper way to format raw-text blocks (see also tajmone/Sublime-PML#40).
I've noticed that the PDML documentation mentions:

The closing delimiter line must contain at least as many delimiter characters as the opening delimiter line.

I tested this and confirmed that the closing delimiter can indeed be longer than the opening delimiter, e.g.:

[code
===============
print("Hello");
===================
]

I find this choice rather problematic and unconventional.

Usually markup syntaxes require that both delimiters are balanced (i.e. same characters and length), the reason being that it prevents issues in cases where the raw-text contains delimiters.

E.g. a PML listing block like this one requires that all delimiters in the raw-text are shorter than the enclosing block delimiters:

[code
    ====================
    [code
        ===============
        print("Hello");
        ===================
    ]
    ====================
]

Of course, the situation is manageable (one just needs to ensure that the outer delimiters are longer than any inner delimiters, or use different characters: " or ~), but I don't understand the benefits of supporting delimiters with unbalanced lengths.

Strictly balanced delimiters reduce the chances of clashes between the block delimiters and any raw-text contents that might cause a false positive match for the closing delimiter — especially when including long external listings via insert_code directives.
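
To make the clash concrete, here is a minimal Python sketch that compares the two rules. It is not PMLC's actual parser: the `find_closing` helper and the `DELIMITER_LINE` pattern are hypothetical, the set of delimiter characters and the three-character minimum are taken from the regex quoted in this issue, and indentation is ignored for simplicity.

```python
import re

# Assumed shape of a delimiter line: optional indentation, one of " = ~
# repeated at least three times, optional trailing whitespace.
DELIMITER_LINE = re.compile(r'^[ \t]*((["=~])\2{2,})[ \t]*$')


def find_closing(lines, lenient):
    """Return the index of the line that closes the block opened by lines[0].

    lenient=True  -> the closing line needs at least as many characters.
    lenient=False -> the closing line needs exactly as many characters.
    Returns None if no line qualifies.
    """
    opening = DELIMITER_LINE.match(lines[0])
    if opening is None:
        raise ValueError("lines[0] is not a delimiter line")
    open_length = len(opening.group(1))

    for index in range(1, len(lines)):
        candidate = DELIMITER_LINE.match(lines[index])
        if candidate is None or candidate.group(2) != opening.group(2):
            continue  # not a delimiter line, or a different delimiter character
        length = len(candidate.group(1))
        if length == open_length or (lenient and length > open_length):
            return index
    return None


listing = [
    "=" * 10,            # opening delimiter of the outer block
    "[code",
    "=" * 15,            # delimiter line that belongs to the raw text
    'print("Hello");',
    "=" * 15,
    "]",
    "=" * 10,            # intended closing delimiter
]

print(find_closing(listing, lenient=True))   # 2: the raw text closes the block early
print(find_closing(listing, lenient=False))  # 6: only the balanced line closes it
```

Under the lenient rule, any raw-text line that is at least as long as the opening delimiter ends the block; under the balanced rule, only a line of exactly the same length does.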

Furthermore, as far as I can tell, it's impossible to correctly support different-length delimiters in most syntax packages for PML, because they rely on RegEx backreferences ($1 or \1) to match the same delimiter again. One could achieve this only by ignoring trailing characters, whereas the correct approach here would be to match the whole line, from the line-start anchor ^ to the end-of-line anchor $:

^(?=( |\t)*["=~]{3,})
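
As an illustration of the backreference technique, the following sketch uses Python's `re` module rather than an editor's syntax engine, so it shows the idea only and is not how any particular PML syntax package is written. The `BALANCED_BLOCK` name is hypothetical. The closing line is matched as a whole, from `^` to `$`, and `\1` forces it to repeat the opening delimiter exactly.

```python
import re

BALANCED_BLOCK = re.compile(
    r'^[ \t]*((["=~])\2{2,})[ \t]*\n'  # opening line: same character, 3 or more times
    r'(.*?)'                           # raw text, matched lazily across lines
    r'^[ \t]*\1[ \t]*$',               # closing line: exactly the opening delimiter
    re.MULTILINE | re.DOTALL,
)

balanced = '===\nprint("Hello");\n===\n'
unbalanced = '===\nprint("Hello");\n=====\n'
nested = '~~~~\n===\nx\n===\n~~~~\n'

print(BALANCED_BLOCK.search(balanced) is not None)  # True
print(BALANCED_BLOCK.search(unbalanced))            # None: the longer line is rejected
print(repr(BALANCED_BLOCK.search(nested).group(3)))
```

Because the pattern anchors the closing line at both ends, a longer run of the same character is not accepted as a partial match. Accepting it would require dropping the `$` anchor and ignoring the trailing characters, which is the workaround described above.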

Ultimately, it seems to me that supporting unbalanced delimiters has little or no benefit, but is a potential source of problems for both end users and syntax maintainers.

Any specific reason why you decided to allow closing delimiters to be longer than the opening delimiter?

@tajmone tajmone added the enhancement New feature or request label Feb 24, 2023
@pml-lang
Owner

The only benefit is that the parser doesn't complain if the user types more characters for the closing delimiter than for the opening delimiter.

However, this also creates the problems you mentioned (e.g. plugin development using regexes).

Therefore it's not worth keeping that lenient parsing rule.

I will change the rule to:

The closing delimiter line must contain the same number of delimiter characters as the opening delimiter line.

@pml-lang
Owner

I will also add a chapter, "Nodes With Raw Text Content", to the PML user manual, because the rules are currently explained only in the PDML docs.

@tajmone
Contributor Author

tajmone commented Feb 24, 2023

Excellent! I think it's the right choice. Right now the change is unlikely to cause problems, whereas in the future it could have broken existing projects (e.g. through accidental extra lengths that went unnoticed).

I also noticed that PMLC is tolerant of whitespace following the delimiter. I've just tweaked Sublime PML to allow it, so it's not a major issue in terms of editor support, but you might want to consider whether, in the "big picture", it is better to enforce a strict notation (no trailing spaces) or to let the parser be tolerant.

Although this might not be a big issue in this context, the idea is to keep the PML syntax as consistent as possible — be it tolerant and forgiving, or strict and enforcing.

I understand that in PML whitespace is generally not significant, but in this specific context indentation is important all the way through (from the opening delimiter, up to the closing delimiter), so it might make sense to consider whether trailing spaces should be consistent too for the delimiters.
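
The two options can be sketched as a pair of Python patterns. Both names are hypothetical, and the strict variant is only an illustration of what "no trailing spaces" would mean; the tolerant variant mirrors the behaviour observed in PMLC, not its actual implementation.

```python
import re

# Strict: nothing may follow the delimiter characters on the line.
STRICT_DELIMITER = re.compile(r'^[ \t]*(["=~])\1{2,}$')

# Tolerant: spaces or tabs may follow the delimiter characters.
TOLERANT_DELIMITER = re.compile(r'^[ \t]*(["=~])\1{2,}[ \t]*$')

clean_line = "    ====="
trailing_line = "    =====   "

print(STRICT_DELIMITER.match(clean_line) is not None)       # True
print(STRICT_DELIMITER.match(trailing_line) is not None)    # False
print(TOLERANT_DELIMITER.match(trailing_line) is not None)  # True
```

The only difference between the two patterns is the `[ \t]*` before the end-of-line anchor, which is also the kind of change an editor syntax needs in order to follow the tolerant behaviour.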

> I will also add chapter "Nodes With Raw Text Content" to the PML user manual, because the rules are currently only explained in the PDML docs.

Yes, that's really needed. I haven't yet had a chance to check the new repository with the unified documentation, but I will in the coming days.
