Delimited Text Syntax and Unbalanced Delimiters #99

tajmone · 2023-02-24T02:03:50Z

@pml-lang, PML 4.0.0 now adopts the Delimited Text Syntax as the proper way to format raw-text blocks (see also tajmone/Sublime-PML#40).
I've noticed that the PDML documentation mentions:

The closing delimiter line must contain at least as many delimiter characters as the opening delimiter line.

And I've tested it and discovered that in fact the closing delimiter can be longer than the opening delimiter, e.g.

[code
===============
print("Hello");
===================
]

I find this choice rather problematic and unconventional.

Markup syntaxes usually require both delimiters to be balanced (i.e. same characters and same length), because this prevents problems when the raw text itself contains delimiters.

For example, a PML listing block like this one requires all delimiters in the raw text to be shorter than the enclosing block's delimiters:

[code
    ====================
    [code
        ===============
        print("Hello");
        ===================
    ]
    ====================
]

Of course, the situation is manageable (one just needs to ensure that the outer delimiters are longer than any inner delimiters, or use a different delimiter character such as " or ~), but I don't understand the benefits of supporting delimiters with unbalanced lengths.
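The workaround above (outer delimiters longer than any inner delimiter line) is easy to automate when PML is generated by a tool. The following is a minimal Python sketch, not part of PMLC; the function name `choose_delimiter` and the rule for what counts as a delimiter-like line (3 or more copies of one character, surrounding whitespace ignored) are assumptions for illustration.

```python
def choose_delimiter(raw_text: str, char: str = "=", minimum: int = 3) -> str:
    """Hypothetical helper: build a delimiter of `char` that is longer than
    every delimiter-like line of the same character found in `raw_text`.

    Assumption: a line is delimiter-like if, once surrounding whitespace is
    stripped, it consists only of `char` and is at least `minimum` long.
    """
    longest = 0
    for line in raw_text.splitlines():
        stripped = line.strip()
        if len(stripped) >= minimum and set(stripped) == {char}:
            longest = max(longest, len(stripped))
    # One character longer than the longest inner delimiter (never below minimum).
    return char * max(minimum, longest + 1)


# The nested listing from the example: inner delimiters of 15 and 19 characters.
inner = "\n".join([
    "[code",
    "    " + "=" * 15,
    '    print("Hello");',
    "    " + "=" * 19,
    "]",
])
print(len(choose_delimiter(inner)))  # 20
```

Because the result is longer than every inner delimiter line, it stays safe under both the lenient rule and a strict equal-length rule.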

Strictly balanced delimiters reduce the chances of clashes between the block delimiters and any raw-text contents that might cause a false positive match for the closing delimiter — especially when including long external listings via insert_code directives.

Furthermore, as far as I can tell, it's impossible to correctly implement different-length delimiters in most PML syntax packages, because they rely on RegEx backreferences ($1 or \1) to match the same delimiter again. One could only approximate it by ignoring any trailing characters after the backreference, whereas the correct approach is to match the whole line, from the line-start anchor ^ up to the end-of-line anchor $:

^(?=( |\t)*["=~]{3,})
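The difference between anchoring and not anchoring the backreference can be shown with Python's `re` module. These patterns are illustrative only (they handle just `=` delimiters and are not the actual Sublime PML grammar): `STRICT` requires the closing line to be exactly the captured indentation plus delimiter, while `LOOSE` drops the `$` anchor and therefore ignores whatever follows the repeated delimiter.

```python
import re

# Illustrative patterns only (not the actual Sublime PML grammar).
# Group 1: indentation, group 2: opening delimiter, group 3: raw text.
# The closing line must repeat the indentation and delimiter via \1\2.
STRICT = re.compile(r"^([ \t]*)(={3,})\n(.*?)\n\1\2$", re.M | re.S)
# Same pattern without the end-of-line anchor: anything after \1\2 is ignored.
LOOSE = re.compile(r"^([ \t]*)(={3,})\n(.*?)\n\1\2", re.M | re.S)

samples = {
    "balanced": '=====\nprint("Hello");\n=====',
    "longer close": '=====\nprint("Hello");\n=======',
    "junk after close": '=====\nprint("Hello");\n=====foo',
}

for name, text in samples.items():
    strict = STRICT.search(text) is not None
    loose = LOOSE.search(text) is not None
    print(f"{name:18} strict={strict} loose={loose}")
```

Only the balanced sample matches `STRICT`. `LOOSE` accepts all three, including `=====foo`, because anything after the repeated delimiter is simply ignored.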

Ultimately, it seems to me that supporting unbalanced delimiters offers little or no benefit, while being a potential source of problems for both end users and syntax maintainers.

Any specific reason why you decided to allow closing delimiters to be longer than the opening delimiter?


pml-lang · 2023-02-24T04:34:36Z

The only benefit is that the parser doesn't complain if the user types more characters for the closing delimiter than for the opening delimiter.

However, this also creates the problems you mentioned (e.g. plugin development using regexes).

Therefore it's not worth keeping that lenient parsing rule.

I will change the rule to:

The closing delimiter line must contain the same number of delimiter characters as the opening delimiter line.
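Under the revised rule, a parser only needs an exact comparison between the opening and closing delimiter lines. Here is a minimal Python sketch of that check; the helper names are hypothetical, the delimiter characters are the three mentioned in this thread (", = and ~), and indentation and trailing whitespace are simply stripped, which may not match what PMLC actually does.

```python
from typing import List, Optional

# The three delimiter characters mentioned in this thread.
DELIMITER_CHARS = {'"', "=", "~"}


def delimiter_of(line: str) -> Optional[str]:
    """Return the delimiter if the line (surrounding whitespace stripped) is a
    run of 3+ copies of one delimiter character, else None."""
    s = line.strip()
    if len(s) >= 3 and s[0] in DELIMITER_CHARS and set(s) == {s[0]}:
        return s
    return None


def extract_raw_text(lines: List[str]) -> List[str]:
    """Return the lines between the first delimiter line and the next line
    holding exactly the same delimiter (same character, same count)."""
    opening: Optional[str] = None
    body: List[str] = []
    for line in lines:
        if opening is None:
            opening = delimiter_of(line)  # stays None until a delimiter is found
            continue
        if delimiter_of(line) == opening:
            return body
        # Any other line, including a longer or shorter delimiter, is raw text.
        body.append(line)
    raise ValueError("no opening delimiter" if opening is None else "missing closing delimiter")


block = ["[code", "=====", 'print("Hello");', "=======", "=====", "]"]
print(extract_raw_text(block))  # ['print("Hello");', '=======']
```

With the strict rule, the 7-character line is no longer mistaken for the end of a block opened with 5 characters; it simply becomes part of the raw text.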

pml-lang · 2023-02-24T04:38:57Z

I will also add chapter "Nodes With Raw Text Content" to the PML user manual, because the rules are currently only explained in the PDML docs.

tajmone · 2023-02-24T04:45:14Z

Excellent! I think it's the right choice, and right now the change is unlikely to cause problems, whereas in the future it could have broken existing projects (e.g. ones with accidentally over-long closing delimiters that went unnoticed).

I also noticed that PMLC tolerates whitespace after the delimiter. I've just tweaked Sublime PML to allow it, so it doesn't really affect editor support, but you might want to consider whether, in the "big picture", it's better to enforce a strict notation (no trailing spaces) or to let the parser be tolerant.

Although this might not be a big issue in this context, the idea is to keep the PML syntax as consistent as possible: either tolerant and forgiving, or strict and enforcing.

I understand that whitespace in PML is generally not significant, but in this specific context indentation matters all the way through (from the opening delimiter up to the closing delimiter), so it might make sense to decide whether trailing spaces on the delimiter lines should be handled consistently too.
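The strict-versus-tolerant question boils down to a single check on the closing line. The sketch below is hypothetical (the name `closes_block` and the `tolerant` flag are not PMLC options); it assumes the closing line must repeat the opening line's indentation exactly, and differs only in how trailing spaces and tabs are treated.

```python
def closes_block(line: str, opening: str, tolerant: bool) -> bool:
    """Return True if `line` closes a block opened by the `opening` line.

    Both lines include their leading indentation, which must match exactly.
    With tolerant=True, trailing spaces/tabs on either line are ignored;
    with tolerant=False, the lines must be identical, so trailing whitespace
    on the closing line alone breaks the match.
    """
    if tolerant:
        line = line.rstrip(" \t")
        opening = opening.rstrip(" \t")
    return line == opening


opening = "    ====="
for candidate in ["    =====", "    =====  ", "  =====", "    ======"]:
    print(repr(candidate),
          closes_block(candidate, opening, tolerant=False),
          closes_block(candidate, opening, tolerant=True))
```

Both policies are equally simple to implement, so the choice is mainly about consistency with the rest of the PML syntax.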

I will also add chapter "Nodes With Raw Text Content" to the PML user manual, because the rules are currently only explained in the PDML docs.

Yes, that's really needed. I haven't yet had a chance to check the new repository with the unified documentation, but I will in the coming days.

tajmone added the enhancement New feature or request label Feb 24, 2023
