Node Repetition via Asterisk #78

tajmone · 2022-07-05T01:03:05Z

Instead of writing a chain of multiple non-breaking-space [sp] or newline [nl] nodes:

[sp][sp][sp]
[nl][nl]

we could instead adopt a multiplier-style syntax to indicate node repetitions:

[sp*3]
[nl*2]

The [sp] node is usually needed precisely because a single space is not enough, so this notation would save space by letting a single node represent multiple spaces. The need for multiple [nl] nodes is probably less frequent, since [nl] is usually employed just to hard-break within a paragraph, but there might be cases where multiple newlines are needed, and in any case it might be worth supporting this notation for both nodes, for consistency's sake.

Although this notation doesn't align with the general PML syntax adopted so far, IMO it makes sense in this context due to its simplicity and intuitiveness: since it resembles the customary multiplication syntax, it's very easy to remember.

Both of these nodes are childless and don't support any type of attributes, which might further justify this notation applying to them as an exception to the rule.
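To make the proposal concrete, here is a minimal Python sketch of how a preprocessor could expand the multiplier notation into the chained form. It is not part of PML: the regular expression and the function name `expand_repetitions` are assumptions made purely for illustration.

```python
import re

# Hypothetical pattern for the proposed notation: [sp*3] or [nl*2].
# Only the childless sp and nl nodes are covered, as in the proposal.
REPEAT = re.compile(r"\[(sp|nl)\*(\d+)\]")


def expand_repetitions(text: str) -> str:
    """Rewrite [sp*N] / [nl*N] as N consecutive plain nodes."""
    def repl(match: re.Match) -> str:
        name, count = match.group(1), int(match.group(2))
        return f"[{name}]" * count

    return REPEAT.sub(repl, text)


print(expand_repetitions("a[sp*3]b[nl*2]c"))
```

Plain [sp] and [nl] nodes are left untouched, so the two notations could coexist in the same document.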

The only other context where a similar notation might make sense is table cells, where the [tc] node could adopt the x*y notation to indicate columns and rows spanning, e.g. [tc 2*3] for a cell spanning two columns and three rows; [tc 4*0]/[tc 4*] for spanning four columns; [tc 0*5]/[tc *5] for spanning five rows, etc. A zero value can simply be omitted, since the position of the other value with respect to the asterisk clearly indicates whether it refers to columns (lhs) or rows (rhs).
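The rule that an omitted side counts as zero can be sketched in a few lines of Python. The `Span` type and the `parse_star_span` function are hypothetical names, and the sketch only handles the x*y part, not the rest of a [tc] node.

```python
from typing import NamedTuple


class Span(NamedTuple):
    columns: int  # value on the left-hand side of the asterisk
    rows: int     # value on the right-hand side of the asterisk


def parse_star_span(spec: str) -> Span:
    """Parse 'x*y'; an omitted side is treated as 0 (no spanning)."""
    left, sep, right = spec.partition("*")
    if not sep:
        raise ValueError(f"missing '*' in span spec: {spec!r}")
    return Span(int(left or 0), int(right or 0))


for spec in ("2*3", "4*", "*5"):
    print(spec, parse_star_span(spec))
```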


pml-lang · 2022-07-05T07:33:27Z

adopt a multiplier-style syntax to indicate node repetitions

Great idea!

it might be worth supporting this notation for both nodes, for consistency's sake

Yes.

Although this notation doesn't align with the general PML syntax adopted so far, ...

We need to avoid "special syntaxes for special cases". KISS! I also think that a special syntax is not necessary in this case, because lenient parsing (a standard PML feature) can be applied.

The standard (non-lenient) syntax could be:

[sp (count=3)]

Because this node has only attributes (no child nodes), we can omit the parentheses and write:

[sp count=3]

Because count could be defined as the default attribute, the name can be omitted too and we can simply write:

[sp 3]
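As a rough illustration of how the three forms could resolve to the same value, here is a Python sketch. It is not the actual PML parser: the function name `repetition_count` is hypothetical, and the assumption that a bare [sp] means a count of 1 is mine.

```python
import re


def repetition_count(node: str) -> int:
    """Return the count for [sp (count=3)], [sp count=3], [sp 3] or [sp]."""
    match = re.fullmatch(r"\[(sp|nl)(?:\s+(.*))?\]", node.strip())
    if match is None:
        raise ValueError(f"not an sp/nl node: {node!r}")
    attrs = (match.group(2) or "").strip()
    if not attrs:
        return 1  # assumption: a bare node means a single repetition
    # Lenient step 1: the parentheses around the attributes are optional.
    if attrs.startswith("(") and attrs.endswith(")"):
        attrs = attrs[1:-1].strip()
    # Lenient step 2: the name of the default attribute is optional.
    name, sep, value = attrs.partition("=")
    if sep:
        if name.strip() != "count":
            raise ValueError(f"unknown attribute: {name.strip()!r}")
        return int(value)
    return int(attrs)


for form in ("[sp (count=3)]", "[sp count=3]", "[sp 3]"):
    print(form, "->", repetition_count(form))
```

Each lenient step only removes punctuation or a name that can be inferred, which is why the short form stays consistent with the standard one.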

the [tc] node could adopt the x*y notation to indicate columns and rows spanning

IMO this syntax would not be very readable, because one has to remember the meaning and order of the x,y values. Moreover, distinguishing between [tc 4*] and [tc *4] is a bit challenging and error-prone.

[tc 2*3] for a cell spanning two columns and three rows

In this case I would suggest sticking with the standard syntax, which is more verbose but also more readable, e.g.

[tc (cell_span=2 row_span=3) cell data]

I might be wrong, but I think that row/cell spanning isn't frequent enough to justify a special syntax.

However, if succinct syntax is really a concern, we could support the following syntax that uses a standard attribute:

[tc (span=c2,r3) cell data]

Then we could apply lenient parsing again, to allow:

[tc (c2,r3) cell data] // cell and row span
[tc (c2) cell data]    // only cell span
[tc (r3) cell data]    // only row span
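The span value itself is easy to decode. The following Python sketch handles only the value part (c2,r3, c2 or r3); the function name `parse_cr_span` is hypothetical, and the default of 1 for an omitted direction is an assumption.

```python
import re


def parse_cr_span(value: str) -> dict:
    """Parse 'c2,r3', 'c2' or 'r3' into column and row span counts."""
    spans = {"columns": 1, "rows": 1}  # assumption: 1 means no spanning
    for part in value.split(","):
        match = re.fullmatch(r"([cr])(\d+)", part.strip())
        if match is None:
            raise ValueError(f"invalid span part: {part!r}")
        key = "columns" if match.group(1) == "c" else "rows"
        spans[key] = int(match.group(2))
    return spans


for value in ("c2,r3", "c2", "r3"):
    print(value, parse_cr_span(value))
```

Because each part carries its own prefix, the order of the parts does not matter, unlike in the x*y notation.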

Side note: Column and row spans are currently not supported through standard PML attributes of the tc node. However, one can use HTML attributes to apply spans:

[tc (html_colspan=2 html_rowspan=3) cell data]
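For illustration only, the sketch below shows what such a pass-through could amount to, assuming that attributes prefixed with html_ are copied to the generated element with the prefix removed. The function `render_td` is hypothetical and says nothing about how the PML converter is actually implemented.

```python
from html import escape


def render_td(attributes: dict, content: str) -> str:
    """Render a table cell, passing html_* attributes through to <td>."""
    prefix = "html_"
    html_attrs = {
        name[len(prefix):]: value
        for name, value in attributes.items()
        if name.startswith(prefix)
    }
    attr_text = "".join(
        f' {name}="{escape(str(value))}"' for name, value in html_attrs.items()
    )
    return f"<td{attr_text}>{escape(content)}</td>"


print(render_td({"html_colspan": 2, "html_rowspan": 3}, "cell data"))
```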

tajmone · 2022-07-06T03:31:22Z

Space and Newline

I also think that a special syntax is not necessary in this case, because lenient parsing (a standard PML feature) can be applied. [...]
[sp (count=3)]
[sp count=3]
[sp 3]

Makes sense, and the ultra-lenient version ultimately looks even shorter, yet without breaking consistency.

Table Cells Spanning

In this case I would suggest sticking with the standard syntax, which is more verbose but also more readable, e.g.
[tc (cell_span=2 row_span=3) cell data]

It should be colspan instead of cell_span, for it's always the cell that's spanning, regardless of the direction (maybe it was a typo?).

But then, this is a bit verbose and also very similar to the current way of using HTML attributes: (html_colspan=2 html_rowspan=3), so it would bring little benefit compared to using HTML attributes (at least from the typing perspective).

However, if succinct syntax is really a concern, we could support the following syntax that uses a standard attribute:
[tc (span=c2,r3) cell data]
[tc (c2,r3) cell data] // cell and row span
[tc (c2) cell data]    // only cell span
[tc (r3) cell data]    // only row span

I would say that in tables space is always a concern because ideally the goal would be to keep delimiters as unobtrusive as possible to make it visually clear where each cell and column end. I quite like this syntax proposal: it can be very compact in its lenient version, and also unambiguous thanks to prefixes c and r for column and row, which should also make it easier to parse in editor syntaxes.

When it comes to tables, we should also take into account that more attributes might be needed to expose common features, e.g. to control whether a table has borders, or how borders are applied to rows and cells (e.g. no vertical inner borders, no inner borders at all, etc.), or to support "stripes" (alternating row background colors as visual cues to separate rows in tables without inner borders), and other attributes.

Tables are an important feature in technical documentation, so end users can't have enough features when it comes to controlling table layout and styles. Also, rich tables are possibly the main feature that separates AsciiDoc from the other markup syntaxes, most of which don't support tables with spanning cells.

So, before introducing new table attributes it's worth exploring what other attributes might potentially be added in the future, in order to ensure that all attributes ultimately work well with each other and that tables syntax remains intuitive and doesn't end up being too verbose.

Have a look at how many table related features are available in Asciidoctor:

Asciidoctor » Table Syntax and Attribute Reference

These give you a good idea of how various end users work with tables in real life projects, and which features will eventually be demanded of PML too.

Personally, I think that AsciiDoc has done a good job in allowing fine-grained control over tables via a very brief notation system, by adopting symbol combinations instead of words. But AsciiDoc tables are also the feature that requires the most practice to memorize, since there are tons of symbol combinations, and their order of precedence, to remember, so ultimately it's more practical than intuitive. Then again, from the perspective of the frequent user this is an advantage and not a problem, whereas having to deal with syntax verbosity after having mastered the syntax is much worse.

When it comes to table markup, the only truly "readable" and human-friendly notations are those that mimic the actual layout (as in Markdown and pandoc pipe tables), but these don't support cell spanning. Complex tables have to give up layout-based styling in exchange for explicit settings; there's no working around this (none that I've seen yet, at least). These settings will be driven either by keywords or by symbols.

Consider the following AsciiDoc cell definition:

2.3+^.<e| cell data

where the 2.3+^.<e operators preceding the | cell delimiter indicate:

2.3 — this cell will span 2 columns and 3 rows.
^.< — its contents will be horizontally center-aligned and vertically aligned to the top.
e — the text will be styled in emphasis (instead of default table style).
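To show that such a packed specifier is still mechanical to decode, here is a Python sketch that parses the subset of the AsciiDoc cell specifier used in the example above (span, horizontal and vertical alignment, one style letter). The names `CELL_SPEC` and `decode_cell_spec` are hypothetical, and other AsciiDoc operators, such as the duplication factor, are deliberately left out.

```python
import re

# Subset of the AsciiDoc cell specifier: [cols][.rows]+ [halign] [.valign] [style]
CELL_SPEC = re.compile(
    r"""
    (?:(?P<cols>\d+)?(?:\.(?P<rows>\d+))?\+)?   # span, e.g. 2.3+
    (?P<halign>[<^>])?                          # horizontal alignment
    (?:\.(?P<valign>[<^>]))?                    # vertical alignment
    (?P<style>[a-z])?                           # style letter, e.g. e
    """,
    re.VERBOSE,
)

HALIGN = {"<": "left", "^": "center", ">": "right"}
VALIGN = {"<": "top", "^": "middle", ">": "bottom"}


def decode_cell_spec(spec: str) -> dict:
    """Decode a specifier such as '2.3+^.<e' into named settings."""
    match = CELL_SPEC.fullmatch(spec)
    if match is None:
        raise ValueError(f"unsupported cell specifier: {spec!r}")
    return {
        "columns": int(match.group("cols") or 1),
        "rows": int(match.group("rows") or 1),
        "halign": HALIGN.get(match.group("halign")),
        "valign": VALIGN.get(match.group("valign")),
        "style": match.group("style"),
    }


print(decode_cell_spec("2.3+^.<e"))
```

The decoded dictionary is roughly the information that an attribute-based notation would have to spell out by name.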

You can see how packing all that info into just eight characters is much less intrusive than having to spell it out explicitly. Compare this to how the standard PML syntax approach might look if all these features were exposed via node attributes, even considering lenient parsing and shorthand keywords:

[tc (c2,r3 halign=c valign=t text_style=em) cell data]

... of course, these are all hypothetical attributes which don't yet exist, but they mimic the logic that governs new attributes, so they should be fairly representative examples. As you can see, in terms of verbosity there's a huge difference from the 8-char AsciiDoc definition. In a complex table, where each cell overrides the default styling via attributes, you can easily end up with a source table that is very hard to read, where row/cell delimiters can't be quickly distinguished from their contents, which can easily lead to overlooking errors in table contents.

Tables are generally hard to work with, not just in lightweight markup languages but definitely more so in them. Yet, they are also one of the elements for which more features are demanded, since it's important to ensure that tables look good in the final document, and often in order to tame their layout users need to leverage attributes quite heavily (e.g. to prevent wrapping a certain column due to auto-width adjustment of columns, which is usually achieved via non-breaking spaces, etc.).

That is to say that, whereas the decision for the [sp] and [nl] nodes might be easier to undertake, when it comes to tables more caution is advised before venturing into new attributes, because unless one takes into account the big picture of how tables will ultimately cover all the various features, it could easily lead to syntax regrets.

While I'm all in favor of syntax consistency, there are cases where it might be worth considering whether an exception should be made in favor of symbol-based notations that allow packing complex settings into a few characters. Maybe an alternative notation could be introduced to optionally handle multiple attribute definitions via packed symbols. Even HTML/CSS allows similar shorthands, e.g. when defining multiple properties in sequence for an element, like background: <...> instead of chaining background-color, background-image, etc., where the compact version accepts them all at once in a specific order.

Short notations like those you proposed above (c<n> and r<n> for column and row spanning values) are a good alternative to using non-alphanumeric symbols like ^ < > * etc., and could still lead to less verbose definitions; but the need for key-value pairs (since only the default pair can benefit from parsing leniency) might still lead to overall verbosity. This topic of compact syntax alternatives is worth exploring IMO.

tajmone closed this as completed Jul 5, 2022

tajmone reopened this Jul 5, 2022

pml-lang added the enhancement New feature or request label Jul 5, 2022
