Word Wrap overhaul - v0.7.0 (beta 1) #67

panoply · 2024-05-21T15:31:33Z

Description

Word wrap for markup languages is currently handled at the lexer level. In the Sparser / PrettyDiff implementation, wrapping was applied in both the lexing and beautification cycles, which requires the beautifier to augment content that has already been wrapped. That design made sense when Sparser was used in isolation, but in Æsthetic the Sparser algorithm is tightly coupled with the beautification process.

Generally speaking, the current approach works, BUT it does not produce correct word wrap on the first run. It takes 2 beautification runs to reach the desired output, and it requires additional handling that could be skipped entirely if wrap logic were processed in the formatting (beautification) cycle instead. This overhaul involves major refactoring at the core and regressions are likely, but it is a matter of necessity at this point.

Example

The main problem with the current tactic is that leading indentation levels are not taken into consideration when wrapping in the parse (lexing) cycle. When we enter the formatting cycle, we then need to augment tokens in the data structure that have already undergone augmentation.

Current Lexing Cycle

Take the following code snippet as input, with an assumed wrap limit of 50:

<p>
Lorem ipsum, dolor sit amet consectetur adipisicing elit. Facilis quasi corrupti ipsam impedit nostrum odio.

Nulla accusantium repellat officiis voluptate similique aut sint reiciendis totam, aliquid, voluptatum qui consequuntur placeat!
</p>

During the lexing cycle, the above will be transformed to the following, assuming a wrap limit of 50:

<p>
Lorem ipsum, dolor sit amet consectetur
adipisicing elit. Facilis quasi corrupti ipsam
impedit nostrum odio.

Nulla accusantium repellat officiis voluptate
similique aut sint reiciendis totam, aliquid,
voluptatum qui consequuntur placeat!
</p>

The resulting uniform data structure will look something like this (additional references omitted for the sake of the example):

{
  token: [
    '<p>',
    'Lorem ipsum, dolor sit amet consectetur\n' +
    'adipisicing elit. Facilis quasi corrupti ipsam\n' +
    'impedit nostrum odio.\n\n' +
    'Nulla accusantium repellat officiis voluptate\n' +
    'similique aut sint reiciendis totam, aliquid,\n' +
    'voluptatum qui consequuntur placeat!',
    '</p>'
  ],
  types: [
    'start',
    'content',
    'end'
  ]
}

The current approach inserts \n characters into the text content wherever wrap applies, without taking into consideration the indentation level that will later be imposed. Given that the text content is contained within a <p> element and will be indented during formatting, the wrap limit will not be correctly applied.
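
To make the problem concrete, here is a minimal sketch of an indentation-unaware wrap of the kind described above. It is not the actual Sparser or Æsthetic code: wrapAtLexer is a hypothetical helper, and the nesting depth (3 levels) and indent size (2 spaces) are assumptions chosen so that the overflow becomes visible once indentation is applied afterwards.

```typescript
// Hypothetical lexer-level wrap: greedy, and unaware of indentation.
function wrapAtLexer(text: string, limit: number): string {
  return text
    .split(/\n\s*\n/) // blank lines separate paragraphs and are preserved
    .map((paragraph) => {
      const lines: string[] = [];
      let line = '';
      for (const word of paragraph.split(/\s+/).filter(Boolean)) {
        // Break before the word if appending it would exceed the limit.
        if (line.length > 0 && line.length + 1 + word.length > limit) {
          lines.push(line);
          line = word;
        } else {
          line = line.length > 0 ? line + ' ' + word : word;
        }
      }
      if (line.length > 0) lines.push(line);
      return lines.join('\n');
    })
    .join('\n\n');
}

const lexerInput = [
  'Lorem ipsum, dolor sit amet consectetur adipisicing elit. Facilis quasi corrupti ipsam impedit nostrum odio.',
  '',
  'Nulla accusantium repellat officiis voluptate similique aut sint reiciendis totam, aliquid, voluptatum qui consequuntur placeat!'
].join('\n');

const lexed = wrapAtLexer(lexerInput, 50);

// Assumed nesting: 3 levels deep at 2 spaces per level, applied after lexing.
const lexedIndent = ' '.repeat(3 * 2);
const overflowing = lexed
  .split('\n')
  .filter((l) => l.length > 0)
  .map((l) => lexedIndent + l)
  .filter((l) => l.length > 50);

console.log(overflowing);
```

Under these assumptions, every line fits the limit before indentation, yet several of them exceed 50 columns once the indent is prepended, which is why a second pass is needed to correct them.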

As mentioned above, PrettyDiff patched this handling during the beautification cycle. The current logic in Æsthetic has skipped that additional process altogether, or is only applying it at certain points or on certain tokens (i.e., attributes). The new tactic will completely eliminate the wrap processing done in the lexing cycle; wrapping operations will instead be done during formatting.

New Lexing Cycle

The new approach will significantly reduce the operations happening during the lexing cycle, specifically the wrapping. Instead of capturing the entire text region as a single token and suffixing \n where wrap applies, new records of type content will be inserted into the data structure. Each newline occurrence will signal a new record insertion, unless the markup stripTextWrapLines rule is set to true, in which case newlines will be stripped.

Based on the above, the new data structure will look like this:

{
  token: [
    '<p>',
    'Lorem ipsum, dolor sit amet consectetur adipisicing elit. Facilis quasi corrupti ipsam impedit nostrum odio.',
    'Nulla accusantium repellat officiis voluptate similique aut sint reiciendis totam, aliquid, voluptatum qui consequuntur placeat!',
    '</p>'
  ],
  types: [
    'start',
    'content',
    'content',
    'end'
  ]
}
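
As a rough illustration of how the lexer could produce such records, here is a small sketch. The ParseTable shape mirrors only the token and types arrays shown in this issue; lexContent is a hypothetical helper, and two details are assumptions rather than confirmed behaviour: runs of consecutive newlines count as a single separator, and stripTextWrapLines collapses the text into one record.

```typescript
// Only the two arrays shown in this issue; the real structure has more.
interface ParseTable {
  token: string[];
  types: string[];
}

// Hypothetical helper: split text content into records on newlines.
function lexContent(text: string, stripTextWrapLines: boolean): string[] {
  if (stripTextWrapLines) {
    // Assumption: stripping newlines yields a single whitespace-normalised record.
    const joined = text.split(/\s+/).filter(Boolean).join(' ');
    return joined.length > 0 ? [joined] : [];
  }
  // Assumption: blank lines do not produce empty records.
  return text
    .split(/\n+/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

const contentInput = [
  'Lorem ipsum, dolor sit amet consectetur adipisicing elit. Facilis quasi corrupti ipsam impedit nostrum odio.',
  '',
  'Nulla accusantium repellat officiis voluptate similique aut sint reiciendis totam, aliquid, voluptatum qui consequuntur placeat!'
].join('\n');

const records = lexContent(contentInput, false);

const table: ParseTable = {
  token: ['<p>'].concat(records, ['</p>']),
  types: ['start'].concat(records.map(() => 'content'), ['end'])
};

console.log(table.types);
```

No wrapping happens here at all: the records hold the text exactly as provided, and the decision of where to break lines is deferred to the formatting cycle.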

Notice how the text content token entries now accurately represent the provided input, as opposed to being augmented to adhere to wrap. Each newline-separated run of text content inserts a new record. Based on this structure, we can handle wrap during the beautification cycle in a single operation and, most importantly, ensure that indentation is taken into account when applying wrap. The new output will instead be:

<p>
  Lorem ipsum, dolor sit amet consectetur
  adipisicing elit. Facilis quasi corrupti ipsam
  impedit nostrum odio. Nulla accusantium repellat
  officiis voluptate similique aut sint reiciendis
  totam, aliquid, voluptatum qui consequuntur
  placeat!
</p>
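
A minimal sketch of what indentation-aware wrapping in the formatting cycle might look like follows. wrapContent is a hypothetical function, not the planned implementation; it assumes a 2-space indent per level, that the indentation counts toward the wrap limit, and that consecutive content records flow together into one wrapped run, as in the output above.

```typescript
// Hypothetical formatter-level wrap: the indent is part of the line budget.
function wrapContent(
  records: string[],
  limit: number,
  level: number,
  indentSize: number = 2
): string[] {
  const indent = ' '.repeat(level * indentSize);
  // Assumption: consecutive content records are joined before wrapping.
  const words = records.join(' ').split(/\s+/).filter(Boolean);
  const lines: string[] = [];
  let line = '';
  for (const word of words) {
    // Break before the word if indent + line + space + word exceeds the limit.
    if (line.length > 0 && indent.length + line.length + 1 + word.length > limit) {
      lines.push(indent + line);
      line = word;
    } else {
      line = line.length > 0 ? line + ' ' + word : word;
    }
  }
  if (line.length > 0) lines.push(indent + line);
  return lines;
}

const contentRecords = [
  'Lorem ipsum, dolor sit amet consectetur adipisicing elit. Facilis quasi corrupti ipsam impedit nostrum odio.',
  'Nulla accusantium repellat officiis voluptate similique aut sint reiciendis totam, aliquid, voluptatum qui consequuntur placeat!'
];

// Content inside <p> sits one level deep.
const formatted = wrapContent(contentRecords, 50, 1);

console.log(formatted.join('\n'));
```

Because the indent is subtracted from the available width up front, a single pass is enough and no line needs to be re-wrapped after indentation is applied.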

Possible Regression

The overhaul will open up some headaches on the Liquid handling front, specifically for the following rules:

forceFilter
forceArgument

These 2 rules are currently processed at the lexing level, so they need to be considered as part of this change. Other related rules will also be found in the attribute handling operations, but I believe this is already handled in the current implementation.

panoply added the Enhancement, HTML, and Liquid labels on May 21, 2024
