0.4.0: indentation issues #40

winterkind · 2020-01-20T15:46:28Z

Using 0.4.0 I tested one of our XML files.

Settings:

tabWidth: 4
printWidth: 120

Before:

<core:FragmentDefinition
  xmlns:core="sap.ui.core"
  xmlns:m="sap.m"
  xmlns:dev="sap.ead.c4s.ui.view.control"
  xmlns:devEl="sap.ead.c4s.ui.view.element"
  xmlns="http://www.w3.org/1999/xhtml">
  <m:VBox
    renderType="Bare"
    items="{
        path: 'contextMenu>/Info/Context',
        templateShareable: false
      }">
    <dev:SmallBreadcrumbs
      items="{
          path: 'contextMenu>',
          templateShareable: false
        }">
      <dev:items>
        <devEl:SmallBreadcrumbItem depth="{contextMenu>depth}">
          <m:HBox renderType="Bare">
            <m:HBox renderType="Bare">
              <dev:DataExplorerIcon
                explorerType="{contextMenu>DataExplorerType}"
                explorerSubType="{contextMenu>DataExplorerSubType}"/>
              <m:Label
                text="{
                    path: 'contextMenu>',
                    formatter: '._navigationPathTitleFormatter'
                  }"
                class="c4sSmallTileHeaderTitle"/>
            </m:HBox>
          </m:HBox>
        </devEl:SmallBreadcrumbItem>
      </dev:items>
    </dev:SmallBreadcrumbs>
  </m:VBox>
</core:FragmentDefinition>

After:

<core:FragmentDefinition
    xmlns:core="sap.ui.core"
    xmlns:m="sap.m"
    xmlns:dev="sap.ead.c4s.ui.view.control"
    xmlns:devEl="sap.ead.c4s.ui.view.element"
    xmlns="http://www.w3.org/1999/xhtml"
>
  <m:VBox renderType="Bare" items="{
        path: 'contextMenu>/Info/Context',
        templateShareable: false
      }">
    <dev:SmallBreadcrumbs items="{
          path: 'contextMenu>',
          templateShareable: false
        }">
      <dev:items>
        <devEl:SmallBreadcrumbItem depth="{contextMenu>depth}">
          <m:HBox renderType="Bare">
            <m:HBox renderType="Bare">
              <dev:DataExplorerIcon
                                explorerType="{contextMenu>DataExplorerType}"
                                explorerSubType="{contextMenu>DataExplorerSubType}"
                            />
              <m:Label
                                text="{
                    path: 'contextMenu>',
                    formatter: '._navigationPathTitleFormatter'
                  }"
                                class="c4sSmallTileHeaderTitle"
                            />
            </m:HBox>
          </m:HBox>
        </devEl:SmallBreadcrumbItem>
      </dev:items>
    </dev:SmallBreadcrumbs>
  </m:VBox>
</core:FragmentDefinition>

Perceived issues:

Starting in line 8, a tabWidth of 2 seems to be used.
Surprisingly high indentations in lines 21 - 23, 25, 29, 30.

The text was updated successfully, but these errors were encountered:

kddnewton · 2020-01-20T16:00:52Z

Okay so a couple of things are going on here. Technically, the plugin is functioning as designed. Though I realize it's not the best user experience at the moment.

Unfortunately in XML whitespace is very significant unless you're inside of <>. So what's happening is when prettier is formatting these groups, it's maintaining the exact same amount of whitespace you had inside the start and end tags to maintain parity. Otherwise for a lot of XML use cases prettier would actually be changing the content unwittingly.

With lines 21-23, 25, 29, and 30, you're actually seeing where the indentation would be if whitespace were insignificant.

I'm really not entirely sure of the way around this. I've been thinking maybe an xmlIgnoreWhitespace option for those that don't care about the exact whitespace count, but it would have to be default off.

winterkind · 2020-01-21T12:57:49Z

Thanks for the explanation! To dumb it down for me (I am not the XML expert as you probably already noticed...):

Row 8 in the result is indented by two spaces because row 7 in the original is also indented by two spaces? Meaning tabWidth is not applied? In lines such as 21 - 23 it is applied, just not with the parent tag as basis but with their real "level" within the tree?

In that case I agree with your idea of having an option for that, so that tabWidth is applied to rows with tags as well. Because with this kind of files my ideal result would indent row 8 by 4 spaces, row 12 by 8 spaces etc.

PS: preserving the indentation inside multi-row strings is of course correct, I am not talking about those!

kddnewton · 2020-01-21T13:07:00Z

So basically, when you have something like:

<foo>
  <bar />
</foo>

in XML that means that the content inside of the foo tag is actually: ["\n ", bar, "\n"]. The whitespace in this case is significant. As much as it looks like it's not because we've kind of come to agree that the spacing should look like a tree. But imagine a different example like:

<foo>
  bar <baz /> qux
</foo>

Now the content inside of the foo tag is actually: ["\n bar ", baz, " qux\n"].

kddnewton · 2020-01-21T13:08:18Z

So yeah, basically I'm thinking of trying to provide a rule that says if you don't have mixed content inside the tags, then we could probably just consider the whitespace as insignificant and indent as we like, but right now it would definitely change things otherwise and potentially break XML integrations.

bd82 · 2020-01-21T13:14:29Z

Specification wise XML is indeed whitespace sensitive.

However what % of real XML files are really whitespace sensitive?
Perhaps the default of ignoring whitespace in contents would be more functionally useful even if not 100% correct while those users who have meaningful whitespace would need to explicitly turn the WS ignoring off?

If we do not ignore whitespace it is very difficult to get a real "pretty XML"...
But I do not know what is the % of users who have meaningful whitespace?

This issue is what stumped me when I was playing around with a prettier plugin for
XML a few months ago... 😢

kddnewton · 2020-01-21T13:17:03Z

I think it pretty much has to be an option. Otherwise this is really only prettier-ifying open/close tags and prolog content.

kddnewton · 2020-01-21T13:17:46Z

Now to figure out what to name it. xmlIgnoreWhitespace? xmlWhitespaceInsignificant? xmlWhitespaceSignificantOnlyOnMixedContent?

thorn0 · 2020-01-21T16:40:35Z

For HTML, the analogous option of Prettier is called --html-whitespace-sensitivity.

Also schema languages (DTD, XML Schema, RELAX NG, Schematron) are used to define, among other things, whitespace rules for elements. If the document has an associated schema, you should be able to get the needed info about whitespace significance from there.

Finally, you can probably somehow use the standard xml:space attribute, but I don't know much about it.

bd82 · 2020-01-21T17:48:40Z

My 2 cents:

I would go with the option name --html-whitespace-sensitivity linked above.
but only keep two of the three options:

"strict" - Whitespaces are considered sensitive.
"ignore" - Whitespaces are considered insensitive.

And use "ignore" by default and see if any users complain. 😄

I'd probably also emphasize this option and its default behavior in the documentation.
and only in the longer term evaluate more granular control e.g via xml:space attribute.

thorn0 · 2020-01-21T18:05:42Z

Sounds good for starters.

The "css" value would make sense too as XML is one of the two syntaxes of HTML5.

And it might be possible to come up with a simple, but good enough heuristic for another value, "guess". E.g. "if an element doesn't have any non-whitespace text nodes among its children, then its whitespace-only child text nodes can be ignored". Something like that.

winterkind · 2020-01-22T08:12:16Z

I would go with the option name --html-whitespace-sensitivity linked above.
but only keep two of the three options:

"strict" - Whitespaces are considered sensitive.

"ignore" - Whitespaces are considered insensitive.

And use "ignore" by default and see if any users complain.

That sounds very reasonable. In our specific case, all whitespaces are insignificant, except for those inside multi-row strings. So if a value of ignore would also ignore those, then a third value like stringsOnly would be good, which would only keep whitespace inside of multi-row strings.

thorn0 · 2020-01-22T09:29:01Z

@winterkind BTW, did you try formatting your files with Prettier's HTML formatter and --html-whitespace-sensitivity ignore? Should work fine in your case: playground.

winterkind · 2020-01-22T11:50:50Z

@thorn0 You are right - that looks very good, also with our larger files. It just needs some convincing at this point. The "babel" parser complains about HTML comments and the explicit HTML parser has issues with our default namespace not being HTML.

But as I said: with the "babel" parser and after removing comments, the output is excellent.

thorn0 · 2020-01-22T12:15:42Z

@winterkind I meant the html parser. What issues exactly did you have with it? Could you demonstrate them using a link to the playground? (Better in a new issue in https://github.com/prettier/prettier to not hijack this issue with things not related directly to plugin-xml.)

winterkind · 2020-01-22T14:46:30Z

@thorn0 I don't think there is anything wrong with the HTML parser. To quickly show what I meant see here.

It is XML with a given default namespace and a Title tag inside that namespace. The HTML parser throws an error because it assumes this is the Title of HTML (that's my guess at least). From my point of view that is okay because my content is not HTML but XML.

I liked your comment because it showed the effect that a whitespace option could have for our XML.

bd82 · 2020-01-22T15:28:12Z

I am unsure if HTML is always a valid (syntactically) XML.

https://stackoverflow.com/questions/5558502/is-html5-valid-xml

thorn0 · 2020-01-22T15:36:39Z

@bd82 It's usually not, but that doesn't mean Prettier's HTML parser (Angular's actually) can't be smart enough to support XML. It's something I'd like to investigate when I get some time.

infotexture · 2020-01-22T17:24:07Z

Re. #40 (comment):

Also schema languages (DTD, XML Schema, RELAX NG, Schematron) are used to define, among other things, whitespace rules for elements. If the document has an associated schema, you should be able to get the needed info about whitespace significance from there.

Yep, proper context-sensitive whitespace handling is basic table stakes for an XML processing toolchain, so a simple on/off, strict/ignore approach wouldn't be sufficient, the plug-in would need to toggle the mode based on the current element.

Finally, you can probably somehow use the standard xml:space attribute, but I don't know much about it.

The schema languages typically handle that for each XML dialect, so the schema will typically use @xml:space to say something like “preserve all whitespace in elements named <codeblock>”.

For example, from the DITA spec:

@xml:space
This attribute is provided on <pre>, <lines>, and on elements specialized from those.
It ensures that parsers in editors and transforms respect the white space, including line-end
characters, that is part of the data in those elements. It is intended to be part of the default
properties of these elements, and not for authors to change or delete. When defined, it has a
fixed value of "preserve".

kddnewton · 2020-01-25T19:11:46Z

Okay, I think what I'd like to go with is creating an --xml-whitespace-sensitivity option that has ignore, strict, and smart (or something like that. The first two are obviously far easier to implement (the last one requires a lot of knowledge about the schema and elements, I'm very interested in implementing it, but not as a first pass).

kddnewton · 2020-01-25T19:54:16Z

@winterkind would you be able to test the whitespace-ignore branch on this (and set the xmlWhitespaceSensitivity to ignore)? Want to make sure it works for your use case.

kddnewton · 2020-01-27T13:47:45Z

Actually I've just gone ahead and released a v0.6.0 with the changes. I'm going to close this and if we find a need for the smart mode in the future we can build it. For now @winterkind try setting xmlWhitespaceSensitivity to "ignore" and it should be better for you.

winterkind · 2020-01-29T08:36:27Z

Hi, sorry for not responding earlier; I was sick for a few days. I really appreciate your help with this.

I tried 0.6.0 with the new option and it looks very good. I'll test with some more of our files and if I find anything else, I'll create new issues.

kddnewton closed this as completed Jan 27, 2020

sbidoul mentioned this issue Feb 29, 2020

No so pretty XML formatting by prettier OCA/maintainer-quality-tools#632

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

0.4.0: indentation issues #40

0.4.0: indentation issues #40

winterkind commented Jan 20, 2020

kddnewton commented Jan 20, 2020

winterkind commented Jan 21, 2020

kddnewton commented Jan 21, 2020

kddnewton commented Jan 21, 2020

bd82 commented Jan 21, 2020

kddnewton commented Jan 21, 2020

kddnewton commented Jan 21, 2020 •

edited

thorn0 commented Jan 21, 2020 •

edited

bd82 commented Jan 21, 2020

thorn0 commented Jan 21, 2020 •

edited

winterkind commented Jan 22, 2020

thorn0 commented Jan 22, 2020 •

edited

winterkind commented Jan 22, 2020 •

edited

thorn0 commented Jan 22, 2020 •

edited

winterkind commented Jan 22, 2020

bd82 commented Jan 22, 2020

thorn0 commented Jan 22, 2020

infotexture commented Jan 22, 2020 •

edited

kddnewton commented Jan 25, 2020

kddnewton commented Jan 25, 2020

kddnewton commented Jan 27, 2020

winterkind commented Jan 29, 2020

0.4.0: indentation issues #40

0.4.0: indentation issues #40

Comments

winterkind commented Jan 20, 2020

kddnewton commented Jan 20, 2020

winterkind commented Jan 21, 2020

kddnewton commented Jan 21, 2020

kddnewton commented Jan 21, 2020

bd82 commented Jan 21, 2020

kddnewton commented Jan 21, 2020

kddnewton commented Jan 21, 2020 • edited

thorn0 commented Jan 21, 2020 • edited

bd82 commented Jan 21, 2020

thorn0 commented Jan 21, 2020 • edited

winterkind commented Jan 22, 2020

thorn0 commented Jan 22, 2020 • edited

winterkind commented Jan 22, 2020 • edited

thorn0 commented Jan 22, 2020 • edited

winterkind commented Jan 22, 2020

bd82 commented Jan 22, 2020

thorn0 commented Jan 22, 2020

infotexture commented Jan 22, 2020 • edited

kddnewton commented Jan 25, 2020

kddnewton commented Jan 25, 2020

kddnewton commented Jan 27, 2020

winterkind commented Jan 29, 2020

kddnewton commented Jan 21, 2020 •

edited

thorn0 commented Jan 21, 2020 •

edited

thorn0 commented Jan 21, 2020 •

edited

thorn0 commented Jan 22, 2020 •

edited

winterkind commented Jan 22, 2020 •

edited

thorn0 commented Jan 22, 2020 •

edited

infotexture commented Jan 22, 2020 •

edited