Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

0.4.0: indentation issues #40

Closed
winterkind opened this issue Jan 20, 2020 · 22 comments
Closed

0.4.0: indentation issues #40

winterkind opened this issue Jan 20, 2020 · 22 comments

Comments

@winterkind
Copy link

Using 0.4.0 I tested one of our XML files.

Settings:

tabWidth: 4
printWidth: 120

Before:

<core:FragmentDefinition
  xmlns:core="sap.ui.core"
  xmlns:m="sap.m"
  xmlns:dev="sap.ead.c4s.ui.view.control"
  xmlns:devEl="sap.ead.c4s.ui.view.element"
  xmlns="http://www.w3.org/1999/xhtml">
  <m:VBox
    renderType="Bare"
    items="{
        path: 'contextMenu>/Info/Context',
        templateShareable: false
      }">
    <dev:SmallBreadcrumbs
      items="{
          path: 'contextMenu>',
          templateShareable: false
        }">
      <dev:items>
        <devEl:SmallBreadcrumbItem depth="{contextMenu>depth}">
          <m:HBox renderType="Bare">
            <m:HBox renderType="Bare">
              <dev:DataExplorerIcon
                explorerType="{contextMenu>DataExplorerType}"
                explorerSubType="{contextMenu>DataExplorerSubType}"/>
              <m:Label
                text="{
                    path: 'contextMenu>',
                    formatter: '._navigationPathTitleFormatter'
                  }"
                class="c4sSmallTileHeaderTitle"/>
            </m:HBox>
          </m:HBox>
        </devEl:SmallBreadcrumbItem>
      </dev:items>
    </dev:SmallBreadcrumbs>
  </m:VBox>
</core:FragmentDefinition>

After:

<core:FragmentDefinition
    xmlns:core="sap.ui.core"
    xmlns:m="sap.m"
    xmlns:dev="sap.ead.c4s.ui.view.control"
    xmlns:devEl="sap.ead.c4s.ui.view.element"
    xmlns="http://www.w3.org/1999/xhtml"
>
  <m:VBox renderType="Bare" items="{
        path: 'contextMenu>/Info/Context',
        templateShareable: false
      }">
    <dev:SmallBreadcrumbs items="{
          path: 'contextMenu>',
          templateShareable: false
        }">
      <dev:items>
        <devEl:SmallBreadcrumbItem depth="{contextMenu>depth}">
          <m:HBox renderType="Bare">
            <m:HBox renderType="Bare">
              <dev:DataExplorerIcon
                                explorerType="{contextMenu>DataExplorerType}"
                                explorerSubType="{contextMenu>DataExplorerSubType}"
                            />
              <m:Label
                                text="{
                    path: 'contextMenu>',
                    formatter: '._navigationPathTitleFormatter'
                  }"
                                class="c4sSmallTileHeaderTitle"
                            />
            </m:HBox>
          </m:HBox>
        </devEl:SmallBreadcrumbItem>
      </dev:items>
    </dev:SmallBreadcrumbs>
  </m:VBox>
</core:FragmentDefinition>

Perceived issues:

  1. Starting in line 8, a tabWidth of 2 seems to be used.
  2. Surprisingly high indentations in lines 21 - 23, 25, 29, 30.
@kddnewton
Copy link
Member

Okay so a couple of things are going on here. Technically, the plugin is functioning as designed. Though I realize it's not the best user experience at the moment.

Unfortunately in XML whitespace is very significant unless you're inside of <>. So what's happening is when prettier is formatting these groups, it's maintaining the exact same amount of whitespace you had inside the start and end tags to maintain parity. Otherwise for a lot of XML use cases prettier would actually be changing the content unwittingly.

With lines 21-23, 25, 29, and 30, you're actually seeing where the indentation would be if whitespace were insignificant.

I'm really not entirely sure of the way around this. I've been thinking maybe an xmlIgnoreWhitespace option for those that don't care about the exact whitespace count, but it would have to be default off.

@winterkind
Copy link
Author

Thanks for the explanation! To dumb it down for me (I am not the XML expert as you probably already noticed...):

Row 8 in the result is indented by two spaces because row 7 in the original is also indented by two spaces? Meaning tabWidth is not applied? In lines such as 21 - 23 it is applied, just not with the parent tag as basis but with their real "level" within the tree?

In that case I agree with your idea of having an option for that, so that tabWidth is applied to rows with tags as well. Because with this kind of files my ideal result would indent row 8 by 4 spaces, row 12 by 8 spaces etc.

PS: preserving the indentation inside multi-row strings is of course correct, I am not talking about those!

@kddnewton
Copy link
Member

So basically, when you have something like:

<foo>
  <bar />
</foo>

in XML that means that the content inside of the foo tag is actually: ["\n ", bar, "\n"]. The whitespace in this case is significant. As much as it looks like it's not because we've kind of come to agree that the spacing should look like a tree. But imagine a different example like:

<foo>
  bar <baz /> qux
</foo>

Now the content inside of the foo tag is actually: ["\n bar ", baz, " qux\n"].

@kddnewton
Copy link
Member

So yeah, basically I'm thinking of trying to provide a rule that says if you don't have mixed content inside the tags, then we could probably just consider the whitespace as insignificant and indent as we like, but right now it would definitely change things otherwise and potentially break XML integrations.

@bd82
Copy link

bd82 commented Jan 21, 2020

Specification wise XML is indeed whitespace sensitive.

However what % of real XML files are really whitespace sensitive?
Perhaps the default of ignoring whitespace in contents would be more functionally useful even if not 100% correct while those users who have meaningful whitespace would need to explicitly turn the WS ignoring off?

If we do not ignore whitespace it is very difficult to get a real "pretty XML"...
But I do not know what is the % of users who have meaningful whitespace?

This issue is what stumped me when I was playing around with a prettier plugin for
XML a few months ago... 😢

@kddnewton
Copy link
Member

I think it pretty much has to be an option. Otherwise this is really only prettier-ifying open/close tags and prolog content.

@kddnewton
Copy link
Member

kddnewton commented Jan 21, 2020

Now to figure out what to name it. xmlIgnoreWhitespace? xmlWhitespaceInsignificant? xmlWhitespaceSignificantOnlyOnMixedContent?

@thorn0
Copy link
Member

thorn0 commented Jan 21, 2020

For HTML, the analogous option of Prettier is called --html-whitespace-sensitivity.

Also schema languages (DTD, XML Schema, RELAX NG, Schematron) are used to define, among other things, whitespace rules for elements. If the document has an associated schema, you should be able to get the needed info about whitespace significance from there.

Finally, you can probably somehow use the standard xml:space attribute, but I don't know much about it.

@bd82
Copy link

bd82 commented Jan 21, 2020

My 2 cents:

I would go with the option name --html-whitespace-sensitivity linked above.
but only keep two of the three options:

  • "strict" - Whitespaces are considered sensitive.
  • "ignore" - Whitespaces are considered insensitive.

And use "ignore" by default and see if any users complain. 😄

I'd probably also emphasize this option and its default behavior in the documentation.
and only in the longer term evaluate more granular control e.g via xml:space attribute.

@thorn0
Copy link
Member

thorn0 commented Jan 21, 2020

Sounds good for starters.

The "css" value would make sense too as XML is one of the two syntaxes of HTML5.

And it might be possible to come up with a simple, but good enough heuristic for another value, "guess". E.g. "if an element doesn't have any non-whitespace text nodes among its children, then its whitespace-only child text nodes can be ignored". Something like that.

@winterkind
Copy link
Author

I would go with the option name --html-whitespace-sensitivity linked above.
but only keep two of the three options:

  • "strict" - Whitespaces are considered sensitive.
  • "ignore" - Whitespaces are considered insensitive.

And use "ignore" by default and see if any users complain.

That sounds very reasonable. In our specific case, all whitespaces are insignificant, except for those inside multi-row strings. So if a value of ignore would also ignore those, then a third value like stringsOnly would be good, which would only keep whitespace inside of multi-row strings.

@thorn0
Copy link
Member

thorn0 commented Jan 22, 2020

@winterkind BTW, did you try formatting your files with Prettier's HTML formatter and --html-whitespace-sensitivity ignore? Should work fine in your case: playground.

@winterkind
Copy link
Author

winterkind commented Jan 22, 2020

@thorn0 You are right - that looks very good, also with our larger files. It just needs some convincing at this point. The "babel" parser complains about HTML comments and the explicit HTML parser has issues with our default namespace not being HTML.

But as I said: with the "babel" parser and after removing comments, the output is excellent.

@thorn0
Copy link
Member

thorn0 commented Jan 22, 2020

@winterkind I meant the html parser. What issues exactly did you have with it? Could you demonstrate them using a link to the playground? (Better in a new issue in https://github.com/prettier/prettier to not hijack this issue with things not related directly to plugin-xml.)

@winterkind
Copy link
Author

@thorn0 I don't think there is anything wrong with the HTML parser. To quickly show what I meant see here.

It is XML with a given default namespace and a Title tag inside that namespace. The HTML parser throws an error because it assumes this is the Title of HTML (that's my guess at least). From my point of view that is okay because my content is not HTML but XML.

I liked your comment because it showed the effect that a whitespace option could have for our XML.

@bd82
Copy link

bd82 commented Jan 22, 2020

I am unsure if HTML is always a valid (syntactically) XML.

@thorn0
Copy link
Member

thorn0 commented Jan 22, 2020

@bd82 It's usually not, but that doesn't mean Prettier's HTML parser (Angular's actually) can't be smart enough to support XML. It's something I'd like to investigate when I get some time.

@infotexture
Copy link
Contributor

infotexture commented Jan 22, 2020

Re. #40 (comment):

Also schema languages (DTD, XML Schema, RELAX NG, Schematron) are used to define, among other things, whitespace rules for elements. If the document has an associated schema, you should be able to get the needed info about whitespace significance from there.

Yep, proper context-sensitive whitespace handling is basic table stakes for an XML processing toolchain, so a simple on/off, strict/ignore approach wouldn't be sufficient, the plug-in would need to toggle the mode based on the current element.

Finally, you can probably somehow use the standard xml:space attribute, but I don't know much about it.

The schema languages typically handle that for each XML dialect, so the schema will typically use @xml:space to say something like “preserve all whitespace in elements named <codeblock>”.

For example, from the DITA spec:

@xml:space
This attribute is provided on <pre>, <lines>, and on elements specialized from those.
It ensures that parsers in editors and transforms respect the white space, including line-end
characters, that is part of the data in those elements. It is intended to be part of the default
properties of these elements, and not for authors to change or delete. When defined, it has a
fixed value of "preserve".

@kddnewton
Copy link
Member

Okay, I think what I'd like to go with is creating an --xml-whitespace-sensitivity option that has ignore, strict, and smart (or something like that. The first two are obviously far easier to implement (the last one requires a lot of knowledge about the schema and elements, I'm very interested in implementing it, but not as a first pass).

@kddnewton
Copy link
Member

@winterkind would you be able to test the whitespace-ignore branch on this (and set the xmlWhitespaceSensitivity to ignore)? Want to make sure it works for your use case.

@kddnewton
Copy link
Member

Actually I've just gone ahead and released a v0.6.0 with the changes. I'm going to close this and if we find a need for the smart mode in the future we can build it. For now @winterkind try setting xmlWhitespaceSensitivity to "ignore" and it should be better for you.

@winterkind
Copy link
Author

Hi, sorry for not responding earlier; I was sick for a few days. I really appreciate your help with this.

I tried 0.6.0 with the new option and it looks very good. I'll test with some more of our files and if I find anything else, I'll create new issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants