Namespaces #6

Closed
wooorm opened this issue Jul 3, 2016 · 9 comments
Labels
💬 type/discussion This is a request for comments

Comments

@wooorm
Member

wooorm commented Jul 3, 2016

TL;DR

I’m thinking out loud. We need namespace information. I can think of three solutions. Not sure which is best.

Introduction

HTML has the concept of elements: things like <strong></strong> are normal elements. There’s a subcategory of “foreign elements”: those from MathML (mi) or from SVG (rect).

A practical reason this information is needed is tag-name normalisation: in HTML, tag names are case-insensitive; in SVG and MathML, they are not. Unfortunately, tag names alone cannot be used to detect whether an element is foreign, because some elements exist in more than one namespace. For example: var in HTML and MathML, and a in HTML and SVG.

Take the following code:

<!doctype html>
<title>Foreign elements in HTML</title>
<h1>HTML</h1>
<a href="#">HTML link</a>
<var>htmlVar</var>
<svg>
  <a href="#">SVG link</a>
  <span>SVG</span>
  <a href="#">SVG link</a>
</svg>
<math>
  <mi>mathMLVar</mi>
  <span>MathML</span>
  <mi>mathMLVar</mi>
</math>

Running the following script:

// Log the tag name, namespace, and text content of every element in the document.
var length = document.all.length;
var index = -1;
var node;
while (++index < length) {
  node = document.all[index];
  console.log([node.tagName, node.namespaceURI, node.textContent]);
}

Yields:

[Log] ["HTML", "http://www.w3.org/1999/xhtml", "Foreign elements in HTML↵HTML↵HTML link↵htmlVar↵↵ …G↵  SVG link↵↵↵  mathMLVar↵  MathML↵  mathMLVar↵↵"] (3)
[Log] ["HEAD", "http://www.w3.org/1999/xhtml", "Foreign elements in HTML↵"] (3)
[Log] ["TITLE", "http://www.w3.org/1999/xhtml", "Foreign elements in HTML"] (3)
[Log] ["BODY", "http://www.w3.org/1999/xhtml", "HTML↵HTML link↵htmlVar↵↵  SVG link↵  SVG↵  SVG link↵↵↵  mathMLVar↵  MathML↵  mathMLVar↵↵"] (3)
[Log] ["H1", "http://www.w3.org/1999/xhtml", "HTML"] (3)
[Log] ["A", "http://www.w3.org/1999/xhtml", "HTML link"] (3)
[Log] ["VAR", "http://www.w3.org/1999/xhtml", "htmlVar"] (3)
[Log] ["svg", "http://www.w3.org/2000/svg", "↵  SVG link↵  "] (3)
[Log] ["a", "http://www.w3.org/2000/svg", "SVG link"] (3)
[Log] ["SPAN", "http://www.w3.org/1999/xhtml", "SVG"] (3)
[Log] ["A", "http://www.w3.org/1999/xhtml", "SVG link"] (3)
[Log] ["math", "http://www.w3.org/1998/Math/MathML", "↵  mathMLVar↵  "] (3)
[Log] ["mi", "http://www.w3.org/1998/Math/MathML", "mathMLVar"] (3)
[Log] ["SPAN", "http://www.w3.org/1999/xhtml", "MathML"] (3)
[Log] ["MI", "http://www.w3.org/1999/xhtml", "mathMLVar"] (3)

Note 1: Non-foreign elements break out of their foreign context.
Note 2: HTML is case-insensitive (normalised to upper-case), foreign elements are case-sensitive.
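
Note 2 can be sketched as a small function. This is only an illustration of the casing rule seen in the log above; `reportedTagName` is a hypothetical helper, not part of any existing package, and it assumes the namespace of the element is already known.

```javascript
// Namespace URI for HTML, as reported by `namespaceURI` in the log above.
const HTML_NAMESPACE = 'http://www.w3.org/1999/xhtml';

// Hypothetical helper: mimic how `tagName` is reported in an HTML document.
// Elements in the HTML namespace are upper-cased; foreign elements keep their case.
function reportedTagName(localName, namespace) {
  return namespace === HTML_NAMESPACE ? localName.toUpperCase() : localName;
}

console.log(reportedTagName('a', HTML_NAMESPACE)); // "A"
console.log(reportedTagName('a', 'http://www.w3.org/2000/svg')); // "a"
console.log(reportedTagName('mi', 'http://www.w3.org/1998/Math/MathML')); // "mi"
console.log(reportedTagName('mi', HTML_NAMESPACE)); // "MI"
```

The same local name (`a`, `mi`) gives a different result depending on the namespace, which is why the tag name alone is not enough.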

Proposal

I propose either of the following:

  • Add namespace on some nodes (notably, root, <math>, <svg>). To determine the namespace of a node, check its closest ancestor with a namespace.
  • Add namespace on root nodes (and wrap <svg> and <math> in roots). To determine the namespace of a node, check its closest root for a namespace. This changes the semantics of roots somewhat.
  • Add namespace on any element.

The downside of the first two is that it’s hard to determine the namespace of an element in a syntax tree without ancestral getters. However, both make moving nodes around quite easy.
The latter is verbose, but does allow for easy access. However, it makes it easy for things to go wrong when shuffling nodes around.
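
A rough sketch of the first option, to make the trade-off concrete. The `namespace` field and the `namespaceOf` helper are assumptions for illustration only; nothing here is an existing API. Because nodes have no parent pointers, the caller has to pass the ancestors along.

```javascript
const namespaces = {
  html: 'http://www.w3.org/1999/xhtml',
  svg: 'http://www.w3.org/2000/svg'
};

// Option 1: only the root and <svg> carry a (hypothetical) `namespace` field.
const link = {type: 'element', tagName: 'a', properties: {href: '#'}, children: []};
const svg = {type: 'element', tagName: 'svg', namespace: namespaces.svg, properties: {}, children: [link]};
const root = {type: 'root', namespace: namespaces.html, children: [svg]};

// `ancestors` is ordered from the root down to the direct parent.
// The node itself is checked first, then its ancestors from closest to furthest.
function namespaceOf(node, ancestors) {
  const chain = ancestors.concat(node);
  for (let index = chain.length - 1; index >= 0; index--) {
    if (chain[index].namespace) return chain[index].namespace;
  }
  return undefined;
}

console.log(namespaceOf(link, [root, svg])); // "http://www.w3.org/2000/svg"
console.log(namespaceOf(svg, [root])); // "http://www.w3.org/2000/svg"
// The same node, looked up as if it had been moved directly under the root:
console.log(namespaceOf(link, [root])); // "http://www.w3.org/1999/xhtml"
```

The last call shows why moving nodes is easy under this option: the link carries no namespace of its own, so it picks up whatever its new ancestors provide.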

Note: detecting namespaces upon creation (in rehype-parse) is very doable. I’d like to make the usage of hastscript and transformers very easy too, though!

Do let me know about your thoughts on this!

@wooorm wooorm added 🧒 semver/minor This is backwards-compatible change 🧑 semver/major This is a change 🙉 open/needs-info This needs some more info labels Jul 3, 2016
@eush77

eush77 commented Jul 4, 2016

I have no strong opinion on this, but I lean towards one of the first two options. The third option seems unnecessarily cluttered and error-prone (easier to be messed up by plugins).

Since namespaces are naturally nested, it seems logical to have a rule that a namespace is determined by the closest namespace property (on whatever node it is), and it doesn't seem beneficial to require the namespace property to be on a node of a fixed type (but it's fine by me, too).

@wooorm
Member Author

wooorm commented Jul 4, 2016

👍

I’m leaning towards the first. It’ll be easy to add (bookkeeping is already done when parsing, and one walk down could opt to add only necessary namespaces), and it’ll be easy to handle for plug-in authors.

@eush77

eush77 commented Jul 4, 2016

Is there anything else that namespaces would be used for between parsing and compilation, except to determine the proper tag/attribute casing of an element?

@wooorm
Member Author

wooorm commented Jul 4, 2016

I can think of lots of parse differences, but those are handled internally (I’m about to switch to a much better parser, parse5) already. Then, there are compilation differences, but those can of course be handled there pretty OK (as it’s in rehype-stringify).

Major use case for user-land would be to not walk into SVG / MathML by accident, I think. Hmm. That can be checked easily by determining whether an element is svg / math, though...
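
As a minimal sketch of that check, assuming nodes shaped like `{type, tagName, children}`: the walker below simply refuses to descend into `svg` and `math`. The name `visitHtmlElements` and the `element` builder are hypothetical and only here for illustration.

```javascript
// Tiny builder for the example tree (hypothetical, for illustration only).
function element(tagName, children = []) {
  return {type: 'element', tagName, properties: {}, children};
}

// Visit elements depth-first, but do not descend into <svg> or <math>.
// The <svg> and <math> elements themselves are skipped too.
function visitHtmlElements(node, visitor) {
  if (node.type === 'element') {
    if (node.tagName === 'svg' || node.tagName === 'math') return;
    visitor(node);
  }
  for (const child of node.children || []) {
    visitHtmlElements(child, visitor);
  }
}

const document = {
  type: 'root',
  children: [element('a'), element('svg', [element('a')]), element('var')]
};

const seen = [];
visitHtmlElements(document, (node) => seen.push(node.tagName));
console.log(seen); // ["a", "var"]
```

Note that this also skips any HTML nested inside the foreign subtree, so it is only a fit when the whole of SVG / MathML can be treated as opaque.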

@eush77

eush77 commented Jul 4, 2016

> Major use case for user-land would be to not walk into SVG / MathML by accident

Have you thought about making it explicit then? Maybe some other property like { subtree: "..." } instead of children? So that in HTML-land the whole SVG is just a single black-box element, which can still be manipulated if needed.

> That can be checked easily by determining whether an element is svg / math, though...

How would it be different from namespaces?

@wooorm
Member Author

wooorm commented Jul 4, 2016

> How would it be different from namespaces?

Not really different, just one less property.

> Have you thought about making it explicit then? Maybe some other property like { subtree: "..." } instead of children? So that in HTML-land the whole SVG is just a single black-box element, which can still be manipulated if needed.

That’s also possible, and it would make HAST more like programming language syntax trees.
I prefer the earlier idea of sub-roots for black boxes though. Still using children, but with notable semantics. And possible in the current, quite minimal, Unist interface.

There’s also an edge case where HTML is in SVG/MathML, which itself is in HTML. If either namespaces or subroots were used, it would be possible to walk into trees, remembering the current namespace, and transforming just the HTML namespaced elements.
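
A sketch of such a walk, under simplifying assumptions: the namespace is tracked by name rather than URI, `svg` and `math` switch away from HTML, and SVG’s `foreignObject` is the only point treated as switching back to HTML for its children. A real parser knows more such integration points. The `walk` function is hypothetical.

```javascript
// Walk the tree, passing the current namespace down, and report it per element.
function walk(node, visitor, inherited = 'html') {
  let own = inherited;
  let forChildren = inherited;

  if (node.type === 'element') {
    if (inherited === 'html' && (node.tagName === 'svg' || node.tagName === 'math')) {
      // <svg> and <math> are foreign themselves, and so are their descendants.
      own = node.tagName === 'svg' ? 'svg' : 'mathml';
      forChildren = own;
    } else if (inherited === 'svg' && node.tagName === 'foreignObject') {
      // <foreignObject> itself is SVG; its children are HTML again.
      forChildren = 'html';
    }
    visitor(node, own);
  }

  for (const child of node.children || []) {
    walk(child, visitor, forChildren);
  }
}

const el = (tagName, children = []) => ({type: 'element', tagName, properties: {}, children});
const nested = {
  type: 'root',
  children: [el('p'), el('svg', [el('a'), el('foreignObject', [el('p', [el('a')])])])]
};

const log = [];
walk(nested, (node, space) => log.push(node.tagName + ':' + space));
console.log(log);
// ["p:html", "svg:svg", "a:svg", "foreignObject:svg", "p:html", "a:html"]
```

A transformer that should only touch HTML would then act on the entries reported as `html` and leave the rest alone.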

@wooorm wooorm mentioned this issue Jul 5, 2016
Closed
1 task
@wooorm wooorm added the no label Jul 25, 2016
@wooorm
Member Author

wooorm commented Jul 25, 2016

Not doing it for now. Maybe a utility which, when given a node, checks if it’s a foreign element, would do the trick.

@kgryte

kgryte commented Aug 23, 2017

Linking to rehypejs/rehype#2 (comment).

@wooorm
Member Author

wooorm commented Jun 24, 2018

I thought about it a bit and I’d like to work on this now. For starters, this issue is now first tracked in wooorm/property-information#6. When that is done, we can work on updating it throughout the ecosystem.

I think we may be able to do without namespaces. But maybe we need to have, just like template, a content property for foreign content instead.
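
To illustrate the idea, a rough sketch of what that could look like. Putting `content` on `svg` is purely hypothetical here and mirrors the `template` idea mentioned above; `tagNames` is a made-up helper. Code that only follows `children` sees the foreign element as a leaf, while code that opts in can still reach inside.

```javascript
const tree = {
  type: 'root',
  children: [
    {type: 'element', tagName: 'a', properties: {href: '#'}, children: []},
    {
      type: 'element',
      tagName: 'svg',
      properties: {},
      children: [],
      // Hypothetical: foreign content lives in its own root, as with <template>.
      content: {
        type: 'root',
        children: [{type: 'element', tagName: 'a', properties: {href: '#'}, children: []}]
      }
    }
  ]
};

// Collect tag names depth-first; only look inside `content` when asked to.
function tagNames(node, followContent) {
  const names = node.type === 'element' ? [node.tagName] : [];
  for (const child of node.children || []) {
    names.push(...tagNames(child, followContent));
  }
  if (followContent && node.content) {
    names.push(...tagNames(node.content, followContent));
  }
  return names;
}

console.log(tagNames(tree, false)); // ["a", "svg"]
console.log(tagNames(tree, true)); // ["a", "svg", "a"]
```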

I’ll close this now, again, if anyone has any further comments please post them there!

@wooorm wooorm closed this as completed Jun 24, 2018
@wooorm wooorm changed the title from “Feature: Namespaces” to “Namespaces” Aug 12, 2019
@wooorm wooorm added 💬 type/discussion This is a request for comments and removed 🧑 semver/major This is a change 🧒 semver/minor This is backwards-compatible change 🙉 open/needs-info This needs some more info labels Aug 12, 2019

3 participants