Namespaces #6

wooorm · 2016-07-03T19:22:25Z

TL;DR

I’m thinking out loud. We need namespace information. I can think of three solutions. Not sure which is best.

Introduction

HTML has the concept of elements: things like <strong></strong> are normal elements. There’s a subcategory of “foreign elements”: those from MathML (mi) or from SVG (rect).

A practical example of why this information is needed is tag-name normalisation: in HTML, tag-names are case-insensitive. In SVG or MathML, they are not. Unfortunately, tag-names themselves cannot be used to detect whether an element is foreign or not, because some elements exist in multiple namespaces. For example: var in HTML and MathML, and a in HTML and SVG.

Take the following code:

<!doctype html>
<title>Foreign elements in HTML</title>
<h1>HTML</h1>
<a href="#">HTML link</a>
<var>htmlVar</var>
<svg>
  <a href="#">SVG link</a>
  <span>SVG</span>
  <a href="#">SVG link</a>
</svg>
<math>
  <mi>mathMLVar</mi>
  <span>MathML</span>
  <mi>mathMLVar</mi>
</math>

Running the following script:

var length = document.all.length;
var index = -1;
var node;
// Log the tag name, namespace, and text content of every element.
while (++index < length) {
  node = document.all[index];
  console.log([node.tagName, node.namespaceURI, node.textContent]);
}

Yields:

[Log] ["HTML", "http://www.w3.org/1999/xhtml", "Foreign elements in HTML↵HTML↵HTML link↵htmlVar↵↵ …G↵  SVG link↵↵↵  mathMLVar↵  MathML↵  mathMLVar↵↵"] (3)
[Log] ["HEAD", "http://www.w3.org/1999/xhtml", "Foreign elements in HTML↵"] (3)
[Log] ["TITLE", "http://www.w3.org/1999/xhtml", "Foreign elements in HTML"] (3)
[Log] ["BODY", "http://www.w3.org/1999/xhtml", "HTML↵HTML link↵htmlVar↵↵  SVG link↵  SVG↵  SVG link↵↵↵  mathMLVar↵  MathML↵  mathMLVar↵↵"] (3)
[Log] ["H1", "http://www.w3.org/1999/xhtml", "HTML"] (3)
[Log] ["A", "http://www.w3.org/1999/xhtml", "HTML link"] (3)
[Log] ["VAR", "http://www.w3.org/1999/xhtml", "htmlVar"] (3)
[Log] ["svg", "http://www.w3.org/2000/svg", "↵  SVG link↵  "] (3)
[Log] ["a", "http://www.w3.org/2000/svg", "SVG link"] (3)
[Log] ["SPAN", "http://www.w3.org/1999/xhtml", "SVG"] (3)
[Log] ["A", "http://www.w3.org/1999/xhtml", "SVG link"] (3)
[Log] ["math", "http://www.w3.org/1998/Math/MathML", "↵  mathMLVar↵  "] (3)
[Log] ["mi", "http://www.w3.org/1998/Math/MathML", "mathMLVar"] (3)
[Log] ["SPAN", "http://www.w3.org/1999/xhtml", "MathML"] (3)
[Log] ["MI", "http://www.w3.org/1999/xhtml", "mathMLVar"] (3)

Note 1: Non-foreign elements break out of their foreign context.
Note 2: HTML tag-names are case-insensitive (normalised to upper-case); foreign tag-names are case-sensitive.
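
To illustrate why the namespace matters for normalisation, here is a minimal sketch. `normalizeTagName` is a hypothetical helper, not part of any existing package, and lower-casing HTML names is only an assumption about how a syntax tree might store them.

```javascript
var HTML_NAMESPACE = 'http://www.w3.org/1999/xhtml';

// Hypothetical helper: case-fold tag names only for HTML elements.
// SVG and MathML names (for example `foreignObject`) are case-sensitive
// and are kept verbatim.
function normalizeTagName(tagName, namespaceURI) {
  return namespaceURI === HTML_NAMESPACE ? tagName.toLowerCase() : tagName;
}

console.log(normalizeTagName('VAR', HTML_NAMESPACE)); // 'var'
console.log(normalizeTagName('foreignObject', 'http://www.w3.org/2000/svg')); // 'foreignObject'
```

Without the namespace argument, the function could not tell an HTML `a` from an SVG `a`, which is the problem described above.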

Proposal

I propose one of the following:

1. Add namespace on some nodes (notably, root, <math>, <svg>). To determine the namespace of a node, check its closest ancestor with a namespace.
2. Add namespace on root nodes (and wrap <svg> and <math> in roots). To determine the namespace of a node, check its closest root for a namespace. This changes the semantics of roots somewhat.
3. Add namespace on any element.

The downside of the first two is that it’s hard to determine the namespace of an element in a syntax tree without ancestral getters. However, both make moving nodes around quite easy.
The third is verbose, but does allow for easy access. However, it makes it easy for things to go wrong when shuffling nodes around.
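
The first option could look roughly like the sketch below. Everything here is an assumption for illustration: the `namespace` field, its values (`'html'`, `'svg'`), and the `closestNamespace` helper are hypothetical, and the caller is expected to supply the list of ancestors because nodes have no parent pointers.

```javascript
// Sketch of the first option: only some nodes carry a (hypothetical)
// `namespace` field. `ancestors` lists the parents of `node`, outermost first.
function closestNamespace(node, ancestors) {
  var index = ancestors.length;

  if (node.namespace) return node.namespace;

  // Walk from the direct parent up to the outermost ancestor.
  while (index--) {
    if (ancestors[index].namespace) return ancestors[index].namespace;
  }

  return null;
}

var rect = {type: 'element', tagName: 'rect', children: []};
var svg = {type: 'element', tagName: 'svg', namespace: 'svg', children: [rect]};
var paragraph = {type: 'element', tagName: 'p', children: []};
var root = {type: 'root', namespace: 'html', children: [paragraph, svg]};

console.log(closestNamespace(rect, [root, svg])); // 'svg'
console.log(closestNamespace(paragraph, [root])); // 'html'
```

Moving `rect` elsewhere needs no bookkeeping on the node itself, but the lookup only works when the ancestors are known, which is the trade-off described above.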

Note: detecting namespaces upon creation (in rehype-parse) is very doable. I’d like to make the usage of hastscript and transformers very easy too, though!

Do let me know about your thoughts on this!


eush77 · 2016-07-04T17:44:45Z

I have no strong opinion on this, but I lean towards one of the first two options. The third option seems unnecessarily cluttered and error-prone (easier for plugins to mess up).

Since namespaces are naturally nested, it seems logical to have a rule that a namespace is determined by the closest namespace property (on whatever node it is), and it doesn't seem beneficial to require the namespace property to be on a node of a fixed type (but it's fine by me, too).

wooorm · 2016-07-04T17:50:40Z

👍

I’m leaning towards the first. It’ll be easy to add (bookkeeping is already done when parsing, and one walk down could opt to add only necessary namespaces), and it’ll be easy to handle for plug-in authors.

eush77 · 2016-07-04T18:06:53Z

Is there anything else that namespaces would be used for between parsing and compilation, except to determine the proper tag/attribute casing of an element?

wooorm · 2016-07-04T18:20:51Z

I can think of lots of parse differences, but those are already handled internally (I’m about to switch to a much better parser, parse5). Then, there are compilation differences, but those can of course be handled there pretty OK (as it’s in rehype-stringify).

Major use case for user-land would be to not walk into SVG / MathML by accident, I think. Hmm. That can be checked easily by determining whether an element is svg / math, though...
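
A rough sketch of that check, assuming nodes shaped like hast elements (`type`, `tagName`, `children`). The `visitOutsideForeign` name is hypothetical. Note that it also skips any HTML nested inside the foreign content.

```javascript
// Visit elements depth-first, but treat `svg` and `math` as black boxes:
// the element itself is visited, its descendants are not.
function visitOutsideForeign(node, visitor) {
  var children = node.children || [];
  var index = -1;

  if (node.type === 'element') {
    visitor(node);
    if (node.tagName === 'svg' || node.tagName === 'math') return;
  }

  while (++index < children.length) {
    visitOutsideForeign(children[index], visitor);
  }
}

var tree = {
  type: 'root',
  children: [
    {type: 'element', tagName: 'a', children: []},
    {type: 'element', tagName: 'svg', children: [
      {type: 'element', tagName: 'a', children: []}
    ]}
  ]
};
var visited = [];

visitOutsideForeign(tree, function (node) {
  visited.push(node.tagName);
});

console.log(visited); // ['a', 'svg']
```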

eush77 · 2016-07-04T18:54:01Z

> Major use case for user-land would be to not walk into SVG / MathML by accident

Have you thought about making it explicit then? Maybe some other property like { subtree: "..." } instead of children? So that in HTML-land the whole SVG is just a single black-box element, which can still be manipulated if needed.

> That can be checked easily by determining whether an element is svg / math, though...

How would it be different from namespaces?

wooorm · 2016-07-04T19:12:06Z

> How would it be different from namespaces?

Not really different, just one less property.

> Have you thought about making it explicit then? Maybe some other property like { subtree: "..." } instead of children? So that in HTML-land the whole SVG is just a single black-box element, which can still be manipulated if needed.

That’s also possible, and it would make HAST more like programming language syntax trees.
I prefer the earlier idea of sub-roots for black boxes though. Still using children, but with notable semantics. And possible in the current, quite minimal, Unist interface.

There’s also an edge case where HTML is in SVG/MathML, which itself is in HTML. If either namespaces or sub-roots were used, it would be possible to walk into trees, remembering the current namespace, and transforming just the HTML-namespaced elements.
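
Such a walk might be sketched as follows. This is a simplification and not the full HTML parsing rules: it assumes that `svg` and `math` switch to a foreign namespace and that only the children of an SVG `foreignObject` switch back to HTML. The `walkWithNamespace` name and the `'html'` / `'svg'` / `'mathml'` labels are hypothetical.

```javascript
// Walk a tree while remembering the current namespace.
// `namespace` applies to `node` itself, `inner` to its children.
function walkWithNamespace(node, namespace, visitor) {
  var children = node.children || [];
  var index = -1;
  var inner = namespace;

  if (node.type === 'element') {
    if (namespace === 'html' && node.tagName === 'svg') {
      namespace = inner = 'svg';
    } else if (namespace === 'html' && node.tagName === 'math') {
      namespace = inner = 'mathml';
    } else if (namespace === 'svg' && node.tagName === 'foreignObject') {
      // Assumed integration point: children are HTML again.
      inner = 'html';
    }

    visitor(node, namespace);
  }

  while (++index < children.length) {
    walkWithNamespace(children[index], inner, visitor);
  }
}

var nested = {
  type: 'root',
  children: [
    {type: 'element', tagName: 'a', children: []},
    {type: 'element', tagName: 'svg', children: [
      {type: 'element', tagName: 'a', children: []},
      {type: 'element', tagName: 'foreignObject', children: [
        {type: 'element', tagName: 'p', children: []}
      ]}
    ]}
  ]
};
var htmlNames = [];

// Collect only the HTML-namespaced elements.
walkWithNamespace(nested, 'html', function (node, namespace) {
  if (namespace === 'html') htmlNames.push(node.tagName);
});

console.log(htmlNames); // ['a', 'p']
```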

wooorm · 2016-07-25T13:17:22Z

Not doing it for now. Maybe a utility which, when given a node, checks if it’s a foreign element, would do the trick.

kgryte · 2017-08-23T22:46:12Z

Linking to rehypejs/rehype#2 (comment).

For reference: - rehypejs/rehype-react#5 - rehypejs/rehype#2 - syntax-tree/hast#6

wooorm · 2018-06-24T22:00:48Z

I thought about it a bit and I’d like to work on this now. For starters, this issue is now first tracked in wooorm/property-information#6. When that is done, we can work on updating it throughout the ecosystem.

I think we may be able to do without namespaces. But maybe we need to have, just like template, a content property for foreign content instead.

I’ll close this now, again. If anyone has any further comments, please post them there!

wooorm added 🧒 semver/minor This is backwards-compatible change 🧑 semver/major This is a change 🙉 open/needs-info This needs some more info labels Jul 3, 2016

wooorm mentioned this issue Jul 3, 2016

Namespaces rehypejs/rehype#2

Closed

wooorm mentioned this issue Jul 5, 2016

2.0.0 #8

Closed

1 task

wooorm added the no label Jul 25, 2016

wooorm closed this as completed Jul 25, 2016

wooorm mentioned this issue Jun 19, 2017

Add support for SVG rehypejs/rehype-react#5

Closed

kgryte added a commit to stdlib-js/stdlib that referenced this issue Aug 24, 2017

Add post-processing step to address rehype issues

ddf49d8

For reference: - rehypejs/rehype-react#5 - rehypejs/rehype#2 - syntax-tree/hast#6

wooorm reopened this Aug 24, 2017

wooorm removed the no label Nov 17, 2017

wooorm mentioned this issue Nov 17, 2017

Fix addAttribute for React syntax-tree/hast-to-hyperscript#8

Closed

wooorm closed this as completed Jun 24, 2018

wooorm changed the title ~~Feature: Namespaces~~ Namespaces Aug 12, 2019

wooorm added 💬 type/discussion This is a request for comments and removed 🧑 semver/major This is a change 🧒 semver/minor This is backwards-compatible change 🙉 open/needs-info This needs some more info labels Aug 12, 2019
