# How to remove nodes from a tree structure in Dyalog APL

Let us take a simple example of some XHTML. We are going to remove tags with `class="remove"` but keep their children, lifting all descendants one level in the tree.

The following is stored in a character vector `xhtml`:

In [1]:
nl←⎕UCS 10
xhtml ← '<div class="remove">',nl
xhtml,← '  <h1>Title</h1>',nl
xhtml,← '  <div>',nl
xhtml,← '    <p>nested p in nested div</p>',nl
xhtml,← '  </div>',nl
xhtml,← '  <p>Some text</p>',nl
xhtml,← '</div>',nl
xhtml,← nl
xhtml,← '<div>',nl
xhtml,← '  <p class="remove">',nl
xhtml,← '    Here is text with <strong>bold</strong> tag inside.',nl
xhtml,← '  </p>',nl
xhtml,← '</div>',nl

In [2]:
xhtml

First we use `⎕XML` to parse the tree structure into a depth-vector representation.

The result of `⎕XML` is a matrix with one row per node and the following columns:
- depth `d`: an integer vector of node depths, with the nodes listed in depth-first pre-order
- tag `t`: a nested vector of character vectors holding the tag names
- value `v`: a nested vector of character vectors holding the character data
- attributes `a`: a nested vector of nested matrices, each with one 2-element row (key and value) per attribute
- kind `k`: a numeric vector indicating whether the row contains an element, child element, character data etc., according to the table in [the `⎕XML` documentation](https://help.dyalog.com/latest/#Language/System%20Functions/xml.htm)

We assign each column to its own variable using split-transpose `↓⍉`:

In [3]:
(d t v a k) ← ↓⍉ ⎕XML xhtml
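To get a feel for the columns, the same split can be tried on a much smaller document. This is only a sketch: the two-node XML string and the `…Ex` names are made up for illustration and are not part of the example above.

```apl
⍝ Hypothetical two-node document, split into its columns exactly as above
(dEx tEx vEx aEx kEx)←↓⍉⎕XML'<a><b>x</b></a>'
dEx        ⍝ depths 0 1: <a> is top-level, <b> is its child
tEx        ⍝ the tag names a and b
vEx        ⍝ <a> has no character data of its own, <b> holds x
⍴¨aEx      ⍝ neither node has attributes, so each matrix has 0 rows
```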

Now we identify the tags to remove.

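We want to find the nodes whose attribute matrix contains the row `'class' 'remove'`. Primitive *membership* `∊` compares individual elements rather than whole rows, so we define a high-rank version `E` that uses index-of `⍳` to test whether `⍺` occurs as a major cell of `⍵`. With `⎕IO←1`, index-of returns `1+≢⍵` for a cell that is not found, so any result up to `≢⍵` means the row is present.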

In [4]:
E←{(≢⍵)≥⍵⍳⍺}

In [5]:
⎕←remove←'class' 'remove'∘E¨a
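As a quick check of `E` on its own, here is a small sketch using a hypothetical attribute matrix `mEx` (not taken from the XHTML above) and assuming `⎕IO←1`:

```apl
⎕IO←1                                   ⍝ assumed index origin
E←{(≢⍵)≥⍵⍳⍺}                            ⍝ same definition as above
mEx←2 2⍴'id' 'main' 'class' 'remove'    ⍝ hypothetical attributes: one key/value pair per row
'class' 'remove' E mEx                  ⍝ 1: found as row 2
'class' 'keep' E mEx                    ⍝ 0: index-of gives 3, which exceeds ≢mEx
```

A node without attributes has a matrix with zero rows, for which index-of returns 1 and `E` gives 0.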

The easiest way to identify descendants is with the parent vector `p`, in which `p[i]` is the index of the parent of node `i`. It is derived from `d` with the following idiom:

In [6]:
2{p[⍵]←⍺[⍺⍸⍵]}⌿⊢∘⊂⌸d⊣p←⍳≢d

Any top-level node is its own parent.

In [7]:
i←⍳≢d
↑i d p t v
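The idiom can be traced by hand on a short depth vector. The sketch below assumes `⎕IO←1` and uses a hypothetical six-node tree `dEx` rather than the XHTML above; the result of the reduction is discarded because only the side effect on `pEx` matters.

```apl
⎕IO←1                            ⍝ assumed index origin
dEx←0 1 2 1 0 1                  ⍝ hypothetical depths: nodes 1 and 5 are top-level
pEx←⍳≢dEx                        ⍝ start with every node as its own parent
⊢∘⊂⌸dEx                          ⍝ node indices grouped by depth: (1 5)(2 4 6)(,3)
⍝ For each pair of adjacent depth levels, ⍺⍸⍵ finds the last node one
⍝ level up that precedes each node in ⍵; that node is its parent.
_←2{pEx[⍵]←⍺[⍺⍸⍵]}⌿⊢∘⊂⌸dEx
pEx                              ⍝ 1 1 2 1 5 5
```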

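We can now identify the nodes whose parent we are going to remove, that is, the children of the tags to remove. Because a top-level node is its own parent, a top-level tag to remove also appears in this mask.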

In [8]:
⍸remove
p∊⍸remove

To get all descendants, we iterate on this expression until no more children are found.

In [9]:
⎕←desc←{⍵∨p∊⍸⍵}⍣≡remove
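The difference between a single step and the fixed point is easiest to see on a small case. This sketch assumes `⎕IO←1` and uses a hypothetical parent vector `pEx` and mask `rEx`, not the values computed above:

```apl
⎕IO←1                        ⍝ assumed index origin
pEx←1 1 2 1 5 5              ⍝ hypothetical parent vector; nodes 1 and 5 are top-level
rEx←1 0 0 0 0 0              ⍝ hypothetical mask: only node 1 is marked
{⍵∨pEx∊⍸⍵}rEx                ⍝ 1 1 0 1 0 0   one step adds children 2 and 4
{⍵∨pEx∊⍸⍵}⍣≡rEx              ⍝ 1 1 1 1 0 0   the fixed point also reaches grandchild 3
```

The power operator `⍣≡` stops as soon as two successive masks match, so the number of iterations is bounded by the depth of the tree.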

Note that node 4 is now included because we descended the tree, while node 7 is included because our traversal function `{⍵∨p∊⍸⍵}` retains the original nodes to remove.

At this point we could remove the nodes and all their descendants with the *compress* function:

In [10]:
⎕XML ⍉↑(d t v a)⌿¨⍨⊂~desc

But instead we want to keep the descendants and lift them one level in the hierarchy. To do this, we subtract one from the depths of the descendants. Then we remove only the nodes we originally intended to remove.

In [11]:
((desc)⌿d) -← 1
⎕XML ⍉↑(d t v a)⌿¨⍨⊂~remove
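The effect on the depth vector alone can be sketched on a small case. The `…Ex` values below are hypothetical and `⎕IO←1` is assumed; the selective assignment has the same form as in the cell above.

```apl
⎕IO←1                        ⍝ assumed index origin
dEx←0 1 2 1 0 1              ⍝ hypothetical depths
rEx←1 0 0 0 0 0              ⍝ node 1 is the only node to remove
sEx←1 1 1 1 0 0              ⍝ node 1 together with all of its descendants
(sEx⌿dEx)-←1                 ⍝ lift the whole subtree by one level
dEx                          ⍝ ¯1 0 1 0 0 1
(~rEx)⌿dEx                   ⍝ 0 1 0 0 1   the former children of node 1 are now top-level
```

The removed node briefly gets depth ¯1, which does no harm because it is dropped by the final compress.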

We have removed tags with `class="remove"` and lifted their descendants in the tree structure.