# Exploration of flat tree representations in APL
The namespace representation of hierarchical structures in Dyalog APL is relatively intuitive, but it has some drawbacks:
- node names are restricted to valid APL names, unless a special translation scheme is used, as in `⎕JSON`
- extracting data or manipulating the tree involves heavy recursion, and creating new nodes may require setting variable names dynamically
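
As a small illustration of the first point, the sketch below imports a JSON object whose keys are not valid APL names and lists the member names that end up in the resulting namespace. The JSON text and the name `ns` are made up for this example; the exact translated names are not shown here because they depend on the translation scheme used by `⎕JSON`.

```apl
⍝ Hypothetical input: neither key is a valid APL name
ns←⎕JSON '{"first name": "Ada", "2nd": 2}'

⍝ List the variables created in the namespace.
⍝ The original keys are replaced by translated, APL-valid names.
ns.⎕NL ¯2
```

Any code that navigates the namespace by name has to use the translated names, not the original keys.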

The most recent significant publication about handling tree structures in APL comes from Hsu. It is relatively accessible as PhD theses go, but many still find it difficult to map between the vocabulary of generic tree transformations and specific compiler passes, and the kind of tree handling they may need to do in day-to-day business.

Two examples of tree structures commonly found in today's computing landscape are HTML documents and JSON objects. The system functions `⎕XML` and `⎕JSON` can both parse text representations into depth-vector representations of their respective tree structures. However, the data is returned in a nested structure, which is inefficient to query and manipulate.
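
To make the idea of a depth vector concrete, here is a minimal sketch that derives a parent vector (another flat representation, where each node stores the index of its parent) from a depth vector. The depth vector is written by hand and is not the output of `⎕XML`; the function name `Parents` is hypothetical, and the code uses zero-based indexing inside the function only.

```apl
⍝ Parents: depth vector → parent vector (0-based indices, roots point to themselves)
⍝ For each node, find the nearest preceding node that is one level shallower.
Parents←{⎕IO←0 ⋄ d←⍵ ⋄ {0=⍵⊃d: ⍵ ⋄ ⊃⌽⍸(⍵↑d)=¯1+⍵⊃d}¨⍳≢d}

⍝ Hand-written depth vector for a forest with two roots (nodes 0 and 5)
d←0 1 1 2 1 0 1 2
p←Parents d      ⍝ 0 0 0 2 0 5 5 6
```

This version scans the prefix of the depth vector for every node, so it is meant to show the relationship between the two representations rather than to be efficient.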

In this notebook, I compare the ergonomics and performance of several approaches to extracting data from trees, using two tasks:

1. Extracting the same node from what is essentially a table in JSON: a list of objects that all have the same structure
1. Removing nodes from a tree

In [1]:
nl←⎕UCS 10
xhtml ← '<div class="remove">',nl
xhtml,← '  <h1>Title</h1>',nl
xhtml,← '  <div>',nl
xhtml,← '    <p>nested p in nested div</p>',nl
xhtml,← '  </div>',nl
xhtml,← '  <p>Some text</p>',nl
xhtml,← '</div>',nl
xhtml,← nl
xhtml,← '<div>',nl
xhtml,← '  <p class="remove">',nl
xhtml,← '    Here is text with <strong>bold</strong> tag inside.',nl
xhtml,← '  </p>',nl
xhtml,← '</div>',nl

In [2]:
(d t v a k)←↓⍉⎕XML xhtml

In [3]:
t v a

In [6]:
(it iv ia)←↑⍣≡¨t v a

In [7]:
⍴¨it iv ia

Character matrices for tags and values are not too bad (unless the lengths of the values are very uneven), but the 4D array for attributes is likely to contain a lot of empty data.
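
The cost of padding can be estimated by comparing the number of cells in the mixed array with the number of characters actually stored. The sketch below uses a small hand-written list of values rather than the parsed `v`, and the names `vals`, `ivals` and `fill` are made up for this example.

```apl
⍝ Hypothetical values of uneven length
vals←'Title' '' 'nested p in nested div' 'Some text'
ivals←↑vals              ⍝ pad to a character matrix
⍴ivals                   ⍝ 4 22

⍝ Padding cells = total cells minus characters actually present
fill←(×/⍴ivals)-+/≢¨vals ⍝ 52 of the 88 cells are padding
```

The same calculation applied to a higher-rank array pads along every axis, which is why the attributes array tends to be the most wasteful.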

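Padding also makes it impossible to tell fill blanks from blanks that belong to the data, so we need to maintain a list of lengths for the values. Tags and attributes have no leading or trailing blanks, so they do not need one.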

In [8]:
lv←≢¨v

In [9]:
lv
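
A lengths list like `lv` is enough to recover the original values from the padded matrix, including any trailing blanks that are part of the data. The sketch below shows this on a small hand-written example; the names `vals`, `ivals`, `lens` and `trimmed` are made up so that they do not clash with the variables above.

```apl
⍝ Hypothetical values; the first one ends with a significant blank
vals←'abc ' 'de' ''
ivals←↑vals              ⍝ 3×4 character matrix, padded with blanks
lens←≢¨vals              ⍝ 4 2 0

⍝ Taking the recorded length from each row restores the originals exactly
vals≡lens↑¨↓ivals        ⍝ 1

⍝ Stripping trailing blanks instead loses the blank in 'abc '
trimmed←{(-+/∧\' '=⌽⍵)↓⍵}¨↓ivals
vals≡trimmed             ⍝ 0
```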