Nodes
An Infoboxer page tree consists of nodes. A node is a piece of the document
that either contains text (a Text node), contains other nodes (like Italic
with text inside, or Paragraph), or is empty (an HR node or a BR node).
The basic methods for each node are:
- Node#text -- plaintext representation of node contents
- Node#params -- for example, {level: 3} for a heading, or {class: 'wikitable'} for a table
- Node#children (for all compound nodes) and Node#parent
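To make the shape of that interface concrete, here is a minimal toy sketch in plain Ruby. It does not use Infoboxer itself: the classes ToyNode, ToyText and ToyCompound are hypothetical stand-ins invented for illustration, and only the method names (text, params, children, parent) mirror the ones listed above.

```ruby
# Toy illustration only: these classes are hypothetical
# and are NOT Infoboxer's implementation.
class ToyNode
  attr_reader :params
  attr_accessor :parent

  def initialize(**params)
    @params = params
  end

  # Leaf nodes have no children.
  def children
    []
  end

  # Empty nodes (think HR or BR) contribute no text.
  def text
    ''
  end
end

# A node that only carries a string.
class ToyText < ToyNode
  def initialize(raw, **params)
    super(**params)
    @raw = raw
  end

  def text
    @raw
  end
end

# A node that contains other nodes and links them back to itself.
class ToyCompound < ToyNode
  attr_reader :children

  def initialize(*children, **params)
    super(**params)
    @children = children
    @children.each { |child| child.parent = self }
  end

  # Text of a compound node is the joined text of its children.
  def text
    children.map(&:text).join
  end
end

heading = ToyCompound.new(
  ToyText.new('History of '),
  ToyCompound.new(ToyText.new('Argentina')),
  level: 3
)

puts heading.text
puts heading.params.inspect
puts heading.children.first.parent.equal?(heading)
```

The sketch only shows how text, params, children and parent can fit together in one small tree; real Infoboxer nodes carry more behavior, described in the API docs.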
Also, many nodes have convenience methods and additional attributes,
like Wikilink#link, Image#caption, or Template#name -- all of
them can be found in the API docs.
When Infoboxer returns a list of nodes, it is wrapped in the Nodes class, which is basically an Array with some additions, like:
- Nodes#text returns the joined text of all nodes
- Nodes#fetch('variable') fetches variables from all templates in the nodes list
- Nodes#... TODO
The idea is simple and already familiar from DOM tree navigators like Nokogiri or jQuery: in the most common cases you can work with a list of nodes the same way you work with a single node.
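The "list behaves like a single node" idea can be sketched in a few lines of plain Ruby. This is a toy under stated assumptions: ToyNodes and ToyLeaf are hypothetical names, and the sketch is not how Infoboxer's Nodes class is actually written.

```ruby
# Toy illustration only: ToyNodes and ToyLeaf are hypothetical stand-ins,
# not Infoboxer's Nodes class.
ToyLeaf = Struct.new(:text)

class ToyNodes < Array
  # Same method name as on a single node: joined text of every node in the list.
  def text
    map(&:text).join
  end

  # Keep filtered results wrapped, so node-like calls can still be chained.
  def select(&block)
    ToyNodes.new(super(&block))
  end
end

nodes = ToyNodes.new([ToyLeaf.new('Buenos '), ToyLeaf.new('Aires')])

puts nodes.text
puts nodes.select { |n| n.text.start_with?('A') }.text
```

Because the wrapper answers text just like a single node does, calling code does not need to care whether it holds one node or many.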
"Invisible" nodes: the idea of Node#text is to provide a "plain readable"
version of a page fragment, so some node types intentionally return empty
text. This applies to references (Ref nodes) and templates (the templates
matter is complicated, though):
para = Infoboxer::Parser.paragraphs('')
para.text
# But
para.lookup(Ref).text
Paragraph-level nodes return text ending with "\n\n". This way
paragraphs' texts can simply be .join-ed to obtain nicely rendered
paragraphs. But if you just want to output a TOC or something like it,
the extra "\n\n"-s can be irritating. For such cases there's a method with
the cumbersome name #text_ -- which is roughly a synonym for node.text.strip:
page = Infoboxer.wp.get('Argentina')
page.headings.each{|h| puts ' ' * h.level + h.text}
# Output:
# ...
# But
page.headings.each{|h| puts ' ' * h.level + h.text_}
# Output:
# ...
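The difference between the two methods can be reproduced without fetching a page. The following is a toy sketch in plain Ruby: ToyHeading is a hypothetical struct, and its text and text_ methods only imitate the behavior described above.

```ruby
# Toy illustration only: ToyHeading is hypothetical, not an Infoboxer class.
ToyHeading = Struct.new(:level, :title) do
  # Paragraph-level convention described above: text ends with "\n\n".
  def text
    "#{title}\n\n"
  end

  # Stripped variant, imitating what the section says about #text_.
  def text_
    text.strip
  end
end

headings = [ToyHeading.new(2, 'History'), ToyHeading.new(3, 'Colonial era')]

# Joining full texts yields ready-made paragraph breaks.
body = headings.map(&:text).join
puts body

# Stripped texts give a compact, indented table of contents.
toc = headings.map { |h| ' ' * h.level + h.text_ }.join("\n")
puts toc
```

With text, every heading drags a blank line behind it; with text_, the indentation and the line breaks are entirely under the caller's control.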
Tables are rendered (somewhat experimentally) with the [terminal-table] gem. This looks pretty good in the demo, but I'm not sure at all that this approach isn't overkill. Let's try it and decide.
puts Infoboxer.wp.get('Sri Lanka').tables.first.text
# Output:
# +----------------------------------------+--------------+------------+---------+-------------+
# | Administrative Divisions of Sri Lanka |
# +----------------------------------------+--------------+------------+---------+-------------+
# | Province | Capital | Area (km) | Area | Population |
# | | | | (sq mi) | |
# | Central | Kandy | 5,674 | | 2,556,774 |
# | Eastern | Trincomalee | 9,996 | | 1,547,377 |
# | North Central | Anuradhapura | 10,714 | | 1,259,421 |
# | Northern | Jaffna | 8,884 | | 1,060,023 |
# | North Western | Kurunegala | 7,812 | | 2,372,185 |
# | Sabaragamuwa | Ratnapura | 4,902 | | 1,919,478 |
# | Southern | Galle | 5,559 | | 2,465,626 |
# | Uva | Badulla | 8,488 | | 1,259,419 |
# | Western | Colombo | 3,709 | | 5,837,294 |
# +----------------------------------------+--------------+------------+---------+-------------+
Next: Tree navigation basics