zverok edited this page Jun 24, 2015 · 9 revisions

Infoboxer's page tree consists of nodes. A node is a piece of the document that either contains text (a Text node), contains other nodes (like Italic with text inside, or Paragraph), or is empty (an HR or BR node).

The basic methods for each node are:

  • Node#text -- plaintext representation of node contents
  • Node#params -- for example, {level: 3} for heading, or {class: 'wikitable'} for table
  • Node#children (for all compound nodes) and Node#parent
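To make the three basic methods more concrete, here is a minimal sketch of a toy node tree in plain Ruby. The classes ToyNode, ToyText and ToyHR are hypothetical and are not Infoboxer's implementation; they only mimic the text/params/children/parent shape described above.

```ruby
# Toy model of a node tree; NOT Infoboxer's actual classes.
class ToyNode
  attr_reader :params, :children
  attr_accessor :parent

  def initialize(children = [], **params)
    @params = params
    @children = children
    # Each child remembers which node it belongs to.
    @children.each { |child| child.parent = self }
  end

  # Compound nodes build their text from their children.
  def text
    children.map(&:text).join
  end
end

# Leaf node holding a raw string.
class ToyText < ToyNode
  def initialize(string)
    super([])
    @string = string
  end

  def text
    @string
  end
end

# Empty node: no children, so its text is an empty string.
class ToyHR < ToyNode
end

italic  = ToyNode.new([ToyText.new('Buenos Aires')])
heading = ToyNode.new([ToyText.new('Capital: '), italic], level: 3)

puts heading.text                   # plaintext of the whole subtree
puts heading.params[:level]         # params carry node-specific attributes
puts italic.parent.equal?(heading)  # parent points back up the tree
```

In the real library the same three calls are made on parsed nodes; which params a node carries depends on its type.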

Also, many nodes have convenience methods and additional attributes, like Wikilink#link, Image#caption, or Template#name -- all of them can be found in the API docs.

Nodes collection

When Infoboxer returns a list of nodes, it is wrapped in the Nodes class, which is basically an Array with some additions like:

  • Nodes#text returns joined text of all nodes
  • Nodes#fetch('variable') fetches variables from all templates in nodes list
  • Nodes#... TODO

The idea is simple and familiar from DOM tree navigators like Nokogiri or jQuery: in most common cases you can work with a list of nodes the same way you work with a single node.
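The "list behaves like a single node" idea can be sketched with an Array subclass. ToyItem and ToyNodes below are hypothetical stand-ins, assumed only for illustration; the real Nodes class is richer.

```ruby
# Toy stand-ins; NOT Infoboxer's actual classes.
ToyItem = Struct.new(:text)

class ToyNodes < Array
  # Joined text of every node in the list.
  def text
    map(&:text).join
  end
end

nodes = ToyNodes.new([ToyItem.new('Hello, '), ToyItem.new('world')])

puts nodes.text        # the collection answers #text ...
puts nodes.first.text  # ... just like a single node does
puts nodes.size        # and it is still an ordinary Array
```

Because the collection and its elements answer the same message, calling code does not need to care whether a lookup returned one node or many.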

Node text gotchas

"Invisible" nodes: the idea of Node#text is to provide a "plain readable" version of a page fragment, so some node types intentionally return empty text. This applies to references (Ref nodes) and templates (the matter of templates is complicated, though).

para = Infoboxer::Parser.paragraphs('')
para.text

# But
para.lookup(Ref).text

Paragraph-level nodes return text ending with "\n\n". This way the texts of several paragraphs can simply be joined to obtain nicely rendered paragraphs. But if you just want to output a TOC or something similar, the extra "\n\n"s can be irritating. For such cases there is a method with the cumbersome name #text_, which is roughly a synonym for node.text.strip.

page = Infoboxer.wp.get('Argentina')
page.headings.each{|h| puts ' ' * h.level + h.text}
# Output:

# ...

# But
page.headings.each{|h| puts ' ' * h.level + h.text_}
# Output:

# ...
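The effect of the trailing "\n\n" can be seen without the library at all. The sketch below uses plain strings as hypothetical stand-ins for what paragraph-level nodes return from #text, and String#strip in place of #text_ (following the "roughly node.text.strip" description above).

```ruby
# Hypothetical paragraph-level texts, each ending with "\n\n".
texts = ["History\n\n", "Geography\n\n"]

# Joining as-is keeps a blank line between blocks.
joined = texts.join

# Stripping first gives compact one-line entries, e.g. for a TOC.
stripped = texts.map(&:strip)
toc = stripped.map { |t| "  #{t}" }.join("\n")

puts joined
puts toc
```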

Tables are rendered (somewhat experimentally) with the terminal-table gem. This looks pretty good in the demo, but I'm not at all sure that this approach isn't overkill. Let's try and decide.

puts Infoboxer.wp.get('Sri Lanka').tables.first.text
# Output:
# +----------------------------------------+--------------+------------+---------+-------------+
# |  Administrative Divisions of Sri Lanka |
# +----------------------------------------+--------------+------------+---------+-------------+
# | Province                               |  Capital     |  Area (km) |  Area   |  Population |
# |                                        |              |            | (sq mi) |             |
# | Central                                | Kandy        |  5,674     |         |  2,556,774  |
# | Eastern                                |  Trincomalee |  9,996     |         |  1,547,377  |
# | North Central                          | Anuradhapura |  10,714    |         |  1,259,421  |
# | Northern                               |  Jaffna      |  8,884     |         |  1,060,023  |
# | North Western                          | Kurunegala   |  7,812     |         |  2,372,185  |
# | Sabaragamuwa                           |  Ratnapura   |  4,902     |         |  1,919,478  |
# | Southern                               | Galle        |  5,559     |         |  2,465,626  |
# | Uva                                    |  Badulla     |  8,488     |         |  1,259,419  |
# | Western                                | Colombo      |  3,709     |         |  5,837,294  |
# +----------------------------------------+--------------+------------+---------+-------------+

Next: Tree navigation basics