Tree navigation basics

zverok edited this page Aug 7, 2015 · 8 revisions

Exploring the data

The tree's content model aims to be straightforward, shallow and easy to understand:

  • on the first level there are paragraphs, headings, lists and tables;
  • inside them is inline markup: bold and italic text, links, images, templates and allowed HTML tags;
  • each tree node type has its own class with an obvious name: Infoboxer::Tree::Paragraph, Infoboxer::Tree::Heading, Infoboxer::Tree::UnorderedList and so on.

Tree navigation is done like this:

include Infoboxer::Tree

# Node#lookup
page.lookup(Wikilink) # all wikilinks on page
page.lookup(Heading, level: 3) # all headings of level 3 only
page.lookup(Wikilink) { |l| l.text.include?('federation') } # only wikilinks whose text includes "federation"

# if you don't want to include Infoboxer::Tree, class-name symbols
# work as well:
page.lookup(:Wikilink) # all wikilinks on page

# Node#lookup_children
page.lookup(:Paragraph).first.lookup_children(:Italic)
# => only italics which are direct children of the para (doesn't return
#    italics inside links, for example)

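# Node#lookup_parents
page.lookup(:ListItem).first.lookup_parents(:UnorderedList)
# => unordered lists that enclose the first list item

# Node#lookup_siblings
page.lookup(:ListItem).first.lookup_siblings(index: 4)
# => siblings of the first list item whose index is 4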
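
To make the difference between lookup and lookup_children concrete, here is a minimal sketch on a toy tree. SketchNode is a hypothetical stand-in written for this illustration, not Infoboxer's real node class, and it matches on a node kind only; the real methods accept the full selector list described further down.

```ruby
# Hypothetical stand-in for a tree node; not Infoboxer's real class.
class SketchNode
  attr_reader :kind, :children

  def initialize(kind, *children)
    @kind = kind
    @children = children
  end

  # Direct children of the given kind only.
  def lookup_children(kind)
    children.select { |child| child.kind == kind }
  end

  # All descendants of the given kind, depth-first, in document order.
  def lookup(kind)
    children.flat_map do |child|
      (child.kind == kind ? [child] : []) + child.lookup(kind)
    end
  end
end

# A paragraph with one italic of its own and one italic nested in a link.
para = SketchNode.new(
  :Paragraph,
  SketchNode.new(:Italic),
  SketchNode.new(:Wikilink, SketchNode.new(:Italic))
)

para.lookup_children(:Italic).size # => 1 (the italic inside the link is skipped)
para.lookup(:Italic).size          # => 2 (nested italic is found too)
```

The sketch only searches below the starting node; how the real lookup treats the starting node itself is best checked in the API docs.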

Each lookup returns a Nodes collection, which has the same lookup methods, so you can chain lookups like this:

page.lookup(UnorderedList).lookup_children(text: /Argentinian/)
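
One way to picture why chaining works: the collection forwards the lookup to each of its members and gathers the results into a new collection of the same type. The sketch below assumes exactly that; ChainNode and ChainNodes are hypothetical names invented for this illustration, and the real Nodes class may be implemented differently.

```ruby
# Hypothetical node: a kind, some text, and child nodes.
ChainNode = Struct.new(:kind, :text, :children) do
  # Direct children whose text matches the pattern (compared with ===).
  def lookup_children(pattern)
    ChainNodes.new(children.select { |child| pattern === child.text })
  end
end

# Hypothetical collection: an Array that forwards the lookup to every
# member and flattens the per-member results into one collection.
class ChainNodes < Array
  def lookup_children(pattern)
    ChainNodes.new(flat_map { |node| node.lookup_children(pattern) })
  end
end

lists = ChainNodes.new([
  ChainNode.new(:UnorderedList, '', [
    ChainNode.new(:ListItem, 'Argentinian tango', []),
    ChainNode.new(:ListItem, 'Viennese waltz', [])
  ]),
  ChainNode.new(:UnorderedList, '', [
    ChainNode.new(:ListItem, 'Argentinian peso', [])
  ])
])

found = lists.lookup_children(/Argentinian/)
found.map(&:text) # => ["Argentinian tango", "Argentinian peso"]
found.class       # => ChainNodes, so another lookup can follow
```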

The arguments passed to any lookup_* method are a list of selectors, which can contain these values:

  • Node class (like ListItem) or class-name symbol (like :ListItem);
  • Symbol (like :empty?) -- the node is checked for having this method and returning a truthy value from it;
  • Hash of "symbol => pattern" values, where the symbol is any node getter and the pattern is the value to check against (checks are performed with ===, so you can do things like text: /something/);
  • block, which receives the node and returns true or false.
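
The selector rules above can be sketched in a few lines of plain Ruby. This is an illustration of the described behaviour under stated assumptions, not Infoboxer's actual code: SketchItem, sketch_selector_match? and sketch_match? are hypothetical names, and telling class-name symbols from method symbols by a leading capital letter is an assumption made for this sketch.

```ruby
# Hypothetical node with two getters; not Infoboxer's real class.
SketchItem = Struct.new(:text, :index) do
  def empty?
    text.empty?
  end
end

# Checks a single selector against a node.
def sketch_selector_match?(node, sel)
  case sel
  when Class
    node.is_a?(sel)
  when Symbol
    if sel.to_s.match?(/\A[A-Z]/)
      # Class-name symbol, e.g. :ListItem (assumed: compare the bare class name).
      node.class.name.to_s.split('::').last == sel.to_s
    else
      # Method symbol, e.g. :empty? (node must respond and return truthy).
      node.respond_to?(sel) && !!node.public_send(sel)
    end
  when Hash
    # Every "getter => pattern" pair is compared with ===.
    sel.all? { |getter, pattern| pattern === node.public_send(getter) }
  else
    false
  end
end

# True when the node satisfies every selector and the block, if one is given.
def sketch_match?(node, *selectors, &block)
  selectors_ok = selectors.all? { |sel| sketch_selector_match?(node, sel) }
  selectors_ok && (block.nil? || !!block.call(node))
end

item = SketchItem.new('Argentinian tango', 4)

sketch_match?(item, SketchItem, text: /Argentinian/)  # => true
sketch_match?(item, :SketchItem)                      # => true
sketch_match?(item, :empty?)                          # => false
sketch_match?(item, index: 1..3)                      # => false (4 is outside the range)
sketch_match?(item) { |n| n.text.include?('tango') }  # => true
```

Because the hash patterns are compared with ===, the same selector slot accepts a literal value, a Regexp, a Range or a class without any special cases.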

It is not as powerful as XPath, yet it is straightforward and flexible (and it is pure Ruby).

See also API docs.

Surprisingly, that's enough power to get virtually everything Wikipedia can provide. Yet there's more!

Next: Navigation shortcuts