
Navigation shortcuts

zverok edited this page on Aug 7, 2015

So, you have already fetched a page from Wikipedia, inspected it, and navigated its data structure. All in all, that's enough to extract information of any kind. But Infoboxer can make it smoother.

(JFYI: the API docs for the Navigation::Shortcuts module list all of these in a more orderly manner.)

Shortcuts for getting lists of nodes of a given type:

```ruby
require 'infoboxer'

page = Infoboxer.wp.get('Argentina')

# Get all paragraphs on a page
page.paragraphs
# => list of all paragraph-level nodes on the page

# And other basic node kinds:
page.wikilinks
page.external_links
page.images
page.tables
page.templates
page.lists
page.headings

# Refine your query, e.g. only level-3 headings:
page.headings(level: 3)

# Templates can be selected by name (String)...
page.templates('see')
# ...or even by pattern (Regexp):
page.templates(/^Infobox/)

# Filtering wikilinks by namespace:
page.wikilinks
# => only wikilinks in the default (main) namespace

page.wikilinks('Category')
# => wikilinks in the 'Category' namespace

page.wikilinks(nil)
# => wikilinks in all namespaces

# All of the methods above work not only for the entire page, but for any
# node on it, like:
page.tables.first.images
```
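Fetching a real page needs the gem and network access, so here is a self-contained toy model in plain Ruby of what these lookups amount to. It is a sketch under stated assumptions, not Infoboxer's implementation: the `Node`, `Heading`, `Template` and `Wikilink` classes and the `lookup` helper are hypothetical, exact-string matching of template names is this sketch's own choice, and using `''` for the default namespace is likewise an assumption.

```ruby
# Toy model, NOT Infoboxer's implementation: each shortcut just
# collects every descendant node of one class, optionally filtered.
class Node
  attr_reader :children

  def initialize(*children)
    @children = children
  end

  # Depth-first list of all descendant nodes (excluding self).
  def descendants
    children.flat_map { |c| [c, *c.descendants] }
  end

  # Generic lookup: descendants of +klass+ that pass the optional block.
  def lookup(klass)
    descendants.select { |n| n.is_a?(klass) && (!block_given? || yield(n)) }
  end

  def headings(level: nil)
    lookup(Heading) { |h| level.nil? || h.level == level }
  end

  # `pattern === name` is equality for a String and a match for a Regexp.
  def templates(pattern = nil)
    lookup(Template) { |t| pattern.nil? || pattern === t.name }
  end

  # Assumption: '' stands for the default (main) namespace;
  # an explicit nil means "any namespace".
  def wikilinks(namespace = '')
    lookup(Wikilink) { |l| namespace.nil? || l.namespace == namespace }
  end
end

class Heading < Node
  attr_reader :level

  def initialize(level, *children)
    @level = level
    super(*children)
  end
end

class Template < Node
  attr_reader :name

  def initialize(name, *children)
    @name = name
    super(*children)
  end
end

class Wikilink < Node
  attr_reader :target, :namespace

  def initialize(target, namespace: '')
    @target = target
    @namespace = namespace
    super()
  end
end

# A made-up miniature page, not real Wikipedia content.
page = Node.new(
  Heading.new(2, Wikilink.new('Buenos Aires')),
  Heading.new(3),
  Template.new('See also'),
  Template.new('Infobox country', Wikilink.new('Spanish language')),
  Wikilink.new('Argentina', namespace: 'Category')
)

page.headings.size                        # => 2
page.headings(level: 3).size              # => 1
page.templates(/^Infobox/).map(&:name)    # => ["Infobox country"]
page.wikilinks.map(&:target)              # => ["Buenos Aires", "Spanish language"]
page.wikilinks('Category').map(&:target)  # => ["Argentina"]
page.wikilinks(nil).size                  # => 3
# Shortcuts work on any node, e.g. only links inside the infobox template:
page.templates(/^Infobox/).first.wikilinks.map(&:target)  # => ["Spanish language"]
```

In this model every shortcut is a thin wrapper over one generic helper, and relying on Ruby's `===` lets a single `templates` method accept either a String or a Regexp.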

Shortcuts for examining a node's style:

```ruby
node = page.wikilinks.first
node.bold?        # is this link INSIDE bold text?
node.italic?
node.heading?
node.heading?(3)  # is it inside a level-3 heading?

# (Slightly) more useful example:
Infoboxer.wp.get('Einstein (disambiguation)').
  wikilinks.select(&:bold?)
# => only bold disambiguation links,
#    which typically mark the most common use(s) of this word
```
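The style predicates can be pictured as a walk up from a node through its parents. Below is a self-contained toy model in plain Ruby (not Infoboxer's real code, and runnable without the gem or network access); the classes `Node`, `Bold`, `Italic`, `Heading` and `Wikilink` and the helpers `ancestors`/`descendants` are hypothetical stand-ins.

```ruby
# Toy model, NOT Infoboxer's implementation: every node remembers its
# parent, so style predicates just walk up the ancestor chain.
class Node
  attr_reader :children
  attr_accessor :parent

  def initialize(*children)
    @children = children
    @parent = nil
    children.each { |c| c.parent = self }
  end

  # Parent, grandparent, ... up to the root.
  def ancestors
    parent ? [parent, *parent.ancestors] : []
  end

  def descendants
    children.flat_map { |c| [c, *c.descendants] }
  end

  def wikilinks
    descendants.grep(Wikilink)
  end

  def bold?
    ancestors.any? { |a| a.is_a?(Bold) }
  end

  def italic?
    ancestors.any? { |a| a.is_a?(Italic) }
  end

  # No argument: inside any heading; with a level: inside that level only.
  def heading?(level = nil)
    ancestors.any? { |a| a.is_a?(Heading) && (level.nil? || a.level == level) }
  end
end

class Bold < Node; end
class Italic < Node; end

class Heading < Node
  attr_reader :level

  def initialize(level, *children)
    @level = level
    super(*children)
  end
end

class Wikilink < Node
  attr_reader :target

  def initialize(target)
    @target = target
    super()
  end
end

# A made-up miniature disambiguation-style page.
page = Node.new(
  Bold.new(Wikilink.new('Main meaning')),
  Wikilink.new('Other meaning'),
  Heading.new(3, Italic.new(Wikilink.new('Link in a heading')))
)

links = page.wikilinks
links.select(&:bold?).map(&:target)  # => ["Main meaning"]
links.last.heading?                  # => true
links.last.heading?(3)               # => true
links.last.heading?(2)               # => false
links.last.italic?                   # => true
```

The `select(&:bold?)` call is ordinary Ruby: `&:bold?` turns the symbol into a block that calls `bold?` on each element, which is why the disambiguation one-liner reads so compactly.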

See the API docs for the full list of available methods.

Any more ideas? Drop me a line! (Or send a pull request ;)

Next: Navigating by sections