zverok edited this page Jun 23, 2015 · 8 revisions

Retrieving Wikipedia pages with infoboxer

From Wikipedia

# One page
Infoboxer.wikipedia.get('Argentina')
# also aliased as Infoboxer.wp

# Several pages (in one API request)
Infoboxer.wikipedia.get('Argentina', 'Bolivia', 'Chile')

# From non-English Wikipedia
Infoboxer.wikipedia('fr').get('Argentine')
# or, if it looks cleaner to you
Infoboxer.wikipedia(:fr).get('Argentine')

From sister projects

Wikimedia sister projects are all the publicly available wikis operated by the Wikimedia Foundation, including Wikipedia.

Infoboxer.wiktionary.get('test')
Infoboxer.wikiquote.get('Vonnegut')
Infoboxer.commons.get('Category:Kittens')
Infoboxer.wikivoyage.get('Chiang Mai')

From Wikia wikis

Wikia hosts a lot of interesting wikis, all published under copyleft licenses and well worth studying (the largest and most complete of them are created by fans of books, TV shows, and games). So Infoboxer provides a shortcut for these, too:

# Default language
Infoboxer.wikia('tardis').get('Eleventh Doctor')

# Other language:
Infoboxer.wikia('tardis', :fr).get('Onzième Docteur')

From any MediaWiki installation

As simple as that:

Infoboxer.wiki('http://mydomain.com').get('My Product')

Note: this assumes api.php is installed in the usual place, at /w/api.php. If it is not, use the slightly more verbose version with the full API URL:

Infoboxer.wiki('http://mydomain.com/myapipath/api.php').get('My Product')
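
To make the /w/api.php convention concrete, here is a minimal sketch of how a site URL could be turned into an API URL. The `guess_api_url` helper and its `default_path` parameter are hypothetical names invented for this illustration; they are not part of Infoboxer and may not match what it does internally.

```ruby
require 'uri'

# Hypothetical helper, NOT part of Infoboxer: guesses the api.php URL
# for a MediaWiki site, assuming the conventional /w/api.php location.
def guess_api_url(url, default_path: '/w/api.php')
  uri = URI.parse(url)

  # A URL that already points at api.php is returned unchanged.
  return uri.to_s if uri.path.end_with?('api.php')

  # Otherwise replace the path with the conventional one.
  uri.path = default_path
  uri.to_s
end

puts guess_api_url('http://mydomain.com')
puts guess_api_url('http://mydomain.com/myapipath/api.php')
```

If your wiki serves its API from an unusual path, passing the full URL explicitly (as in the second call) avoids any guessing.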

Setting User-Agent header

(You should do this before any significant amount of data extraction, per the Wikipedia terms and conditions):

UA = 'MyCoolTool/1.1 (http://example.com/MyCoolTool/; MyCoolTool@example.com)'

# All requests to all wikis will be sent with your User-Agent:
Infoboxer.user_agent = UA

# or, alternatively, just for one target site:
client = Infoboxer.wikipedia(user_agent: UA)
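
The User-Agent string above follows a "Name/Version (URL; email)" pattern. If you prefer to assemble it from parts, a small sketch could look like the following. The `build_user_agent` helper and its keyword arguments are hypothetical, made up for this example; Infoboxer itself only needs the final string.

```ruby
# Hypothetical helper, NOT part of Infoboxer: formats a User-Agent
# string as "Name/Version (URL; email)", matching the example above.
def build_user_agent(name:, version:, url:, email:)
  "#{name}/#{version} (#{url}; #{email})"
end

my_ua = build_user_agent(
  name: 'MyCoolTool',
  version: '1.1',
  url: 'http://example.com/MyCoolTool/',
  email: 'MyCoolTool@example.com'
)

puts my_ua
```

The resulting string can then be assigned to Infoboxer.user_agent or passed as the user_agent: option, exactly as with the hand-written constant.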

Next: Extracting data