zverok edited this page Dec 11, 2015 · 8 revisions

Retrieving Wikipedia pages with infoboxer

From Wikipedia

# One page
Infoboxer.wikipedia.get('Argentina')
# also aliased as Infoboxer.wp

# Several pages (in one API request)
Infoboxer.wikipedia.get('Argentina', 'Bolivia', 'Chile')

# From non-English Wikipedia
Infoboxer.wikipedia('fr').get('Argentine')
# or, if it looks cleaner to you
Infoboxer.wikipedia(:fr).get('Argentine')
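
To make "several pages in one API request" more concrete, here is a minimal sketch of the kind of MediaWiki API URL such a call could translate to. The helper name `wikipedia_query_url` is hypothetical and is not part of Infoboxer; the exact parameters Infoboxer sends are an assumption. It only illustrates that the MediaWiki API accepts pipe-separated titles and that each language edition lives on its own subdomain.

```ruby
require 'uri'

# Hypothetical helper (not part of Infoboxer): builds a MediaWiki API URL
# that asks for the wikitext of several pages in a single request.
def wikipedia_query_url(titles, lang: 'en')
  params = {
    action: 'query',
    prop: 'revisions',
    rvprop: 'content',
    format: 'json',
    titles: titles.join('|') # the API accepts pipe-separated titles
  }
  # Each language edition has its own subdomain, e.g. fr.wikipedia.org.
  "https://#{lang}.wikipedia.org/w/api.php?#{URI.encode_www_form(params)}"
end

puts wikipedia_query_url(%w[Argentina Bolivia Chile])
puts wikipedia_query_url(['Argentine'], lang: :fr)
```

The language is accepted as either a string or a symbol, mirroring the two call styles shown above.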

From sister projects

Wikimedia sister projects are all the publicly available wikis operated by the Wikimedia Foundation, including Wikipedia.

Infoboxer.wiktionary.get('test')
Infoboxer.wikiquote.get('Vonnegut')
Infoboxer.commons.get('Category:Kittens')
Infoboxer.wikivoyage.get('Chiang Mai')

From Wikia wikis

Wikia hosts many interesting wikis, all published under copyleft licenses and well worth studying (the largest and most complete of them were created by fans of books, TV shows, and games). Infoboxer provides a shortcut for them, too:

# Default language
Infoboxer.wikia('tardis').get('Eleventh Doctor')

# Other language:
Infoboxer.wikia('tardis', :fr).get('Onzième Docteur')

From any MediaWiki installation

As simple as that:

Infoboxer.wiki('http://mydomain.com').get('My Product')

Note: this assumes api.php is at the usual location, /w/api.php. If it is not, use the slightly more verbose form with the full API URL:

Infoboxer.wiki('http://mydomain.com/myapipath/api.php').get('My Product')
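
The rule in the note above can be sketched as a small function. This is an illustration only, under the assumption that a URL ending in api.php is used as-is and anything else gets the conventional /w/api.php path; `resolve_api_url` is a hypothetical name and this is not Infoboxer's actual implementation.

```ruby
require 'uri'

# Hypothetical helper, not Infoboxer's actual code: guesses the API endpoint
# of a MediaWiki site from whatever URL the caller supplies.
def resolve_api_url(url)
  uri = URI.parse(url)
  # A path that already ends in api.php is taken as the full endpoint.
  return uri.to_s if uri.path.end_with?('api.php')

  # Otherwise assume the conventional /w/api.php location.
  uri.path = '/w/api.php'
  uri.to_s
end

puts resolve_api_url('http://mydomain.com')
puts resolve_api_url('http://mydomain.com/myapipath/api.php')
```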

(New in 0.2.0!) Page lists

The MediaWiki API has many "page list generators", but Infoboxer currently supports only a few of them. They are quite useful, though:

Infoboxer.wp.category('Countries in South America')
# => list of pages from category

Infoboxer.wp.search('intitle:"List of tramway systems"')
# => list of pages matching the search query

Infoboxer.wp.prefixsearch('List of tramway systems')
# => list of pages with titles starting with the query
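
For readers unfamiliar with MediaWiki generators, the sketch below shows the query parameters the three corresponding API generators take (`categorymembers`, `search`, `prefixsearch`). The mapping from Infoboxer's methods to these generators, the `GENERATORS` table, and the `list_query` helper are all assumptions made for illustration, not Infoboxer's code.

```ruby
require 'uri'

# Query parameters for three MediaWiki list generators. The mapping from
# Infoboxer methods to these generators is an assumption for illustration.
GENERATORS = {
  category:     ->(q) { { generator: 'categorymembers', gcmtitle: "Category:#{q}" } },
  search:       ->(q) { { generator: 'search', gsrsearch: q } },
  prefixsearch: ->(q) { { generator: 'prefixsearch', gpssearch: q } }
}.freeze

# Hypothetical helper: builds the query string for one generator request,
# asking for the wikitext of every page the generator yields.
def list_query(kind, query)
  base = { action: 'query', format: 'json', prop: 'revisions', rvprop: 'content' }
  URI.encode_www_form(base.merge(GENERATORS.fetch(kind).call(query)))
end

puts list_query(:category, 'Countries in South America')
puts list_query(:prefixsearch, 'List of tramway systems')
```

A generator feeds its resulting page list into the rest of the query, which is why page content can come back in the same request as the list itself.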

Setting User-Agent header

Set it before extracting any significant amount of data, as the Wikimedia User-Agent policy requires:

UA = 'MyCoolTool/1.1 (http://example.com/MyCoolTool/; MyCoolTool@example.com)'

# All requests to all wikis will be with your User-Agent:
Infoboxer.user_agent = UA

# or, alternatively, just for one target site:
client = Infoboxer.wikipedia(user_agent: UA)
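
The User-Agent string in the example above follows a "name/version (URL; contact email)" shape. If you build it from configuration, a small helper like the one below keeps the format consistent; `build_user_agent` is a hypothetical name and not something Infoboxer provides.

```ruby
# Hypothetical helper: assembles a User-Agent string in the
# "name/version (url; email)" shape used in the example above.
def build_user_agent(name:, version:, url:, email:)
  "#{name}/#{version} (#{url}; #{email})"
end

ua = build_user_agent(
  name: 'MyCoolTool',
  version: '1.1',
  url: 'http://example.com/MyCoolTool/',
  email: 'MyCoolTool@example.com'
)
puts ua
```

The resulting string can then be assigned to Infoboxer.user_agent or passed as the user_agent: option, as shown above.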

Next: Extracting data