Wikipedia terms and conditions

zverok edited this page Jun 24, 2015 · 4 revisions

When using Infoboxer for massive data extraction from Wikipedia, keep the following in mind:

  • Before using the data, check Wikipedia's license. Wikipedia provides an explanation of how to properly reuse its content.

  • There are no official API request limits, but the documentation explicitly states:

    If you make your requests in series rather than in parallel (i.e. wait for the one request to finish before sending a new request, such that you're never making more than one request at the same time), then you should definitely be fine.

  • The official documentation explicitly requires you to specify a User-Agent header. Infoboxer provides a default header, but the docs say:

    Don't use the default User-Agent provided by your client library, but make up a custom header that identifies your script or service and provides some type of means of contacting you (e.g., an e-mail address).

With Infoboxer, you do the latter like this:

UA = 'MyCoolTool/1.1 (http://example.com/MyCoolTool/; MyCoolTool@example.com)'

# All requests to all wikis will be sent with your User-Agent:
Infoboxer.user_agent = UA

# or, alternatively, just for one target site:
client = Infoboxer.wikipedia(user_agent: UA)
client.get('Argentina')
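
The advice about making requests in series rather than in parallel can be sketched roughly as follows. This is only an illustration: `fetch_in_series` is a hypothetical helper, not part of Infoboxer, and the optional `pause` between requests is an assumption of this sketch rather than something the documentation asks for. The helper takes the actual fetching as a block, so it never starts a new request before the previous one has returned.

```ruby
# Hypothetical helper (not part of Infoboxer): runs the given block for each
# title strictly one after another, so at most one request is in flight.
def fetch_in_series(titles, pause: 0)
  titles.each_with_index.map do |title, index|
    # Optional courtesy delay between requests (an assumption of this sketch,
    # not a documented requirement); skipped before the first request.
    sleep(pause) if pause.positive? && index.positive?

    # The block is expected to perform one blocking request and return its result.
    yield title
  end
end

# Possible usage with the client from the example above:
#   pages = fetch_in_series(%w[Argentina Chile], pause: 1) { |title| client.get(title) }
```

Passing the fetch as a block keeps the helper independent of any particular client, so the same loop works whether the User-Agent was set globally or per client.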