Add Wikidata id as id where available #1

Open: 4 commits to merge into base: master

tmtmtmtm
Contributor

If we can find a Wikidata ID for the linked page, set it as the ID. This
will protect us slightly from Wikipedia-level renamings.

We're going to look up Wikidata IDs for each of the links on the page.
For every link to a suitable page in the same wiki (i.e. not a Special page, or one in a different namespace),
look up its Wikidata ID and attach it as an attribute of the link, so that the scraper can extract it again.

wikidata-fetcher neatly encapsulates that lookup for us already, including
performing the lookups in suitably sized batches.
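
As a rough illustration of the approach described above (not the code in this PR), the filtering and batching could be sketched as follows. Everything here is an assumption made for the example: the helper names `article_title` and `wikidata_ids_for`, the `/wiki/` href prefix, the batch size of 50, and the block that stands in for the real lookup. It does not use or imitate the wikidata-fetcher API.

```ruby
require 'uri'

# Hypothetical helper: returns the article title for a same-wiki link,
# or nil for anything we should skip (external links, Special: pages,
# other namespaces). Treating any ':' as a namespace separator is a crude
# heuristic and will also skip real articles with a colon in the title.
def article_title(href)
  return nil unless href&.start_with?('/wiki/')

  raw = href.sub('/wiki/', '').split('#').first.to_s
  title = URI.decode_www_form_component(raw)
  return nil if title.empty? || title.include?(':')

  title.tr('_', ' ')
end

# Hypothetical helper: collects the unique titles and hands them to the
# supplied block in batches. The block stands in for the real lookup and
# is assumed to return a Hash of { title => wikidata_id }.
def wikidata_ids_for(hrefs, batch_size: 50, &lookup)
  titles = hrefs.map { |href| article_title(href) }.compact.uniq
  titles.each_slice(batch_size).reduce({}) do |found, batch|
    found.merge(lookup.call(batch))
  end
end
```

The resulting hash can then be used to attach an ID to each link whose title has an entry, leaving links without a match untouched.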
Collaborator

@chrismytton chrismytton left a comment

Code looks good! Just a couple of queries about the dependencies.

@@ -21,3 +21,5 @@ gem 'scraperwiki', github: 'openaustralia/scraperwiki-ruby',
 gem 'table_unspanner', github: 'everypolitician/table_unspanner'
 gem 'vcr'
 gem 'webmock'
+gem 'wikidata-fetcher', '>= 0.19.1', github: 'everypolitician/wikidata-fetcher'
+gem 'wikisnakker', github: 'everypolitician/wikisnakker'
Collaborator

It doesn't look like wikisnakker is being used at all here. Is it actually needed?

@@ -21,3 +21,5 @@ gem 'scraperwiki', github: 'openaustralia/scraperwiki-ruby',
 gem 'table_unspanner', github: 'everypolitician/table_unspanner'
 gem 'vcr'
 gem 'webmock'
+gem 'wikidata-fetcher', '>= 0.19.1', github: 'everypolitician/wikidata-fetcher'
Collaborator

From what I can tell we're only using the wikidata-client dependency from wikidata-fetcher, so perhaps it would be better to have a dependency on that instead?
