Add Wikidata id as id where available #1
We're going to look up Wikidata IDs for each of the links on the page.
For every link to a suitable page in the same wiki (i.e. not a Special page, or one in a different namespace), look up its Wikidata ID and attach it as an attribute of the link, so that the scraper can extract it again. wikidata-fetcher already neatly encapsulates that lookup for us, including batching the requests at a suitable size.
If we can find a Wikidata ID for the linked page, set it as the ID. This will protect us slightly from Wikipedia-level renamings.
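
As a rough illustration of the lookup described above (not the code in this PR), the sketch below filters hrefs down to same-wiki article links, batches the titles, and maps each href to whatever ID comes back. Everything named here is hypothetical: `article_title`, `wikidata_ids_for`, the batch size of 50, and the injected `lookup` callable, which stands in for the real wikidata-fetcher call. The colon check is a crude namespace filter and would also skip genuine article titles that contain a colon.

```ruby
# Assumed batch size; the commit message only says "suitably sized batches".
BATCH_SIZE = 50

# Hypothetical helper: returns the page title for a same-wiki article link,
# or nil for external links and namespaced pages (Special:, File:, ...).
def article_title(href)
  return nil unless href.start_with?('/wiki/')

  title = href.sub('/wiki/', '').split('#').first.to_s
  return nil if title.empty? || title.include?(':')

  title.tr('_', ' ')
end

# Hypothetical helper: maps each suitable href to a Wikidata ID.
# `lookup` is any callable taking an array of titles and returning a
# { title => id } hash; it is a stand-in for the real lookup library.
def wikidata_ids_for(hrefs, lookup:, batch_size: BATCH_SIZE)
  titles = hrefs.map { |href| article_title(href) }.compact.uniq
  ids = titles.each_slice(batch_size)
              .map { |batch| lookup.call(batch) }
              .reduce({}, :merge)

  hrefs.each_with_object({}) do |href, found|
    id = ids[article_title(href)]
    found[href] = id if id
  end
end

# Usage with a fake lookup and a made-up ID, so no network access is needed.
fake_lookup = ->(titles) { { 'Jane Doe' => 'Q1' }.select { |t, _| titles.include?(t) } }
hrefs = ['/wiki/Jane_Doe', '/wiki/Special:Search', 'https://example.org/']
wikidata_ids_for(hrefs, lookup: fake_lookup)
```

Links with no match are simply left out of the result, so the caller can keep whatever ID it would otherwise have used for them.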
Code looks good! Just a couple of queries about the dependencies.
@@ -21,3 +21,5 @@ gem 'scraperwiki', github: 'openaustralia/scraperwiki-ruby',
gem 'table_unspanner', github: 'everypolitician/table_unspanner'
gem 'vcr'
gem 'webmock'
gem 'wikidata-fetcher', '>= 0.19.1', github: 'everypolitician/wikidata-fetcher'
gem 'wikisnakker', github: 'everypolitician/wikisnakker'
It doesn't look like wikisnakker is being used at all here. Is it actually needed?
@@ -21,3 +21,5 @@ gem 'scraperwiki', github: 'openaustralia/scraperwiki-ruby',
gem 'table_unspanner', github: 'everypolitician/table_unspanner'
gem 'vcr'
gem 'webmock'
gem 'wikidata-fetcher', '>= 0.19.1', github: 'everypolitician/wikidata-fetcher'
From what I can tell, we're only using the wikidata-client dependency from wikidata-fetcher, so perhaps it would be better to depend on that directly instead?