-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add more data on governors from Pombola #1
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions more for my curiosity aside this looks good.
scraper.rb
Outdated
popolo = JSON.parse(open(url, &:read)) | ||
pombola_id_to_id = ID_TO_POMBOLA_ID.collect do |t| | ||
["core_person:#{t[1]}", t[0]] | ||
end.to_h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not completely clear why you don't just store the data in this format in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, given map and collect are the same thing, at least AIUI, why collect here and map later in the block? (this may be my ignorance of ruby idiom showing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, I don't think I had any good reason for doing that. I've done as you suggest and made a more helpful mapping in the first place :)
|
||
field :email do | ||
pombola_contact_details_of_type('email') | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I was having a look at this there was the odd thing in email that wasn't actually a valid email e.g. 'foo'. I wonder if some basic sanity checking here might be in order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure - I've added a commit which does some stricter checking of the various contact details (and which makes some attempt to make sure URLs have a protocol, etc.)
We can get much richer data on these governors by matching some of them with their entries in Pombola. (There are two who I can't find in Pombola, though.) This commit introduces a hardcoded mapping between these governors and their IDs in Pombola, and the following fields based on Pombola data: email phone twitter facebook_url birth_date gender identifier__shineyoureye image_url website_url other_names
dacb9f8
to
06e404b
Compare
We can get much richer data on these governors by matching some of
them with their entries in Pombola. (There are two who I can't find in
Pombola, though.)
This commit introduces a hardcoded mapping between these governors and
their IDs in Pombola, and the following fields based on Pombola data:
email
phone
twitter
facebook_url
birth_date
gender
identifier__shineyoureye
image_url
website_url
other_names