Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more data on governors from Pombola #1

Merged
merged 1 commit into from
Feb 27, 2017
Merged

Add more data on governors from Pombola #1

merged 1 commit into from
Feb 27, 2017

Conversation

mhl
Copy link
Contributor

@mhl mhl commented Feb 23, 2017

We can get much richer data on these governors by matching some of
them with their entries in Pombola. (There are two who I can't find in
Pombola, though.)

This commit introduces a hardcoded mapping between these governors and
their IDs in Pombola, and the following fields based on Pombola data:

email
phone
twitter
facebook_url
birth_date
gender
identifier__shineyoureye
image_url
website_url
other_names

Copy link
Member

@struan struan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions more for my curiosity aside this looks good.

scraper.rb Outdated
popolo = JSON.parse(open(url, &:read))
pombola_id_to_id = ID_TO_POMBOLA_ID.collect do |t|
["core_person:#{t[1]}", t[0]]
end.to_h
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not completely clear why you don't just store the data in this format in the first place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, given map and collect are the same thing, at least AIUI, why collect here and map later in the block? (this may be my ignorance of ruby idiom showing)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, I don't think I had any good reason for doing that. I've done as you suggest and made a more helpful mapping in the first place :)


field :email do
pombola_contact_details_of_type('email')
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I was having a look at this there was the odd thing in email that wasn't actually a valid email e.g. 'foo'. I wonder if some basic sanity checking here might be in order?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure - I've added a commit which does some stricter checking of the various contact details (and which makes some attempt to make sure URLs have a protocol, etc.)

We can get much richer data on these governors by matching some of
them with their entries in Pombola. (There are two who I can't find in
Pombola, though.)

This commit introduces a hardcoded mapping between these governors and
their IDs in Pombola, and the following fields based on Pombola data:

  email
  phone
  twitter
  facebook_url
  birth_date
  gender
  identifier__shineyoureye
  image_url
  website_url
  other_names
@mhl mhl merged commit 06e404b into master Feb 27, 2017
@octopusinvitro octopusinvitro deleted the add-pombola-data branch March 6, 2017 18:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants