Scrape the new 6th Legislative Council pages instead #1

Merged
merged 1 commit into from Oct 26, 2016

@mhl mhl commented Oct 24, 2016

There has recently been an election in Hong Kong. We have archived the
data from the 5th Legislative Council, but now need to update the scraper
to fetch members from the new 6th Legislative Council instead.

(Note for the pull request, not in the commit message: the plan is to deploy
this under the everypolitician-scrapers account on morph.io.)

@tmtmtmtm tmtmtmtm left a comment

We've discovered that it's generally only sensible to have the scraper generate the terms table if the data in it is actually being scraped from that source. Otherwise, it's a lot better to just have the terms table in everypolitician-data maintained there. Switching this to a manually maintained file would avoid most of the work here, and be a lot easier to change in future.

@tmtmtmtm

Also, if the database needs to be deleted before run, this is a good opportunity to move the scraper on morph from Struan's account into the everypolitician-scrapers one (the only value in keeping it in an existing personal account is if there's history that would go missing on a fresh run). Then anyone in the team will be able to stop / start it, set the necessary morph vars, etc.

mhl commented Oct 25, 2016

OK, I've archived the terms table with just term 5 in this pull request: everypolitician/everypolitician-data#19158

There was an election in September and the 5th Legislative Council is
over. The scraper had also stopped working, because the pages for the
5th term have since been changed to remove the links to the people
pages.

This commit changes the scraper to scrape the 6th term pages instead.

@mhl mhl changed the title from "Updates for the 6th Legislative Council" to "Scrape the new 6th Legislative Council pages instead" Oct 25, 2016

mhl commented Oct 25, 2016

@tmtmtmtm I've revised this - it's a much simpler change now.

@tmtmtmtm tmtmtmtm left a comment

LGTM!

@tmtmtmtm tmtmtmtm merged commit fe4dc11 into master Oct 26, 2016