Example of data cross-checking script #14

Open
showerst opened this Issue Oct 10, 2018 · 2 comments

showerst commented Oct 10, 2018

Just for fun and to test out the schema, I threw together a script that loops through our members and cross-checks them against the Google Civic Information API.

https://github.com/showerst/people/blob/vip/scripts/vip-crosscheck.py

I think this is the kind of thing we could set up to run once in a while and ping us about potential changes proactively. Maybe even auto-open issues.
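As a rough sketch of the auto-open idea, this builds (but does not send) a request to GitHub's REST endpoint for creating an issue (`POST /repos/{owner}/{repo}/issues`). The owner/repo names, token handling, and title/body wording are assumptions for illustration. As far as this thread shows, the linked script doesn't do any of this.

```python
import json
import urllib.request
from typing import List


def build_issue_request(owner: str, repo: str, token: str, state: str,
                        mismatches: List[str]) -> urllib.request.Request:
    """Build (but do not send) a request that opens one issue per state run.

    The title/body format is a hypothetical choice, not an existing convention.
    """
    title = f"{state.upper()} cross-check: {len(mismatches)} possible roster change(s)"
    body = "Automated cross-check found:\n\n" + "\n".join(f"- {m}" for m in mismatches)
    payload = json.dumps({"title": title, "body": body}).encode("utf-8")
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/issues",
        data=payload,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# "example-org" and "TOKEN" are placeholders.
req = build_issue_request(
    "example-org", "people", "TOKEN", "ar",
    ["Mismatch with VIP: Dave Wallace not found in "
     "ocd-division/country:us/state:ar/sldu:22. Our names are: David Wallace"],
)
# urllib.request.urlopen(req) would actually send it.
```

Building the request separately from sending it makes it easy to dry-run the job (print the payload) before letting it open real issues.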

Here's some sample output:

scripts/vip-crosscheck.py  ar "<my api key>"
Mismatch with VIP: Les A. Warren not found in ocd-division/country:us/state:ar/sldl:25. Our names are: Les Warren
Mismatch with VIP: Vacant not found in ocd-division/country:us/state:ar/sldl:45. Our names are: Jeremy Gillam
Mismatch with VIP: R. Trevor Drown not found in ocd-division/country:us/state:ar/sldl:68. Our names are: Trevor Drown
Mismatch with VIP: Dave Wallace not found in ocd-division/country:us/state:ar/sldu:22. Our names are: David Wallace
Mismatch with VIP: Charlotte Vining Douglas not found in ocd-division/country:us/state:ar/sldl:75. Our names are: Charlotte V. Douglas
Mismatch with VIP: Vacant not found in ocd-division/country:us/state:ar/sldu:33. Our names are: Jeremy Hutchinson
Mismatch with VIP: Milton Nicks Jr. not found in ocd-division/country:us/state:ar/sldl:50. Our names are: Milton Nicks, Jr.
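Several of these look like formatting differences rather than real changes (middle initials, "Jr." punctuation). Here's a minimal sketch of a comparison loop that prints messages in the same format. It assumes our roster and the VIP results have already been fetched into dicts keyed by OCD division ID, and it adds an optional, hypothetical loose matcher (`names_match`) that compares only the first and last name tokens after dropping punctuation, single-letter initials, and suffixes. This is not necessarily how the linked script works.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical list of generational suffixes to ignore when comparing names.
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def name_key(name: str) -> Tuple[str, str]:
    """Reduce a name to (first, last), ignoring punctuation, initials and suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    tokens = [t for t in tokens if len(t) > 1 and t not in SUFFIXES]
    return (tokens[0], tokens[-1]) if tokens else ("", "")


def names_match(a: str, b: str) -> bool:
    """Loose comparison: same first and last token after normalization."""
    return name_key(a) == name_key(b)


def exact_match(a: str, b: str) -> bool:
    return a == b


def crosscheck(
    ours: Dict[str, List[str]],
    vip: Dict[str, List[str]],
    match: Callable[[str, str], bool] = exact_match,
) -> List[str]:
    """Report each VIP official with no matching name in our data for that division."""
    messages = []
    for division, vip_names in sorted(vip.items()):
        our_names = ours.get(division, [])
        for name in vip_names:
            if not any(match(name, our_name) for our_name in our_names):
                messages.append(
                    f"Mismatch with VIP: {name} not found in {division}. "
                    f"Our names are: {', '.join(our_names)}"
                )
    return messages


# Sample data taken from the output above, keyed by OCD division ID.
AR = "ocd-division/country:us/state:ar/"
vip = {
    AR + "sldl:25": ["Les A. Warren"],
    AR + "sldl:45": ["Vacant"],
    AR + "sldl:68": ["R. Trevor Drown"],
    AR + "sldu:22": ["Dave Wallace"],
    AR + "sldl:75": ["Charlotte Vining Douglas"],
    AR + "sldu:33": ["Vacant"],
    AR + "sldl:50": ["Milton Nicks Jr."],
}
ours = {
    AR + "sldl:25": ["Les Warren"],
    AR + "sldl:45": ["Jeremy Gillam"],
    AR + "sldl:68": ["Trevor Drown"],
    AR + "sldu:22": ["David Wallace"],
    AR + "sldl:75": ["Charlotte V. Douglas"],
    AR + "sldu:33": ["Jeremy Hutchinson"],
    AR + "sldl:50": ["Milton Nicks, Jr."],
}

for line in crosscheck(ours, vip, match=names_match):
    print(line)
```

With the loose matcher, only the two vacancies and Dave/David Wallace remain. The trade-off is that a (first, last) key can't catch nicknames and could conflate two different people who share a first and last name. It works better as a noise filter than as a definitive match.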

csnardi commented Dec 12, 2018

Perhaps this could also be made easier with something like automatic PRs or issue generation when a scraper encounters differing information from what's in this repo? I know the people/committee scrapers have been turned off for now, but it seems impractical to expect users to run the scrapers on their own to manually update the data.
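Here's a rough sketch of the comparison such a scraper hook might do: diff a freshly scraped record against the one in the repo, then render the differences as text for an issue or PR description. The flat record layout, the field names, and the person ID are hypothetical and don't reflect the repo's actual schema. Only fields the scraper actually returned are compared, so data the scraper doesn't collect isn't reported as a change.

```python
from typing import Any, Dict, Tuple

Changes = Dict[str, Tuple[Any, Any]]


def diff_person(repo_record: Dict[str, Any], scraped: Dict[str, Any]) -> Changes:
    """Return {field: (repo_value, scraped_value)} for scraped fields that differ."""
    return {
        field: (repo_record.get(field), value)
        for field, value in scraped.items()
        if repo_record.get(field) != value
    }


def describe(person_id: str, changes: Changes) -> str:
    """Render the differences as a plain-text list for an issue or PR body."""
    lines = [f"Scraper disagrees with repo for {person_id}:"]
    for field, (old, new) in sorted(changes.items()):
        lines.append(f"- {field}: {old!r} -> {new!r}")
    return "\n".join(lines)


# Hypothetical records; a real hook would load these from the repo and the scrape.
repo = {"name": "Jane Doe", "party": "Independent", "district": "12"}
scraped = {"name": "Jane Q. Doe", "district": "12"}

changes = diff_person(repo, scraped)
if changes:
    print(describe("hypothetical-person-id", changes))
```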


showerst commented Dec 17, 2018

@csnardi yeah, I like this idea. Eventually we'll also need a better system than "learn how to set up openstates, uncomment the people scrapers, fix any errors, run a local scrape, and then convert the files"; we just haven't had the hours to get things fully mature yet. Hopefully this hacky system will get us through January, and then we can spend some time improving the tooling significantly.