Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trim leading zeroes from house number #110

Merged
merged 5 commits into from May 19, 2016

Conversation

trescube
Copy link
Contributor

@trescube trescube commented May 9, 2016

There are OA house numbers that are prefixed with 0 that can cause lower scores for inputs that wouldn't contain these leading 0's. For example: 07741 West Crocus Drive and 02 West Dawn Drive.

This PR also refactors the functional tests to be less "real world" dependent which really didn't exercise much of the import pipeline.

Fixed #109

Previously, only literal `0` house numbers were filtered out in USA/CAN.  There are presently ~67k US/CA OA records with house numbers reduceable to `0` above and beyond literal `0`.  This removes them.
Over 120k OA addresses have leading 0's that can cause search and display issues.  This fix trims them away.
@missinglink
Copy link
Member

missinglink commented May 10, 2016

nitpick: it looks like #108 is included in this PR, you could target the merge of this code in to remove-house-numbers-reducable-to-0 instead of master to make it easier to review.

something happened to Travis although it seems to be their fault No output has been received in the last 10 minutes, this potentially indicates a stalled build or something wrong with the build itself

👍

The functional tests were a combination of two OA data files containing several thousand entries and a 300k line JSON file of the expected results.  This made test maintenance very tedious and the tests only exercised a small part of the pipeline.  I lovingly crafted 2 artisanal data files in Brooklyn with fictitious entries that contained all the cases that the overall stream should affect.  The functional test also had deduplication and admin lookup turned off, so this PR makes those two pieces of behavior injectable.
@orangejulius
Copy link
Member

orangejulius commented May 12, 2016

So the intention behind the massive functional test is that it would catch interesting issues now and then. After a few months I don't think we've seen any case other than "almost every expected value has to be updated to reflect a known change we made". Should we get rid of them in the other importers too?

👍

@dianashk
Copy link
Contributor

I think we should keep them until we have time to revisit and create smaller, representative functional test files for each, like Stephen has done here.

@trescube trescube merged commit 5ef606b into master May 19, 2016
@orangejulius orangejulius deleted the trim-leading-zeroes-from-house-number branch May 29, 2016 15:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Correct house numbers prefixed with 0
4 participants