Data quality #18

Open
exussum12 opened this issue Aug 10, 2020 · 7 comments


@exussum12

There appear to be some places with issues with their location.

E.g.
Concept catering c/o Chester golf club,CH2 8AR

Should be CH4 8AR

Bijou by the Sea,AB56 5DJ

Should be AB56 4DJ

Looks to be ~50 places where the postcode is incorrect
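
Both examples above differ from the correct postcode by a single character, so one way to surface likely typos is to look for valid postcodes one substitution away. This is only a minimal sketch: `normalise`, `near_misses` and the tiny `valid` set are hypothetical names for illustration, and a real run would load `valid` from a full reference list rather than the two postcodes quoted in this issue.

```python
import string


def normalise(postcode: str) -> str:
    """Upper-case and drop all whitespace so 'ch4 8ar' and 'CH4  8AR' compare equal."""
    return "".join(postcode.split()).upper()


def near_misses(postcode: str, valid: set) -> list:
    """Return valid postcodes that differ from `postcode` by exactly one character."""
    pc = normalise(postcode)
    alphabet = string.ascii_uppercase + string.digits
    found = set()
    for i, original in enumerate(pc):
        for ch in alphabet:
            if ch == original:
                continue
            # Substitute one character and check it against the reference set.
            candidate = pc[:i] + ch + pc[i + 1:]
            if candidate in valid:
                found.add(candidate)
    return sorted(found)


# Assumption: stand-in reference set built from the corrected postcodes in this issue.
valid = {normalise(p) for p in ["CH4 8AR", "AB56 4DJ"]}

print(near_misses("CH2 8AR", valid))
print(near_misses("AB56 5DJ", valid))
```

A single suggestion is a strong hint, but several candidates would still need checking against the establishment's address by hand.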

@barrychocolate

barrychocolate commented Aug 17, 2020

I have 96 establishments where I can't match the postcode to the National Statistics Postcode Lookup.
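
For anyone wanting to reproduce this kind of check, a rough sketch of the matching step might look like the following. The column names (`name`, `postcode`) and the in-memory sample are assumptions for illustration, not the layout of the real CSV or of the NSPL file; the lookup set would normally be built from the NSPL postcode column.

```python
import csv
import io


def normalise(postcode: str) -> str:
    """Upper-case and drop whitespace so spacing differences don't cause false misses."""
    return "".join(postcode.split()).upper()


def unmatched_rows(csv_file, lookup_postcodes, postcode_column="postcode"):
    """Return the rows whose normalised postcode is absent from the lookup set."""
    reader = csv.DictReader(csv_file)
    return [
        row for row in reader
        if normalise(row[postcode_column]) not in lookup_postcodes
    ]


# Assumption: hypothetical two-row sample using the entries quoted in this issue.
sample = io.StringIO(
    "name,postcode\n"
    "Concept catering c/o Chester golf club,CH2 8AR\n"
    "Bijou by the Sea,AB56 4DJ\n"
)

# Assumption: stand-in for the set of postcodes loaded from the lookup file.
lookup = {"CH48AR", "AB564DJ"}

unmatched = unmatched_rows(sample, lookup)
for row in unmatched:
    print(row["name"], row["postcode"])
```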

@kmpoppe

kmpoppe commented Aug 19, 2020

@exussum12 Those entries seem to be typos quite clearly (I always wonder how that can happen in a digital process but oh well 🤷 )
@barrychocolate aren't there establishments that have "nice" postcodes that work with Royal Mail but aren't listed in the NSPL because they wouldn't normally exist?

@exussum12
Author

@kmpoppe I agree, I wasn't sure if they can be fixed though (sending a PR doesn't seem like it would help, as the CSV is likely the output of some other data rather than a true master source).

Some other missing postcodes actually do seem legit. For example, from memory, some places in the Trafford Centre in Manchester had a postcode I could cross-reference with Google, but which didn't exist in the ONS postcode data.

@barrychocolate

> @barrychocolate aren't there establishments that have "nice" postcodes that work with Royal Mail but aren't listed in the NSPL because they wouldn't normally exist?

It seems that way. I will try using OS Codepoint and see if that is any better.

Also, the reason for the typos is that while an address lookup facility is available for those registering for the scheme, the user can also manually enter or modify an address. I suspect this is the reason for some of the data quality issues we are seeing.

@kmpoppe

kmpoppe commented Aug 23, 2020

So, I've been fiddling around with the data a lot; there are around 450 establishments with invalid postcodes as per Codepoint Open. I'll go ahead and aggregate all the stuff we've got here. Once #3 gets resolved in a fashion that makes it Crown Copyright or anything, I can make my crunching public ;)

@barrychocolate

I tried Codepoint Open, but its biggest drawback is that it only covers GB. There are 423 establishments with a Northern Irish BT postcode that Codepoint won't match.

When I used the Office for National Statistics Postcode Directory (which includes terminated postcodes that businesses may still use), the match rate was better, with only 98 unmatched. If the data included UPRNs (where establishments have them), that would likely resolve some of the unmatched ones. So that is what I have stuck with for my project.
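
To see how much of a mismatch count is down to coverage rather than typos, the unmatched postcodes can be grouped by postcode area first. The sketch below is illustrative only: `postcode_area` and `split_by_coverage` are hypothetical helpers, the sample list is made up from postcodes mentioned in this thread plus one BT example, and it only separates BT from everything else.

```python
import re

# Leading one or two letters followed by a digit, e.g. 'BT' in 'BT1 5GS'.
AREA_RE = re.compile(r"^([A-Z]{1,2})\d")


def postcode_area(postcode: str):
    """Return the postcode area (leading letters), or None if the shape is wrong."""
    match = AREA_RE.match("".join(postcode.split()).upper())
    return match.group(1) if match else None


def split_by_coverage(postcodes):
    """Bucket postcodes into Northern Ireland (BT), other areas, and malformed."""
    buckets = {"northern_ireland": [], "other": [], "malformed": []}
    for pc in postcodes:
        area = postcode_area(pc)
        if area is None:
            buckets["malformed"].append(pc)
        elif area == "BT":
            buckets["northern_ireland"].append(pc)
        else:
            buckets["other"].append(pc)
    return buckets


# Assumption: a made-up sample, not taken from the real dataset.
sample = ["BT1 5GS", "CH2 8AR", "ab56 5dj", "not a postcode"]
buckets = split_by_coverage(sample)
print({name: len(items) for name, items in buckets.items()})
```

Anything left in the `other` bucket after removing BT postcodes is a better estimate of genuine typos or non-standard postcodes.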

@kmpoppe
Copy link

kmpoppe commented Aug 23, 2020

@barrychocolate I've set up a MongoDB Cloud instance with the data; would you like to work on that with what you have? Feel free to contact me directly via Twitter or Telegram, see here :-)
Kai


3 participants