Clean database of UNKNOWN and validate against it #69

Open
dstufft opened this issue Oct 11, 2013 · 3 comments
Labels
data quality; needs discussion (a product management/policy issue maintainers and users should discuss)

Comments

@dstufft
Member

dstufft commented Oct 11, 2013

Currently the database is littered with "UNKNOWN". This "helpfully" comes from distutils, which fills it in for any missing required value. We should strip these values from the database and strip them from new incoming data.

@dstufft modified the milestone: Become PyPI (Mar 14, 2015)
@dstufft
Member Author

dstufft commented Aug 4, 2015

The file upload API now strips the UNKNOWN values from the request.POST data before doing anything else with it.
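
As a rough illustration of that kind of stripping (not the actual upload code), here is a minimal sketch. It assumes the form data can be treated as a plain dict mapping field names to lists of strings; `strip_unknown` is a hypothetical helper name, and the real request.POST object is a multidict with a different interface.

```python
def strip_unknown(post_data):
    """Return a copy of post_data with every "UNKNOWN" value dropped.

    post_data is assumed to be a dict of field name -> list of strings,
    a simplified stand-in for a multidict such as request.POST.
    """
    cleaned = {}
    for key, values in post_data.items():
        # Keep only values that are not the distutils placeholder.
        kept = [v for v in values if v.strip() != "UNKNOWN"]
        # Drop the field entirely if nothing is left.
        if kept:
            cleaned[key] = kept
    return cleaned


example = {
    "name": ["example-project"],
    "summary": ["UNKNOWN"],
    "classifiers": ["Programming Language :: Python", "UNKNOWN"],
}
cleaned_example = strip_unknown(example)
```

With this sketch, a field whose only value is UNKNOWN disappears from the result, so later validation sees it as missing rather than as a literal string.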

@dstufft removed this from the Become PyPI milestone (Aug 30, 2015)
@nlhkabu added the "requires triaging" label (maintainers need to do initial inspection of issue) (Jul 2, 2016)
@berkerpeksag
Contributor

Perhaps we can add a template filter to strip all UNKNOWN fields for now?
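
A template filter along those lines could be as small as the sketch below. The name `hide_unknown` and the `default` parameter are hypothetical, and the thread does not say which template engine is in use; with Jinja2, for instance, a function like this could be registered through the environment's `filters` mapping.

```python
def hide_unknown(value, default=""):
    """Return default when value is the distutils "UNKNOWN" placeholder.

    Non-string values (including None) are passed through unchanged so the
    filter is safe to apply to any metadata field.
    """
    if isinstance(value, str) and value.strip() == "UNKNOWN":
        return default
    return value


# Assumed registration with a Jinja2 environment (not executed here):
#     env.filters["hide_unknown"] = hide_unknown
# Assumed template usage:
#     {{ release.home_page | hide_unknown }}
```

This only hides the placeholder at render time; the stored data is left untouched, which is why it would be a stopgap rather than a fix.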

@miketheman
Member

Still valid data quality issue.

A data migration job needs to exist that finds UNKNOWN values and replaces them with either empty strings or nulls, whichever is the natural representation for the column in question.
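
One way such a job might look is sketched below, using an in-memory SQLite database so it is self-contained. The table name `releases`, its columns, and the `clean_unknown` helper are all hypothetical; the real schema and migration tooling are not described in this thread.

```python
import sqlite3


def clean_unknown(conn, table, columns):
    """Replace "UNKNOWN" in the given columns of table.

    columns maps a column name to True if NULL is its natural empty value,
    or False if an empty string is. Table and column names are interpolated
    into the SQL, so they must come from trusted code, never from user input.
    """
    for column, use_null in columns.items():
        replacement = None if use_null else ""
        conn.execute(
            f'UPDATE "{table}" SET "{column}" = ? WHERE "{column}" = ?',
            (replacement, "UNKNOWN"),
        )
    conn.commit()


# Demo against a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE releases (name TEXT, summary TEXT, home_page TEXT)")
conn.executemany(
    "INSERT INTO releases VALUES (?, ?, ?)",
    [
        ("a", "UNKNOWN", "UNKNOWN"),
        ("b", "A tool", "https://example.com"),
    ],
)
clean_unknown(conn, "releases", {"summary": False, "home_page": True})
rows = conn.execute(
    "SELECT name, summary, home_page FROM releases ORDER BY name"
).fetchall()
```

Passing the per-column choice explicitly keeps the "empty string or null" decision visible at the call site instead of guessing it from the schema.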

6 participants