In the metadata auto-detection script, if a column is detected as a real-world sdtype (such as latitude), the auto-detection is willing to mark it as the primary key even when its values are not unique. This is a problem because primary key values are expected to be unique. Metadata auto-detection should never choose a column as the primary key if it contains repeating values.
Steps to reproduce
In the example below, column latitude has repeating values. There is no primary key:
The metadata detection marks latitude as a primary key column even though it contains repeating values. This should not be possible. As a result, the validation fails.
metadata.validate_data(data)
InvalidDataError: The provided data does not match the metadata:
Key column 'latitude' contains repeating values: [-1.43, -12.66, -18.24, '+ 47 more']
This issue only happens with real-world sdtypes. If I rename the latitude column to something else, the detected sdtype is no longer latitude but numerical. In that case, the script correctly excludes the column from being a primary key.
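The expected behavior can be sketched as a guard that runs before any column is accepted as a primary key, regardless of its detected sdtype. This is only an illustration of the rule described in this report, not SDV's actual detection code: the helper names `can_be_primary_key` and `repeated_values` are hypothetical, the column is modeled as a plain Python list, and the sample values are a shortened, made-up variant of the ones in the error message.

```python
import math
from collections import Counter


def is_missing(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def can_be_primary_key(values):
    """Hypothetical guard: a key column must be non-null and unique."""
    values = list(values)
    if any(is_missing(v) for v in values):
        return False
    return len(set(values)) == len(values)


def repeated_values(values):
    """Return the sorted values that occur more than once."""
    counts = Counter(values)
    return sorted(v for v, n in counts.items() if n > 1)


# Illustrative data only: a latitude-like column with one repeated value.
latitude = [-1.43, -12.66, -18.24, -1.43]

print(can_be_primary_key(latitude))   # the repeat disqualifies the column
print(repeated_values(latitude))      # lists the offending value(s)
```

With a guard like this applied to every candidate column, a column detected as latitude would be rejected for the same reason a numerical column already is.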
Other Context
Another rule for primary keys is that they must be non-null. (See "Metadata auto-detection should not assign a primary key if there are NaN values in it", #1740.) That rule already works for all sdtypes, as expected.