In Lux, we detect attributes that look like an ID and avoid visualizing them.
There are several issues related to the current type detection mechanisms:
The function check_if_id_like needs to be improved so that it does not rely too heavily on the attribute_contain_id check, i.e., even if the attribute name does not contain "ID" but the attribute looks like an ID, we should still label it as an ID. The cardinality check almost_all_vals_unique is a good example, since most ID fields are largely unique. Another check we could implement is whether the IDs are spaced at a regular interval (e.g., 200, 201, 202, ...). This is a somewhat weak signal, since it is not a necessary property of an ID.
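To make the two signals above concrete, here is a minimal sketch in plain Python. The helper names (`mostly_unique`, `evenly_spaced`, `id_signals`) and the 0.9 threshold are assumptions for illustration, not Lux's actual implementation, and the functions take a plain list of values rather than a pandas Series to stay self-contained.

```python
from typing import Dict, Hashable, Sequence


def mostly_unique(values: Sequence[Hashable], threshold: float = 0.9) -> bool:
    """True when the share of distinct values is at least `threshold`.

    The threshold is a hypothetical default, not a value taken from Lux.
    """
    if not values:
        return False
    return len(set(values)) / len(values) >= threshold


def evenly_spaced(values: Sequence[object]) -> bool:
    """True when integer values, once sorted, differ by one constant nonzero step.

    This is a weak signal: many valid IDs are not evenly spaced.
    """
    if len(values) < 3:
        return False
    # Only plain integers are considered; bool is excluded explicitly
    # because it is a subclass of int in Python.
    if not all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return False
    ordered = sorted(values)
    steps = {b - a for a, b in zip(ordered, ordered[1:])}
    return len(steps) == 1 and 0 not in steps


def id_signals(values: Sequence[Hashable]) -> Dict[str, bool]:
    """Collect both signals without consulting the attribute name."""
    return {
        "mostly_unique": mostly_unique(values),
        "evenly_spaced": evenly_spaced(values),
    }
```

For example, `id_signals([200, 201, 202])` reports both signals as true, while a low-cardinality column such as `[1, 1, 2, 2]` reports both as false. How the signals should be weighted against the name check is left open here.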
BUG: ID detection is currently triggered only if the data type of the attribute is detected as an integer (source). We should fix this bug so that string attributes that are ID-like (e.g., a CustomerID in the Churn dataset like "7590-VHVEG") are also detected as IDs.
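One possible way to cover string IDs, sketched under assumptions: treat a string attribute as ID-like when its values are mostly unique and all share the same character "shape" (digits, letters, and separators in the same positions). The function names, the 0.9 threshold, and every sample value other than "7590-VHVEG" are made up for illustration; this is not how Lux detects IDs today.

```python
from typing import Sequence


def shape(value: str) -> str:
    """Map digits to '9' and letters to 'A', keeping other characters as-is."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def string_looks_like_id(values: Sequence[object], uniqueness: float = 0.9) -> bool:
    """Heuristic: all values are strings, mostly unique, and share one shape.

    The uniqueness threshold is a hypothetical default.
    """
    if not values or not all(isinstance(v, str) for v in values):
        return False
    unique_ratio = len(set(values)) / len(values)
    shapes = {shape(v) for v in values}
    return unique_ratio >= uniqueness and len(shapes) == 1


# "7590-VHVEG" comes from the issue text; the other values are invented.
customer_ids = ["7590-VHVEG", "1234-ABCDE", "5678-FGHIJ"]
churn_flags = ["Yes", "No", "Yes", "No"]
```

With these inputs, `string_looks_like_id(customer_ids)` is true and `string_looks_like_id(churn_flags)` is false. The heuristic can still misfire on short, unique, same-length text values, so it would need to be checked against non-ID columns in real datasets.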
Some test data can be found here; feel free to find your own on Kaggle or elsewhere. For a pull request, please include tests that run the bugfix on several datasets to verify that ID fields are detected and that non-ID fields are not.
* string id detection bug (#88)
* remove bolded Filter description
* id type visualized as nominal
* expanded nominal integer type criteria
* added additional type tests
* making univariate sorted but not top-k
![image](https://user-images.githubusercontent.com/5554675/93588377-e4cdc280-f9dd-11ea-8de5-0ed2757d3333.png)
![](https://user-images.githubusercontent.com/5554675/93588430-fadb8300-f9dd-11ea-84a1-ba985fe9410a.png)