Improve type detection mechanism for IDs #88

dorisjlee · 2020-09-18T10:40:41Z

In Lux, we detect attributes that look like an ID and avoid visualizing them.

There are several issues related to the current type detection mechanisms:

The function check_if_id_like needs to be improved so that it does not rely so heavily on the attribute_contain_id check. Even if the attribute name does not contain "ID", an attribute that looks like an ID should still be labeled as one. The cardinality check almost_all_vals_unique is a good example of such a signal, since most ID fields are largely unique. Another check we could implement is whether the values are spaced at a regular interval (e.g., 200, 201, 202, ...). This is a somewhat weak signal, since regular spacing is not a necessary property of an ID.
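
A minimal sketch of how these signals could be combined, assuming a pandas Series per attribute. The names mostly_unique, is_evenly_spaced and looks_like_id are hypothetical and are not Lux's actual check_if_id_like or almost_all_vals_unique. The 0.98 uniqueness threshold is also an assumption. High cardinality is treated as necessary, and either an ID-like name or regular spacing is required as corroboration, so the name check becomes one hint among several rather than the gatekeeper.

```python
import pandas as pd


def mostly_unique(series: pd.Series, threshold: float = 0.98) -> bool:
    """Hypothetical cardinality check: True if nearly every non-null value is distinct."""
    non_null = series.dropna()
    if len(non_null) == 0:
        return False
    return non_null.nunique() / len(non_null) >= threshold


def is_evenly_spaced(series: pd.Series) -> bool:
    """Weak signal: sorted integer values advance by one constant, non-zero step."""
    non_null = series.dropna()
    if len(non_null) < 3 or not pd.api.types.is_integer_dtype(non_null):
        return False
    steps = non_null.sort_values().diff().dropna().unique()
    return len(steps) == 1 and steps[0] != 0


def looks_like_id(series: pd.Series, name: str) -> bool:
    """Hypothetical detector that treats the attribute name as one hint among several."""
    if not mostly_unique(series):
        return False
    # Uniqueness alone is ambiguous (e.g. unique population counts), so ask for one
    # corroborating signal. Substring matching is crude: "width" also contains "id".
    name_hint = "id" in name.lower()
    return name_hint or is_evenly_spaced(series)
```

For example, looks_like_id(pd.Series(range(200, 300)), "row_number") is flagged through regular spacing even though the name contains no "id".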

BUG: Currently, ID detection is only triggered if the attribute's data type is detected as an integer (source). We should fix this bug so that string attributes that look like IDs (e.g., CustomerID in the Churn dataset, with values such as "7590-VHVEG") are also detected as IDs.
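
One possible approach for string attributes, sketched under the assumption that string IDs are nearly unique, contain no whitespace, and mostly share one fixed layout of digits and letters. The helpers value_shape and looks_like_string_id are hypothetical, and the 0.98 threshold is an assumption. Reducing each value to a "shape" (digits become 9, letters become A) lets "7590-VHVEG" match the pattern "9999-AAAAA", while free-text columns such as names produce many different shapes.

```python
import re

import pandas as pd


def value_shape(value: str) -> str:
    """Collapse a value to its character-class layout: digits -> '9', ASCII letters -> 'A'."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"[0-9]", "9", value))


def looks_like_string_id(series: pd.Series, threshold: float = 0.98) -> bool:
    """Hypothetical check for string IDs such as "7590-VHVEG"."""
    values = series.dropna().astype(str)
    if len(values) == 0:
        return False
    # IDs are (almost) unique, just like their integer counterparts.
    if values.nunique() / len(values) < threshold:
        return False
    # IDs are single tokens; free text such as names or comments contains spaces.
    if values.str.contains(r"\s").any():
        return False
    # Most values should share one layout, e.g. "9999-AAAAA".
    top_share = values.map(value_shape).value_counts(normalize=True).iloc[0]
    return top_share >= threshold
```

The shape test is what separates IDs from other unique strings: a column of fruit names is also unique and space-free, but its values have different lengths and therefore different shapes.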

Some test data can be found here; feel free to find your own on Kaggle or elsewhere. For the pull request, please include tests that run the bug fix on several datasets to verify both that ID fields are detected and that non-ID fields are not.
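
A sketch of a table-driven test layout that checks both directions at once, assuming the detector under test can be wrapped as a function from a DataFrame to the set of column names it flags as IDs. detect_id_columns is a hypothetical, deliberately naive stand-in (any fully unique non-float column), not Lux's detection; the tiny in-memory frames and their values are illustrative, and real tests should load the datasets mentioned above instead.

```python
import pandas as pd


def detect_id_columns(df: pd.DataFrame) -> set:
    """Hypothetical stand-in detector; swap in the real type detection under test."""
    flagged = set()
    for col in df.columns:
        s = df[col].dropna()
        if len(s) == 0 or pd.api.types.is_float_dtype(s):
            continue  # floats are measurements far more often than identifiers
        if s.nunique() == len(s):
            flagged.add(col)
    return flagged


# Each case pairs a frame with the exact set of columns expected to be IDs,
# so one comparison catches both missed IDs and false alarms.
CASES = [
    (pd.DataFrame({"customerID": ["7590-VHVEG", "1234-ABCDE", "8812-QWERT"],
                   "Churn": ["No", "No", "Yes"]}), {"customerID"}),
    (pd.DataFrame({"order_no": [200, 201, 202, 203],
                   "price": [9.99, 4.5, 9.99, 12.0]}), {"order_no"}),
    (pd.DataFrame({"age": [34, 34, 51, 29]}), set()),
]


def find_mismatches(detect, cases):
    """Return (case index, missed IDs, false positives) for every failing case."""
    problems = []
    for i, (df, expected) in enumerate(cases):
        got = detect(df)
        missed, extra = expected - got, got - expected
        if missed or extra:
            problems.append((i, missed, extra))
    return problems
```

In a pytest suite the whole check reduces to assert find_mismatches(detect_id_columns, CASES) == [], and a failure reports which dataset and which columns went wrong in each direction.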


* string id detection bug (#88)
* remove bolded Filter description
* id type visualized as nominal
* expanded nominal integer type criteria
* added additional type tests
* making univariate sorted but not top-k

dorisjlee added the bug (Something isn't working) and easy (Easy to fix; Good issues for newcomers) labels Sep 18, 2020

dorisjlee linked a pull request Jan 18, 2021 that will close this issue

Id function improvised #234 (Merged)

dorisjlee closed this as completed in #234 Jan 27, 2021
