Check this nice list of issues #69

kwinkunks · 2023-09-23T11:00:05Z

Nice list from pandas_dq:

It detects ID columns
It detects zero-variance columns
It identifies rare categories (less than 5% of categories in a column)
It finds infinite values in a column
It detects mixed data types (i.e. a column that has more than a single data type)
It detects outliers (i.e. values in a float column that fall outside the interquartile range)
It detects high cardinality features (i.e. a feature that has more than 100 categories)
It detects highly correlated features (i.e. two features that have an absolute correlation higher than 0.8)
It detects duplicate rows (i.e. the same row occurs more than once in the dataset)
It detects duplicate columns (i.e. the same column occurs twice or more in the dataset)
It detects skewed distributions (i.e. a feature that has a skew more than 1.0)
It detects imbalanced classes (i.e. one class of the target variable is significantly more frequent than the others)
It detects feature leakage (i.e. a feature that is highly correlated to target with correlation > 0.8)
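
For a sense of how little code some of these checks need, here is a minimal sketch in plain pandas. It is not how pandas_dq implements them: `quick_checks` and its parameter names are hypothetical, the default thresholds are simply the ones quoted in the list above, and "rare" is read here as a category covering less than 5% of a column's rows, which is an assumption. The ID-column, outlier and target-based checks (imbalance, leakage) are left out.

```python
import numpy as np
import pandas as pd


def quick_checks(df, rare_frac=0.05, max_card=100, corr_thresh=0.8, skew_thresh=1.0):
    """Hypothetical helper: run a handful of the checks listed above.

    Assumes ordinary NumPy-backed dtypes; returns a dict of findings.
    """
    num = df.select_dtypes(include="number")
    cat = df.select_dtypes(include=["object", "category"])
    report = {}

    # Zero-variance columns: at most one distinct value (NaN counts as a value).
    report["zero_variance"] = [
        c for c in df.columns if df[c].nunique(dropna=False) <= 1
    ]

    # Infinite values in numeric columns.
    report["infinite"] = [c for c in num.columns if np.isinf(num[c]).any()]

    # Mixed data types: more than one Python type among the non-null values.
    report["mixed_types"] = [
        c for c in df.columns if df[c].dropna().map(type).nunique() > 1
    ]

    # Rare categories (assumed meaning: share of rows below rare_frac).
    report["rare_categories"] = {}
    for c in cat.columns:
        freq = df[c].value_counts(normalize=True)
        rare = freq[freq < rare_frac].index.tolist()
        if rare:
            report["rare_categories"][c] = rare

    # High cardinality: more than max_card distinct categories.
    report["high_cardinality"] = [
        c for c in cat.columns if df[c].nunique() > max_card
    ]

    # Highly correlated pairs: absolute Pearson correlation above corr_thresh.
    corr = num.corr().abs()
    cols = list(corr.columns)
    report["correlated_pairs"] = [
        (a, b)
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
        if corr.loc[a, b] > corr_thresh
    ]

    # Duplicate rows (count) and duplicate columns (names of the repeats).
    report["duplicate_rows"] = int(df.duplicated().sum())
    report["duplicate_columns"] = df.columns[df.T.duplicated()].tolist()

    # Skewed numeric columns: absolute sample skewness above skew_thresh.
    skew = num.skew()
    report["skewed"] = skew[skew.abs() > skew_thresh].index.tolist()

    return report
```

Calling `quick_checks(df)` on any DataFrame returns a plain dictionary keyed by check name, which keeps the sketch easy to compare against whatever pandas_dq reports for the same data.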

kwinkunks mentioned this issue Sep 23, 2023

Mention & differentiate from similar packages in README #65

Open

kwinkunks added the idea An idea to research and test label Sep 23, 2023
