Check this nice list of issues #69

Open
kwinkunks opened this issue Sep 23, 2023 · 0 comments
Labels
idea An idea to research and test

Comments

@kwinkunks
Member

Nice list from pandas_dq:

  1. It detects ID columns
  2. It detects zero-variance columns
  3. It identifies rare categories (i.e. categories that account for less than 5% of the values in a column)
  4. It finds infinite values in a column
  5. It detects mixed data types (i.e. a column that has more than a single data type)
  6. It detects outliers (i.e. values in a float column that lie beyond the interquartile range)
  7. It detects high cardinality features (i.e. a feature that has more than 100 categories)
  8. It detects highly correlated features (i.e. two features that have an absolute correlation higher than 0.8)
  9. It detects duplicate rows (i.e. the same row occurs more than once in the dataset)
  10. It detects duplicate columns (i.e. the same column occurs twice or more in the dataset)
  11. It detects skewed distributions (i.e. a feature with a skewness greater than 1.0)
  12. It detects imbalanced classes (i.e. one class of the target variable significantly outnumbers the others)
  13. It detects feature leakage (i.e. a feature whose correlation with the target is greater than 0.8)
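
For reference, here is a minimal pandas sketch of how several of the checks above (2 to 11, minus the ones that need a target column) could be approximated. It is not pandas_dq's implementation: `quick_checks` and all of its parameter names are hypothetical, the 1.5 × IQR fence is the common Tukey convention rather than something the list states, and skewness is compared by absolute value, which is also an assumption.

```python
import numpy as np
import pandas as pd


def quick_checks(df, rare_frac=0.05, max_categories=100,
                 corr_limit=0.8, skew_limit=1.0, iqr_k=1.5):
    """Hypothetical helper: return a dict of check name -> findings."""
    numeric = df.select_dtypes(include="number")
    non_numeric = df.select_dtypes(exclude="number")
    report = {}

    # 2. Zero-variance columns: at most one distinct non-null value.
    report["zero_variance"] = [c for c in df.columns
                               if df[c].nunique(dropna=True) <= 1]

    # 3. Rare categories: values making up less than rare_frac of a column.
    rare = {}
    for c in non_numeric.columns:
        freq = df[c].value_counts(normalize=True)
        low = freq[freq < rare_frac].index.tolist()
        if low:
            rare[c] = low
    report["rare_categories"] = rare

    # 4. Infinite values in numeric columns.
    report["infinite"] = [c for c in numeric.columns
                          if np.isinf(numeric[c]).any()]

    # 5. Mixed data types: more than one Python type among non-null values.
    report["mixed_types"] = [c for c in non_numeric.columns
                             if df[c].dropna().map(type).nunique() > 1]

    # 6. Outliers: count of values outside the iqr_k * IQR fences (assumed rule).
    outliers = {}
    for c in numeric.columns:
        s = numeric[c].replace([np.inf, -np.inf], np.nan).dropna()
        q1, q3 = s.quantile(0.25), s.quantile(0.75)
        fence = iqr_k * (q3 - q1)
        n = int(((s < q1 - fence) | (s > q3 + fence)).sum())
        if n:
            outliers[c] = n
    report["outliers"] = outliers

    # 7. High cardinality: more than max_categories distinct values.
    report["high_cardinality"] = [c for c in non_numeric.columns
                                  if df[c].nunique() > max_categories]

    # 8. Highly correlated pairs: absolute Pearson correlation above corr_limit.
    corr = numeric.corr().abs()
    names = list(corr.columns)
    report["correlated_pairs"] = [(a, b)
                                  for i, a in enumerate(names)
                                  for b in names[i + 1:]
                                  if corr.loc[a, b] > corr_limit]

    # 9. Duplicate rows: rows that repeat an earlier row.
    report["duplicate_rows"] = int(df.duplicated().sum())

    # 10. Duplicate columns: columns whose values equal an earlier column's.
    cols = list(df.columns)
    report["duplicate_columns"] = [b for i, b in enumerate(cols)
                                   if any(df[a].equals(df[b]) for a in cols[:i])]

    # 11. Skewed distributions: absolute skewness above skew_limit (assumed).
    skew = numeric.skew()
    report["skewed"] = skew[skew.abs() > skew_limit].index.tolist()

    return report
```

Calling `quick_checks(df)` on any DataFrame returns a plain dict, so each entry could be turned into a separate warning. Checks 1, 12 and 13 are left out because they need extra information (which column is an ID, which is the target).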