feat: preprocessing transformation priorities #32

Open · jitingxu1 opened this issue Mar 20, 2024 · 2 comments

Labels: documentation (Improvements or additions to documentation), help wanted (Extra attention is needed)

jitingxu1 (Collaborator) commented Mar 20, 2024

Building on the deliverables outlined in issue #19, this issue aims to expand ibisml's coverage of machine learning preprocessing transformations, prioritizing the most important gaps first.

Please share your favorite ML transformations for your day-to-day ML tasks, along with some context on why you find them particularly useful.

Assumptions

  • Raw feature creation is done with Ibis
  • The data is tabular

Priority definitions:

  • P0: Essential tasks vital for model development; required before our initial release.

  • P1: Desirable tasks that can enhance the model; scheduled based on feedback and further optimization.

  • P2: Additional, lower-priority improvements to the model; scheduled based on feedback and further optimization.

Priorities

| Preprocessing Module | Ibis-ml Step | sklearn | Priority | Status | Note | Model Needed |
|---|---|---|---|---|---|---|
| Encoding | CategoricalEncode | OrdinalEncoder | P0 | Done | | |
| Encoding | CountEncode | | P1 | Done | | |
| Feature Engineering | CreatePolynomialFeatures | PolynomialFeatures | P0 | Done | | |
| Non-linear Transformation | Math transformation (log, sqrt, ...) | | P1 | Done | ibis | |
| Standardization and Scaling | ScaleStandard | StandardScaler | P0 | Done | | KNN, MLPBased, SVM |
| Encoding | TargetEncode | TargetEncoder | P0 | Done | | |
| Feature Reduction | DropZeroVariance | VarianceThreshold | P0 | Done | | |
| Imputing | HandleUnivariateOutliers | SimpleImputer | P0 | Done | | |
| Feature Engineering | Ratio variable creation | | P0 | Done | ibis | |
| Discretization | DiscretizeKBins | KBinsDiscretizer | P0 | Done | | |
| Discretization | Feature binarization | Binarizer | P1 | Done | | |
| Standardization and Scaling | ScaleMinMax | MinMaxScaler | P0 | Done | | KNN, MLPBased, SVM |
| Custom Transformer | Custom transform | FunctionTransformer | P0 | Done | | |
| Encoding | OneHotEncode | OneHotEncoder | P0 | Done | | |
| Imputing | Outlier imputing and capping | | P0 | Done | | Log/Linear Reg |
| Feature Reduction | Mutual information (continuous target) | | P1 | Not started | | |
| Feature Reduction | Mutual information (discrete target) | | P1 | Not started | | |
| Feature Engineering - Text | Count transformer | CountVectorizer | P2 | Not started | | |
| Feature Engineering - Text | TF-IDF transformer | TfidfTransformer | P2 | Not started | | |
| Encoding | Label binarizer | LabelBinarizer | P2 | Not started | | |
| Encoding | Label encode | LabelEncoder | P2 | Not started | | |
| Standardization and Scaling | MaxAbsScaler | MaxAbsScaler | P2 | Not started | | |
| Standardization and Scaling | RobustScaler | RobustScaler | P1 | Not started | | KNN, MLPBased, SVM |
| Imputing | Missing value: nearest neighbor | KNNImputer | P1 | Not started | Doable | |
| Non-linear Transformation | QuantileTransformer | QuantileTransformer | P1 | Not started | | |
| Non-linear Transformation | Inverse and logit transformation | | P2 | Not started | | |
| Imputing | Missing value: linear regression | | P1 | Not started | Not supported | |
| Imputing | Missing value: bagged trees | | P1 | Not started | Not supported | |
| Feature Reduction | Filter columns by missing-rate threshold | | P1 | Not started | | |
| Feature Reduction | Filter features by high correlation | | P2 | Not started | Doable | |
| Non-linear Transformation | PowerTransformer | PowerTransformer | P1 | Not started | | MLPBased, SVM |
| Feature Reduction | PCA | | P1 | Not started | Not supported | |
| Imputing | Missing value: rolling-window imputation | | P2 | Not started | | |
| Feature Engineering | Spline transformer | SplineTransformer | P1 | Not started | | |
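As a rough illustration of what one of the not-started P1 rows (RobustScaler) would need to learn, here is a minimal in-memory sketch in plain Python. It centers values on the median and divides by the interquartile range (IQR), mirroring scikit-learn's `RobustScaler` default `quantile_range=(25.0, 75.0)`. The class `RobustScaleSketch` is hypothetical and is not part of ibis-ml; an actual step would presumably compute the same statistics as Ibis aggregations on the backend rather than on a Python list.

```python
import statistics
from typing import List, Optional, Sequence


class RobustScaleSketch:
    """Hypothetical, in-memory illustration of robust scaling (not an ibis-ml API).

    Centers values on the median and divides by the interquartile range,
    like scikit-learn's RobustScaler with quantile_range=(25.0, 75.0).
    """

    def __init__(self) -> None:
        self.center_: Optional[float] = None
        self.scale_: Optional[float] = None

    def fit(self, values: Sequence[float]) -> "RobustScaleSketch":
        # method="inclusive" interpolates linearly between order statistics,
        # which matches NumPy's default percentile computation.
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        self.center_ = median
        # A constant column has an IQR of 0; fall back to 1.0 to avoid
        # dividing by zero (scikit-learn handles zero scales the same way).
        self.scale_ = iqr if iqr != 0 else 1.0
        return self

    def transform(self, values: Sequence[float]) -> List[float]:
        if self.center_ is None or self.scale_ is None:
            raise RuntimeError("call fit() before transform()")
        return [(v - self.center_) / self.scale_ for v in values]


if __name__ == "__main__":
    scaler = RobustScaleSketch().fit([1, 2, 3, 4, 100])
    print(scaler.center_, scaler.scale_)  # 3.0 2.0
    print(scaler.transform([1, 3, 100]))  # [-1.0, 0.0, 48.5]
```

Because the center and scale come from quantiles, the extreme value `100` in `[1, 2, 3, 4, 100]` leaves them at 3.0 and 2.0, whereas the mean and standard deviation used by `StandardScaler` would shift noticeably. That robustness to outliers is the main reason to offer it alongside ScaleStandard and ScaleMinMax for scale-sensitive models such as KNN and SVM.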


zhenzhongxu commented:

@jitingxu1 @deepyaman can we ensure this is up to date? Thank you.

deepyaman (Collaborator) commented:

@zhenzhongxu This is already up to date!
