# Feature Engineering in Python

- [Source article](https://towardsdatascience.com/feature-engineering-in-python-2fdb9bb8ee7a/);

> Applied machine learning is basically feature engineering. (Andrew Ng)

- Common feature engineering tasks: handling missing values, handling outliers, binning numerical values, encoding categorical features, numerical transformations, scaling numerical features, extracting parts of dates;
- [Feature engineering tutorials](https://github.com/rasgointelligence/feature-engineering-tutorials) - this is the most important part; progress through the tutorials is tracked below (with optional comments):

- [x] Feature Profiling
    - [x] [pandas-profiling](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/feature-profiling/pandas-profiling.ipynb)
    - [x] [SweetViz](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/feature-profiling/SweetViz-profiling.ipynb)
- [ ] Data Cleaning
    - [x] [Missing Data](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/data-cleaning/pandas-missing-data.ipynb)
    - [x] [Duplicate Data](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/data-cleaning/pandas-duplicate-data.ipynb)
    - [x] [Data Type Mismatch](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/data-cleaning/pandas-data-type-mismatch.ipynb)
    - [x] [Date Gaps in Time Series](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/data-cleaning/pandas-date-gaps.ipynb) - need to look at this again
- [ ] Feature Transformation
    - [ ] Time-series
        - [ ] [Lag](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/feature-transformation/time-series/pandas-lag.ipynb)
        - [ ] [Moving Average](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/feature-transformation/time-series/pandas-moving-average.ipynb)
        - [ ] [Weekly Resampled Aggregation](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/feature-transformation/time-series/pandas-aggregate-weekly.ipynb)
        - [ ] [Weekly Rolling Aggregation](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/feature-transformation/time-series/pandas-aggregate-rolling-weekly.ipynb)
        - [ ] [Velocity and Acceleration](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/feature-transformation/time-series/pandas-velocity-acceleration.ipynb)
        - [ ] [Energy](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/feature-transformation/time-series/pandas-energy.ipynb)
        - [ ] [Mean Difference](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/feature-transformation/time-series/pandas-mean-difference.ipynb)
        - [ ] [Mean Absolute Difference](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/feature-transformation/time-series/pandas-mean-absolute-difference.ipynb)
        - [ ] [tsfresh](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/feature-transformation/time-series/tsfresh.ipynb)
    - [ ] Categorical
        - [ ] [One-hot encoding](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/feature-transformation/categorical/one-hot-encoding.ipynb)
        - [ ] [Target encoding](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/feature-transformation/categorical/target-encoding.ipynb)
        - [ ] [Leave One Out encoding](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/feature-transformation/categorical/leave-one-out-encoding.ipynb)
    - [ ] Numerical
        - [ ] [Standard scaler](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/feature-transformation/numerical/standard-scaler.ipynb)
        - [ ] [Min-Max scaler](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/feature-transformation/numerical/min-max-scaler.ipynb)
        - [ ] [Robust scaler](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/feature-transformation/numerical/robust-scaler.ipynb)
- [ ] Model Selection
    - [ ] Train-Test Split
        - [ ] [Time Series Split](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/model-selection/sklearn-time-series-split.ipynb)
        - [ ] [Train-Test Split](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/model-selection/sklearn-train-test-split.ipynb)
        - [ ] K-Fold or Cross-Validation
            - [ ] [Random](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/model-selection/sklearn-cross-validation-split.ipynb)
            - [ ] [Stratified](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/model-selection/sklearn-stratified-cross-validation-split.ipynb)
            - [ ] [Group](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/model-selection/sklearn-group-cross-validation-split.ipynb)
    - [ ] Model Comparison
        - [ ] [PyCaret](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/model-selection/model-comparison/pycaret-comparison.ipynb)
    - [ ] Model Training
        - [ ] Catboost
            - [ ] [Classification](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/model-selection/model-training/catboost-classification.ipynb)
            - [ ] [Regression](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/model-selection/model-training/catboost-regression.ipynb)
    - [ ] Model Metrics
        - [ ] Binary Classification
            - [ ] [AUC](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/model-selection/model-metrics/catboost-categorical-AUC.ipynb)
            - [ ] [Log Loss](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/model-selection/model-metrics/catboost-categorical-logloss.ipynb)
        - [ ] Regression
            - [ ] [MAE](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/model-selection/model-metrics/catboost-regression-mae.ipynb)
            - [ ] [MAPE](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/model-selection/model-metrics/catboost-regression-mape.ipynb)
            - [ ] [RMSE](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/model-selection/model-metrics/catboost-regression-rmse.ipynb)
            - [ ] [R^2](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/model-selection/model-metrics/catboost-regression-r2.ipynb)
- [ ] Feature Importance
    - [ ] [Scikit-learn](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/feature-importance/Sklearn%20Feature%20Importance.ipynb)
    - [ ] [XGBoost](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/feature-importance/XGBoost%20Feature%20Importance.ipynb)
    - [ ] [catboost](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/feature-importance/Catboost%20Feature%20Importance.ipynb)
- [ ] Feature Selection
    - [ ] Model Agnostic
        - [ ] [Low Variance](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/feature-selection/model-agnostic/Low%20Variance.ipynb)
        - [ ] Univariate Feature Selection
            - [ ] [F-test](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/feature-selection/model-agnostic/F%20Test.ipynb)
            - [ ] [Mutual Information](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/feature-selection/model-agnostic/Mutual%20Information.ipynb)
    - [ ] Model Based
        - [ ] _Lasso-based Selection (Coming soon)_
        - [ ] Feature Importance
            - [ ] [Scikit-learn Tree-based](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/feature-selection/model-based/sklearn-feature-selection-gini.ipynb)
            - [ ] [Permutation Importance](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/feature-selection/model-based/sklearn-feature-selection-permutation.ipynb)
            - [ ] [SHAP Values](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/feature-selection/model-based/sklearn-feature-selection-shap.ipynb)
        - [ ] Sequential Feature Selection
            - [ ] Forward Stepwise Selection (Coming soon)
            - [ ] Backwards Stepwise Selection
                - [ ] [Scikit-learn Tree-based](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/feature-selection/model-based/sequential-feature-selection/sklearn-backward-stepwise-selection.ipynb)
                - [ ] [catboost](https://github.com/rasgointelligence/feature-engineering-tutorials/blob/main/feature-selection/model-based/sequential-feature-selection/catboost-backward-stepwise-selection.ipynb)

(The best way to do the exercises is to fork the repository and run the notebooks in Google Colab; you can find my fork [here](https://annas-archive.org/md5/bf5170bc02f59d108cf7ad20f03d11bf).)
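
As a quick reminder of what the common tasks listed at the top look like in practice, here is a minimal pandas sketch. It is not taken from the article or the tutorials: the toy DataFrame, its column names (`age`, `income`, `city`, `signup`), the 5th/95th percentile clipping thresholds, and the age bin edges are all assumptions made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame(
    {
        "age": [22.0, 35.0, np.nan, 58.0, 41.0],
        "income": [28_000.0, 52_000.0, 47_000.0, 1_000_000.0, 61_000.0],
        "city": ["Oslo", "Bergen", "Oslo", None, "Bergen"],
        "signup": pd.to_datetime(
            ["2021-01-15", "2021-03-02", "2021-03-28", "2021-07-09", "2021-12-31"]
        ),
    }
)

# Handling missing values: median for the numeric column,
# an explicit "unknown" category for the categorical one.
df["age"] = df["age"].fillna(df["age"].median())
df["city"] = df["city"].fillna("unknown")

# Handling outliers: clip to the 5th-95th percentile range (assumed thresholds).
low, high = df["income"].quantile([0.05, 0.95])
df["income_clipped"] = df["income"].clip(lower=low, upper=high)

# Binning numerical values into labelled intervals (assumed bin edges).
df["age_group"] = pd.cut(
    df["age"], bins=[0, 30, 50, 120], labels=["young", "middle", "senior"]
)

# Numerical transformation: log(1 + x) to compress the long right tail.
df["income_log"] = np.log1p(df["income_clipped"])

# Scaling numerical features: min-max scaling to the [0, 1] range.
inc = df["income_log"]
df["income_scaled"] = (inc - inc.min()) / (inc.max() - inc.min())

# Extracting parts of the date.
df["signup_month"] = df["signup"].dt.month
df["signup_dayofweek"] = df["signup"].dt.dayofweek  # Monday = 0

# Encoding categorical features: one-hot encoding.
df = pd.get_dummies(df, columns=["city"], prefix="city")

print(df.head())
```

In a real project the imputation values, clipping bounds, and scaling parameters would normally be computed on the training split only and then applied to the test split, which is what the scaler and train-test split tutorials above are about.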