# **Data preprocessing**

## 1. Feature extraction  

> DictVectorizer  
> FeatureHasher  

## 2. Data cleaning  

Handling missing values (sklearn.impute)  

> SimpleImputer  
> KNNImputer  

Note: MissingIndicator provides indicators for missing values.  
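
A minimal sketch of the imputers and the indicator, assuming a small made-up array `X` with `np.nan` marking missing entries:

```python
import numpy as np
from sklearn.impute import KNNImputer, MissingIndicator, SimpleImputer

# Hypothetical data with one missing value in each column.
X = np.array([[1.0, 2.0], [np.nan, 4.0], [5.0, np.nan], [7.0, 8.0]])

# Replace each missing value with the mean of its column.
X_mean = SimpleImputer(strategy="mean").fit_transform(X)

# Replace each missing value using the nearest rows (by observed features).
X_knn = KNNImputer(n_neighbors=2).fit_transform(X)

# Boolean mask marking where values were missing (columns with missing data).
missing_mask = MissingIndicator().fit_transform(X)
```

The mask can be stacked next to the imputed features when the fact that a value was missing is itself informative.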


## Numeric transformers  

> Feature scaling  
> Polynomial transformation  
> Discretization  

### Feature scaling  

> StandardScaler  
> MinMaxScaler   
> MaxAbsScaler  
> FunctionTransformer (applies a custom function for scaling, e.g. log2)  
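
The four scalers above can be sketched side by side. The arrays `X` and `X_pos` are made-up illustrations; `np.log2` is used as the custom function because the notes mention it.

```python
import numpy as np
from sklearn.preprocessing import (
    FunctionTransformer,
    MaxAbsScaler,
    MinMaxScaler,
    StandardScaler,
)

X = np.array([[1.0, -2.0], [2.0, 0.0], [4.0, 8.0]])

# Zero mean and unit variance per column.
X_std = StandardScaler().fit_transform(X)

# Rescale each column to the range [0, 1].
X_minmax = MinMaxScaler().fit_transform(X)

# Divide each column by its maximum absolute value (range [-1, 1]).
X_maxabs = MaxAbsScaler().fit_transform(X)

# Custom scaling with log2; the input must be strictly positive.
X_pos = np.array([[1.0], [2.0], [8.0]])
X_log2 = FunctionTransformer(np.log2).fit_transform(X_pos)
```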

### Polynomial transformation  

Generates a new feature matrix consisting of all polynomial
combinations of the features with degree less than or equal
to the specified degree.  

> PolynomialFeatures(degree=n)  
>> where 'n' is the desired degree of polynomial  

### Discretization  

> KBinsDiscretizer(n_bins=n, strategy='uniform', encode='ordinal')  
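
A minimal sketch of uniform binning, assuming a single made-up feature spread over the range 0 to 10 and four bins:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

X = np.array([[0.0], [2.5], [5.0], [7.5], [10.0]])

# Four equal-width bins over [0, 10]; each value becomes its bin index.
disc = KBinsDiscretizer(n_bins=4, strategy="uniform", encode="ordinal")
X_binned = disc.fit_transform(X)
```

With `encode='ordinal'` the output is one integer-valued column per feature; the maximum value falls into the last bin.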


## Categorical transformers  
> Feature encoding  
> Label encoding  

Transformers  
> OneHotEncoder  
> LabelEncoder (can transform only one-dimensional data)  
> OrdinalEncoder (can operate on multidimensional data)  
> LabelBinarizer (can transform only one-dimensional data)  
> MultiLabelBinarizer  

> add_dummy_feature
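
The encoders above differ mainly in whether they expect a feature matrix or a one-dimensional label vector. The following sketch uses made-up colour, size, and label values to show each one:

```python
import numpy as np
from sklearn.preprocessing import (
    LabelBinarizer,
    LabelEncoder,
    MultiLabelBinarizer,
    OneHotEncoder,
    OrdinalEncoder,
    add_dummy_feature,
)

# Hypothetical feature matrix (2-D) and label vector (1-D).
X_cat = np.array([["red", "S"], ["green", "M"], ["red", "L"]])
y = ["cat", "dog", "cat"]

# Feature encoders work column-wise on 2-D data.
X_onehot = OneHotEncoder().fit_transform(X_cat).toarray()
X_ordinal = OrdinalEncoder().fit_transform(X_cat)

# Label encoders work on 1-D targets only.
y_encoded = LabelEncoder().fit_transform(y)
y_binarized = LabelBinarizer().fit_transform(["cat", "dog", "bird"])

# Each sample may carry several labels (or none).
y_multi = MultiLabelBinarizer().fit_transform([{"a", "b"}, {"b"}, set()])

# Prepend a constant column of ones (a bias feature).
X_with_bias = add_dummy_feature(np.array([[2.0], [3.0]]))
```

Categories are ordered alphabetically, so `green` maps to 0 and `red` to 1 in the ordinal encoding.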

# **Feature selection**  

`sklearn.feature_selection`

> Filter based  
> Wrapper based  

Note: Tree based and kernel based feature selection algorithms
will be covered in later weeks.  

### Filter based feature selection methods  

> VarianceThreshold  

> Univariate feature selection  

>> SelectKBest  
>> SelectPercentile  

>> SelectFpr (false positive rate)  
>> SelectFdr (false discovery rate)  
>> SelectFwe (family-wise error rate)  

>> GenericUnivariateSelect (can implement any of the above five methods by using the kwarg 'mode')  

Each of the above univariate feature selection methods uses a common univariate statistical test or scoring function, which scores each feature individually against the label.  
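
A minimal sketch of the filter methods, assuming the bundled iris dataset as a stand-in and `f_classif` as the scoring function; the appended constant column is only there to give `VarianceThreshold` something to remove.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import (
    GenericUnivariateSelect,
    SelectKBest,
    SelectPercentile,
    VarianceThreshold,
    f_classif,
)

X, y = load_iris(return_X_y=True)

# Append a constant column, then drop zero-variance features.
X_const = np.hstack([X, np.ones((X.shape[0], 1))])
X_var = VarianceThreshold(threshold=0.0).fit_transform(X_const)

# Keep the 2 features with the highest F-scores.
X_kbest = SelectKBest(score_func=f_classif, k=2).fit_transform(X, y)

# Keep the top 50% of features by F-score.
X_pct = SelectPercentile(score_func=f_classif, percentile=50).fit_transform(X, y)

# Same selection as SelectKBest, expressed through the 'mode' argument.
generic = GenericUnivariateSelect(score_func=f_classif, mode="k_best", param=2)
X_generic = generic.fit_transform(X, y)
```

`VarianceThreshold` never looks at the label, while the univariate selectors need `y` to score each feature.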

Univariate scoring functions fall into three classes:  

> Mutual information (MI)  
>> mutual_info_regression  
>> mutual_info_classif  

> Chi-square  
>> chi2  

> F-statistics  
>> f_regression  
>> f_classif  

**Note**: MI and F-statistics can be used in both classification and regression problems whereas chi-square can be used only in classification problems.  

**IMPORTANT**: Do not use a regression feature scoring
function with a classification problem. It will
lead to useless results.
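
The scoring functions can also be called directly, which makes the pairing of function and problem type explicit. This sketch assumes the iris dataset for classification and a synthetic `make_regression` dataset for regression; both are stand-ins.

```python
from sklearn.datasets import load_iris, make_regression
from sklearn.feature_selection import (
    chi2,
    f_classif,
    f_regression,
    mutual_info_classif,
    mutual_info_regression,
)

# Classification: use the *_classif functions (and chi2).
X_clf, y_clf = load_iris(return_X_y=True)
f_scores, f_pvalues = f_classif(X_clf, y_clf)
chi2_scores, chi2_pvalues = chi2(X_clf, y_clf)  # needs non-negative features
mi_scores = mutual_info_classif(X_clf, y_clf, random_state=0)

# Regression: use the *_regression functions.
X_reg, y_reg = make_regression(
    n_samples=100, n_features=5, n_informative=2, random_state=0
)
f_reg_scores, f_reg_pvalues = f_regression(X_reg, y_reg)
mi_reg_scores = mutual_info_regression(X_reg, y_reg, random_state=0)
```

Each call returns one score per feature; the F-test and chi-square functions also return p-values.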

### Wrapper based feature selection methods  

**Note**: Unlike filter based methods, wrapper based
methods use an estimator class rather than a
scoring function.  

> RFE (Recursive Feature Elimination) - Need to specify the number of features desired.  
>> RFECV (RFE with cross-validation) - Used when we do not want to specify the number of features desired.  

> `SelectFromModel`  

Note: Both RFE and SelectFromModel obtain feature importance from the trained estimator through its `coef_` or `feature_importances_` attribute, or through an `importance_getter` callable.  
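
A minimal sketch of the importance-based wrappers, assuming the iris dataset and a `LogisticRegression` estimator (which exposes `coef_`); both choices are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE, RFECV, SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
estimator = LogisticRegression(max_iter=1000)

# RFE: repeatedly drop the weakest feature until 2 remain.
rfe = RFE(estimator, n_features_to_select=2).fit(X, y)

# RFECV: let cross-validation choose how many features to keep.
rfecv = RFECV(estimator, cv=5).fit(X, y)

# SelectFromModel: keep features whose importance is at least the mean.
sfm = SelectFromModel(estimator, threshold="mean").fit(X, y)
X_sfm = sfm.transform(X)
```

After fitting, `rfe.support_` and `rfecv.support_` are boolean masks over the original features, and `rfecv.n_features_` reports the number chosen by cross-validation.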

> SFS (Sequential Feature Selection)  
>> Forward selection  
>> Backward selection  

Note: For efficiency, the choice between forward and backward selection depends on the original number of features and the desired number of features. The two directions may not yield equivalent results. Unlike RFE and SelectFromModel, SFS does not require the underlying model to expose a `coef_` or `feature_importances_` attribute.  
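
The two directions can be sketched with an estimator that has neither `coef_` nor `feature_importances_`. The iris dataset and the `KNeighborsClassifier` settings are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3)

# Forward: start from no features and add the best one at each step.
forward = SequentialFeatureSelector(
    knn, n_features_to_select=2, direction="forward", cv=5
).fit(X, y)

# Backward: start from all features and remove the worst one at each step.
backward = SequentialFeatureSelector(
    knn, n_features_to_select=2, direction="backward", cv=5
).fit(X, y)

forward_mask = forward.get_support()
backward_mask = backward.get_support()
```

Both selectors score candidate feature sets by cross-validation, so the two masks need not match.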

## Applying transformations to diverse features  




Composite Transformer  
`sklearn.compose` 

> ColumnTransformer  
> TransformedTargetRegressor  
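
A minimal sketch of both composite transformers. The mixed-type array `X`, the column indices, and the power-of-ten target are made-up illustrations; `sparse_threshold=0` is set only so that the output is always a dense array.

```python
import numpy as np
from sklearn.compose import ColumnTransformer, TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data: column 0 is numeric, column 1 is categorical.
X = np.array(
    [[25.0, "red"], [32.0, "green"], [47.0, "red"], [51.0, "blue"]],
    dtype=object,
)

# Apply a different transformer to each group of columns.
ct = ColumnTransformer(
    [("num", StandardScaler(), [0]), ("cat", OneHotEncoder(), [1])],
    sparse_threshold=0,
)
X_t = ct.fit_transform(X)  # 1 scaled column + 3 one-hot columns


def power10(z):
    """Inverse of np.log10."""
    return np.power(10.0, z)


# Fit the regressor on log10(y) and map predictions back automatically.
X_num = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 10.0, 100.0, 1000.0])
ttr = TransformedTargetRegressor(
    regressor=LinearRegression(), func=np.log10, inverse_func=power10
)
ttr.fit(X_num, y)
pred = ttr.predict(np.array([[4.0]]))
```

`ColumnTransformer` acts on the features, while `TransformedTargetRegressor` acts on the target.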

## **Dimensionality reduction**  


Another way to reduce the number of features
is through unsupervised dimensionality
reduction techniques.  

`sklearn.decomposition`  

### PCA (Principal component analysis)  

`sklearn.decomposition.PCA`  
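
A minimal sketch of PCA on synthetic data, assuming three strongly correlated features built from one latent variable plus a little noise (the data is made up for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))

# Three features that are all noisy multiples of the same latent variable.
X = np.hstack([latent, 2 * latent, -latent]) + 0.01 * rng.normal(size=(200, 3))

# Project onto the 2 directions of largest variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Fraction of the total variance captured by each component.
variance_ratio = pca.explained_variance_ratio_
```

Because the features are nearly collinear, the first component captures almost all of the variance.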



## **Chaining transformers**  


`sklearn.pipeline`  

> Pipeline  
>> Pipeline()  
>> make_pipeline  

> FeatureUnion  
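
The chaining utilities above can be sketched as follows, assuming the iris dataset and a `LogisticRegression` classifier as stand-ins; the step names passed to `Pipeline` are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Pipeline: steps run in sequence and each step is named explicitly.
pipe = Pipeline(
    [("scale", StandardScaler()), ("clf", LogisticRegression(max_iter=1000))]
)
pipe.fit(X, y)
train_accuracy = pipe.score(X, y)

# make_pipeline: same idea, with step names generated from the class names.
auto_pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
step_names = [name for name, _ in auto_pipe.steps]

# FeatureUnion: transformers run in parallel and their outputs are concatenated.
union = FeatureUnion(
    [("pca", PCA(n_components=2)), ("kbest", SelectKBest(f_classif, k=1))]
)
X_union = union.fit_transform(X, y)  # 2 PCA columns + 1 selected column
```

A `Pipeline` transforms data step by step, whereas a `FeatureUnion` applies every transformer to the same input and stacks the results column-wise.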