R package for Variable Selection, Curve Fitting, Variable Conversion, Normalisation and Accuracy Measures
This will include the following:
- Information Value
- Gini Index
- Gini Impurity
- Entropy Gain
- Misclassification Error
- Variable Ranking Methods - voting / scoring / weighted scoring / weighted voting
- Generic Scoring Function (for Regression and Classification)
- Variance Inflation Factor
- Other Variable Impacts for regression
- Template for Curve Fitting for Continuous and Categorical Variables
- Curve Comparison Methods
- Curve Identification
- Curve Tuning
- Curve to Normal Conversion
- Non-Curve / Random / Many Matching Curves Decision Criterion
- Goodness of Fit Tests: a) Kolmogorov-Smirnov Test b) Cramér-von Mises Test c) Anderson-Darling Test d) Shapiro-Wilk Test e) Chi-Squared Test f) Akaike Information Criterion (AIC) g) Hosmer-Lemeshow Test
- Continuous to Categorical: a) Range Binning b) WOE Criterion Binning c) Dependent Binning
- Categorical to Continuous: a) One-Hot Encoding with and without reference b) Label Encoding c) Weightage Encoding d) Boosted Encoding (based on the CatBoost methodology by Yandex)
- Unit Mean
- Unit SD
- Unit Mean And SD
- Min-Max
- Box-Cox
- Log
- Exponential
- Mean Difference
- Median Difference
- Mean Difference with SD
- Median Difference with SD
**Will also try to include a predict function for applying variable conversion and normalisation to raw data.
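
As a rough illustration of the fit-then-predict idea in the note above, here is a minimal sketch for Min-Max normalisation. The names `fit_minmax` and the `minmax_scaler` class are hypothetical and are not taken from this package; the only assumption about R itself is the standard S3 `predict` generic. Scaling parameters are learned once from training data and then reused on raw data.

```r
# Hypothetical helper (not this package's API): learn Min-Max
# scaling parameters from a training vector.
fit_minmax <- function(x) {
  rng <- range(x, na.rm = TRUE)
  structure(list(min = rng[1], max = rng[2]), class = "minmax_scaler")
}

# S3 predict method: apply the stored parameters to new (raw) data.
predict.minmax_scaler <- function(object, newdata, ...) {
  span <- object$max - object$min
  # A constant training column has no spread; map everything to 0.
  if (span == 0) return(rep(0, length(newdata)))
  (newdata - object$min) / span
}

train  <- c(10, 20, 30, 40, 50)
scaler <- fit_minmax(train)

# Raw data is scaled with the training min and max, not its own.
scaled_new <- predict(scaler, c(10, 25, 50, 60))
```

Raw values outside the training range fall outside [0, 1] under this sketch; whether to clip them is a design choice left open here.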
- RMSE
- MAE
- MAPE
- R-squared
- AIC
- BIC
- AUC
- Kendall's Tau
- Gini Index
- Weights
- Extension to caret's confusionMatrix
**Will also try to include methods for finding the best and/or biased limit for the probability cut-off in classification problems.
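
One possible way to search for a cut-off is sketched below. It scans a grid of thresholds and keeps the one that maximises Youden's J (sensitivity + specificity - 1). This criterion is an assumption chosen for illustration and is not necessarily the method this package will use; `best_cutoff` and its arguments are hypothetical names.

```r
# Hypothetical helper (not this package's API): choose the probability
# cut-off that maximises Youden's J = sensitivity + specificity - 1.
# `actual` is a 0/1 vector, `prob` the predicted probability of class 1.
best_cutoff <- function(actual, prob,
                        cutoffs = seq(0.05, 0.95, by = 0.05)) {
  j <- vapply(cutoffs, function(ct) {
    pred <- as.integer(prob >= ct)
    sens <- sum(pred == 1 & actual == 1) / sum(actual == 1)
    spec <- sum(pred == 0 & actual == 0) / sum(actual == 0)
    sens + spec - 1
  }, numeric(1))
  best <- which.max(j)  # on ties, the lowest cut-off is kept
  list(cutoff = cutoffs[best], youden_j = j[best])
}

actual <- c(0, 0, 0, 0, 1, 1, 1, 1)
prob   <- c(0.12, 0.22, 0.33, 0.62, 0.58, 0.71, 0.83, 0.91)
res    <- best_cutoff(actual, prob)
```

A "biased" limit could be sketched the same way by weighting sensitivity and specificity unequally inside the scoring function, for example when false negatives cost more than false positives.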