feature-selection

Introduction

In this project, we present a comparative analysis of fifteen feature selection methods, evaluated by the classification accuracy each method yields with five classification algorithms on ten publicly available datasets.

Feature Selection Methods Used:

  1. Pair-wise Correlation
  2. Regularized Self Representation
  3. Variance Threshold
  4. Logistic Regression-based Selection
  5. Random Forest (Gini importance)
  6. Boruta Algorithm
  7. LASSO Algorithm
  8. Extra Tree Classifier
  9. Mutual Information Classifier
  10. Chi-Square Test
  11. Recursive Feature Elimination with RF
  12. Correlation
  13. Cosine Similarity and Standard Deviation with Exponent
  14. Laplacian Score
  15. Iterative Laplacian Score
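
Several of these methods map onto standard scikit-learn selectors, while others (e.g. Regularized Self Representation, Laplacian Score) require custom implementations. The sketch below is illustrative only and assumes scikit-learn; the threshold, the k value, and the choice of estimators are placeholders, not the repository's exact configuration.

```python
# Minimal sketch (assumed scikit-learn API); the repository's own
# implementations of these methods may differ.
from sklearn.feature_selection import (
    VarianceThreshold, SelectKBest, chi2, mutual_info_classif, RFE, SelectFromModel
)
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Lasso, LogisticRegression

k = 2  # number of top features to keep

selectors = {
    "Variance Threshold": VarianceThreshold(threshold=0.1),
    # note: chi2 requires non-negative feature values
    "Chi-Square Test": SelectKBest(chi2, k=k),
    "Mutual Information": SelectKBest(mutual_info_classif, k=k),
    "Logistic Regression-based": SelectFromModel(
        LogisticRegression(max_iter=1000), max_features=k),
    "LASSO": SelectFromModel(Lasso(alpha=0.01), max_features=k),
    "RFE with RF": RFE(RandomForestClassifier(n_estimators=100),
                       n_features_to_select=k),
}
```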

Classification Algorithms Used:

  1. Decision Trees
  2. Logistic Regression
  3. Random Forest
  4. KNN
  5. Naive Bayes
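
With selectors and classifiers in hand, the benchmark reduces to a nested loop over feature selection methods and classification algorithms for each dataset. The sketch below assumes a scikit-learn workflow and reuses the `selectors` dictionary from the previous sketch; the split parameters and the `evaluate` helper are illustrative, not the repository's actual script.

```python
# Illustrative evaluation loop (assumed scikit-learn workflow); the
# repository's actual experiment code may be organized differently.
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

classifiers = {
    "Decision Tree": DecisionTreeClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
}

def evaluate(X, y, selectors, classifiers, k=2):
    """Return {(selector, classifier): accuracy} for one dataset."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y)
    results = {}
    for s_name, selector in selectors.items():
        X_tr_k = selector.fit_transform(X_tr, y_tr)  # reduce to selected features
        X_te_k = selector.transform(X_te)
        for c_name, clf in classifiers.items():
            clf.fit(X_tr_k, y_tr)
            acc = accuracy_score(y_te, clf.predict(X_te_k))
            results[(s_name, c_name)] = acc
    return results
```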

Datasets Used:

  1. Iris
  2. Breast Cancer
  3. Pima Indians Diabetes
  4. Cirrhosis Prediction
  5. Parkinson's Disease
  6. Heart Disease
  7. Sonar
  8. Stroke Prediction
  9. Wine Quality
  10. Abalone
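
Iris and Breast Cancer ship with scikit-learn; the remaining datasets are typically obtained as CSV files (e.g. from the UCI repository or Kaggle). The loading sketch below is illustrative only; the file name and column name are placeholders, not the repository's actual paths.

```python
# Dataset loading sketch; file and column names are placeholders.
import pandas as pd
from sklearn.datasets import load_iris, load_breast_cancer

iris_X, iris_y = load_iris(return_X_y=True)
bc_X, bc_y = load_breast_cancer(return_X_y=True)

pima = pd.read_csv("diabetes.csv")  # Pima Indians Diabetes (CSV download)
pima_X, pima_y = pima.drop(columns="Outcome"), pima["Outcome"]
```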

Results:

Two screenshots of the obtained results are shown below.

(Result screenshots)

Here k is the number of top-ranked features retained. For example, k = 2 means that the 2 best features selected by each feature selection method were used to train the classifiers, and accuracy was computed on that reduced feature set.

Accuracy = (TP + TN) / (TP + TN + FP + FN), where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
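
For a binary problem, the same accuracy can be recovered from the confusion matrix. The snippet below is a small self-contained illustration using scikit-learn's confusion_matrix; the label arrays are placeholders, not results from this project.

```python
from sklearn.metrics import confusion_matrix

# Placeholder labels and predictions for illustration only.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)  # = 6/8 = 0.75
```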
