Skip to content

An application of a genetic algorithm for the determination of the best features for data classification. The dataset provided consists of 5-second audio frames of the steps of two people.

Notifications You must be signed in to change notification settings

matflow/GA-feature-selection

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 

Repository files navigation

Genetic Algorithm -feature-selection

An application of a genetic algorithm for the determination of the best features for data classification, using Python's scikit-learn API and Pandas. The dataset provided consists of 5-second audio frames of the steps of two people. The original dataset has elements with 34 features of audio belonging to two classes, but it can be used with any other dataset provided it's sorted like "dataset_steps.csv" (elements in the rows and features in the columns).

The genetic algorithm for feature selection consists of:

  • Cromosome objects that contain the dataset with a unique number of the original features, which have:
    • A fitness method which calcuates fitness using the accuracy of the Random Forest Classifier algorithm.
    • A breeding method which reproduces two cromosomes using a random crossover point with a mutation rate.
  • A function for the parent selection criteria, which uses Roulette Wheel Selection to chose two cromosomes to generate a child cromosome. (used to create 90% of the next generation)
  • Elitism to generate 10% of the next generation.

The accuracy with the original dataset using all 34 features was of 86.77%. The accuracy of the best individual after running the genetic algorithm for 100 generations was 91.73% with 15 features (for the run in the jupyter notebook). However, the algorithm might have to be run multiple times for better results in case a fast convergence occurs.

About

An application of a genetic algorithm for the determination of the best features for data classification. The dataset provided consists of 5-second audio frames of the steps of two people.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published