Genetic Algorithm -feature-selection

An application of a genetic algorithm for the determination of the best features for data classification, using Python's scikit-learn API and Pandas. The dataset provided consists of 5-second audio frames of the steps of two people. The original dataset has elements with 34 features of audio belonging to two classes, but it can be used with any other dataset provided it's sorted like "dataset_steps.csv" (elements in the rows and features in the columns).

The genetic algorithm for feature selection consists of:

Cromosome objects that contain the dataset with a unique number of the original features, which have:
- A fitness method which calcuates fitness using the accuracy of the Random Forest Classifier algorithm.
- A breeding method which reproduces two cromosomes using a random crossover point with a mutation rate.
A function for the parent selection criteria, which uses Roulette Wheel Selection to chose two cromosomes to generate a child cromosome. (used to create 90% of the next generation)
Elitism to generate 10% of the next generation.

The accuracy with the original dataset using all 34 features was of 86.77%. The accuracy of the best individual after running the genetic algorithm for 100 generations was 91.73% with 15 features (for the run in the jupyter notebook). However, the algorithm might have to be run multiple times for better results in case a fast convergence occurs.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
README.md		README.md
dataset_steps.csv		dataset_steps.csv
feature-select.ipynb		feature-select.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Genetic Algorithm -feature-selection

About

Releases

Packages

Languages

matflow/GA-feature-selection

Folders and files

Latest commit

History

Repository files navigation

Genetic Algorithm -feature-selection

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages