In this project, we aim to apply different machine learning models onto two different datasets for binary classification. However, our scripts can be applied to different datasets.
- Loads datasets
- Cleans data
- Splits datasets into training and test datasets
- Additionnaly to cleaned data, outputs datasets on which we operated either tsne or PCA
We implement different models using Sklearn. The training is based on a GridSearch model that helps find the optimum parameters for a maximum accuracy across multiple validation sets using cross-validation. It contains the following models:
- SVM
- KNN
- Log regression
- Stochastic gradient descent
- Neural networks
- Random forests
- Decision Trees
All these models are trained using the gridsearch method.
In the program main.py
, we output the best accuracies reached by each model (fitted using gridsearch) and the corresponding
optimum parameters.
- Imane Momayiz
- Guillaume Sallé
- Romain Namyst
- Thomas Kronland
- Karel Kedemos