- Choose any dataset for multiclass classification on Kaggle (go to the "Datasets" section, choose "Filter", and enter "multiclass classification" in the "Tags" field).
- Perform classification with a few methods. I expect you to use at least SVM (linear and RBF kernels) and random forest.
- Try to get the best result with each of the methods. I expect you to use at least grid search for hyperparameter tuning.
- Try feature engineering. I expect you to use at least PCA for dimensionality reduction.
- Calculate the accuracy and the confusion matrix for each of the methods.
- Draw conclusions. Which method is the best? Why? If any articles are linked to the dataset, compare your results with the state of the art.
- I expect confident use of sklearn methods.
- I expect an understanding of the basics of model assessment.
- I expect you to be able to learn the PCA method on your own.
- I expect you to express your thoughts succinctly, cohesively, and coherently, i.e. to state clearly (in a few sentences) what problem you are solving, what approaches you propose, and what conclusions can be drawn about these approaches in the context of the problem.
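
The workflow described in the list above could be sketched roughly as follows. This is only a minimal illustration, not a reference solution: it assumes scikit-learn is installed, uses the built-in `load_wine` dataset as a stand-in for whichever Kaggle dataset you pick, and the helper name `build_pipeline`, the train/test split, and all hyperparameter grids are arbitrary choices made for the example.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (assumption): replace with the Kaggle dataset you chose.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def build_pipeline(model):
    """Hypothetical helper: scale features, reduce them with PCA, then classify."""
    return Pipeline([
        ("scale", StandardScaler()),
        ("pca", PCA()),
        ("clf", model),
    ])


# Illustrative grids only; sensible ranges depend on the dataset.
candidates = {
    "svm_linear": (
        SVC(kernel="linear"),
        {"pca__n_components": [2, 5, 10], "clf__C": [0.1, 1, 10]},
    ),
    "svm_rbf": (
        SVC(kernel="rbf"),
        {
            "pca__n_components": [2, 5, 10],
            "clf__C": [0.1, 1, 10],
            "clf__gamma": ["scale", 0.01, 0.1],
        },
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {
            "pca__n_components": [5, 10],
            "clf__n_estimators": [50, 100],
            "clf__max_depth": [None, 5],
        },
    ),
}

results = {}
for name, (model, grid) in candidates.items():
    # 5-fold cross-validated grid search on the training split only.
    search = GridSearchCV(build_pipeline(model), grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)

    # Evaluate the refitted best estimator on the held-out test split.
    y_pred = search.predict(X_test)
    results[name] = {
        "best_params": search.best_params_,
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }

for name, r in results.items():
    print(name, r["best_params"])
    print("test accuracy:", round(r["accuracy"], 3))
    print(r["confusion_matrix"])
```

Putting the scaler and PCA inside the `Pipeline` means they are refitted on each cross-validation fold, so the number of components can be tuned together with the classifier without leaking information from the validation folds.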