midterm_2.md

Problem

  • Choose any dataset for multiclass classification on Kaggle (go to the "Datasets" section, choose "Filter", and enter "multiclass classification" in the "Tags" field).
  • Perform classification with a few methods. I expect you to use at least SVM (linear and RBF kernels) and random forest.
  • Try to get the best result with each of the methods. I expect you to use at least grid search (GridSearchCV in sklearn) for hyperparameter tuning.
  • Try feature engineering. I expect you to use at least PCA for dimensionality reduction.
  • Calculate the accuracy and the confusion matrix for each of the methods.
  • Draw conclusions. Which method is the best? Why? If the dataset has any articles linked, compare your results with the state of the art.
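
The steps above can be sketched as a single sklearn workflow. This is a minimal illustration under stated assumptions, not a reference solution: the built-in `load_wine` dataset stands in for the Kaggle dataset the assignment asks for, and the grid values, the 70/30 split, and the names `candidates` and `results` are illustrative choices only.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (assumption): replace with the Kaggle dataset you chose.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# One entry per method: (estimator, hyperparameter grid).
# The grid values are illustrative, not tuned recommendations.
candidates = {
    "svm_linear": (SVC(kernel="linear"), {"clf__C": [0.1, 1, 10]}),
    "svm_rbf": (
        SVC(kernel="rbf"),
        {"clf__C": [0.1, 1, 10], "clf__gamma": ["scale", 0.01, 0.1]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"clf__n_estimators": [50, 100], "clf__max_depth": [None, 5]},
    ),
}

results = {}
for name, (clf, grid) in candidates.items():
    # Scaling and PCA sit inside the pipeline so that each cross-validation
    # fold fits them on its own training part only.
    pipe = Pipeline([("scale", StandardScaler()), ("pca", PCA()), ("clf", clf)])
    param_grid = {"pca__n_components": [2, 5, 8], **grid}
    search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)

    # Evaluate the refit best model on the held-out test set.
    y_pred = search.predict(X_test)
    results[name] = {
        "best_params": search.best_params_,
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }

for name, r in results.items():
    print(name, round(r["accuracy"], 3), r["best_params"])
    print(r["confusion_matrix"])
```

Treating the number of PCA components as one more grid-searched hyperparameter is a design choice here; fixing it from the explained variance ratio would be an equally reasonable alternative.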

Grading criteria:

  • I expect confident use of sklearn methods.
  • I expect an understanding of the basics of model assessment.
  • I expect you to be able to learn the PCA method on your own.
  • I expect you to express your thoughts succinctly, cohesively, and coherently, i.e. to state clearly (in a few sentences) what problem you are solving, what approaches you propose, and what conclusions can be drawn about these approaches in the context of the problem.