Scikit-learn models, parameter gridsearch, classification with feature selection, clustering with decomposition

jsonbao/adults-dataset-machinelearning

 
 

## Abstract

In this experiment, we use the adults dataset to predict marital status. We first clean and preprocess the data so that it works with Scikit-Learn models. We then compare the performance of different models using validation methods, with grid search to find the best hyper-parameters. We start with classification models such as KNN, Decision Trees and Random Forest, then explore clustering models such as Mini-Batch K-Means, Mean-Shift and DBSCAN. Several of the dataset's features seem to be highly correlated, so we apply feature selection and dimensionality reduction to try to improve performance. Lastly, since clusters are difficult to match one-to-one to the target labels, we compare the distribution of the cluster centroid statistics with the statistics of the true labels.
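The workflow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: it uses a synthetic dataset from `make_classification` as a stand-in for the preprocessed adults data, and the choice of KNN, `SelectKBest`, PCA, the parameter grid values and the number of clusters are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the preprocessed adults data (assumption: the real
# features would already be cleaned and numerically encoded at this point).
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=5, n_redundant=4,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Classification: scale, keep the k best features, then fit KNN.
clf_pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("knn", KNeighborsClassifier()),
])

# Grid search with 5-fold cross-validation over illustrative values.
param_grid = {"select__k": [4, 8, 12], "knn__n_neighbors": [3, 7, 15]}
search = GridSearchCV(clf_pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)
test_accuracy = search.score(X_test, y_test)
print("best params:", search.best_params_)
print("held-out accuracy:", round(test_accuracy, 3))

# Clustering: reduce dimensionality with PCA, then run mini-batch K-means.
X_scaled = StandardScaler().fit_transform(X)
X_reduced = PCA(n_components=3, random_state=0).fit_transform(X_scaled)
kmeans = MiniBatchKMeans(n_clusters=3, n_init=3, random_state=0)
cluster_ids = kmeans.fit_predict(X_reduced)

# Cluster ids are arbitrary, so instead of matching them one-to-one to labels,
# compare the cluster centroids with the per-label means in the same space.
label_means = np.vstack([X_reduced[y == c].mean(axis=0) for c in np.unique(y)])
print("cluster centroids:\n", np.round(kmeans.cluster_centers_, 2))
print("true label means:\n", np.round(label_means, 2))
```

Putting the feature selector inside the `Pipeline` means it is refit on each cross-validation fold, so the selection step does not see the validation data during the grid search.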


Languages

  • Python 100.0%