## Abstract
In this experiment, we use the Adult dataset to predict marital-status. We first clean and preprocess the data so that it works with Scikit-Learn models. We then compare the performance of different models using validation methods, with grid search to find the optimal hyper-parameters. We begin with classification models (KNN, Decision Tree, and Random Forest), then explore clustering models (Mini-Batch K-Means, Mean-Shift, and DBSCAN). Several of the dataset's features appear to be highly correlated, so we use feature selection and dimensionality reduction to try to improve performance. Lastly, since clusters are difficult to match one-to-one with the target labels, we compare the distribution of the cluster centroid statistics with the statistics of the true labels.
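
As a rough illustration of the model-comparison step described above, the following minimal sketch runs a cross-validated grid search over the three classifiers. It is not the experiment's actual code: the data is a synthetic stand-in generated with `make_classification` (the real features and the marital-status label are not used here), and the parameter grids, the scaling step, and the names `candidates` and `results` are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic 3-class data standing in for the preprocessed
# features and the marital-status label.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Assumption: small illustrative grids, not the grids used in the experiment.
candidates = {
    "knn": (KNeighborsClassifier(), {"model__n_neighbors": [3, 7, 15]}),
    "tree": (DecisionTreeClassifier(random_state=0), {"model__max_depth": [3, 6, None]}),
    "forest": (RandomForestClassifier(random_state=0), {"model__n_estimators": [50, 100]}),
}

results = {}
for name, (model, grid) in candidates.items():
    # Scale inside the pipeline so each CV fold is scaled on its own training split.
    pipe = Pipeline([("scale", StandardScaler()), ("model", model)])
    search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "cv_accuracy": search.best_score_,
        "test_accuracy": search.score(X_test, y_test),
    }

for name, res in results.items():
    print(name, res)
```

Keeping the scaler inside the `Pipeline` means the grid search never sees statistics from the held-out fold, which matters most for distance-based models such as KNN.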