GitHub - jsonbao/adults-dataset-machinelearning: Scikit-learn models, parameter gridsearch, classification with feature selection, clustering with decomposition

## Abstract

In this experiment, we use the adults dataset to predict marital status. We first clean and preprocess the data so that it works with Scikit-Learn models. We then compare the performance of different models using validation methods, with grid search to find optimal hyper-parameters. We begin with classification models: KNN, Decision Trees, and Random Forest. We then explore clustering models: Mini-Batch K-Means, Mean-Shift, and DBSCAN. Several features in the dataset appear to be highly correlated, so we can improve performance by using feature selection and dimensionality reduction. Lastly, because clusters are difficult to match one-to-one to the target labels, we compare the distribution of the cluster centroid statistics with the statistics of the true labels.
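The model comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code in `run_classification.py` or `scikit_model_optimization.py`: the function name `compare_classifiers`, the parameter grids, and the use of `SelectKBest` for feature selection are all assumptions, and synthetic data from `make_classification` stands in for the preprocessed adults dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, cv=3, random_state=0):
    """Grid-search each candidate and return {name: (best_params, test_accuracy)}.

    Hypothetical helper: the candidate models match the abstract, but the
    grids below are illustrative values, not the ones used in the repository.
    """
    # Hold out a stratified test split so the final score is not tuned on.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )
    candidates = {
        "knn": (KNeighborsClassifier(), {"model__n_neighbors": [3, 7]}),
        "tree": (
            DecisionTreeClassifier(random_state=random_state),
            {"model__max_depth": [3, 6]},
        ),
        "forest": (
            RandomForestClassifier(n_estimators=25, random_state=random_state),
            {"model__max_depth": [3, 6]},
        ),
    }
    results = {}
    for name, (model, grid) in candidates.items():
        # Scaling and feature selection live inside the pipeline so they are
        # refit on each cross-validation fold (no leakage from held-out folds).
        pipe = Pipeline(
            [
                ("scale", StandardScaler()),
                ("select", SelectKBest(f_classif)),
                ("model", model),
            ]
        )
        param_grid = {"select__k": [5, 10], **grid}
        search = GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy")
        search.fit(X_train, y_train)
        results[name] = (search.best_params_, search.score(X_test, y_test))
    return results


if __name__ == "__main__":
    # Synthetic multi-class data with redundant (correlated) features,
    # used only as a stand-in for the real preprocessed dataset.
    X, y = make_classification(
        n_samples=600,
        n_features=12,
        n_informative=5,
        n_redundant=4,
        n_classes=3,
        random_state=0,
    )
    for name, (params, acc) in compare_classifiers(X, y).items():
        print(name, params, round(acc, 3))
```

Placing the scaler and selector inside the `Pipeline` means the grid search tunes the number of selected features together with each model's hyper-parameters. With the real data, `X` and `y` would be the encoded feature matrix and the marital status labels.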

## Folders and files

- Figures
- data
- lib
- predictions
- README.md
- plotting.py
- project_yong.pdf
- run_classification.py
- run_clustering.py
- scikit_model_optimization.py
