Predict a person's income using scikit-learn (sklearn) and object-oriented programming in Python
Assumptions:
- Training dataset: Adult Data Set, available from UCI: https://archive.ics.uci.edu/ml/datasets/Adult
- Testing dataset: adult.test.txt, also available from UCI
- Binary classification for outcome: the person either makes <=50K or >50K annually
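The assumptions above can be sketched in a few lines of pandas. This is only an illustration, not the code in this repo: the column names follow the UCI attribute documentation, the four rows are made-up examples in the adult.data layout, and the helper name load_adult and the column income_gt_50k are hypothetical. It is written for Python 3 with a recent pandas, whereas the project itself targets Python 2.

```python
import io

import pandas as pd

# Column names as listed in the UCI attribute documentation for the Adult Data Set.
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
]

# Made-up rows in the adult.data layout (illustration only, not real records).
# The third row has a missing workclass ("?"); two labels carry the trailing
# period that appears in the test file.
SAMPLE = """\
39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K
52, Self-emp-inc, 209642, Masters, 14, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 45, United-States, >50K
28, ?, 338409, HS-grad, 9, Never-married, Sales, Own-child, Black, Female, 0, 0, 30, United-States, <=50K.
45, Private, 160187, Bachelors, 13, Married-civ-spouse, Prof-specialty, Wife, White, Female, 0, 0, 40, United-States, >50K.
"""


def load_adult(source):
    """Read Adult-style rows, drop incomplete ones, and add a 0/1 target."""
    frame = pd.read_csv(
        source,
        header=None,
        names=COLUMNS,
        skipinitialspace=True,  # fields are separated by ", "
        na_values="?",          # the dataset marks missing values with "?"
    )
    frame = frame.dropna().reset_index(drop=True)
    # Normalise labels such as ">50K." to ">50K" before encoding.
    labels = frame["income"].str.rstrip(".")
    frame["income_gt_50k"] = (labels == ">50K").astype(int)
    return frame


clean = load_adult(io.StringIO(SAMPLE))
print(clean[["age", "workclass", "income", "income_gt_50k"]])
```

Dropping incomplete rows is just one possible cleaning choice; filling in the missing values would be another.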
The project is completed in 4 steps:
- Data Preparation: Download, extract, clean and store the data. Result: adult.csv, cleanData.csv and cleanData.sqlite
- Data Exploration: Visualize the distribution of each variable and its relationship with the income outcome. Result: all the PNG images
- Data Modeling: Create a predictive model using logistic regression and improve on it with a random forest classifier
- Prediction: Transform the test data and feed it into the predictive model
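The modeling and prediction steps above could look roughly like the sketch below. It is not the implementation in this repo: it assumes Python 3 and a scikit-learn version that provides ColumnTransformer, uses only four of the Adult columns, and trains on synthetic toy data from a hypothetical make_toy_frame helper so that it runs without downloading anything. Wrapping the preprocessing and the classifier in one Pipeline means the test data is transformed exactly the same way as the training data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC = ["age", "hours-per-week"]
CATEGORICAL = ["education", "sex"]


def make_toy_frame(n_rows, seed):
    """Build synthetic stand-in data (hypothetical helper, not real Adult rows)."""
    rng = np.random.RandomState(seed)
    frame = pd.DataFrame({
        "age": rng.randint(18, 70, size=n_rows),
        "hours-per-week": rng.randint(10, 60, size=n_rows),
        "education": rng.choice(["HS-grad", "Bachelors", "Masters"], size=n_rows),
        "sex": rng.choice(["Male", "Female"], size=n_rows),
    })
    # Arbitrary rule plus noise, only so that the label depends on the features.
    score = (
        0.04 * frame["age"]
        + 0.05 * frame["hours-per-week"]
        + (frame["education"] != "HS-grad") * 1.0
        + rng.normal(0, 1, n_rows)
    )
    labels = (score > score.median()).astype(int)  # 1 stands for ">50K"
    return frame, labels


def build_model(classifier):
    """Chain preprocessing and a classifier into one fit/predict object."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    return Pipeline([("preprocess", preprocess), ("classify", classifier)])


X_train, y_train = make_toy_frame(300, seed=0)
X_test, y_test = make_toy_frame(100, seed=1)

models = {
    "logistic_regression": build_model(LogisticRegression(max_iter=1000)),
    "random_forest": build_model(
        RandomForestClassifier(n_estimators=100, random_state=0)
    ),
}

predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # predict() runs the fitted preprocessing on the test rows first.
    predictions[name] = model.predict(X_test)
    print(name, "accuracy:", round(model.score(X_test, y_test), 3))
```

On toy data the two accuracies say nothing about the real dataset; the point is only the shape of the workflow.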
Running this program is as simple as 1, 2, 3:
- Pull/download this repo
- Make sure you have Python 2 and the NumPy, pandas and sklearn modules installed
- Run the Main.py script (browse to the project directory and run the command python Main.py)