General

In this work, we address the task of predicting whether an individual earns more than $50K a year, which is a binary classification problem. This work is part of an in-class Kaggle competition for the Knowledge Discovery and Data Mining course offered at NUS, Singapore, in Semester 2 of AY 2019-2020.

Our final binary F1-score is 86.946% on the private leaderboard, and we ranked 10th out of 55.
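
For readers unfamiliar with the metric, here is a minimal sketch of how a binary F1-score can be computed. It assumes the leaderboard uses the standard F1 of the positive class (label 1), which is not confirmed here. The function name `binary_f1` and the toy labels are illustrative and are not taken from the repository or the competition data.

```python
def binary_f1(y_true, y_pred, positive=1):
    """Return the F1-score of the positive class for two equal-length label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only (assumed values).
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
score = binary_f1(y_true, y_pred)
```

Because F1 ignores true negatives, it is less flattering than plain accuracy when most samples belong to the negative class.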

Set up

To clone this GitHub repository, use the following command:

$ git clone https://github.com/hanaecarrie/CS5228_kaggle_income50K_classification.git

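Make sure you have Python 3.6 or above installed, as well as the following packages. Note that csv, os, pickle, subprocess, collections, math, time, glob and re ship with the Python standard library and need no separate installation, and that sklearn is installed under the name scikit-learn: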

numpy
csv
os
pandas
matplotlib
seaborn
sklearn
pickle
scipy
lightgbm
catboost
xgboost
IPython
subprocess
keras
collections
math
time
glob
re
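
One possible way to check that the third-party packages listed above are available before running the notebooks is sketched below. It uses only the standard library. The helper name `missing_packages` and the `THIRD_PARTY` list are illustrative and are not part of the repository.

```python
import importlib.util

# Third-party import names from the list above; the remaining entries
# (csv, os, pickle, ...) are part of the Python standard library.
THIRD_PARTY = [
    "numpy", "pandas", "matplotlib", "seaborn", "sklearn", "scipy",
    "lightgbm", "catboost", "xgboost", "IPython", "keras",
]


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(THIRD_PARTY)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

The check only looks up whether each module can be located. It does not import the modules or verify their versions.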

Data description, exploration and visualisation

The Kaggle dataset consists of separate training and test sets of 24,421 records each. The training set suffers from class imbalance: 75.15% of the samples belong to the negative class (≤ $50K, label=0) and the remaining 24.85% to the positive class (> $50K, label=1). The dataset has 13 attributes, described below:

Here are some plots from the report to summarise the dataset features and look at their relationships.
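
The class imbalance noted above can be quantified with a few lines of Python. The sketch below computes the share of each class and, as one common way to compensate, per-class weights of the form n_samples / (n_classes * class_count), the same heuristic scikit-learn uses for `class_weight="balanced"`. Whether the models in this repository use such weights is not stated here. The function names and the toy labels are illustrative.

```python
from collections import Counter


def class_distribution(labels):
    """Map each class label to its share of the samples."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    total = len(labels)
    return {label: total / (len(counts) * count) for label, count in counts.items()}


# Toy labels with a 3:1 imbalance, for illustration only (not the competition data).
labels = [0, 0, 0, 1]
shares = class_distribution(labels)
weights = balanced_class_weights(labels)
```

With these weights the minority class gets a proportionally larger weight, so that both classes contribute equally in total to a weighted loss.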

Results and report

The full report is available in Report.pdf, at the root of the repository.

The description of the different preprocessed data:
