Perform Big Data Analytics on the UCI Census Income Dataset for Income Prediction.
Steps to run the code:
- For data analysis, we use the DataAnalysis.R file, which takes census.csv (48842 rows) as input.
- For the big data perspective, we replicated census.csv to twice its size (i.e., 97684 rows) and named the result census2.csv. The Decisiontreebgdata.R file takes census2.csv as input. It also uses Mydata.csv, which is the processed and cleaned form of census2.csv.
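
The replication step above could be reproduced roughly as follows. This is a minimal sketch, not the project's actual code: it assumes census2.csv was built by stacking the rows of census.csv on themselves, and that census.csv has a header row. The helper name `replicate_rows` is hypothetical.

```r
# Sketch only: the original replication code is not shown, so stacking the
# rows on themselves is an assumption about how census2.csv was built.

# Hypothetical helper: repeat the whole row sequence of `df` `times` times.
replicate_rows <- function(df, times = 2) {
  idx <- rep(seq_len(nrow(df)), times = times)
  out <- df[idx, , drop = FALSE]
  rownames(out) <- NULL  # reset the row names produced by repeated indexing
  out
}

# Small in-memory check of the helper on made-up rows.
toy <- data.frame(age = c(39, 50, 38), income = c("<=50K", "<=50K", ">50K"))
toy2 <- replicate_rows(toy)

# Apply to the real file only if it is present in the working directory,
# and never overwrite an existing census2.csv.
# Assumes census.csv has a header row; set header = FALSE if it does not.
if (file.exists("census.csv") && !file.exists("census2.csv")) {
  census <- read.csv("census.csv", header = TRUE, stringsAsFactors = FALSE)
  census2 <- replicate_rows(census)
  write.csv(census2, "census2.csv", row.names = FALSE)
}
```

With 48842 input rows, doubling gives 2 × 48842 = 97684 rows, which matches the size stated for census2.csv. Duplicating rows increases data volume but adds no new information, so it mainly serves to test how the scripts behave on a larger input.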
References:
1. https://mathematicaforprediction.wordpress.com/2014/03/30/classification-and-association-rules-for-census-income-data/
2. https://www.knowbigdata.com/blog/predicting-income-level-analytics-casestudy-r
3. https://archive.ics.uci.edu/ml/datasets/Census+Income
4. http://scg.sdsu.edu/dataset-adult_r/