In this Python script, I process the admission data for Teaching vs. Research Universities. The dataset contains missing values for some of the variables. I will preprocess the data, remove the outliers, and replace the missing values.
Additionally, I will consolidate and bin a categorical variable.
Then, I will pick a binary variable that represents university type as the target for supervised learning analysis.
In the end, I will generate the confusion matrix and calculate the accuracy rate, precision rate, recall rate, error rate, and F1 score to examine how well my model fits. I will also plot the ROC curve and calculate the AUC to visualize the results.
More details are given at each step.
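
As a quick reference for the evaluation metrics named above, here is a minimal sketch of how they follow from the four confusion-matrix counts. It is plain Python and not the code used later in the script; the function name `classification_metrics`, the toy labels, and the encoding of 1 as the positive class (for example, a research university) are assumptions made only for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Derive common rates from the confusion-matrix counts of a binary target.

    `positive` is the label treated as the positive class (assumed to be 1 here).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    tn = sum(1 for t, p in pairs if t != positive and p != positive)  # true negatives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    total = tp + tn + fp + fn

    return {
        "confusion_matrix": [[tn, fp], [fn, tp]],  # rows: actual, columns: predicted
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        # Guard against division by zero when a class is never predicted or never occurs.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Equivalent to the harmonic mean of precision and recall.
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
    }


# Toy labels, invented for illustration only (not taken from the dataset).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(name, value)
```

Note that the error rate is simply one minus the accuracy, so reporting both is a convenience rather than extra information.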