The goal of this project was to use supervised classification techniques to predict the outcome of a US Permanent Residency application and to analyze the factors that may influence that outcome. Data on individual applications was obtained from the Dept. of Labor, and economic data for each applicant's country of citizenship and year of application was brought in to analyze changes in the process over time. The analysis was conducted in Python using classification models such as Logistic Regression, Decision Trees, Random Forest, Naive Bayes and XGBoost. In tuning the models I optimized for accuracy (correctly predicting the outcome) and precision (reducing the likelihood of the worst-case scenario: an application predicted to be accepted that is actually rejected).
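To make the two tuning targets concrete, here is a minimal, dependency-free sketch of how accuracy and precision are computed for this kind of binary outcome. It assumes a hypothetical encoding where 1 means the application was certified (accepted) and 0 means it was denied, and the scores and labels are toy values made up for illustration, not project data. The threshold sweep is also an illustrative assumption, not necessarily how the project's models were tuned; it simply shows how a stricter cutoff can keep accuracy level while raising precision, i.e. cutting the "predicted accepted, actually rejected" cases.

```python
from typing import Sequence

CERTIFIED = 1  # hypothetical encoding: 1 = certified (accepted), 0 = denied


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of applications whose outcome was predicted correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision(y_true: Sequence[int], y_pred: Sequence[int], positive: int = CERTIFIED) -> float:
    """Of the applications predicted 'certified', the fraction actually certified.

    Each false positive is the worst case: predicted accepted, actually rejected.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


def predict_at(scores: Sequence[float], threshold: float) -> list:
    """Turn model scores (estimated probability of certification) into labels."""
    return [CERTIFIED if s >= threshold else 0 for s in scores]


# Toy labels and model scores, invented for illustration only
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.35, 0.1]

for threshold in (0.5, 0.75):
    y_pred = predict_at(scores, threshold)
    acc = accuracy(y_true, y_pred)
    prec = precision(y_true, y_pred)
    print(f"threshold={threshold}: accuracy={acc:.2f}, precision={prec:.2f}")
# threshold=0.5: accuracy=0.75, precision=0.75
# threshold=0.75: accuracy=0.75, precision=1.00
```

On this toy data, raising the cutoff from 0.5 to 0.75 removes the one application wrongly predicted as accepted, so precision rises while accuracy stays the same. In practice, libraries such as scikit-learn provide equivalent `accuracy_score` and `precision_score` functions.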
The outcome of this project showed that personal applicant features, such as country of citizenship, state of residency and the visa used to enter the US, did influence the outcome. However, it was difficult to suggest improvements to the applications themselves because the most influential features are not easily altered.
Tools:
- Python (Pandas, Matplotlib, scikit-learn)
- Classification Models (Logistic Regression, Decision Trees, Random Forest, XGBoost, Naive Bayes)
Repository Includes:
- Jupyter Notebook containing full project process
- Modules containing functions for process
- Final presentation PDF
The blog post for this project can be found here