# Exploration and Modeling of U.S. Permanent Visa Applications, 2012-2017

## Abstract

This project examines United States permanent visa applications submitted between 2012 and 2017. An exploratory data analysis looks at which employers and industries are bringing in the most immigrants, where these immigrants are moving to, and where they are coming from. It also begins to investigate which factors are correlated with visa approval. For my challenge area, I compared several methods of topic extraction on the text columns describing applicants' jobs and industries, including Latent Dirichlet Allocation and two models of non-negative matrix factorization. Finally, this project aims to predict whether a visa application will be approved or denied, using logistic regression and other machine learning models.

## Description of the Data

For a U.S. employer to hire a foreign worker permanently, the worker must be approved for a certification, or visa, by the U.S. Department of Labor. Data on these applications is posted annually in separate sets on the Department's website. Jacob Boysen (jboysen) combined these sets into one large data set and posted it on Kaggle as "us-perm-visas".<br>
This data set, with nearly 375,000 observations and over 70 columns, contains information on all U.S. permanent work visa applications from 2012 to 2017, including employer, position, wage offered, job posting history, employee education, and final decision.
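
As a rough illustration of the kind of tally the exploratory analysis relies on, the sketch below counts applications per employer and computes an overall approval share. It uses only the Python standard library and a few made-up rows; the column names (`employer_name`, `country_of_citizenship`, `case_status`) and the status values are assumptions for illustration and may not match the actual file.

```python
import csv
import io
from collections import Counter

# Made-up rows standing in for the real file. The column names and the
# status values are assumptions, not confirmed against the data set.
SAMPLE_CSV = """employer_name,country_of_citizenship,case_status
Acme Corp,INDIA,Certified
Acme Corp,CHINA,Denied
Globex,INDIA,Certified-Expired
Globex,CANADA,Certified
Acme Corp,INDIA,Certified
"""


def summarize(rows):
    """Count applications per employer and the share that were approved."""
    rows = list(rows)
    by_employer = Counter(row["employer_name"] for row in rows)
    # Treat any status beginning with "Certified" as an approval (assumption).
    approved = sum(row["case_status"].startswith("Certified") for row in rows)
    rate = approved / len(rows) if rows else 0.0
    return by_employer, rate


by_employer, approval_rate = summarize(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(by_employer.most_common(1))
print(f"approval rate: {approval_rate:.2f}")
```

With the full data set, the same function could be fed a `csv.DictReader` opened on the downloaded file instead of the in-memory sample.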

## Questions to Answer

- Which employers, cities, and industries are bringing in the most immigrants? What countries are they coming from?
- What factors are correlated with whether a visa request will be approved or denied?
- How can I use natural language processing to determine the essential industries and fields bringing in immigrants?
- Can I predict whether a visa request will be approved or denied?
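
The natural language processing question can be approached with scikit-learn's topic extraction tools. The following is a minimal sketch on hypothetical job titles, not the notebook's actual code. It assumes the "two models" of non-negative matrix factorization are the Frobenius-norm and Kullback-Leibler variants used in the scikit-learn example cited below; the topic count and the helper `top_words` are likewise illustrative choices.

```python
from sklearn.decomposition import NMF, LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical job titles; the real analysis would use the data set's text columns.
job_titles = [
    "software engineer", "senior software developer", "software systems engineer",
    "registered nurse", "staff nurse practitioner", "clinical nurse specialist",
    "financial analyst", "senior financial accountant", "accountant and auditor",
]

N_TOPICS = 3  # illustrative choice, not taken from the project


def top_words(model, vectorizer, n_words=3):
    """Return the n_words highest-weighted terms for each fitted topic."""
    # Build the column-index -> term lookup from the fitted vocabulary.
    terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
    return [
        [terms[i] for i in topic.argsort()[::-1][:n_words]]
        for topic in model.components_
    ]


# NMF is fitted on TF-IDF weights.
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(job_titles)

nmf_frobenius = NMF(
    n_components=N_TOPICS, init="nndsvda", random_state=0, max_iter=500
).fit(X_tfidf)

nmf_kl = NMF(
    n_components=N_TOPICS, init="nndsvda", random_state=0,
    beta_loss="kullback-leibler", solver="mu", max_iter=1000,
).fit(X_tfidf)

# LDA models word counts, so it gets raw term frequencies instead.
counts = CountVectorizer()
X_counts = counts.fit_transform(job_titles)
lda = LatentDirichletAllocation(n_components=N_TOPICS, random_state=0).fit(X_counts)

for name, model, vectorizer in [
    ("NMF (Frobenius)", nmf_frobenius, tfidf),
    ("NMF (Kullback-Leibler)", nmf_kl, tfidf),
    ("LDA", lda, counts),
]:
    print(name, top_words(model, vectorizer))
```

The top words per topic can then be read as candidate industry or field labels. On a corpus this small the topics are unstable, so the output is only meant to show the shape of the result.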

## Notebooks

[Import and Tidy](02-Import-Tidy.ipynb)<br>
[Exploratory Data Analysis](03-Exploratory.ipynb)<br>
[Topic Modeling and Machine Learning](04-NLP-Modeling.ipynb)<br>
[Presentation](05-Presentation.ipynb)<br>

## Citations

[Boysen, Jacob. “US Permanent Visa Applications.” Kaggle, 24 Aug. 2017](http://www.kaggle.com/jboysen/us-perm-visas/)<br>
[Grisel, Olivier, et al. “Topic Extraction with Non-Negative Matrix Factorization and Latent Dirichlet Allocation.” Scikit-Learn, 2017](http://www.scikit-learn.org/stable/auto_examples/applications/plot_topics_extraction_with_nmf_lda.html#sphx-glr-auto-examples-applications-plot-topics-extraction-with-nmf-lda-py)<br>
[“OFLC Performance Disclosure Data.” Employment & Training Administration, U.S. Department of Labor, 23 Jan. 2018](https://www.foreignlaborcert.doleta.gov/performancedata.cfm)<br>
[“NAICS Identification Tools.” NAICS Association, 2017](https://www.naics.com/search/)<br>
[“Visa Waiver Program (WT/WB Status).” UC Berkeley International Office, UC Regents](https://internationaloffice.berkeley.edu/visa-waiver)<br>
[“Nonimmigrant Visa Classifications.” U.S. Department of State, 14 July 2017](https://fam.state.gov/FAM/09FAM/09FAM040201.html)