GitHub - mohith4397/Project_NIIT_H1B_Applications: This repository contains project work done at NIIT on H1B Applications.Technologies used are MapReduce, Hive, Pig, Sqoop, Bash

Project_NIIT_H1B_Applications

Data set link
https://drive.google.com/file/d/0B-UpSievtnlkellOUURTT0xpMmM/view?usp=
H1-B Case Study
The H1B is an employment-based, non-immigrant visa category for temporary foreign workers in the United States. For a foreign national to apply for H1B visa, an US employer must offer a job and petition for H1B visa with the US immigration department. This is the most common visa status applied for and held by international students once they complete college/ higher education (Masters, Ph.D.) and work in a full-time position. We will be performing analysis on the H1B visa applicants between the years 2011-2016. After analyzing the data, we can derive the following facts.

1 a) Is the number of petitions with Data Engineer job title increasing over time?
  b) Find top 5 job titles who are having highest growth in applications.
2 a) Which part of the US has the most Data Engineer jobs for each year?
  b) find top 5 locations in the US who have got certified visa for each year.
3)Which industry has the most number of Data Scientist positions?
4)Which top 5 employers file the most petitions each year?
5) Find the most popular top 10 job positions for H1B visa applications for each year?
6) Find the percentage and the count of each case status on total applications for each year. 
Create a graph depicting the pattern of all the cases over the period of time.
7) Create a bar graph to depict the number of applications for each year
8) Find the average Prevailing Wage for each Job for each Year (take part time and full time separate).
Arrange the output in descending order.
9) Which are  employers along with the number of petitions who have the success rate more than 70% 
in petitions and total petitions filed more than 1000?
10) Which are the  job positions along with the number of petitions which have the success rate more than 70%
in petitions and total petitions filed more than 1000?
11) Export result for question no 10 to MySql database.

SUCCESS RATE % = (Certified + Certified Withdrawn)/Total x 100
The dataset has nearly 3 million records. The dataset description is as follows: The columns in the dataset include:
CASE_STATUS:
Status associated with the last significant event or decision.
Valid values include "Certified","Certified-Withdrawn","Denied"and"Withdrawn".
Certified: Employer filed the LCA, which was approved by DOL
Certified Withdrawn: LCA was approved but later withdrawn by employer
Withdrawn: LCA was withdrawn by employer before
Denied: LCA was denied by DOL

EMPLOYER_NAME: Name of employer submitting labour condition application.
SOC_NAME: the Occupational name associated with the SOC_CODE. SOC_CODE is the occupational code associated with the job being requested for temporary labour condition, as classified by the Standard Occupational Classification (SOC) System.
JOB_TITLE: Title of the job
FULL_TIME_POSITION:
Y = Full Time Position;
N = Part Time Position
PREVAILING_WAGE: Prevailing Wage for the job being requested for temporary labour condition. The wage is listed at annual scale in USD. The prevailing wage for a job position is defined as the average wage paid to similarly employed workers in the requested occupation in the area of intended employment. The prevailing wage is based on the employer’s minimum requirements for the position.
YEAR: Year in which the H1B visa petition was filed
WORKSITE: City and State information of the foreign worker’s intended area of employment
lon: longitude of the Worksite
lat: latitude of the Worksite

In the data, few columns are enclosed by double quotes and also we have comma’s in a single column and the column is enclosed by double quotes. So we have used hive csv serve to load the data. In the quoteChar, we have given “(double quote). So this will take the column value in between the double quotes. Use all the following tools

MapReduce - any 4 programs
Hive - any 4 programs
Pig - any 4 programs

Case Status
1.certified
2.certified withdrawn
3.denied
4.withdrawn
5.pending quality and compliance review
6.invalidated
7.rejected
8.NA

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hive

Hive

MapReduce

MapReduce

Pig

Pig

Sqoop

Sqoop

MyMenu.sh

MyMenu.sh

README.md

README.md

dataset.txt

dataset.txt

Repository files navigation

Project_NIIT_H1B_Applications

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
Hive		Hive
MapReduce		MapReduce
Pig		Pig
Sqoop		Sqoop
MyMenu.sh		MyMenu.sh
README.md		README.md
dataset.txt		dataset.txt

mohith4397/Project_NIIT_H1B_Applications

Folders and files

Latest commit

History

Repository files navigation

Project_NIIT_H1B_Applications

About

Topics

Resources

Stars

Watchers

Forks

Languages