# H-1B Visas Article

Results are in the [H-1B Visa Datasheet](https://docs.google.com/spreadsheets/d/1kO5vxEbIkr2X15clL980FAXCyMWD5yVGwkebyssDDa0/edit?gid=0#gid=0)

## Script Setup

In [1]:
# Load packages
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)



In [2]:
# Define functions
def value_dist(series, dropna=False):
  counts = series.value_counts(dropna=dropna)
  shares = series.value_counts(dropna=dropna, normalize=True)
  result = pd.DataFrame({'count': counts, 'share': shares})
  return result

In [3]:
# Filenames and parameters
data_dir = '../../data/analysis/'
parameters_dir = '../../data/parameters/'
uscis_filename = 'uscis.csv'
lca_filename = 'lca.csv'

h1b_cap = 85000
industry_flags = ['tech', 'semiconductors', 'artificial_intelligence', 'it_services', 'professional_services', 'software',]
company_flags = industry_flags + ['outsourcing']

In [4]:
# Variables for aggregation
approval_cols = ['Initial Approval', 'Continuing Approval', 'Total Approval']
denial_cols = ['Initial Denial', 'Continuing Denial', 'Total Denial']
petition_cols = approval_cols + denial_cols + ['Total Petitions']

def create_agg(cols, agg):
  # Map every column to the same aggregation.
  # A dict comprehension avoids shadowing the built-in `dict`.
  return {col: agg for col in cols}

def create_sum(cols):
  return create_agg(cols, 'sum')

approval_sum = create_sum(approval_cols)
denial_sum = create_sum(denial_cols)
petition_sum = create_sum(petition_cols)
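
To make the aggregation dictionaries concrete, here is a minimal sketch of what one of them contains and how it behaves in a `groupby`. The `toy` DataFrame is made up for illustration and is not the USCIS data.

```python
import pandas as pd

approval_cols = ['Initial Approval', 'Continuing Approval', 'Total Approval']

# Same mapping that create_sum(approval_cols) builds: every column maps to 'sum'
approval_sum = dict.fromkeys(approval_cols, 'sum')

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({
  'Fiscal Year': [2009, 2009],
  'Initial Approval': [1, 2],
  'Continuing Approval': [3, 4],
  'Total Approval': [4, 6],
})

# One row per fiscal year, each approval column summed
toy_totals = toy.groupby('Fiscal Year').agg(approval_sum)
```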

## Load Data

### Employer Characteristics

In [5]:
# Company data from CompaniesMarketCap
marketcap = pd.read_csv(
  parameters_dir + 'companiesmarketcap.csv',
  usecols = ['Company', 'Country', 'Employees'] + industry_flags
)

In [6]:
# Company data assembled manually, including outsourcing flag
employer_params = pd.read_csv(parameters_dir + 'employer_params.csv', usecols=['employer', 'outsourcing', 'employee_count', 'in_market_cap_data', 'tech'])

In [7]:
# Create a single dataset with employer characteristics
employers = (
  employer_params.set_index('employer')
    .join(
      other = marketcap.rename(columns={'Company':'employer'}).set_index('employer')
      , how = 'outer'
      , lsuffix = '_params'
      , rsuffix = '_marketcap'
      , validate = '1:1'
    )
)

# Create the final tech flag. This flags a few additional companies as tech which are not in the CompaniesMarketCap data (e.g. LinkedIn, Yahoo)
employers['tech'] = employers['tech_marketcap'].combine_first(employers['tech_params'])

# Create the final employee_count field. The _params field contains data from CompaniesMarketCap as well as employee counts gathered from other sources (e.g. Wikipedia)
employers['employee_count'] = employers['employee_count'].combine_first(employers['Employees'])

# Drop columns
employers.drop(['tech_params', 'tech_marketcap'], axis=1, inplace=True)
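
The join-then-coalesce pattern above can be sketched on two tiny frames. The `params` and `market` frames below are hypothetical stand-ins for `employer_params` and `marketcap`, with made-up flag values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for employer_params and marketcap
params = pd.DataFrame(
  {'employer': ['LinkedIn', 'Amazon'], 'tech': [1.0, np.nan]}
).set_index('employer')
market = pd.DataFrame(
  {'employer': ['Amazon', 'Intel'], 'tech': [1.0, 1.0]}
).set_index('employer')

# The outer join keeps employers found in either source; the overlapping
# 'tech' column is split into tech_params and tech_marketcap
joined = params.join(
  market, how='outer', lsuffix='_params', rsuffix='_marketcap', validate='1:1'
)

# Prefer the market cap flag and fall back to the manual flag where it is missing
joined['tech'] = joined['tech_marketcap'].combine_first(joined['tech_params'])
```

In this sketch LinkedIn appears only in the manual file, so its flag comes from `tech_params`, while Amazon and Intel take theirs from `tech_marketcap`.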

In [8]:
# Remove intermediate DataFrames that are no longer needed
del marketcap, employer_params

### LCA: Labor Condition Applications

In [9]:
lca_cols = {
  'CASE_NUMBER': 'str',
  'DATAFILE_YEAR': 'int',
  'EMPLOYER_NAME': 'str',
  'H_1B_DEPENDENT': 'str',
  # 'JOB_TITLE': 'str',
  # 'SOC_CODE': 'str',
  # 'SOC_TITLE': 'str',
  'TOTAL_WORKER_POSITIONS': 'float',
  'NEW_EMPLOYMENT': 'float',
  'CONTINUED_EMPLOYMENT': 'float',
  'CHANGE_PREVIOUS_EMPLOYMENT': 'float',
  'NEW_CONCURRENT_EMPLOYMENT': 'float',
  'CHANGE_EMPLOYER': 'float',
  'AMENDED_PETITION': 'float',
  'FULL_TIME_POSITION': 'str',
  'VISA_CLASS': 'str',
  'NAICS_CODE': 'str',
  'WAGE_ANNUAL_FROM': 'float',
  'WAGE_ANNUAL_TO': 'float',
  'PW_ANNUAL': 'float',
  'PW_WAGE_LEVEL': 'str'
}

In [10]:
# Load LCA data
lca = pd.read_csv(data_dir + lca_filename, usecols=lca_cols.keys(), dtype=lca_cols)
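
Passing the same dictionary as both `usecols` and `dtype` limits the load to the listed columns and fixes their types up front. A minimal sketch on an in-memory file (the three-column CSV text and its values are made up):

```python
import io
import pandas as pd

# Hypothetical three-column file; only two of the columns are requested
raw = io.StringIO("CASE_NUMBER,NAICS_CODE,EXTRA\nI-200-1,541511,x\n")
cols = {'CASE_NUMBER': 'str', 'NAICS_CODE': 'str'}

# Only the keys of `cols` are read, and each is parsed with its declared dtype
sample = pd.read_csv(raw, usecols=list(cols), dtype=cols)
```

Declaring code-like fields such as `NAICS_CODE` as strings keeps them as text instead of letting pandas infer a numeric type.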

In [11]:
# Amend LCA data with employer characteristics
lca = (
  lca.set_index('EMPLOYER_NAME')
    .join(
      other = employers.loc[:, ['Country', 'employee_count'] + company_flags],
      how = 'left',
      validate = 'm:1'
    )
  .reset_index()
  .set_index('CASE_NUMBER')
)

### USCIS: US Citizenship and Immigration Services

In [12]:
# Load USCIS data
uscis = pd.read_csv(data_dir + uscis_filename)

In [13]:
# Amend USCIS dataset with employer characteristics
uscis = (
  uscis.set_index('Employer')
    .join(
      other = employers.loc[:, ['Country', 'employee_count'] + company_flags],
      how = 'left',
      validate = 'm:1'
    )
    .reset_index()
)

IBM is categorized as both a tech company and an outsourcing company. To simplify our analysis, we'll categorize it only as an outsourcing company. That way it doesn't appear in both tech and outsourcing categories when we compare the two groups. This is consistent with previous analyses from other sources (see reports from the [Economic Policy Institute](https://www.epi.org/publication/h-1b-visas-and-prevailing-wage-levels/) and the [New York Times](https://www.nytimes.com/interactive/2015/11/06/us/outsourcing-companies-dominate-h1b-visas.html)).

In [14]:
uscis.loc[uscis['Employer'] == 'IBM', 'tech'] = 0

In [15]:
lca.loc[lca['EMPLOYER_NAME'] == 'IBM', 'tech'] = 0

Similarly, Nvidia is categorized as both a semiconductor company and an AI company. To simplify the analysis, we'll take the stance that it's not an AI company per se but a semiconductor company that is a supplier to the AI industry.

In [16]:
uscis.loc[uscis['Employer'] == 'Nvidia', 'artificial_intelligence'] = 0

In [17]:
lca.loc[lca['EMPLOYER_NAME'] == 'Nvidia', 'artificial_intelligence'] = 0
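
The four overrides above share one pattern: select an employer's rows and set one flag to 0. A minimal sketch of that pattern as a helper, where `clear_flag` and the `toy` frame are hypothetical names introduced only for illustration:

```python
import pandas as pd

def clear_flag(df, name_col, employer, flag):
  """Set `flag` to 0 on every row of `employer` (modifies df in place)."""
  df.loc[df[name_col] == employer, flag] = 0
  return df

# Hypothetical toy data with made-up flag values
toy = pd.DataFrame({
  'Employer': ['IBM', 'Nvidia', 'Amazon'],
  'tech': [1.0, 1.0, 1.0],
  'artificial_intelligence': [1.0, 1.0, 0.0],
})

clear_flag(toy, 'Employer', 'IBM', 'tech')
clear_flag(toy, 'Employer', 'Nvidia', 'artificial_intelligence')
```

Only the named flag changes for the named employer; every other row and column is left as it was.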

### H-1B Dependent Companies

In [18]:
# Dataset will contain one row per company-year and a flag for whether the company declared H-1B dependency in that year
# Note that H-1B dependency status is only available for 2015 onward
h1b_dependency = (
  lca
    .loc[:, ['EMPLOYER_NAME', 'DATAFILE_YEAR', 'H_1B_DEPENDENT']]
    .drop_duplicates()
    .rename(columns={'EMPLOYER_NAME':'Employer', 'DATAFILE_YEAR': 'Fiscal Year', 'H_1B_DEPENDENT': 'dependency_vals'})
    .reset_index(drop=True)
    .groupby(['Employer', 'Fiscal Year'])
    ['dependency_vals'].apply(set)
    .to_frame()
    .reset_index()
)

def get_h1b_dependency(x):
  if 'Y' in x:
    return 1
  elif 'N' in x:
    return 0
  else:
    return pd.NA

h1b_dependency['H-1B Dependent'] = h1b_dependency['dependency_vals'].apply(get_h1b_dependency)
h1b_dependency.drop('dependency_vals', axis=1, inplace=True)
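
The dependency flag collapses all values an employer reported in a year into a single status: any `Y` wins, otherwise any `N`, otherwise missing. A minimal sketch on made-up employers A, B, and C:

```python
import pandas as pd

def get_h1b_dependency(x):
  if 'Y' in x:
    return 1
  elif 'N' in x:
    return 0
  else:
    return pd.NA

# Hypothetical toy data: A reports both values, B only 'N', C nothing
toy = pd.DataFrame({
  'Employer': ['A', 'A', 'B', 'C'],
  'Fiscal Year': [2015, 2015, 2015, 2015],
  'dependency_vals': ['Y', 'N', 'N', None],
})

# Collect the distinct values per employer-year, then reduce each set to one flag
flags = (
  toy.groupby(['Employer', 'Fiscal Year'])['dependency_vals']
    .apply(set)
    .apply(get_h1b_dependency)
)
```

Employer A is flagged as dependent because at least one of its filings says `Y`, even though another says `N`.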

## Analysis

USCIS petition approvals and denials by year

In [19]:
temp = uscis.groupby('Fiscal Year').agg(petition_sum)
temp['Initial Denial Rate'] = temp['Initial Denial'] / (temp['Initial Approval'] + temp['Initial Denial'])
temp['Continuing Denial Rate'] = temp['Continuing Denial'] / (temp['Continuing Approval'] + temp['Continuing Denial'])
temp['Total Denial Rate'] = temp['Total Denial'] / (temp['Total Approval'] + temp['Total Denial'])
temp.to_clipboard()
temp

Fiscal Year,Initial Approval,Continuing Approval,Total Approval,Initial Denial,Continuing Denial,Total Denial,Total Petitions,Initial Denial Rate,Continuing Denial Rate,Total Denial Rate
2009,99733,127612,227345,18064,7943,26007,253352,0.153349,0.058596,0.102652
2010,83967,108877,192844,7324,5135,12459,205303,0.080227,0.045039,0.060686
2011,114790,155068,269858,8559,4508,13067,282925,0.069388,0.02825,0.046185
2012,140714,128950,269664,7388,4008,11396,281060,0.049885,0.030145,0.040547
2013,130650,164445,295095,9583,4536,14119,309214,0.068336,0.026843,0.045661
2014,130529,202073,332602,10634,6097,16731,349333,0.075331,0.029289,0.047894
2015,111807,176560,288367,7658,5490,13148,301515,0.064102,0.030157,0.043606
2016,105056,252056,357112,11297,11762,23059,380171,0.097092,0.044584,0.060654
2017,96111,277126,373237,14515,15345,29860,403097,0.131208,0.052467,0.074076
2018,87867,247041,334908,28150,33167,61317,396225,0.242637,0.118366,0.154753
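
As a worked check of the rate formulas, the fiscal year 2009 row can be recomputed by hand. The counts below are copied from the table above.

```python
# Fiscal year 2009 counts, copied from the output table
initial_approval, initial_denial = 99733, 18064
total_approval, total_denial = 227345, 26007

# Denials divided by all decisions (approvals plus denials) of the same kind
initial_denial_rate = initial_denial / (initial_approval + initial_denial)
total_denial_rate = total_denial / (total_approval + total_denial)
```

These come out to roughly 0.1533 and 0.1027, matching the `Initial Denial Rate` and `Total Denial Rate` columns for 2009.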


USCIS petition approvals and denials by employer type: Tech and Outsourcing

In [20]:
temp = (
  uscis
    .fillna(0)
    .groupby(['tech', 'outsourcing'], dropna=False, as_index=False)
    .agg(petition_sum)
)
temp['Employer Type'] = np.select(
  [
    (temp['tech'] == 0) & (temp['outsourcing'] == 0),
    (temp['tech'] == 1) & (temp['outsourcing'] == 0),
    (temp['tech'] == 0) & (temp['outsourcing'] == 1),
    (temp['tech'] == 1) & (temp['outsourcing'] == 1)
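  ],
  ['Other', 'Tech', 'Outsourcing', 'Both'],
  # Explicit string default (NumPy's default is the integer 0); the four conditions already cover every 0/1 combination
  default='Other'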
)
temp.drop(['tech', 'outsourcing'], axis=1, inplace=True)

temp.to_clipboard(index=False)
temp

,Initial Approval,Continuing Approval,Total Approval,Initial Denial,Continuing Denial,Total Denial,Total Petitions,Employer Type
0,1221654,2178744,3400398,147406,109301,256707,3657105,Other
1,343846,704360,1048206,38765,57551,96316,1144522,Outsourcing
2,166607,377651,544258,2857,5807,8664,552922,Tech
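
The employer-type labeling is repeated in several cells, so it may be convenient to wrap it in a function. A minimal sketch, where `employer_type` and the `toy` frame are hypothetical and missing flags are treated as 0, as in the `fillna(0)` step above:

```python
import numpy as np
import pandas as pd

def employer_type(df):
  """Label rows as Tech, Outsourcing, Both, or Other from the two 0/1 flags."""
  tech = df['tech'].fillna(0) == 1
  outsourcing = df['outsourcing'].fillna(0) == 1
  # np.select uses the first condition that matches, so 'Both' is listed first
  return np.select(
    [tech & outsourcing, tech, outsourcing],
    ['Both', 'Tech', 'Outsourcing'],
    default='Other'
  )

# Hypothetical toy data, including one employer with missing flags
toy = pd.DataFrame({'tech': [1, 0, np.nan, 1], 'outsourcing': [0, 1, np.nan, 1]})
toy['Employer Type'] = employer_type(toy)
```

No `Both` row appears in the output above, which is consistent with IBM having been reassigned to outsourcing only.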


USCIS petition approvals and denials over time by employer type: Tech and Outsourcing

In [21]:
temp = (
  uscis
    .fillna(0)
    .groupby(['tech', 'outsourcing', 'Fiscal Year'], dropna=False, as_index=False)
    .agg(petition_sum)
)
temp['Employer Type'] = np.select(
  [
    (temp['tech'] == 0) & (temp['outsourcing'] == 0),
    (temp['tech'] == 1) & (temp['outsourcing'] == 0),
    (temp['tech'] == 0) & (temp['outsourcing'] == 1),
    (temp['tech'] == 1) & (temp['outsourcing'] == 1)
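  ],
  ['Other', 'Tech', 'Outsourcing', 'Both'],
  # Explicit string default (NumPy's default is the integer 0); the four conditions already cover every 0/1 combination
  default='Other'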
)
temp.drop(['tech', 'outsourcing'], axis=1, inplace=True)

temp.to_clipboard(index=False)
temp

,Fiscal Year,Initial Approval,Continuing Approval,Total Approval,Initial Denial,Continuing Denial,Total Denial,Total Petitions,Employer Type
0,2009,85809,106934,192743,14386,6647,21033,213776,Other
1,2010,63262,87747,151009,6612,4611,11223,162232,Other
2,2011,84006,121032,205038,6762,3770,10532,215570,Other
3,2012,83544,95805,179349,5598,3133,8731,188080,Other
4,2013,82999,118068,201067,6845,3622,10467,211534,Other
5,2014,84324,135022,219346,7873,4467,12340,231686,Other
6,2015,72473,112148,184621,5919,4014,9933,194554,Other
7,2016,72868,149988,222856,9921,7693,17614,240470,Other
8,2017,63882,154430,218312,12220,10340,22560,240872,Other
9,2018,69508,159870,229378,23526,18031,41557,270935,Other


USCIS petitions by company

In [22]:
# Which companies use the H-1B program the most (from 2009-2023)
# Limit to the top 200 employers
nyears = uscis['Fiscal Year'].drop_duplicates().shape[0]

temp = (
  uscis
    .groupby(['Employer'] + company_flags, dropna=False)
    .agg(approval_sum)
    .sort_values('Total Approval', ascending=False)
    # .head(200)
    .reset_index()
    .fillna(0)
)
# temp['Share of Cap'] = temp['Initial Approval'] / (h1b_cap * nyears)
# temp['Share of Cap Sum'] = temp['Share of Cap'].cumsum()
temp['Share of Total Approval'] = temp['Total Approval'] / temp['Total Approval'].sum()
temp['Share of Total Approval Sum'] = temp['Share of Total Approval'].cumsum()
temp['Tech or Outsourcing'] = temp.apply(lambda row: 1 if row['tech']==1 or row['outsourcing'] == 1 else 0, axis=1)
temp.head(200).to_clipboard(index=False)
temp.head(200)

,Employer,tech,semiconductors,artificial_intelligence,it_services,professional_services,software,outsourcing,Initial Approval,Continuing Approval,Total Approval,Share of Total Approval,Share of Total Approval Sum,Tech or Outsourcing
0,Cognizant Technology Solutions,0.0,0.0,0.0,1.0,1.0,0.0,1.0,52507,142416,194923,0.039040,0.039040,1
1,Infosys,0.0,0.0,0.0,1.0,1.0,0.0,1.0,46442,95614,142056,0.028452,0.067492,1
2,Tata Consultancy Services,0.0,0.0,0.0,1.0,1.0,0.0,1.0,45382,74772,120154,0.024065,0.091557,1
3,Amazon,1.0,0.0,0.0,0.0,0.0,0.0,0.0,35284,61977,97261,0.019480,0.111037,1
4,Wipro,0.0,0.0,0.0,1.0,1.0,0.0,1.0,28728,45706,74434,0.014908,0.125945,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
195,DFS CORPORATE SERVICES LLC,0.0,0.0,0.0,0.0,0.0,0.0,0.0,132,1669,1801,0.000361,0.400009,0
196,CVS/PHARMACY,0.0,0.0,0.0,0.0,0.0,0.0,0.0,202,1597,1799,0.000360,0.400370,0
197,BARCLAYS CAPITAL INC,0.0,0.0,0.0,0.0,0.0,0.0,0.0,409,1389,1798,0.000360,0.400730,0
198,FIS MANAGEMENT SERVICES LLC,0.0,0.0,0.0,0.0,0.0,0.0,0.0,304,1484,1788,0.000358,0.401088,0
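
The share and cumulative-share columns can be illustrated on a tiny made-up ranking. Employers A, B, and C and their counts are hypothetical.

```python
import pandas as pd

# Hypothetical toy data, sorted from largest to smallest like the real table
toy = (
  pd.DataFrame({'Employer': ['A', 'B', 'C'], 'Total Approval': [50, 30, 20]})
    .sort_values('Total Approval', ascending=False)
)

# Each employer's share of all approvals, then the running total down the ranking
toy['Share'] = toy['Total Approval'] / toy['Total Approval'].sum()
toy['Cumulative Share'] = toy['Share'].cumsum()
```

Read this way, the last rows of the output above show the top 200 employers accounting for roughly 40% of all approvals.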


USCIS petition approvals by company-year

In [34]:
# Limit the list to the 100 biggest users of the program
top_employers = (
  uscis
    .groupby('Employer')
    .agg(petition_sum)
    .sort_values('Total Approval', ascending=False)
    .head(100)
    .index
    .tolist()
)

In [35]:
company_year = (
  uscis
    .loc[uscis['Employer'].isin(top_employers), :]
    .groupby(['Employer', 'Fiscal Year'] + company_flags)
    .agg(approval_sum)
    .reset_index()
)
company_year['Share of Cap'] = company_year['Initial Approval'] / h1b_cap
company_year['Tech or Outsourcing'] = company_year.apply(lambda row: 1 if row['tech']==1 or row['outsourcing'] == 1 else 0, axis=1)
company_year = company_year.sort_values(['Fiscal Year', 'Total Approval'], ascending=[True, False])
company_year.to_clipboard(index=False)
company_year

,Employer,Fiscal Year,tech,semiconductors,artificial_intelligence,it_services,professional_services,software,outsourcing,Initial Approval,Continuing Approval,Total Approval,Share of Cap,Tech or Outsourcing
360,Infosys,2009,0.0,0.0,0.0,1.0,1.0,0.0,1.0,550,3302,3852,0.006471,1
786,Wipro,2009,0.0,0.0,0.0,1.0,1.0,0.0,1.0,2115,1583,3698,0.024882,1
505,Microsoft,2009,1.0,0.0,1.0,0.0,0.0,1.0,0.0,1505,1792,3297,0.017706,1
345,IBM,2009,0.0,0.0,1.0,0.0,0.0,1.0,1.0,929,805,1734,0.010929,1
225,Cognizant Technology Solutions,2009,0.0,0.0,0.0,1.0,1.0,0.0,1.0,234,1339,1573,0.002753,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
134,Automatic Data Processing,2023,1.0,0.0,0.0,1.0,1.0,1.0,0.0,50,256,306,0.000588,1
474,Larsen & Toubro,2023,0.0,0.0,0.0,0.0,1.0,0.0,1.0,157,149,306,0.001847,1
697,Synopsys,2023,1.0,1.0,0.0,0.0,0.0,1.0,0.0,64,195,259,0.000753,1
504,Micron Technology,2023,1.0,1.0,0.0,0.0,0.0,0.0,0.0,105,138,243,0.001235,1
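
`Share of Cap` divides a company's initial approvals in a year by the annual cap stored in `h1b_cap`. A worked check using two 2009 rows copied from the table above:

```python
h1b_cap = 85000

# Initial approvals for fiscal year 2009, copied from the output table
infosys_initial_2009 = 550
wipro_initial_2009 = 2115

infosys_share = infosys_initial_2009 / h1b_cap
wipro_share = wipro_initial_2009 / h1b_cap
```

The results, about 0.65% and 2.49% of the cap, agree with the `Share of Cap` values shown for Infosys and Wipro.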


In [37]:
# Export company-year data
company_year.to_csv(data_dir + 'company_year.csv', index=False)

USCIS petition approvals and denials over time by employer type: AI and Semiconductors

In [25]:
temp = (
  uscis
    .fillna(0)
    # Parenthesize both comparisons: | binds more tightly than ==
    .loc[(uscis['artificial_intelligence'] == 1) | (uscis['semiconductors'] == 1)]
    .groupby(['artificial_intelligence', 'semiconductors', 'Fiscal Year'], dropna=False, as_index=False)
    .agg(petition_sum)
)

temp['Employer Type'] = np.select(
  [
    (temp['artificial_intelligence'] == 0) & (temp['semiconductors'] == 0),
    (temp['artificial_intelligence'] == 1) & (temp['semiconductors'] == 0),
    (temp['artificial_intelligence'] == 0) & (temp['semiconductors'] == 1),
    (temp['artificial_intelligence'] == 1) & (temp['semiconductors'] == 1)
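  ],
  ['Neither', 'AI', 'Semiconductor', 'Both'],
  # Explicit string default (NumPy's default is the integer 0); the four conditions already cover every 0/1 combination
  default='Neither'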
)
temp.drop(['artificial_intelligence', 'semiconductors'], axis=1, inplace=True)

temp.to_clipboard(index=False)
temp

,Fiscal Year,Initial Approval,Continuing Approval,Total Approval,Initial Denial,Continuing Denial,Total Denial,Total Petitions,Employer Type
0,2009,2020,2006,4026,9,37,46,4072,Semiconductor
1,2010,1648,1966,3614,12,22,34,3648,Semiconductor
2,2011,2434,3288,5722,15,101,116,5838,Semiconductor
3,2012,3245,2825,6070,17,50,67,6137,Semiconductor
4,2013,2823,3771,6594,15,57,72,6666,Semiconductor
5,2014,2169,4558,6727,19,37,56,6783,Semiconductor
6,2015,1756,3535,5291,11,51,62,5353,Semiconductor
7,2016,1929,4639,6568,12,121,133,6701,Semiconductor
8,2017,2718,4887,7605,55,88,143,7748,Semiconductor
9,2018,2274,4994,7268,112,106,218,7486,Semiconductor


AI companies

In [26]:
temp = (
  uscis
    .loc[uscis['artificial_intelligence'] == 1, :]
    .groupby(['Employer', 'Fiscal Year'] + company_flags)
    .agg(approval_sum)
    .reset_index()
)
temp = temp.sort_values(['Employer', 'Fiscal Year'], ascending=[True, False])

temp = (
  temp
    .groupby('Employer')
    .agg(approval_sum)
    .loc[:, ['Total Approval']]
    .rank(ascending=False)
    .rename(columns={'Total Approval': 'Rank'})
    .join(temp.set_index('Employer'))
    .sort_values(['Rank', 'Fiscal Year'])
)

temp.to_clipboard()
temp

Employer,Rank,Fiscal Year,tech,semiconductors,artificial_intelligence,it_services,professional_services,software,outsourcing,Initial Approval,Continuing Approval,Total Approval
Microsoft,1.0,2009,1.0,0.0,1.0,0.0,0.0,1.0,0.0,1505,1792,3297
Microsoft,1.0,2010,1.0,0.0,1.0,0.0,0.0,1.0,0.0,1942,1732,3674
Microsoft,1.0,2011,1.0,0.0,1.0,0.0,0.0,1.0,0.0,1409,2414,3823
Microsoft,1.0,2012,1.0,0.0,1.0,0.0,0.0,1.0,0.0,1513,1924,3437
Microsoft,1.0,2013,1.0,0.0,1.0,0.0,0.0,1.0,0.0,1057,2840,3897
...,...,...,...,...,...,...,...,...,...,...,...,...
reAlpha Tech,30.0,2021,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1,0,1
reAlpha Tech,30.0,2022,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1,0,1
Exscientia,30.0,2023,1.0,0.0,1.0,0.0,0.0,1.0,0.0,2,0,2
Presto Automation,30.0,2023,1.0,0.0,1.0,0.0,0.0,1.0,0.0,1,1,2
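
The ranking step sums approvals per employer, ranks those totals, and joins the rank back onto the yearly rows. A minimal sketch with two made-up employers:

```python
import pandas as pd

# Hypothetical toy data: two employers, two years each
toy = pd.DataFrame({
  'Employer': ['A', 'A', 'B', 'B'],
  'Fiscal Year': [2022, 2023, 2022, 2023],
  'Total Approval': [10, 20, 5, 5],
})

# Rank employers by their all-years total (1 = most approvals)
ranks = (
  toy.groupby('Employer')[['Total Approval']].sum()
    .rank(ascending=False)
    .rename(columns={'Total Approval': 'Rank'})
)

# Attach each employer's rank to every one of its yearly rows
ranked = ranks.join(toy.set_index('Employer')).sort_values(['Rank', 'Fiscal Year'])
```

With pandas' default `method='average'`, employers with equal totals receive the same averaged rank, which is why several small employers can share a rank at the bottom of the table.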


Software companies

In [27]:
temp = (
  uscis
    .fillna(0)
    # Parenthesize both comparisons: | binds more tightly than ==
    .loc[(uscis['artificial_intelligence'] == 1) | (uscis['semiconductors'] == 1)]
    .groupby(['software','Fiscal Year'], dropna=False, as_index=False)
    .agg(petition_sum)
)

temp.to_clipboard(index=False)
temp

,software,Fiscal Year,Initial Approval,Continuing Approval,Total Approval,Initial Denial,Continuing Denial,Total Denial,Total Petitions
0,0.0,2009,2042,1962,4004,10,37,47,4051
1,0.0,2010,1711,1922,3633,14,21,35,3668
2,0.0,2011,2582,3282,5864,13,102,115,5979
3,0.0,2012,3570,2883,6453,21,52,73,6526
4,0.0,2013,3150,3947,7097,20,55,75,7172
5,0.0,2014,2483,4882,7365,21,39,60,7425
6,0.0,2015,2195,4104,6299,13,52,65,6364
7,0.0,2016,2459,5336,7795,13,123,136,7931
8,0.0,2017,3546,5788,9334,60,94,154,9488
9,0.0,2018,3033,6893,9926,160,129,289,10215


USCIS petition denial rates by year

In [28]:
# Calculating denial rates
def calc_denials(df):
  d = df.groupby('Fiscal Year').agg(petition_sum)
  d['Denial Rate'] = d['Total Denial'] / d['Total Petitions']
  d['Initial Denial Rate'] = d['Initial Denial'] / (d['Initial Approval'] + d['Initial Denial'])
  d['Continuing Denial Rate'] = d['Continuing Denial'] / (d['Continuing Approval'] + d['Continuing Denial'])
  return d

In [29]:
# Denial rate by year
denials = calc_denials(uscis)
denials_outsourcing = calc_denials(uscis[uscis['outsourcing'] == 1])
denials_tech = calc_denials(uscis[uscis['tech'] == 1])
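# Note: the == 0 comparisons are False for missing (NaN) flags, so "Other" here
# covers only employers explicitly flagged 0 for both tech and outsourcing
denials_other = calc_denials(uscis[(uscis['tech'] == 0) & (uscis['outsourcing'] == 0)])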

denials = (
  denials
    .join(denials_outsourcing, rsuffix=' - Outsourcing')
    .join(denials_tech, rsuffix=' - Tech')
    .join(denials_other, rsuffix=' - Other')
    .sort_index()
)
# denials[['Denial Rate', 'Denial Rate - Outsourcing', 'Denial Rate - Tech']]

denials.to_clipboard()
denials

Fiscal Year,Initial Approval,Continuing Approval,Total Approval,Initial Denial,Continuing Denial,Total Denial,Total Petitions,Denial Rate,Initial Denial Rate,Continuing Denial Rate,Initial Approval - Outsourcing,Continuing Approval - Outsourcing,Total Approval - Outsourcing,Initial Denial - Outsourcing,Continuing Denial - Outsourcing,Total Denial - Outsourcing,Total Petitions - Outsourcing,Denial Rate - Outsourcing,Initial Denial Rate - Outsourcing,Continuing Denial Rate - Outsourcing,Initial Approval - Tech,Continuing Approval - Tech,Total Approval - Tech,Initial Denial - Tech,Continuing Denial - Tech,Total Denial - Tech,Total Petitions - Tech,Denial Rate - Tech,Initial Denial Rate - Tech,Continuing Denial Rate - Tech,Initial Approval - Other,Continuing Approval - Other,Total Approval - Other,Initial Denial - Other,Continuing Denial - Other,Total Denial - Other,Total Petitions - Other,Denial Rate - Other,Initial Denial Rate - Other,Continuing Denial Rate - Other
2009,99733,127612,227345,18064,7943,26007,253352,0.102652,0.153349,0.058596,8012,13169,21181,3639,1192,4831,26012,0.185722,0.312334,0.083003,5912,7509,13421,39,104,143,13564,0.010543,0.006554,0.013661,995,2123,3118,6,14,20,3138,0.006373,0.005994,0.006551
2010,83967,108877,192844,7324,5135,12459,205303,0.060686,0.080227,0.045039,14833,13263,28096,675,473,1148,29244,0.039256,0.043526,0.034435,5872,7867,13739,37,51,88,13827,0.006364,0.006262,0.006441,928,2161,3089,8,13,21,3110,0.006752,0.008547,0.00598
2011,114790,155068,269858,8559,4508,13067,282925,0.046185,0.069388,0.02825,23565,22058,45623,1762,542,2304,47927,0.048073,0.06957,0.023982,7219,11978,19197,35,196,231,19428,0.01189,0.004825,0.0161,1517,3374,4891,7,36,43,4934,0.008715,0.004593,0.010557
2012,140714,128950,269664,7388,4008,11396,281060,0.040547,0.049885,0.030145,47895,22667,70562,1739,724,2463,73025,0.033728,0.035036,0.030952,9275,10478,19753,51,151,202,19955,0.010123,0.005469,0.014206,1465,2709,4174,8,26,34,4208,0.00808,0.005431,0.009506
2013,130650,164445,295095,9583,4536,14119,309214,0.045661,0.068336,0.026843,39215,31821,71036,2695,770,3465,74501,0.046509,0.064304,0.023626,8436,14556,22992,43,144,187,23179,0.008068,0.005071,0.009796,1253,3556,4809,9,18,27,4836,0.005583,0.007132,0.005036
2014,130529,202073,332602,10634,6097,16731,349333,0.047894,0.075331,0.029289,38278,50691,88969,2680,1489,4169,93138,0.044762,0.065433,0.028536,7927,16360,24287,81,141,222,24509,0.009058,0.010115,0.008545,1310,4832,6142,18,39,57,6199,0.009195,0.013554,0.008007
2015,111807,176560,288367,7658,5490,13148,301515,0.043606,0.064102,0.030157,31246,49269,80515,1672,1322,2994,83509,0.035852,0.050793,0.026131,8088,15143,23231,67,154,221,23452,0.009424,0.008216,0.010067,1414,4652,6066,20,31,51,6117,0.008337,0.013947,0.00662
2016,105056,252056,357112,11297,11762,23059,380171,0.060654,0.097092,0.044584,23062,81755,104817,1299,3688,4987,109804,0.045417,0.053323,0.043163,9126,20313,29439,77,381,458,29897,0.015319,0.008367,0.018411,1468,5396,6864,16,88,104,6968,0.014925,0.010782,0.016047
2017,96111,277126,373237,14515,15345,29860,403097,0.074076,0.131208,0.052467,19675,100908,120583,2082,4633,6715,127298,0.05275,0.095693,0.043898,12554,21788,34342,213,372,585,34927,0.016749,0.016684,0.016787,1475,5839,7314,28,67,95,7409,0.012822,0.018629,0.011344
2018,87867,247041,334908,28150,33167,61317,396225,0.154753,0.242637,0.118366,7592,59570,67162,4254,14550,18804,85966,0.218738,0.359109,0.196303,10767,27601,38368,370,586,956,39324,0.024311,0.033223,0.02079,2262,7815,10077,180,231,411,10488,0.039188,0.07371,0.02871
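
The "Other" columns here are far smaller than the "Other" rows in the earlier employer-type tables (995 versus 85,809 initial approvals in 2009). One likely explanation, assuming employers that did not match the employer characteristics data carry missing flags, is that comparing NaN with 0 is False. A minimal sketch of the difference on made-up rows:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the third employer has no flags at all
toy = pd.DataFrame({
  'tech': [0.0, 1.0, np.nan],
  'outsourcing': [0.0, 0.0, np.nan],
  'Total Petitions': [10, 20, 30],
})

# Comparing against 0 directly leaves out the row with missing flags
strict_other = toy[(toy['tech'] == 0) & (toy['outsourcing'] == 0)]

# Filling missing flags with 0 first counts that row as "Other"
filled = toy.fillna(0)
filled_other = filled[(filled['tech'] == 0) & (filled['outsourcing'] == 0)]
```

In the sketch the strict filter keeps 10 petitions while the filled version keeps 40, so the two definitions of "Other" should not be compared directly.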


In [33]:
# Export denial rates
denials.to_csv(data_dir + 'denials.csv')

H-1B dependent companies

Number of H-1B dependent organizations each year

In [33]:
temp = h1b_dependency.groupby(['Fiscal Year', 'H-1B Dependent'], as_index=False).agg({'Employer': 'count'})
temp['Employers'] = temp.groupby('Fiscal Year')['Employer'].transform('sum')
temp = temp.loc[temp['H-1B Dependent'] == 1, :]
temp['Share H-1B Dependent'] = temp['Employer'] / temp['Employers']
temp.rename(columns={'Employer': 'H-1B Dependent Employers', 'Employers': 'All Employers'}, inplace=True)
temp.drop('H-1B Dependent', axis=1, inplace=True)
temp.to_clipboard(index=False)
temp

,Fiscal Year,H-1B Dependent Employers,All Employers,Share H-1B Dependent
1,2015,4415,62682,0.070435
3,2016,4584,63026,0.072732
5,2017,4504,62251,0.072352
7,2018,4550,62086,0.073285
9,2019,4702,62084,0.075736
11,2020,6389,60456,0.10568
13,2021,5372,55444,0.096891
15,2022,5635,59908,0.094061
17,2023,5569,57431,0.096969
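
The share is the count of dependent employers divided by the per-year total that `transform('sum')` broadcasts to every row. A worked check for 2015, where the non-dependent count of 58,267 is derived as 62,682 minus 4,415 rather than read from the output:

```python
import pandas as pd

# 2015 counts: 4415 dependent (from the table), 58267 non-dependent (derived)
counts = pd.DataFrame({
  'Fiscal Year': [2015, 2015],
  'H-1B Dependent': [0, 1],
  'Employer': [58267, 4415],
})

# Broadcast the yearly total onto each row, then keep the dependent row
counts['Employers'] = counts.groupby('Fiscal Year')['Employer'].transform('sum')
dependent = counts.loc[counts['H-1B Dependent'] == 1].copy()
dependent['Share H-1B Dependent'] = dependent['Employer'] / dependent['Employers']
share_2015 = dependent['Share H-1B Dependent'].iloc[0]
```

This gives about 7.0%, in line with the 2015 row above.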


H-1B dependency status by company and company type

In [34]:
# Join H-1B dependency status with USCIS data
h1b_dependency['employer_join'] = h1b_dependency['Employer'].str.lower().str.replace('[^a-z0-9]', '', regex=True)
# h1b_join.drop('Employer', axis=1, inplace=True)

uscis_h1b_dependency = uscis.groupby(['Employer', 'Fiscal Year', 'tech', 'outsourcing'], as_index=False).agg(petition_sum)
uscis_h1b_dependency['employer_join'] = uscis_h1b_dependency['Employer'].str.lower().str.replace('[^a-z0-9]', '', regex=True)

uscis_h1b_dependency = (
  uscis_h1b_dependency
  .set_index(['employer_join', 'Fiscal Year'])
  .join(h1b_dependency.set_index(['employer_join', 'Fiscal Year']), how='left', rsuffix='_RIGHT')
  .reset_index()
  .drop(['employer_join', 'Employer_RIGHT'], axis=1)
)

# Filter to outsourcing and tech companies
temp = (
  uscis_h1b_dependency
    .loc[
      (
        (uscis_h1b_dependency['tech'] == 1) | 
        (uscis_h1b_dependency['outsourcing'] == 1)
      ) & 
      (uscis_h1b_dependency['Fiscal Year'] >= 2015),
    :]
)

temp.to_clipboard(index=False)
temp

,Fiscal Year,Employer,tech,outsourcing,Initial Approval,Continuing Approval,Total Approval,Initial Denial,Continuing Denial,Total Denial,Total Petitions,H-1B Dependent
17,2015,Accenture,0.0,1.0,3442,2464,5906,158,26,184,6090,0
18,2016,Accenture,0.0,1.0,1854,5076,6930,41,56,97,7027,0
19,2017,Accenture,0.0,1.0,954,4175,5129,110,80,190,5319,0
20,2018,Accenture,0.0,1.0,366,2694,3060,162,451,613,3673,0
21,2019,Accenture,0.0,1.0,656,2465,3121,482,441,923,4044,0
...,...,...,...,...,...,...,...,...,...,...,...,...
2218,2015,Yahoo,1.0,0.0,262,226,488,1,3,4,492,0
2219,2016,Yahoo,1.0,0.0,212,172,384,2,0,2,386,0
2220,2017,Yahoo,1.0,0.0,319,280,599,1,20,21,620,0
2221,2018,Yahoo,1.0,0.0,4,22,26,6,3,9,35,0
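
The join key lowercases each employer name and strips everything except letters and digits, so differences in case, spacing, and punctuation do not block a match. A minimal sketch on a few illustrative spellings:

```python
import pandas as pd

# Illustrative name spellings; the punctuation in the third one is made up
names = pd.Series(['CVS/PHARMACY', 'Larsen & Toubro', 'Barclays Capital, Inc.'])

# Same normalization as employer_join: lowercase, then drop non-alphanumerics
keys = names.str.lower().str.replace('[^a-z0-9]', '', regex=True)
```

Names that differ in more than case or punctuation (for example, abbreviations) would still produce different keys and fail to match.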


Wages

In [38]:
# Prevailing wage levels for outsourcing vs tech
# temp = lca.loc[(lca['outsourcing'] == 1) | (lca['tech'] == 1), :]
temp = lca.copy()
# temp.loc[:, ['group']] = pd.NA
# temp.loc[temp['outsourcing'] == 1, 'group'] = 'Outsourcing'
# temp.loc[temp['tech'] == 1, 'group'] = 'Tech'

temp['group'] = np.select(
  condlist = [
    (temp['tech'] == 0) & (temp['outsourcing'] == 0),
    (temp['tech'] == 1) & (temp['outsourcing'] == 0),
    (temp['tech'] == 0) & (temp['outsourcing'] == 1),
    (temp['tech'] == 1) & (temp['outsourcing'] == 1)
  ],
  choicelist = ['Other', 'Tech', 'Outsourcing', 'Both'],
  default = 'Other'
)

temp = (
  temp
    .groupby(['group', 'DATAFILE_YEAR', 'PW_WAGE_LEVEL'])
    .agg({'TOTAL_WORKER_POSITIONS': 'sum', 'PW_ANNUAL': 'median', 'WAGE_ANNUAL_FROM': 'median'})
    .reset_index()
)
temp['group_workers'] = temp['TOTAL_WORKER_POSITIONS'] / temp.groupby(['group', 'DATAFILE_YEAR'])['TOTAL_WORKER_POSITIONS'].transform('sum')

temp.to_clipboard(index=False)

# Export dataset
temp.to_csv(data_dir + 'wages.csv', index=False)

temp

,group,DATAFILE_YEAR,PW_WAGE_LEVEL,TOTAL_WORKER_POSITIONS,PW_ANNUAL,WAGE_ANNUAL_FROM,group_workers
0,Other,2015,Level I,360048.0,56930.0,62000.0,0.517681
1,Other,2015,Level II,224633.0,72509.0,80000.0,0.322980
2,Other,2015,Level III,70988.0,90542.0,100000.0,0.102067
3,Other,2015,Level IV,39833.0,103813.0,119205.0,0.057272
4,Other,2017,Level I,275249.0,58906.0,65000.0,0.513360
...,...,...,...,...,...,...,...
91,Tech,2022,Level IV,52812.0,151882.0,184080.0,0.168773
92,Tech,2023,Level I,55296.0,87693.0,120200.0,0.169375
93,Tech,2023,Level II,132602.0,114026.0,143566.0,0.406169
94,Tech,2023,Level III,85016.0,133515.0,163000.0,0.260410
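
`group_workers` is each wage level's share of worker positions within its group and year. A worked check using the four 2015 "Other" rows, with position counts copied from the table above:

```python
import pandas as pd

# Worker positions for group 'Other' in 2015, copied from the output table
toy = pd.DataFrame({
  'group': ['Other'] * 4,
  'DATAFILE_YEAR': [2015] * 4,
  'PW_WAGE_LEVEL': ['Level I', 'Level II', 'Level III', 'Level IV'],
  'TOTAL_WORKER_POSITIONS': [360048.0, 224633.0, 70988.0, 39833.0],
})

# Divide each level's positions by the total for its group and year
group_total = toy.groupby(['group', 'DATAFILE_YEAR'])['TOTAL_WORKER_POSITIONS'].transform('sum')
toy['group_workers'] = toy['TOTAL_WORKER_POSITIONS'] / group_total
```

Level I comes out to about 51.8% of positions, matching the first row of the output, and the four shares sum to 1.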


Number of employers  
We'll find the number of employers that petitioned for and received at least one H-1B visa by counting the unique employer names in each year. We'll use the employer name as recorded in the original USCIS data, not the standardized name created during the ingestion process.  
This is an estimate, since the same employer is often recorded with slight variations in its name. However, we're more interested in the change over time than in the raw number.  
We'll calculate it two ways: (1) counting unique employer names, and (2) counting unique combinations of employer name and tax ID.

In [37]:
temp = uscis.groupby('Fiscal Year').agg(Employers = ('Employer (original name)', 'nunique'))

temp.to_clipboard()
temp

Fiscal Year,Employers
2009,60715
2010,50249
2011,57602
2012,50749
2013,51715
2014,51417
2015,43790
2016,47780
2017,44450
2018,49398


Repeat using unique employer names and tax IDs

In [38]:
temp = uscis.loc[:, ['Fiscal Year', 'Employer (original name)', 'Tax ID']].drop_duplicates().groupby('Fiscal Year').size()

temp.to_clipboard()
temp

Fiscal Year
2009    62540
2010    51083
2011    58579
2012    51553
2013    52539
2014    52297
2015    44689
2016    48832
2017    45576
2018    50714
2019    54246
2020    50875
2021    49142
2022    51787
2023    49384
dtype: int64
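
The two counting methods differ only when one employer name appears with more than one tax ID. A minimal sketch on made-up rows (the names and tax IDs are hypothetical):

```python
import pandas as pd

# Hypothetical toy data: 'ACME LLC' appears under two different tax IDs
toy = pd.DataFrame({
  'Fiscal Year': [2009, 2009, 2009],
  'Employer (original name)': ['ACME LLC', 'ACME LLC', 'BETA INC'],
  'Tax ID': [1111, 2222, 3333],
})

# Method 1: unique employer names per year
by_name = toy.groupby('Fiscal Year')['Employer (original name)'].nunique()

# Method 2: unique (name, tax ID) combinations per year
by_name_and_id = (
  toy[['Fiscal Year', 'Employer (original name)', 'Tax ID']]
    .drop_duplicates()
    .groupby('Fiscal Year')
    .size()
)
```

Here the name-only count is 2 and the name-plus-ID count is 3. The same direction shows up in the real outputs, where the second method yields the larger figure each year (for example, 62,540 versus 60,715 in 2009).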