# Labor Condition Applications (LCAs)
## Import pre-2020 raw data
**Source:** [Department of Labor > Foreign Labor Certification > Performance Data](https://www.dol.gov/agencies/eta/foreign-labor/performance)  
The first step in the H-1B visa program is for employers to submit a Labor Condition Application (LCA) to the Department of Labor. The LCA provides information on the type of visa the employer is seeking, the occupation it is hiring for, the number of workers, the intended pay range, and so on.  

According to the DOL: "Beginning in Fiscal Year 2020, the Program Record Layouts associated with each program disclosure file are substantially different from prior fiscal years and include a number of additional data fields extracted based on new visa application forms, appendices, and addenda implemented by OFLC through the new [Foreign Labor Application Gateway (FLAG) System](https://flag.dol.gov/?_ga=2.116320975.836097246.1711548631-607582160.1711058977)."

In this script, we'll import the pre-2020 disclosure files (FY2010 through FY2019). Files for FY2020 onward are handled in another script.

In [1]:
# Import packages
import pandas as pd

In [2]:
# Set up parameters
data_dir = '../../data/'
output_dir = data_dir + 'raw/'
output_filename_base = 'lca_raw_'
parameters_dir = data_dir + 'parameters/'
column_names_filename = 'lca_pre_2020_column_names.csv'

base_url = 'https://www.dol.gov/sites/dolgov/files/ETA/oflc/pdfs/'

url_suffixes = {
  2019: 'H-1B_Disclosure_Data_FY2019', 
  2018: 'H-1B_Disclosure_Data_FY2018_EOY', 
  2017: 'H-1B_Disclosure_Data_FY17', 
  2016: 'H-1B_Disclosure_Data_FY16', 
  2015: 'H-1B_Disclosure_Data_FY15_Q4',
  2014: 'H-1B_FY14_Q4',
  2013: 'LCA_FY2013',
  2012: 'LCA_FY2012_Q4',
  2011: 'H-1B_iCert_LCA_FY2011_Q4',
  2010: 'H-1B_FY2010'
}
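Each download URL is simply `base_url` plus a year's suffix plus `.xlsx`. The minimal sketch below builds the full URL for every year up front, so the list can be checked before anything is downloaded. It repeats two entries of `url_suffixes` so it runs on its own, and `build_urls` is a hypothetical helper, not part of the original notebook.

```python
# Hypothetical helper: map each fiscal year to its full .xlsx download URL
base_url = 'https://www.dol.gov/sites/dolgov/files/ETA/oflc/pdfs/'
url_suffixes = {
    2019: 'H-1B_Disclosure_Data_FY2019',
    2010: 'H-1B_FY2010',
}

def build_urls(base, suffixes, ext='.xlsx'):
    """Return {year: url} by joining the base URL, the year's suffix, and the extension."""
    return {year: base + suffix + ext for year, suffix in suffixes.items()}

urls = build_urls(base_url, url_suffixes)
for year, url in sorted(urls.items()):
    print(year, url)
```

Because the file names are irregular across years (`FY17`, `FY2018_EOY`, `FY14_Q4`, ...), keeping them in an explicit dictionary rather than generating them from a pattern is the safer choice.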

In [3]:
# Import dataframe that standardizes column names across years
pre_2020_column_names = pd.read_csv(parameters_dir + column_names_filename)
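Judging from how the import loop uses it, the parameters file has a `Standard` column with the harmonized names plus one `FYyyyy` column per year holding that year's original header, left blank where a year has no equivalent field. The sketch below uses a small made-up table in place of `lca_pre_2020_column_names.csv` (the column names are illustrative, not taken from the real file) to show how the `usecols` list and the rename dictionary are derived for a single year.

```python
import pandas as pd

# Made-up stand-in for lca_pre_2020_column_names.csv: one row per standardized
# field, one column per fiscal year holding that year's original header
# (None where the year has no equivalent field)
names = pd.DataFrame({
    'Standard': ['CASE_NUMBER', 'EMPLOYER_NAME', 'WAGE_RATE_OF_PAY_FROM'],
    'FY2019':   ['CASE_NUMBER', 'EMPLOYER_NAME', 'WAGE_RATE_OF_PAY_FROM'],
    'FY2010':   ['LCA_CASE_NUMBER', 'LCA_CASE_EMPLOYER_NAME', None],
})

fiscal_year = 'FY2010'

# Columns to request from that year's spreadsheet (passed as usecols)
usecols = names[fiscal_year].dropna().tolist()

# {original header: standardized name}, skipping fields the year lacks
rename_map = (
    names[['Standard', fiscal_year]]
    .dropna()
    .set_index(fiscal_year)['Standard']
    .to_dict()
)

# Stand-in for the downloaded sheet, using the year's original headers
raw = pd.DataFrame({'LCA_CASE_NUMBER': ['CASE-1'], 'LCA_CASE_EMPLOYER_NAME': ['ACME']})
clean = raw[usecols].rename(columns=rename_map)
print(clean.columns.tolist())  # ['CASE_NUMBER', 'EMPLOYER_NAME']
```

Fields a year lacks simply drop out, so each yearly file ends up with a subset of the standardized columns.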

In [None]:
# Import datasets pre-2020
stz_name_col = 'Standard'  # column in the parameters file holding the standardized names

for year, suff in url_suffixes.items():
    url = base_url + suff + '.xlsx'
    fiscal_year = f'FY{year}'
    print(f'Importing {year} data: {url}')

    # Read only the columns listed for this fiscal year, keeping every value as text
    df = pd.read_excel(
        url,
        usecols=pre_2020_column_names[fiscal_year].dropna().tolist(),
        dtype=str,
    )

    # Map this year's original headers to the standardized names
    rename_map = (
        pre_2020_column_names[[stz_name_col, fiscal_year]]
        .dropna()
        .set_index(fiscal_year)[stz_name_col]
        .to_dict()
    )
    df = df.rename(columns=rename_map)
    df['DATAFILE_YEAR'] = year
    # df['DATAFILE_QUARTER'] = ''

    # Save data locally
    print(f'Saving {year} data')
    df.to_csv(output_dir + output_filename_base + str(year) + '.csv', index=False)
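The rename step is only as good as the parameters table. `Series.to_dict()` silently keeps the last entry when the same original header appears twice for a year, and `rename` does not complain if two original headers map to the same standardized name, which leaves duplicate columns. A possible safeguard is a quick check of the table before the download loop runs. `check_mapping` below is a hypothetical helper, and it assumes every year column in the table is named with an `FY` prefix, as in the loop above; the example table is made up.

```python
import pandas as pd

def check_mapping(names, standard_col='Standard'):
    """Hypothetical pre-flight check for the column-name mapping table.

    For each fiscal-year column (any column whose name starts with 'FY'),
    report original headers listed more than once and standardized names
    that more than one original header would be renamed to.
    """
    problems = []
    for fy in [c for c in names.columns if c.startswith('FY')]:
        # Only rows where this year has an original header take part in the mapping
        pairs = names[[standard_col, fy]].dropna()
        dup_src = pairs[fy][pairs[fy].duplicated()].unique().tolist()
        dup_dst = pairs[standard_col][pairs[standard_col].duplicated()].unique().tolist()
        if dup_src:
            problems.append(f'{fy}: original header listed twice: {dup_src}')
        if dup_dst:
            problems.append(f'{fy}: standardized name targeted twice: {dup_dst}')
    return problems

# Made-up table with one deliberate problem in each of two years
names = pd.DataFrame({
    'Standard': ['CASE_NUMBER', 'EMPLOYER_NAME', 'EMPLOYER_NAME'],
    'FY2012':   ['CASE_NO', 'EMPLOYER', None],
    'FY2011':   ['CASE_NO', 'NAME', 'EMPLOYER_NAME'],
    'FY2010':   ['CASE_NO', 'CASE_NO', None],
})

for problem in check_mapping(names):
    print(problem)
```

In the notebook this could be run as `check_mapping(pre_2020_column_names)` right after the parameters file is loaded, stopping before any download if the returned list is non-empty.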