A Python script that loads 2019 H-1B petition data into a local SQLite database
- Python 3
- the sqlalchemy Python module
- download the raw data as an xlsx file from the Department of Labor
- use any online xlsx -> csv converting website (I use Zamzar) to convert the file
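If you prefer not to upload the file to a converter, the conversion can also be done locally. This is a minimal sketch, not part of the project's scripts; it assumes the third-party openpyxl package is installed and that the disclosure data sits on the workbook's first sheet.

```python
# Local alternative to an online xlsx -> csv converter (assumes
# `pip install openpyxl`; not one of this project's scripts).
import csv
import openpyxl

def xlsx_to_csv(xlsx_path, csv_path):
    # read_only mode keeps memory use low for large disclosure files
    wb = openpyxl.load_workbook(xlsx_path, read_only=True)
    ws = wb.active  # assumes the data is on the first sheet
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for row in ws.iter_rows(values_only=True):
            writer.writerow(row)

# usage (file names as in the steps above):
# xlsx_to_csv("H-1B_Disclosure_Data_FY2019.xlsx",
#             "H-1B_Disclosure_Data_FY2019.csv")
```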
- create a sqlite database
sqlite3 h1b_data.db
- run the python script
# python establish2019H1BDatabas.py <csv file name> <db name>
python establish2019H1BDatabas.py H-1B_Disclosure_Data_FY2019.csv h1b_data.db
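For orientation, here is a hypothetical sketch of what the loading step does: read the converted csv and insert rows into the h1bdata_2019 table. It is not the actual establish2019H1BDatabas.py; the real script uses sqlalchemy, while this sketch uses the stdlib csv and sqlite3 modules, and the column subset is assumed from the example queries below.

```python
# Hypothetical sketch of the csv -> sqlite loading step (the real
# script uses sqlalchemy; column subset assumed from the queries below).
import csv
import sqlite3

COLUMNS = ["EMPLOYER_NAME", "JOB_TITLE", "PREVAILING_WAGE",
           "WORKSITE_CITY", "WORKSITE_STATE"]

def load_csv(csv_name, db_name):
    conn = sqlite3.connect(db_name)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS h1bdata_2019 "
        "(EMPLOYER_NAME TEXT, JOB_TITLE TEXT, PREVAILING_WAGE REAL, "
        "WORKSITE_CITY TEXT, WORKSITE_STATE TEXT)"
    )
    with open(csv_name, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = [tuple(r[c] for c in COLUMNS) for r in reader]
    conn.executemany(
        "INSERT INTO h1bdata_2019 VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    conn.close()

# usage mirrors the command above:
# load_csv("H-1B_Disclosure_Data_FY2019.csv", "h1b_data.db")
```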
- run the salary analytics (median/quartile) script
# python calculateEmployerSalaryStatisitcs.py <number of employers to be calculated> <db name>
python calculateEmployerSalaryStatisitcs.py 2000 h1b_data.db
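The analytics step can be sketched roughly as follows: for the N most frequent employers, compute the median and upper-quartile prevailing wage and store them in employer_salary_stats. This is a hypothetical reconstruction, not the actual calculateEmployerSalaryStatisitcs.py; the table name and QUARTILE_PAY column come from the query below, while MEDIAN_PAY is an assumed column name.

```python
# Hypothetical sketch of the per-employer salary statistics step.
# Table/column names partly assumed (MEDIAN_PAY is a guess; the
# employer_salary_stats table and QUARTILE_PAY come from this README).
import sqlite3
import statistics

def build_salary_stats(db_name, n_employers):
    conn = sqlite3.connect(db_name)
    conn.execute("DROP TABLE IF EXISTS employer_salary_stats")
    conn.execute(
        "CREATE TABLE employer_salary_stats "
        "(EMPLOYER_NAME TEXT, MEDIAN_PAY REAL, QUARTILE_PAY REAL)"
    )
    # the N employers with the most petitions
    employers = conn.execute(
        "SELECT EMPLOYER_NAME FROM h1bdata_2019 "
        "GROUP BY EMPLOYER_NAME ORDER BY COUNT(*) DESC LIMIT ?",
        (n_employers,),
    ).fetchall()
    for (name,) in employers:
        wages = [w for (w,) in conn.execute(
            "SELECT PREVAILING_WAGE FROM h1bdata_2019 "
            "WHERE EMPLOYER_NAME = ?", (name,))]
        if len(wages) < 2:
            # quantiles() needs at least 2 points; fall back to the wage itself
            median = q3 = wages[0]
        else:
            _q1, median, q3 = statistics.quantiles(wages, n=4)
        conn.execute(
            "INSERT INTO employer_salary_stats VALUES (?, ?, ?)",
            (name, median, q3),
        )
    conn.commit()
    conn.close()

# usage mirrors the command above:
# build_salary_stats("h1b_data.db", 2000)
```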
- example queries against the loaded data
-- top 100 employers by petition count, for engineering titles paying above $123,000
SELECT COUNT(*), EMPLOYER_NAME FROM h1bdata_2019
WHERE PREVAILING_WAGE > 123000
AND JOB_TITLE LIKE '%Engineer%'
GROUP BY EMPLOYER_NAME ORDER BY COUNT(*) DESC LIMIT 100;
-- Google petitions with worksites in Cambridge
SELECT PREVAILING_WAGE, EMPLOYER_NAME, JOB_TITLE, WORKSITE_CITY, WORKSITE_STATE
FROM h1bdata_2019
WHERE EMPLOYER_NAME LIKE '%Google%'
AND WORKSITE_CITY = 'Cambridge' LIMIT 100;
-- top 100 employers from the analytics table, ranked by upper-quartile pay
SELECT * FROM employer_salary_stats ORDER BY QUARTILE_PAY DESC LIMIT 100;