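Resources:
https://github.com/arapfaik/scraping-glassdoor-selenium (Glassdoor scraper built with Selenium)
https://shandou.medium.com/export-and-create-conda-environment-with-yml-5de619fe5a2 (exporting and creating a conda environment from a YAML file)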
Goal: Create a tool that estimates data engineer salaries to help data engineers negotiate their income when they receive a job offer.
Workflow:
Data Collection: Scrape over 1,000 job descriptions from Glassdoor using Python and Selenium. (Completed)
Data Cleaning: Engineer features from the text of each job description to quantify the value companies place on Python, Excel, AWS, and Spark. (Completed)
Exploratory Data Analysis: Use a Jupyter notebook and plotting libraries such as matplotlib and seaborn to discover the main characteristics of the data. See the EDA Notebook. (Completed)
Model Selection: Compare and evaluate ML models and choose the one with the best performance. (Completed)
Productionize: Build a client-facing API using Flask. (Completed)
Cloud Deployment: Migrate this project to Google Cloud Platform (GCP).
Scheduling: Automate the whole pipeline with cron or Airflow and deploy it on a cloud platform (AWS or GCP).
Data Visualization: Load the data into BigQuery and build a dashboard with Google Data Studio.
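For the Data Cleaning step, a minimal sketch of the skill features might look like the following. It assumes each job description is a plain string and that a skill counts as present whenever its name appears as a whole word. The `skill_flags` and `parse_salary_estimate` helpers, the regex patterns, and the `$53K-$91K (Glassdoor est.)` salary format are illustrative assumptions, not code from this repository.

```python
import re

# Hypothetical whole-word patterns for the four skills named in the workflow.
SKILL_PATTERNS = {
    "python": r"\bpython\b",
    "excel": r"\bexcel\b",
    "aws": r"\baws\b",
    "spark": r"\bspark\b",
}


def skill_flags(description: str) -> dict[str, int]:
    """Return a 1/0 flag per skill depending on whether the description mentions it."""
    text = description.lower()
    return {
        skill: int(re.search(pattern, text) is not None)
        for skill, pattern in SKILL_PATTERNS.items()
    }


def parse_salary_estimate(raw: str) -> tuple[float, float, float]:
    """Parse an assumed '$53K-$91K (...)' range into (min, max, average) in $K."""
    numbers = re.findall(r"\$(\d+(?:\.\d+)?)K", raw)
    if len(numbers) != 2:
        raise ValueError(f"unrecognised salary format: {raw!r}")
    low, high = (float(n) for n in numbers)
    return low, high, (low + high) / 2
```

Whole-word matching keeps "excellent" from counting as Excel, but it also misses variants such as "PySpark" or "python3", so the patterns would likely need tuning against the real descriptions.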
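During Exploratory Data Analysis, one natural question is whether postings that mention a skill advertise higher pay. The sketch below computes that comparison with the standard library; in the notebook the same numbers would more typically come from a pandas `groupby` and be drawn as a seaborn bar chart. The row layout (one dict per posting with 0/1 skill flags and an `avg_salary` field) and the `mean_salary_by_skill` helper are hypothetical.

```python
from statistics import mean


def mean_salary_by_skill(rows, skills):
    """Return {skill: (mean salary with skill, mean salary without skill)}.

    `rows` is assumed to be a list of dicts holding 0/1 skill flags and an
    `avg_salary` value; a side is None when no posting falls into it.
    """
    summary = {}
    for skill in skills:
        with_skill = [row["avg_salary"] for row in rows if row[skill] == 1]
        without_skill = [row["avg_salary"] for row in rows if row[skill] == 0]
        summary[skill] = (
            mean(with_skill) if with_skill else None,
            mean(without_skill) if without_skill else None,
        )
    return summary
```

A gap between the two means is only a correlation; seniority or location could explain it just as well.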
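For Model Selection, one common approach is to score each candidate with k-fold cross-validation and keep the one with the lowest mean absolute error (MAE). The candidates (linear regression, Lasso, and a random forest), the `alpha` value, and the choice of MAE are assumptions for illustration; the README does not say which models or metric were used.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import cross_val_score


def compare_models(X, y, cv=5):
    """Return the mean cross-validated MAE per candidate model (lower is better)."""
    candidates = {
        "linear": LinearRegression(),
        "lasso": Lasso(alpha=0.1),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    }
    scores = {}
    for name, model in candidates.items():
        # scikit-learn maximises scores, so MAE is exposed as its negative.
        neg_mae = cross_val_score(
            model, X, y, scoring="neg_mean_absolute_error", cv=cv
        )
        scores[name] = -neg_mae.mean()
    return scores


def best_model(scores):
    """Name of the candidate with the lowest MAE."""
    return min(scores, key=scores.get)
```

With an integer `cv`, scikit-learn splits regression data into unshuffled folds; if the scraped rows are ordered (for example by search page), passing a shuffled `KFold` as `cv` would give a fairer comparison.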
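For the Productionize step, a client-facing Flask API could expose the chosen model behind a single JSON endpoint. The `/predict` route, the `{"input": [...]}` request shape, and `DummySalaryModel` (a stand-in for the real trained model, which would normally be loaded from disk) are all hypothetical.

```python
from flask import Flask, jsonify, request


class DummySalaryModel:
    """Hypothetical stand-in for the trained model (salaries in $K)."""

    def predict(self, rows):
        # Arbitrary linear rule so the endpoint returns a deterministic number.
        return [75.0 + 10.0 * sum(row) for row in rows]


app = Flask(__name__)
model = DummySalaryModel()


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        payload = {}
    features = payload.get("input")
    # Reject anything that is not a flat list of numbers.
    if not isinstance(features, list) or not all(
        isinstance(x, (int, float)) for x in features
    ):
        return jsonify({"error": 'expected a JSON body like {"input": [1, 0, 1]}'}), 400
    prediction = model.predict([features])[0]
    return jsonify({"response": prediction})
```

To serve it locally, call `app.run()`; a client then POSTs JSON such as `{"input": [1, 0, 1]}` and receives `{"response": 95.0}` from the dummy model.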