![glassdoor](https://github.com/jasonchanhku/DataScienceDemand/blob/master/images/glassdoor.png?raw=true)

# Data Science Demand in Hong Kong
### by Jason Chan Jin An

# Introduction

This project gauges demand for data science jobs in Hong Kong
over a rolling 30-day window, based on job posts from Glassdoor. Glassdoor was
the preferred data source because of the wide array of information it provides for each posting:

* Job Title
* Company Name
* Link
* Company Rating
* Job Description
* Company Size
* Year Founded
* Company Type
* Industry
* Company Revenue
* CEO (sentiment)
* Recommend Percentage
* Approval Percentage

# Questions to be Answered

Exploratory Data Analysis (EDA) is performed in this notebook from a statistical and data standpoint and seeks to answer the following questions:
* Who is hiring data scientists in Hong Kong?
    * Big or small companies?
    * Which industries?
    * Do the companies have good feedback and approval ratings?
* Do company ratings differ by company type, industry, etc.?
* Given my preferred company type, which jobs suit me best?
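
As a minimal sketch of how the first two questions might be approached with `pandas`, the snippet below counts postings per industry and size band and averages ratings per industry. It uses only five rows copied from the `df.head()` preview shown later in this notebook, so the numbers are illustrative and do not describe the full dataset. The variable names (`sample`, `postings_by_industry`, and so on) are assumptions introduced for this example.

```python
import pandas as pd

# Five rows copied from the df.head() preview in this notebook (not the full dataset)
sample = pd.DataFrame(
    {
        "Company": ["J.P. Morgan", "Societe Generale", "Transunion", "Lenovo", "Societe Generale"],
        "Size": [
            "10000+ employees",
            "10000+ employees",
            "1001 to 5000 employees",
            "10000+ employees",
            "10000+ employees",
        ],
        "Industry": ["Finance", "Finance", "Finance", "Information Technology", "Finance"],
        "Rating": [3.7, 3.4, 3.9, 3.3, 3.4],
    }
)

# Who is hiring? Count postings per industry and per company size band
postings_by_industry = sample["Industry"].value_counts()
postings_by_size = sample["Size"].value_counts()

# Do ratings differ by industry? Average the rating within each industry.
# Note: this averages over postings, so a company with two postings counts twice.
mean_rating_by_industry = sample.groupby("Industry")["Rating"].mean()

print(postings_by_industry)
print(postings_by_size)
print(mean_rating_by_industry.round(2))
```

Averaging over postings rather than over unique companies is a design choice; dropping duplicate companies first (for example with `drop_duplicates("Company")`) would weight each employer once instead.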

## Dataset

The dataset was obtained with a Python web scraper built on `selenium`; the script is saved as `scraper.py` in the repository.

***

# Data Prep and Libraries

In [1]:
# Libraries used
import pandas as pd
import numpy as np

In [31]:
# Data prep
df = pd.read_csv('https://raw.githubusercontent.com/jasonchanhku/DataScienceDemand/master/data/glassdoor_data.csv')
df.head()

| Unnamed: 0 | Title | Company | Link | Rating | Job_Description | Size | Founded | Company_Type | Industry | Revenue | CEO | Recommend | Approve |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CIB QR - Risk Quantitative Research, Equity De... | J.P. Morgan | https://www.glassdoor.com/partner/jobListing.h... | 3.7 | J.P. Morgans Corporate & Investment Bank is a ... | 10000+ employees | 1799 | Public (JPM) | Finance | $10+ billion (USD) per year | Jamie Dimon | 76.0 | 93.0 |
| 1 | Quantitative Research - M/F VIE | Societe Generale | https://www.glassdoor.com/partner/jobListing.h... | 3.4 | Environment\n\nYour environment\nSG CIB is the... | 10000+ employees | 1864 | Public (GLE) | Finance | $10+ billion (USD) per year | Frederic Oudea | 68.0 | 83.0 |
| 2 | Data Analyst - Modeling | Transunion | https://www.glassdoor.com/partner/jobListing.h... | 3.9 | Dynamics of the Role\n\nThe incumbent\nis expe... | 1001 to 5000 employees | 1968 | Public (TRU) | Finance | $1 to $2 billion (USD) per year | Jim Peck | 75.0 | 93.0 |
| 3 | Data Scientist | Lenovo | https://www.glassdoor.com/partner/jobListing.h... | 3.3 | Position Description\nDesign data mining and m... | 10000+ employees | 1984 | Public (LNVGY) | Information Technology | $10+ billion (USD) per year | Yang Yuanqing | 57.0 | 64.0 |
| 4 | Quantitative Researcher | Societe Generale | https://www.glassdoor.com/partner/jobListing.h... | 3.4 | Environment\n\nSG CIB is the Corporate and Inv... | 10000+ employees | 1864 | Public (GLE) | Finance | $10+ billion (USD) per year | Frederic Oudea | 68.0 | 83.0 |
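
The `Size` column stores employee counts as text bands such as `10000+ employees` and `1001 to 5000 employees`, which cannot be sorted numerically as they stand. One possible way to compare big and small companies is to extract the lower bound of each band. The helper below is a hypothetical sketch (the name `size_lower_bound` is not part of the original notebook), and the `Unknown` label is an assumed example of a value with no leading number.

```python
import re
from typing import Optional


def size_lower_bound(size: str) -> Optional[int]:
    """Return the smallest employee count implied by a Size label, or None."""
    # Take the first run of digits at the start of the label, if there is one
    match = re.match(r"\s*(\d+)", size)
    return int(match.group(1)) if match else None


# The first two labels appear in the preview; "Unknown" is an assumed edge case
labels = ["10000+ employees", "1001 to 5000 employees", "Unknown"]
for label in labels:
    print(label, "->", size_lower_bound(label))
```

Applied to the full frame, this could be used as `df['Size'].map(size_lower_bound)` to obtain a sortable numeric column.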


In [32]:
# Exclude outlier RegionUP as it is a scam company
df = df[df['Company'] != 'RegionUP']

# Drop NA values
df = df.dropna()

# Drop rows without an approval score (encoded as -1)
df = df[df['Approve'] != -1]
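
Because each filter in the cell above silently removes rows, it can help to see how many rows each step drops. The following is a minimal sketch, not the notebook's actual code: `clean_postings` is a hypothetical wrapper around the same three filters, the final `reset_index` is an addition the original cell does not perform, and the `toy` rows are invented solely to exercise each filter.

```python
import numpy as np
import pandas as pd


def clean_postings(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the three filters from the cell above and report rows kept."""
    steps = [
        ("drop RegionUP", lambda d: d[d["Company"] != "RegionUP"]),
        ("drop rows with NA", lambda d: d.dropna()),
        ("drop Approve == -1", lambda d: d[d["Approve"] != -1]),
    ]
    out = raw
    for name, step in steps:
        before = len(out)
        out = step(out)
        print(f"{name}: {before} -> {len(out)} rows")
    # Renumber the index so it is contiguous after filtering (an addition)
    return out.reset_index(drop=True)


# Hypothetical toy rows, invented only to exercise each filter
toy = pd.DataFrame(
    {
        "Company": ["A Corp", "RegionUP", "B Ltd", "C Inc"],
        "Rating": [3.5, 4.9, np.nan, 3.1],
        "Approve": [80.0, 99.0, 70.0, -1.0],
    }
)
cleaned = clean_postings(toy)
```

On the real data the same function could be called as `clean_postings(df)`; the printed counts make it easy to spot a filter that removes more rows than expected.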

In [46]:
# Sanity check: count rows whose CEO field is the literal string 'CEO'
(df['CEO'] == 'CEO').value_counts()

False    141
Name: CEO, dtype: int64