# Capstone Project: Job Post and Profitability 

Phai Phongthiengtham
***

What drives a company's success? One important factor is its people and how they work together. This project examines the relationship between companies' profitability and the way they post vacancies. The main datasets are job postings from CareerBuilder and financial data from Compustat.

## Data

The main datasets in this project are:
1. Job postings from CareerBuilder, provided by Economic Modeling Specialists International (EMSI).
2. Financial data on publicly traded companies from the COMPUSTAT (North America) database.

## Data wrangling

* Job postings: this IPython notebook [here](https://github.com/phaiptt125/online_job_posting/blob/master/data_cleaning/initial_cleaning.ipynb) explains in detail how I clean the job postings data.
* Merge with Compustat: this IPython notebook [here](https://github.com/phaiptt125/online_job_posting/blob/master/data_cleaning/merge_compustat.ipynb) explains in detail how I merge the job postings data with Compustat.
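
The linked notebooks are the authoritative description of the cleaning and merging steps. As a rough illustration only, the sketch below shows one common way to join postings to company financials by a normalized company name. The company, the column names `employer` and `jobtitle`, the helper `normalize_name`, and all values are hypothetical and are not taken from the notebooks; only `conml` and `roe` match column names used in this project.

```python
import pandas as pd

def normalize_name(name: str) -> str:
    """Lowercase a company name and drop punctuation and common legal suffixes."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    suffixes = {"inc", "corp", "ltd", "plc", "co"}
    return " ".join(tok for tok in cleaned.split() if tok not in suffixes)

# Hypothetical stand-ins for the two sources; not real data.
postings = pd.DataFrame({
    "employer": ["Acme Widgets, Inc.", "Acme Widgets Inc"],
    "jobtitle": ["Human Resources Manager", "Sales Associate"],
})
financials = pd.DataFrame({
    "conml": ["Acme Widgets Inc"],
    "roe": [0.12],
})

# Build the same join key on both sides.
postings["name_key"] = postings["employer"].map(normalize_name)
financials["name_key"] = financials["conml"].map(normalize_name)

# Left join keeps every posting; validate guards against duplicate companies
# on the financial side.
merged = postings.merge(financials, on="name_key", how="left",
                        validate="many_to_one")
print(merged[["employer", "conml", "roe"]])
```

Exact matching on a normalized key is the simplest option; real company names often need manual checks or fuzzy matching, which this sketch does not attempt.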

## Measuring Profitability

I use "Return on Equity" (ROE), which is net income expressed as a percentage of shareholders' equity (excluding preferred stock). ROE measures a corporation's profitability by showing how much profit it generates with the money shareholders have invested.

$$ \text{Return on Equity} = \frac{\text{Net Income}}{\text{Shareholders' Equity}} $$
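
As a small worked example of the formula, the sketch below computes ROE for two made-up firms. The function name `return_on_equity` and all figures are hypothetical; treating non-positive equity as undefined is my assumption here, not a rule taken from the dataset.

```python
def return_on_equity(net_income: float, shareholders_equity: float) -> float:
    """Return ROE as a fraction (0.10 means 10%)."""
    if shareholders_equity <= 0:
        # Assumption: ROE is treated as undefined for zero or negative equity.
        return float("nan")
    return net_income / shareholders_equity

# Hypothetical figures in millions of dollars: (net income, shareholders' equity).
firms = {
    "Firm A": (50.0, 400.0),
    "Firm B": (12.0, 300.0),
}

for name, (net_income, equity) in firms.items():
    roe = return_on_equity(net_income, equity)
    print(f"{name}: ROE = {roe:.3f}")
```

Firm A earns 50 on 400 of equity, so its ROE is 50 / 400 = 0.125, or 12.5%.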

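```python
import pandas as pd

# Load the tab-separated dataset (the first row holds the column names).
df = pd.read_csv('data_10K_no_description.txt', sep='\t', header=0)

# Columns to preview: company, ROE, occupation, industry, state, job title, education flags.
select_var = ['conml', 'roe', 'onet', 'naics', 'state', 'original_jobtitle',
              'high_school', 'associate', 'bachelor', 'master', 'phd']
df[select_var].head(10)
```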

| | conml | roe | onet | naics | state | original_jobtitle | high_school | associate | bachelor | master | phd |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Del Taco Restaurants Inc | 0.055423 | 11-9051.00 | 722513 | CA | California | 1 | 0 | 0 | 0 | 0 |
| 1 | Medtronic PLC | 0.080089 | 11-3021.00 | 334510 | MN | I&O Business Development & Technology Sr IT Pr... | 0 | 0 | 0 | 0 | 0 |
| 2 | Macy's Inc | 0.143188 | 11-3121.00 | 452111 | AL | Macy's Brookwood Village, Birmingham, AL: Huma... | 0 | 0 | 0 | 0 | 0 |
| 3 | TE Connectivity Ltd | 0.236771 | 11-9141.00 | 334417 | NJ | Property Manager | 0 | 0 | 1 | 0 | 0 |
| 4 | CA Inc | 0.136228 | 11-2021.00 | 511210 | CA | VP, Regional Field Marketing | 0 | 0 | 1 | 1 | 0 |
| 5 | Humana Inc. | 0.057464 | 11-9111.00 | 621491 | WI | Medical Director-HumanaOne and Small Busi | 0 | 0 | 0 | 0 | 0 |
| 6 | Darden Restaurants Inc. | 0.227958 | 11-9051.00 | 722511 | GA | Restaurant Manager | 0 | 0 | 0 | 0 | 0 |
| 7 | HSBC Holdings PLC | 0.015663 | 11-3031.02 | 522110 | NY | Senior Sales Manager Asset Based Lending | 0 | 0 | 0 | 0 | 0 |
| 8 | Colliers International Group Inc | 0.33036 | 11-9021.00 | 531390 | GA | Transaction/Construction Coordinator | 0 | 0 | 1 | 0 | 0 |
| 9 | DISH Network Corp | 0.312662 | 11-3121.00 | 515210 | CA | Human Resources Manager - Pacific Region | 0 | 0 | 1 | 0 | 0 |


## Variable description

* *"conml"* : The official company name as reported in its SEC EDGAR filings.
* *"roe"* : Return on equity (net income divided by shareholders' equity).
* *"onet"* : Occupation code according to the U.S. Department of Labor, see [here](https://www.onetonline.org/) for more information.
* *"naics"* : North American Industry Classification System code, see [here](https://www.census.gov/eos/www/naics/) for more information.
* *"state"* : State in which the company is located (50 states + DC).
* *"original_jobtitle"* : Original job title as it appeared on CareerBuilder.
* *"high_school"* : Whether a post specifically requires a high school diploma (=1 if yes).
* *"associate"* : Whether a post specifically requires an associate degree (=1 if yes).
* *"bachelor"* : Whether a post specifically requires a bachelor's degree (=1 if yes).
* *"master"* : Whether a post specifically requires a master's degree (=1 if yes).
* *"phd"* : Whether a post specifically requires a PhD (=1 if yes).
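
To show how the education indicators can be combined with `roe`, here is a minimal sketch on a hypothetical four-row frame that mimics the columns above. The rows are made up for illustration and the summary is not a result from the actual data. Because a posting can set more than one flag (for example both `bachelor` and `master`), each degree is summarized separately rather than as mutually exclusive groups.

```python
import pandas as pd

# Hypothetical rows that mimic the columns described above; not real data.
posts = pd.DataFrame({
    "conml": ["Firm A", "Firm A", "Firm B", "Firm C"],
    "roe": [0.10, 0.10, 0.20, 0.30],
    "high_school": [1, 0, 0, 0],
    "associate": [0, 0, 0, 0],
    "bachelor": [0, 1, 1, 0],
    "master": [0, 0, 1, 0],
    "phd": [0, 0, 0, 0],
})

education_cols = ["high_school", "associate", "bachelor", "master", "phd"]

# Share of postings that explicitly require each degree.
degree_share = posts[education_cols].mean()

# Average ROE among postings that require each degree (NaN if no posting does).
mean_roe_by_degree = pd.Series(
    {col: posts.loc[posts[col] == 1, "roe"].mean() for col in education_cols}
)

print(degree_share)
print(mean_roe_by_degree)
```

Note that `roe` is a company-level value repeated on every posting, so averages over postings weight companies by how many vacancies they post.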

## Data story

## Recommending Education Requirement when Posting Vacancies