# Final Project: Build Leadership Pipeline

Part 2: Project Design Writeup

## Problem Statement

Spot leaders early in their careers by predicting whether a people manager will be a good leader.


#### Hypothesis

There are factors that indicate whether or not a people manager will be successful in managing a team.

#### Specific Aim

To classify people managers into "good" and "bad" categories.

Intuition: Promo Frequency and the Talent Mapping category will have the most impact on predicting the leadership quality of people managers.

## Research Design Outline

#### Classification Problem with binary outcome

The research plan is to explore a couple of different models:

1. Logistic Regression
2. Random Forests

Logistic Regression is the preferred model because of its interpretability. However, random forests can help build a sustainable model, particularly when newer features need to be factored in.
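To make the interpretability argument concrete, here is a minimal sketch, assuming a hand-rolled logistic regression fitted by batch gradient descent; a real analysis would more likely rely on a library implementation. The two feature names and all rows are made up for illustration and are not taken from the HR warehouse. The point is only that each fitted coefficient converts to an odds ratio via `exp(coefficient)`.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    n = len(rows)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Hypothetical standardized features and made-up labels (1 = "good", 0 = "bad").
feature_names = ["upward_feedback_z", "time_since_promo_z"]
rows = [
    [1.2, -0.5], [0.8, -1.0], [0.3, 0.2], [-0.2, -0.3],
    [-0.9, 0.7], [-1.1, 1.1], [0.5, 0.9], [-0.4, -0.8],
]
labels = [1, 1, 1, 0, 0, 0, 0, 1]

weights, bias = fit_logistic(rows, labels)

# exp(coefficient) is the multiplicative change in the odds of a "good"
# rating for a one-unit (here, one standard deviation) increase in the feature.
odds_ratios = {name: math.exp(w) for name, w in zip(feature_names, weights)}
for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio per +1 SD = {ratio:.2f}")
```

An odds ratio above 1 means higher values of the feature raise the odds of a "good" rating, and a ratio below 1 means they lower it. Impurity-based feature importances from a random forest rank features but carry no such sign.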

#### Data Wrangling or Pre-processing:

1. Normalization: Variables need to be standardized to bring them to the same scale
2. Imputation: Missing values need to be filled in with the mean, median, etc., depending on the variable
3. Feature Engineering: New features (and dummies for categorical variables) need to be created wherever necessary
4. Regularization: L1 and L2 methods need to be explored to identify important features
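The first three steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's pipeline: the records are made up, the column names are hypothetical shorthand for data-dictionary variables, the median is chosen as the fill value purely for the example, and regularization is left out because it belongs to model fitting.

```python
from statistics import mean, median, pstdev

# Made-up records; keys loosely mirror data-dictionary variables.
records = [
    {"compa_ratio": 95.0, "direct_reports": 6, "region": "NA"},
    {"compa_ratio": None, "direct_reports": 12, "region": "EMEA"},
    {"compa_ratio": 110.0, "direct_reports": 3, "region": "APAC"},
    {"compa_ratio": 101.0, "direct_reports": None, "region": "NA"},
]

def impute_median(rows, column):
    """Replace missing values with the median of the observed ones."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    for r in rows:
        if r[column] is None:
            r[column] = fill
    return fill

def standardize(rows, column):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    values = [r[column] for r in rows]
    mu, sigma = mean(values), pstdev(values)
    for r in rows:
        r[column] = (r[column] - mu) / sigma if sigma else 0.0

def add_dummies(rows, column, levels):
    """Add one 0/1 indicator per level except the first (the reference level)."""
    for r in rows:
        value = r.pop(column)
        for level in levels[1:]:
            r[f"{column}_{level}"] = 1 if value == level else 0

numeric_columns = ["compa_ratio", "direct_reports"]

# Imputation comes first so that standardization sees complete columns.
fills = {col: impute_median(records, col) for col in numeric_columns}
for col in numeric_columns:
    standardize(records, col)
add_dummies(records, "region", ["NA", "EMEA", "APAC"])

print(fills)
print(records[1])
```

Dropping one dummy level avoids perfectly collinear indicators in a logistic regression with an intercept.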

## Data

#### Source: HR Analytics Warehouse

1. Demographics (Age, Tenure, Pay Grade)
2. Diversity (Gender, Ethnicity)
3. Span of Control (# of direct reports, org size)
4. Promotions (Last promo date, # of promos and movements)
5. Performance (prior year ratings)
6. Compensation (salary and compa-ratios)
7. Upward Feedback scores
8. Talent Mapping / Succession Profiles
9. Outcome: Management & Leadership Rating

#### Data dictionary: 

Variable | Description | Type of Variable
---| ---| ---
Worker ID | integer values | identifier
Age Band | 20-30,30-40,..,60+ | categorical
Gender | 0=Female, 1=Male | categorical
Ethnicity | Asian, White, etc | categorical
Management Level | Manager, Director, VP, etc | categorical
Tenure Band | 0-3,3-5,5-9,9-15,15+ | categorical
Total Promotions | (0,10) | continuous
Promo Frequency | (0,30) | continuous
Time since last promo | (0,30) | continuous
Pay Sector | A-,A,B,C,C+ | categorical
Compa-Ratio to Market | (0,200) | continuous
Direct Reports | integers (0,20) | continuous
Org Size | integers (0,10000) | continuous
Function | G&A,R&D,S&M | categorical
Region | NA,EMEA,APAC | categorical
Mid-Year Status | 0=off-track, 1=on-track | categorical
Prior Year Perf | 1=Exceptional,...,5=Below Expectations | categorical
Succession Profiles | (0,100) | continuous
Readiness Flag | 0=No, 1=Yes | categorical
Talent Mapping | 9-box ratings A1-C3 | categorical
Upward Feedback Score | (0,5) | continuous
Calendar Year | 2015 to 2017 | continuous
M&L Rating | 0=Bad, 1=Good | categorical
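A data dictionary like this can double as a set of validation rules when the extract is loaded. The sketch below covers only a handful of the variables; the `SCHEMA` layout and the `validate` helper are hypothetical, and treating the listed ranges as inclusive bounds is an assumption.

```python
# Subset of the data dictionary expressed as rules (layout is hypothetical).
SCHEMA = {
    "Gender": {"kind": "categorical", "levels": {0, 1}},
    "Region": {"kind": "categorical", "levels": {"NA", "EMEA", "APAC"}},
    "Total Promotions": {"kind": "continuous", "low": 0, "high": 10},
    "Upward Feedback Score": {"kind": "continuous", "low": 0, "high": 5},
    "M&L Rating": {"kind": "categorical", "levels": {0, 1}},
}

def validate(record):
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []
    for name, rule in SCHEMA.items():
        value = record.get(name)
        if value is None:
            problems.append(f"{name}: missing")
        elif rule["kind"] == "categorical":
            if value not in rule["levels"]:
                problems.append(f"{name}: unexpected level {value!r}")
        elif not rule["low"] <= value <= rule["high"]:
            problems.append(f"{name}: {value} outside [{rule['low']}, {rule['high']}]")
    return problems

# Made-up records for illustration.
good = {"Gender": 0, "Region": "EMEA", "Total Promotions": 2,
        "Upward Feedback Score": 4.1, "M&L Rating": 1}
bad = {"Gender": 1, "Region": "LATAM", "Total Promotions": 12,
       "Upward Feedback Score": 4.2}

print(validate(good))
print(validate(bad))
```

Missing values are reported rather than rejected outright, since they are expected to be handled by imputation.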

## Risks & Assumptions

Risks:
1. New people managers will have no history in the company, so the model will be unable to classify a sizeable share of them
2. Ratings are awarded annually, so the factors driving the outcome could vary by year, reducing predictive power

Assumptions:
1. Since M&L Ratings are awarded through a calibration process, the view of top-level management is factored into the final decision
2. For tenured managers, it would be interesting to look for other signals, e.g. how their M&L ratings change over time as factors fluctuate, but to keep the model simple, the time dimension will not be explored in this project

## Domain Knowledge

As an immediate step, it would be interesting to test the model against the upcoming 2017 M&L ratings.
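One way to set up such a test is a holdout by Calendar Year: fit on the earlier years and score only on 2017. The sketch below assumes hypothetical field names (`worker_id`, `calendar_year`, `ml_rating`) and made-up rows.

```python
def split_by_year(rows, holdout_year):
    """Train on every year before the holdout year, test on the holdout year only."""
    train = [r for r in rows if r["calendar_year"] < holdout_year]
    test = [r for r in rows if r["calendar_year"] == holdout_year]
    return train, test

# Made-up rows: one record per manager per year.
rows = [
    {"worker_id": 101, "calendar_year": 2015, "ml_rating": 1},
    {"worker_id": 101, "calendar_year": 2016, "ml_rating": 1},
    {"worker_id": 101, "calendar_year": 2017, "ml_rating": 0},
    {"worker_id": 102, "calendar_year": 2016, "ml_rating": 0},
    {"worker_id": 102, "calendar_year": 2017, "ml_rating": 1},
    {"worker_id": 103, "calendar_year": 2017, "ml_rating": 1},
]

train, test = split_by_year(rows, holdout_year=2017)

# Managers in the holdout year with no earlier record illustrate the
# "no history" risk: the model has never seen them during training.
seen = {r["worker_id"] for r in train}
new_managers = [r["worker_id"] for r in test if r["worker_id"] not in seen]

print(len(train), len(test), new_managers)
```

Splitting by year rather than at random keeps the evaluation aligned with how the model would be used, which is to predict ratings that have not yet been awarded.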

If the model accuracy is high enough, the next step would be to create a similar model for individual contributors, who could be given opportunities to manage teams based on merit and on how closely their career trajectory resembles that of "good" leaders, thereby building the leadership pipeline.

## Success Criteria

The primary intent of this project is to identify and observe "good" leadership skills so it is important to predict "good" managers with high precision. Thus, minimizing false positives becomes critical in this model.
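Precision is the share of predicted "good" managers who really are "good": TP / (TP + FP). The sketch below, using made-up scores and labels, shows how raising the decision threshold trades recall for precision; the helper name is hypothetical.

```python
def precision_recall_at(threshold, scores, labels):
    """Precision and recall when every score >= threshold is predicted "good"."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else None  # undefined with no positives predicted
    recall = tp / (tp + fn) if (tp + fn) else None
    return precision, recall

# Made-up model scores and true M&L ratings (1 = "good", 0 = "bad").
scores = [0.92, 0.85, 0.70, 0.64, 0.55, 0.40, 0.31, 0.15]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

for threshold in (0.3, 0.5, 0.8):
    precision, recall = precision_recall_at(threshold, scores, labels)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

On this toy data a threshold of 0.8 removes every false positive but finds only half of the "good" managers, which is the trade-off to weigh when precision is the priority.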

Even though scoring on precision more directly minimizes false positives, this classifier will first be built to maximize AUC, and then alternate scoring methods will be explored.
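For reference, AUC can be read as the probability that a randomly chosen "good" manager receives a higher score than a randomly chosen "bad" one, with ties counted as half. A minimal sketch of that pairwise definition, on made-up scores and labels:

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up model scores and true M&L ratings (1 = "good", 0 = "bad").
scores = [0.92, 0.85, 0.70, 0.64, 0.55, 0.40, 0.31, 0.15]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

print(auc(scores, labels))
```

Because AUC depends only on how the scores rank the managers, it does not involve a decision threshold. Precision does, which is one reason to look at both.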