# Challenge Description
We got employee data from a few companies. We have data about all employees who joined
from 2011/01/24 to 2015/12/13. For each employee, we also know whether they were still at the
company as of 2015/12/13 or had quit. Besides that, we have general information about each
employee, such as average salary during her tenure, department, and years of experience.

The goal here is to predict employee retention and understand its main drivers.<br>
Specifically, you should:
* Assume, for each company, that the headcount starts from zero on 2011/01/23. Estimate
employee headcount, for each company, on each day, from 2011/01/24 to 2015/12/13.
That is, if 2,000 people have joined company 1 by 2012/03/02 and 1,000 of them have
already quit, then the headcount of company 1 on 2012/03/02 is 1,000.
**You should create a table with 3 columns: day, employee_headcount, company_id.**
* What are the main factors that drive employee churn? Do they make sense? Explain your
findings.
* If you could add to this data set just one variable that could help explain employee churn,
what would that be?
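The headcount definition in the first task reduces to a cumulative count: for each company, the number of people who have joined on or before a given day minus the number who have quit on or before it. Below is a minimal sketch, assuming `join_date` and `quit_date` have already been parsed as datetimes. The function name `daily_headcount` and the four-row toy frame are hypothetical and are not part of the challenge data.

```python
import pandas as pd

# Hypothetical toy data standing in for problem1_data.csv (not real records).
employees = pd.DataFrame({
    "company_id": [1, 1, 1, 2],
    "join_date": pd.to_datetime(["2011-01-24", "2011-01-25", "2011-01-26", "2011-01-25"]),
    "quit_date": pd.to_datetime(["2011-01-26", None, None, None]),
})


def daily_headcount(df, start="2011-01-24", end="2015-12-13"):
    """Return a table with columns day, employee_headcount, company_id."""
    days = pd.date_range(start, end, freq="D")
    frames = []
    for company_id, grp in df.groupby("company_id"):
        # Number of joins and quits on each calendar day (0 on days with none).
        joins = grp["join_date"].value_counts().reindex(days, fill_value=0)
        quits = grp["quit_date"].dropna().value_counts().reindex(days, fill_value=0)
        # Headcount on a day = joined so far minus quit so far (both inclusive of that day).
        headcount = joins.cumsum() - quits.cumsum()
        frames.append(pd.DataFrame({
            "day": days,
            "employee_headcount": headcount.to_numpy(),
            "company_id": company_id,
        }))
    return pd.concat(frames, ignore_index=True)
```

With the default range, `daily_headcount(employees)` yields one row per company per day from 2011/01/24 to 2015/12/13. Reindexing on the full `date_range` is what guarantees that days with no joins or quits still appear in the table.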

# Data

There is one data file you will need to load (located in the data folder):

### problem1_data.csv

**Columns:**
* **employee_id** : id of the employee; unique per employee within a company
* **company_id** : company id.
* **dept** : employee department
* **seniority** : number of years of work experience when hired
* **salary** : average yearly salary of the employee during her tenure at the company
* **join_date** : when the employee joined the company; it can only be between 2011/01/24
and 2015/12/13
* **quit_date**: when the employee left her job (if she is still employed as of 2015/12/13, this
field is NA)
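Because an NA `quit_date` means the employee was still employed on 2015/12/13, tenure for current employees is right-censored at that date rather than unknown. Below is a minimal sketch of deriving a quit flag and observed tenure, assuming the column names listed above. `add_churn_features`, `CUTOFF`, `quit` and `tenure_days` are hypothetical names introduced for illustration. The two sample rows mirror rows 0 and 2 of the `df.head()` output below.

```python
import pandas as pd

# Last day covered by the data set, per the challenge description.
CUTOFF = pd.Timestamp("2015-12-13")


def add_churn_features(df):
    """Add a binary quit flag and tenure in days (censored at CUTOFF)."""
    out = df.copy()
    out["join_date"] = pd.to_datetime(out["join_date"])
    out["quit_date"] = pd.to_datetime(out["quit_date"])
    # An NA quit_date means the employee was still employed on CUTOFF.
    out["quit"] = out["quit_date"].notna().astype(int)
    # Tenure runs to the quit date, or to CUTOFF for current employees.
    end = out["quit_date"].fillna(CUTOFF)
    out["tenure_days"] = (end - out["join_date"]).dt.days
    return out


# Illustrative rows copied from the df.head() output (one leaver, one current employee).
sample = pd.DataFrame({
    "join_date": ["2014-03-24", "2014-10-13"],
    "quit_date": ["2015-10-30", None],
})
features = add_churn_features(sample)
```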


In [1]:
import pandas as pd

In [3]:
df = pd.read_csv('data/problem1_data.csv')

In [4]:
df.head()

| Unnamed: 0 | employee_id | company_id | dept | seniority | salary | join_date | quit_date |
|---|---|---|---|---|---|---|---|
| 0 | 13021.0 | 7 | customer_service | 28 | 89000.0 | 2014-03-24 | 2015-10-30 |
| 1 | 825355.0 | 7 | marketing | 20 | 183000.0 | 2013-04-29 | 2014-04-04 |
| 2 | 927315.0 | 4 | marketing | 14 | 101000.0 | 2014-10-13 | |
| 3 | 662910.0 | 7 | customer_service | 20 | 115000.0 | 2012-05-14 | 2013-06-07 |
| 4 | 256971.0 | 2 | data_science | 23 | 276000.0 | 2011-10-17 | 2014-08-22 |
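By default `read_csv` leaves `join_date` and `quit_date` as strings and turns the file's leading unnamed column into `Unnamed: 0`. Below is a sketch of loading with both fixed. It assumes that the leading column of `problem1_data.csv` is just a saved row index, as the `Unnamed: 0` header suggests. The inline CSV copies the first three rows of the output above so the example runs without the file, and `sample` is a hypothetical name.

```python
import io

import pandas as pd

# First three rows of the df.head() output, inlined (assumed empty first header).
csv_text = """,employee_id,company_id,dept,seniority,salary,join_date,quit_date
0,13021.0,7,customer_service,28,89000.0,2014-03-24,2015-10-30
1,825355.0,7,marketing,20,183000.0,2013-04-29,2014-04-04
2,927315.0,4,marketing,14,101000.0,2014-10-13,
"""

# index_col=0 uses the unnamed leading column as the index instead of creating
# an "Unnamed: 0" column; parse_dates converts both date columns to datetime64,
# and the empty quit_date becomes NaT.
sample = pd.read_csv(io.StringIO(csv_text), index_col=0,
                     parse_dates=["join_date", "quit_date"])
```

Against the real file, the equivalent call would be `pd.read_csv('data/problem1_data.csv', index_col=0, parse_dates=['join_date', 'quit_date'])`.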


In [8]:
df.seniority.unique()

array([28, 20, 14, 23, 21,  4,  7, 13, 17,  1, 10,  6, 19, 15, 26, 27,  5,
       18, 16, 25,  9,  2, 29,  3,  8, 22, 24, 12, 11, 98, 99])

In [12]:
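# Seniority is years of experience at hire; values of 98 and 99 are implausible
# (likely placeholder or data-entry errors), so keep only rows below 90.
df1 = df[df.seniority < 90]
df1.head()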

| Unnamed: 0 | employee_id | company_id | dept | seniority | salary | join_date | quit_date |
|---|---|---|---|---|---|---|---|
| 0 | 13021.0 | 7 | customer_service | 28 | 89000.0 | 2014-03-24 | 2015-10-30 |
| 1 | 825355.0 | 7 | marketing | 20 | 183000.0 | 2013-04-29 | 2014-04-04 |
| 2 | 927315.0 | 4 | marketing | 14 | 101000.0 | 2014-10-13 | |
| 3 | 662910.0 | 7 | customer_service | 20 | 115000.0 | 2012-05-14 | 2013-06-07 |
| 4 | 256971.0 | 2 | data_science | 23 | 276000.0 | 2011-10-17 | 2014-08-22 |
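With the implausible seniority rows dropped, a crude first look at churn drivers is the share of employees who quit within each group. Below is a minimal sketch with a hypothetical helper `quit_rate_by`, shown on the five rows printed above. Raw quit rates are biased toward earlier hires, who have had more time to quit, so a fuller analysis would compare employees with similar join dates or model tenure directly.

```python
import pandas as pd


def quit_rate_by(df, column):
    """Share of employees with a non-missing quit_date, per value of `column`."""
    quit = df["quit_date"].notna()
    # Mean of a boolean series = fraction of True values in each group.
    return quit.groupby(df[column]).mean().sort_values(ascending=False)


# The five rows shown in the df1.head() output (illustrative only; far too few to draw conclusions).
rows = pd.DataFrame({
    "dept": ["customer_service", "marketing", "marketing", "customer_service", "data_science"],
    "quit_date": ["2015-10-30", "2014-04-04", None, "2013-06-07", "2014-08-22"],
})
rates = quit_rate_by(rows, "dept")
```

On the full filtered data this would be called as `quit_rate_by(df1, "dept")` or `quit_rate_by(df1, "company_id")`.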


In [None]:
df2 = df[]