# Introduction 

This notebook generates a synthetic employee data set with the following columns:
1. ID: Employee ID
2. DOB: Date of Birth
3. DOH: Date of Hire
4. Salary

The aim of this notebook is to give actuarial consulting firms, or any other company that works with employee data, an open employee data set that they can generate on their own for training or education purposes.

# Data Generator

To avoid generating invalid dates (such as 30 February), the day of the month is drawn from 1 to 28 instead of 1 to 31.

In [25]:
# Packages
import pandas as pd # DataFrame
import numpy as np # Vector or Matrices
import datetime # Date time

In [26]:
emp_counts = 500 # Number of employees (rows)

# Creating DataFrame
# np.random.randint excludes the upper bound: (1, 13) gives months 1-12, (1, 29) gives days 1-28.
# IDs are drawn with replacement, so duplicates are possible.
data = pd.DataFrame(data = {"id" : np.random.randint(0, emp_counts*10, emp_counts), 
                            'dob' : [str(np.random.randint(1970, 2001)) + '-' + str(np.random.randint(1,13)) + '-' + str(np.random.randint(1,29)) for i in range (emp_counts)],
                            'doh' : [str(np.random.randint(2008, 2022)) + '-' + str(np.random.randint(1,13)) + '-' + str(np.random.randint(1,29)) for i in range (emp_counts)],
                            'salary' : np.round(np.random.randn(emp_counts)**2*1e4,2)})
data['dob'] = pd.to_datetime(data['dob'])
data['doh'] = pd.to_datetime(data['doh'])
data

Unnamed: 0,id,dob,doh,salary
0,2840,1976-04-28,2012-02-07,2418.24
1,771,1988-10-03,2011-10-14,1003.54
2,827,1997-06-13,2014-07-24,914.43
3,4655,1978-05-23,2014-08-23,7.50
4,3888,1972-04-05,2019-08-01,10586.79
...,...,...,...,...
495,1217,1988-07-17,2013-06-11,10879.69
496,2614,1996-09-09,2020-10-05,70.37
497,4077,1985-02-04,2013-09-03,18220.42
498,2967,1990-02-25,2017-05-27,1.28
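
The generator above limits the day of the month to 1 to 28 so that every year-month-day combination is a valid date. As an alternative, here is a minimal sketch that draws a random number of days from a start date, so all calendar days can occur. The helper name `random_dates` and the seeded generator are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def random_dates(start, end, size, rng):
    """Draw `size` dates uniformly between `start` and `end` (both inclusive)."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    n_days = (end - start).days
    # rng.integers excludes the upper bound, hence n_days + 1
    offsets = rng.integers(0, n_days + 1, size=size)
    return start + pd.to_timedelta(offsets, unit="D")

rng = np.random.default_rng(seed=0)  # seed chosen only for reproducibility
dob = random_dates("1970-01-01", "2000-12-31", 5, rng)
print(dob)
```

Because every value is an offset from a valid date, no invalid date can be produced and the 29th to 31st are no longer excluded.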


In [27]:
# To save in Excel Format
data.to_excel(f'data/employee-data-with-{emp_counts}-data.xlsx')

# Actuarial Assumption 
## Mortality and Morbidity Distribution

We build the mortality table from the Makeham distribution with parameters $A = 0.00022$, $B = 2.7 \times 10^{-6}$, $c = 1.124$. Under Makeham's law the force of mortality is $\mu_x = A + Bc^x$, so the probability that a life aged $x$ survives a further $t$ years is:
\begin{equation}
S_x(t) = \exp(-At)\exp\left(-\frac{B}{\ln{c}}c^x\left(c^t-1\right)\right)
\end{equation}

In [28]:
def p(t,x, A = 0.00022, B = 2.7e-06, c = 1.124) :
    # Probability that a life aged x survives t more years under Makeham's law
    return np.exp(-A*t)*np.exp((-B/np.log(c)*c**x*(c**t-1)))

We also assume that the morbidity rate is 1% of the mortality rate.
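
As a minimal sketch of how these two assumptions could be turned into one-year rates: the mortality rate at age x is one minus the one-year Makeham survival probability, and the morbidity rate is 1% of it. The function names `survival_prob`, `mortality_rate` and `morbidity_rate` are illustrative and not part of the original notebook.

```python
import numpy as np

def survival_prob(t, x, A=0.00022, B=2.7e-06, c=1.124):
    """Makeham probability that a life aged x survives t more years."""
    return np.exp(-A * t - B / np.log(c) * c**x * (c**t - 1))

def mortality_rate(x):
    """One-year probability of death at age x."""
    return 1.0 - survival_prob(1, x)

def morbidity_rate(x, ratio=0.01):
    """Morbidity assumed to be a fixed share (1%) of the mortality rate."""
    return ratio * mortality_rate(x)

for age in (30, 40, 50):
    print(age, mortality_rate(age), morbidity_rate(age))
```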

## Resignation Rate
Since resignation rates vary across companies, we use a simple assumption: the rate is 10% at age 22 and decreases linearly to 1% at one year before the retirement age.
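
One way this linear assumption could be written is sketched below. The retirement age of 55 is taken from the `nra` value used in the code cells later in this notebook. Holding the rate constant outside the range from 22 to `nra - 1` is an extra assumption, since it is simply what `np.interp` does by default. The function name `resignation_rate` is illustrative.

```python
import numpy as np

def resignation_rate(age, nra=55, start_age=22, start_rate=0.10, end_rate=0.01):
    """Linear interpolation from start_rate at start_age to end_rate at nra - 1.

    Ages outside [start_age, nra - 1] get the nearest endpoint rate (assumption).
    """
    return np.interp(age, [start_age, nra - 1], [start_rate, end_rate])

print(resignation_rate(np.array([22, 30, 38, 54])))
```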

# Economic Assumption 
Since the yield curve has not been modeled in this repository, we assume a single discount rate of 5% p.a. and a salary increase of 10% p.a.

We also assume that the severance, service and separation pay are as follows, where YoS is the years of service, Sev the severance factor and Svc the service factor:

| Yos | Sev | Svc |
|:---:|:---:|:---:|
|  0  |  1  |  0  |
|  1  |  2  |  0  |
|  2  |  3  |  0  |
|  3  |  4  |  2  |
|  4  |  5  |  2  |
|  5  |  6  |  2  |
|  6  |  7  |  3  |
|  7  |  8  |  3  |
|  8  |  9  |  3  |
|  9  |  9  |  4  |
|  10 |  9  |  4  |
|  11 |  9  |  4  |
|  12 |  9  |  5  |
|  13 |  9  |  5  |
|  14 |  9  |  5  |
|  15 |  9  |  6  |
|  16 |  9  |  6  |
|  17 |  9  |  6  |
|  18 |  9  |  7  |
|  19 |  9  |  7  |
|  20 |  9  |  7  |
|  21 |  9  |  8  |
|  22 |  9  |  8  |
|  23 |  9  |  8  |
|  24 |  9  |  10 |
|  25 |  9  |  10 |
|  26 |  9  |  10 |
|  27 |  9  |  10 |
|  28 |  9  |  10 |
|  29 |  9  |  10 |
|  30 |  9  |  10 |  

In [29]:
len([min(i+1,9) for i in range(31)])

31

In [30]:
sev_svc = pd.DataFrame({'severance': [min(i+1,9) for i in range(31)],
                        'service' : [0,0,0,2,2,2,3,3,3,
                                     4,4,4,5,5,5,6,6,6,
                                     7,7,7,8,8,8,10,10,10,10,10,10,10]})
sev_svc

Unnamed: 0,severance,service
0,1,0
1,2,0
2,3,0
3,4,2
4,5,2
5,6,2
6,7,3
7,8,3
8,9,3
9,9,4


The benefit factors by years of service are then as follows:

| YoS | Pension | Death | Disability | Resign |
|:---:|:-------:|:-----:|:----------:|:------:|
|  0  |   1.75  |   2   |      2     |   0.5  |
|  1  |   3.5   |   4   |      4     |    1   |
|  2  |   5.25  |   6   |      6     |   1.5  |
|  3  |    9    |   10  |     10     |    2   |
|  4  |  10.75  |   12  |     12     |   2.5  |
|  5  |   12.5  |   14  |     14     |    3   |
|  6  |  15.25  |   17  |     17     |   3.5  |
|  7  |    17   |   19  |     19     |    4   |
|  8  |  18.75  |   21  |     21     |   4.5  |
|  9  |  19.75  |   22  |     22     |   4.5  |
|  10 |  19.75  |   22  |     22     |   4.5  |
|  11 |  19.75  |   22  |     22     |   4.5  |
|  12 |  20.75  |   23  |     23     |   4.5  |
|  13 |  20.75  |   23  |     23     |   4.5  |
|  14 |  20.75  |   23  |     23     |   4.5  |
|  15 |  21.75  |   24  |     24     |   4.5  |
|  16 |  21.75  |   24  |     24     |   4.5  |
|  17 |  21.75  |   24  |     24     |   4.5  |
|  18 |  22.75  |   25  |     25     |   4.5  |
|  19 |  22.75  |   25  |     25     |   4.5  |
|  20 |  22.75  |   25  |     25     |   4.5  |
|  21 |  23.75  |   26  |     26     |   4.5  |
|  22 |  23.75  |   26  |     26     |   4.5  |
|  23 |  23.75  |   26  |     26     |   4.5  |
|  24 |  25.75  |   28  |     28     |   4.5  |
|  25 |  25.75  |   28  |     28     |   4.5  |
|  26 |  25.75  |   28  |     28     |   4.5  |
|  27 |  25.75  |   28  |     28     |   4.5  |
|  28 |  25.75  |   28  |     28     |   4.5  |
|  29 |  25.75  |   28  |     28     |   4.5  |
|  30 |  25.75  |   28  |     28     |   4.5  |

In [31]:
# Benefit factors built from the severance and service factors
ben_fac = pd.DataFrame({'retire': 1.75*sev_svc['severance']+sev_svc['service'],
                        'death': 2*sev_svc['severance']+sev_svc['service'],
                        'disable': 2*sev_svc['severance']+sev_svc['service'],
                        'resign': 0.5*sev_svc['severance']})
ben_fac

Unnamed: 0,retire,death,disable,resign
0,1.75,2,2,0.5
1,3.5,4,4,1.0
2,5.25,6,6,1.5
3,9.0,10,10,2.0
4,10.75,12,12,2.5
5,12.5,14,14,3.0
6,15.25,17,17,3.5
7,17.0,19,19,4.0
8,18.75,21,21,4.5
9,19.75,22,22,4.5
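
To illustrate how a factor table like this might be used, the sketch below looks up the death benefit for one employee as salary times the factor at the truncated years of service. Capping the years of service at the last table row is an assumption, and the function name `death_benefit` is illustrative. The block rebuilds the factors so that it runs on its own.

```python
import pandas as pd

# Severance and service factors for YoS 0..30, as in the tables above
severance = [min(k + 1, 9) for k in range(31)]
service = [0]*3 + [2]*3 + [3]*3 + [4]*3 + [5]*3 + [6]*3 + [7]*3 + [8]*3 + [10]*7

factors = pd.DataFrame({"severance": severance, "service": service})
factors["death"] = 2 * factors["severance"] + factors["service"]

def death_benefit(salary, yos, table=factors):
    """Salary times the death factor at the truncated years of service."""
    row = min(int(yos), len(table) - 1)  # assumption: cap YoS at the last row
    return salary * table["death"].iloc[row]

# Salary 2418.24 with 10.90 years of service uses the YoS = 10 factor of 22
benefit = death_benefit(salary=2418.24, yos=10.90)
print(round(benefit, 2))
```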


## Pre-Processing Data
We need the **Age** and **Years of Service (YoS)** of each employee, which in turn requires a **valuation date**. Let's assume the valuation date is 31 December 2022.

In [32]:
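val_date = pd.Timestamp('2022-12-31')
# Mean Gregorian year in days, the same length as np.timedelta64(1, 'Y'),
# which recent pandas versions may reject as a divisor.
days_per_year = 365.2425
data['age'] = np.round((val_date - data['dob']).dt.days/days_per_year, 2)
data['yos'] = np.round((val_date - data['doh']).dt.days/days_per_year, 2)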

In [33]:
data

Unnamed: 0,id,dob,doh,salary,age,yos
0,2840,1976-04-28,2012-02-07,2418.24,46.68,10.90
1,771,1988-10-03,2011-10-14,1003.54,34.24,11.21
2,827,1997-06-13,2014-07-24,914.43,25.55,8.44
3,4655,1978-05-23,2014-08-23,7.50,44.61,8.36
4,3888,1972-04-05,2019-08-01,10586.79,50.74,3.42
...,...,...,...,...,...,...
495,1217,1988-07-17,2013-06-11,10879.69,34.46,9.56
496,2614,1996-09-09,2020-10-05,70.37,26.31,2.24
497,4077,1985-02-04,2013-09-03,18220.42,37.90,9.33
498,2967,1990-02-25,2017-05-27,1.28,32.85,5.60


# Calculating the Post Employment Benefit

In [40]:
int(56 - data.age[0])

9

In [70]:
death_ = ben_fac['death'].iloc[int(1.1)]
death_

4

In [67]:
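sal_inc = 0.03  # salary increase rate
i = 0.05        # discount rate
nra = 55        # normal retirement age
df = data
for n in range(2):
    yrs = int(nra - df.age[n])  # whole years until retirement
    # Salary growth factors for each whole year until retirement
    inc_ = [(1+sal_inc)**k for k in range(yrs)]
    # Final year: apply one more increase only if the age fraction is at least 0.5
    inc_.append(inc_[-1] if (df.age[n] - int(df.age[n]))<0.5 else inc_[-1]*(1+sal_inc))
    inc_ = np.array(inc_)
    # Discount factors for each whole year until retirement
    pv = np.array([(1+i)**(-k) for k in range(yrs)])
    for x in ben_fac.columns[1:]:
        # Benefit factors from the current YoS up to the YoS at retirement
        yos_ = int(df.yos[n])
        ben_ = df.salary[n]*ben_fac[x].iloc[yos_:yos_+yrs+1]

    a = (df.salary[n]*inc_)#.T.dot().dot(pv)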

In [66]:
inc_

array([1.0, 1.03, 1.0609, 1.092727, 1.1255088100000001, 1.1592740743,
       1.1940522965290001, 1.2298738654248702, list([1.2667700813876164])],
      dtype=object)

In [58]:
df.salary[0]*inc_

[1.0,
 1.03,
 1.0609,
 1.092727,
 1.1255088100000001,
 1.1592740743,
 1.1940522965290001,
 1.2298738654248702]

In [None]:
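def present_value(df = data, sal_inc = 0.03,i = 0.05, nra = 55):
    total = pd.DataFrame(columns= ben_fac.columns,)
    for n in range(df.shape[0]):
        yrs = int(nra - df.age[n])  # whole years until retirement
        # Salary growth factors, including the final partial year
        inc_ = [(1+sal_inc)**k for k in range(yrs)]
        inc_.append(inc_[-1] if (df.age[n] - int(df.age[n]))<0.5 else inc_[-1]*(1+sal_inc))
        inc_ = np.array(inc_)
        # Discount factors, including the exact time to retirement
        pv = [(1+i)**(-k) for k in range(yrs)]
        pv.append((1+i)**(-(nra - df.age[n])))
        pv = np.array(pv)
        total_ = 0  # running total (renamed to avoid shadowing the built-in sum)
        for x in ben_fac.columns[1:]:
            # Benefit factors from the current YoS up to the YoS at retirement
            yos_ = int(df.yos[n])
            ben_ = df.salary[n]*ben_fac[x].iloc[yos_:yos_+yrs+1]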
            