# Introduction 

This notebook will generate a common employment data with following columns :
1. ID : Employe ID
2. DOB : Date of Birth
3. DOH : Date of Hire
4. Salary

The aim of this notebook is that Consulting Actuary Company or any other company that work with employee data can have an open employee data that they can generate on their own for training or education purposes.

# Data Generator

To minimize the error of data, the range of date of the data will be between 1 to 28 (instead of 1 to 31 like normal calendar). 

In [1]:
# Packages
import pandas as pd # DataFrame
import numpy as np # Vector or Matrices
import datetime # Date time

In [24]:
def generate_date(year_1 = 1980, year_2 = 2000):
    return datetime.datetime(year_1, 1, 1) + datetime.timedelta(days= np.random.randint((datetime.datetime(year_2, 12,31) - datetime.datetime(year_1, 1,1)).days))  


In [25]:
emp_counts = int(5e6) # Length of Data

# Creating DataFrame
data = pd.DataFrame(data = {"id" : np.random.randint(0, emp_counts*10, emp_counts), 
                            'dob' : [generate_date() for i in range(emp_counts)],
                            'doh' : [generate_date(year_1 = 2008, year_2 = 2023) for i in range(emp_counts)],
                            'salary' : np.round(np.random.randn(emp_counts)**2*1e4,2)})
data['dob'] = pd.to_datetime(data['dob'])
data['doh'] = pd.to_datetime(data['doh'])
data

Unnamed: 0,id,dob,doh,salary
0,10414596,1987-01-18,2017-09-07,5070.83
1,41845687,1990-09-20,2012-04-24,1370.64
2,31056189,1982-05-01,2011-05-31,17299.19
3,47028756,1984-08-15,2010-05-03,4648.21
4,45699489,1986-10-17,2011-07-24,13310.38
...,...,...,...,...
4999995,16693351,1993-11-14,2022-12-05,7928.28
4999996,46038758,1999-08-21,2017-12-20,651.16
4999997,38277386,1998-12-08,2009-08-04,478.15
4999998,16218445,1990-12-28,2010-08-29,2875.36


364

In [141]:
# To save in Excel Format
#data.to_excel(f'data/employee-data-with-{emp_counts}-data.xlsx')

# Actuarial Assumption 
## Mortality and Morbidity Distribution

We will make a mortality Table from Makeham Distribution with parameters as follow 
$A = 0.00022, B = 2.7 × 10^{−6}, c = 1.124$. In general, the Survival Function of Makeham Distribution is :
\begin{equation}
S_X(t) = \exp(-A)\exp\left(\left(-\frac{B}{\ln{c}}c^x\left(c^t-1\right)\right)\right)
\end{equation}

In [142]:
def p(t,x, A = 0.00022, B = 2.7e-06, c = 1.124) :
    return np.exp(-A)*np.exp((-B/np.log(c)*c**x*(c**t-1)))

We also assume that Morbidity is 1% of the Mortality Rate

## Resignation Rate
Since assumption on resignation rate can be vary across company, we will use a simple assumption that is the rate is 10% at the age of 22 and decrease linearly to 1% at the age of retirement - 1.

# Economic Assumption 
Since the yield curve data hasn't been modeled in this repository, we assume a single discount rate with 5% p.a. and salary increase 10% p.a.
We also assume that the severance, service and seperation pay as follow :
| Yos | Sev | Svc |
|:---:|:---:|:---:|
|  0  |  1  |  0  |
|  1  |  2  |  0  |
|  2  |  3  |  0  |
|  3  |  4  |  2  |
|  4  |  5  |  2  |
|  5  |  6  |  2  |
|  6  |  7  |  3  |
|  7  |  8  |  3  |
|  8  |  9  |  3  |
|  9  |  9  |  4  |
|  10 |  9  |  4  |
|  11 |  9  |  4  |
|  12 |  9  |  5  |
|  13 |  9  |  5  |
|  14 |  9  |  5  |
|  15 |  9  |  6  |
|  16 |  9  |  6  |
|  17 |  9  |  6  |
|  18 |  9  |  7  |
|  19 |  9  |  7  |
|  20 |  9  |  7  |
|  21 |  9  |  8  |
|  22 |  9  |  8  |
|  23 |  9  |  8  |
|  24 |  9  |  10 |
|  25 |  9  |  10 |
|  26 |  9  |  10 |
|  27 |  9  |  10 |
|  28 |  9  |  10 |
|  29 |  9  |  10 |
|  30 |  9  |  10 |  

In [144]:
sev_svc = pd.DataFrame({'severance': [min(i+1,9) for i in range(60)],
                        'service' : [0,0,0,2,2,2,3,3,3,
                                     4,4,4,5,5,5,6,6,6,
                                     7,7,7,8,8,8,10,10,10,10,10,10,10,
                                     10,10,10,10,10,10,10,10,10,10,10,10,10,10,
                                     10,10,10,10,10,10,10,10,10,10,10,10,10,10,10]})
sev_svc

Unnamed: 0,severance,service
0,1,0
1,2,0
2,3,0
3,4,2
4,5,2
5,6,2
6,7,3
7,8,3
8,9,3
9,9,4


And a benefit table as follow: 

| YoS | Pension | Death | Disability | Resign |
|:---:|:-------:|:-----:|:----------:|:------:|
|  0  |   1.75  |   2   |      2     |   0.5  |
|  1  |   3.5   |   4   |      4     |    1   |
|  2  |   5.25  |   6   |      6     |   1.5  |
|  3  |    9    |   10  |     10     |    2   |
|  4  |  10.75  |   12  |     12     |   2.5  |
|  5  |   12.5  |   14  |     14     |    3   |
|  6  |  15.25  |   17  |     17     |   3.5  |
|  7  |    17   |   19  |     19     |    4   |
|  8  |  18.75  |   21  |     21     |   4.5  |
|  9  |  19.75  |   22  |     22     |   4.5  |
|  10 |  19.75  |   22  |     22     |   4.5  |
|  11 |  19.75  |   22  |     22     |   4.5  |
|  12 |  20.75  |   23  |     23     |   4.5  |
|  13 |  20.75  |   23  |     23     |   4.5  |
|  14 |  20.75  |   23  |     23     |   4.5  |
|  15 |  21.75  |   24  |     24     |   4.5  |
|  16 |  21.75  |   24  |     24     |   4.5  |
|  17 |  21.75  |   24  |     24     |   4.5  |
|  18 |  22.75  |   25  |     25     |   4.5  |
|  19 |  22.75  |   25  |     25     |   4.5  |
|  20 |  22.75  |   25  |     25     |   4.5  |
|  21 |  23.75  |   26  |     26     |   4.5  |
|  22 |  23.75  |   26  |     26     |   4.5  |
|  23 |  23.75  |   26  |     26     |   4.5  |
|  24 |  25.75  |   28  |     28     |   4.5  |
|  25 |  25.75  |   28  |     28     |   4.5  |
|  26 |  25.75  |   28  |     28     |   4.5  |
|  27 |  25.75  |   28  |     28     |   4.5  |
|  28 |  25.75  |   28  |     28     |   4.5  |
|  29 |  25.75  |   28  |     28     |   4.5  |
|  30 |  25.75  |   28  |     28     |   4.5  |

In [145]:
ben_fac = pd.DataFrame({'retire': 1.75*sev_svc['severance']+sev_svc['service'],
                        'death': 2*sev_svc['severance']+sev_svc['service'],
                        'disable': 2*sev_svc['severance']+sev_svc['service'],
                        'resign': [1]*sev_svc.shape[0]})
ben_fac

Unnamed: 0,retire,death,disable,resign
0,1.75,2,2,1
1,3.5,4,4,1
2,5.25,6,6,1
3,9.0,10,10,1
4,10.75,12,12,1
5,12.5,14,14,1
6,15.25,17,17,1
7,17.0,19,19,1
8,18.75,21,21,1
9,19.75,22,22,1


## Pre-Processing Data
We need the information of **Age** and **Year of Service (YOS)** of each of the employees. For that we also need to define the **valuation date**. Let's assume the valuation date is 31 December 2022

In [146]:
val_date = pd.Timestamp('2022-12-31')
data['age'] = np.round((val_date- data.dob)/np.timedelta64(1, 'Y'),2)
data['yos'] = np.round((val_date- data.doh)/np.timedelta64(1, 'Y'),2)

In [147]:
data

Unnamed: 0,id,dob,doh,salary,age,yos
0,1,1986-01-04,2017-04-10,1079.32,36.99,5.72


# Calculating the Post Employment Benefit

In [148]:
int(56 - data.age[0])

19

In [149]:
death_ = ben_fac['death'].iloc[int(1.1):int()]
death_

Series([], Name: death, dtype: int64)

In [150]:
def present_value(df = data, sal_inc = 0.03,i = 0.05, nra = 55):
    sum = [0]*df.shape[0]
    for n in range(df.shape[0]):
        try :
            inc_ = [(1+sal_inc)**k for k in range(int(nra - df.age[n]))]
            pv = [(1+i)**(-k) for k in range(int(nra - df.age[n]))]
            inc_ = np.array(inc_)
            pv = np.array(pv)               
            ben_ = ben_fac.iloc[int(df.yos[n]):int(df.yos[n] + nra - df.age[n])+1].T.dot(np.multiply(df.salary[n]*inc_,pv))

        except:
        
            inc_ = [(1+sal_inc)**k for k in range(int(nra - df.age[n]))]
            pv = [(1+i)**(-k) for k in range(int(nra - df.age[n]))]
            inc_.append(inc_[-1] if (df.age[n] - int(df.age[n]))<0.5 else inc_[-1]*(1+sal_inc))
            inc_ = np.array(inc_)
            pv.append((1+i)**(-(nra - df.age[n])))
            pv = np.array(pv)   
            ben_ = ben_fac.iloc[int(df.yos[n]):int(df.yos[n] + nra - df.age[n])].T.dot(np.multiply(df.salary[n]*inc_,pv))
        
        sum[n] = ben_.sum()
    return sum

In [151]:
normal = np.sum(present_value())
add_interest = np.sum(present_value(i  = 0.06))
dec_interest = np.sum(present_value(i  = 0.04))

ValueError: Dot product shape mismatch, (4, 18) vs (19,)

In [None]:
normal

285257.2212300787

In [None]:
add_interest

278390.26497671736

In [None]:
dec_interest

292431.82332608924

In [None]:
ben_fac.iloc[int(df.yos[n]):int(df.yos[n] + nra - df.age[n])+1]

Unnamed: 0,retire,death,disable,resign
11,19.75,22,22,1
12,20.75,23,23,1
13,20.75,23,23,1
14,20.75,23,23,1
15,21.75,24,24,1
16,21.75,24,24,1
17,21.75,24,24,1
18,22.75,25,25,1
19,22.75,25,25,1
20,22.75,25,25,1


In [None]:
n = 0
sal_inc = 0.03
i = 0.05
df = data.copy()
nra = 55

inc_ = [(1+sal_inc)**k for k in range(int(nra - df.age[n]))]
inc_.append(inc_[-1] if (df.age[n] - int(df.age[n]))<0.5 else inc_[-1]*(1+sal_inc))
inc_ = np.array(inc_)
pv = [(1+i)**(-k) for k in range(int(nra - df.age[n]))]
pv.append((1+i)**(-(nra - df.age[n])))
pv = np.array(pv)   
ben_ = ben_fac.iloc[int(df.yos[n]):int(df.yos[n] + nra - df.age[n])+1].T.dot(np.multiply(df.salary[n]*inc_,pv))
sum[n] = ben_.sum()

ValueError: Dot product shape mismatch, (4, 6) vs (5,)

In [None]:
ben_.sum()

1431091.6382905873

In [None]:
ben_fac.iloc[int(df.yos[n]):int(nra - df.age[n])+1].shape

(11, 4)