# Flexibility in a Search Model 

## Model

- homogeneous workers who receive flow utility $u(\cdot)$ when employed and flow (dis)utility $b$ when unemployed
- heterogeneous firms, each endowed with a flexibility level $k \in \{0, 1, \ldots, K\}$ that costs $c(k)$ to provide; flow profit is linear, $y(x;k) - w(x;k) - c(k)$
- search parameters: discount rate $\rho$; unemployed workers meet firms at rate $\lambda$ (there is no on-the-job search); upon meeting, worker and firm draw a match-specific productivity $x \sim G(x)$; bargaining parameter $\alpha$; employed workers separate at rate $\eta$
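
The search environment above can be illustrated with a small simulation. This is a minimal sketch, not part of the estimation: the values of `lam`, `mu`, `sigma` and the acceptance threshold `x_star` are hypothetical, and the model's actual threshold depends on $k$, $c(k)$ and the bargaining outcome. It only shows that when meetings arrive at rate $\lambda$ and a draw is accepted with probability $1 - G(x^*)$, the mean unemployment spell is close to $1/(\lambda [1 - G(x^*)])$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical values, chosen only for illustration
lam = 0.5               # meeting rate λ
mu, sigma = 0.0, 1.0    # parameters of the log-normal G(x)
x_star = 1.5            # assumed acceptance threshold

# Probability that a productivity draw is acceptable: 1 - G(x_star)
p_accept = stats.lognorm.sf(x_star, s=sigma, scale=np.exp(mu))

def simulate_duration(rng):
    """Time until the first acceptable draw, with exponential waits between meetings."""
    t = 0.0
    while True:
        t += rng.exponential(1 / lam)      # wait for the next meeting
        if rng.lognormal(mu, sigma) >= x_star:
            return t

durations = np.array([simulate_duration(rng) for _ in range(20000)])
implied_hazard = lam * p_accept

print("simulated mean duration:", durations.mean())
print("1 / (λ · Pr[accept])   :", 1 / implied_hazard)
```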


## Necessary Packages

In [1]:
# General
import numpy as np
import pandas as pd 
import scipy.stats as stats

# Graphics
import matplotlib.pyplot as plt 
import seaborn as sns

# Estimation
from scipy.optimize import minimize


## Data 
- employed workers earn wage $w_i$ at firm with flexibility level $k$
- unemployed workers have unemployment durations of $t_i$
- flexibility level $k$ defined by 
    - 0: No flexibility in start and end times of work 
    - 1: Informal policy allowing flexibility in start and end times of work
    - 2: Formal policy allowing for flexibility in start and end times of work
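
The notebook also works with a binary indicator that groups levels 1 and 2 together. A minimal sketch of that recoding on made-up data follows; the toy frame and its values are assumptions, and only the column names mirror the notebook.

```python
import numpy as np
import pandas as pd

# Toy data; the scores are made up and NaN stands for a missing score
toy = pd.DataFrame({'flex_sched_score': [0, 1, 2, 1, np.nan, 0]})

# Map 0 -> 0 and {1, 2} -> 1; missing scores stay missing
toy['flex'] = toy['flex_sched_score'].map({0: 0.0, 1: 1.0, 2: 1.0})

print(toy['flex'].value_counts(dropna=False))
```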

In [2]:
df=pd.read_stata('workfile.dta', columns=['sex','employed', 'flex_sched_score', 'hrwage', 'dur'])

In [3]:
df['flex'] = np.nan
# .loc with a boolean mask assigns in place and avoids chained indexing
df.loc[df['flex_sched_score']==0, 'flex'] = 0
df.loc[(df['flex_sched_score']==1) | (df['flex_sched_score']==2), 'flex'] = 1


In [4]:
df['flex_sched_score'].value_counts()

1.0    1219
0.0     829
2.0     406
Name: flex_sched_score, dtype: int64

In [5]:
df['flex'].value_counts()

1.0    1625
0.0     829
Name: flex, dtype: int64

In [6]:
df['flex_sched_score'].groupby(df['sex']).value_counts(normalize=True)

sex     flex_sched_score
male    1.0                 0.547504
        0.0                 0.274557
        2.0                 0.177939
female  1.0                 0.444719
        0.0                 0.402640
        2.0                 0.152640
Name: flex_sched_score, dtype: float64

In [7]:
df['employed'].value_counts()

1.0    2454
0.0      54
Name: employed, dtype: int64

### Men

In [8]:
men = df[df['sex']=='male']
len(men)

1269

In [9]:
men['employed'].value_counts()

1.0    1242
0.0      27
Name: employed, dtype: int64

In [10]:
men['hrwage'].describe()

count    1242.000000
mean       42.517616
std        18.955248
min         0.008000
25%        26.923000
50%        39.423000
75%        58.173000
max        72.115250
Name: hrwage, dtype: float64

In [11]:
men['dur'].describe()

count     27.000000
mean      18.629629
std       23.348774
min        8.000000
25%       12.000000
50%       12.000000
75%       16.000000
max      131.000000
Name: dur, dtype: float64

In [12]:
fifth_pctl = []

for k in range(3):
    tmp = men[men['flex_sched_score']==k]
    fifth = np.percentile(tmp['hrwage'],5)
    fifth_pctl.append(fifth)
    print("5th percentile wage = " + str(fifth) + " for men with flex level " + str(k))

# assign returns a new DataFrame, so the .loc writes below do not act on a slice of df
men = men.assign(wage_trunc3=men['hrwage'])

for k in range(3):
    # raise wages below the level-k 5th percentile up to that percentile
    below = (men['hrwage']<fifth_pctl[k]) & (men['flex_sched_score']==k)
    men.loc[below, 'wage_trunc3'] = fifth_pctl[k]

men['wage_trunc3'].groupby([men['employed'],men['flex_sched_score']]).describe()

5th percentile wage = 13.0 for men with flex level 0
5th percentile wage = 17.49037504196167 for men with flex level 1
5th percentile wage = 18.0 for men with flex level 2



| employed | flex_sched_score | count | mean | std | min | 25% | 50% | 75% | max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1.0 | 0.0 | 341.0 | 35.949913 | 17.789873 | 13.0 | 22.5 | 31.25 | 47.11525 | 72.11525 |
| 1.0 | 1.0 | 680.0 | 45.039341 | 18.127398 | 17.490376 | 30.0 | 43.269001 | 61.53825 | 72.11525 |
| 1.0 | 2.0 | 221.0 | 46.191139 | 18.594522 | 18.0 | 30.048 | 43.748001 | 64.903748 | 72.11525 |
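
The same group-wise flooring at the 5th percentile can be written without an explicit loop. The sketch below uses made-up log-normal wages; the column names `k`, `wage` and `wage_trunc` are hypothetical. It relies on `groupby(...).transform` to align each group's 5th percentile to its rows and on `Series.clip` to raise wages below that floor.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Toy data: three flexibility levels with made-up log-normal wages
toy = pd.DataFrame({
    'k': np.repeat([0, 1, 2], 200),
    'wage': rng.lognormal(mean=3.5, sigma=0.6, size=600),
})

# 5th percentile of the wage within each flexibility level, aligned to the rows
floor = toy.groupby('k')['wage'].transform(lambda w: w.quantile(0.05))

# Raise wages below the group floor up to the floor; other wages are unchanged
toy['wage_trunc'] = toy['wage'].clip(lower=floor)

print(toy.groupby('k')[['wage', 'wage_trunc']].min())
```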


In [13]:
fifth_pctl = []

for k in range(2):
    tmp = men[men['flex']==k]
    fifth = np.percentile(tmp['hrwage'],5)
    fifth_pctl.append(fifth)
    print("5th percentile wage = " + str(fifth) + " for men with flex level " + str(k))

# assign returns a new DataFrame, so the .loc writes below do not act on a slice of df
men = men.assign(wage_trunc2=men['hrwage'])

for k in range(2):
    # raise wages below the level-k 5th percentile up to that percentile
    below = (men['hrwage']<fifth_pctl[k]) & (men['flex']==k)
    men.loc[below, 'wage_trunc2'] = fifth_pctl[k]

# NOTE: this summarizes wage_trunc3 (not the wage_trunc2 column built in this cell), grouped by the binary flex indicator
men['wage_trunc3'].groupby([men['employed'],men['flex']]).describe()

5th percentile wage = 13.0 for men with flex level 0
5th percentile wage = 17.5 for men with flex level 1



| employed | flex | count | mean | std | min | 25% | 50% | 75% | max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1.0 | 0.0 | 341.0 | 35.949913 | 17.789873 | 13.0 | 22.5 | 31.25 | 47.11525 | 72.11525 |
| 1.0 | 1.0 | 901.0 | 45.321857 | 18.239418 | 17.490376 | 30.0 | 43.269001 | 62.5 | 72.11525 |


### Women

In [14]:
women = df[df['sex']=='female']
len(women)

1239

In [15]:
fifth_pctl = []

for k in range(3):
    tmp = women[women['flex_sched_score']==k]
    fifth = np.percentile(tmp['hrwage'],5)
    fifth_pctl.append(fifth)
    print("5th percentile wage = " + str(fifth) + " for women with flex level " + str(k))

# assign returns a new DataFrame, so the .loc writes below do not act on a slice of df
women = women.assign(wage_trunc3=women['hrwage'])

for k in range(3):
    # raise wages below the level-k 5th percentile up to that percentile
    below = (women['hrwage']<fifth_pctl[k]) & (women['flex_sched_score']==k)
    women.loc[below, 'wage_trunc3'] = fifth_pctl[k]

women[['wage_trunc3','hrwage']].groupby([women['employed'],women['flex_sched_score']]).describe()

5th percentile wage = 10.434999752044678 for women with flex level 0
5th percentile wage = 12.982499980926514 for women with flex level 1
5th percentile wage = 12.06520004272461 for women with flex level 2



wage_trunc3

| employed | flex_sched_score | count | mean | std | min | 25% | 50% | 75% | max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1.0 | 0.0 | 488.0 | 29.081598 | 14.835706 | 10.434999 | 18.75 | 25.79325 | 35.607564 | 72.11525 |
| 1.0 | 1.0 | 539.0 | 37.236629 | 17.382296 | 12.9825 | 23.77875 | 33.75 | 48.076752 | 72.11525 |
| 1.0 | 2.0 | 185.0 | 34.797462 | 16.732208 | 12.0652 | 21.9 | 31.730749 | 44.711498 | 72.11525 |

hrwage

| employed | flex_sched_score | count | mean | std | min | 25% | 50% | 75% | max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1.0 | 0.0 | 488.0 | 28.904427 | 15.090322 | 0.019 | 18.75 | 25.79325 | 35.607564 | 72.11525 |
| 1.0 | 1.0 | 539.0 | 37.081024 | 17.621569 | 0.05275 | 23.77875 | 33.75 | 48.076752 | 72.11525 |
| 1.0 | 2.0 | 185.0 | 34.618717 | 17.004156 | 3.0 | 21.9 | 31.730749 | 44.711498 | 72.11525 |


In [16]:
fifth_pctl = []

for k in range(2):
    tmp = women[women['flex']==k]
    fifth = np.percentile(tmp['hrwage'],5)
    fifth_pctl.append(fifth)
    print("5th percentile wage = " + str(fifth) + " for women with flex level " + str(k))

# assign returns a new DataFrame, so the .loc writes below do not act on a slice of df
women = women.assign(wage_trunc2=women['hrwage'])

for k in range(2):
    # raise wages below the level-k 5th percentile up to that percentile
    below = (women['hrwage']<fifth_pctl[k]) & (women['flex']==k)
    women.loc[below, 'wage_trunc2'] = fifth_pctl[k]

women[['wage_trunc2','hrwage']].groupby([women['employed'],women['flex']]).describe()

5th percentile wage = 10.434999752044678 for women with flex level 0
5th percentile wage = 12.5 for women with flex level 1



wage_trunc2

| employed | flex | count | mean | std | min | 25% | 50% | 75% | max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1.0 | 0.0 | 488.0 | 29.081598 | 14.835706 | 10.434999 | 18.75 | 25.79325 | 35.607564 | 72.11525 |
| 1.0 | 1.0 | 724.0 | 36.602165 | 17.255283 | 12.5 | 23.0 | 33.653751 | 48.076752 | 72.11525 |

hrwage

| employed | flex | count | mean | std | min | 25% | 50% | 75% | max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1.0 | 0.0 | 488.0 | 28.904427 | 15.090322 | 0.019 | 18.75 | 25.79325 | 35.607564 | 72.11525 |
| 1.0 | 1.0 | 724.0 | 36.451839 | 17.487267 | 0.05275 | 23.0 | 33.653751 | 48.076752 | 72.11525 |


## Model Independent Functions

In [17]:
def lognormpdf(x: np.array, μ: float, σ: float):
    """
    Calculates lognormal pdf without stats packages
    """
    
    denom = x * σ * np.sqrt(2*np.pi)
    exp_num = -(np.log(x)-μ)**2
    exp_denom = 2 * σ * σ
    num = np.exp(exp_num/exp_denom)
    
    return num/denom

In [18]:
def lognormsf(x: np.array, μ: float, σ: float):
    """
    Calculates the lognormal survival function (1 - cdf) with the scipy.stats normal cdf
    """
    
    lnx = np.log(x)
    num = lnx - μ
    denom = σ
    
    return 1-stats.norm.cdf(num/denom)
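
A quick sanity check of the two hand-written formulas against `scipy.stats.lognorm` can be sketched as follows. The grid and the parameter values are arbitrary choices for illustration; the point is only that in SciPy's parametrization the log-normal with location $\mu$ and scale $\sigma$ of the underlying normal corresponds to `s=σ, scale=exp(μ)`.

```python
import numpy as np
from scipy import stats

# Arbitrary evaluation grid and parameters, for illustration only
x = np.linspace(0.1, 50, 200)
μ, σ = 3.0, 0.6

# Same formulas as lognormpdf and lognormsf above, written inline
pdf_manual = np.exp(-(np.log(x) - μ)**2 / (2 * σ * σ)) / (x * σ * np.sqrt(2 * np.pi))
sf_manual = 1 - stats.norm.cdf((np.log(x) - μ) / σ)

# SciPy reference: shape s = σ, scale = exp(μ)
ref = stats.lognorm(s=σ, scale=np.exp(μ))

print(np.allclose(pdf_manual, ref.pdf(x)), np.allclose(sf_manual, ref.sf(x)))
```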

In [112]:
def bootstrap(data: pd.DataFrame, n_samples:int):
    """
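    Draws n_samples bootstrap samples from data, each the same size as data
    and sampled with replacement. (Thanks, Caleb.)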
    """
    bootstrapped_sample_list = []
    
    for n in range(n_samples):
        nth_sample = data.sample(frac=1, replace=True)
        bootstrapped_sample_list.append(nth_sample)
    
    return bootstrapped_sample_list
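
As a usage illustration, the sketch below bootstraps a percentile interval for a mean wage. It is a hypothetical variant, not the function above: `bootstrap_seeded` draws row positions from a seeded NumPy generator so that reruns give the same samples, and the toy wage data are made up.

```python
import numpy as np
import pandas as pd

def bootstrap_seeded(data: pd.DataFrame, n_samples: int, seed: int = 0):
    """Hypothetical seeded variant: resample rows with replacement, reproducibly."""
    rng = np.random.default_rng(seed)
    n = len(data)
    return [data.iloc[rng.integers(0, n, size=n)] for _ in range(n_samples)]

# Toy wage data, made up for illustration
toy = pd.DataFrame({'wage': np.random.default_rng(2).lognormal(3.5, 0.6, size=500)})

samples = bootstrap_seeded(toy, n_samples=200)
boot_means = np.array([s['wage'].mean() for s in samples])

# Percentile bootstrap interval for the mean wage
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(ci_low, toy['wage'].mean(), ci_high)
```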

## Utility $u(w,k; \gamma) = w(x,k) + \gamma k$ and Productivity assumption $y(x,k; \zeta) = \zeta kx$

### Functions

In [19]:
def Pr_wage_given_match(data: pd.DataFrame, flex: str, wage: str, res_wage: float, c_k: np.array, ζ: float, γ: float, α: float, μ: float, σ: float):
    """
    Calculates probability of a wage draw conditional on a match being formed 
    
    Inputs
    - data: DataFrame
    - flex: string for name of flexibility column
    - wage: string for name of wage column
    - res_wage: float of observed minimum wage
    - c_k: Kx1 array of cost of providing flexibility
    - ζ: productivity weight of flexibility k
    - γ: utility weight of flexibility k
    - α: bargaining parameter
    - μ: location parameter of the log-normal wage distribution
    - σ: scale parameter of the log-normal wage distribution
    
    Functions
    - lognormpdf(x: np.array, μ: float, σ: float)
    - lognormsf(x: np.array, μ: float, σ: float)
    """
    employed_indiv = []

    for k in range(len(c_k)):
        tmp = data[data[flex]==k]
        level = k + 1 #flexibility level is k+1 because Python indexes from 0
        # productivity draw implied by the observed wage (inverts the wage equation)
        x_implied = ( tmp[wage] - (1-α)*( res_wage - (γ*level) ) + α*c_k[k] )/( α*ζ*level )
        g = ( 1/( α*ζ*level ) ) * lognormpdf(x_implied, μ, σ)
        # probability that a draw is acceptable at this flexibility level
        G_tilde = lognormsf( ( res_wage + c_k[k] - (γ*level) )/(ζ*level), μ, σ )
        employed_indiv.append( (g/G_tilde).to_numpy() )

    return np.concatenate(employed_indiv)
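
Reading the function above, the quantity it evaluates can be written out as follows. This is a transcription of the code rather than an independent derivation: $\tilde{k} = k + 1$ is the shifted flexibility index used in the code, $w^*$ is `res_wage`, and $g$ and $\tilde{G}$ are the log-normal density and survival function with parameters $\mu$ and $\sigma$.

$$
x(w;k) = \frac{w - (1-\alpha)\,(w^* - \gamma \tilde{k}) + \alpha\, c(k)}{\alpha\, \zeta\, \tilde{k}},
\qquad
\Pr(w \mid k, \text{match}) = \frac{\dfrac{1}{\alpha\, \zeta\, \tilde{k}}\; g\big(x(w;k)\big)}{\tilde{G}\!\left(\dfrac{w^* + c(k) - \gamma \tilde{k}}{\zeta\, \tilde{k}}\right)}
$$

Inverting the first expression gives the wage equation the code implies, $w = \alpha\,(\zeta \tilde{k} x - c(k)) + (1-\alpha)\,(w^* - \gamma \tilde{k})$. Setting $w + \gamma \tilde{k} = w^*$ in that equation and solving for $x$ recovers the acceptance threshold inside $\tilde{G}$.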


In [20]:
def hazard(res_wage: float, c_k: np.array, p_k: np.array, λ: float, ζ: float, γ: float, μ: float, σ: float):
    """
    Calculates the hazard rate out of unemployment (arrival rate times the probability that an offer is acceptable)
    
    Inputs
    - res_wage: float of observed minimum wage
    - c_k: Kx1 array of cost of providing flexibility
    - p_k: Kx1 array of probability of each level of flexibility
    - λ: arrival rate of offer
    - ζ: productivity weight of flexibility k
    - γ: utility weight of flexibility k    
    - μ: location parameter of the log-normal wage distribution
    - σ: scale parameter of the log-normal wage distribution
    
    Functions
    - lognormsf(x: np.array, μ: float, σ: float)
    """
    
    if len(p_k)!=len(c_k):
        raise ValueError("Length of p_k and c_k do not match.")

    prob_sum = 0

    for k in range(len(c_k)):
        prob_sum += p_k[k] * lognormsf( ( res_wage + c_k[k] - (γ*(k+1)) )/(ζ*(k+1)), μ, σ ) #flexibility level is k+1 because Python indexes from 0

    return λ*prob_sum
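
The sum over flexibility levels can also be computed with array operations. The following is a minimal sketch: `hazard_vec` is a hypothetical name, the values of λ, μ and σ are made up, and `p_k` and `c_k` are borrowed from the men's K=2 setup used later in the notebook. A plain loop is included to confirm that both give the same number.

```python
import numpy as np
from scipy import stats

def lognormsf(x, μ, σ):
    """Log-normal survival function, as in the notebook."""
    return 1 - stats.norm.cdf((np.log(x) - μ) / σ)

def hazard_vec(res_wage, c_k, p_k, λ, ζ, γ, μ, σ):
    """Hypothetical vectorized version of hazard()."""
    c_k = np.asarray(c_k, dtype=float)
    p_k = np.asarray(p_k, dtype=float)
    if c_k.shape != p_k.shape:
        raise ValueError("Length of p_k and c_k do not match.")
    level = np.arange(1, len(c_k) + 1)             # flexibility enters as k+1
    threshold = (res_wage + c_k - γ * level) / (ζ * level)
    return λ * np.sum(p_k * lognormsf(threshold, μ, σ))

# Illustrative inputs; λ, μ, σ are made-up values
args = dict(res_wage=13.0, c_k=[0, 7], p_k=[0.274557, 0.725443],
            λ=0.05, ζ=1.0, γ=0.0, μ=3.7, σ=0.56)
h_vec = hazard_vec(**args)

# Reference: the same sum written as a loop
h_loop = args['λ'] * sum(
    p * lognormsf((args['res_wage'] + c - args['γ'] * (k + 1)) / (args['ζ'] * (k + 1)),
                  args['μ'], args['σ'])
    for k, (c, p) in enumerate(zip(args['c_k'], args['p_k'])))

print(h_vec, h_loop)
```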

In [21]:
def log_L(data: pd.DataFrame, flex: str, wage: str, dur: str, res_wage: float, c_k: np.array, p_k: np.array, α: float, λ: float, η: float, ζ: float, γ: float, μ: float, σ: float):
    """
    
    Calculates the negative log-likelihood of the sample (the objective passed to minimize)

    Inputs
    - data: DataFrame of all individuals
    - flex: string for column of flexibility index (k)
    - wage: string for column of wage data 
    - dur: string for unemployment duration data
    - res_wage: float of observed minimum wage
    - c_k: Kx1 array of cost of providing flexibility
    - p_k: Kx1 array of probability of each level of flexibility
    - α: bargaining parameter
    - λ: arrival rate of offer
    - η: termination rate
    - ζ: productivity weight of flexibility k
    - γ: utility weight of flexibility k   
    - μ: location parameter of the log-normal wage distribution
    - σ: scale parameter of the log-normal wage distribution
    
    Functions
    - hazard(res_wage: np.array, c_k: np.array, p_k: np.array, λ: float, ζ: float, γ: float, μ: float, σ: float)
    - Pr_wage_given_match(data: pd.DataFrame, flex: str, wage: str, res_wage: np.array, c_k: np.array, ζ: float, γ: float, α: float, μ: float, σ: float)
    """
    
    h = hazard(res_wage, c_k, p_k, λ, ζ, γ, μ, σ) # exit rate from unemployment, computed once
    N = len(data) # number of individuals in the sample

    N_log_h = N * np.log(h)
    N_log_h_plus_η = N * np.log(h + η)

    empl_data = np.sum( np.log( Pr_wage_given_match(data, flex, wage, res_wage, c_k, ζ, γ, α, μ, σ) ) )

    Nu_log_η = data[dur].count() * np.log(η)

    unempl_data = h * np.sum(data[dur])

    # negative log-likelihood, so that minimize() maximizes the likelihood
    logL = -(N_log_h - N_log_h_plus_η + empl_data + Nu_log_η - unempl_data)

    return logL
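
Written out, the objective assembled by this function appears to be the following. This is read off the code, with $\tilde{k} = k + 1$, $w^*$ = `res_wage`, $N$ the number of individuals in the sample, $E$ the employed workers with an observed flexibility level, $U$ the unemployed workers with an observed duration $t_i$, and $N_U$ the number of observed durations.

$$
h = \lambda \sum_{k} p_k\, \tilde{G}\!\left(\frac{w^* + c(k) - \gamma \tilde{k}}{\zeta\, \tilde{k}}\right),
\qquad
\ln L = N \ln h - N \ln (h + \eta) + \sum_{i \in E} \ln \Pr(w_i \mid k_i, \text{match}) + N_U \ln \eta - h \sum_{i \in U} t_i
$$

The function returns $-\ln L$, which is why the estimation cells minimize it.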

In [22]:
# Starting values for the parameters to be estimated (α is held fixed)

c_k = np.array([0,5,10])
λ = 10
η = 10
γ = 0
ζ = 1
α = 0.5

### Estimation: Men, K=2

In [62]:
men['flex'].value_counts(normalize=True, sort=False)

1.0    0.725443
0.0    0.274557
Name: flex, dtype: float64

In [104]:
prob_k = np.array([0.274557, 0.725443])
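
The shares can also be computed directly instead of being typed in by hand. A minimal sketch on toy data follows; the toy frame is an assumption. Sorting by index matters because `value_counts` does not necessarily return the levels in the order $k = 0, 1, \ldots$, as the output above shows.

```python
import numpy as np
import pandas as pd

# Toy data; NaN stands for a worker with no observed flexibility level
toy = pd.DataFrame({'flex': [1.0, 0.0, 1.0, 1.0, np.nan, 0.0, 1.0]})

# Shares of each level among non-missing observations, ordered k = 0, 1, ...
prob_k = toy['flex'].value_counts(normalize=True).sort_index().to_numpy()

print(prob_k)
```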

In [81]:
# Two-stage estimation

## Labor Market Variables in the first stage
Bounds1 = ((0,999), (0,999), (0,999), (0,999))

params1 = np.array([λ, η, men['wage_trunc2'].mean(), men['wage_trunc2'].std()])

logL_opt1 = lambda x: log_L(men, 'flex', 'wage_trunc2', 'dur', men['wage_trunc2'].min(), 
                            np.array([0,7]), prob_k, α, x[0], x[1], ζ, γ,
                            x[2], x[3])

est2 = minimize(logL_opt1, params1, method='Nelder-Mead', options={'maxiter':500, 'disp':True}, bounds=Bounds1)

## Flexibility Variables in the second stage
params2 = np.array([c_k[1], ζ, γ])

logL_opt2 = lambda x: log_L(men, 'flex', 'wage_trunc2', 'dur', men['wage_trunc2'].min(), 
                            np.array([0,x[0]]), prob_k, α, est2.x[0], est2.x[1], x[1], x[2],
                            est2.x[2], est2.x[3])

est2_second = minimize(logL_opt2, params2, method='Nelder-Mead', options={'maxiter':500, 'disp':True})


Optimization terminated successfully.
         Current function value: 5594.708701
         Iterations: 309
         Function evaluations: 518
Optimization terminated successfully.
         Current function value: 4637.971461
         Iterations: 217
         Function evaluations: 410


In [82]:
print("Men's Labor market variables [λ, η, μ, σ] = "+ str(est2.x))
print("Men's Flexibility variables [c(1), ζ, γ] = "+ str(est2_second.x))

Men's Labor market variables [λ, η, μ, σ] = [5.44551658e-02 1.16768193e-03 3.68869364e+00 5.62082074e-01]
Men's Flexibility variables [c(1), ζ, γ] = [ 24.85654115   0.35470189 -12.2462732 ]
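
The two-stage pattern used above can be illustrated on a toy problem. This is a sketch under stated assumptions, not the notebook's likelihood: wages are simulated as a shift plus a log-normal draw, stage 1 estimates $(\mu, \sigma)$ with the shift fixed at a guess, and stage 2 estimates the shift with $(\mu, \sigma)$ fixed at their stage-1 values. The names `nll`, `shift0`, `stage1` and `stage2` are hypothetical.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Toy data: wage = shift + log-normal draw (all values made up)
true_shift, true_mu, true_sigma = 2.0, 1.0, 0.5
w = true_shift + rng.lognormal(true_mu, true_sigma, size=2000)

def nll(shift, mu, sigma):
    """Negative log-likelihood of the shifted log-normal; inf outside the support."""
    x = w - shift
    if sigma <= 0 or np.any(x <= 0):
        return np.inf
    return -np.sum(stats.lognorm.logpdf(x, s=sigma, scale=np.exp(mu)))

shift0 = 1.0  # first-stage guess for the parameter estimated in stage 2

# Stage 1: distribution parameters, holding the shift at its guess
stage1 = minimize(lambda p: nll(shift0, p[0], p[1]), x0=[0.5, 1.0], method='Nelder-Mead')

# Stage 2: the shift, holding the stage-1 estimates fixed
stage2 = minimize(lambda p: nll(p[0], stage1.x[0], stage1.x[1]), x0=[shift0], method='Nelder-Mead')

print(stage1.x, stage1.fun)
print(stage2.x, stage2.fun)
```

Because stage 2 here starts from the same value that stage 1 held fixed, its objective cannot end above the stage-1 value. A sequential procedure of this kind is generally not the same as maximizing over all parameters jointly.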


In [None]:
# Two-stage estimation with bootstrapping

## Labor Market Variables in the first stage: λ, η, μ, σ
Bounds1 = ((0,999), (0,999), (0,999), (0,999))
params1 = np.array([λ, η, men['wage_trunc2'].mean(), men['wage_trunc2'].std()])

## Flexibility Variables in the second stage: c(k), ζ, γ
params2 = np.array([c_k[1], ζ, γ])

## Bootstrapping
bootstrapped_data = bootstrap(men, n_samples=5000)

logL1 = []
logL2 = []
lambdas = []
etas = []
mus = []
sigmas = []
cs = []
zetas = []
gammas = []

for sample in bootstrapped_data:
    logL_opt1 = lambda x: log_L(sample, 'flex', 'wage_trunc2', 'dur', sample['wage_trunc2'].min(), 
                            np.array([0,7]), prob_k, α, x[0], x[1], ζ, γ,
                            x[2], x[3])
    est2 = minimize(logL_opt1, params1, method='Nelder-Mead', bounds=Bounds1)#options={'maxiter':500, 'disp':True}, 
    
    logL_opt2 = lambda x: log_L(sample, 'flex', 'wage_trunc2', 'dur', sample['wage_trunc2'].min(), 
                            np.array([0,x[0]]), prob_k, α, est2.x[0], est2.x[1], x[1], x[2],
                            est2.x[2], est2.x[3])
    est2_second = minimize(logL_opt2, params2, method='Nelder-Mead')#, options={'maxiter':500, 'disp':True}
    
    logL1.append(est2.fun)
    logL2.append(est2_second.fun)
    lambdas.append(est2.x[0])
    etas.append(est2.x[1])
    mus.append(est2.x[2])
    sigmas.append(est2.x[3])
    cs.append(est2_second.x[0])
    zetas.append(est2_second.x[1])
    gammas.append(est2_second.x[2])
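
Once the replicate estimates are collected, they can be summarized as bootstrap standard errors and percentile intervals. The sketch below does this for a toy one-parameter problem (an exponential duration model with made-up data); it also records `res.success` for each replicate so that non-converged runs can be dropped, which is an added precaution rather than something the loop above does.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
durations = rng.exponential(scale=10.0, size=300)   # toy unemployment spells

def nll(theta, t):
    """Negative log-likelihood of exponential durations with rate exp(theta)."""
    rate = np.exp(theta[0])
    return -(len(t) * np.log(rate) - rate * t.sum())

records = []
for _ in range(50):
    sample = rng.choice(durations, size=len(durations), replace=True)
    res = minimize(nll, x0=[-1.0], args=(sample,), method='Nelder-Mead')
    # keep the convergence flag, the estimate, and the closed-form MLE for comparison
    records.append((res.success, np.exp(res.x[0]), 1 / sample.mean()))

success, est, analytic = map(np.array, zip(*records))

kept = est[success & np.isfinite(est)]          # drop failed or non-finite replicates
se = kept.std(ddof=1)                           # bootstrap standard error
ci = np.percentile(kept, [2.5, 97.5])           # percentile interval

print(len(kept), se, ci)
```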



In [None]:
# fig,ax = plt.subplots(3,3,figsize=(8,12))
sns.displot(logL1).set(title="Log-Likelihood of First Stage")
sns.displot(logL2).set(title="Log-Likelihood of Second Stage")
sns.displot(lambdas).set(title="Lambda")
sns.displot(etas).set(title="Eta")
sns.displot(mus).set(title="Mu")
sns.displot(sigmas).set(title="Sigma")
sns.displot(cs).set(title="Cost of Flexibility")
sns.displot(zetas).set(title="Zetas")
sns.displot(gammas).set(title="Gammas")

### Estimation: Women, K=2

In [69]:
women['flex'].value_counts(normalize=True, sort=False)

0.0    0.40264
1.0    0.59736
Name: flex, dtype: float64

In [70]:
prob_k = np.array([0.40264, 0.59736])

In [78]:
# Two-stage estimation

## Labor Market Variables in the first stage
Bounds1 = ((0,999), (0,999), (0,999), (0,999))

params1 = np.array([λ, η, women['wage_trunc2'].mean(), women['wage_trunc2'].std()])

logL_opt1 = lambda x: log_L(women, 'flex', 'wage_trunc2', 'dur', women['wage_trunc2'].min(), 
                            np.array([0,7]), prob_k, α, x[0], x[1], ζ, γ,
                            x[2], x[3])

est2 = minimize(logL_opt1, params1, method='Nelder-Mead', options={'maxiter':500, 'disp':True}, bounds=Bounds1)

## Flexibility Variables in the second stage
params2 = np.array([c_k[1], ζ, γ])

logL_opt2 = lambda x: log_L(women, 'flex', 'wage_trunc2', 'dur', women['wage_trunc2'].min(), 
                            np.array([0,x[0]]), prob_k, α, est2.x[0], est2.x[1], x[1], x[2],
                            est2.x[2], est2.x[3])

est2_second = minimize(logL_opt2, params2, method='Nelder-Mead', options={'maxiter':500, 'disp':True})


Optimization terminated successfully.
         Current function value: 5212.687761
         Iterations: 337
         Function evaluations: 571
Optimization terminated successfully.
         Current function value: 4304.964731
         Iterations: 175
         Function evaluations: 353


In [80]:
print("Women's Labor market variables [λ, η, μ, σ] = "+ str(est2.x))
print("Women's Flexibility variables [c(1), ζ, γ] = "+ str(est2_second.x))

Women's Labor market variables [λ, η, μ, σ] = [5.65043093e-02 1.23027330e-03 3.49235356e+00 6.16944391e-01]
Women's Flexibility variables [c(1), ζ, γ] = [17.06962877  0.31147502 -9.75230305]


### Estimation: Men, K=3

In [83]:
men['flex_sched_score'].value_counts(normalize=True, sort=False)

1.0    0.547504
0.0    0.274557
2.0    0.177939
Name: flex_sched_score, dtype: float64

In [84]:
prob_k = np.array([0.274557, 0.547504, 0.177939])

In [93]:
# Two-stage estimation

## Labor Market Variables in the first stage
Bounds1 = ((0,999), (0,999), (0,999), (0,999))

params1 = np.array([λ, η, men['wage_trunc3'].mean(), men['wage_trunc3'].std()])

logL_opt1 = lambda x: log_L(men, 'flex_sched_score', 'wage_trunc3', 'dur', men['wage_trunc3'].min(), 
                            np.array([0,7,10]), prob_k, α, x[0], x[1], ζ, γ,
                            x[2], x[3])

est4 = minimize(logL_opt1, params1, method='Nelder-Mead', options={'maxiter':500, 'disp':True}, bounds=Bounds1)

## Flexibility Variables in the second stage
params2 = np.array([c_k[1], c_k[2], ζ, γ])

logL_opt2 = lambda x: log_L(men, 'flex_sched_score', 'wage_trunc3', 'dur', men['wage_trunc3'].min(), 
                            np.array([0,x[0],x[1]]), prob_k, α, est4.x[0], est4.x[1], x[2], x[3],
                            est4.x[2], est4.x[3])

est4_second = minimize(logL_opt2, params2, method='Nelder-Mead', options={'maxiter':500, 'disp':True})


Optimization terminated successfully.
         Current function value: 5640.634417
         Iterations: 440
         Function evaluations: 751
Optimization terminated successfully.
         Current function value: 5620.938567
         Iterations: 126
         Function evaluations: 261


In [94]:
print("Men's Labor market variables [λ, η, μ, σ] = "+ str(est4.x))
print("Men's Flexibility variables [c(1), c(2), ζ, γ] = "+ str(est4_second.x))

Men's Labor market variables [λ, η, μ, σ] = [5.49055995e-02 1.16650510e-03 3.61478061e+00 5.89857838e-01]
Men's Flexibility variables [c(1), c(2), ζ, γ] = [ 4.55388391e+00  3.83519501e+01  1.02632195e+00 -7.82580490e-03]


### Estimation: Women, K=3 
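The first-stage estimate of $\mu$ ends up at its lower bound, $\mu = 0$ (see the estimates printed at the end of this subsection).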

In [88]:
women['flex_sched_score'].value_counts(normalize=True, sort=False)

0.0    0.402640
1.0    0.444719
2.0    0.152640
Name: flex_sched_score, dtype: float64

In [89]:
prob_k = np.array([0.402640, 0.444719, 0.152640])

In [95]:
# Two-stage estimation

# Labor Market Variables in the first stage
Bounds1 = ((0,999), (0,999), (0,999), (0,999))

params1 = np.array([λ, η, women['wage_trunc3'].mean(), women['wage_trunc3'].std()])

logL_opt1 = lambda x: log_L(women, 'flex_sched_score', 'wage_trunc3', 'dur', women['wage_trunc3'].min(), 
                            np.array([0,7,10]), prob_k, α, x[0], x[1], ζ, γ,
                            x[2], x[3])

est4 = minimize(logL_opt1, params1, method='Nelder-Mead', options={'maxiter':500, 'disp':True}, bounds=Bounds1)

# Flexibility Variables in the second stage
params2 = np.array([c_k[1], c_k[2], ζ, γ])

logL_opt2 = lambda x: log_L(women, 'flex_sched_score', 'wage_trunc3', 'dur', women['wage_trunc3'].min(), 
                            np.array([0,x[0],x[1]]), prob_k, α, est4.x[0], est4.x[1], x[2], x[3],
                            est4.x[2], est4.x[3])

est4_second = minimize(logL_opt2, params2, method='Nelder-Mead', options={'maxiter':1000, 'disp':True})


Optimization terminated successfully.
         Current function value: 5572.561282
         Iterations: 203
         Function evaluations: 389
Optimization terminated successfully.
         Current function value: 3950.778259
         Iterations: 355
         Function evaluations: 626


In [96]:
print("Women's Labor market variables [λ, η, μ, σ] = "+ str(est4.x))
print("Women's Flexibility variables [c(1), c(2), ζ, γ] = "+ str(est4_second.x))

Women's Labor market variables [λ, η, μ, σ] = [3.40115879e-01 1.23001579e-03 0.00000000e+00 2.23079095e+00]
Women's Flexibility variables [c(1), c(2), ζ, γ] = [ 5.34002280e+00  5.87424834e+02  1.79983835e-01 -1.04337966e+01]
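
A small check for estimates that sit on a bound makes this kind of result easy to spot. The helper `at_bound` below is hypothetical; the estimates and bounds are copied from the first-stage output and `Bounds1` above.

```python
import numpy as np

def at_bound(x, bounds, tol=1e-8):
    """Flag parameters that lie within tol of their lower or upper bound."""
    x = np.asarray(x, dtype=float)
    lower = np.array([b[0] for b in bounds], dtype=float)
    upper = np.array([b[1] for b in bounds], dtype=float)
    return (np.abs(x - lower) <= tol) | (np.abs(x - upper) <= tol)

# First-stage estimates [λ, η, μ, σ] and bounds from the cell above
estimates = [3.40115879e-01, 1.23001579e-03, 0.00000000e+00, 2.23079095e+00]
bounds = ((0, 999), (0, 999), (0, 999), (0, 999))

flags = at_bound(estimates, bounds)
print(dict(zip(['λ', 'η', 'μ', 'σ'], flags.tolist())))
# → {'λ': False, 'η': False, 'μ': True, 'σ': False}
```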


## Utility Linear in wage and Productivity assumption $y(x;k) = kx$

### Functions

In [None]:
def Pr_wage_given_match(data: pd.DataFrame, flex: str, wage: str, res_wage: float, c_k: np.array, α: float, μ: float, σ: float):
    """
    Calculates probability of a wage draw conditional on a match being formed 
    
    Inputs
    - data: DataFrame
    - flex: string for name of flexibility column
    - wage: string for name of wage column
    - res_wage: float of observed minimum wage
    - c_k: Kx1 array of cost of providing flexibility
    - α: bargaining parameter
    - μ: location parameter of the log-normal wage distribution
    - σ: scale parameter of the log-normal wage distribution
    
    Functions
    - lognormpdf(x: np.array, μ: float, σ: float)
    - lognormsf(x: np.array, μ: float, σ: float)
    """
    employed_indiv = []

    for k in range(len(c_k)):
        tmp = data[data[flex]==k]
        level = k + 1 #flexibility level is k+1 because Python indexes from 0
        # productivity draw implied by the observed wage (inverts the wage equation)
        x_implied = ( 1/( α*level ) )*( tmp[wage] - (1-α)*res_wage + α*c_k[k] )
        g = ( 1/( α*level ) ) * lognormpdf(x_implied, μ, σ)
        # probability that a draw is acceptable at this flexibility level
        G_tilde = lognormsf( ( 1/level )*( res_wage + c_k[k] ), μ, σ )
        employed_indiv.append( (g/G_tilde).to_numpy() )

    return np.concatenate(employed_indiv)
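
This version is the special case of the earlier specification with $\zeta = 1$ and $\gamma = 0$. The sketch below checks that numerically for the implied productivity draw and for the acceptance threshold; the function names and the input values are hypothetical, and the expressions are copied from the two versions of the code.

```python
import numpy as np

def implied_x_general(w, res_wage, c, level, α, ζ, γ):
    """Implied productivity draw in the specification with ζ and γ."""
    return (w - (1 - α) * (res_wage - γ * level) + α * c) / (α * ζ * level)

def implied_x_linear(w, res_wage, c, level, α):
    """Implied productivity draw in the linear specification."""
    return (1 / (α * level)) * (w - (1 - α) * res_wage + α * c)

def threshold_general(res_wage, c, level, ζ, γ):
    return (res_wage + c - γ * level) / (ζ * level)

def threshold_linear(res_wage, c, level):
    return (1 / level) * (res_wage + c)

# Made-up inputs for the comparison
w = np.array([20.0, 35.0, 50.0])
res_wage, c, level, α = 13.0, 7.0, 2, 0.5

same_x = np.allclose(implied_x_general(w, res_wage, c, level, α, ζ=1.0, γ=0.0),
                     implied_x_linear(w, res_wage, c, level, α))
same_threshold = np.isclose(threshold_general(res_wage, c, level, ζ=1.0, γ=0.0),
                            threshold_linear(res_wage, c, level))
print(same_x, same_threshold)
```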

In [None]:
def hazard(res_wage: float, c_k: np.array, p_k: np.array, λ: float, μ: float, σ: float):
    """
    Calculates the hazard rate out of unemployment (arrival rate times the probability that an offer is acceptable)
    
    Inputs
    - res_wage: float of observed minimum wage
    - c_k: Kx1 array of cost of providing flexibility
    - p_k: Kx1 array of probability of each level of flexibility
    - λ: arrival rate of offer
    - μ: location parameter of the log-normal wage distribution
    - σ: scale parameter of the log-normal wage distribution
    
    Functions
    - lognormsf(x: np.array, μ: float, σ: float)
    """
    
    if len(p_k)!=len(c_k):
        raise ValueError("Length of p_k and c_k do not match.")

    prob_sum = 0

    for k in range(len(c_k)):
        prob_sum += p_k[k] * lognormsf( ( 1/(k+1) )*( res_wage + c_k[k]), μ, σ ) #flexibility level is k+1 because Python indexes from 0

    return λ*prob_sum

In [None]:
def log_L(data: pd.DataFrame, flex: str, wage: str, dur: str, res_wage: float, c_k: np.array, p_k: np.array, α: float, λ: float, η: float, μ: float, σ: float):
    """
    
    Calculates the negative log-likelihood of the sample (the objective passed to minimize)

    Inputs
    - data: DataFrame of all individuals
    - flex: string for column of flexibility index (k)
    - wage: string for column of wage data 
    - dur: string for unemployment duration data
    - res_wage: float of observed minimum wage
    - c_k: Kx1 array of cost of providing flexibility
    - p_k: Kx1 array of probability of each level of flexibility
    - α: bargaining parameter
    - λ: arrival rate of offer
    - η: termination rate
    - μ: location parameter of the log-normal wage distribution
    - σ: scale parameter of the log-normal wage distribution
    
    Functions
    - hazard(res_wage: np.array, c_k: np.array, p_k: np.array, λ: float, μ: float, σ: float)
    - Pr_wage_given_match(data: pd.DataFrame, flex: str, wage: str, res_wage: np.array, c_k: np.array,  α: float, μ: float, σ: float)
    """
    
    h = hazard(res_wage, c_k, p_k, λ, μ, σ) # exit rate from unemployment, computed once
    N = len(data) # number of individuals in the sample

    N_log_h = N * np.log(h)
    N_log_h_plus_η = N * np.log(h + η)

    empl_data = np.sum( np.log( Pr_wage_given_match(data, flex, wage, res_wage, c_k, α, μ, σ) ) )

    Nu_log_η = data[dur].count() * np.log(η)

    unempl_data = h * np.sum(data[dur])

    # negative log-likelihood, so that minimize() maximizes the likelihood
    logL = -(N_log_h - N_log_h_plus_η + empl_data + Nu_log_η - unempl_data)

    return logL

In [None]:
# Starting values for the parameters to be estimated

c_k = np.array([0,5,10])
λ = 10
η = 10

### Estimation: Men, K=3

In [None]:
men['flex_sched_score'].value_counts(normalize=True, sort=False)

In [None]:
prob_k = np.array([0.274557, 0.547504, 0.177939])

In [None]:
params = np.array([c_k[1], c_k[2], λ, η, men['wage_trunc3'].mean(), men['wage_trunc3'].std()])

logL_opt = lambda x: log_L(men, 'flex_sched_score', 'wage_trunc3', 'dur', men['wage_trunc3'].min(), 
                            np.array([0,x[0],x[1]]), prob_k, 0.5, x[2], x[3],
                            x[4], x[5])

est4 = minimize(logL_opt, params, method='Nelder-Mead')

In [None]:
est4.success

In [None]:
est4.x

In [None]:
est4.fun

In [None]:
params = np.array([c_k[1], c_k[2], λ, η, men['hrwage'].mean(), men['hrwage'].std()])

logL_opt = lambda x: log_L(men, 'flex_sched_score', 'hrwage', 'dur', men['hrwage'].min(), 
                            np.array([0,x[0],x[1]]), prob_k, 0.5, x[2], x[3],
                            x[4], x[5])

est3 = minimize(logL_opt, params, method='Nelder-Mead')

In [None]:
est3.success

In [None]:
est3.x

In [None]:
est3.fun

### Estimation: Men, K=2

In [None]:
men['flex'].value_counts(normalize=True, sort=False)

In [None]:
prob_k = np.array([0.274557, 0.725443])

In [None]:
params = np.array([c_k[1], λ, η, men['wage_trunc2'].mean(), men['wage_trunc2'].std()])

logL_opt = lambda x: log_L(men, 'flex', 'wage_trunc2', 'dur', men['wage_trunc2'].min(), 
                            np.array([0,x[0]]), prob_k, 0.5, x[1], x[2],
                            x[3],x[4])

est2 = minimize(logL_opt, params, method='Nelder-Mead', options={'maxiter':8000})

In [None]:
est2.success

In [None]:
est2.x

In [None]:
est2.fun

In [None]:
params = np.array([c_k[1], λ, η, men['hrwage'].mean(), men['hrwage'].std()])

logL_opt = lambda x: log_L(men, 'flex', 'hrwage', 'dur', men['hrwage'].min(), 
                            np.array([0,x[0]]), prob_k, 0.5, x[1], x[2],
                            x[3],x[4])

est1 = minimize(logL_opt, params, method='Nelder-Mead', options={'maxiter':8000})

In [None]:
est1.success

In [None]:
est1.x

In [None]:
est1.fun

### Estimation: Women, K=3

In [None]:
women['flex_sched_score'].value_counts(normalize=True, sort=False)

In [None]:
prob_k = np.array([0.402640, 0.444719, 0.152640])

In [None]:
params = np.array([c_k[1], c_k[2], λ, η, women['wage_trunc3'].mean(), women['wage_trunc3'].std()])

logL_opt = lambda x: log_L(women, 'flex_sched_score', 'wage_trunc3', 'dur', women['wage_trunc3'].min(), 
                            np.array([0,x[0],x[1]]), prob_k, 0.5, x[2], x[3],
                            x[4], x[5])

est4 = minimize(logL_opt, params, method='Nelder-Mead', options={'maxiter':8000})

Starting values: $c_1 = 5$, $c_2 = 10$

In [None]:
est4.success

In [None]:
est4.x

In [None]:
est4.fun

In [None]:
# Estimates change from run to run; the cause has not been identified yet.

params = np.array([c_k[1], c_k[2], λ, η, women['hrwage'].mean(), women['hrwage'].std()])

logL_opt = lambda x: log_L(women, 'flex_sched_score', 'hrwage', 'dur', women['hrwage'].min(), 
                            np.array([0,x[0],x[1]]), prob_k, 0.5, x[2], x[3],
                            x[4], x[5])

est3 = minimize(logL_opt, params, method='Nelder-Mead', options={'maxiter':8000})
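Nelder-Mead is deterministic for a fixed objective and starting point, so one possible (unverified) source of the run-to-run variation noted in the comment above is notebook state: `logL_opt` looks up the global `prob_k` when it is called, and `prob_k` is reassigned in several sections. A minimal sketch of the difference between a late-bound and an early-bound global, using a hypothetical stand-in objective rather than `log_L`:

```python
import numpy as np

def objective_toy(x, p_k):
    # Hypothetical stand-in for log_L: any function of the parameter and p_k.
    return float(np.sum(p_k) * x ** 2)

p_k_toy = np.array([0.4, 0.6])

# Reads the global p_k_toy each time it is called.
late_bound = lambda x: objective_toy(x, p_k_toy)
# Default argument is evaluated once, so this keeps a snapshot of p_k_toy.
early_bound = lambda x, p=p_k_toy.copy(): objective_toy(x, p)

before = (late_bound(2.0), early_bound(2.0))

# A later cell reassigns the global, as the K=3 and K=2 sections do with prob_k.
p_k_toy = np.array([0.2, 0.3])

after = (late_bound(2.0), early_bound(2.0))
print(before, after)
```

Only the late-bound version changes after the reassignment, which is the behaviour one would see when cells are run out of order.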

In [None]:
est3.success

In [None]:
est3.x

In [None]:
est3.fun

### Estimation: Women, K=2

In [None]:
women['flex'].value_counts(normalize=True, sort=False)

In [None]:
prob_k = np.array([0.40264, 0.59736])

In [None]:
# Estimates also change from run to run

params = np.array([c_k[1], λ, η, women['wage_trunc2'].mean(), women['wage_trunc2'].std()])

logL_opt = lambda x: log_L(women, 'flex', 'wage_trunc2', 'dur', women['wage_trunc2'].min(), 
                            np.array([0,x[0]]), prob_k, 0.5, x[1], x[2],
                            x[3],x[4])

est2 = minimize(logL_opt, params, method='Nelder-Mead', options={'maxiter':8000})

In [None]:
est2.success

In [None]:
est2.x

In [None]:
est2.fun

In [None]:
# Consistent across runs

params = np.array([c_k[1], λ, η, women['hrwage'].mean(), women['hrwage'].std()])

logL_opt = lambda x: log_L(women, 'flex', 'hrwage', 'dur', women['hrwage'].min(), 
                            np.array([0,x[0]]), prob_k, 0.5, x[1], x[2],
                            x[3],x[4])

est1 = minimize(logL_opt, params, method='Nelder-Mead', options={'maxiter':8000})

In [None]:
est1.success

In [None]:
est1.x

In [None]:
est1.fun

# Figures

## Flex Schedule Score (K = 3)

In [None]:
fig, ax = plt.subplots(3, 1, figsize=(12, 8))

for k in range(3):
    tmp = df[(df['flex_sched_score']==k) & (df['sex']=='male') & (df['employed']==1)]
    sns.distplot(tmp['hrwage'], color='#4B9CD3', hist_kws={'alpha' : .3}, bins=100, ax=ax[k])
#     ax[k].legend(['Flexibility Level ' + str(k)])
    ax[k].set_ylim([0,0.1])
    ax[k].set_xlim([0,75])
    ax[k].set(xlabel = 'Hourly Wage for Men with Flexible Schedule Score ' +str(k))

# ax.set(xlabel="Distribution of Men's Hourly Wage (raw)")

plt.tight_layout()

fig.savefig('./hrwage_men_3flex.png', bbox_inches='tight', transparent=True)

In [None]:
fig, ax = plt.subplots(3, 1, figsize=(12, 8))

for k in range(3):
    tmp = df[(df['flex_sched_score']==k) & (df['sex']=='female') & (df['employed']==1)]
    sns.distplot(tmp['hrwage'], color='#4B9CD3', hist_kws={'alpha' : .3}, bins=100, ax=ax[k])
#     ax[k].legend(['Flexibility Level ' + str(k)])
    ax[k].set_ylim([0,0.1])
    ax[k].set_xlim([0,75])
    ax[k].set(xlabel = 'Hourly Wage for Women with Flexible Schedule Score ' +str(k))

# ax.set(xlabel="Distribution of Men's Hourly Wage (raw)")

plt.tight_layout()

fig.savefig('./hrwage_women_3flex.png', bbox_inches='tight', transparent=True)

In [None]:
fig, ax = plt.subplots(3, 1, figsize=(12, 8))

for k in range(3):
    tmp = women[(women['flex_sched_score']==k) & (women['employed']==1)]
    sns.distplot(tmp['wage_trunc3'], color='#4B9CD3', hist_kws={'alpha' : .3}, bins=100, ax=ax[k])
#     ax[k].legend(['Flex Level ' + str(k)])
    ax[k].set_ylim([0,0.1])
    ax[k].set_xlim([0,75])
    ax[k].set(xlabel = 'Truncated Hourly Wage for Women with Flexible Schedule Score ' +str(k))

#ax.set(xlabel="Distribution of Men's Hourly Wage")

plt.tight_layout()

fig.savefig('./wageTrunc_women_3flex.png', bbox_inches='tight', transparent=True)

## Binary Flexibility Measure

In [None]:
fig, ax = plt.subplots(2, 1, figsize=(12, 8))

for k in range(2):
    tmp = df[(df['flex']==k) & (df['sex']=='male') & (df['employed']==1)]
    sns.distplot(tmp['hrwage'], color='#4B9CD3', hist_kws={'alpha' : .3}, bins=100, ax=ax[k])
#     ax[k].legend(['Flexibility Level ' + str(k)])
    ax[k].set_ylim([0,0.1])
    ax[k].set_xlim([0,75])
    if k == 1:
        ax[k].set(xlabel = 'Hourly Wage for Men with Flexible Schedule')
    elif k == 0:
        ax[k].set(xlabel = 'Hourly Wage for Men without Flexible Schedule')
    else:
        print("Not binary k")

# ax.set(xlabel="Distribution of Men's Hourly Wage (raw)")

plt.tight_layout()

fig.savefig('./hrwage_men_2flex.png', bbox_inches='tight', transparent=True)

In [None]:
fig, ax = plt.subplots(2, 1, figsize=(12, 8))

for k in range(2):
    tmp = df[(df['flex']==k) & (df['sex']=='female') & (df['employed']==1)]
    sns.distplot(tmp['hrwage'], color='#4B9CD3', hist_kws={'alpha' : .3}, bins=100, ax=ax[k])
#     ax[k].legend(['Flexibility Level ' + str(k)])
    ax[k].set_ylim([0,0.1])
    ax[k].set_xlim([0,75])
    if k == 1:
        ax[k].set(xlabel = 'Hourly Wage for Women with Flexible Schedule')
    elif k == 0:
        ax[k].set(xlabel = 'Hourly Wage for Women without Flexible Schedule')
    else:
        print("Not binary k")

# ax.set(xlabel="Distribution of Men's Hourly Wage (raw)")

plt.tight_layout()

fig.savefig('./hrwage_women_2flex.png', bbox_inches='tight', transparent=True)

In [None]:
fig, ax = plt.subplots(2, 1, figsize=(12, 8))

for k in range(2):
    tmp = men[(men['flex']==k) & (men['employed']==1)]
    sns.distplot(tmp['wage_trunc2'], color='#4B9CD3', hist_kws={'alpha' : .3}, bins=100, ax=ax[k])
#     ax[k].legend(['Flexibility Level ' + str(k)])
    ax[k].set_ylim([0,0.1])
    ax[k].set_xlim([0,75])
    if k == 1:
        ax[k].set(xlabel = 'Truncated Hourly Wage for Men with Flexible Schedule')
    elif k == 0:
        ax[k].set(xlabel = 'Truncated Hourly Wage for Men without Flexible Schedule')
    else:
        print("Not binary k")

# ax.set(xlabel="Distribution of Men's Hourly Wage (raw)")

plt.tight_layout()

fig.savefig('./wagetrunc_men_2flex.png', bbox_inches='tight', transparent=True)

In [None]:
fig, ax = plt.subplots(2, 1, figsize=(12, 8))

for k in range(2):
    tmp = women[(women['flex']==k) & (women['employed']==1)]
    sns.distplot(tmp['wage_trunc2'], color='#4B9CD3', hist_kws={'alpha' : .3}, bins=100, ax=ax[k])
#     ax[k].legend(['Flexibility Level ' + str(k)])
    ax[k].set_ylim([0,0.1])
    ax[k].set_xlim([0,75])
    if k == 1:
        ax[k].set(xlabel = 'Truncated Hourly Wage for Women with Flexible Schedule')
    elif k == 0:
        ax[k].set(xlabel = 'Truncated Hourly Wage for Women without Flexible Schedule')
    else:
        print("Not binary k")

# ax.set(xlabel="Distribution of Men's Hourly Wage (raw)")

plt.tight_layout()

fig.savefig('./wagetrunc_women_2flex.png', bbox_inches='tight', transparent=True)

# Summary Statistics

In [None]:
agg_dict_empl_3 = {
    'hrwage': ['count', 'min', 'mean', 'std'],
    'wage_trunc3': ['min', 'mean', 'std']
}

agg_dict_empl_2 = {
    'hrwage': ['count', 'min', 'mean', 'std'],
    'wage_trunc2': ['min', 'mean', 'std']
}

agg_dict_unempl = {
    'dur': ['count', 'min', 'max', 'mean', 'std']
}

agg_dict_pos = {
    'flex_sched_score': ['count']  # TODO: also report the percentage at each level
}
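The note in `agg_dict_pos` asks for percentages alongside counts. A minimal sketch, assuming a made-up toy DataFrame with the notebook's column names, of one way to get both columns in a single table by combining two `value_counts` calls:

```python
import pandas as pd

# Toy data for illustration only; this is not the notebook's sample.
toy = pd.DataFrame({
    'sex': ['male', 'male', 'male', 'female', 'female', 'female', 'female'],
    'flex_sched_score': [0, 1, 1, 0, 1, 2, 2],
})

grouped = toy.groupby('sex')['flex_sched_score']

# Number of observations at each flexibility level, within sex.
counts = grouped.value_counts().rename('count')
# Same breakdown as a percentage of each sex's total.
percent = grouped.value_counts(normalize=True).mul(100).rename('percent')

# Both Series share the (sex, flex_sched_score) index, so they align on concat.
table = pd.concat([counts, percent], axis=1)
print(table)
```

The resulting frame can be passed to `to_latex(float_format="%.2f")` in the same way as the other summary tables.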

### Men

In [None]:
print(men.groupby(['flex_sched_score']).agg(agg_dict_empl_3).to_latex(float_format="%.2f"))

In [None]:
print(men.groupby(['flex']).agg(agg_dict_empl_2).to_latex(float_format="%.2f"))

In [None]:
print(men.agg(agg_dict_unempl).to_latex()) # men only

### Women

In [None]:
print(women.groupby(['flex_sched_score']).agg(agg_dict_empl_3).to_latex(float_format="%.2f"))

In [None]:
print(women.groupby(['flex']).agg(agg_dict_empl_2).to_latex(float_format="%.2f"))

In [None]:
print(women.agg(agg_dict_unempl).to_latex()) # women only

In [None]:
print(df.groupby(df['sex']).agg(agg_dict_unempl).to_latex(float_format="%.2f")) # by gender

In [None]:
print(empl_df.groupby(['female']).agg(agg_dict_empl).to_latex()) # all

In [None]:
print(unempl_df.agg(agg_dict_unempl).to_latex()) # by gender

# Scratch

In [65]:
# Binary flexibility

Bounds = ((0,999), (0,999), (0,999), (0,999))

params = np.array([λ, η, men['wage_trunc2'].mean(), men['wage_trunc2'].std()])

logL_opt = lambda x: log_L(men, 'flex', 'wage_trunc2', 'dur', men['wage_trunc2'].min(), 
                            np.array([0,10]), prob_k, α, x[0], x[1], ζ, γ,
                            x[2], x[3])

est2 = minimize(logL_opt, params, method='Nelder-Mead', bounds=Bounds, options={'maxiter':5000, 'disp':True})

  return 1-stats.norm.cdf(num/denom)
  N_log_h = data.count() * np.log( hazard(res_wage, c_k, p_k, λ, ζ, γ, μ, σ) )
  Nu_log_η = data[dur].count() * np.log(η)


Optimization terminated successfully.
         Current function value: 5593.448180
         Iterations: 383
         Function evaluations: 652


In [66]:
est2.x

array([5.43812096e-02, 1.16851695e-03, 3.71965182e+00, 5.45725730e-01])

In [67]:
params = np.array([c_k[1], ζ, γ])

logL_opt = lambda x: log_L(men, 'flex', 'wage_trunc2', 'dur', men['wage_trunc2'].min(), 
                            np.array([0,x[0]]), prob_k, α, est2.x[0], est2.x[1], x[1], x[2],
                            est2.x[2], est2.x[3])

est2_second = minimize(logL_opt, params, method='Nelder-Mead', options={'maxiter':500, 'disp':True})

Optimization terminated successfully.
         Current function value: 4702.811448
         Iterations: 205
         Function evaluations: 392


In [68]:
est2_second.x

array([ 24.05239467,   0.35900187, -12.195175  ])

In [71]:
# Binary flexibility

Bounds = ((0,999), (0,999), (0,999), (0,999))

params = np.array([λ, η, women['wage_trunc2'].mean(), women['wage_trunc2'].std()])

logL_opt = lambda x: log_L(women, 'flex', 'wage_trunc2', 'dur', women['wage_trunc2'].min(), 
                            np.array([0,7]), prob_k, α, x[0], x[1], ζ, γ,
                            x[2], x[3])

est2 = minimize(logL_opt, params, method='Nelder-Mead', options={'maxiter':500, 'disp':True}, bounds=Bounds)

  return 1-stats.norm.cdf(num/denom)
  empl_data = np.sum( np.log( Pr_wage_given_match(data, flex, wage, res_wage, c_k, ζ, γ, α, μ, σ) ) )
  Nu_log_η = data[dur].count() * np.log(η)


Optimization terminated successfully.
         Current function value: 5212.687761
         Iterations: 337
         Function evaluations: 571


In [72]:
est2.x

array([5.65043093e-02, 1.23027330e-03, 3.49235356e+00, 6.16944391e-01])

In [73]:
params = np.array([c_k[1], ζ, γ])

logL_opt = lambda x: log_L(women, 'flex', 'wage_trunc2', 'dur', women['wage_trunc2'].min(), 
                            np.array([0,x[0]]), prob_k, α, est2.x[0], est2.x[1], x[1], x[2],
                            est2.x[2], est2.x[3])

est2_second = minimize(logL_opt, params, method='Nelder-Mead', options={'maxiter':500, 'disp':True})

Optimization terminated successfully.
         Current function value: 4304.964731
         Iterations: 175
         Function evaluations: 353


In [74]:
est2_second.x

array([17.06962877,  0.31147502, -9.75230305])

In [None]:
params = np.array([c_k[1], c_k[2], λ, η, γ, men['wage_trunc3'].mean(), men['wage_trunc3'].std()])

logL_opt = lambda x: log_L(men, 'flex_sched_score', 'wage_trunc3', 'dur', men['wage_trunc3'].min(), 
                            np.array([0,x[0],x[1]]), prob_k, α, x[2], x[3], ζ, x[4],
                            x[5], x[6])

est4 = minimize(logL_opt, params, method='Nelder-Mead', options={'maxiter':800})

The estimation runs when $\zeta = 1$ and $\gamma = 0$, as in the initial model, so the failure appears to come from using a single flexibility marker to estimate three flexibility measures.

In [None]:
est4.success

In [None]:
est4

In [None]:
est4.fun

In [None]:
# Starting values for the parameters to be estimated

c_k = np.array([0,5,10])
λ = 2.1
η = 2.1
μ = men['hrwage'].groupby(men['flex_sched_score']).mean().values
σ = men['hrwage'].groupby(men['flex_sched_score']).std().values

In [None]:
logL_opt = lambda x: log_L(men, 'flex_sched_score', 'hrwage', 'dur', Uk, 
                            x[0], prob_k, 0.5, x[1], x[2],
                            x[3], x[4])

In [None]:
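# Entries have different lengths (arrays and scalars), so an object array is required;
# recent NumPy versions raise an error for ragged input without dtype=object.
params = np.array([c_k, float(λ), float(η), μ, σ], dtype=object)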
params

In [None]:
params[1]

In [None]:
log_L(men, 'flex_sched_score', 'hrwage', 'dur', Uk, 
                            params[0], prob_k, 0.5, params[1], params[2],
                            params[3], params[4])

In [None]:
logL_opt(params)

## Pr_wage_match, hazard, and logL with res_wage, mu, and sigma varying with k

In [None]:
def Pr_wage_given_match(data: pd.DataFrame, flex: str, wage: str, res_wage: float, c_k: np.array, α: float, μ: float, σ: float):
# def Pr_wage_given_match(data: pd.DataFrame, flex: str, wage: str, res_wage: np.array, c_k: np.array, α: float, μ: np.array, σ: np.array):
    """
    Calculates probability of a wage draw conditional on a match being formed 
    
    Inputs
    - data: DataFrame
    - flex: string for name of flexibility column
    - wage: string for name of wage column
    - res_wage: observed minimum wage, used as the reservation wage (scalar, common to all flexibility levels)
    - c_k: Kx1 array of cost of providing flexibility
    - α: bargaining parameter
    - μ: location parameter of the log-normal wage distribution (scalar, common to all flexibility levels)
    - σ: scale parameter of the log-normal wage distribution (scalar, common to all flexibility levels)
    
    Functions
    - lognormpdf(x: np.array, μ: float, σ: float)
    - lognormsf(x: np.array, μ: float, σ: float)
    """
    employed_indiv = np.zeros(1) #sets first entry to zero 
# With U, μ and σ constant in flex level k
    for k in range(len(c_k)):
        tmp = data[data[flex]==k]
        g = ( 1/( α*(k+1) ) ) * lognormpdf( ( 1/( α*(k+1) ) )*( tmp[wage] - (1-α)*res_wage + α*c_k[k] ), μ, σ )
        G_tilde = lognormsf( ( 1/(k+1) )*( res_wage + c_k[k] ), μ, σ )
        divide_thing = g/G_tilde
        employed_indiv = np.append(employed_indiv, divide_thing)
# # With U, μ and σ varying with flex level k - unidentified    
#     for k in range(len(res_wage)):
#         tmp = data[data[flex]==k]
#         g = ( 1/( α*(k+1) ) ) * lognormpdf( ( 1/( α*(k+1) ) )*( tmp[wage] - (1-α)*res_wage[k] + α*c_k[k] ), μ[k], σ[k] )
#         G_tilde = lognormsf( ( 1/(k+1) )*( res_wage[k] + c_k[k] ), μ[k], σ[k] )
#         divide_thing = g/G_tilde
#         employed_indiv = np.append(employed_indiv, divide_thing)
    
    return employed_indiv[1:] #removes first entry 

In [None]:
def hazard(res_wage: float, c_k: np.array, p_k: np.array, λ: float, μ: float, σ: float):
# def hazard(res_wage: np.array, c_k: np.array, p_k: np.array, λ: float, μ: np.array, σ: np.array):
    """
    Calculates the hazard rate out of unemployment 
    
    Inputs
    - res_wage: observed minimum wage, used as the reservation wage (scalar, common to all flexibility levels)
    - c_k: Kx1 array of cost of providing flexibility
    - p_k: Kx1 array of probability of each level of flexibility
    - λ: arrival rate of offer
    - μ: location parameter of the log-normal wage distribution (scalar, common to all flexibility levels)
    - σ: scale parameter of the log-normal wage distribution (scalar, common to all flexibility levels)
    
    Functions
    - lognormsf(x: np.array, μ: float, σ: float)
    """
    
    prob_sum = 0
    
#     if len(res_wage)!=len(c_k):
#         return print("Length of res_wage and c_k do not match.")
#     elif len(res_wage)!=len(p_k):
#         return print("Length of res_wage and p_k do not match.")
    if len(p_k)!=len(c_k):
        # Fail loudly; returning print(...) would hand None back to the caller.
        raise ValueError("Length of p_k and c_k do not match.")
    else:
# With U, μ and σ constant in flex level k
        for k in range(len(c_k)):
            prob_sum += p_k[k] * lognormsf( ( 1/(k+1) )*( res_wage + c_k[k]), μ, σ ) #k+1 because Python index 0

# # With U, μ and σ varying with flex level k - unidentified    
#         for k in range(len(res_wage)):
#             prob_sum += p_k[k] * lognormsf( ( 1/(k+1) )*( res_wage[k] + c_k[k]), μ[k], σ[k] ) #k+1 because Python index 0
    
    return λ*prob_sum#[0]
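The loop in `hazard` can also be written without an explicit loop. The sketch below is a self-contained variant under one stated assumption: that `lognormsf(x, μ, σ)` is the log-normal survival function $1 - \Phi((\ln x - \mu)/\sigma)$, which is not shown in this part of the notebook. The names `lognormsf_sketch` and `hazard_sketch` are hypothetical and do not replace the functions above.

```python
import numpy as np
from scipy import stats

def lognormsf_sketch(x, mu, sigma):
    # Assumed form of lognormsf: survival function of a log-normal variable.
    return stats.norm.sf((np.log(x) - mu) / sigma)

def hazard_sketch(res_wage, c_k, p_k, lam, mu, sigma):
    # Job-finding hazard: lam * sum_k p_k * SF((res_wage + c_k) / (k + 1)).
    c_k = np.asarray(c_k, dtype=float)
    p_k = np.asarray(p_k, dtype=float)
    if c_k.shape != p_k.shape:
        raise ValueError("Length of p_k and c_k do not match.")
    scale = np.arange(1, len(c_k) + 1)  # k + 1 because Python indexes from 0
    accept_prob = lognormsf_sketch((res_wage + c_k) / scale, mu, sigma)
    return lam * np.sum(p_k * accept_prob)

# With mu = 0 and a threshold of 1, the acceptance probability is 0.5 at every level.
print(hazard_sketch(1.0, [0.0, 1.0], [0.5, 0.5], 2.0, 0.0, 1.0))
```

Checking a case like this, where every acceptance probability is known by hand, is a quick way to confirm that the `k + 1` scaling is applied to the right term.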

In [None]:
def log_L(data: pd.DataFrame, flex: str, wage: str, dur: str, res_wage: float, c_k: np.array, p_k: np.array, α: float, λ: float, η: float, μ: float, σ: float):
# def log_L(data: pd.DataFrame, flex: str, wage: str, dur: str, res_wage: np.array, c_k: np.array, p_k: np.array, α: float, λ: float, η: float, μ: np.array, σ: np.array):
    """
    
    Negative log-likelihood of the sample, in the form minimized by scipy.optimize.minimize
    
    Inputs
    - data: DataFrame of all individuals
    - flex: string for column of flexibility index (k)
    - wage: string for column of wage data 
    - dur: string for unemployment duration data
    - res_wage: observed minimum wage, used as the reservation wage (scalar, common to all flexibility levels)
    - c_k: Kx1 array of cost of providing flexibility
    - p_k: Kx1 array of probability of each level of flexibility
    - α: bargaining parameter
    - λ: arrival rate of offer
    - η: termination rate
    - μ: location parameter of the log-normal wage distribution (scalar, common to all flexibility levels)
    - σ: scale parameter of the log-normal wage distribution (scalar, common to all flexibility levels)
    
    Functions
    - hazard(res_wage: float, c_k: np.array, p_k: np.array, λ: float, μ: float, σ: float)
    - Pr_wage_given_match(data: pd.DataFrame, flex: str, wage: str, res_wage: float, c_k: np.array,  α: float, μ: float, σ: float)
    """
    
    N_log_h = data.count() * np.log( hazard(res_wage, c_k, p_k, λ, μ, σ) )
    N_log_h_plus_η = data.count() * np.log( hazard(res_wage, c_k, p_k, λ, μ, σ) + η )
    
    empl_data = np.sum( np.log( Pr_wage_given_match(data, flex, wage, res_wage, c_k, α, μ, σ) ) )
    
    Nu_log_η = data[dur].count() * np.log(η)
    
    unempl_data = hazard(res_wage, c_k, p_k, λ, μ, σ) * np.sum(data[dur])
    
    logL = -(N_log_h - N_log_h_plus_η + empl_data + Nu_log_η - unempl_data)
    
    return logL.iloc[0]  # logL is a Series because data.count() is per-column; take the first entry by position
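Read off the three functions above, the quantity that `log_L` returns is $-\ln L$, where the expressions below transcribe the code term by term. The notation is an assumption made here for readability: $U$ stands for `res_wage`, $\tilde G$ for `lognormsf`, $g$ for `lognormpdf`, $k_i$ for individual $i$'s flexibility level (indexed from 0 as in Python), $N$ for the count taken from `data.count()`, and $N_u$ for `data[dur].count()`.

$$
h = \lambda \sum_{k=0}^{K-1} p_k \, \tilde G\!\left(\frac{U + c_k}{k+1};\ \mu, \sigma\right)
$$

$$
\ln L = N \ln h - N \ln (h + \eta)
+ \sum_{i:\, k_i \in \{0,\dots,K-1\}} \ln \frac{\dfrac{1}{\alpha (k_i+1)}\, g\!\left(\dfrac{w_i - (1-\alpha)\,U + \alpha\, c_{k_i}}{\alpha (k_i+1)};\ \mu, \sigma\right)}{\tilde G\!\left(\dfrac{U + c_{k_i}}{k_i+1};\ \mu, \sigma\right)}
+ N_u \ln \eta - h \sum_{i} t_i
$$

The first sum runs over rows with an observed flexibility level (the rows selected by `data[flex]==k`), and the last sum is `np.sum(data[dur])`, which skips missing durations.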

## Old Hazard and Log L (did not copy Pr_wage_match in time)

In [None]:
def hazard(res_wage: np.array, c_k: np.array, p_k: np.array, λ: float, μ: np.array, σ: np.array):
    """
    Calculates the hazard rate out of unemployment 
    
    Inputs
    - res_wage: Kx1 array of observed minimum wages for each flexibility level
    - c_k: Kx1 array of cost of providing flexibility
    - p_k: Kx1 array of probability of each level of flexibility
    - λ: arrival rate of offer
    - μ: array of location parameter of the log-normal wage distribution for each flexibility level
    - σ: array of scale parameter of the log-normal wage distribution for each flexibility level
    
    Functions
    - lognormsf(x: np.array, μ: float, σ: float)
    """
    
    prob_sum = 0
    
    if len(res_wage)!=len(c_k):
        return print("Length of res_wage and c_k do not match.")
    elif len(res_wage)!=len(p_k):
        return print("Length of res_wage and p_k do not match.")
    elif len(p_k)!=len(c_k):
        return print("Length of p_k and c_k do not match.")
    else:
        for k in range(len(res_wage)):
            prob_sum += p_k[k] * lognormsf( ( 1/(k+1) )*( res_wage[k] + c_k[k]), μ[k], σ[k] ) #k+1 because Python index 0
    
    return λ*prob_sum

In [None]:
def log_L(wage: np.array, k: np.array, res_wage: np.array, c_k: np.array, p_k: np.array, dur: np.array, α: float, λ: float, η: float, μ: np.array, σ: np.array):
    """
    
    Log-likelihood of the sample (not negated, unlike the DataFrame-based version above)
    
    Inputs
    - wage: Ne x 1 array of observed wage data 
    - k: Ne x 1 array of observed flexibility level data
    - res_wage: Kx1 array of observed minimum wages for each flexibility level
    - c_k: Kx1 array of cost of providing flexibility
    - p_k: Kx1 array of probability of each level of flexibility
    - dur: Nu x 1 array of observed unemployment duration data
    - α: bargaining parameter
    - λ: arrival rate of offer
    - η: termination rate
    - μ: array of location parameter of the log-normal wage distribution for each flexibility level
    - σ: array of scale parameter of the log-normal wage distribution for each flexibility level
    
    Functions
    - hazard(res_wage: np.array, c_k: np.array, p_k: np.array, λ: float, μ: float, σ: float)
    - Pr_wage_given_match(wage: np.array, k: np.array, res_wage: np.array, c_k: np.array,  α: float, μ: float, σ: float)
    """
    
    N_log_h = len(wage) * np.log( hazard(res_wage, c_k, p_k, λ, μ, σ) )
    N_log_h_plus_η = len(wage) * np.log( hazard(res_wage, c_k, p_k, λ, μ, σ) + η )
    
    empl_data = np.sum( np.log( Pr_wage_given_match(wage, k, res_wage, c_k,  α, μ, σ) ) )
    
    Nu_log_η = len(dur) * np.log(η)
    
    unempl_data = hazard(res_wage, c_k, p_k, λ, μ, σ) * np.sum(dur)
    
    logL = N_log_h - N_log_h_plus_η + empl_data + Nu_log_η - unempl_data
    
    return logL

In [None]:
empl_men = men[men['employed']==1]
len(empl_men)

In [None]:
unempl_men = men[men['employed']==0]
len(unempl_men)

In [None]:
fig, ax = plt.subplots(3, 1, figsize=(12, 8))

for k in range(3):
    tmp = df[(df['flex_sched_score']==k) & (df['sex']=='female') & (df['employed']==1)]
    sns.distplot(tmp['hrwage'], color='#4B9CD3', hist_kws={'alpha' : .3}, bins=100, ax=ax[k])
    ax[k].legend(['Flex Level ' + str(k)])
    ax[k].set_ylim([0,0.1])
    ax[k].set_xlim([0,75])


#ax.set(xlabel="Distribution of Men's Hourly Wage")

plt.tight_layout()

# fig.savefig('./figures/wage_noMin.png', bbox_inches='tight', transparent=True)