# Flexibility in a Search Model 

## Model

- homogeneous workers with utility $u(w, f) = w + \gamma f$ if employed and flow (dis)utility $b$ if unemployed
- heterogeneous firms endowed with flexibility level $f \in \{0,1,..., F\}$ with probability $p_f$ and production function $y(x) = x$ and linear profit $\pi(x,f) = y(x)-w(x,f)-c(f)$
- search parameters: 
    - discount rate $\rho$, unemployed meet firms at rate $\lambda$ (no on-the-job search), bargaining parameter $\alpha$, employed face separation shock $\eta$
    - upon meeting draw match-specific productivity $x \sim G_f(x)$, log-normal with parameters $\mu_f$ and $\sigma^2_f$ 

## Necessary Packages

In [1]:
# General
import numpy as np
import pandas as pd 
import scipy.stats as stats

# Graphics
import matplotlib.pyplot as plt 
import seaborn as sns

# Estimation
from scipy.optimize import minimize

# Debugging
import pdb


## Data 
- Homogeneity measures: aged 25-55; white; either college graduate (col_edu==1) or HS graduate (hs_edu==1)
- employed workers earn wage $w$ (in 2018 dollars) at firm with flexibility level $k$
    - flexibility measures:
        - Schedule Flexibility 
            - 0: No flexibility in start and end times of work 
            - 1: Able to change start and end times of work 
        - Location Flexibility 
            - 0: Not able to work from home
            - 1: Able to work from home
- unemployed workers have unemployment durations of $t$


In [2]:
df=pd.read_stata('workfile.dta', columns=['hs_edu', 'col_edu', 'sex','employed', 'flexsched', 'flex_sched_score', 'flexloc', 'flex_loc_score', 'hrwage_r', 'dur'])

df=df[df['hs_edu']==1] # change based on which to estimate - label Figures, Output to CSV to match


In [3]:
def winsorize(data: pd.DataFrame, flex:str, n_flex:int, winsorized_wage:str, raw_wage:str, employed:str, percentile: float):
    nth_pctl = np.zeros(1)

    for k in range(2):
        tmp = data[data[flex]==k]
        nth = np.percentile(tmp[raw_wage],1)
        nth_pctl = np.append(nth_pctl, nth)

    data[winsorized_wage]=data[raw_wage]

    for k in range(2):
        data[winsorized_wage].iloc[(data[raw_wage]<nth_pctl[k+1]) & (data[flex]==k)]=nth_pctl[k+1] #k+1 because empty array initiates with zero

    return data[[winsorized_wage, raw_wage]].groupby([data[employed],data[flex]]).describe()

### Men

In [4]:
men = df[df['sex']=='male']
len(men)

966

In [5]:
winsorize(men, 'flexsched', 2, 'wage_flexsched', 'hrwage_r', 'employed', 1)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data[winsorized_wage]=data[raw_wage]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)


Unnamed: 0_level_0,Unnamed: 1_level_0,wage_flexsched,wage_flexsched,wage_flexsched,wage_flexsched,wage_flexsched,wage_flexsched,wage_flexsched,wage_flexsched,hrwage_r,hrwage_r,hrwage_r,hrwage_r,hrwage_r,hrwage_r,hrwage_r,hrwage_r
Unnamed: 0_level_1,Unnamed: 1_level_1,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max
employed,flexsched,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2
1.0,0.0,493.0,24.677324,11.845616,7.668274,16.219997,22.162113,29.195044,73.873962,493.0,24.639904,11.908799,0.25,16.219997,22.162113,29.195044,73.873962
1.0,1.0,434.0,27.636673,14.780142,6.300299,17.307499,23.953321,34.309273,73.873962,434.0,27.607347,14.826255,1.61341,17.307499,23.953321,34.309273,73.873962


In [6]:
winsorize(men, 'flexloc', 2, 'wage_flexloc', 'hrwage_r', 'employed', 1)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data[winsorized_wage]=data[raw_wage]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)


Unnamed: 0_level_0,Unnamed: 1_level_0,wage_flexloc,wage_flexloc,wage_flexloc,wage_flexloc,wage_flexloc,wage_flexloc,wage_flexloc,wage_flexloc,hrwage_r,hrwage_r,hrwage_r,hrwage_r,hrwage_r,hrwage_r,hrwage_r,hrwage_r
Unnamed: 0_level_1,Unnamed: 1_level_1,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max
employed,flexloc,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2
1.0,0.0,761.0,24.163544,11.82506,6.74047,15.673129,21.832258,28.845997,73.873962,761.0,24.129055,11.882587,0.25,15.673129,21.832258,28.845997,73.873962
1.0,1.0,166.0,34.792599,16.301811,10.784574,23.08881,31.249996,43.683587,73.873962,166.0,34.740051,16.392229,2.462371,23.08881,31.249996,43.683587,73.873962


### Women

In [7]:
women = df[df['sex']=='female']
len(women)

670

In [8]:
winsorize(women, 'flexsched', 2, 'wage_flexsched', 'hrwage_r', 'employed', 1)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data[winsorized_wage]=data[raw_wage]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)


Unnamed: 0_level_0,Unnamed: 1_level_0,wage_flexsched,wage_flexsched,wage_flexsched,wage_flexsched,wage_flexsched,wage_flexsched,wage_flexsched,wage_flexsched,hrwage_r,hrwage_r,hrwage_r,hrwage_r,hrwage_r,hrwage_r,hrwage_r,hrwage_r
Unnamed: 0_level_1,Unnamed: 1_level_1,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max
employed,flexsched,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2
1.0,0.0,313.0,17.393709,8.783751,5.40061,11.799999,15.865248,20.109999,73.873962,313.0,17.38505,8.796501,3.201211,11.799999,15.865248,20.109999,73.873962
1.0,1.0,301.0,20.83647,11.812701,5.423999,12.999998,16.744638,25.240246,73.873962,301.0,20.812151,11.848063,1.269,12.999998,16.744638,25.240246,73.873962


In [9]:
winsorize(women, 'flexloc', 2, 'wage_flexloc', 'hrwage_r', 'employed', 1)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data[winsorized_wage]=data[raw_wage]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)


Unnamed: 0_level_0,Unnamed: 1_level_0,wage_flexloc,wage_flexloc,wage_flexloc,wage_flexloc,wage_flexloc,wage_flexloc,wage_flexloc,wage_flexloc,hrwage_r,hrwage_r,hrwage_r,hrwage_r,hrwage_r,hrwage_r,hrwage_r,hrwage_r
Unnamed: 0_level_1,Unnamed: 1_level_1,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max
employed,flexloc,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2
1.0,0.0,463.0,16.528254,7.635334,5.201336,11.432499,14.999998,19.999998,72.115242,463.0,16.509394,7.667031,1.269,11.432499,14.999998,19.999998,72.115242
1.0,1.0,151.0,26.90896,13.798091,7.574075,16.573148,23.885899,33.079166,73.873962,151.0,26.90151,13.80878,6.749999,16.573148,23.885899,33.079166,73.873962


## Model Independent Functions

In [10]:
def lognormpdf(x: np.array, μ: float, σ: float):
    """
    Calculates lognormal pdf without stats packages
    """
    
    denom = x * σ * np.sqrt(2*np.pi)
    exp_num = -(np.log(x)-μ)**2
    exp_denom = 2 * σ * σ
    num = np.exp(exp_num/exp_denom)
    
    return num/denom

In [11]:
def lognormsf(x: np.array, μ: float, σ: float):
    """
    Calculates lognormal cdf with scipy.stats normal cdf
    """
    
    lnx = np.log(x)
    num = lnx - μ
    denom = σ
    
    return 1-stats.norm.cdf(num/denom)

In [12]:
def bootstrap(data: pd.DataFrame, n_samples:int):
    """
    Thanks, Caleb
    """
    bootstrapped_sample_list = []
    
    for n in range(n_samples):
        nth_sample = data.sample(frac=1, replace=True)
        bootstrapped_sample_list.append(nth_sample)
    
    return bootstrapped_sample_list

In [13]:
def std_error(values):
    """
    Calculates the standard error (standard deviation of values divided by square root of the number of values) of some values 
    """
    
    stderr = np.std(values) / np.sqrt(len(values))

    return stderr

In [14]:
def fit_stats(values):
    """
    Returns mean and standard error from a list of values 
    
    Functions:
    - std_error(values)
    """
    
    mean = np.mean(values)
    
    stderr = std_error(values)
    
    return [mean, stderr]
#     return print("Boostrapped value ", str(mean), "\nStandard error    ", str(stderr),"\n")

## Model Functions

In [64]:
def hazard(res_wage: np.array, p_f: np.array, γ_f: np.array, c_f: np.array, μ_f: np.array, σ_f: np.array, λ: float):
    """
    Calculates the hazard rate out of unemployment 
    
    Inputs
    - res_wage: FX1 array of observed minimum wage at each flexibility level
    - p_f: Fx1 array of probability of each flexibility level
    - γ_f: Fx1 array of utility weight of flexibility    
    - c_f: Fx1 array of cost of providing flexibility    
    - μ_f: Fx1 array of location parameter of the log-normal wage distribution for each flexibility level
    - σ_f: Fx1 array of scale parameter of the log-normal wage distribution for each flexibility level
    - λ: arrival rate of offer
    
    Functions
    - lognormsf(x: np.array, μ: float, σ: float)
    """
    
    prob_sum = 0
    
    if len(p_f)!=len(c_f):
        return print("Length of p_f and c_f do not match.")
    elif len(p_f)!=len(γ_f):
        return print("Length of p_f and γ_f do not match.")
    elif len(γ_f)!=len(c_f):
        return print("Length of γ_f and c_f do not match.")
    else:
        for f in range(len(c_f)):
            prob_sum += p_f[f] * lognormsf( ( res_wage[f] + c_f[f] - γ_f[f]) , μ_f[f], σ_f[f] )

    return λ*prob_sum

In [16]:
def Pr_wage_given_match(data: pd.DataFrame, flex: str, wage: str, res_wage: np.array, p_f: np.array, γ_f: np.array, 
                        c_f: np.array, μ_f: np.array, σ_f: np.array, α: float):
    """
    Calculates probability of a wage draw conditional on a match being formed 
    
    Inputs
    - data: DataFrame
    - flex: string for name of flexibility column
    - wage: string for name of wage column
    - res_wage: array of observed minimum wage of flexibility f
    - p_f: Fx1 array of probability of each level of flexibility
    - γ_f: Fx1 array of utility weight of flexibility    
    - c_f: Fx1 array of cost of providing flexibility    
    - μ_f: Fx1 array of location parameter of the log-normal wage distribution for flexibility f
    - σ_f: Fx1 array of scale parameter of the log-normal wage distribution for flexibility f
    - α: bargaining parameter    
    
    Functions
    - lognormpdf(x: np.array, μ: float, σ: float)
    - lognormsf(x: np.array, μ: float, σ: float)
    """
    employed_indiv = np.zeros(1) #sets first entry to zero 

    for f in range(len(c_f)):
        tmp = data[data[flex]==f]
        
        g_f = ( 1/α ) * lognormpdf( ( ( tmp[wage] + α*c_f[f] - (1-α)*(res_wage[f] - γ_f[f]) ) ), μ_f[f], σ_f[f] )
        
        G_tilde_f = lognormsf( ( res_wage[f] + c_f[f] - γ_f[f] ), μ_f[f], σ_f[f] )
        
        divide_thing = p_f[f] * (g_f/G_tilde_f)
        employed_indiv = np.append(employed_indiv, divide_thing)
    
    return employed_indiv[1:] #removes first entry 


In [41]:
def log_L(data: pd.DataFrame, flex: str, wage: str, dur: str, γ_f: np.array, c_f: np.array, μ_f: np.array, σ_f: np.array, 
          α: float, λ: float, η: float):
    """
    Calculates the log-likelihood for the model
    
    Inputs
        Data:
        - data: DataFrame
        - flex: string for column of flexibility index (k)
        - wage: string for column of wage data 
        - dur: string for unemployment duration data
        Parameters:
        - γ_f: Fx1 array of utility weight of flexibility    
        - c_f: Fx1 array of cost of providing flexibility    
        - μ_f: Fx1 array of location parameter of the log-normal wage distribution for flexibility f
        - σ_f: Fx1 array of scale parameter of the log-normal wage distribution for flexibility f
        - α: bargaining parameter
        - λ: arrival rate of offer
        - η: termination rate
    
    Functions
    - hazard(res_wage: np.array, p_f: np.array, γ_f: np.array, c_f: np.array, μ_f: np.array, σ_f: np.array, λ: float)
    - Pr_wage_given_match(data: pd.DataFrame, flex: str, wage: str, res_wage: np.array, p_f: np.array, γ_f: np.array, 
                          c_f: np.array, μ_f: np.array, σ_f: np.array, α: float)
    """
    
    # Min Wage and Probability of Flex from data
    res_wage = data[wage].groupby(data[flex]).min().array
    p_f = data[flex].value_counts(normalize=True) 
    
    # Elements in LogL function
    N_log_h = data.count() * np.log( hazard(res_wage, p_f, γ_f, c_f, μ_f, σ_f, λ) )
    N_log_h_plus_η = data.count() * np.log( hazard(res_wage, p_f, γ_f, c_f, μ_f, σ_f, λ) + η )
    
    Nu_log_η = data[dur].count() * np.log(η)    
    
    unempl_data = hazard(res_wage, p_f, γ_f, c_f, μ_f, σ_f, λ) * np.sum(data[dur])
    
    empl_data = np.sum( np.log( Pr_wage_given_match( data, flex, wage, res_wage, p_f, γ_f, c_f, μ_f, σ_f, α ) ) )
    
    logL = -(N_log_h - N_log_h_plus_η + Nu_log_η - unempl_data + empl_data)
    
    return logL[0]

## Initial Conditions

In [49]:
# Provided by Data
# μ_f = data[wage].groupby(data[flex]).mean().array
# σ_f = data[wage].groupby(data[flex]).std().array

# Guesses
γ_f = np.array([0,1])
c_f = np.array([0, -1])
α = 0.5
λ = 0.01
η = 0.001

In [50]:
μ_f = women['wage_flexsched'].groupby(women['flexsched']).mean().array
σ_f = women['wage_flexsched'].groupby(women['flexsched']).std().array

log_L(women, 'flexsched', 'wage_flexsched', 'dur', γ_f, c_f, μ_f, σ_f, α, λ, η)

4841.405549425466

## Estimation Functions

- Unidentified. Returning initial guesses

In [59]:
def est(data: pd.DataFrame, flex: str, wage: str, dur: str):
    """
    Estimate parameter values for γ_f, c_f, μ_f, σ_f, λ, η
    
    Inputs
    - data: DataFrame
    - flex: string for column of flexibility index (f)
    - wage: string for column of wage data 
    - dur: string for column of unemployment duration data
    
    Functions
    - log_L(data, flex, wage, dur, γ_f, c_f, μ_f, σ_f, α, λ, η)
    """

    μ_f = data[wage].groupby(data[flex]).mean().array
    σ_f = data[wage].groupby(data[flex]).std().array
    
    params = np.array([γ_f[0], γ_f[1], c_f[0], c_f[1], μ_f[0], μ_f[1], σ_f[0], σ_f[1], λ, η])
    
    Bounds = ((-99,99), (-99,99), (-99,99), (-99,99), (0,99), (0,99), (0,99), (0,99), (0,99), (0,99))
    
    logL_opt = lambda x: log_L(data, flex, wage, dur, np.array([x[0],x[1]]), np.array([x[2],x[3]]),
                               np.array([x[4],x[5]]), np.array([x[6],x[7]]),
                               α, x[8], x[9])

    est = minimize(logL_opt, params, bounds=Bounds) #, method='Nelder-Mead', options={'maxiter':1000})#, bounds=Bounds)
    
    return [est.fun, est.x]

In [60]:
est(men, 'flexsched', 'wage_flexsched', 'dur')

  Nu_log_η = data[dur].count() * np.log(η)


[7749.200442901769,
 array([ 0.00000000e+00,  1.00000000e+00,  0.00000000e+00, -1.00000000e+00,
         2.46773243e+01,  2.76366749e+01,  1.18456162e+01,  1.47801416e+01,
         1.00000000e-02,  1.00000000e-03])]

In [57]:
def est_0(data: pd.DataFrame, flex: str, wage: str, dur: str):
    """
    Estimate parameter values for γ_f, c_f, μ_f, σ_f, λ, η where γ_0 = c_0 = 0
    
    Inputs
    - data: DataFrame
    - flex: string for column of flexibility index (f)
    - wage: string for column of wage data 
    - dur: string for column of unemployment duration data
    
    Functions
    - log_L(data, flex, wage, dur, γ_f, c_f, μ_f, σ_f, α, λ, η)
    """

    μ_f = data[wage].groupby(data[flex]).mean().array
    σ_f = data[wage].groupby(data[flex]).std().array
    
    params = np.array([γ_f[1], c_f[1], μ_f[0], μ_f[1], σ_f[0], σ_f[1], λ, η])
    
    Bounds = ((-99,99), (-99,99), (0,99), (0,99), (0,99), (0,99), (0,99), (0,99))
    
    logL_opt = lambda x: log_L(data, flex, wage, dur, np.array([0, x[0]]), np.array([0,x[1]]),
                               np.array([x[2],x[3]]), np.array([x[4],x[5]]),
                               α, x[6], x[7])

    est = minimize(logL_opt, params, bounds=Bounds) #, method='Nelder-Mead', options={'maxiter':1000})#, bounds=Bounds)
    
    return [est.fun, est.x]

In [58]:
est_0(men, 'flexsched', 'wage_flexsched', 'dur')

  Nu_log_η = data[dur].count() * np.log(η)
  df = fun(x) - f0


[7749.200442901769,
 array([ 1.00000000e+00, -1.00000000e+00,  2.46773243e+01,  2.76366749e+01,
         1.18456162e+01,  1.47801416e+01,  1.00000000e-02,  1.00000000e-03])]

In [61]:
def est_gamma(data: pd.DataFrame, flex: str, wage: str, dur: str):
    """
    Estimate parameter values for γ_f, μ_f, σ_f, λ, η where γ_0 = 0 and c_f = 0
    
    Inputs
    - data: DataFrame
    - flex: string for column of flexibility index (f)
    - wage: string for column of wage data 
    - dur: string for column of unemployment duration data
    
    Functions
    - log_L(data, flex, wage, dur, γ_f, c_f, μ_f, σ_f, α, λ, η)
    """

    μ_f = data[wage].groupby(data[flex]).mean().array
    σ_f = data[wage].groupby(data[flex]).std().array
    
    params = np.array([γ_f[1], μ_f[0], μ_f[1], σ_f[0], σ_f[1], λ, η])
    
    Bounds = ((-99,99), (0,99), (0,99), (0,99), (0,99), (0,99), (0,99))
    
    logL_opt = lambda x: log_L(data, flex, wage, dur, np.array([0, x[0]]), np.array([0,0]),
                               np.array([x[1],x[2]]), np.array([x[3],x[4]]),
                               α, x[5], x[6])

    est = minimize(logL_opt, params, bounds=Bounds) #, method='Nelder-Mead', options={'maxiter':1000})#, bounds=Bounds)
    
    return [est.fun, est.x]

In [62]:
est_gamma(men, 'flexsched', 'wage_flexsched', 'dur')

  Nu_log_η = data[dur].count() * np.log(η)


[7759.918462154071,
 array([1.00000000e+00, 2.46773243e+01, 2.76366749e+01, 1.18456162e+01,
        1.47801416e+01, 1.00000000e-02, 1.00000000e-03])]

In [63]:
log_L(men, 'flexsched', 'wage_flexsched', 'dur', γ_f, c_f, μ_f, σ_f, α, λ, η)

7224.336077000613