# Problem set 2
Before we start working on some exercises we will briefly introduce two concepts in Python. First, importing and exporting data. Second, using functions. If you are already familiar
with these features, you can skip the next two sections and jump directly to the exercises.

First, import all necessary packages. We have made a .py file that we will use as a "toolbox". We will fill this toolbox with functions that we will use as we progress through the course. Exactly how you structure this toolbox is up to you (e.g. if you want to turn it into a class).

In [158]:
import numpy as np
from numpy import linalg as la
import pandas as pd
from io import StringIO
from tabulate import tabulate
from matplotlib import pyplot as plt

# Import this week's LinearModels .py file
import LinearModelsWeek2_ante as lm
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Importing and exporting data in Python
The easiest way to import data into a NumPy array is from a .txt file. Normally we would specify a path to the text file, but here we create a fake one to illustrate.

In [159]:
# Create a fake file for easy use.
fake_file = StringIO("0 1\n 2 3")
print(f"Fake file looks like this: \n {fake_file.getvalue()}")
print()

# Load the fake txt file into a numpy array.
data = np.loadtxt(fake_file)
print(f'Loaded into a numpy array, we get the following {type(data)}: \n {data}')

Fake file looks like this: 
 0 1
 2 3

Loaded into a numpy array, we get the following <class 'numpy.ndarray'>: 
 [[0. 1.]
 [2. 3.]]


Sadly, there is no direct way to load an Excel sheet into NumPy. The easiest solution is to use pandas as an intermediate step.

In [160]:
# We save the fake data we created earlier as an Excel file,
# so that we can illustrate how to import from Excel.
to_export = pd.DataFrame(data)
to_export.to_excel('test_file.xlsx', header=None, index=None)

# It's important to note that pandas will treat the first row as a header. If there is no header,
# this needs to be specified. There are also a lot of extra options, e.g. to load specific sheets
# or only parts of a sheet.
df_import = pd.read_excel('test_file.xlsx', header=None)
np_array = df_import.to_numpy()
print(np_array)

[[0 1]
 [2 3]]


### Exporting Data
Saving a NumPy array as a .txt file is easy:

In [161]:
np.savetxt('real_file.txt', np_array)

*Large NumPy arrays can be stored more efficiently as binary .npy files. Such files are not compatible with other programs.*
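
As a minimal sketch of the binary route, the cell below saves an array with `np.save` and reads it back with `np.load`. The file name `example_array.npy` is made up for illustration, and the file is written to a temporary directory so that nothing is left behind.

```python
import os
import tempfile

import numpy as np

arr = np.arange(6, dtype=float).reshape(3, 2)

# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_array.npy")  # hypothetical file name
    np.save(path, arr)        # binary format, keeps dtype and shape
    restored = np.load(path)  # read the array back into memory

print(np.array_equal(arr, restored))
```

Unlike `np.savetxt`, the .npy format stores the dtype and shape, so no reshaping or type conversion is needed after loading.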

## Exercises with FE --- Within-Groups Estimation

The exercise takes up the union membership example from the cover sheet. The data set WAGEPAN.TXT contains information about 545 men who worked every year from 1980 to 1987 in the US. The variables of interest are


| Variable | Content |
|-|-|
| nr | Variable that identifies the individual  |
| year | Year of observation |
| Black | Black |
| Hisp | Hispanic |
| Educ | Years of schooling |
| Exper | Years since left school |
| Expersq | Exper$^2$ |
| Married | Marital status |
| Union | Union membership |
| Lwage | Natural logarithm of hourly wages |

Consider the following wage equation:

$$
\begin{align}
\log\left(wage_{it}\right) & =\beta_{0}+\beta_{1}\textit{exper}_{it}+\beta_{2}\textit{exper}_{it}^{2}+\beta_{3}\textit{union}_{it}+\beta_{4}\textit{married}_{it} +\beta_{5}\textit{educ}_{i}+\beta_{6}\textit{hisp}_{i}+\beta_{7}\textit{black}_{i}+c_{i}+u_{it} \tag{1}
\end{align}
$$

Note that *educ*, *hisp*, and *black* are time-invariant variables.

# Exercises

Start by loading the data. Some of this has been done for you already. Since we are working with panels, we need to know how many persons there are and over how many time periods we observe them. Because the panel is balanced, this is easy to compute.
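
Before relying on a single value of T, it can be worth checking that the panel really is balanced. The sketch below does this on a made-up ID column (`toy_ids` is not part of the data set); the same logic should carry over to the ID column of the real data.

```python
import numpy as np

# Toy ID column: three persons, each observed in four periods (made-up data).
toy_ids = np.repeat([13, 17, 18], 4)

ids, counts = np.unique(toy_ids, return_counts=True)
N_toy = ids.size
is_balanced = bool(np.all(counts == counts[0]))  # every person observed equally often
T_toy = int(counts[0]) if is_balanced else None

print(N_toy, T_toy, is_balanced)
```

If the panel were unbalanced, taking the mean of the counts would silently give a misleading T, which is why the explicit check can be useful.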

In [163]:
# First, import the data into numpy. 
# Data should load the .txt file.
data = np.loadtxt('wagepan.txt', delimiter=',')
id_array = np.array(data[:, 0])

# Count how many persons we have. This returns a tuple with the unique IDs,
# and the number of times each person is observed.
unique_id = np.unique(id_array, return_counts=True)
N = unique_id[0].size
T = int(unique_id[1].mean())
year = np.array(data[:, 1], dtype=int)

In [164]:
year

array([1980, 1981, 1982, ..., 1985, 1986, 1987])

In [165]:
data.size

43600

In [166]:
data[:, 1:8].shape[0]

4360

In [167]:
N*T

4360

In [168]:
constant_col = np.ones(shape=N*T)

In [169]:
constant_col.shape

(4360,)

In [170]:
y =  data[:, 8]

In [171]:
y.shape

(4360,)

The table above does not correspond 1:1 with the text file. The data has 10 columns, numbered 0 to 9. Here is a variable description:
- Column 0: ID
- Column 1: Year
- Column 2: Black
- Column 3: Experience
- Column 4: Hispanic
- Column 5: Married
- Column 6: Education
- Column 7: Union
- Column 8: ln wage
- Column 9: Experience sqr

In [172]:
# Load the rest of the data into arrays.
y =  data[:, 8].reshape(-1,1)

# x needs to have a constant vector in the first column.
# How would you add this? x should have the shape (N*T, 8).
x =  np.column_stack((np.ones(shape=N*T), data[:,2], data[:,4], data[:,6], data[:,3], data[:,9], data[:,5], data[:,7]))

# Let's also make some variable names
label_y = 'Log wage'
label_x = [
    'Constant', 
    'Black', 
    'Hispanic', 
    'Education', 
    'Experience', 
    'Experience sqr', 
    'Married', 
    'Union'
]

In [173]:
x.shape

(4360, 8)

## FE Questions
### FE (a):
- **Estimate (1) by pooled OLS,** thus considering for the moment the unobserved components of (1) as one (composite) error term $v_{it}=c_{i}+u_{it}$. 
- What assumptions are made about $E\left[c_{i}\mathbf{x}_{it}\right]$ and $E\left[u_{it}\mathbf{x}_{it}\right]$ when justifying this estimation approach?
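
To make explicit what a pooled OLS routine computes, here is a minimal sketch on simulated data. The function `ols_sketch` is a hypothetical stand-in and not the course's `lm.estimate`; in particular, the degrees-of-freedom correction and the returned objects in `lm.estimate` may differ.

```python
import numpy as np

def ols_sketch(y, x):
    """OLS with homoskedastic standard errors (illustrative, not lm.estimate)."""
    n, k = x.shape
    b_hat = np.linalg.solve(x.T @ x, x.T @ y)      # (X'X)^{-1} X'y
    resid = y - x @ b_hat
    sigma2 = (resid.T @ resid).item() / (n - k)    # error variance estimate
    cov = sigma2 * np.linalg.inv(x.T @ x)
    se = np.sqrt(np.diag(cov)).reshape(-1, 1)
    return b_hat, se, sigma2

# Simulated data with known coefficients (1.0 and 0.5).
rng = np.random.default_rng(0)
x_sim = np.column_stack((np.ones(500), rng.normal(size=500)))
y_sim = x_sim @ np.array([[1.0], [0.5]]) + 0.1 * rng.normal(size=(500, 1))

b_sim, se_sim, sigma2_sim = ols_sketch(y_sim, x_sim)
print(b_sim.ravel())
```

Pooled OLS treats all N*T rows as one sample, so the estimator is only consistent if the composite error $v_{it}$ is uncorrelated with the regressors.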

In [174]:
est1 = lm.estimate(y=y, x=x, N=N, T=T)
est1

{'b_hat': array([[-0.03470569],
        [-0.14384171],
        [ 0.01569798],
        [ 0.09938779],
        [ 0.08917907],
        [-0.00284866],
        [ 0.10766558],
        [ 0.18007257]]),
 'se': array([[0.064569  ],
        [0.0235595 ],
        [0.02081119],
        [0.0046776 ],
        [0.01011105],
        [0.00070736],
        [0.01569647],
        [0.01712053]]),
 'sigma': array([[0.2311144]]),
 't_values': array([[-0.5374978 ],
        [-6.10546464],
        [ 0.75430479],
        [21.2476231 ],
        [ 8.81996235],
        [-4.02715536],
        [ 6.85922097],
        [10.51793046]]),
 'R2': array([[0.18658652]]),
 'cov': array([[ 4.16915530e-03, -9.45025729e-05, -2.94576887e-04,
         -2.58374638e-04, -2.39464248e-04,  1.00552999e-05,
          1.02919787e-04, -4.58012239e-05],
        [-9.45025729e-05,  5.55050201e-04,  8.34957716e-05,
          4.21616792e-06, -8.96584601e-06,  2.49127666e-07,
          5.76650246e-05, -4.81895390e-05],
        [-2.94576887e-04, 

In [175]:
lm.print_table(results=est1, labels=(label_y, label_x))

Results
Dependent variable: Log wage

                       Beta           Se    t-values
--------------  -----------  -----------  ----------
Constant        -0.0347057   0.064569      -0.537498
Black           -0.143842    0.0235595     -6.10546
Hispanic         0.015698    0.0208112      0.754305
Education        0.0993878   0.0046776     21.2476
Experience       0.0891791   0.010111       8.81996
Experience sqr  -0.00284866  0.000707362   -4.02716
Married          0.107666    0.0156965      6.85922
Union            0.180073    0.0171205     10.5179
R² = 0.187
σ² = 0.231


Use `print_table` and you should get a table that looks like this:

Pooled OLS <br>
Dependent variable: Log wage <br>

|                |    Beta |     Se |   t-values |
|----------------|---------|--------|------------|
| Constant       | -0.0347 | 0.0646 |    -0.5375 |
| Black          | -0.1438 | 0.0236 |    -6.1055 |
| Hispanic       |  0.0157 | 0.0208 |     0.7543 |
| Education      |  0.0994 | 0.0047 |    21.2476 |
| Experience     |  0.0892 | 0.0101 |     8.8200 |
| Experience sqr | -0.0028 | 0.0007 |    -4.0272 |
| Married        |  0.1077 | 0.0157 |     6.8592 |
| Union          |  0.1801 | 0.0171 |    10.5179 |
R² = 0.187 <br>
σ² = 0.231

### FE (b):
- Within transform the data using the `perm` function. What happens to *educ, hisp, and black* and $x_{it1}\equiv1$ when the data are within transformed? 
- What is the rank of the within transformed $\mathbf{X}$ matrix? Why?

$\mathbf{Q}_T:=\mathbf{I}_T-\left(\begin{array}{ccc}1 / T & \ldots & 1 / T \\ \vdots & \ddots & \vdots \\ 1 / T & \ldots & 1 / T\end{array}\right)_{T \times T}$
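
A small numerical check can help build intuition for $\mathbf{Q}_T$. The sketch below uses T = 4 and two made-up series: multiplying by the matrix subtracts the time mean, a constant series is mapped to zero, and the matrix is idempotent.

```python
import numpy as np

T_small = 4
Q = np.eye(T_small) - np.full((T_small, T_small), 1 / T_small)

v = np.array([2.0, 4.0, 6.0, 8.0])   # time-varying series with mean 5
c = np.full(T_small, 3.0)            # time-invariant series

print(Q @ v)                         # deviations from the time mean
print(np.allclose(Q @ c, 0))         # constants are wiped out
print(np.allclose(Q @ Q, Q))         # demeaning twice changes nothing
```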

In [176]:
def demeaning_matrix(T):
    Q_T = np.eye(T) - np.ones((T, T)) / T # Fill in
    return Q_T

In [177]:
Q_T = demeaning_matrix(T)
y_demean = lm.perm(Q_T, y) # Fill in
x_demean = lm.perm(Q_T,x) # Fill in

# Check rank of demeaned matrix, and return its eigenvalues.

In [178]:
def check_rank(x: np.ndarray):
    rank = np.linalg.matrix_rank(x)
    return f'The matrix is of rank {rank}'

In [179]:
print('Matrix y', check_rank(y_demean), '\n\nMatrix x', check_rank(x_demean))

Matrix y The matrix is of rank 1 

Matrix x The matrix is of rank 4


---

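The within-transformed $\mathbf{X}$ has rank 4: the constant and the time-invariant variables (*black*, *hisp*, *educ*) become columns of zeros, so only the four time-varying regressors contribute to the rank.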
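
The rank loss can be reproduced on a small simulated panel. In the sketch below, `within_transform` is a hypothetical helper (it is not `lm.perm`) that assumes the rows are sorted by person and then by time. The toy regressors are a constant, a time-invariant schooling variable, and a time-varying union dummy.

```python
import numpy as np

def within_transform(a, n_persons, n_periods):
    """Subtract person-specific means; assumes rows sorted by person, then time."""
    blocks = a.reshape(n_persons, n_periods, -1)
    return (blocks - blocks.mean(axis=1, keepdims=True)).reshape(a.shape)

rng = np.random.default_rng(1)
n_persons, n_periods = 50, 8
const = np.ones((n_persons * n_periods, 1))
educ = np.repeat(rng.integers(8, 17, size=n_persons), n_periods)
educ = educ.reshape(-1, 1).astype(float)                            # constant within person
union = rng.integers(0, 2, size=(n_persons * n_periods, 1)).astype(float)
x_toy = np.hstack((const, educ, union))

x_toy_dm = within_transform(x_toy, n_persons, n_periods)
print(np.linalg.matrix_rank(x_toy), np.linalg.matrix_rank(x_toy_dm))
```

Only the time-varying column survives the transformation, so the rank drops from 3 to 1 in this toy example, in the same way it drops from 8 to 4 in the exercise.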

---

An alternative to the `perm` function is to use pandas DataFrames to group the data. This has been done for the *entire* dataset below. If you want to use this method, you'll need to make sure to pull out the correct variables when computing the first-difference and fixed effects estimators.

In [180]:
# load the data using pandas
pddat = pd.read_csv('wagepan.txt', delimiter=",", header=None)
pddat.columns = ["ID", "year", "black", "exp", "hisp", "mar", "educ", "union", "logwage", "expsq"]

# first difference the data
pddat = pddat.sort_values(["ID", "year"]) # important to ensure years are sorted correctly
pddat_diff = pddat.groupby("ID").diff().dropna() # take differences and drop NaNs due to differencing
datdiff = pddat_diff.to_numpy() # turn into numpy array

# demean data
pddat_demean=pddat-pddat.groupby("ID").transform('mean')
datdemean = pddat_demean.to_numpy() # turn into numpy array
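
One thing to watch with the DataFrame approach is which columns end up where after the transformation. The sketch below, on a made-up two-person panel, selects the value columns explicitly before demeaning and differencing, so that the ID column is left out and the column order is under our control. The names `toy` and `value_cols` are illustrative only.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "ID":      [1, 1, 1, 2, 2, 2],
    "year":    [1980, 1981, 1982, 1980, 1981, 1982],
    "educ":    [14, 14, 14, 9, 9, 9],
    "logwage": [1.0, 1.5, 2.0, 0.5, 0.7, 0.9],
})
toy = toy.sort_values(["ID", "year"])

value_cols = ["educ", "logwage"]  # columns to transform; ID and year are left out
toy_demean = toy[value_cols] - toy.groupby("ID")[value_cols].transform("mean")
toy_diff = toy.groupby("ID")[value_cols].diff().dropna()

print(toy_demean.to_numpy())
print(toy_diff.to_numpy())
```

After `to_numpy()` the column positions follow `value_cols`, which makes it easier to pick out the dependent variable and the regressors by index.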

In [181]:
# Can look at the dataframe to see where the relevant variables are, note dimensions
pddat_diff

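|      | year | black | exp | hisp | mar | educ | union | logwage | expsq |
|------|------|-------|-----|------|-----|------|-------|---------|-------|
| 1    | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.655520 | 3.0 |
| 2    | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | -1.0 | -0.508598 | 5.0 |
| 3    | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.088752 | 7.0 |
| 4    | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.134912 | 9.0 |
| 5    | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.131766 | 11.0 |
| ...  | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4355 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.759397 | 15.0 |
| 4356 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | -0.379336 | 17.0 |
| 4357 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | -1.0 | 0.553419 | 19.0 |
| 4358 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | -0.020068 | 21.0 |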


In [182]:
ydiff = None # Fill in
xdiff = None # Fill in

In [183]:
# Can look at the dataframe to see where the relevant variables are, note dimensions
pddat_demean

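|      | ID | black | educ | exp | expsq | hisp | logwage | mar | union | year |
|------|----|-------|------|-----|-------|------|---------|-----|-------|------|
| 0    | NaN | 0.0 | 0.0 | -3.5 | -24.5 | 0.0 | -0.058112 | 0.000 | -0.125 | -3.5 |
| 1    | NaN | 0.0 | 0.0 | -2.5 | -21.5 | 0.0 | 0.597408 | 0.000 | 0.875 | -2.5 |
| 2    | NaN | 0.0 | 0.0 | -1.5 | -16.5 | 0.0 | 0.088810 | 0.000 | -0.125 | -1.5 |
| 3    | NaN | 0.0 | 0.0 | -0.5 | -9.5 | 0.0 | 0.177561 | 0.000 | -0.125 | -0.5 |
| 4    | NaN | 0.0 | 0.0 | 0.5 | -0.5 | 0.0 | 0.312473 | 0.000 | -0.125 | 0.5 |
| ...  | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4355 | NaN | 0.0 | 0.0 | -0.5 | -13.5 | 0.0 | 0.209697 | 0.375 | -0.375 | -0.5 |
| 4356 | NaN | 0.0 | 0.0 | 0.5 | 3.5 | 0.0 | -0.169639 | 0.375 | 0.625 | 0.5 |
| 4357 | NaN | 0.0 | 0.0 | 1.5 | 22.5 | 0.0 | 0.383780 | 0.375 | -0.375 | 1.5 |
| 4358 | NaN | 0.0 | 0.0 | 2.5 | 43.5 | 0.0 | 0.363713 | 0.375 | 0.625 | 2.5 |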


In [184]:
y_dot = None # Fill in
x_dot = None # Fill in

### FE (c):
- Estimate (1) on within transformed data (make sure that the employed $\mathbf{\ddot{X}}$ has full rank - drop columns if necessary). 
- How big is the union premium according to the estimate from the FE model? Compare this with the estimate that you calculated from the pooled OLS regression. What does this suggest about $E\left[union_{it}c_{i}\right]$?

In [185]:
x_demean2=x_demean[:,4:]

In [186]:
x_demean2.shape

(4360, 4)

In [187]:
check_rank(x_demean2)

'The matrix is of rank 4'

In [188]:
labels_x2 = ['Experience', 'Experience sqr', 'Married', 'Union']

In [189]:
# Fill in
# NB you can use either y_demean, x_demean (from perm function) or y_dot, x_dot (from dataframes) here

est2=lm.estimate(y=y_demean, x=x_demean2, N=N, T=T, transform='fe')
est2
lm.print_table(results=est2, labels=(label_y, labels_x2))

Results
Dependent variable: Log wage

                       Beta           Se    t-values
--------------  -----------  -----------  ----------
Experience       0.116847    0.00841968     13.8778
Experience sqr  -0.00430089  0.000605274    -7.10569
Married          0.0453033   0.0183097       2.47428
Union            0.0820871   0.0192907       4.25526
R² = 0.178
σ² = 0.123


You should get a table that looks like this:

FE regression<br>
Dependent variable: Log wage

|                |    Beta |     Se |   t-values |
|----------------|---------|--------|------------|
| Experience     |  0.1168 | 0.0084 |    13.8778 |
| Experience sqr | -0.0043 | 0.0006 |    -7.1057 |
| Married        |  0.0453 | 0.0183 |     2.4743 |
| Union          |  0.0821 | 0.0193 |     4.2553 |
R² = 0.178 <br>
σ² = 0.123

## FD Questions
### FD (a):
- Construct $\mathbf{D}$ and use the procedure `perm` $(\mathbf{D},\mathbf{x})$ to compute first differences of the elements of $\mathbf{y}$ and $\mathbf{x}$. 
- What happens to *educ, hisp* and *black* and $x_{it1}\equiv1$ when the data are transformed into first differences? What is the rank of the first differenced $\mathbf{x}$-matrix? Why?

$\mathbf{D}:=\left(\begin{array}{cccccc}-1 & 1 & 0 & \ldots & 0 & 0 \\ 0 & -1 & 1 & & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \ldots & -1 & 1\end{array}\right)_{T-1 \times T}$
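
To see what $\mathbf{D}$ does, it helps to apply it to a short series. The sketch below builds the matrix for T = 4 and compares the result with `np.diff` on a made-up vector; it also shows that a constant series differences to zero.

```python
import numpy as np

T_small = 4
D = (np.eye(T_small, k=1) - np.eye(T_small))[:-1]   # shape (T-1, T)

v = np.array([1.0, 4.0, 9.0, 16.0])
print(D)
print(D @ v)                            # period-to-period changes
print(np.allclose(D @ v, np.diff(v)))   # same as np.diff for one series
print(D @ np.ones(T_small))             # a constant differences to zero
```

For a stacked panel the matrix has to be applied person by person, which is what the `perm` call in the exercise is for.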

In [190]:
def fd_matrix(T):
    D_T = -np.eye(T)+np.eye(T,k=1) # Fill in
    D_T = D_T[:-1]
    return D_T

In [191]:
D_T = fd_matrix(T)
Z = lm.perm(D_T,x)
Z

array([[ 0.,  0.,  0., ...,  3.,  0.,  1.],
       [ 0.,  0.,  0., ...,  5.,  0., -1.],
       [ 0.,  0.,  0., ...,  7.,  0.,  0.],
       ...,
       [ 0.,  0.,  0., ..., 19.,  0., -1.],
       [ 0.,  0.,  0., ..., 21.,  0.,  1.],
       [ 0.,  0.,  0., ..., 23.,  0.,  0.]])

In [192]:
test=-np.eye(T)+np.eye(T,k=1)
test[:-1].shape

(7, 8)

### FD (b):
- **Estimate (1) in first differences.** How big is the union premium according to the estimate from this model? 
- Compare the FD estimate with the estimate that you calculated from the FE regression. Is there a difference? If yes, what (if anything) can we conclude based on this finding?

In [193]:
# Transform the data.
D_T = fd_matrix(T)
y_diff = lm.perm(D_T, y) # Fill in
x_diff = lm.perm(D_T, x) # Fill in

# Drop the constant and the time-invariant columns, which difference to zero.
x_diff = x_diff[:, 4:]

# Again, check rank condition.
print(check_rank(x_diff))

# Estimate on transformed data
# NB you can use either y_diff, x_diff (from perm function) or ydiff, xdiff (from dataframes) here

The matrix is of rank 4


In [194]:
est3=lm.estimate(y=y_diff, x=x_diff, N=N, T=T-1)
lm.print_table(results=est3, labels=(label_y, labels_x2))

Results
Dependent variable: Log wage

                       Beta          Se    t-values
--------------  -----------  ----------  ----------
Experience       0.11575     0.0195867      5.90964
Experience sqr  -0.00388237  0.00138632    -2.80049
Married          0.0381377   0.0229283      1.66335
Union            0.0427878   0.0196575      2.17667
R² = 0.004
σ² = 0.196


You should get a table that looks like this:

FD regression <br>
Dependent variable: Log wage

|                |    Beta |     Se |   t-values |
|----------------|---------|--------|------------|
| Experience     |  0.1158 | 0.0196 |     5.9096 |
| Experience sqr | -0.0039 | 0.0014 |    -2.8005 |
| Married        |  0.0381 | 0.0229 |     1.6633 |
| Union          |  0.0428 | 0.0197 |     2.1767 |
R² = 0.004 <br>
σ² = 0.196

## Exercise comparing FE and FD
### Question FE v. FD (a):
**Test for serial correlation in the errors using an auxiliary AR(1) model** to test assumption FD.3, under which the errors $e_{it} = \Delta u_{it}$ should be serially uncorrelated.

We can easily test this assumption given the OLS residuals from the FD version of equation (1). Run the regression (note that you will lose data for
the *two* first periods)
\begin{equation}
\hat{e}_{it}=\rho\hat{e}_{it-1}+error_{it},\quad t=\color{red}{3},\dotsc,T,\quad i=1,\dotsc,N\tag{2}
\end{equation}

Do you find any evidence for serial correlation? Does FD.3 seem appropriate? And why don't we include an intercept in this auxiliary equation?

*Note:* Under FE.3, the idiosyncratic errors $u_{it}$
are serially uncorrelated. However, FE.3 implies that the $e_{it}$'s are autocorrelated. In fact, if the $u_{it}$'s are serially uncorrelated to begin with, $\operatorname{corr}\left(e_{it},e_{it-1}\right)=-0.5$. (Check!) This test is of course only valid if the explanatory variables are strictly exogenous!
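
The value $-0.5$ follows from a short calculation: if the $u_{it}$ are serially uncorrelated with constant variance $\sigma_u^2$, then $\operatorname{cov}(e_{it},e_{it-1})=\operatorname{cov}(u_{it}-u_{it-1},\,u_{it-1}-u_{it-2})=-\sigma_u^2$ and $\operatorname{var}(e_{it})=2\sigma_u^2$, so the correlation is $-\sigma_u^2/(2\sigma_u^2)=-0.5$. A quick simulation can serve as the "Check!". The sketch below assumes i.i.d. standard normal errors, which is just one convenient case of serially uncorrelated errors.

```python
import numpy as np

rng = np.random.default_rng(42)
n_persons, n_periods = 20_000, 8

# Serially uncorrelated idiosyncratic errors u_it (i.i.d. standard normal here).
u = rng.normal(size=(n_persons, n_periods))
e = np.diff(u, axis=1)            # e_it = u_it - u_i,t-1

e_now = e[:, 1:].ravel()          # e_it
e_lag = e[:, :-1].ravel()         # e_i,t-1
rho = np.corrcoef(e_now, e_lag)[0, 1]
print(rho)                        # should be close to -0.5
```

An estimate of $\rho$ near $-0.5$ in the auxiliary regression therefore points towards serially uncorrelated $u_{it}$ rather than serially uncorrelated $e_{it}$.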

*Hint:* You can use the `perm` function to lag
the error term variable. Consider the following; 

$$
{\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 & 0\\
0 & 1 & 0 & \cdots & 0 & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & 0 & \cdots & 1 & 0
\end{bmatrix}}_{T-1\times T}\times{\begin{bmatrix}y_{1}\\
y_{2}\\
\vdots\\
y_{T}
\end{bmatrix}}_{T \times 1}={\begin{bmatrix}y_{1}\\
y_{2}\\
\vdots\\
y_{T - 1}
\end{bmatrix}}_{T - 1\times 1}
$$
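
The hint can be tried out on a tiny stacked panel. In the sketch below, `apply_per_person` is a hypothetical helper standing in for `perm` (whose exact implementation is not shown here); it assumes the rows are sorted by person and then by time. One selection matrix keeps periods 1 to T-1 (the lagged values), the other keeps periods 2 to T (the current values), so the two outputs line up row by row.

```python
import numpy as np

def apply_per_person(P, a, n_periods):
    """Apply the matrix P to each person's block of rows (hypothetical helper;
    assumes rows sorted by person, then time)."""
    blocks = a.reshape(-1, n_periods, a.shape[1])   # (N, T, K)
    out = P @ blocks                                # (N, rows of P, K)
    return out.reshape(-1, a.shape[1])

n_periods = 4
lag_matrix = np.eye(n_periods)[:-1]   # keeps periods 1..T-1: the lagged values
cur_matrix = np.eye(n_periods)[1:]    # keeps periods 2..T: the current values

# Two persons, four periods each (made-up residuals).
e_toy = np.array([[1.0], [2.0], [3.0], [4.0], [10.0], [20.0], [30.0], [40.0]])
e_lag = apply_per_person(lag_matrix, e_toy, n_periods)
e_cur = apply_per_person(cur_matrix, e_toy, n_periods)
print(np.hstack((e_cur, e_lag)))
```

Working block by block matters: a plain shift of the stacked vector would pair the first observation of one person with the last observation of the previous person.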


In [263]:
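def serial_corr(y, x, T, year):
    # OLS residuals from the first-differenced regression.
    b_hat = lm.est_ols(y,x)
    e = y-x@b_hat
    # Lag matrix: identity with the last row removed, so it keeps periods 1..T-1.
    fd_t=np.eye(T)
    fd_t=fd_t[:-1]
    e_l = lm.perm(fd_t,e)  # e_l are the lagged values of e.
    reduced_year = year[year != np.unique(year).min()] #Remove the first year
    e = e[reduced_year != np.unique(reduced_year).min()] #Remove the second year
    # Regress e on its lag without an intercept. NB: N is read from the notebook's global scope.
    return lm.estimate(e, e_l,N=N,T=T-1)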

In [264]:
corr_result = serial_corr(y_diff, x_diff, T-1, year)

label_ye = 'OLS residual, e\u1d62\u209c'
label_e = ['e\u1d62\u209c\u208B\u2081']
title = 'Serial Correlation'
lm.print_table(
    (label_ye, label_e), corr_result, 
    title=title, floatfmt='.4f'
)

Serial Correlation
Dependent variable: OLS residual, eᵢₜ

          Beta      Se    t-values
-----  -------  ------  ----------
eᵢₜ₋₁  -0.3961  0.0147    -27.0185
R² = 0.182
σ² = 0.143


*Hint:* Remember again to remove the first year after you have lagged the residuals. So in your estimations your residuals should exclude the years 1980 and 1981.

You should get a table that looks like this:

Serial Correlation <br>
Dependent variable: OLS residual, eᵢₜ

|       |    Beta |     Se |   t-values |
|-------|---------|--------|------------|
| eᵢₜ₋₁ | -0.3961 | 0.0147 |   -27.0185 |
R² = 0.182 <br>
σ² = 0.143

### Question FE v FD (b):

Test for strict exogeneity: Add a lead of the union variable, $union_{i,t+1}$, to equation (1) (note that you will lose data from period $T$, 1987) and estimate the model with *fixed effects* (i.e., you have to demean $union_{i,t+1}$ along with all the other variables and throw out time-constant variables). Is $union_{i,t+1}$ significant? What does this imply for the strict exogeneity assumption?

*Hint:* To lead a variable, think along the same lines as in Question FE v FD (a)
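
As a minimal sketch of leading a variable, the cell below uses a made-up union indicator for two persons and four periods. A shifted identity matrix picks out period t+1 for each row t, and a truncated identity matrix drops the last period from the contemporaneous variable so that both have T-1 rows per person. The reshape step assumes the rows are sorted by person and then by time.

```python
import numpy as np

n_persons, n_periods = 2, 4
union_toy = np.array([0, 1, 1, 0,    # person 1, periods 1..4
                      1, 1, 0, 0],   # person 2, periods 1..4
                     dtype=float).reshape(-1, 1)

lead_matrix = np.eye(n_periods, k=1)[:-1]   # row t picks period t+1
keep_matrix = np.eye(n_periods)[:-1]        # drops the last period

blocks = union_toy.reshape(n_persons, n_periods, 1)
union_lead = (lead_matrix @ blocks).reshape(-1, 1)   # union_{i,t+1}
union_now = (keep_matrix @ blocks).reshape(-1, 1)    # union_{it}, t = 1..T-1

print(np.hstack((union_now, union_lead)))
```

All other variables need the same truncation, and the within transformation is then applied with T-1 periods per person.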

In [None]:
# Fill in

The table should look something like this:

Exogeneity test <br>
Dependent variable: Log wage

|                |    Beta |     Se |   t-values |
|----------------|---------|--------|------------|
| Experience     |  0.1213 | 0.0100 |    12.1001 |
| Experience sqr | -0.0050 | 0.0008 |    -6.3579 |
| Married        |  0.0436 | 0.0209 |     2.0898 |
| Union          |  0.0757 | 0.0218 |     3.4784 |
| Union lead     |  0.0515 | 0.0223 |     2.3063 |
R² = 0.146<br>
σ² = 0.128

### Question FE v FD (c):
Add interactions of the form $d_{81}\cdot educ, d_{82}\cdot educ, ..., d_{87}\cdot educ$ and estimate the model with fixed effects. Has the return to education increased over time?

*Hint:* Remember that $educ_{i}$ doesn't vary over
time! Therefore we didn't use $educ$ in levels in the FE estimation.
However, if we suppose that the structural equation (4) contains a term $\sum_{s=2}^{T}\delta_{s}d_{s}educ_{i}$, it will be perfectly fine to within-transform these interactions since they vary over time (although in a highly structured manner - they equal
zero in all time periods but one, and then $educ$). Note that one
period is dropped for the within-transformation to work whereas the
levels term, $\beta_{5}educ_{i}$, is dropped to avoid producing a
constant row.

*Programming hint:* You want to append the dataset with a dummy matrix, that would look something like this:

$$
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
14 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 14 & 0 & 0 & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & 14 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
9 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 9 & 0 & 0 & 0 & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & 9 \\
\end{bmatrix}
$$

This example shows our first two persons, who have 14 and 9 years of education respectively. This matrix can be created as a product of two matrices. What would they look like (a dummy matrix with only ones can be created using `np.eye` and `np.tile`)? Why is the first row for each person only zeros?
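
One possible way to build such a matrix is sketched below on a toy panel with two persons and three periods. It reads "product" as an element-wise product between the stacked education column and a stacked block of year dummies; this is an interpretation, and other constructions would work as well. All names in the cell are illustrative.

```python
import numpy as np

n_persons, n_periods = 2, 3
educ_toy = np.repeat([14.0, 9.0], n_periods).reshape(-1, 1)   # (N*T, 1)

# Year dummies for periods 2..T; the first period is the omitted base year.
year_dummies = np.eye(n_periods, n_periods - 1, k=-1)          # (T, T-1)
dummies_stacked = np.tile(year_dummies, (n_persons, 1))        # (N*T, T-1)

educ_interactions = educ_toy * dummies_stacked                 # broadcasts over columns
print(educ_interactions)
```

The first row of each person's block is zero because the first period has no dummy of its own and acts as the reference year.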

In [None]:

# Fill in

label_x_interactions = labels_x2 + [
    'E81', 'E82', 'E83', 'E84', 'E85', 'E86', 'E87'
]

You should get a table that looks like this:

FE with year interactions <br>
Dependent variable: Log wage

|                |    Beta |     Se |   t-values |
|----------------|---------|--------|------------|
| Experience     |  0.1705 | 0.0273 |     6.2462 |
| Experience sqr | -0.0060 | 0.0009 |    -6.9581 |
| Married        |  0.0475 | 0.0183 |     2.5925 |
| Union          |  0.0794 | 0.0193 |     4.1138 |
| E81            | -0.0010 | 0.0026 |    -0.4009 |
| E82            | -0.0062 | 0.0041 |    -1.5224 |
| E83            | -0.0114 | 0.0057 |    -2.0006 |
| E84            | -0.0136 | 0.0072 |    -1.8787 |
| E85            | -0.0162 | 0.0087 |    -1.8578 |
| E86            | -0.0170 | 0.0101 |    -1.6804 |
| E87            | -0.0167 | 0.0115 |    -1.4619 |
R² = 0.181 <br>
σ² = 0.123