
Is the bootstrap really drawing randomly? #138

Closed
a-torgovitsky opened this issue Oct 23, 2019 · 42 comments

@a-torgovitsky (Collaborator)

MWE:

library("ivmte")
args <- list(data = AE,
             target = "att",
             m0 = ~ u,
             m1 = ~ u,
             ivlike = worked ~ morekids + samesex + morekids*samesex,
             propensity = morekids ~ samesex,
             point = TRUE,
             bootstraps = 20)
r <- do.call(ivmte, args)

The output looks suspiciously non-random to me.
For example, iterations 8, 9, and 10 are all similar and much larger than the other ones.
Then so are 14, 15, and 16.

Maybe I'm just seeing patterns where there aren't any, but this should be double-checked...

> library("ivmte")
> args <- list(data = AE,
+              target = "att",
+              m0 = ~ u,
+              m1 = ~ u,
+              ivlike = worked ~ morekids + samesex + morekids*samesex,
+              propensity = morekids ~ samesex,
+              point = TRUE,
+              bootstraps = 20)
> r <- do.call(ivmte, args)

LP solver: Gurobi ('gurobi')

Obtaining propensity scores...
Generating target moments...
    Integrating terms for control group...
    Integrating terms for treated group...
Generating IV-like moments...
    Moment 1...
    Moment 2...
    Moment 3...
    Moment 4...

Point estimate of the target parameter: -0.09162913

Bootstrap iteration 1...
    Point estimate:-0.133147

Bootstrap iteration 2...
    Point estimate:-0.1327468

Bootstrap iteration 3...
    Point estimate:-0.1292102

Bootstrap iteration 4...
    Point estimate:-0.1326532

Bootstrap iteration 5...
    Point estimate:-0.1258843

Bootstrap iteration 6...
    Point estimate:-0.1325626

Bootstrap iteration 7...
    Point estimate:-0.1418057

Bootstrap iteration 8...
    Point estimate:-0.09150538

Bootstrap iteration 9...
    Point estimate:-0.09137915

Bootstrap iteration 10...
    Point estimate:-0.09069669

Bootstrap iteration 11...
    Point estimate:-0.1506921

Bootstrap iteration 12...
    Point estimate:-0.1328115

Bootstrap iteration 13...
    Point estimate:-0.1032961

Bootstrap iteration 14...
    Point estimate:-0.09230597

Bootstrap iteration 15...
    Point estimate:-0.08989487

Bootstrap iteration 16...
    Point estimate:-0.0909519

Bootstrap iteration 17...
    Point estimate:-0.1372912

Bootstrap iteration 18...
    Point estimate:-0.1328116

Bootstrap iteration 19...
    Point estimate:-0.1327473

Bootstrap iteration 20...
    Point estimate:-0.132745

--------------------------------------------------
Results 
--------------------------------------------------

Point estimate of the target parameter: -0.09162913

Number of bootstraps: 20

Bootstrapped confidence intervals (nonparametric):
    90%: [-0.1506921, -0.09069669]
    95%: [-0.1506921, -0.08989487]
    99%: [-0.1506921, -0.08989487]
p-value: 0
@a-torgovitsky a-torgovitsky added the test-this Things that need to be tested label Oct 23, 2019
@a-torgovitsky (Collaborator, Author)

I guess I also find it very strange how not-centered the bootstrap distribution is with respect to the point estimate...seems weird

@jkcshea (Owner) commented Oct 24, 2019

I don't yet have an explanation for the not-centered bootstrap estimates, but it does appear that the draws are random.
Here is the head of the resampled data set from each bootstrap iteration.
The row numbers and data suggest the resampling is working.

LP solver: Gurobi ('gurobi')

Obtaining propensity scores...
Generating target moments...
    Integrating terms for control group...
    Integrating terms for treated group...
Generating IV-like moments...
    Moment 1...
    Moment 2...
    Moment 3...
    Moment 4...

Point estimate of the target parameter: -0.09162913

Bootstrap iteration 1...
       worked hours morekids samesex yob
24878       0     0        0       0  50
176122      1    48        0       0  49
69889       0     0        0       1  48
79158       1    35        1       0  50
197251      1    10        0       0  45
164615      0     0        0       1  46
    Point estimate:-0.133147

Bootstrap iteration 2...
       worked hours morekids samesex yob
51217       0     0        1       0  47
11771       1    99        0       0  47
40525       1    40        0       1  52
140642      0     0        0       0  44
164804      1     6        1       0  45
83796       0     0        1       0  46
    Point estimate:-0.1327468

Bootstrap iteration 3...
       worked hours morekids samesex yob
59999       1    40        0       1  49
184270      0     0        0       1  44
75519       0     0        1       0  45
128428      0     0        1       1  47
133152      0     0        1       1  47
175270      0     0        0       1  48
    Point estimate:-0.1292102

Bootstrap iteration 4...
       worked hours morekids samesex yob
139550      1    12        1       1  46
47927       1    30        0       1  46
110541      1    28        0       0  44
132399      0     0        0       0  48
49826       1    40        1       1  44
41545       1    24        0       0  48
    Point estimate:-0.1326532

Bootstrap iteration 5...
       worked hours morekids samesex yob
149038      1    24        0       1  49
42803       1     5        0       0  45
57340       1    20        0       1  49
16556       0     0        1       1  49
50123       1     2        0       0  53
160977      1    26        0       0  54
    Point estimate:-0.1258843

Bootstrap iteration 6...
       worked hours morekids samesex yob
95693       0     0        0       1  52
67412       0     0        0       0  50
159955      1    20        0       1  47
46488       1    46        0       0  45
13576       0     0        1       1  47
145078      1    15        0       0  47
    Point estimate:-0.1325626

Bootstrap iteration 7...
       worked hours morekids samesex yob
2388        1    23        1       1  45
12314       0     0        0       1  54
67216       0     0        0       1  49
86487       1    24        1       0  52
158633      1    35        0       1  48
105872      0     0        1       1  50
    Point estimate:-0.1418057

Bootstrap iteration 8...
       worked hours morekids samesex yob
68627       0     0        0       1  50
49733       0     0        1       1  52
20330       0     0        1       0  47
126841      0     0        1       1  51
138118      0     0        0       0  48
5507        1    35        0       0  46
    Point estimate:-0.09150538

Bootstrap iteration 9...
       worked hours morekids samesex yob
45793       0     0        1       1  51
128487      1    40        1       1  51
67497       1    20        1       1  45
56861       1    40        0       1  52
39611       0     0        0       1  48
166813      1    35        0       1  44
    Point estimate:-0.09137915

Bootstrap iteration 10...
       worked hours morekids samesex yob
58487       1    24        0       0  44
202918      0     0        0       1  52
156206      0     0        0       1  44
127119      0     0        1       1  47
58985       1     3        1       1  44
73676       1    40        0       0  45
    Point estimate:-0.09069669

Bootstrap iteration 11...
       worked hours morekids samesex yob
185595      0     0        0       1  45
134794      0     0        0       1  49
172585      0     0        0       1  48
198815      0     0        0       0  51
2213        1    40        0       1  46
181054      0     0        0       1  50
    Point estimate:-0.1506921

Bootstrap iteration 12...
       worked hours morekids samesex yob
132069      1    40        0       1  49
1108        0     0        1       0  45
37698       0     0        0       0  56
140701      1    24        0       0  47
108864      0     0        1       0  48
69530       1    40        0       0  50
    Point estimate:-0.1328115

Bootstrap iteration 13...
       worked hours morekids samesex yob
6889        0     0        1       0  49
8737        0     0        0       0  47
190877      1    20        0       1  44
125023      1    40        0       1  53
52278       0     0        1       1  46
116206      1    40        0       1  44
    Point estimate:-0.1032961

Bootstrap iteration 14...
       worked hours morekids samesex yob
2692        0     0        0       1  52
11749       0     0        0       0  46
145764      1    32        0       0  47
70416       0     0        1       1  48
190341      0     0        1       0  50
20891       0     0        0       1  48
    Point estimate:-0.09230597

Bootstrap iteration 15...
       worked hours morekids samesex yob
157770      1    20        0       1  49
190004      0     0        1       1  55
28442       0     0        0       0  51
155215      0     0        0       1  47
33004       0     0        0       1  49
59551       1    20        1       1  50
    Point estimate:-0.08989487

Bootstrap iteration 16...
       worked hours morekids samesex yob
50947       0     0        0       0  47
138564      1    38        0       1  51
2850        1    40        0       1  48
149978      1    40        0       1  50
47416       1    40        0       0  47
78418       0     0        1       1  45
    Point estimate:-0.0909519

Bootstrap iteration 17...
       worked hours morekids samesex yob
14508       0     0        0       0  46
94489       0     0        0       0  49
34328       0     0        0       1  50
196918      1    20        0       1  47
107993      1     2        0       0  53
74663       1    40        0       0  49
    Point estimate:-0.1372912

Bootstrap iteration 18...
       worked hours morekids samesex yob
191236      1    26        1       0  53
40886       1    34        0       0  47
68994       1    40        0       0  50
66318       1    10        0       1  46
188208      0     0        1       1  46
71492       0     0        1       0  47
    Point estimate:-0.1328116

Bootstrap iteration 19...
       worked hours morekids samesex yob
112070      1    30        0       0  46
33518       1    20        0       1  46
27855       0     0        0       1  45
189567      1    35        0       1  47
11277       0     0        0       1  48
57139       0     0        0       1  49
    Point estimate:-0.1327473

Bootstrap iteration 20...
       worked hours morekids samesex yob
116822      0     0        0       0  51
195976      1    40        0       0  45
14198       0     0        1       1  53
86404       1    37        0       1  51
34785       0     0        1       0  46
135760      0     0        0       0  48
    Point estimate:-0.132745

--------------------------------------------------
Results
--------------------------------------------------

Point estimate of the target parameter: -0.09162913

Number of bootstraps: 20

Bootstrapped confidence intervals (nonparametric):
    90%: [-0.1506921, -0.09069669]
    95%: [-0.1506921, -0.08989487]
    99%: [-0.1506921, -0.08989487]
p-value: 0

(No J-test output since we are just-identified)

@a-torgovitsky (Collaborator, Author)

How are the random draws being done?
Are you using a resampling function, or coding this yourself?

@jkcshea (Owner) commented Oct 24, 2019

I'm just using the sample function from R.

ivmte/R/mst.R, lines 1631 to 1636 in ce1045d:

while (b <= bootstraps) {
    set.seed(bseeds[b])
    bootIDs <- sample(x = seq(1, nrow(data)),
                      size = bootstraps.m,
                      replace = bootstraps.replace)
    bdata <- data[bootIDs, ]

@a-torgovitsky (Collaborator, Author)

Hmmm...can you try a couple of different things just to see whether things look different?

  1. Change the overall starting seed from which bseeds is generated.
  2. Instead of using bseeds[b], just set the seed to b.
  3. Instead of using sample, just draw some integers (see the sketch below).
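For (3), something like the following is what I have in mind (an untested sketch; data and b are as in your snippet above):

set.seed(b)
# Draw row indices uniformly from 1..nrow(data) without calling sample()
bootIDs <- floor(runif(nrow(data), min = 1, max = nrow(data) + 1))
bdata <- data[bootIDs, ]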

@jkcshea (Owner) commented Oct 25, 2019

All three methods generate different results, but do exhibit runs of similar estimates.
It looks like the parameter estimate has a bimodal distribution, though---not sure if this raises any alarms, considering how large the AE data set is.
Most estimates seem close to either -0.13 or -0.09.
Occasionally we get -0.15.
But if the distribution is bimodal, then maybe seeing a run of similar estimates is not too unlikely.


Change the overall starting seed from which bseeds is generated.

I set the seed of the function to 13.

LP solver: Gurobi ('gurobi')

Obtaining propensity scores...
Generating target moments...
    Integrating terms for control group...
    Integrating terms for treated group...
Generating IV-like moments...
    Moment 1...
    Moment 2...
    Moment 3...
    Moment 4...

Point estimate of the target parameter: -0.09162913

Bootstrap iteration 1...
    Point estimate:-0.1327445

Bootstrap iteration 2...
    Point estimate:-0.1351167

Bootstrap iteration 3...
    Point estimate:-0.1293295

Bootstrap iteration 4...
    Point estimate:-0.09226189

Bootstrap iteration 5...
    Point estimate:-0.09095696

Bootstrap iteration 6...
    Point estimate:-0.09225218

Bootstrap iteration 7...
    Point estimate:-0.1371488

Bootstrap iteration 8...
    Point estimate:-0.1327441

Bootstrap iteration 9...
    Point estimate:-0.1501948

Bootstrap iteration 10...
    Point estimate:-0.09098966

Bootstrap iteration 11...
    Point estimate:-0.1325365

Bootstrap iteration 12...
    Point estimate:-0.0922624

Bootstrap iteration 13...
    Point estimate:-0.09161383

Bootstrap iteration 14...
    Point estimate:-0.1164453

Bootstrap iteration 15...
    Point estimate:-0.1125589

Bootstrap iteration 16...
    Point estimate:-0.0907676

Bootstrap iteration 17...
    Point estimate:-0.1424088

Bootstrap iteration 18...
    Point estimate:-0.09137832

Bootstrap iteration 19...
    Point estimate:-0.1310918

Bootstrap iteration 20...
    Point estimate:-0.1357936

--------------------------------------------------
Results 
--------------------------------------------------

Point estimate of the target parameter: -0.09162913

Number of bootstraps: 20

Bootstrapped confidence intervals (nonparametric):
    90%: [-0.1501948, -0.09095696]
    95%: [-0.1501948, -0.0907676]
    99%: [-0.1501948, -0.0907676]
p-value: 0

Instead of using bseeds[b], just set the seed to b

> r <- do.call(ivmte, args)

LP solver: Gurobi ('gurobi')

Obtaining propensity scores...
Generating target moments...
    Integrating terms for control group...
    Integrating terms for treated group...
Generating IV-like moments...
    Moment 1...
    Moment 2...
    Moment 3...
    Moment 4...

Point estimate of the target parameter: -0.09162913

Bootstrap iteration 1...
    Point estimate:-0.1292164

Bootstrap iteration 2...
    Point estimate:-0.09175264

Bootstrap iteration 3...
    Point estimate:-0.09090694

Bootstrap iteration 4...
    Point estimate:-0.1501942

Bootstrap iteration 5...
    Point estimate:-0.09188939

Bootstrap iteration 6...
    Point estimate:-0.1498179

Bootstrap iteration 7...
    Point estimate:-0.09158588

Bootstrap iteration 8...
    Point estimate:-0.1291656

Bootstrap iteration 9...
    Point estimate:-0.09163886

Bootstrap iteration 10...
    Point estimate:-0.1365796

Bootstrap iteration 11...
    Point estimate:-0.1326229

Bootstrap iteration 12...
    Point estimate:-0.132833

Bootstrap iteration 13...
    Point estimate:-0.1327524

Bootstrap iteration 14...
    Point estimate:-0.1294383

Bootstrap iteration 15...
    Point estimate:-0.132808

Bootstrap iteration 16...
    Point estimate:-0.09226911

Bootstrap iteration 17...
    Point estimate:-0.1313839

Bootstrap iteration 18...
    Point estimate:-0.09254063

Bootstrap iteration 19...
    Point estimate:-0.1328078

Bootstrap iteration 20...
    Point estimate:-0.09096197

--------------------------------------------------
Results 
--------------------------------------------------

Point estimate of the target parameter: -0.09162913

Number of bootstraps: 20

Bootstrapped confidence intervals (nonparametric):
    90%: [-0.1501942, -0.09096197]
    95%: [-0.1501942, -0.09090694]
    99%: [-0.1501942, -0.09090694]
p-value: 0

Instead of using sample, just draw some integers.

LP solver: Gurobi ('gurobi')

Obtaining propensity scores...
Generating target moments...
    Integrating terms for control group...
    Integrating terms for treated group...
Generating IV-like moments...
    Moment 1...
    Moment 2...
    Moment 3...
    Moment 4...

Point estimate of the target parameter: -0.09162913

Bootstrap iteration 1...
    Point estimate:-0.1504226

Bootstrap iteration 2...
    Point estimate:-0.09181055

Bootstrap iteration 3...
    Point estimate:-0.09153195

Bootstrap iteration 4...
    Point estimate:-0.09222055

Bootstrap iteration 5...
    Point estimate:-0.09122626

Bootstrap iteration 6...
    Point estimate:-0.1261849

Bootstrap iteration 7...
    Point estimate:-0.1258882

Bootstrap iteration 8...
    Point estimate:-0.09180798

Bootstrap iteration 9...
    Point estimate:-0.1349838

Bootstrap iteration 10...
    Point estimate:-0.09159996

Bootstrap iteration 11...
    Point estimate:-0.1325613

Bootstrap iteration 12...
    Point estimate:-0.09158434

Bootstrap iteration 13...
    Point estimate:-0.09225638

Bootstrap iteration 14...
    Point estimate:-0.0913753

Bootstrap iteration 15...
    Point estimate:-0.1288676

Bootstrap iteration 16...
    Point estimate:-0.09243477

Bootstrap iteration 17...
    Point estimate:-0.1258887

Bootstrap iteration 18...
    Point estimate:-0.09138309

Bootstrap iteration 19...
    Point estimate:-0.1372906

Bootstrap iteration 20...
    Point estimate:-0.129436

--------------------------------------------------
Results 
--------------------------------------------------

Point estimate of the target parameter: -0.09162913

Number of bootstraps: 20

Bootstrapped confidence intervals (nonparametric):
    90%: [-0.1504226, -0.0913753]
    95%: [-0.1504226, -0.09122626]
    99%: [-0.1504226, -0.09122626]
p-value: 0

@a-torgovitsky (Collaborator, Author)

The fact that the bootstrap distribution is bimodal really doesn't make sense to me.

This is a linear MTR case.
It is exactly identified.
So by Brinch, Mogstad and Wiswall, we know we can recover something like the ATT by basically just transforming some regression coefficients.
The resulting estimator should be asymptotically normal.
It's just point identified GMM, so the bootstrap should be picking this up.

I did the following experiment:

  1. Try to replicate the ivmte results through basic calculations. I was able to do this in two different ways. They match almost exactly.
  2. Use sample to draw B bootstrap samples. Each time, estimate the att using a basic calculation.
  3. Histogram the result.

Lo and behold, looks almost perfectly normal.

I think there is something quite seriously wrong going on in either the bootstrap or the point identification procedure.
It seems to be something about the interaction between the two, since changing the redrawing doesn't have an effect, and since my basic calculations match ivmte in the sample.

Here is my code and the histogram.

rm(list = ls())
library("ivmte")

# Compute the ATT using my worked out calculations
att <- function(df) {
    p1 <- mean(df[df$samesex == 1, "morekids"])
    p0 <- mean(df[df$samesex == 0, "morekids"])
    ey1gd1 <- mean(df[df$morekids == 1, "worked"])
    beta0 <- 2*(   mean(df[(df$morekids == 0) & (df$samesex == 1), "worked"])
                 - mean(df[(df$morekids == 0) & (df$samesex == 0), "worked"])
               ) / (p1 - p0)
    alpha0 <- mean(df[(df$morekids == 0) & (df$samesex == 1), "worked"]) -
                beta0/2 - (beta0/2)*p1
    prz1gd1 <- mean(df[df$morekids == 1, "samesex"])
    ey0gd1 <- alpha0 + (beta0/2)*(p1*prz1gd1 + p0*(1 - prz1gd1))

    att <- ey1gd1 - ey0gd1

    return(att)
}

# Compute the ATT using regressions as in BMW
att_bmw <- function(df) {
    lrres <- lm(data = df, morekids ~ samesex)
    df$p <- predict(lrres, type = "response")
    lrd1 <- lm(data = df, worked ~ p, (morekids == 1))
    lrd0 <- lm(data = df, worked ~ p, (morekids == 0))

    beta0bmw <- 2*lrd0$coefficients["p"]
    alpha0bmw <- lrd0$coefficients[1] - beta0bmw/2

    p1 <- mean(df[df$samesex == 1, "morekids"])
    p0 <- mean(df[df$samesex == 0, "morekids"])
    prz1gd1 <- mean(df[df$morekids == 1, "samesex"]) # as in att() above
    ey1gd1 <- mean(df[df$morekids == 1, "worked"])   # as in att() above
    ey0gd1bmw <- alpha0bmw + (beta0bmw/2)*(p1*prz1gd1 + p0*(1 - prz1gd1))

    attbmw <- ey1gd1 - ey0gd1bmw

    return(attbmw)
}

ae <- AE

args <- list(data = ae,
             target = "att",
             m0 = ~ u,
             m1 = ~ u,
             ivlike = worked ~ morekids + samesex + morekids*samesex,
             propensity = morekids ~ samesex,
             point = TRUE)
res <- do.call(ivmte, args)

# EVERYTHING MATCHES -- GREAT
print(res$pointestimate)
print(att(ae))
print(att_bmw(ae))

# NOW BOOTSTRAP -- SHOULD GET TWO MODES?

B <- 500
attbs = NaN*c(1:B)
for (b in 1:B) {
    set.seed(b)
    ids <- sample(x = seq(1, nrow(ae)), size = nrow(ae), replace = TRUE)
    bdata <- ae[ids, ]
    attbs[b] <- att(bdata)
}
hist(attbs, breaks = 20) # Looks pretty normal to me

[histogram of attbs: approximately normal]

@a-torgovitsky (Collaborator, Author)

I wonder if maybe the propensity score is not being recalculated on each bootstrap draw?
Maybe that could lead to something like this...I'm not sure

@jkcshea (Owner) commented Oct 27, 2019

It looks like it's the optimization in gmm that is causing the problem.

Below, I estimate the treatment effect in three ways:

  1. Estimating the MTR coefficients via basic OLS, since we're using the identity weighting.
    And then taking the inner product of those estimates with the gamma moments.
  2. What is currently being done in the package (i.e. same as above, but the MTR coefficients are estimated using gmm)
  3. Your code.

The treatment effect estimates from (1) and (3) always match.
However, the TE estimate from (2) is different, since the MTR coefficient estimates don't match with (1).
(Note: Convergence code = 0 means that the optimization algorithm indeed converged)

> r <- do.call(ivmte, args)

LP solver: Gurobi ('gurobi')

Obtaining propensity scores...
Generating target moments...
    Integrating terms for control group...
    Integrating terms for treated group...
Generating IV-like moments...
    Moment 1...
    Moment 2...
    Moment 3...
    Moment 4...

[1] "OLS MTR COEFFICIENTS"
          [,1]      [,2]      [,3]      [,4]
[1,] 0.5145152 0.1017559 0.4160259 0.1429137

[1] "TE ESTIMATE USING OLS COEFFICIENTS"
[1] -0.09160436


[1] "GMM MTR COEFFICIENT ESTIMATES"
Method
 One step GMM with W = identity 

Objective function value:  3.663935e-08 

Theta[1]  Theta[2]  Theta[3]  Theta[4]  
 0.51473   0.10135   0.41630   0.14197  

Convergence code =  0 


[1] "TE ESTIMATE USING GMM COEFFICIENTS"
[1] -0.09162913


Point estimate of the target parameter: -0.09162913


[1] "AT's code:"
[1] -0.09160436

Bootstrap iteration 1...

[1] "OLS MTR COEFFICIENTS"
          [,1]       [,2]     [,3]      [,4]
[1,] 0.5331028 0.07443749 0.405718 0.2137902

[1] "TE ESTIMATE USING OLS COEFFICIENTS"
[1] -0.1040566


[1] "GMM MTR COEFFICIENT ESTIMATES"
Method
 One step GMM with W = identity 

Objective function value:  6.744737e-06 

Theta[1]  Theta[2]  Theta[3]  Theta[4]  
0.567508  0.020094  0.432941  0.030981  

Convergence code =  0 


[1] "TE ESTIMATE USING GMM COEFFICIENTS"
[1] -0.1327445


[1] "AT's code:"
[1] -0.1040566

    Point estimate:-0.1327445

Bootstrap iteration 2...

[1] "OLS MTR COEFFICIENTS"
          [,1]        [,2]      [,3]      [,4]
[1,] 0.6234279 -0.06156888 0.3655344 0.4308433

[1] "TE ESTIMATE USING OLS COEFFICIENTS"
[1] -0.1752616


[1] "GMM MTR COEFFICIENT ESTIMATES"
Method
 One step GMM with W = identity 

Objective function value:  6.537698e-06 

Theta[1]  Theta[2]  Theta[3]  Theta[4]  
0.571073  0.015019  0.434586  0.023184  

Convergence code =  0 


[1] "TE ESTIMATE USING GMM COEFFICIENTS"
[1] -0.1351167


[1] "AT's code:"
[1] -0.1752616

    Point estimate:-0.1351167

Bootstrap iteration 3...

[1] "OLS MTR COEFFICIENTS"
          [,1]      [,2]      [,3]      [,4]
[1,] 0.4166834 0.2471149 0.4151487 0.1530698

[1] "TE ESTIMATE USING OLS COEFFICIENTS"
[1] -0.01721885


[1] "GMM MTR COEFFICIENT ESTIMATES"
Method
 One step GMM with W = identity 

Objective function value:  5.652358e-06 

Theta[1]  Theta[2]  Theta[3]  Theta[4]  
0.563034  0.026957  0.431246  0.041694  

Convergence code =  0 


[1] "TE ESTIMATE USING GMM COEFFICIENTS"
[1] -0.1293295


[1] "AT's code:"
[1] -0.01721885

    Point estimate:-0.1293295

Since we're not using the gmm package to do inference, can we not just use OLS to obtain the MTR coefficients under identity weighting?
If the user decides to use efficient weighting, then we can call on gmm.
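To sketch the idea (hypothetical names: Gbar and ySbar are the sample analogues of E[G] and E[Y*S], so the sample moments are ySbar - Gbar %*% theta):

# One-step GMM with W = identity minimizes ||ySbar - Gbar %*% theta||^2,
# which is exactly least squares of the sample moments on Gbar
theta_hat <- lm.fit(x = Gbar, y = ySbar)$coefficients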

@a-torgovitsky (Collaborator, Author)

Ok, great work finding the problem!
This could explain the bimodality, since it might be sensitive to starting values.

But it's really concerning that gmm can't solve this problem.
It's a linear moment condition model! Couldn't be simpler!

I think this means we need to just ditch gmm completely, unfortunately.
If it can't do this, then we can't rely on it for anything.

Could you write me a PDF of how you are setting this up as OLS?
Then I will think about whether it is possible to implement efficient weighting by doing a weighted OLS...I think it should be.

My recollection (and correct me if I am wrong) was that initially we tried to do the point identified case by hand and ran into some problems. Do you remember what those problems were?

@a-torgovitsky (Collaborator, Author)

I guess the issue is the overidentified case...we won't be able to set that up as OLS.
But maybe we should take another stab at implementing the linear GMM ourselves.
It should just work out to some matrix algebra.

@a-torgovitsky a-torgovitsky added the bug Something isn't working label Oct 28, 2019
@a-torgovitsky (Collaborator, Author)

Here's the GMM formulation
I think I did it correctly
gmm-pointid.pdf

@jkcshea (Owner) commented Oct 28, 2019

Sorry for being slow on this.
But yes, this matches with what I have.
I'm just confused about what went wrong last time, though...

@a-torgovitsky (Collaborator, Author)

No worries
Glad we both reached the same conclusion independently

I do not remember what went wrong last time...but presumably it had to do with inverting something.
So the first thing to check is whether the sample analogue of E[G]'E[G] is invertible.
E[G] is a d_{S} \times d_{\theta} dimensional matrix...so E[G]'E[G] should be invertible as long as we have at least d_{\theta} linearly independent moments --- same intuition as usual.

After that, the next culprit is the efficient weighting matrix. But again, I think that should be fine since it's constructed by taking the average of an outer product.

So I don't really see what the problem was from the first try.
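For reference, the closed form we would be implementing is only a few lines (a minimal sketch with hypothetical names: Gbar is the d_S x d_theta sample analogue of E[G], ySbar the sample analogue of E[YS], and W the weighting matrix, identity for the one-step estimator):

linearGMM <- function(Gbar, ySbar, W = diag(nrow(Gbar))) {
    A <- t(Gbar) %*% W %*% Gbar                # sample analogue of E[G]'WE[G]
    stopifnot(rcond(A) > .Machine$double.eps)  # the invertibility check above
    solve(A, t(Gbar) %*% W %*% ySbar)          # theta-hat
}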

@jkcshea (Owner) commented Oct 28, 2019

I have it set up for the specific example above, and everything runs just fine.
In fact, it runs much faster than I remember.
Note that I have to loop through all 209,000 observations in the AE data to construct the G matrices in your notes.
I remember having to do this in the previous attempt, and that really slowed things down.
Yet, this new GMM procedure takes less than 4 seconds.
In contrast, the gmm package takes 13 seconds.

For my own peace of mind, I'm going to try to figure out what went wrong previously.
Perhaps I'm still doing something wrong here, or I just did something really wrong in the first attempt.

@jkcshea (Owner) commented Oct 28, 2019

Okay, we were running into problems in the past because we weren't running GMM, but FGLS.

This comment also indicates I was inverting a matrix for every observation, which explains why it previously took so long.
And then there were a bunch of very strange issues regarding tolerances and positive definiteness...

Fortunately, it doesn't look like we have that anymore.
Thanks for taking care of this.
I'll clean up the code and see how the bootstrap distributions look with this change.

@a-torgovitsky (Collaborator, Author)

Ok, great.

The G matrices are basically the same as the "gamma" terms in the linear programming formulation, aren't they? So it shouldn't take long to construct them.

What about the optimally weighted version? Does that work too?

@jkcshea (Owner) commented Oct 29, 2019

Yep, everything looks to be working.
With the large AE data set, it takes some time, though.
The confidence intervals are much more symmetric now.

--------------------------------------------------
Results 
--------------------------------------------------

Point estimate of the target parameter: -0.09143571

Number of bootstraps: 50

Bootstrapped confidence intervals (nonparametric):
    90%: [-0.15318, -0.01884546]
    95%: [-0.1619731, -0.01628884]
    99%: [-0.1815772, -0.01391194]
p-value: 0

Bootstrapped J-test p-value: 0

The bootstrap distribution certainly looks more normal, but with only 50 draws it's hard to be sure.
I'll run a bigger test tomorrow to make sure it's okay.
I'll also check the run time.

I'm not quite sure of the best way to make sure the bootstrapped J-test is working properly.
Hall and Horowitz state that the bootstrap J statistic isn't asymptotically chi-squared, but should the distribution of the J statistics at least have a similar shape?

@a-torgovitsky (Collaborator, Author)

Why does it take a long time to run?
It involves the same objects as the partially identified case, right?
So instead of solving an LP we just do some matrix algebra...should be faster, shouldn't it?

The bootstrapped J-test can be implemented the same way as what we talked about before in #66.
If you want to verify whether it is working correctly, a Monte Carlo is the way to do it.
Draw data from a correctly-specified but overidentified model.
For each data draw, do the bootstrapped J-test.
If everything is working, then the proportion of data draws for which the J-test rejects at the 5% level should be roughly 5%.
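In pseudocode (sim_data and boot_jtest are hypothetical stand-ins for drawing one sample from the DGP and for running the bootstrapped J-test and returning its p-value):

M <- 2500                               # number of simulated data sets
reject <- logical(M)
for (m in seq_len(M)) {
    dt <- sim_data(n = 5000)            # correctly-specified, overidentified DGP
    reject[m] <- boot_jtest(dt) < 0.05  # reject at the 5% level?
}
mean(reject)                            # should be roughly 0.05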

@jkcshea (Owner) commented Oct 29, 2019

Ah, I shouldn't have said it took a long time to run.
It was late, and I was being impatient.
For a sample of 200,000, the two-step GMM takes about 5 seconds to complete.
For 200 bootstraps, the full estimation procedure ran for 32 minutes.
This is still much better than the gmm package.

Here's the distribution of point estimates:
[histogram: bootstrap distribution of point estimates]
Looks fairly normal, although there is a slight negative skew...

A little more about the run time:
Most of the time is actually spent getting the matrices into the right shape (not calculating the entries, since that was all done beforehand).
For example, if I instead use a random subsample of 10,000 observations, all 200 bootstraps can be completed in just over a minute.


Sure thing, I'll set up the Monte Carlo.

@a-torgovitsky (Collaborator, Author)

Hm, why does shaping the matrices take time? Is that some R quirk?
Does multiplying the matrices take a lot of time?

@jkcshea (Owner) commented Oct 29, 2019

Hm, why does shaping the matrices take time? Is that some R quirk?

I would say the quirk is me, i.e. inefficient code.
I had done everything with loops, as I didn't think I could do it with matrix multiplication.
Not surprisingly, I was very wrong.

I've rewritten the code so the matrices are now shaped via matrix multiplication.
Constructing the average G matrices for a sample of 200,000 is about 16x faster than before.
The run time for 200 bootstraps has fallen to 14 minutes.
Most of this time is spent on the other parts of the estimation procedure.
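To illustrate the kind of rewrite (a hedged sketch; gmat and s are hypothetical names for an n x d matrix of per-observation terms and an n-vector of weights):

# Loop version: accumulate the weighted rows one at a time
avg_loop <- rep(0, ncol(gmat))
for (i in seq_len(nrow(gmat))) avg_loop <- avg_loop + s[i] * gmat[i, ]
avg_loop <- avg_loop / nrow(gmat)

# Matrix version: the same sum collapses into a single product
avg_fast <- drop(crossprod(gmat, s)) / nrow(gmat)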

@a-torgovitsky (Collaborator, Author)

Ok, that makes a lot of sense.
I would expect that one GMM run shouldn't take more time than one LP run, and probably quite a bit less.

@jkcshea (Owner) commented Oct 31, 2019

The Monte Carlo simulations seem to check out.
However, we need rather large samples before getting close to the asymptotic distributions.

For N = 1000, we reject 1% of the time at the alpha = 0.05 level.
For N = 2000, we reject at 3%.
For N = 5000, we're at 4%.
For N = 10,000, we're at 4.6%.

And for added detail: each simulation used 2,500 iterations and 500 bootstraps per iteration.
The models were relatively simple, too.

args <- list(data = dt,
             target = "att",
             m0 = ~ 1,
             m1 = ~ 1,
             propensity = d ~ 1 + factor(z),
             link = "linear",
             ivlike = c(y ~ d,
                        y ~ d | z,
                        y ~ d | factor(z),
                        y ~ z,
                        y ~ factor(z)),
             point = TRUE,
             noisy = FALSE,
             bootstraps = bootstraps)

@a-torgovitsky (Collaborator, Author)

Hmm, seems strange but could be the case.
Can you give me some more detail on how you're doing the test?
(w/ the recentering, etc...just to make sure we are on the same page)

@a-torgovitsky (Collaborator, Author)

Oh, and how many support points does z have in this example?

@jkcshea (Owner) commented Oct 31, 2019

Oh, and how many support points does z have in this example?

Ah, just 2.
The factor was left over from other DGPs I tried.

Can you give me some more detail on how you're doing the test?

I just realized I forgot to set point.eyeweight = FALSE...
Putting that in now, I'm getting errors when inverting the weight matrix.
I can investigate this, as well as come up with another DGP where two-step GMM works.
For the AE data, two-step GMM ran just fine, though.

Nevertheless, I follow #66:

  1. First, I perform the estimation procedure as usual, and save the J statistic.
    I also save the sample moment condition evaluated at the optimal solution---call that $\bar{g}$.
  2. A new bootstrap sample is then drawn, and the estimation procedure is re-run.
    However, at the GMM stage, I subtract off $\bar{g}$ from the moment condition, and estimate accordingly.
    Had I set point.eyeweight = FALSE, then the weight matrix would have been constructed using the centered GMM estimate.
  3. I save the J statistic based on the centered moment conditions, which will be used to construct the null distribution of the J statistic.
  4. Repeat steps 2 and 3 for B = 500 times.
  5. Once all the bootstraps are complete, I compare the J statistic from the original sample against the null distribution obtained via the bootstrap.
    I reject at the alpha = 0.05 level if the p-value is less than 0.05, i.e. if the J statistic from the original sample lies above the 1 - alpha quantile of the null distribution.
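In outline (a sketch of steps 1 through 5; gmm_fit is a hypothetical helper that runs the GMM stage on a data set and returns the J statistic and the sample moments gbar at the solution, with an optional recentering vector):

orig <- gmm_fit(data)                         # step 1
Jboot <- numeric(B)                           # B = 500
for (b in seq_len(B)) {                       # steps 2 to 4
    bdata <- data[sample(nrow(data), replace = TRUE), ]
    Jboot[b] <- gmm_fit(bdata, center = orig$gbar)$J
}
pval <- mean(Jboot >= orig$J)                 # step 5
reject <- pval < 0.05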

@a-torgovitsky (Collaborator, Author)

Ok, but you should definitely investigate why the two-step doesn't work in this DGP. It should work generally, so this may be a problem with how it is implemented.

I'll expand my note to cover the bootstrap procedure so we can be sure we are on the same page. I think now that we are solving this explicitly, we might as well use the recentered criterion for the estimator as well. Will try to put that up later today.

@jkcshea (Owner) commented Oct 31, 2019

If I understand things correctly, the problem with inverting matrices stems from collinear S-functions.
In case more explanation is required, I thought it'd be easier to describe with a short document.
gmm-collinearity.pdf

We can probably remove all the redundant components for the user, and issue a message informing them of this.

@a-torgovitsky (Collaborator, Author)

Thanks for the document -- that's perfect.

The structure of what you have written is like an IV model with S as a vector of instruments for \tilde{G}.
So, the problem you are pointing out should be the same as saying that E[SS'] is not invertible, i.e. perfect multicollinearity among the instruments.

Is there a way to extend that logic to specifications with D in them?

I had tried last weekend to set up the GMM problem like an IV problem, but couldn't figure out how to accommodate such specifications. Would be great if that were possible, since then we could just have one option that does TSLS with S as the instrument for G. We could use the AER package for that and life would be much easier.

Regarding your example above, I guess the solution then is just to change ivlike so that it has one fully saturated specification?

@jkcshea (Owner) commented Oct 31, 2019

Is there a way to extend that logic to specifications with D in them?

I was trying to do this, but I don't think it's possible.
My reasoning is that, in order to do so, we must be able to factor the vector $S_i$ out of $Y_i S_i - G_i \theta$.
But suppose $S_i$ is a function of $D_i$.
$Y_i S_i$ will always include $S_i$ evaluated at the realized value of $D_i$.
Whereas $G_i$ will always include $S_i$ evaluated at both $D = 0$ and $D = 1$.
Consequently, we can't factor $S_i$ out so long as it varies with $D_i$.

Regarding your example above, I guess the solution then is just to change ivlike so that it has one fully saturated specification?

Yep, that did the trick; another simulation with two-step GMM is running now.

@a-torgovitsky (Collaborator, Author)

That was my reasoning too.
One way we could potentially do it is by considering a system of moments E[YDS] and E[Y(1-D)S], then dropping E[YS]. But I don't know if it's a good idea to change at this point.
Plus, then we would have a system IV problem, so we wouldn't get the benefits of being able to apply AER.

Anyway, what is your proposal for checking whether the moments are redundant?
Is that possible?

Also, here is an expanded writeup that works out the bootstrap estimator and the J-test. You should check it before implementing to make sure it makes sense (and that I didn't make an algebra mistake)!

gmm-pointid.pdf

@jkcshea (Owner) commented Oct 31, 2019

Plus, then we would have a system IV problem, so we wouldn't get the benefits of being able to apply AER.

Yes I agree.

Anyway, what is your proposal for checking whether the moments are redundant?
Is that possible?

Perhaps I'm wrong, but I was thinking of just checking for collinearity when constructing the weight matrix.
In that same document, each column of the $H(\theta)$ matrix corresponds to a particular moment/component from an IV-like specification.
If that $H(\theta)$ matrix is not full column rank, then that would imply some of the moments are linearly dependent.
I could then use R's qr function to determine which columns (moments) to drop.
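In R, that check could look like the following (a sketch, assuming H holds the evaluated moment function with one column per moment/component):

qrH <- qr(H)
keep <- sort(qrH$pivot[seq_len(qrH$rank)])  # a maximal linearly independent set
dropped <- setdiff(seq_len(ncol(H)), keep)  # redundant moments to remove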

@a-torgovitsky (Collaborator, Author)

Wouldn't we want to drop the redundant moments before doing the first step estimation, though? Or this doesn't depend on \theta?

@jkcshea (Owner) commented Oct 31, 2019

Ah you're right, this doesn't depend on theta.
So yes, we can just evaluate the moment function for any theta, and drop the moments before the first step estimation.

@a-torgovitsky (Collaborator, Author)

Or better yet, decompose H and look at what part is the problem
Should be G'G right?

@jkcshea (Owner) commented Nov 1, 2019

Or better yet, decompose H and look at what part is the problem

Yes, I think that's nice, so I gave it a shot.

Should be G'G right?

Hm, I thought I needed H'H, since I care about the errors?

But currently, the function does the following.

  1. It constructs the Y*S and G matrices.
  2. It generates a random \theta to test for collinearity in Y*S - G \theta.
    [Edit:] I do an eigen decomposition.
  3. It removes the redundant moments.
  4. Then GMM is performed with the remaining moments.

The output looks as follows:

LP solver: Gurobi ('gurobi')

Obtaining propensity scores...
Generating target moments...
    Integrating terms for control group...
    Integrating terms for treated group...
Generating IV-like moments...
    Moment 1...
    Moment 2...
    Moment 3...
    Moment 4...
    Moment 5...
    Moment 6...
    Moment 7...
    Moment 8...
    Moment 9...

Point estimate of the target parameter: 6.007976

Warning message:
The following components have been dropped due to collinearity: 
 IV-like Spec.     Component
 ----------------------------------------------
 2                 factor(z)2 
 3                 (Intercept) 
 3                 d 
 4                 (Intercept) 
 4                 d 
 4                 z 

And the simulation results suggest the bootstrapped J-test for GMM (now using optimal weighting) is working as it should, although we still need large samples for the rejection rates to look correct.
For n = 5,000, at alpha = 0.05, the rejection rate is 4.4%.
For n = 10,000, it is at 5%.

jkcshea added a commit that referenced this issue Nov 1, 2019
…oment conditions. However, the bootstrap is now failing when dropping moments---this is being investigated.
@a-torgovitsky (Collaborator, Author)

Yeah I guess you do need the whole matrix H.
Ok, this looks great then.

And just to check -- now we are using the recentered moments also for computing the bootstrapped estimates, right?

I will try to play around with this in simulations soon just to make sure I can't break it.

@jkcshea (Owner) commented Nov 1, 2019

And just to check -- now we are using the recentered moments also for computing the bootstrapped estimates, right?

Ah, sorry, not yet; I had forgotten about that. I will put that in.
The program currently does everything in your extended notes, except that the bootstrapped \theta estimates do not use the centered moments.
The centered moments are only used for the J-test.

But I am curious: I thought recentering was not necessary for the parameter estimates, but only for the J-test.
Or perhaps this is simply to avoid having to perform GMM twice (once for non-centered moments, once for centered)?

I also suspect I'm being too aggressive when dropping redundant moments...
I will check this this morning and let you know once everything is ready.

@jkcshea (Owner) commented Nov 1, 2019

Okay, everything has been implemented.
I reran the Monte Carlos for identity weighting and efficient weighting, and everything looks as it should.

@a-torgovitsky (Collaborator, Author)

Yes, it's true that it is not necessary to recenter the moments for the bootstrapped parameter estimates to be consistent.
But it's easy to do now that we worked this out from scratch, so might as well. It will probably work better in finite samples.

@a-torgovitsky (Collaborator, Author)

Haven't been able to break this, so let's call it done!
