Error: cannot allocate vector of size 19.6 Gb #27

Closed
ccmullally opened this issue Dec 27, 2021 · 4 comments

Comments

@ccmullally

I ran into something else. I'm using a data frame with about 60,000 observations and six clusters, and I get the error above when running boottest. boottest in Stata handles the same data set with no issues. I am also able to do the wild cluster bootstrap "by hand" using a foreach loop. I'm happy to share my data and complete code if that would help with the diagnosis.
Here is the offending code:

lm.fit <- lm(price ~  CAXpost + CA + post, data = df, weights = units)

# bootstrap inference 
boot_feols <- boottest(lm.fit, clustid = "market", param = "CAXpost", B = 499, type = 'webb')
@s3alfisc
Owner

s3alfisc commented Dec 27, 2021

Hi @ccmullally,

Thanks for the feedback! I'm sorry about the trouble, and also a little surprised to hear that you run out of memory with six clusters. The main memory constraint is the bootstrap weights matrix v, which has dimension G x (B + 1), and I would not have expected that to cause memory issues when G = 6. How many bootstrap iterations are you running? Can you confirm that the memory problem arises when the weights matrix is created?
The weights are created in boot_algo2() with the following lines:

    v <- wild_draw_fun(n = N_G_bootcluster * (boot_iter + 1))
    dim(v) <- c(N_G_bootcluster, boot_iter + 1)
    v[, 1] <- 1
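
To put the size of v in perspective, here is a minimal sketch of the same three lines with G = 6 and B = 499. webb_draw() is a hypothetical stand-in for the internal wild_draw_fun() (it draws Webb six-point weights) and is not the package's actual function. The last line assumes the "Gb" in the error message means 2^30 bytes and that the vector holds 8-byte doubles.

```r
# Hypothetical stand-in for the internal wild_draw_fun():
# Webb six-point weights, each drawn with probability 1/6.
webb_draw <- function(n) {
  sample(c(-sqrt(1.5), -1, -sqrt(0.5), sqrt(0.5), 1, sqrt(1.5)),
         size = n, replace = TRUE)
}

N_G_bootcluster <- 6   # number of clusters (G)
boot_iter <- 499       # number of bootstrap iterations (B)

v <- webb_draw(n = N_G_bootcluster * (boot_iter + 1))
dim(v) <- c(N_G_bootcluster, boot_iter + 1)
v[, 1] <- 1            # first column holds the original (unperturbed) sample

# Memory taken by v: 6 * 500 doubles, i.e. a few kilobytes.
v_bytes <- as.numeric(object.size(v))
print(dim(v))
print(v_bytes)

# Number of doubles in a 19.6 Gb vector (assumption: Gb = 2^30 bytes).
implied_elements <- 19.6 * 2^30 / 8
print(implied_elements)
```

The weights matrix is several orders of magnitude smaller than the failed allocation, which suggests the large object is created somewhere else.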

Anyway, this is a problem I am aware of, and it is good that you mention it, so it is now back on my to-do list :)

For a quick fix from 'within' R, you could also try wildboottestjlr, which is a wrapper around @droodman's WildBootTests.jl - it follows the same syntax as fwildclusterboot::boottest(), but you would have to install Julia (e.g. via the wildboottestjlr_setup() function).

EDIT
The error probably occurs in the matrix multiplication step, not in the creation of the weights matrix v.

@ccmullally
Author

I was running 499 replications. I will admit that I am a total R rookie (I'm moving my grad class problem sets from Stata to R), so I don't know how to isolate the part of the code where it is breaking. Is there an equivalent to Stata's "trace" in R?

@s3alfisc
Owner

I think the R equivalent should be the traceback() function, see this tutorial here. In your case, the debug() function might be useful too. But maybe it would be easiest if you sent me your data set and code? My email is alexander-fischer1801@t-online.de. Btw, very cool that you are helping your students to learn R! :)
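
As a rough illustration of how to find where a call fails, here is a minimal sketch on a toy example. inner_fn() and outer_fn() are made-up functions that simulate an allocation error; they are not part of fwildclusterboot. In an interactive session you would simply run the failing call and then type traceback(). The code below does the non-interactive equivalent by recording the call stack at the moment of the error.

```r
# Toy functions (hypothetical): the error is raised two levels down.
inner_fn <- function(n) {
  if (n > 1e3) stop("cannot allocate vector (simulated)")
  numeric(n)
}
outer_fn <- function(n) inner_fn(n * 2)

# Record the call stack when the error is signalled, then recover.
calls <- NULL
result <- tryCatch(
  withCallingHandlers(
    outer_fn(1e6),
    error = function(e) calls <<- sys.calls()
  ),
  error = function(e) conditionMessage(e)
)

print(result)  # the error message
print(calls)   # the chain of calls that led to the error

# Interactive alternatives from base R:
#   traceback()          # after an error, print the call stack
#   debugonce(outer_fn)  # step through the next call line by line
#   options(error = recover)  # browse the failing frame on error
```

The recorded stack shows that outer_fn() called inner_fn(), which is where stop() was triggered. With a real package function, the same approach points to the internal function that raised the error.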

s3alfisc added a commit that referenced this issue Dec 28, 2021
@s3alfisc
Owner

Hi, I have now checked your code & data and you have indeed found a bug!

In a nutshell, the error arose due to the use of column labels - I have updated the development version so that the error no longer arises. Can you confirm that the bootstrap now runs without any troubles? :)

In around 30-60 minutes, you should be able to install a compiled version of fwildclusterboot from r-universe by running

# from r-universe (windows & mac, compiled R > 4.0 required)
install.packages('fwildclusterboot', repos ='https://s3alfisc.r-universe.dev')

I will submit the package to CRAN after Jan 3rd when the CRAN team is back from their winter break.

s3alfisc added a commit that referenced this issue Dec 28, 2021
s3alfisc added a commit that referenced this issue Dec 28, 2021
s3alfisc added a commit that referenced this issue Jan 17, 2022
…e specified as vectors but without reference to the input data set, hence not as data$weights. While this is legal for feols(), lm() and felm(), I want to make sure that the weights vector is part of the input data.frame - which is necessary for reasons of data processing
s3alfisc added a commit that referenced this issue Jan 17, 2022
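
The commit message above asks for the weights to be a column of the input data.frame. A minimal sketch of what that looks like, on simulated data that reuses the variable names from the original report (the data and the check itself are assumptions for illustration, not the package's own validation code):

```r
set.seed(1)
n <- 600

# Simulated stand-in for the reporter's data; all values are made up.
df <- data.frame(
  market = sample(1:6, n, replace = TRUE),  # six clusters
  CA     = rbinom(n, 1, 0.5),
  post   = rbinom(n, 1, 0.5),
  units  = runif(n, 1, 10)                  # regression weights
)
df$CAXpost <- df$CA * df$post
df$price   <- 1 + 0.5 * df$CAXpost + rnorm(n)

# Weights are passed as a bare column name that lives in `df`.
lm.fit <- lm(price ~ CAXpost + CA + post, data = df, weights = units)

# Check that the weights expression is a column of the input data.
weights_name    <- deparse(lm.fit$call$weights)
weights_in_data <- weights_name %in% names(df)
print(weights_name)
print(weights_in_data)

# With fwildclusterboot installed, the original call would then be:
# boottest(lm.fit, clustid = "market", param = "CAXpost", B = 499, type = "webb")
```

If the weights were instead passed as a free-standing vector or as df$units, weights_name would not match a column name and the check would fail.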