Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mfx for a subset of regressors #65

Closed
vincentarelbundock opened this issue May 30, 2017 · 3 comments
Closed

mfx for a subset of regressors #65

vincentarelbundock opened this issue May 30, 2017 · 3 comments

Comments

@vincentarelbundock
Copy link
Contributor

When a model has a lot of regressors (e.g., a within-model specified using a long list of dummies), margins can take a while to compute because the function calculates quantities for all the terms in the model. Often, the user only cares about a subset of those terms.

A simple way to speed things up would thus be to allow the user to specify a subset of variables for which they want to compute marginal effects and variances, and to ignore the rest.

Looking at your code, it seems like adding that option would be pretty easy. A super simple approach could look like this: https://github.com/vincentarelbundock/margins/commit/137b8bbc6800d6e6cbf56c29dc628e35dbccadbc

With my fork installed, I get the following speedup:

> library(margins)
> library(microbenchmark)
> N = 1000
> dat = data.frame('x' = rnorm(N),
+                  'y' = sample(0:1, N, replace = TRUE),
+                  'w' = rnorm(N),
+                  'z' = sample(letters[1:10], N, replace = TRUE))
> mod = glm(y ~ x + w + z, dat, family = binomial())
> margins_slow = function() margins::margins(mod)
> margins_fast = function() margins::margins(mod, variables = 'x')
> microbenchmark(margins_slow(), margins_fast(), times = 3)
Unit: milliseconds
           expr      min       lq     mean   median       uq      max neval
 margins_slow() 795.1751 805.3067 813.4454 815.4383 822.5805 829.7228     3
 margins_fast() 200.0212 201.4152 202.0093 202.8093 203.0033 203.1974     3

Is this something you would want to implement? If so, what would you like to see in a PR?

@vincentarelbundock
Copy link
Contributor Author

PS: my understanding is that variances and unit_ses are calculated by looking at the names of the marginal effects data frame, so subsetting at the mfx level propagates further when users ask for variances or unit ses.

@leeper
Copy link
Owner

leeper commented May 30, 2017

Nice idea. Yes, want to send a PR? If so, can you move the variables argument to right after data rather than at the end of the current options?

@leeper
Copy link
Owner

leeper commented May 31, 2017

Thanks for the PR. It looks good. I'm making some tweaks for consistency and to fix the failing tests but I'll be merging it shortly.

@leeper leeper closed this as completed in 98d22a3 May 31, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants