Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

margins() ignores use of I() inside formula #150

Open
1 of 3 tasks
lang-benjamin opened this issue Jun 21, 2020 · 11 comments
Open
1 of 3 tasks

margins() ignores use of I() inside formula #150

lang-benjamin opened this issue Jun 21, 2020 · 11 comments
Labels

Comments

@lang-benjamin
Copy link

  • a possible bug
  • a question about package functionality
  • a suggested code or documentation change, improvement to the code, or feature request

Using I() inside a formula does not work, margins() ignores it. As an example, let's do a linear regression of mpg vs. hp from the mtcars data and re-scale hp.

library(margins)
# re-scale predictor variable hp manually works
mtcars2 <- mtcars
mtcars2$hp <- mtcars2$hp / 50
margins(lm(mpg ~ hp, data = mtcars2))
#> Average marginal effects
#> lm(formula = mpg ~ hp, data = mtcars2)
#>      hp
#>  -3.411

# re-scale within formula does not work:
# margins() ignores re-scaling via I(hp / 50) and produces same result as in
# margins(lm(mpg ~ hp, data = mtcars))
ame_scaled <- margins(lm(mpg ~ I(hp / 50), data = mtcars))
ame_scaled
#> Average marginal effects
#> lm(formula = mpg ~ I(hp/50), data = mtcars)
#>        hp
#>  -0.06823
all.equal(
  summary(ame_scaled)$AME[[1]],
  summary(margins(lm(mpg ~ hp, data = mtcars)))$AME[[1]]
)
#> [1] TRUE

Created on 2020-06-21 by the reprex package (v0.3.0)

Session info
sessionInfo()
#> R version 4.0.0 (2020-04-24)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS High Sierra 10.13.6
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/de_DE.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] margins_0.3.23
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.4.6      digest_0.6.25     MASS_7.3-51.6     magrittr_1.5     
#>  [5] evaluate_0.14     highr_0.8         rlang_0.4.6       stringi_1.4.6    
#>  [9] data.table_1.12.8 rmarkdown_2.1     prediction_0.3.14 tools_4.0.0      
#> [13] stringr_1.4.0     xfun_0.14         yaml_2.2.1        compiler_4.0.0   
#> [17] htmltools_0.4.0   knitr_1.28
@leeper
Copy link
Owner

leeper commented Jun 22, 2020

This is by design. margins() reports effects at the scale of the original variable, not the term generated by I().

@lang-benjamin
Copy link
Author

Thanks for the quick reply! Let me ask one more question about the general principle behind this. So, how about other transformations like log(hp), stats::poly(hp, 2) or splines::ns(hp)? The first two are not ignored by margins() whereas the last one will again be ignored.

@leeper
Copy link
Owner

leeper commented Jun 24, 2020

Thanks. I'll look into that more - the intention is for it to report based on original variable. If that's not happening, it's probably a bug.

@lang-benjamin
Copy link
Author

Thanks. Sorry again, where is this intention coming from? That essentially means margins does not allow for non-linear modelling/transformation. Are there specific obstacles when calculating marginal effects for such transformations?

@tzoltak
Copy link

tzoltak commented Oct 9, 2020

Obviously when it comes to non-linear transformation marginal effect at the scale of the original variable is no longer constant across different values of the original predictor.

@elbersb
Copy link

elbersb commented Apr 22, 2021

Thanks for the quick reply! Let me ask one more question about the general principle behind this. So, how about other transformations like log(hp), stats::poly(hp, 2) or splines::ns(hp)? The first two are not ignored by margins() whereas the last one will again be ignored.

Just adding a comment here, because I was confused by this issue. In my testing, all of these transformations are reported on the scale of the original predictor (which is also what I would have expected):

library(margins)
margins(lm(mpg ~ wt + I(wt^2), data = mtcars))
#> Average marginal effects
#> lm(formula = mpg ~ wt + I(wt^2), data = mtcars)
#>      wt
#>  -5.845
margins(lm(mpg ~ stats::poly(wt, 2), data = mtcars))
#> Average marginal effects
#> lm(formula = mpg ~ stats::poly(wt, 2), data = mtcars)
#>      wt
#>  -5.845
margins(lm(mpg ~ log(wt), data = mtcars))
#> Average marginal effects
#> lm(formula = mpg ~ log(wt), data = mtcars)
#>      wt
#>  -5.855
margins(lm(mpg ~ splines::ns(wt), data = mtcars))
#> Average marginal effects
#> lm(formula = mpg ~ splines::ns(wt), data = mtcars)
#>      wt
#>  -5.344
margins(lm(mpg ~ splines::ns(wt, 5), data = mtcars))
#> Average marginal effects
#> lm(formula = mpg ~ splines::ns(wt, 5), data = mtcars)
#>      wt
#>  -5.931

Created on 2021-04-22 by the reprex package (v1.0.0)

@tzoltak
Copy link

tzoltak commented Apr 22, 2021

By the way - however perhaps this is not the best place to talk about it - rather unexpected consequence of margins() reporting effects for variables on their original scale is that it is - in some situations - sensitive to change in the data, that was used to estimate the model. See:

library(margins)
# model without within-formula data transformations
m1 <- lm(mpg ~ wt, data = mtcars)
# model with some within-formula data transformation
m2 <- lm(mpg ~ I(wt^2), data = mtcars)
margins(m1)
#> Average marginal effects
#> lm(formula = mpg ~ wt, data = mtcars)
#>      wt
#>  -5.344
margins(m2)
#> Average marginal effects
#> lm(formula = mpg ~ I(wt^2), data = mtcars)
#>      wt
#>  -4.542

# now let's change the dataset
mtcars <- subset(mtcars, carb < 4)

# and let's call margins() on the (unaffected!) models once more

# for model with no  within-formula data transformations we get the same results
margins(m1)
#> Average marginal effects
#> lm(formula = mpg ~ wt, data = mtcars)
#>      wt
#>  -5.344

# but for model with some within-formula data transformations results are different!
margins(m2)
#> Average marginal effects
#> lm(formula = mpg ~ I(wt^2), data = mtcars)
#>      wt
#>  -4.068

Created on 2021-04-22 by the reprex package (v2.0.0)

That's because in the latter model there is no way to get the original variable other than turn back to the dataset (there is no untransformed data in the a model object) and consequently margins() works on the dataset in its current state. On the other hand, with no within-formula data transformation margins() can rely on a model frame that is stored in a model object itself to perform all the computations, so these results stands unaffected.

@elbersb
Copy link

elbersb commented Apr 23, 2021

That's an interesting find, and I think it should be included in the documentation somewhere, if it isn't yet.

For the specific case you describe (subsetting), margins could even figure out that the data was changed, because the design matrix has fewer rows than the updated mtcars data. I think that margins should throw an error in that case. Of course, it would not be possible to detect other changes in the original data.

I think it would be worth creating a new issue for this problem.

@tzoltak
Copy link

tzoltak commented Apr 23, 2021

It should be possible to detect any changes (that affect results) by comparing model frame from the model object with model frame constructed using data object in its current state and the model call. However detecting such changes perhaps should result in warning instead of error.

@elbersb
Copy link

elbersb commented Apr 26, 2021

Yes, that would be a pretty expensive operation for large models, though, and you would have to do it for every margins() call. I definitely think that this should be addressed in some way.

@lang-benjamin
Copy link
Author

Just adding a comment here, because I was confused by this issue. In my testing, all of these transformations are reported on the scale of the original predictor (which is also what I would have expected):
...

Thanks for the answer! I realize that I misunderstood the initial answer. I thought that the intention was to ignore usage of I() or any other transformation. It is however not ignored, just the final reporting is then in terms of the original variable (which makes sense).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants