margins() ignores use of I() inside formula #150

lang-benjamin · 2020-06-21T11:28:36Z

a possible bug
a question about package functionality
a suggested code or documentation change, improvement to the code, or feature request

Using I() inside a formula does not work, margins() ignores it. As an example, let's do a linear regression of mpg vs. hp from the mtcars data and re-scale hp.

library(margins)
# re-scale predictor variable hp manually works
mtcars2 <- mtcars
mtcars2$hp <- mtcars2$hp / 50
margins(lm(mpg ~ hp, data = mtcars2))
#> Average marginal effects
#> lm(formula = mpg ~ hp, data = mtcars2)
#>      hp
#>  -3.411

# re-scale within formula does not work:
# margins() ignores re-scaling via I(hp / 50) and produces same result as in
# margins(lm(mpg ~ hp, data = mtcars))
ame_scaled <- margins(lm(mpg ~ I(hp / 50), data = mtcars))
ame_scaled
#> Average marginal effects
#> lm(formula = mpg ~ I(hp/50), data = mtcars)
#>        hp
#>  -0.06823
all.equal(
  summary(ame_scaled)$AME[[1]],
  summary(margins(lm(mpg ~ hp, data = mtcars)))$AME[[1]]
)
#> [1] TRUE

^{Created on 2020-06-21 by the reprex package (v0.3.0)}

Session info

sessionInfo()
#> R version 4.0.0 (2020-04-24)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS High Sierra 10.13.6
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/de_DE.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] margins_0.3.23
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.4.6      digest_0.6.25     MASS_7.3-51.6     magrittr_1.5     
#>  [5] evaluate_0.14     highr_0.8         rlang_0.4.6       stringi_1.4.6    
#>  [9] data.table_1.12.8 rmarkdown_2.1     prediction_0.3.14 tools_4.0.0      
#> [13] stringr_1.4.0     xfun_0.14         yaml_2.2.1        compiler_4.0.0   
#> [17] htmltools_0.4.0   knitr_1.28

The text was updated successfully, but these errors were encountered:

leeper · 2020-06-22T13:40:09Z

This is by design. margins() reports effects at the scale of the original variable, not the term generated by I().

lang-benjamin · 2020-06-22T18:47:40Z

Thanks for the quick reply! Let me ask one more question about the general principle behind this. So, how about other transformations like log(hp), stats::poly(hp, 2) or splines::ns(hp)? The first two are not ignored by margins() whereas the last one will again be ignored.

leeper · 2020-06-24T08:05:40Z

Thanks. I'll look into that more - the intention is for it to report based on original variable. If that's not happening, it's probably a bug.

lang-benjamin · 2020-07-07T14:41:07Z

Thanks. Sorry again, where is this intention coming from? That essentially means margins does not allow for non-linear modelling/transformation. Are there specific obstacles when calculating marginal effects for such transformations?

tzoltak · 2020-10-09T21:43:14Z

Obviously when it comes to non-linear transformation marginal effect at the scale of the original variable is no longer constant across different values of the original predictor.

elbersb · 2021-04-22T09:00:27Z

Thanks for the quick reply! Let me ask one more question about the general principle behind this. So, how about other transformations like log(hp), stats::poly(hp, 2) or splines::ns(hp)? The first two are not ignored by margins() whereas the last one will again be ignored.

Just adding a comment here, because I was confused by this issue. In my testing, all of these transformations are reported on the scale of the original predictor (which is also what I would have expected):

library(margins)
margins(lm(mpg ~ wt + I(wt^2), data = mtcars))
#> Average marginal effects
#> lm(formula = mpg ~ wt + I(wt^2), data = mtcars)
#>      wt
#>  -5.845
margins(lm(mpg ~ stats::poly(wt, 2), data = mtcars))
#> Average marginal effects
#> lm(formula = mpg ~ stats::poly(wt, 2), data = mtcars)
#>      wt
#>  -5.845
margins(lm(mpg ~ log(wt), data = mtcars))
#> Average marginal effects
#> lm(formula = mpg ~ log(wt), data = mtcars)
#>      wt
#>  -5.855
margins(lm(mpg ~ splines::ns(wt), data = mtcars))
#> Average marginal effects
#> lm(formula = mpg ~ splines::ns(wt), data = mtcars)
#>      wt
#>  -5.344
margins(lm(mpg ~ splines::ns(wt, 5), data = mtcars))
#> Average marginal effects
#> lm(formula = mpg ~ splines::ns(wt, 5), data = mtcars)
#>      wt
#>  -5.931

^{Created on 2021-04-22 by the reprex package (v1.0.0)}

tzoltak · 2021-04-22T20:18:15Z

By the way - however perhaps this is not the best place to talk about it - rather unexpected consequence of margins() reporting effects for variables on their original scale is that it is - in some situations - sensitive to change in the data, that was used to estimate the model. See:

library(margins)
# model without within-formula data transformations
m1 <- lm(mpg ~ wt, data = mtcars)
# model with some within-formula data transformation
m2 <- lm(mpg ~ I(wt^2), data = mtcars)
margins(m1)
#> Average marginal effects
#> lm(formula = mpg ~ wt, data = mtcars)
#>      wt
#>  -5.344
margins(m2)
#> Average marginal effects
#> lm(formula = mpg ~ I(wt^2), data = mtcars)
#>      wt
#>  -4.542

# now let's change the dataset
mtcars <- subset(mtcars, carb < 4)

# and let's call margins() on the (unaffected!) models once more

# for model with no  within-formula data transformations we get the same results
margins(m1)
#> Average marginal effects
#> lm(formula = mpg ~ wt, data = mtcars)
#>      wt
#>  -5.344

# but for model with some within-formula data transformations results are different!
margins(m2)
#> Average marginal effects
#> lm(formula = mpg ~ I(wt^2), data = mtcars)
#>      wt
#>  -4.068

^{Created on 2021-04-22 by the reprex package (v2.0.0)}

That's because in the latter model there is no way to get the original variable other than turn back to the dataset (there is no untransformed data in the a model object) and consequently margins() works on the dataset in its current state. On the other hand, with no within-formula data transformation margins() can rely on a model frame that is stored in a model object itself to perform all the computations, so these results stands unaffected.

elbersb · 2021-04-23T07:01:17Z

That's an interesting find, and I think it should be included in the documentation somewhere, if it isn't yet.

For the specific case you describe (subsetting), margins could even figure out that the data was changed, because the design matrix has fewer rows than the updated mtcars data. I think that margins should throw an error in that case. Of course, it would not be possible to detect other changes in the original data.

I think it would be worth creating a new issue for this problem.

tzoltak · 2021-04-23T18:58:10Z

It should be possible to detect any changes (that affect results) by comparing model frame from the model object with model frame constructed using data object in its current state and the model call. However detecting such changes perhaps should result in warning instead of error.

elbersb · 2021-04-26T07:15:01Z

Yes, that would be a pretty expensive operation for large models, though, and you would have to do it for every margins() call. I definitely think that this should be addressed in some way.

lang-benjamin · 2021-05-13T05:54:20Z

Just adding a comment here, because I was confused by this issue. In my testing, all of these transformations are reported on the scale of the original predictor (which is also what I would have expected):
...

Thanks for the answer! I realize that I misunderstood the initial answer. I thought that the intention was to ignore usage of I() or any other transformation. It is however not ignored, just the final reporting is then in terms of the original variable (which makes sense).

leeper added the question label Jun 22, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

margins() ignores use of I() inside formula #150

margins() ignores use of I() inside formula #150

lang-benjamin commented Jun 21, 2020

leeper commented Jun 22, 2020

lang-benjamin commented Jun 22, 2020

leeper commented Jun 24, 2020

lang-benjamin commented Jul 7, 2020

tzoltak commented Oct 9, 2020

elbersb commented Apr 22, 2021

tzoltak commented Apr 22, 2021

elbersb commented Apr 23, 2021

tzoltak commented Apr 23, 2021

elbersb commented Apr 26, 2021

lang-benjamin commented May 13, 2021

margins() ignores use of I() inside formula #150

margins() ignores use of I() inside formula #150

Comments

lang-benjamin commented Jun 21, 2020

leeper commented Jun 22, 2020

lang-benjamin commented Jun 22, 2020

leeper commented Jun 24, 2020

lang-benjamin commented Jul 7, 2020

tzoltak commented Oct 9, 2020

elbersb commented Apr 22, 2021

tzoltak commented Apr 22, 2021

elbersb commented Apr 23, 2021

tzoltak commented Apr 23, 2021

elbersb commented Apr 26, 2021

lang-benjamin commented May 13, 2021