Iterated model building using formulas #51

Closed
mschubert opened this issue Jul 27, 2015 · 1 comment

@mschubert
Owner

Consider the following setup of A, B, and C, to which you want to apply a model formula (e.g. lm(...), rlm(...), etc.). lm() accepts a matrix response and fits one model per column, but most other modelling functions do not. The proposed syntax would add iterated model fitting for functions that do not support this natively, together with a common output format.

```r
A = matrix(1:4, nrow=2, ncol=2, dimnames=list(c('a','b'),c('x','y')))
B = matrix(5:6, nrow=2, ncol=1, dimnames=list(c('b','a'),'z'))
C = matrix(4:5, nrow=2, ncol=2)
# (A)   x y    (B)   z    (C)     [,1] [,2]
#     a 1 3        b 5      [1,]    4    4
#     b 2 4        a 6      [2,]    5    5
```

With those, you should be able to write (using lm here as an example):

```r
# multiple outcomes - this works as expected
lm(A ~ B) # calculates lm(A[,1] ~ B), lm(A[,2] ~ B), etc.
# multiple inputs - this does not iterate; lm() fits a single model with
# all columns of A as joint predictors instead
lm(B ~ A) # wanted: lm(B ~ A[,1]), lm(B ~ A[,2]), + the general case
# each should return effect size, p-value, and other metrics
```
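As a rough illustration of the "multiple inputs" case, the sketch below fits one lm() per pair of response and predictor columns and collects the slope estimate and its p-value into a data frame. `iterate_lm()` is a hypothetical helper written for this example, not part of any package. It matches rows by their names, which lm() itself does not do (it pairs rows by position), so B's ('b','a') row order would line up with A's ('a','b'). It also requires at least three shared rows: with only the two rows of A and B above, a simple regression has no residual degrees of freedom left for a p-value.

```r
# Hypothetical helper, not from any package: fit lm(y ~ x) separately for
# every (column of Y, column of X) pair and collect slope and p-value.
iterate_lm <- function(Y, X) {
  Y <- as.matrix(Y)
  X <- as.matrix(X)
  if (is.null(colnames(Y)) || is.null(colnames(X)))
    stop("Y and X need column names")
  # align rows by name; lm() on its own would pair rows by position
  common <- intersect(rownames(Y), rownames(X))
  if (length(common) < 3)
    stop("need at least 3 shared named rows to estimate a p-value")
  # one row per model: every response column crossed with every predictor column
  res <- expand.grid(y = colnames(Y), x = colnames(X), stringsAsFactors = FALSE)
  res$estimate <- NA_real_
  res$p.value <- NA_real_
  for (i in seq_len(nrow(res))) {
    yv <- Y[common, res$y[i]]
    xv <- X[common, res$x[i]]
    ct <- coef(summary(lm(yv ~ xv)))
    # a constant predictor is aliased and has no row in the table; keep NA
    if (!"xv" %in% rownames(ct)) next
    res$estimate[i] <- ct["xv", "Estimate"]
    res$p.value[i] <- ct["xv", "Pr(>|t|)"]
  }
  res
}

# Example with 10 named samples (A and B above have too few rows for p-values)
set.seed(1)
A10 <- matrix(rnorm(20), nrow = 10,
              dimnames = list(paste0("s", 1:10), c("x", "y")))
B10 <- matrix(2 * A10[, "x"] + rnorm(10), ncol = 1,
              dimnames = list(paste0("s", 1:10), "z"))
iterate_lm(B10, A10)  # one row per model: z ~ x and z ~ y
```

Returning a long data frame with one row per model is just one possible "common output format"; it keeps results from different model functions easy to stack.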

Now, if I want a generalized syntax for specifying matrix iterations in models that only work on vectors, which of the following options is best?

  • Option 1: implicitly iterate over all matrices, and specify grouping with additional arguments
  • Option 2: specify grouping using interaction syntax
```r
# example 1: iterate A and C through columns; don't iterate B
x1 = create_formula_index(A ~ B + C) # option 1
x1 = create_formula_index(A ~ B:0 + C:0) # option 2
#   A B C
#1 x z 1
#2 y z 1
#3 x z 2
#4 y z 2

# example 2: iterate A and C together; don't iterate B
x2 = create_formula_index(A ~ B + C, group=c("A", "C"), atomic="B") # option 1
x2 = create_formula_index(A:1 ~ B:0 + C:1) # option 2
#   A B C
#1 x z x
#2 y z y
```
The advantage of option 1 is that it mirrors what lm(...) does, but it can get verbose if you want groups and atomic variables. Option 2 is more verbose when iterating and less verbose when not.

Which option should be preferred?

mschubert changed the title from "RFC: iterated model building using formulas" to "Iterated model building using formulas" on Aug 7, 2015
@mschubert
Owner Author

Replaced by narray::lambda: https://github.com/mschubert/narray#mapping-functions-on-arrays
