Target gets duplicated depending on transformation input #1010

Closed
bart1 opened this issue Sep 13, 2019 · 6 comments

@bart1

bart1 commented Sep 13, 2019

Sorry for reporting all these strange edge cases. It took me a while to reduce this plan as far as possible from the large plan of the original analysis. The bug does not seem to reproduce if the targets br or dataTrainList are left out, even though they appear to be independent of dataTestList. Depending on the input to the crossing in data, the target dataTestList gets duplicated. The duplication seems to occur only if the second element of cvo can be interpreted as a numeric (see the first two examples below). In all of the cases below I expected only two dataTestList targets to be generated.

require(drake)
#> Loading required package: drake
p <- drake_plan(trace = TRUE, tidy_eval = TRUE, transform = FALSE,
  data = target(
    command = crossValOmit(radar, crossValOmission),
    transform = cross(radar=!!'dd',
      crossValOmission = !!cvo,
      .id = c(radar, crossValOmission)
    )
  ),
  br = target(
    command = annotate_model(data),
    transform = combine(data, 
                        .by = data)
  ),
  b = target(
    command = list(crossValId, data),
    transform = cross(data,
      crossValId = !!1,
      .id = c(radar, crossValOmission, crossValId)
    )
  ),
  a = target(
    command = list(b),
    transform = combine(b, .by = data)
  ),
  dataTrainList = target(
    command = list2(a, data),
    transform = map(a, data,
      .id = c(crossValOmission, radar)
    )
  ),
  dataTestList = target(
    command = list(a, data),
    transform = map(a, data,
      .id = c(crossValOmission, radar)
    )
  )
)
cvo <- c("a3", "2")
grep("dataTestList", transform_plan(p)$target, value = TRUE)
#> [1] "dataTestList_2_dd"    "dataTestList_a3_dd"   "dataTestList_2_dd_2" 
#> [4] "dataTestList_a3_dd_2"
cvo <- c("3", "2")
grep("dataTestList", transform_plan(p)$target, value = TRUE)
#> [1] "dataTestList_2_dd"   "dataTestList_3_dd"   "dataTestList_2_dd_2"
#> [4] "dataTestList_3_dd_2"
cvo <- c("3", "a2")
grep("dataTestList", transform_plan(p)$target, value = TRUE)
#> [1] "dataTestList_3_dd"  "dataTestList_a2_dd"

Created on 2019-09-13 by the reprex package (v0.3.0)
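
The pattern in the three runs above can be tabulated with a small base R check. This is only a sketch: the `cvo` inputs and target counts are copied from the reprex output, while `looks_numeric()` is a hypothetical helper introduced here for illustration and is not part of drake.

```r
# Inputs and dataTestList target counts copied from the reprex output above.
cases <- list(c("a3", "2"), c("3", "2"), c("3", "a2"))
n_targets <- c(4L, 4L, 2L)

# Hypothetical helper: TRUE when a string parses as a number (base R only).
looks_numeric <- function(x) !is.na(suppressWarnings(as.numeric(x)))

# Does the second element of each cvo vector look numeric?
second_numeric <- vapply(cases, function(cvo) looks_numeric(cvo[2]), logical(1))

# Side-by-side view: duplication coincides with a numeric-looking second element.
summary_table <- data.frame(
  cvo = vapply(cases, paste, character(1), collapse = ", "),
  second_numeric = second_numeric,
  duplicated = n_targets > 2L
)
print(summary_table)
```

This only describes the symptom as observed in these three runs; it says nothing about the underlying cause.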

@wlandau
Member

wlandau commented Sep 13, 2019

Where is the variable dd?

@wlandau
Member

wlandau commented Sep 13, 2019

Oh, it's supposed to be just a string. Never mind...

@bart1
Author

bart1 commented Sep 13, 2019

Yes, sorry, that was a quick hack to reduce the multiplication that was in the original plan.

@wlandau
Member

wlandau commented Sep 13, 2019

The problem actually has to do with NA values. We need to remove them from the grid before we apply map(). Working on a patch, but a meeting is about to interrupt me...
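
As a rough illustration of the general idea of dropping NA rows from a grid before mapping over it, here is a base R sketch. The `grid` data frame, its values, and `drop_na_rows()` are hypothetical and are not drake's internal representation or the actual patch.

```r
# Hypothetical grid of grouping variables (an assumption for illustration only).
# Rows with NA stand for combinations that do not correspond to a real target.
grid <- data.frame(
  data = c("data_dd_a3", "data_dd_2", "data_dd_a3", "data_dd_2"),
  a = c("a_data_dd_a3", "a_data_dd_2", NA, NA),
  stringsAsFactors = FALSE
)

# Keep only rows where every variable used by the mapping step is present.
drop_na_rows <- function(grid, vars) {
  keep <- stats::complete.cases(grid[, vars, drop = FALSE])
  grid[keep, , drop = FALSE]
}

clean <- drop_na_rows(grid, c("a", "data"))
print(nrow(grid))   # rows before filtering
print(nrow(clean))  # rows after filtering
```

In this toy grid, the NA rows would otherwise yield extra entries for the same `data` values, which is the kind of duplication reported above.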

@bart1
Author

bart1 commented Sep 13, 2019

No problem, you're amazingly responsive in any case.

@wlandau
Member

wlandau commented Sep 13, 2019

Should be fixed now. map() is now better at respecting the graph topology (e.g. nestings) of old grouping variables.
