cbinding data with different row number or row ids #309

Closed
mb706 opened this issue Aug 6, 2019 · 6 comments

Comments

@mb706
Collaborator

mb706 commented Aug 6, 2019

Is there a way to make $cbind() behave like an outer join, so data with different rows can be added and the missing fields filled with NAs? See mlr-org/mlr3pipelines#216
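For illustration, the outer-join behaviour asked about here can be sketched with plain data.table, outside of mlr3. The table names and the `row_id` key column below are hypothetical stand-ins, not the task's actual backend, and this is only a sketch of the requested semantics, not of how `$cbind()` is implemented.

```r
library(data.table)

# Hypothetical stand-ins: a small "task" table and new columns keyed by row id.
task_data = data.table(row_id = 1:5, a = c(10, 20, 30, 40, 50))
new_data  = data.table(row_id = c(2L, 4L, 6L), z = c("x", "y", "w"))

# all = TRUE keeps rows that appear in either table (outer join);
# cells without a matching row are filled with NA.
joined = merge(task_data, new_data, by = "row_id", all = TRUE)
print(joined)
```

Rows 1, 3 and 5 get `NA` in `z`, and the extra row 6 gets `NA` in `a`.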

@mllg
Sponsor Member

mllg commented Aug 6, 2019

Not very well tested yet, but should work.

@mb706
Collaborator Author

mb706 commented Aug 7, 2019

I guess I would have to construct the cbind backend explicitly instead of using Task$cbind for this?

@mllg
Sponsor Member

mllg commented Aug 7, 2019

Is there an assertion in task$cbind() that gets in the way, or why do you want to do this explicitly?

@mb706
Collaborator Author

mb706 commented Aug 8, 2019

task$cbind() doesn't work:

> it = mlr_tasks$get("iris")
> it$cbind(data.table(x = 1:200))
Error in `[[<-.data.frame`(`*tmp*`, pk, value = 1:150) : 
  replacement has 150 rows, data has 200
> it$cbind(data.table(x = 1:100))
Error in `[[<-.data.frame`(`*tmp*`, pk, value = 1:150) : 
  replacement has 150 rows, data has 100

and even if it did work, there would probably need to be a mechanism to indicate which row of the new data belongs to which row of the task.

What does work is something like

> it$row_roles$use = (1:10) * 10  # say we want to put the new data on every 10th row
> it$cbind(data.table(z = 1:10))
> it$row_roles$use = 1:150

Is this the intended way of doing this?

@mllg
Sponsor Member

mllg commented Aug 8, 2019

We should do a call to figure this out.

@mllg
Sponsor Member

mllg commented Sep 29, 2021

I guess it should be fine to manually fill with NAs in mlr3pipelines.
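A minimal sketch of what manually filling with NAs could look like, assuming the caller knows which row ids the new data belongs to. `pad_with_na()` is a hypothetical helper written for this example; it is not part of mlr3 or mlr3pipelines.

```r
library(data.table)

# Hypothetical helper: expand `new_data`, which covers only the rows listed in
# `row_ids`, to one row per entry of `all_ids`, with NA everywhere else.
pad_with_na = function(new_data, row_ids, all_ids) {
  stopifnot(nrow(new_data) == length(row_ids), all(row_ids %in% all_ids))
  # For each target row: its position in new_data, or NA if it has no data.
  idx = match(all_ids, row_ids)
  # Indexing a vector with NA yields NA, which pads the uncovered rows.
  as.data.table(lapply(new_data, function(col) col[idx]))
}

# Same setup as the example earlier in the thread: 10 values for every 10th of 150 rows.
padded = pad_with_na(data.table(z = 1:10), row_ids = (1:10) * 10, all_ids = 1:150)
```

The padded table has one row per task row, which is the row count the `$cbind()` errors shown earlier complained about, so it should be possible to bind it without touching `row_roles`.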

@mllg mllg closed this as completed Sep 29, 2021