Handling datasets with multiple targets? #84

Closed
relf opened this issue Feb 9, 2021 · 10 comments

@relf
Member

relf commented Feb 9, 2021

I would like to add the linnerud dataset to linfa-dataset but I've noticed that currently targets are only handled as a one-dimensional array. Should we introduce something like:

pub type MultiTargetDataset<D, T> =
    DatasetBase<ArrayBase<OwnedRepr<D>, Ix2>, ArrayBase<OwnedRepr<T>, Ix2>>;

or is there a better way?
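
To make the idea concrete, here is a minimal, dependency-free sketch of what a two-dimensional target table implies: one row of M target values per observation, with the single-target case recoverable as one column. It deliberately uses nested `Vec`s instead of ndarray, and `MultiTargetData`, `ntargets` and `target_column` are hypothetical names chosen for illustration, not linfa API.

```rust
/// Hypothetical stand-in for a dataset with a 2-D target table.
/// `records[i]` holds the features of observation i and
/// `targets[i]` holds its M target values.
#[derive(Debug, Clone, PartialEq)]
pub struct MultiTargetData {
    pub records: Vec<Vec<f64>>,
    pub targets: Vec<Vec<f64>>,
}

impl MultiTargetData {
    /// Build the dataset, checking that records and targets agree on N
    /// and that every observation has the same number of targets.
    pub fn new(records: Vec<Vec<f64>>, targets: Vec<Vec<f64>>) -> Result<Self, String> {
        if records.len() != targets.len() {
            return Err(format!(
                "{} records but {} target rows",
                records.len(),
                targets.len()
            ));
        }
        let m = targets.first().map_or(0, |row| row.len());
        if targets.iter().any(|row| row.len() != m) {
            return Err("target rows have differing lengths".to_string());
        }
        Ok(Self { records, targets })
    }

    /// Number of targets per observation (M).
    pub fn ntargets(&self) -> usize {
        self.targets.first().map_or(0, |row| row.len())
    }

    /// Extract one target column, i.e. the single-target (1-D) view.
    pub fn target_column(&self, j: usize) -> Option<Vec<f64>> {
        if j >= self.ntargets() {
            return None;
        }
        Some(self.targets.iter().map(|row| row[j]).collect())
    }
}
```

With this layout, an algorithm that only supports a single target could still be fed `target_column(j)` for each `j` in turn.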

@bytesnake
Member

Hey,
this hasn't been considered so far, but it should be, since multi-target prediction is very common.

The target array can be one-, two- or three-dimensional, depending on the case:

  • 1-dim: one target for each of N observations
  • 2-dim: M targets for each of N observations
  • 3-dim: L probabilities for each of M targets and N observations

Here L is the number of labels, M the number of targets and N the number of observations. Because ndarray supports three-dimensional arrays, we should switch the Dataset definition to

pub type Dataset<D, T> = DatasetBase<ArrayBase<OwnedRepr<D>, Ix2>, ArrayBase<OwnedRepr<T>, Ix3>>;

and then update the whole machinery in linfa/dataset/.
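
One way to read this proposal is that all three cases become special cases of a single (N, M, L) shape, with the unused axes having length 1. The sketch below only illustrates that bookkeeping in plain Rust; it is an assumption about how the cases could be unified, and `TargetShape`, `as_3d` and `num_values` are hypothetical names that exist in neither ndarray nor linfa.

```rust
/// Hypothetical description of the three target layouts listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TargetShape {
    /// One target value for each of `n` observations.
    Single { n: usize },
    /// `m` target values for each of `n` observations.
    Multi { n: usize, m: usize },
    /// `l` label probabilities for each of `m` targets and `n` observations.
    MultiProba { n: usize, m: usize, l: usize },
}

impl TargetShape {
    /// Embed every layout into a common (N, M, L) shape;
    /// axes that a layout does not use get length 1.
    pub fn as_3d(self) -> (usize, usize, usize) {
        match self {
            TargetShape::Single { n } => (n, 1, 1),
            TargetShape::Multi { n, m } => (n, m, 1),
            TargetShape::MultiProba { n, m, l } => (n, m, l),
        }
    }

    /// Total number of values stored for this layout.
    pub fn num_values(self) -> usize {
        let (n, m, l) = self.as_3d();
        n * m * l
    }
}
```

The trade-off of a single three-dimensional type is that single-target code has to deal with two trailing axes of length 1.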

@bytesnake
Member

I added a point to #70 for this feature

@relf
Member Author

relf commented Feb 10, 2021

Ok thanks. I was too conservative and not aware of the third case (do you have an example in mind?)

> I added a point to #70 for this feature

Does it mean you plan to work on that in the coming weeks?

@bytesnake
Member

Yes, I'm playing around with this right now because it is a prerequisite for #66. I will open a PR once I have something useful.

@bytesnake
Member

Work started in #88. I think the basic mechanism is in place, and I will finish the PR in the next few days.

@bytesnake
Member

Finally merged #88. Can you give feedback on whether everything works as expected?

@relf
Member Author

relf commented Feb 23, 2021

Ok, I will. I will resume my work on the linnerud dataset. FYI, I am working on a port of PlsRegression, which is tested against that dataset. If it goes well, maybe it could land in linfa? Where? In a new member crate, linfa-cross-decomposition, or somewhere else? What do you think?

@bytesnake
Member

sounds good, we should add CCA at a later point as well. If you want to implement the SVD version, you can use TruncatedSvd from ndarray-linalg

@relf
Member Author

relf commented Feb 23, 2021

A suggestion: now that we have several targets, we could have target_names as well.
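
As a rough illustration of the suggestion, the sketch below pairs a target table with one name per column and allows lookup by name. It is plain Rust with nested `Vec`s rather than ndarray, and `NamedTargets` with its methods is a hypothetical type made up for this example; only the `target_names` idea comes from the comment above.

```rust
/// Hypothetical container pairing a 2-D target table with one name per column.
#[derive(Debug, Clone, PartialEq)]
pub struct NamedTargets {
    names: Vec<String>,
    rows: Vec<Vec<f64>>,
}

impl NamedTargets {
    /// Accept the table only if every row has exactly one value per name.
    pub fn new(names: &[&str], rows: Vec<Vec<f64>>) -> Result<Self, String> {
        if let Some(bad) = rows.iter().position(|row| row.len() != names.len()) {
            return Err(format!("row {} does not have {} values", bad, names.len()));
        }
        Ok(Self {
            names: names.iter().map(|s| s.to_string()).collect(),
            rows,
        })
    }

    /// Names of the target columns, in column order.
    pub fn target_names(&self) -> &[String] {
        &self.names
    }

    /// Look up a target column by name; `None` if the name is unknown.
    pub fn column(&self, name: &str) -> Option<Vec<f64>> {
        let j = self.names.iter().position(|n| n.as_str() == name)?;
        Some(self.rows.iter().map(|row| row[j]).collect())
    }
}
```

Validating the name count at construction keeps names and columns from drifting apart later.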

@bytesnake
Member

Multi-target datasets were introduced in #88 and a sample dataset was added in #89. I also added a note in #70 regarding target naming.
