
TupleTransformer #247

Open
wdevazelhes opened this issue Aug 8, 2019 · 2 comments
Comments

@wdevazelhes
Member

As discussed with @bellet, it would be useful to have a sort of TupleTransformer object that would take a regular scikit-learn Transformer as an __init__ argument (so it would be a meta-estimator), and that would fit/transform on tuples using the given Transformer (instead of on the dataset of points).
That is, it would deduplicate the points inside the tuples, fit the transformer on the resulting dataset, and be able to transform it. This would allow using it in a pipeline like:

from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from metric_learn import TupleTransformer, ITML
from sklearn.model_selection import cross_val_score

# pairs: array of point pairs; y_pairs: similar/dissimilar labels for the pairs
model = make_pipeline(TupleTransformer(PCA()), ITML())
cross_val_score(model, pairs, y_pairs)

It could also be useful in some cases to have a way to use metric learning algorithms to transform tuples, for instance via a transform_tuples method

There may be other options too; this issue is to discuss them

@perimosocordiae
Contributor

Can you explain a little more about the inputs and outputs of the TupleTransformer? It seems like it would need access to the label information at some point, but I'm not familiar enough with the MetaEstimator API to see how that would work.

@bellet
Member

bellet commented Aug 13, 2019

I think the TupleTransformer would simply take tuples as input, internally turn them into a plain unlabeled dataset X (by collecting all points involved in the tuples), and feed this as input to whatever regular unsupervised transformer is given at init?

We won't be able to use any label information (e.g., similar/dissimilar labels for pairs) in this case, since such labels are not at the individual point level. So only unsupervised transformers should be allowed (e.g., PCA, but not LDA).
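To make the discussion concrete, here is a minimal sketch of what such a wrapper could look like, assuming tuples are passed as a 3D array of shape (n_tuples, tuple_size, n_features); the class name matches the proposal but everything else (deduplication via np.unique, reshaping in transform) is just one possible design, not an agreed API:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class TupleTransformer(BaseEstimator, TransformerMixin):
    """Sketch: wrap an unsupervised scikit-learn transformer so that it
    can be fitted on, and applied to, tuples of points."""

    def __init__(self, transformer):
        self.transformer = transformer

    def fit(self, tuples, y=None):
        # Flatten the tuples into a plain dataset X of individual points,
        # deduplicating repeated points, then fit the wrapped transformer.
        # Any tuple labels (y) are ignored: they are not at the point level.
        X = np.unique(tuples.reshape(-1, tuples.shape[-1]), axis=0)
        self.transformer.fit(X)
        return self

    def transform(self, tuples):
        # Transform every point, then restore the tuple structure.
        n_tuples, tuple_size, n_features = tuples.shape
        X_t = self.transformer.transform(tuples.reshape(-1, n_features))
        return X_t.reshape(n_tuples, tuple_size, -1)
```

With this shape convention, TupleTransformer(PCA(n_components=2)) applied to pairs of shape (n_pairs, 2, n_features) would yield an array of shape (n_pairs, 2, 2), which a downstream pair learner could consume.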

@terrytangyuan terrytangyuan removed their assignment Oct 24, 2019