Assign Doubt for Dissimilarity from Labelled Set #12

Closed
koaning opened this issue Nov 14, 2021 · 10 comments

koaning commented Nov 14, 2021

Suppose that y can contain NaN values for datapoints that aren't labeled yet. In that case, we may want to favor a subset of these unlabeled datapoints. In particular: the ones that differ substantially from the already labeled datapoints.

The idea here is that we may be able to sample more diverse datapoints.
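A rough sketch of what that could look like (this is not doubtlab's API, just an illustration using sklearn's `pairwise_distances`): score every unlabeled row by its distance to the nearest labeled row, so the most dissimilar unlabeled datapoints can be surfaced first.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

def dissimilarity_scores(X, y):
    """Distance from each unlabeled row (y == NaN) to the nearest labeled row."""
    unlabeled = np.isnan(y)
    scores = np.zeros(len(X))
    if unlabeled.any() and (~unlabeled).any():
        dists = pairwise_distances(X[unlabeled], X[~unlabeled])
        scores[unlabeled] = dists.min(axis=1)
    return scores

X = np.array([[0.0], [0.1], [5.0], [0.2], [9.0]])
y = np.array([0.0, 1.0, np.nan, 0.0, np.nan])
print(dissimilarity_scores(X, y))  # only the unlabeled rows (index 2 and 4) get nonzero scores
```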

koaning commented Nov 30, 2021

Snorkel seems to have a similar notion with its ABSTAIN label. In their case -1 indicates "no label" and non-negative integers indicate a label. Not 100% sure if I want to commit to treating -1 as a special citizen though. Maybe NaN is better. Dunno yet.
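For what it's worth, switching between the two conventions is a one-liner either way; a tiny illustration (assuming NaN marks "no label"):

```python
import numpy as np

y = np.array([0.0, 1.0, np.nan, np.nan, 1.0])          # NaN = "no label"
y_sentinel = np.where(np.isnan(y), -1, y).astype(int)  # -1 = "no label", like Snorkel/sklearn
print(y_sentinel)  # [ 0  1 -1 -1  1]
```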

glevv commented Dec 13, 2021

It should be doable with the help of sklearn's semi-supervised methods.
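Something along these lines, perhaps (just a sketch, not a concrete proposal; sklearn's semi-supervised estimators mark unlabeled rows with -1):

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

X = np.array([[0.0], [0.2], [0.9], [1.1], [0.5], [2.5]])
y = np.array([0, 0, 1, 1, -1, -1])  # -1 marks the unlabeled rows

model = LabelSpreading(kernel="knn", n_neighbors=3).fit(X, y)
confidence = model.predict_proba(X).max(axis=1)

# Unlabeled rows that the propagated model is unsure about are candidates
# worth showing to the user first.
print(np.where((y == -1) & (confidence < 0.9))[0])
```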

koaning commented Dec 13, 2021

Before working on an implementation, it would be good to first confirm via a relevant example that the approach has merit. But I agree that it'd be grand to re-use sklearn tools.

Garve commented Dec 15, 2021

I don't exactly understand the goal here. The usual reasons are supposed to surface weirdly labeled samples in some way, while here you want to surface weird samples, right? So we'd be mixing two goals. If the ensemble outputs indices, we'd first have to check whether the corresponding sample has a label (in which case: review that label) or not (in which case: label it yourself, since it's a unique or perhaps hard-to-classify sample and labeling it would make things easier for any model).

Do you mean it like this?

koaning commented Dec 15, 2021

You're right to say this library tries to help find "weirdly labeled samples". But ... the idea is that we may also want to find examples that haven't been labeled yet but which certainly deserve attention. For example, one could argue that an outlier deserves attention, even without a label attached.

What I'm proposing here is that we might also help the user find examples worth checking even when only a few labels exist. The use case here might be the early phase of a project where we only have relatively few labeled datapoints.
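To make the outlier angle concrete, a minimal sketch (the names and setup here are mine, nothing that exists in doubtlab yet) that flags unlabeled rows an IsolationForest considers anomalous:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

X = np.array([[0.0], [0.1], [0.2], [0.15], [8.0]])
y = np.array([0.0, 1.0, np.nan, 0.0, np.nan])  # NaN = not labeled yet

flags = IsolationForest(random_state=0).fit_predict(X)   # -1 = outlier
candidates = np.where(np.isnan(y) & (flags == -1))[0]
print(candidates)  # unlabeled rows that also look like outliers
```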

Garve commented Dec 15, 2021

This is also something that we can do with ModAL already, a great active learning library.

koaning commented Dec 15, 2021

Never heard of that, got a link?

Garve commented Dec 15, 2021

Here.

I think it also has some overlap with what you want for doubtlab.
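For anyone landing here later, a minimal sketch of what a modAL query looks like (this follows modAL's documented ActiveLearner API as I understand it, so treat the details as an assumption rather than a doubtlab recipe):

```python
import numpy as np
from modAL.models import ActiveLearner
from sklearn.linear_model import LogisticRegression

X_train = np.array([[0.0], [1.0]])
y_train = np.array([0, 1])
X_pool = np.array([[0.1], [0.5], [0.9]])

learner = ActiveLearner(estimator=LogisticRegression(),
                        X_training=X_train, y_training=y_train)

# The default query strategy is uncertainty sampling: it returns the pool
# index (and row) the current model is least sure about.
query_idx, query_instance = learner.query(X_pool)
print(query_idx, query_instance)
```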

koaning commented Dec 15, 2021

Hah! How have I not known about this library before? It's grand!

I think we might be able to host some specific query strategies in this library (for text/images specifically let's say). But before going there I may just make some calmcode videos on this library. It looks really well designed.

Garve commented Dec 15, 2021

Happy to inspire you! I also like this one a lot.

koaning closed this as completed Nov 29, 2023