Assign Doubt for Dissimilarity from Labelled Set #12
Comments
Snorkel seems to have a similar notion with their
It should be doable with the help of sklearn semi-supervised methods
Before working on an implementation it would be good to first confirm via a relevant example that the approach has merit to it. But I agree that it'd be grand to re-use sklearn tools.
I don't exactly understand the goal here. So, the usual reasons should output weirdly labeled samples in some way. Do you mean it like this?
You're right to say this library tries to help find "weirdly labeled samples". But ... the idea is that we may also want to find examples that haven't been labeled yet but which certainly deserve attention. For example, one could argue that an outlier deserves attention, even without a label attached. What I'm proposing here is that we might also help the user find examples worth checking even if there are only a few labels. The use case here might be the early phase of a project where we only have relatively few datapoints with labels.
This is also something that we can do with ModAL already, a great active learning library.
Never heard of that, got a link?
I think it also has some overlap with what you want for doubtlab.
Hah! How have I not known about this library before? It's grand! I think we might be able to host some specific query strategies in this library (for text/images specifically, let's say). But before going there I may just make some calmcode videos on this library. It looks really well designed.
Happy to inspire you! I also like this one a lot.
Suppose that `y` can contain `NaN` values if they aren't labeled. In that case, we may want to favor a subset of these `NaN` values. In particular: if they differ substantially from the already labeled datapoints. The idea here is that we may be able to sample more diverse datapoints.
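
To make the proposal concrete, here is a minimal sketch of one way such a score could be computed. It is not an existing doubtlab API: the function name `dissimilarity_doubt` is hypothetical, and it assumes numeric features, numeric labels with `NaN` marking unlabelled rows, and Euclidean distance to the nearest labelled point as the notion of "differs substantially". Other distance measures or density-based scores could be swapped in.

```python
import numpy as np


def dissimilarity_doubt(X, y):
    """Hypothetical helper: score unlabelled rows by distance to the labelled set.

    Labelled rows get a score of 0. Unlabelled rows (y is NaN) get the
    Euclidean distance to their nearest labelled row, scaled to [0, 1].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    unlabelled = np.isnan(y)
    scores = np.zeros(len(y))

    # Nothing to compare against if everything (or nothing) is labelled.
    if unlabelled.all() or not unlabelled.any():
        return scores

    # Pairwise differences: (n_unlabelled, n_labelled, n_features).
    diffs = X[unlabelled][:, None, :] - X[~unlabelled][None, :, :]
    nearest = np.sqrt((diffs ** 2).sum(axis=-1)).min(axis=1)

    # Scale so the most dissimilar unlabelled row gets a score of 1.
    top = nearest.max()
    scores[unlabelled] = nearest / top if top > 0 else 0.0
    return scores


X = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.1], [5.0, 5.0]])
y = np.array([0.0, 1.0, np.nan, np.nan])
scores = dissimilarity_doubt(X, y)
```

In this toy example the last row sits far from both labelled points, so it receives the highest score, while the third row sits between them and scores close to zero. The pairwise computation is quadratic in memory, so a nearest-neighbour index would be a more sensible choice on larger datasets.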