LFAnalysis - Get Incorrect Instances #1602

Closed
DavidKoleczek opened this issue Aug 4, 2020 · 2 comments


Comments

@DavidKoleczek
Contributor

DavidKoleczek commented Aug 4, 2020

Is your feature request related to a problem? Please describe.

When using LFAnalysis and the lf_summary method I often find myself wondering what the incorrect instances for a particular labeling function actually are. It would be useful to have a way to return all the incorrectly labeled instances for a particular LF, or optionally a sample of the incorrect instances.

Describe the solution you'd like

A new method added to LFAnalysis, which could be called lf_incorrect.
It would take your data_points and the corresponding gold labels Y, and return the instances from data_points whose LF label does not match Y. Since all the other lf_ methods report results per LF, I think this could return a dictionary mapping LF names to their incorrectly labeled instances.
If large datasets with many incorrect instances are a concern, I could add an optional max_instances parameter to cap the number of instances returned.
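
As a rough illustration of the proposal, here is a minimal sketch of what such a method might look like. Everything in it is hypothetical: the name lf_incorrect, its signature, and the max_instances parameter follow the description above and are not part of Snorkel's API. It assumes a label matrix L with one row per data point and one column per LF, assumes Snorkel's convention of -1 for an abstaining LF, and treats abstains as neither correct nor incorrect. Plain lists are used instead of NumPy arrays to keep the sketch dependency-free.

```python
from typing import Dict, List, Optional, Sequence

ABSTAIN = -1  # assumed convention: the LF did not vote on this data point


def lf_incorrect(
    L: Sequence[Sequence[int]],
    Y: Sequence[int],
    data_points: Sequence[object],
    lf_names: Sequence[str],
    max_instances: Optional[int] = None,
) -> Dict[str, List[object]]:
    """Hypothetical helper: map each LF name to the data points it mislabeled."""
    if not (len(L) == len(Y) == len(data_points)):
        raise ValueError("L, Y, and data_points must have the same length")

    incorrect: Dict[str, List[object]] = {}
    for j, name in enumerate(lf_names):
        # Keep data points where LF j voted and its vote disagrees with gold.
        wrong = [
            x
            for row, y, x in zip(L, Y, data_points)
            if row[j] != ABSTAIN and row[j] != y
        ]
        # Optionally cap how many instances are returned per LF.
        incorrect[name] = wrong if max_instances is None else wrong[:max_instances]
    return incorrect


# Toy example: 4 data points, 2 LFs (columns of L).
L = [[1, 0], [0, -1], [1, 1], [-1, 0]]
Y = [1, 1, 0, 0]
data_points = ["a", "b", "c", "d"]
result = lf_incorrect(L, Y, data_points, ["lf_a", "lf_b"])
```

Returning a dictionary keyed by LF name mirrors the per-LF layout of the other lf_ methods. Whether abstains should count as incorrect is a design choice the proposal leaves open.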

Additional context

This is something I would be looking to submit a PR for.

@bhancock8
Member

Great idea! Similar functionality exists in the get_label_buckets method under snorkel/analysis/error_analysis: https://github.com/HazyResearch/snorkel/blob/e316d5700cbfd2243c0d5485537ef310fc0e7a1e/snorkel/analysis/error_analysis.py#L9. To use it, you would pass a gold labels vector and an LF labels vector, and that will return different error buckets you could pull from to get the indices of the corresponding data points where the LF was incorrect. If you wanted to submit a PR that wraps that method and has the functionality you described, that'd be great! You could likely stick it in that same error_analysis file.
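
The bucketing idea described above can be illustrated with a dependency-free stand-in. The function label_buckets below is a hypothetical simplification written for this example, not Snorkel's get_label_buckets. It groups data point indices by their (gold label, LF label) pair, so any bucket whose two labels differ holds indices where the LF was wrong. It assumes the LF uses -1 to abstain.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def label_buckets(
    y_gold: Sequence[int], y_lf: Sequence[int]
) -> Dict[Tuple[int, int], List[int]]:
    """Hypothetical stand-in: group indices by (gold label, LF label) pair."""
    if len(y_gold) != len(y_lf):
        raise ValueError("label vectors must have the same length")
    buckets: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for i, pair in enumerate(zip(y_gold, y_lf)):
        buckets[pair].append(i)
    return dict(buckets)


y_gold = [1, 1, 0, 0, 1]
y_lf = [1, 0, 1, -1, 0]  # -1 means the LF abstained (assumed convention)
data_points = ["a", "b", "c", "d", "e"]

buckets = label_buckets(y_gold, y_lf)

# Pull the data points where gold is 1 but the LF voted 0.
false_negatives = [data_points[i] for i in buckets.get((1, 0), [])]
```

Indexing back into the data with a bucket's indices is the step a wrapper method would automate.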

bhancock8 pushed a commit that referenced this issue Sep 5, 2020
Addresses #1602. Added a method to analysis/error_analysis that wraps get_label_buckets functionality. Given a bucket, a NumPy array x of your data, and corresponding y label(s), it will return to you x with only the instances corresponding to that bucket.
@github-actions

github-actions bot commented Nov 5, 2020

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

2 participants