LFAnalysis - Get Incorrect Instances #1602

DavidKoleczek · 2020-08-04T13:29:04Z

Is your feature request related to a problem? Please describe.

When using LFAnalysis and the lf_summary method, I often find myself wondering which instances a particular labeling function actually got wrong. It would be useful to have a way to return all the incorrectly labeled instances for a particular LF, or optionally a sample of them.

Describe the solution you'd like

A new method added to LFAnalysis. This could be called lf_incorrect.
It would need to take in your data_points and the corresponding Y. It would then return the instances from data_points where the LF's label does not match Y. Since all the other lf_ methods work for each LF, I think this could return a dictionary mapping LF names to their incorrectly labeled instances.
If large datasets with a lot of incorrect instances are a concern, I could add an optional parameter “max_instances” to cap how many instances are returned.
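To make the proposal concrete, here is a minimal stand-alone sketch of what such a helper might look like. The name lf_incorrect and the max_instances parameter come from the request above; the label matrix argument L, the lf_names argument, and the choice to skip abstains (assumed to be -1) are illustrative assumptions, not an existing Snorkel API.

```python
from typing import Dict, List, Optional, Sequence

ABSTAIN = -1  # assumption: -1 marks an LF that abstained on a data point


def lf_incorrect(
    L: Sequence[Sequence[int]],
    data_points: Sequence[object],
    Y: Sequence[int],
    lf_names: Sequence[str],
    max_instances: Optional[int] = None,
) -> Dict[str, List[object]]:
    """Hypothetical helper: map each LF name to the data points it mislabeled.

    L is an (n_points x n_lfs) label matrix given as nested sequences.
    """
    incorrect: Dict[str, List[object]] = {name: [] for name in lf_names}
    for row, x, y in zip(L, data_points, Y):
        for name, vote in zip(lf_names, row):
            # Abstains and correct votes are not counted as errors.
            if vote == ABSTAIN or vote == y:
                continue
            # Keep at most max_instances examples per LF, if a cap is set.
            if max_instances is None or len(incorrect[name]) < max_instances:
                incorrect[name].append(x)
    return incorrect


# Toy data: four data points, two LFs.
L = [[1, 0], [0, -1], [1, 1], [0, 0]]
Y = [1, 1, 0, 0]
data_points = ["a", "b", "c", "d"]
result = lf_incorrect(L, data_points, Y, ["lf_a", "lf_b"])
```

With this toy data, lf_a is wrong on "b" and "c", and lf_b is wrong on "a" and "c"; passing max_instances=1 would keep only the first miss for each LF.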

Additional context

This is something I would be looking to submit a PR for.


bhancock8 · 2020-08-06T23:19:22Z

Great idea! Similar functionality exists in the get_label_buckets method under snorkel/analysis/error_analysis: https://github.com/HazyResearch/snorkel/blob/e316d5700cbfd2243c0d5485537ef310fc0e7a1e/snorkel/analysis/error_analysis.py#L9. To use it, you would pass a gold labels vector and an LF labels vector, and that will return different error buckets you could pull from to get the indices of the corresponding data points where the LF was incorrect. If you wanted to submit a PR that wraps that method and has the functionality you described, that'd be great! You could likely stick it in that same error_analysis file.
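The bucket idea described here can be illustrated without installing the library. The following is a plain-Python sketch that only mimics the concept (grouping data point indices by their tuple of labels); label_buckets is a hypothetical stand-in, not Snorkel's get_label_buckets, and it does not reproduce that function's exact return types.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def label_buckets(*y: Sequence[int]) -> Dict[Tuple[int, ...], List[int]]:
    """Group data point indices by the tuple of labels they received."""
    buckets: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for i, labels in enumerate(zip(*y)):
        buckets[labels].append(i)
    return dict(buckets)


# Toy data: gold labels and one LF's votes (-1 assumed to mean abstain).
gold = [1, 1, 0, 0, 1]
lf_votes = [1, 0, 1, 0, -1]
texts = ["a", "b", "c", "d", "e"]

buckets = label_buckets(gold, lf_votes)

# Bucket (1, 0): gold label is 1 but the LF voted 0, i.e. an LF error.
wrong_idx = buckets[(1, 0)]
wrong_instances = [texts[i] for i in wrong_idx]
```

Any bucket whose two entries differ, and whose LF entry is not an abstain, holds indices where the LF was incorrect, so pulling instances for those keys gives the error set the original request asks for.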

Addresses #1602. Added a method to analysis/error_analysis that wraps get_label_buckets functionality. Given a bucket, a NumPy array x of your data, and corresponding y label(s), it will return to you x with only the instances corresponding to that bucket.

github-actions · 2020-11-05T12:21:12Z

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

DavidKoleczek mentioned this issue Aug 26, 2020

Issue #1602 - Add get_label_instances to Analysis #1608

Merged

5 tasks

github-actions bot added the no-issue-activity label Nov 5, 2020

github-actions bot closed this as completed Nov 13, 2020
