
Different results and accuracy down to 10% with PandasParallelLFApplier vs PandasLFApplier in Snorkel 0.9.5 #1587

Closed
durgeshiitj opened this issue May 6, 2020 · 5 comments

@durgeshiitj

Issue description

I ran Snorkel (v0.9.5) on a dataset using PandasParallelLFApplier and, to my surprise, got 10% accuracy where I was expecting 90%. I then ran PandasLFApplier to cross-verify and got 90% accuracy. When I compared the two label matrices, they were not equal.
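When two label matrices differ, it can help to check which rows differ and whether the matrices hold the same rows in a different order. If they do, that points to a row-ordering problem rather than to the labeling functions voting differently. The sketch uses plain NumPy on small made-up matrices (rows = data points, columns = LFs, -1 = abstain, as in Snorkel). `diff_label_matrices` and `same_rows_reordered` are hypothetical helpers, not part of Snorkel's API.

```python
import numpy as np


def diff_label_matrices(L_a, L_b):
    """Return the indices of rows (data points) where two label matrices disagree."""
    # Both matrices must have the same number of data points and LFs.
    if L_a.shape != L_b.shape:
        raise ValueError(f"shape mismatch: {L_a.shape} vs {L_b.shape}")
    return np.flatnonzero((L_a != L_b).any(axis=1))


def same_rows_reordered(L_a, L_b):
    """True if both matrices contain the same rows, possibly in a different order."""
    return sorted(map(tuple, L_a.tolist())) == sorted(map(tuple, L_b.tolist()))


# Made-up matrices: rows = data points, columns = LFs, -1 = abstain.
L_pandas = np.array([[1, -1], [0, 0], [-1, 1]])
L_parallel = np.array([[1, -1], [-1, 1], [0, 0]])

print(np.array_equal(L_pandas, L_parallel))           # False
print(diff_label_matrices(L_pandas, L_parallel))      # [1 2]
print(same_rows_reordered(L_pandas, L_parallel))      # True
```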

I was previously on 0.9.3 and never faced this problem. To cross-verify, I ran the same dataset on a different system with 0.9.3, using both PandasParallelLFApplier and PandasLFApplier, and found that in 0.9.3 both yield the same label matrix, the same accuracy, and the same LFAnalysis.

Expected behavior

Both LFAppliers should yield similar results.

Screenshots

I'm attaching screenshots for your reference.

V 0.9.5 Analysis:

PandasLFApplier:
[screenshot]

PandasParallelLFApplier:
[screenshot]

Label-Matrix Comparison:
[screenshot]

V 0.9.3 Analysis:

PandasLFApplier:
[screenshot]

PandasParallelLFApplier:
[screenshot]

Label-Matrix Comparison:
[screenshot]

System info

  • How you installed Snorkel (conda, pip, source): pip
  • OS: Windows/Linux
  • Python version: 3.7
  • Snorkel version: 0.9.3 (Windows) / 0.9.5 (Linux)

Additional context

Please look into this ASAP.

@durgeshiitj durgeshiitj changed the title Different results and accuracy down to 80% with PandasParallelLFApplier vs PandasLFApplier in Snorkel 0.9.5 Different results and accuracy down to 10% with PandasParallelLFApplier vs PandasLFApplier in Snorkel 0.9.5 May 6, 2020
@henryre
Member

henryre commented May 17, 2020

Hi @durgeshiitj, apologies for the delayed response here! This is likely due to using an unsorted index with PandasParallelLFApplier. I've opened up #1589, but in the meantime you can just use the standard PandasLFApplier, or sort your index before using PandasParallelLFApplier so that the rows of L come out in the expected order.
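The unsorted-index diagnosis can be illustrated without Snorkel or Dask. This sketch assumes, as the comment implies, that the parallel applier returns the rows of L in sorted-index order rather than in the DataFrame's current row order. `simulated_parallel_apply` is a hypothetical stand-in for that behavior, not Snorkel's implementation, and the toy labeling function and data are made up. The sketch shows the misalignment on an unsorted index, and that sorting the index first makes both orders agree.

```python
import numpy as np
import pandas as pd

ABSTAIN, POS = -1, 1


def lf_contains_good(text):
    # Toy labeling function (hypothetical): POS if "good" appears, else abstain.
    return POS if "good" in text else ABSTAIN


def sequential_apply(df):
    # Rows of L follow the DataFrame's current row order.
    return np.array([[lf_contains_good(t)] for t in df["text"]])


def simulated_parallel_apply(df):
    # Hypothetical stand-in: assumes partitions come back in sorted-index
    # order, which is the behavior suspected in this thread.
    return sequential_apply(df.sort_index())


df = pd.DataFrame(
    {"text": ["good movie", "bad plot", "good cast", "boring"]},
    index=[3, 0, 2, 1],  # unsorted index, e.g. after shuffling or filtering
)

L_seq = sequential_apply(df)
L_par = simulated_parallel_apply(df)
print(np.array_equal(L_seq, L_par))  # False: rows are misaligned

# Workaround from the comment: sort the index first.
# df.reset_index(drop=True) would also work and keeps the current row order.
df_sorted = df.sort_index()
print(np.array_equal(sequential_apply(df_sorted), simulated_parallel_apply(df_sorted)))  # True
```

If you sort or reset the index as a workaround, take the gold labels from the same reordered DataFrame, so that the gold labels and L share one row order.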

@durgeshiitj
Author

durgeshiitj commented May 17, 2020


Hi Henry,
Thanks for following up.
However, I also did some debugging on my end and found that on the system with Snorkel 0.9.5 installed, the Dask version was 2.14.2, while on the system with 0.9.3 it was 2.5.2.
So I downgraded Dask to 2.5.2 while keeping Snorkel 0.9.5, and, to my surprise, PandasParallelLFApplier then worked normally. Please check this as well: the requirements only specify Dask <3, so 2.14 should not have caused any issue either.
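The version observation can be turned into a small runtime guard. This sketch rests on several assumptions. `KNOWN_GOOD_DASK` encodes only this thread's single data point: Dask 2.5.2 worked and 2.14.2 did not. It is not an official bound from the Snorkel maintainers. `check_dask_version` is a hypothetical helper, and version strings are assumed to start with numeric `X.Y.Z` parts. The guard compares versions numerically, because plain string comparison would rank "2.14.2" below "2.5.2".

```python
import re
import warnings

# Assumption: the only data point from this thread. Dask 2.5.2 gave matching
# label matrices with Snorkel 0.9.5; 2.14.2 did not. Not an official bound.
KNOWN_GOOD_DASK = (2, 5, 2)


def version_tuple(version):
    """Turn "2.14.2" into (2, 14, 2) so versions compare numerically."""
    parts = version.split(".")[:3]
    # Keep only the leading digits of each part (e.g. "2rc1" -> 2).
    return tuple(int(re.match(r"\d+", part).group()) for part in parts)


def check_dask_version(version):
    """Return True if `version` is not newer than the known-good Dask version."""
    if version_tuple(version) > KNOWN_GOOD_DASK:
        known = ".".join(map(str, KNOWN_GOOD_DASK))
        warnings.warn(
            f"dask {version} is newer than {known}; cross-check "
            "PandasParallelLFApplier output against PandasLFApplier"
        )
        return False
    return True


# String comparison gets the order wrong; tuple comparison does not.
print("2.14.2" < "2.5.2")                                 # True (misleading)
print(version_tuple("2.14.2") > version_tuple("2.5.2"))   # True
print(check_dask_version("2.5.2"))                        # True

# In a real environment: import dask; check_dask_version(dask.__version__)
```

Pinning Dask to 2.5.2 alongside Snorkel 0.9.5, as described in the comment, is the simpler workaround. The guard only makes a mismatch visible.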

@henryre
Member

henryre commented May 18, 2020

Hi @durgeshiitj, thanks for reporting and we'll look into version compatibility on our side!

@durgeshiitj
Author


I haven't received any update on this issue.

@github-actions

github-actions bot commented Oct 9, 2020

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.
