Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convert() and fieldmap() swallow errors silently by default #365

Open
thatneat opened this issue Jan 18, 2016 · 16 comments · Fixed by #460

Comments

@thatneat
Copy link
Contributor

commented Jan 18, 2016

This has led to a lot of difficult-to-debug scenarios in my experience.
As you're developing an ETL pipeline, you're often iterating on your conversion functions quickly, which often leads to exceptions being raised. Currently petl eats those errors silently, making it far more difficult to navigate this process.
In the regular operation of an ETL pipeline, it's also undesirable for new, unexpected errors to pass silently.

The current workaround is to set failonerror=True in every call to fieldmap() and the like. I suggest you make failonerror=True be the default. This might be something that would require bumping to version 2.0 since it will likely cause a lot of exceptions to be raised that were previously ignored (for better or for worse!).

@alimanfoo

This comment has been minimized.

Copy link
Collaborator

commented Jan 18, 2016

Thanks @thatneat, I have contemplated this many times myself for the same reason. I would be happy to make this change.

Regarding version number, strictly speaking this probably should be a major version bump. However I don't have any other major changes planned, so maybe this could be just a minor release with some very clear documentation about the change in behaviour.

@pmariani

This comment has been minimized.

Copy link

commented Jan 18, 2016

+1.
In addition, it seems that that flag isn't available on rowgroupmap, but I am not sure why. Is there an implementation reason why it wouldn't be possible?

@alimanfoo

This comment has been minimized.

Copy link
Collaborator

commented Jan 18, 2016

That is an inconsistency, would be good to review all transform functions
and consistify error handling.

On Monday, January 18, 2016, pmariani notifications@github.com wrote:

    1. In addition, it seems that that flag isn't available on
      rowgroupmap, but I am not sure why. Is there an implementation reason why
      it wouldn't be possible?


Reply to this email directly or view it on GitHub
#365 (comment).

Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health http://cggh.org
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Web: http://purl.org/net/aliman
Email: alimanfoo@googlemail.com alimanfoo@gmail.com
Tel: +44 (0)1865 287721

@thatneat

This comment has been minimized.

Copy link
Contributor Author

commented Jan 19, 2016

Your call of course, but I'd still lean towards the major version bump, so nobody gets unwanted surprises. Plus being on version 2.0 would help give the appearance of maturity that this project deserves ;)

I could probably come up with a few more breaking changes if you'd like.

@e3krisztian

This comment has been minimized.

Copy link

commented Jan 19, 2016

I would also prefer a more strict error behavior by default.
convert is the one I have run into problems with custom functions.

TBH I considered this swallow-exceptions-by-default the only major issue with PETL but thought it is such a big change for backward compatibility that I dared not hope it would ever change.

Because this is a big user visible change, that will break a lot of existing code, it must be a major version bump and I agree with @thatneat on "version 2.0 would help give the appearance of maturity that this project deserves" :)

@alimanfoo

This comment has been minimized.

Copy link
Collaborator

commented Jan 19, 2016

Thanks @thatneat, @krisztianfekete, I would be very happy to include these changes in a 2.0 release. I'll write an email to the list as well asking if anyone has any other backwards-incompatible changes they'd like to propose for 2.0.

@rsyring

This comment has been minimized.

Copy link

commented Jan 20, 2016

FWIW, I agree with the proposal to have errors thrown by default and to bump the version to 2.0.

You may want to do an intermediate release that throws warnings when the errors are swallowed letting devs know that the 2.0 release will be releasing a breaking change in those instances. Let that simmer for 3-6 months?

@alimanfoo alimanfoo modified the milestone: v2.0 Jan 21, 2016

@alimanfoo

This comment has been minimized.

Copy link
Collaborator

commented Jan 21, 2016

On Wednesday, January 20, 2016, Randy Syring notifications@github.com
wrote:

FWIW, I agree with the proposal to have errors thrown by default and to
bump the version to 2.0.

Thanks Randy.

You may want to do an intermediate release that throws warnings when the
errors are swallowed letting devs know that the 2.0 release will be
releasing a breaking change in those instances. Let that simmer for 3-6
months?

Worth considering, but FWIW I think I'd be inclined to go straight for a
2.0 release and ensure behaviour changes are very clear in documentation
and release announcements, would be simpler to implement at least.


Reply to this email directly or view it on GitHub
#365 (comment).

Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health http://cggh.org
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Web: http://purl.org/net/aliman
Email: alimanfoo@googlemail.com alimanfoo@gmail.com
Tel: +44 (0)1865 287721

@bmaggard

This comment has been minimized.

Copy link
Contributor

commented Jan 25, 2016

Yes, please:

  • the default behaviour should be changed such that any errors are propagated by default instead.
  • make it happen in a new major release (2.0).

And, once again, thank you for this wonderful tool.

@alimanfoo

This comment has been minimized.

Copy link
Collaborator

commented Jan 25, 2016

Thanks Brad.

On Monday, 25 January 2016, bmaggard notifications@github.com wrote:

Yes, please:

  • the default behaviour should be changed such that any errors are
    propagated by default instead.
  • make it happen in a new major release (2.0).

And, once again, thank you for this wonderful tool.


Reply to this email directly or view it on GitHub
#365 (comment).

Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health http://cggh.org
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Web: http://purl.org/net/aliman
Email: alimanfoo@googlemail.com alimanfoo@gmail.com
Tel: +44 (0)1865 287721

@thatneat

This comment has been minimized.

Copy link
Contributor Author

commented Apr 8, 2016

Can you give an ETA for this @alimanfoo? Trying to decide whether to audit our codebase and insert a bunch of failonerror=True or wait for 2.0 to come out.

@alimanfoo

This comment has been minimized.

Copy link
Collaborator

commented Apr 11, 2016

Hi @thatneat, I won't be able to get to this for at least a couple of
months, I would suggest to go ahead and insert the failonerror=True
statements.

On Friday, April 8, 2016, thatneat notifications@github.com wrote:

Can you give an ETA for this @alimanfoo https://github.com/alimanfoo?
Trying to decide whether to audit our codebase and insert a bunch of
failonerror=True or wait for 2.0 to come out.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#365 (comment)

Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health http://cggh.org
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Email: alimanfoo@googlemail.com alimanfoo@gmail.com
Web: http://purl.org/net/aliman
Twitter: https://twitter.com/alimanfoo
Tel: +44 (0)1865 287721

@bmaggard

This comment has been minimized.

Copy link
Contributor

commented Jan 10, 2017

If failonerror=False, petl doesn't eat errors silently, it returns 'errorvalue'.
So, how about also adding a 'returnexceptions' option, akin to:

In [1]: import petl as etl

In [2]: t1 = etl.wrap([ ['foo'], ['A'], [1] ]).convert('foo', 'lower', errorvalue=Exception)

In [3]: t1
Out[3]:
+---------------------+
| foo                 |
+=====================+
| 'a'                 |
+---------------------+
| <class 'Exception'> |
+---------------------+

... but to have the original exceptions returned instead? This would enable categorizing failures directly.

@alimanfoo

This comment has been minimized.

Copy link
Collaborator

commented Jan 10, 2017

@thatneat

This comment has been minimized.

Copy link
Contributor Author

commented Aug 6, 2019

@alimanfoo just to be clear, it looks to me like #460 explicitly does not solve the named issue in this ticket, right? It looks like the default is still failonerror=False. Is there still a plan to change that in a future release?

The configuration option does look helpful but it'd be nice to know what the plan is.

@alimanfoo alimanfoo reopened this Aug 6, 2019

@alimanfoo

This comment has been minimized.

Copy link
Collaborator

commented Aug 6, 2019

Apologies, I closed this issue too hastily. Personally I'd be happy to change the default to failonerror=True but that would need a major version bump. I need to make a quick release for a downstream project, and so was taking the opportunity to merge in some PRs, but didn't want to introduce any changes that might surprise people.

How about we do a quick release 1.3 which includes the new petl.config.failonerror config option, set to False by default (current behaviour). Then follow up with a 2.0 release shortly after which switches the default to True, but precede that with a message on the mailing list to see if anyone has any objections to the change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.