Table.convert() skips falsey values #527
Comments
I agree, this is a design flaw. Fixing it is technically a breaking change, so I would need to bump to v4 to responsibly release it. I'd rather bundle in a few more breaking changes before shipping that version. One thing we could do here is add an extra argument: table.convert(col, lambda x: x+1, skip_false=False). This would trigger the new, improved behaviour without breaking existing code that depends on how it works at the moment. Then in What do you think? (I'm open to suggestions for better names for this parameter too.)
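To make the proposed opt-out flag concrete, here is a minimal pure-Python sketch. The helper convert_values is hypothetical and only models the idea on a list; it is not sqlite-utils code. The skip_false name and its False value come from the comment above, and the True default is assumed from the stated goal of not breaking existing code.

```python
from typing import Any, Callable, Iterable, List


def convert_values(
    values: Iterable[Any],
    fn: Callable[[Any], Any],
    skip_false: bool = True,  # assumed default: keep the existing behaviour
) -> List[Any]:
    """Hypothetical stand-in for a column conversion, not sqlite-utils code."""
    converted = []
    for value in values:
        if skip_false and not value:
            # Existing behaviour: falsey values pass through untouched.
            converted.append(value)
        else:
            converted.append(fn(value))
    return converted


column = [0, 1, 2]
legacy = convert_values(column, lambda x: x + 1)
opted_in = convert_values(column, lambda x: x + 1, skip_false=False)
print(legacy)    # [0, 2, 3]
print(opted_in)  # [1, 2, 3]
```

Callers who never pass the flag see no change, while callers who pass skip_false=False get every value converted.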
OK, I implemented that at the Python API level. I need to decide how it should work for the
I'm going to go with
Sorry, I completely missed your first comment whilst on Easter break. This looks like a good practical compromise before v4. Thanks!
Summary
By design, Table.convert() does not attempt conversion of falsey values (None, "", 0, ...). This is surprising (directly contradicts the docstring) and convert() may quietly skip cells where the user assumed a conversion would take place.
Example
Increment a column of integers by one
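A sketch of how this plays out, using only the standard-library sqlite3 module. The skip_falsey wrapper is an assumed model of the skip described in the summary, not the library's actual implementation, and the numbers table is made up for illustration.

```python
import sqlite3


def skip_falsey(fn):
    """Wrap fn so falsey inputs are returned unchanged (assumed model of the skip)."""
    def wrapper(value):
        if not value:
            return value
        return fn(value)
    return wrapper


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (id INTEGER PRIMARY KEY, n INTEGER)")
conn.executemany("INSERT INTO numbers (n) VALUES (?)", [(0,), (1,), (2,)])

# Register the wrapped increment as a one-argument SQL function.
conn.create_function("increment", 1, skip_falsey(lambda x: x + 1))
conn.execute("UPDATE numbers SET n = increment(n)")

result = [row[0] for row in conn.execute("SELECT n FROM numbers ORDER BY id")]
print(result)  # [0, 2, 3]: the 0 was never incremented
```

The row holding 0 is left alone without any error or warning, which is the quiet skip at issue.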
Another example might be, say, transforming cells containing an empty string to NULL.
Discussion
This was, I think, a pragmatic choice so that consumers can skip writing guard clauses for these falsey values (particularly from the CLI). But this surprising undocumented behavior can lead to incorrect data. I don't think this is a good trade-off between convenience and correctness.
In the absence of this convenience, users will have to either write guard clauses into their conversion expressions or adapt the called function to do the same, so:
instead of:
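As a hypothetical illustration of that contrast (shout and shout_guarded are made-up names, not part of any library), the guarded and unguarded forms might look like this:

```python
def shout(value):
    """Hypothetical conversion function without a guard clause."""
    return value.upper()


def shout_guarded(value):
    """The same conversion with an explicit guard for falsey input."""
    return value.upper() if value else value


cells = ["abc", "", None]

guarded = [shout_guarded(cell) for cell in cells]
print(guarded)  # ['ABC', '', None]

error = None
try:
    unguarded = [shout(cell) for cell in cells]
except AttributeError as exc:
    # None has no .upper(), so the missing guard fails loudly.
    error = exc
    print(f"noisy error: {exc}")
```

Without the guard, the None cell raises immediately instead of being silently passed over.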
This is more typing and sometimes I will forget, and there will be errors. (But they will be noisy errors, which is a good thing).
Such a change will certainly inconvenience some existing consumers; there will be some breakage. But I think this is worth it to avoid quietly not converting some values by default, which can lead to quietly bad data.
I have a PR that I will attach; please take a look and see what you think.