Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Table.convert() skips falsey values #527

Closed
mcarpenter opened this issue Feb 10, 2023 · 5 comments
Closed

Table.convert() skips falsey values #527

mcarpenter opened this issue Feb 10, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@mcarpenter
Copy link
Contributor

Summary

By design, Table.convert() does not attempt conversion of falsey values (None, "", 0, ...). This is surprising (directly contradicts the docstring) and convert() may quietly skip cells where the user assumed a conversion would take place.

Example

Increment a column of integers by one

from sqlite_utils import Database

db = Database(memory=True)
table = db['table']
col = 'x'
table.insert_all([{col: 0}, {col:1}])
print(table.get(1)) # 0
print(table.get(2)) # 1
print()

table.convert(col, lambda x: x+1)
print(table.get(1)) # got 0, expected 1 ⚠⚠⚠
print(table.get(2)) # got 2, expected 2

Another example might be, say, transforming cells containing empty string to NULL.

Discussion

This was, I think, a pragmatic choice so that consumers can skip writing guard clauses for these falsey values (particularly from the CLI). But this surprising undocumented behavior can lead to incorrect data. I don't think this is a good trade-off between convenience and correctness.

In the absence of this convenience users will either have to write guard clauses into their conversion expressions (or adapt the called function to do the same), so:

    fn(value) if value else value

instead of:

    fn(value)

This is more typing and sometimes I will forget, and there will be errors. (But they will be noisy errors, which is a good thing).

Such a change will certainly inconvenience some existing consumers; there will be some breakage. But I think this is worth it to avoid quietly not converting some values by default, which can lead to quietly bad data.

I have a PR that I will attach, please take a look and see what you think.

@simonw simonw added the bug Something isn't working label Apr 13, 2023
@simonw
Copy link
Owner

simonw commented Apr 13, 2023

I agree, this is a design flaw.

It's technically a breaking change. As such, I would need to bump to v4 to responsibly release this.

I'd rather bundle in a few more breaking changes before shipping that version.

One thing we could do here is add an extra argument to .convert() - something like this:

table.convert(col, lambda x: x+1, skip_false=False)

This would trigger the new, improved behaviour without breaking existing code that depends on how it works at the moment.

Then in sqlite-utils 4 we can change the default of that option.

What do you think?

(I'm open to suggestions for better names for this parameter too)

simonw added a commit that referenced this issue May 8, 2023
@simonw
Copy link
Owner

simonw commented May 8, 2023

OK, I implemented that at the Python API level. I need to decide how it should work for the sqlite-utils convert command too: https://sqlite-utils.datasette.io/en/stable/cli-reference.html#convert

@simonw
Copy link
Owner

simonw commented May 8, 2023

I'm going to go with --no-skip-false as the CLI option. It's ugly, but this whole thing is ugly. I'm going to make a note to remove this misfeature in sqlite-utils 4.

simonw added a commit that referenced this issue May 8, 2023
@mcarpenter
Copy link
Contributor Author

Sorry, I completely missed your first comment whilst on Easter break.

This looks like a good practical compromise before v4. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants