Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqlite-utils transform should set empty strings to null when converting text columns to integer/float #488

Open
simonw opened this issue Sep 14, 2022 · 5 comments
Labels
bug Something isn't working cli-tool enhancement New feature or request python-library

Comments

@simonw
Copy link
Owner

simonw commented Sep 14, 2022

/tmp % echo "id,age,weight\n1,3,2.5\n2,," | sqlite-utils insert test.db test - --csv
/tmp % sqlite-utils schema test.db                                                  
CREATE TABLE [test] (
   [id] TEXT,
   [age] TEXT,
   [weight] TEXT
);
/tmp % sqlite-utils transform test.db test --type age integer --type weight float   
/tmp % sqlite-utils schema test.db                                               
CREATE TABLE "test" (
   [id] TEXT,
   [age] INTEGER,
   [weight] FLOAT
);
/tmp % sqlite-utils rows test.db test
[{"id": "1", "age": 3, "weight": 2.5},
 {"id": "2", "age": "", "weight": ""}]

It would be neat if this resulted in the following instead:

 {"id": "2", "age": null, "weight": null}

Related Discord discussion: https://discord.com/channels/823971286308356157/823971286941302908/1019635490833567794

@simonw
Copy link
Owner Author

simonw commented Sep 14, 2022

Frustratingly I think this counts as a backwards-incompatible change.

Could have it be opt-in with a new option / method parameter, and then change it to the default if I release a sqlite-utils 4.

@simonw
Copy link
Owner Author

simonw commented Sep 21, 2022

No, I'm going to say that this is a bug - it's WEIRD having a integer or float column containing an empty string. I'm OK changing that - I very much doubt anyone is relying on this functionality.

So no need for a new option here - just fixing the bug is sensible.

@simonw simonw added the bug Something isn't working label Sep 21, 2022
@simonw
Copy link
Owner Author

simonw commented Sep 21, 2022

New tests should go in: https://github.com/simonw/sqlite-utils/blob/main/tests/test_transform.py

I think the implementation fix needs to go near here:

copy_sql = "INSERT INTO [{new_table}] ({new_cols})\n SELECT {old_cols} FROM [{old_table}];".format(
new_table=new_table_name,
old_table=self.name,
old_cols=", ".join("[{}]".format(col) for col in old_cols),
new_cols=", ".join("[{}]".format(col) for col in new_cols),
)

The trick is going to be teaching that generated SQL to know which columns are integer or float and to convert "" to null as part of that operation.

@simonw
Copy link
Owner Author

simonw commented Sep 21, 2022

It looks like SQLite has a SELECT NULLIF(value, '') function which returns null if that value is equal to ''.

We need to only apply that function to columns that we know to be of type integer or float though - text columns containing empty strings should not be rewritten to null.

@ar-jan
Copy link

ar-jan commented Dec 23, 2022

text columns containing empty strings should not be rewritten to null.

I would actually appreciate an option to do just that for text columns as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cli-tool enhancement New feature or request python-library
Projects
None yet
Development

No branches or pull requests

2 participants