Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggest type conversions for TEXT columns #13

Closed
simonw opened this issue Sep 25, 2020 · 10 comments
Closed

Suggest type conversions for TEXT columns #13

simonw opened this issue Sep 25, 2020 · 10 comments
Labels
enhancement New feature or request wontfix This will not be worked on

Comments

@simonw
Copy link
Owner

simonw commented Sep 25, 2020

It would be great to suggest things like "the category column is all numbers but is currently TEXT, you could convert it to INTEGER"

@simonw simonw added the enhancement New feature or request label Sep 25, 2020
@simonw
Copy link
Owner Author

simonw commented Sep 25, 2020

Here's a query that can do that:

select
  'contains_non_integer' as col
from
  sortable
where
  cast(cast(sortable AS INTEGER) AS TEXT) != sortable
limit
  1

This will return 0 results if every column is an integer - it will shortcut the query and return a result as soon as it finds a non-integer column.

Example for a column containing only integers: https://latest.datasette.io/fixtures?sql=select%0D%0A++%27contains_non_integer%27+as+col%0D%0Afrom%0D%0A++sortable%0D%0Awhere%0D%0A++cast%28cast%28sortable+AS+INTEGER%29+AS+TEXT%29+%21%3D+sortable%0D%0Alimit%0D%0A++1

And for a column that instead contains floating point: https://latest.datasette.io/fixtures?sql=select%0D%0A++%27contains_non_integer%27+as+col%2C+*%0D%0Afrom%0D%0A++sortable%0D%0Awhere%0D%0A++cast%28cast%28sortable_with_nulls+AS+INTEGER%29+AS+TEXT%29+%21%3D+sortable_with_nulls%0D%0Alimit%0D%0A++1

@simonw
Copy link
Owner Author

simonw commented Sep 25, 2020

Maybe sqlite-utils should grow utility functions for guessing types in this way.

@simonw
Copy link
Owner Author

simonw commented Sep 25, 2020

Idea: return the first detected value that is not an integer. Then code can see if that value is not-a-float - if it's obviously not-a-float we don't have to run the float detection on it.

@simonw simonw changed the title Suggest type conversions for columns Suggest type conversions for TEXT columns Sep 25, 2020
@simonw
Copy link
Owner Author

simonw commented Sep 25, 2020

Running this query against my dogsheep-beta index is a good test, because the key column there starts with 25,000 numeric tweet IDs before getting the first non-integer GitHub commit. It takes 421ms.

select
  'contains_non_integer' as col, *
from
  search_index
where
  cast(cast(key AS INTEGER) AS TEXT) != key
limit
  1

@simonw
Copy link
Owner Author

simonw commented Sep 25, 2020

This method works for detecting integers but doesn't work for floats - because cast(cast(key AS FLOAT) AS TEXT) run against e.g. 415 produces 415.0.

@simonw
Copy link
Owner Author

simonw commented Sep 25, 2020

@simonw
Copy link
Owner Author

simonw commented Oct 5, 2021

This would be really useful, especially as a prompt that displays at the top of the table page (via a fetch() so as not to delay page loading).

Every time I demonstrate CSV import to someone I have to give them a little spiel about how important it is to ensure the types are correct - this would be a good way to improve that.

@simonw
Copy link
Owner Author

simonw commented Oct 5, 2021

In simonw/sqlite-utils#282 I added the sqlite-utils insert --detect-types mechanism, but it doesn't use any of the clever SQL from this thread - it instead wraps the ingested CSV data in a not-yet-documented TypeTracker class and uses that to figure out what the types should be.

So there's no sqlite-utils Python library solution for detecting types for an existing table yet.

@simonw
Copy link
Owner Author

simonw commented Oct 18, 2021

Could I use a custom sqlite3 aggregate Python function for this? Might be a bit slow.

@simonw
Copy link
Owner Author

simonw commented Sep 8, 2023

I'm not going to do this. The new example display from here solves the same problem:

@simonw simonw closed this as completed Sep 8, 2023
@simonw simonw added the wontfix This will not be worked on label Sep 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request wontfix This will not be worked on
Projects
None yet
Development

No branches or pull requests

1 participant