Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqlite-utils transform/insert --detect-types #179

Closed
simonw opened this issue Sep 26, 2020 · 4 comments
Closed

sqlite-utils transform/insert --detect-types #179

simonw opened this issue Sep 26, 2020 · 4 comments
Labels
cli-tool duplicate This issue or pull request already exists enhancement New feature or request python-library

Comments

@simonw
Copy link
Owner

simonw commented Sep 26, 2020

Idea from simonw/datasette-edit-schema#13 - provide Python utility methods and accompanying CLI options for detecting the likely types of TEXT columns.

So if you have a text column that actually contained exclusively integer string values, it can let you know and let you run transform against it.

@simonw
Copy link
Owner Author

simonw commented Sep 26, 2020

SQL query for detecting integers:

select
  'contains_non_integer' as result
from
  mytable
where
  cast(cast(mycolumn AS INTEGER) AS TEXT) != mycolumn
limit
  1

This will return a single row with a 1 as soon as it comes across a column that contains a non-integer - so it short circuits quickly on TEXT columns with non-integers in them.

If everything in the column is an integer it will scan the whole thing before returning no rows.

More extensive demo:

select
  value,
  cast(cast(value AS INTEGER) AS TEXT) = value as is_valid_int
from
  (
    select
      '1' as value
    union
    select
      '1.1' as value
    union
    select
      'dog' as value
    union
    select
      null as value
  )

https://latest.datasette.io/fixtures?sql=select%0D%0A++value%2C%0D%0A++cast%28cast%28value+AS+INTEGER%29+AS+TEXT%29+%3D+value+as+is_valid_int%0D%0Afrom%0D%0A++%28%0D%0A++++select%0D%0A++++++%271%27+as+value%0D%0A++++union%0D%0A++++select%0D%0A++++++%271.1%27+as+value%0D%0A++++union%0D%0A++++select%0D%0A++++++%27dog%27+as+value%0D%0A++++union%0D%0A++++select%0D%0A++++++null+as+value%0D%0A++%29

value is_valid_int
   
1 1
1.1 0
dog 0

@simonw
Copy link
Owner Author

simonw commented Sep 26, 2020

Posed a question about this on the SQLite forum here: https://sqlite.org/forum/forumpost/ab0dcd66ef

@simonw
Copy link
Owner Author

simonw commented Sep 27, 2020

This recipe looks like it might be the way to detect floats:

select
  value,
  cast(cast(value AS REAL) AS TEXT) in (value, value || '.0') as is_valid_float
from
  (
    select
      '1' as value
    union
    select
      '1.1' as value
    union
    select
      'dog' as value
    union
    select
      null as value
  )

Demo: https://latest.datasette.io/fixtures?sql=select%0D%0A++value%2C%0D%0A++cast%28cast%28value+AS+REAL%29+AS+TEXT%29+in+%28value%2C+value+%7C%7C+%27.0%27%29+as+is_valid_float%0D%0Afrom%0D%0A++%28%0D%0A++++select%0D%0A++++++%271%27+as+value%0D%0A++++union%0D%0A++++select%0D%0A++++++%271.1%27+as+value%0D%0A++++union%0D%0A++++select%0D%0A++++++%27dog%27+as+value%0D%0A++++union%0D%0A++++select%0D%0A++++++null+as+value%0D%0A++%29

value is_valid_float
   
1 1
1.1 1
dog 0

@simonw
Copy link
Owner Author

simonw commented Jun 19, 2021

This work is going to happen in #282.

@simonw simonw closed this as completed Jun 19, 2021
@simonw simonw added the duplicate This issue or pull request already exists label Jun 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cli-tool duplicate This issue or pull request already exists enhancement New feature or request python-library
Projects
None yet
Development

No branches or pull requests

1 participant