Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"sqlite-utils convert" command to replace the separate "sqlite-transform" tool #251

Closed
simonw opened this issue Mar 25, 2021 · 15 comments
Closed
Labels
cli-tool enhancement New feature or request

Comments

@simonw
Copy link
Owner

simonw commented Mar 25, 2021

See simonw/sqlite-transform#11 - I built a separate sqlite-transform tool a while ago that uses the word "transform" to means something entirely different from sqlite-utils transform - I'd like to resolve this by merging the two tools.

@simonw simonw added enhancement New feature or request cli-tool labels Mar 25, 2021
@simonw
Copy link
Owner Author

simonw commented Mar 25, 2021

Here's the full current implementation of that tool: https://github.com/simonw/sqlite-transform/blob/0.5/sqlite_transform/cli.py

My current plan is to make this functionality available as the following:

sqlite-utils convert jsonsplit mydb.db mytable mycolumn
sqlite-utils convert parsedatetime mydb.db mytable mycolumn
sqlite-utils convert parsedate mydb.db mytable mycolumn
sqlite-utils convert lambda mydb.db mytable mycolumn --code='str(value).upper()'

@simonw
Copy link
Owner Author

simonw commented Mar 25, 2021

Idea: enhance lambda to allow it to return a dictionary of values, which will then be used to populate new columns. Use a --multicolumn option to indicate this:

sqlite-utils convert lambda mydb.db mytable mycolumn \
  --code '{"first_name": value.split()[0], "last_name": value.split()[1]}' \
  --multicolumn --drop

The --drop means "drop the mycolumn column after making this change".

Maybe --multi is a better name than --multicolumn here, since either way it's going to need additional explanation somewhere.

Would this overlap with #239 at all?

@simonw
Copy link
Owner Author

simonw commented Jul 24, 2021

sqlite-utils convert jsonsplit mydb.db mytable mycolumn
sqlite-utils convert parsedatetime mydb.db mytable mycolumn
sqlite-utils convert parsedate mydb.db mytable mycolumn
sqlite-utils convert lambda mydb.db mytable mycolumn --code='str(value).upper()'

This is a bit verbose - and having added --multi and --output the lambda command keeps getting more and more flexible compared to the others.

New idea: ditch the sub-sub-commands and move the jsonsplit and parsedate recipes to be options of convert - maybe like this:

sqlite-utils convert my.db mytable col1 --jsonsplit

or:

sqlite-utils convert my.db mytable col1 --recipe jsonsplit

or -r jsonsplit for short.

@simonw
Copy link
Owner Author

simonw commented Aug 1, 2021

Note that there's already a concept of conversions which might be confused with convert? https://sqlite-utils.datasette.io/en/stable/python-api.html#converting-column-values-using-sql-functions

db["example"].insert({
    "name": "The Bigfoot Discovery Museum"
}, conversions={"name": "upper(?)"})

I think that's OK though - that's a Python library feature, sqlite-utils convert is a CLI thing.

@simonw
Copy link
Owner Author

simonw commented Aug 1, 2021

Problem with the -r/--recipe idea: the parsedate and parsedatetime and jsonsplit recipes in the current sqlite-transform tool all take additional options.

For sqlite-transform parsedate and parsedatetime:

@click.option(
    "--dayfirst",
    is_flag=True,
    help="Assume day comes first in ambiguous dates, e.g. 03/04/05",
)
@click.option(
    "--yearfirst",
    is_flag=True,
    help="Assume year comes first in ambiguous dates, e.g. 03/04/05",
)

For jsonsplit:

@click.option("--delimiter", default=",", help="Delimiter to split on")
@click.option(
    "--type",
    type=click.Choice(("int", "float")),
    help="Type to use for values - int or float (defaults to string)",
)

@simonw
Copy link
Owner Author

simonw commented Aug 1, 2021

Back to the design board then. One way to handle this would be the long-form:

sqlite-utils convert jsonsplit mydb.db mytable mycolumn
sqlite-utils convert parsedatetime mydb.db mytable mycolumn
sqlite-utils convert parsedate mydb.db mytable mycolumn
sqlite-utils convert lambda mydb.db mytable mycolumn --code='str(value).upper()'

I like the idea that lambda is the default action, but in this form it's required that the second argument (the word after convert) be the name of the recipe that is being applied to avoid any potential confusion with the database filename.

An ugly solution would be to make all four of those options available on sqlite-utils convert - and return an error if you try and use one of those without specifying the accompanying recipe. That's a bit gross though.

@simonw
Copy link
Owner Author

simonw commented Aug 1, 2021

Or.... how about making the parsedate() and parsedatetime() and jsonsplit() functions available within the namespace that is configured for the --code block?

Then you could do something like this:

sqlite-utils convert mydb.db mytable col --code 'parsedatetime(value, dayfirst=True)'
sqlite-utils convert mydb.db mytable col --code 'jsonsplit(value, delimiter=":")'

@simonw
Copy link
Owner Author

simonw commented Aug 1, 2021

That's a pretty neat fix, though it's a bit more challenging on the documentation front - maybe the help text for sqlite-utils convert --help gets a fair bit longer?

@simonw
Copy link
Owner Author

simonw commented Aug 1, 2021

I could stick them in a recipe namespace so you do this:

sqlite-utils convert mydb.db mytable col --code 'recipe.parsedatetime(value, dayfirst=True)'
sqlite-utils convert mydb.db mytable col --code 'recipe.jsonsplit(value, delimiter=":")'

@simonw
Copy link
Owner Author

simonw commented Aug 1, 2021

In which case I think --code should be a positional argument instead:

sqlite-utils convert mydb.db mytable col 'recipe.parsedatetime(value, dayfirst=True)'
sqlite-utils convert mydb.db mytable col 'recipe.jsonsplit(value, delimiter=":")'
sqlite-utils convert mydb.db mytable col 'recipe.jsonsplit(value, delimiter=":")'
sqlite-utils convert mydb.db mytable col '{"lower": value.lower(), "upper": value.upper()}' --multi

One problem with this: we already accept one or more columns. I think that's OK though since the code is now a required argument, so it means we have to treat everything between the table and the final code argument as a column.

@simonw
Copy link
Owner Author

simonw commented Aug 1, 2021

Would make sense to accept code from standard input too:

echo 'value.upper()' | sqlite-utils convert my.db mytable col -

@simonw
Copy link
Owner Author

simonw commented Aug 1, 2021

I've started an implementation in the convert branch - no documentation yet, and I've not implemented the recipes.

@simonw
Copy link
Owner Author

simonw commented Aug 1, 2021

I'll finish the work on this in a PR.

@simonw
Copy link
Owner Author

simonw commented Aug 1, 2021

I'm going with recipes.jsonsplit() rather than recipe.jsonsplit() because the Python module containing the recipes will be called recipes. I'll set up a r.jsonsplit() shortcut too as a convenience.

@simonw
Copy link
Owner Author

simonw commented Aug 2, 2021

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cli-tool enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant