Disable Named Parameters #980

Merged
merged 16 commits into from
Jan 29, 2024

bryannho

@bryannho bryannho commented Jan 12, 2024

Describe your changes

  • Edited the named_parameters option in config to have 3 options:
    • warn: the default; behaves the same as the current False setting
    • enabled: behaves as setting to True
    • disabled: completely disables named parameters using exec_driver_sql
  • Created a new file, src/sql/traits.py, which holds a custom traitlet type for named_parameters. This is necessary for backward compatibility, i.e., converting named_parameters=True to named_parameters="enabled". The Unicode trait doesn't allow this by default. For more info, see: https://traitlets.readthedocs.io/en/stable/defining_traits.html
  • When named parameters are disabled, connection.py will now use self._connection.exec_driver_sql() rather than self._connection.execute(). This allows execution through sqlalchemy while bypassing the parsing of bind parameters and other sql compilation steps. Found this solution here.
  • Updated doc/api/configuration.md and doc/user-guide/template.md with the new feature description
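
The three modes listed above can be sketched as a small dispatch function. This is only an illustration, not the code in connection.py: `plan_execution` and its return shape are hypothetical names. The option values, the `execute()` / `exec_driver_sql()` split, and the use of the user namespace as parameters are taken from this conversation.

```python
def plan_execution(named_parameters, user_ns):
    """Return (method_name, parameters) for a named_parameters mode.

    Hypothetical helper, for illustration only.
    """
    if named_parameters == "disabled":
        # Skip bind-parameter parsing entirely.
        return "exec_driver_sql", None
    if named_parameters == "enabled":
        # Let SQLAlchemy bind :name placeholders from the user's namespace.
        return "execute", user_ns
    if named_parameters == "warn":
        # Same path as the old False setting: no parameters are passed, so a
        # query containing :name fails and the user sees an explanatory error.
        return "execute", None
    raise ValueError(f"unknown named_parameters value: {named_parameters!r}")
```

Keeping the decision in one place makes it easy to test each mode without a database connection.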

Issue number

Closes #972 and #971

Checklist before requesting a review


📚 Documentation preview 📚: https://jupysql--980.org.readthedocs.build/en/980/

@bryannho bryannho changed the title 972 bind params Disable Named Parameters Jan 12, 2024
@bryannho bryannho marked this pull request as ready for review January 16, 2024 16:00
@bryannho bryannho requested review from neelasha23 and edublancas and removed request for edublancas January 16, 2024 16:00
@bryannho
Author

bryannho commented Jan 16, 2024

weird, RTD build is failing but all the other tests pass. not sure if it's a bug I caused but looking into this now. should still be ready for review!

@edublancas

I see some errors in the doc building logs:

    raise CellExecutionError.from_cell_and_msg(cell, exec_reply_content)
nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
------------------
%%sql
SELECT *
FROM languages
WHERE rating > :rating
------------------

----- stderr -----
RuntimeError: (Your query contains named parameters (rating) but the named parameters feature is 'false'. Enable it with: %config SqlMagic.named_parameters='true' or completely disable it with: %config SqlMagic.named_parameters='disable')
(sqlalchemy.exc.InvalidRequestError) A value is required for bind parameter 'rating'
[SQL: SELECT *
FROM languages
WHERE rating > ?]
[parameters: [{}]]
(Background on this error at: https://sqlalche.me/e/20/cd3x)
If you need help solving this issue, send us a message: https://ploomber.io/community
------------------



 [mystnb.exec]
/home/docs/checkouts/readthedocs.org/user_builds/jupysql/checkouts/980/doc/api/configuration.md: WARNING: Notebook exception traceback saved in: /home/docs/checkouts/readthedocs.org/user_builds/jupysql/checkouts/980/_readthedocs/html/reports/api/configuration.err.log [mystnb.exec]
/home/docs/checkouts/readthedocs.org/user_builds/jupysql/checkouts/980/doc/user-guide/template.md: WARNING: Executing notebook failed: CellExecutionError
Traceback (most recent call last):
  File "/home/docs/checkouts/readthedocs.org/user_builds/jupysql/conda/980/lib/python3.10/site-packages/jupyter_cache/executors/utils.py", line 58, in single_nb_execution
    executenb(
  File "/home/docs/checkouts/readthedocs.org/user_builds/jupysql/conda/980/lib/python3.10/site-packages/nbclient/client.py", line 1305, in execute
    return NotebookClient(nb=nb, resources=resources, km=km, **kwargs).execute()
  File "/home/docs/checkouts/readthedocs.org/user_builds/jupysql/conda/980/lib/python3.10/site-packages/jupyter_core/utils/__init__.py", line 166, in wrapped
    return loop.run_until_complete(inner)
  File "/home/docs/checkouts/readthedocs.org/user_builds/jupysql/conda/980/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/docs/checkouts/readthedocs.org/user_builds/jupysql/conda/980/lib/python3.10/site-packages/nbclient/client.py", line 705, in async_execute
    await self.async_execute_cell(
  File "/home/docs/checkouts/readthedocs.org/user_builds/jupysql/conda/980/lib/python3.10/site-packages/nbclient/client.py", line 1058, in async_execute_cell
    await self._check_raise_for_error(cell, cell_index, exec_reply)
  File "/home/docs/checkouts/readthedocs.org/user_builds/jupysql/conda/980/lib/python3.10/site-packages/nbclient/client.py", line 914, in _check_raise_for_error
    raise CellExecutionError.from_cell_and_msg(cell, exec_reply_content)
nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
------------------
%%sql
SELECT *
FROM penguins.csv
WHERE sex = :sex
------------------

----- stderr -----
RuntimeError: (Your query contains named parameters (sex) but the named parameters feature is 'false'. Enable it with: %config SqlMagic.named_parameters='true' or completely disable it with: %config SqlMagic.named_parameters='disable')
(sqlalchemy.exc.InvalidRequestError) A value is required for bind parameter 'sex'
[SQL: SELECT *
FROM penguins.csv
WHERE sex = ?]
[parameters: [{}]]
(Background on this error at: https://sqlalche.me/e/20/cd3x)
If you need help solving this issue, send us a message: https://ploomber.io/community
------------------

looks like some examples are failing because they're executed with the named parameters option disabled?

@neelasha23

neelasha23 commented Jan 16, 2024

This section also needs to be modified: https://jupysql.ploomber.io/en/latest/api/configuration.html#named-parameters

Please change to %config SqlMagic.named_parameters='true'. Also, add more details about the options 'true', 'false', 'disable' and what's the difference b/w false and disable. @bryannho

Another observation:

With %config SqlMagic.named_parameters="false" the query raises an error:

Screenshot 2024-01-17 at 12 32 43 AM

But if I change it to disable, it still throws an error:

Screenshot 2024-01-17 at 12 32 58 AM

We could provide more context here, for example a link to the named_parameters docs.

@bryannho
Author

@edublancas Fixed the documentation errors.

@neelasha23 I updated the documentation with more context here. I also included a new error message when the error is thrown but named parameters are disabled here.

src/sql/magic.py (outdated, resolved)
src/tests/test_magic.py (outdated, resolved)
src/tests/test_magic.py (outdated, resolved)
@bryannho
Author

@edublancas @neelasha23 Addressed comments and made some changes, code should be cleaner and better documented. Ready for another review!

@edublancas

@neelasha23 please review

src/sql/run/run.py (outdated, resolved)
CHANGELOG.md (outdated, resolved)

## Disabling named parameters

The named parameters option can be _Disabled_ using `%config SqlMagic.named_parameters = 2`

why are we using numbers? they're not very informative?

it'd be better to use strings. i saw some comments about it but it's still unclear

Author

This change was made per @neelasha23's comment here. The thought was that users would confuse string values with booleans, i.e., they would write %config SqlMagic.named_parameters=false instead of %config SqlMagic.named_parameters="false", so the integer values were more of a distinction. @edublancas I can change it based on whatever you and @neelasha23 decide.

this issue has been stuck for a while so let's simplify this so we can merge this asap.

currently, %config SqlMagic.named_parameters=false is the default value. if it detects that your query has :variable, it'll produce this error:

In [5]: %sql select * from penguins.csv limit :limit
Running query in 'duckdb://'
RuntimeError: (Your query contains named parameters (limit) but the named parameters feature is disabled. Enable it with: %config SqlMagic.named_parameters=True)
(sqlalchemy.exc.InvalidRequestError) A value is required for bind parameter 'limit'
[SQL: select * from penguins.csv limit $1]
[parameters: [{}]]
(Background on this error at: https://sqlalche.me/e/20/cd3x)
If you need help solving this issue, send us a message: https://ploomber.io/community

I did this to help users, and I didn't anticipate that this would cause issues. so, let's do this:

let's keep true/false as the allowed values. but false should now completely disable named parameters, effectively fixing: #972

let's not add any deprecation warnings, we'll just break the API. we try not to break the API that often but this is a niche feature that I don't think many users know about. since this will break the API, we'll have to change the current development version from 0.10.8dev to 0.11.0dev

questions @bryannho?

@@ -712,18 +712,18 @@ def dialect(self):
     def driver(self):
         return self._driver

-    def _connection_execute(self, query, parameters=None):
+    def _connection_execute(self, query, parameters={}):

default function parameters should not be set this way, because Python will re-use the same dictionary across calls, which can lead to shared state.

None is the way to go here. any reason why you did this change?

Author

@bryannho bryannho Jan 23, 2024

Previously, when named_parameters was false, parameters = None. Then when named_parameters was true, parameters = user_ns, a non-empty dict.

I made this change so that we can add a third option in as disable, and this order seemed the most logical:

  • When named_parameters is disabled, parameters = None.
  • When named_parameters is false, parameters = {} (an empty dict). This is the default.
  • When named_parameters is true, parameters = user_ns (a non-empty dict).

Because the false option was the default, I changed the default for parameters to be {}.

To avoid this issue I can change it to this:

  • When named_parameters is disabled, parameters = {} (an empty dict).
  • When named_parameters is false, parameters = None
  • When named_parameters is true, parameters = user_ns (a non-empty dict).

@edublancas would this make sense?

mutable objects should never be passed as default values in Python functions. see this

whatever logic you're implementing, find an alternative approach because passing parameters={} will have undesirable side-effects
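
The shared-state problem raised in this thread can be reproduced in a few lines. The function names below are made up for the demonstration and are unrelated to the project's code; the behavior itself is standard Python.

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `bucket` mutates the same object.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # Using None as a sentinel gives each call its own new list.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_shared("a")
second = append_shared("b")  # the same list object as `first`

third = append_fresh("a")
fourth = append_fresh("b")  # an independent list
```

After these calls, `first` and `second` refer to one list holding both items, while `third` and `fourth` each hold a single item.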

@bryannho bryannho marked this pull request as ready for review January 24, 2024 06:37
@edublancas

@bryannho, the last point you mentioned is a bit concerning.

can you ask around? try opening an issue on sqlalchemy's github and asking a question on stack overflow.

ideally, we want a simple way to turn off parameter binding without any extra side effects. since this is the default behavior, it might bring undesired side effects for many users.

@bryannho
Author

@edublancas I agree there shouldn't be any extra side effects besides disabling bound parameters. I will ask around / create some issues and see what I can find!

@bryannho
Author

@edublancas I asked the question on the sqlalchemy github and they provided me the same answer - there isn't a simple way to turn off parameter binding without any extra side effects. I have asked the question in a few other places though and I can keep looking.

One thing we could do is escape every instance of :variable, then pass it to the normal execute() function. Upon manual testing it seems to work but there could be some corner cases. We already do a similar thing with :"variable" so it shouldn't be too difficult. It's not ideal but it allows us to avoid extra side effects - would you like me to try this?
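
A rough sketch of the escaping idea, assuming a regex-based approach. The pattern and the helper name `escape_named_parameters` are hypothetical and are not what the project ships. The sketch relies on SQLAlchemy's `text()` accepting a backslash before a colon as a literal colon.

```python
import re

# Hypothetical pattern: a colon followed by a whole word, not preceded by
# another colon, a word character, or a backslash, and not followed by a colon.
_NAMED_PARAM = re.compile(r"(?<![:\w\\]):(\w+)\b(?!:)")


def escape_named_parameters(query):
    """Prefix each :name token with a backslash so it is taken literally."""
    return _NAMED_PARAM.sub(r"\\:\1", query)


escaped = escape_named_parameters("SELECT * FROM languages WHERE rating > :rating")
cast_untouched = escape_named_parameters("SELECT rating::int FROM languages")
```

The lookbehind and lookahead leave PostgreSQL-style `::` casts, time literals such as '12:30', and already-escaped tokens alone. The sketch does not parse SQL, so comments and dialect-specific syntax are exactly the kind of corner cases mentioned above.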

@edublancas

in the question you asked you mentioned this:

compilation/transpilation/rollback

what are these steps?

@bryannho
Author

what are these steps?

@edublancas While I'm not fully aware of the exact steps sqlalchemy takes under the hood, my understanding was that when you call .execute(), it does some pre-execution steps like compiling the parameters into the SQL statement, then calls .exec_driver_sql() which executes the compiled statement, then evaluates that response and conducts some post-execution steps like a rollback if necessary.

I assumed that missing these pre-execution or post-execution steps is what made tests like this and this fail when we switched from execute() to exec_driver_sql().

From the docs: exec_driver_sql() Executes a string SQL statement on the DBAPI cursor directly, without any SQL compilation steps.

The docs for execute imply that it handles other steps/options along with the actual sql execution.

I used compilation/transpilation/rollback to mean any steps that execute() takes that don't relate to bound parameters, since we want to simply and exclusively disable bound parameters without affecting anything else.
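
The difference between the two calls can be seen with a query that contains a colon inside a string literal. This is a minimal sketch assuming SQLAlchemy 1.4 or later and an in-memory SQLite database, chosen only to keep the demo self-contained; the query and variable names are illustrative.

```python
from sqlalchemy import create_engine, text
from sqlalchemy.exc import SQLAlchemyError

engine = create_engine("sqlite://")  # in-memory SQLite, for the demo only

with engine.connect() as conn:
    query = "SELECT ':x' AS value"

    # text() scans the string for :name tokens, so ':x' is treated as a bind
    # parameter even though it sits inside a string literal.
    try:
        conn.execute(text(query))
        execute_failed = False
    except SQLAlchemyError:
        execute_failed = True

    # exec_driver_sql() hands the string to the DBAPI cursor unchanged.
    value = conn.exec_driver_sql(query).scalar()
```

Here `execute()` is expected to fail with "A value is required for bind parameter 'x'", the same message as in the logs above, while `exec_driver_sql()` returns the literal string.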

@edublancas

edublancas commented Jan 26, 2024

ok, thanks for digging into this. here's my feedback.

let's make named_parameters a string parameter that takes three values

  • warn: the default; behaves the same as the current False setting
  • enabled: behaves as setting to True
  • disabled: completely disables named parameters, let's try first with exec_driver_sql and see if users encounter errors

we need to modify the error message displayed when named_parameters=warn and explain to users the two other options, enabled/disabled, along with a link to the docs for a broader explanation.

to keep backward compatibility, we should allow users to set False (which should translate to warn) and True (translate to enabled), but if those values are set, we should display a warning saying they should change their code to set it to enable/disable
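
The backward-compatibility rule can be sketched as a plain function. In the PR this logic lives in a custom traitlet type; the version below avoids that dependency so it runs on its own. The name `normalize_named_parameters`, the warning category, and the message wording are all assumptions made for illustration.

```python
import warnings

_VALID = ("warn", "enabled", "disabled")


def normalize_named_parameters(value):
    """Map legacy booleans to the new string values (hypothetical helper)."""
    if isinstance(value, bool):
        # False keeps the old default behavior (warn); True maps to enabled.
        new_value = "enabled" if value else "warn"
        warnings.warn(
            f"named_parameters={value} is deprecated, "
            f"use named_parameters={new_value!r} instead",
            FutureWarning,
        )
        return new_value
    if value in _VALID:
        return value
    raise ValueError(f"named_parameters must be one of {_VALID}, got {value!r}")
```

Checking for `bool` first matters, because the string check alone would reject the legacy values instead of translating them.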

@bryannho
Author

to keep backward compatibility, we should allow users to set False (which should translate to disabled) and True (translate to enabled)

@edublancas just to clarify, if a user set named parameters to False, you want it to translate to disabled rather than warn? False translating to disabled makes logical sense, but it may break things for some users.

Before this update, users built things with the expectation that False equates to the warn behavior. If it now translates to disabled, then users' old code will start behaving differently. I think the warning will give users enough context to correct the issue, but I just wanted to clarify that this may break things for some users. To ensure full backwards compatibility without changing any 'default' behavior, I think it might be better to translate False to warn, and provide users a link to the new option disabled.

@edublancas

I just realized that my original comment was inaccurate. yes False should translate to warn to keep backward compatibility

@bryannho
Author

@edublancas @neelasha23 Ready for another review.

  • named_parameters is now a string that accepts 3 values: warn, enabled, and disabled
  • For backwards compatibility, if users enter named_parameters as False it translates to warn. True translates to enabled. I had to define a new traitlet type Parameters for this, since the validate() function for Unicode types doesn't allow you to translate boolean entries into string values.
  • Modified the error message displayed when named_parameters=warn with explanation about the two other options enabled/disabled, along with a link to the docs for a broader explanation.
  • Added a new error message when named_parameters=disabled and an exception occurs. It tells users that named parameters are disabled (in case they aren't aware) and provides a link to the docs.
  • Updated documentation to reflect these changes and add some new context


@edublancas edublancas left a comment

works well, just minor comments!

src/sql/connection/connection.py (outdated, resolved)
src/sql/connection/connection.py (outdated, resolved)
src/sql/traits.py (outdated, resolved)
doc/api/configuration.md (resolved)
@edublancas edublancas merged commit dbe2681 into ploomber:master Jan 29, 2024
22 of 23 checks passed
Successfully merging this pull request may close these issues.

A value is required for bind parameter 'fruit' (snowflake)
3 participants