
feat(sql): add dialect-aware SQL escape helper #8

Open
MiroCillik wants to merge 13 commits into main from feat/sdk-quote-helper

MiroCillik commented Apr 22, 2026

Summary

Adds a dialect-aware SQL escape helper to the Keboola Query Service Python SDK so users can safely interpolate untrusted values into raw SQL. Addresses the SQL injection gap flagged in the Storage Access docs review (PR keboola/connection-docs#910).

New public API under from keboola_query_service import SQL:

  • SQL(dialect): factory bound to "snowflake" or "bigquery" (validates the dialect at construction)
  • sql.literal(value): escapes None / bool / int / float / str / date / datetime / list / tuple; rejects unknown types with a clear message suggesting str(value)
  • sql.ident(*parts): quotes each part and joins the parts with "."; dots inside a part are preserved (so "in.c-main"."customers" works)
  • sql.date(value): explicit DATE literal from a date or a "YYYY-MM-DD" string
  • sql.raw(s): reviewed escape hatch for pre-formed SQL fragments (CURRENT_TIMESTAMP, function calls)
  • sql.format(template, **values): str.format-style named interpolation; every value is routed through literal() unless it is already SafeSql
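
As a rough illustration of the identifier rule above, here is a minimal sketch for the Snowflake case only. The function name quote_ident_snowflake is hypothetical, and doubling embedded double quotes is the standard SQL convention assumed here; the SDK's actual implementation may differ, and BigQuery quoting is not covered.

```python
def quote_ident_snowflake(*parts: str) -> str:
    """Hypothetical sketch: quote each part separately, then join with '.'."""
    if not parts:
        raise ValueError("at least one identifier part is required")
    quoted = []
    for part in parts:
        if not isinstance(part, str) or part == "":
            raise ValueError("identifier parts must be non-empty strings")
        # Assumption: embedded double quotes are doubled (standard SQL).
        # Dots inside a part stay inside the quotes, so they are preserved.
        quoted.append('"' + part.replace('"', '""') + '"')
    return ".".join(quoted)


print(quote_ident_snowflake("in.c-main", "customers"))
# prints "in.c-main"."customers"
```

Because each part is quoted before joining, a dot inside a part never acts as a separator.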

Usage:

from keboola_query_service import SQL

sql = SQL("snowflake")
query = sql.format(
    "UPDATE {t} SET status = {status}, updated_at = {ts} WHERE id = {id}",
    t=sql.ident("in.c-main", "approvals"),
    status="approved",
    ts=sql.raw("CURRENT_TIMESTAMP"),
    id=123,
)

Correctness decisions worth noting

  • Snowflake strings escape backslashes (\ becomes \\). Snowflake interprets backslash sequences inside single-quoted literals (\n, \t, \\…), so a naive escape that only doubled single quotes would corrupt round-trips for any string containing \. Covered by a dedicated regression test.
  • bool is dispatched before int. isinstance(True, int) is True, so the order matters. Regression test in TestLiteralBoolBeforeInt.
  • Empty lists emit (NULL), not (). IN () is a syntax error; IN (NULL) returns no rows (semantically correct for an empty set).
  • Datetime shape → SQL type mapping, no UTC conversion. Naive datetimes emit TIMESTAMP_NTZ (Snowflake) / DATETIME (BigQuery); tz-aware datetimes emit TIMESTAMP_TZ (Snowflake) / TIMESTAMP (BigQuery) with the original offset.
  • format_spec is rejected. sql.format("{p:.2f}", p=1.2345) raises ValueError rather than silently dropping the format spec.
  • Positional placeholders are rejected. {0} / {} raise IndexError (named-only in v1).
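
Several of the decisions above can be sketched together in a few lines. This is a simplified illustration, not the SDK's code: the names literal_snowflake and format_sql are hypothetical, only the Snowflake string rules are shown, float / date / datetime handling and SafeSql pass-through are omitted, and raising TypeError for unknown types is an assumption.

```python
from string import Formatter


def literal_snowflake(value: object) -> str:
    """Hypothetical sketch of literal escaping for a subset of types."""
    if value is None:
        return "NULL"
    # bool must be tested before int: isinstance(True, int) is True.
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        # Escape backslashes as well as single quotes.
        escaped = value.replace("\\", "\\\\").replace("'", "''")
        return "'" + escaped + "'"
    if isinstance(value, (list, tuple)):
        if not value:
            return "(NULL)"  # IN () would be a syntax error
        return "(" + ", ".join(literal_snowflake(v) for v in value) + ")"
    # Assumption: the exception type is a guess; the PR only promises a clear message.
    raise TypeError(f"unsupported type {type(value).__name__}; consider str(value)")


def format_sql(template: str, **values: object) -> str:
    """Hypothetical sketch of named-only interpolation."""
    out = []
    for text, field, spec, _conversion in Formatter().parse(template):
        out.append(text)
        if field is None:
            continue  # trailing literal text, no placeholder
        if spec:
            raise ValueError(f"format spec {spec!r} is not supported")
        if field == "" or field.isdigit():
            raise IndexError("positional placeholders are not supported")
        out.append(literal_snowflake(values[field]))
    return "".join(out)


print(format_sql("WHERE id IN {ids} AND ok = {ok}", ids=[], ok=True))
# prints WHERE id IN (NULL) AND ok = TRUE
```

Parsing the template with string.Formatter().parse, rather than calling str.format directly, is what lets the sketch see and reject a format spec before any value is rendered.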

Cross-SDK byte-equality

This PR pairs with keboola/query-service-api-js-sdk#3. Both produce byte-identical output for the docs example (verified: same 101-byte string, same MD5).
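
A byte-equality check of this kind could be reproduced along the following lines. The helper name fingerprint is hypothetical, and comparing UTF-8 bytes is an assumption about how the two SDK outputs were compared; the docs example string itself is not reproduced here.

```python
import hashlib


def fingerprint(sql_text: str) -> tuple[int, str]:
    """Return (byte length, MD5 hex digest) of the UTF-8 encoded SQL text."""
    data = sql_text.encode("utf-8")  # assumption: both SDKs are compared as UTF-8
    return len(data), hashlib.md5(data).hexdigest()


# Compare this pair with the one computed from the JS SDK output.
length, digest = fingerprint("SELECT 1")
print(length, digest)
```

Comparing the byte length alongside the digest makes an encoding mismatch easier to spot than a digest difference alone.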

Specs and plan

Now live in this repo alongside the feature they describe:

Test plan

  • 96 new unit tests in tests/test_sql.py — full suite passes (pytest tests/ -v → 109 passed)
  • ruff check clean on src/ and new test file (pre-existing tests/test_client.py issues are unrelated to this PR)
  • mypy strict clean on src/keboola_query_service/
  • Byte-equality with the JS SDK verified for the docs example
  • Task 23 — manual Snowflake round-trip (deferred, will address manually) — run a statement against a real Snowflake Storage Access workspace with a mix of types (string with ', string with \, date, tz-aware datetime, IN list, sql.raw("CURRENT_TIMESTAMP")) and confirm the backslash-containing string round-trips byte-for-byte
  • Task 24 — release prep (deferred, will address manually) — version bump (currently 0.1.7), _version.py, and README snippet

🤖 Generated with Claude Code
