
API Server #116

Merged

nicosuave merged 4 commits into main from api-http-server on Mar 29, 2026
Conversation

@nicosuave
Member

No description provided.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 46f404bc84

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +352 to +353:

    if ";" in normalized:
        raise ValueError("Multiple SQL statements are not supported")

P2: Allow semicolons inside SQL string literals

Treating any remaining ; as a multi-statement separator rejects valid single-statement SQL such as SELECT ';' AS sep or predicates with semicolons inside string literals. This makes /sql and /sql/compile fail on legitimate inputs even though they are not multiple statements; statement-count validation should be parse-based instead of a raw substring check.

Useful? React with 👍 / 👎.

Comment on lines +322 to +323:

    table = record_batch_reader_to_table(reader)
    response_format = _resolve_response_format(request, format_override)

P2: Avoid materializing full result before Arrow response

The response path reads the entire RecordBatchReader into a table before deciding the output format, so Arrow requests still fully materialize and then re-serialize all rows in memory. For large query results this can cause high memory usage or OOMs and defeats the expected scalability of Arrow transport.


- Fix unused duckdb import and unsorted imports in build_demo_duckdb.py
- Use sqlglot.parse() for multi-statement detection instead of raw
  semicolon check, so semicolons inside string literals are accepted
- Stream Arrow responses directly from RecordBatchReader to IPC bytes
  without materializing the full table first
- Add tests for semicolon-in-literal and multi-statement rejection

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 49fec6607c


Comment on lines +81 to +82:

    body = await request.body()
    if len(body) > self.max_body_bytes:

P1: Enforce body limit before reading full request into memory

The size guard still buffers the entire payload with await request.body() before rejecting, so requests without a trustworthy Content-Length (e.g., chunked uploads) can consume unbounded memory and only fail after allocation. That defeats max_request_body_bytes as a hard safety limit and can OOM the API process under large or malicious inputs.

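A framework-free sketch of the incremental guard, assuming an async chunk iterator like Starlette's `request.stream()` (the helper name `read_body_capped` is an assumption, not the middleware's actual code):

```python
import asyncio
from typing import AsyncIterator


async def read_body_capped(stream: AsyncIterator[bytes], max_bytes: int) -> bytes:
    """Accumulate the body chunk by chunk and fail as soon as the cap is
    exceeded, rather than buffering the whole payload before checking."""
    chunks: list[bytes] = []
    total = 0
    async for chunk in stream:
        total += len(chunk)
        if total > max_bytes:
            # reject mid-stream: chunked uploads without a trustworthy
            # Content-Length can no longer force a full allocation
            raise ValueError("Request body exceeds configured limit")
        chunks.append(chunk)
    return b"".join(chunks)
```

Inside a middleware like `MaxRequestBodyMiddleware`, the iterator would be the request's body stream and the `ValueError` would be translated into a 413 response.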

Suggested change:

      # mistaken for statement separators.
      if ";" in sql:
-         statements = [s.strip() for s in sql.split(";") if s.strip()]
+         statements = sqlglot.parse(sql, dialect=self.dialect)

P2: Keep strict=False fallback when semicolon parse fails

In non-strict mode this rewriter is expected to pass through unsupported SQL, but the new semicolon handling calls sqlglot.parse(...) outside the parse fallback block. If sqlglot cannot parse a semicolon-terminated client/system statement, it now raises before the strict=False passthrough can run, which can break PostgreSQL server flows that rely on non-strict rewriting.


Comment on lines +40 to +41:

    if ";" in filter_str:
        raise ValueError("Filter contains disallowed SQL: multi-statement input")

P2: Allow semicolons in literal filter values

This rejects any filter containing ; before parsing, so valid single-expression filters like status = ';' (or other text values containing semicolons) are treated as disallowed SQL. That causes legitimate /query and /compile structured requests to fail even though no multi-statement execution is happening.


- Stream request body incrementally in MaxRequestBodyMiddleware so
  chunked uploads without Content-Length are rejected before full
  buffering (prevents OOM under large/malicious inputs)
- Wrap sqlglot.parse() in query_rewriter semicolon check with
  try/except so non-strict mode still passes through unparseable SQL
- Remove raw semicolon substring check from validate_filter_expression;
  use sqlglot.parse() after successful parse_one so semicolons inside
  string literals in filters are accepted

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: af912b65f9


    # literals (e.g. SELECT ';') are not rejected.
    import sqlglot

    statements = sqlglot.parse(normalized)

P2: Handle parse failures as 4xx in SQL normalization

/sql and /sql/compile both call _normalize_sql_query, but sqlglot.parse(normalized) can throw non-ValueError parse exceptions on malformed input; those bypass the app’s ValueError handler and surface as 500 responses. That turns ordinary client-side bad SQL into an internal server error path, so parse errors should be caught here and re-raised as ValueError (or HTTPException) to return a stable 4xx.


Prevents malformed SQL from surfacing as 500 instead of 400.
@nicosuave nicosuave merged commit 99c898d into main Mar 29, 2026
15 checks passed
@nicosuave nicosuave deleted the api-http-server branch March 29, 2026 16:41
