Skip to content

Fix a query performance issue with tables without a PK.#561

Merged
mildbyte merged 1 commit intomasterfrom
bugfix/no-pk-overlapping-chunks
Nov 4, 2021
Merged

Fix a query performance issue with tables without a PK.#561
mildbyte merged 1 commit intomasterfrom
bugfix/no-pk-overlapping-chunks

Conversation

@mildbyte
Copy link
Copy Markdown
Contributor

@mildbyte mildbyte commented Nov 4, 2021

When partitioning data, if the table doesn't have a primary key, we use a "surrogate"
primary key by concatenating the whole row as a string on the PG side (this is because
the whole row can sometimes contain NULLs which we can't compare in PG).

The query planner used to use the non-textual composite PK to figure out which
chunks overlapped in order to materialize them. This meant false positives where
something like (2, 'orange') would force a materialization because numerically,
it's greater than 10 and lexicographically, it isn't.

To fix this, regenerate the textual surrogate PK at query plan time and use it
to figure out when chunks overlap.

When partitioning data, if the table doesn't have a primary key, we use a "surrogate"
primary key by concatenating the whole row as a string on the PG side (this is because
the whole row can sometimes contain NULLs which we can't compare in PG).

The query planner used to use the non-textual composite PK to figure out which
chunks overlapped in order to materialize them. This meant false positives where
something like (2, 'orange') would force a materialization because numerically,
it's greater than 10 and lexicographically, it isn't.

To fix this, regenerate the textual surrogate PK at query plan time and use it
to figure out when chunks overlap.
@mildbyte mildbyte merged commit c0e3fed into master Nov 4, 2021
@mildbyte mildbyte deleted the bugfix/no-pk-overlapping-chunks branch November 4, 2021 12:28
mildbyte added a commit that referenced this pull request Nov 17, 2021
  * Splitfile speedups (#567)
  * Various query speedups (#563, #561)
  * More robust CSV querying (#562)

Full set of changes: [`v0.2.17...v0.2.18`](v0.2.17...v0.2.18)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant