Unary steps #1973

benjie · 2024-02-27T10:41:52Z

Description

Fixes "Global dependencies" crystal-pre-merge#505

In Grafast, steps depend on other steps to form a DAG which we call the execution plan. At execution time, the steps get executed in "layers" where all steps in the same layer will have the same "size" (i.e. will process the same number of values). The root layer will always be one value (there's one "variableValues", one "rootValue", one "context", one set of input arguments, etc) but as execution goes through lists (and thus __ItemStep steps), polymorphism, and other concerns, the size of a layer may change, and thus steps may process a larger number of items.

It turns out that the root bucket, the layer that always has size 1, is super critical, and we want to be able to make decisions based on it. Previously we "multiplied up" root values into the buckets that depend on them. For example, if you have a query { allPosts { authors(first: 2) { id } } } then that number 2 for the >allPosts>authors(first:) argument would need a representation for each of the posts, so assuming there were 10 posts, the steps involved with calculating >allPosts>authors would receive [2, 2, 2, 2, 2, 2, 2, 2, 2, 2] as the batched values for first. (Obviously this can get a lot larger in many circumstances.) If this step then wants to use this value hardcoded into an expression, for example select ... limit 2, it cannot trust that the value is always the same value - it must explicitly check. Fortunately PostgreSQL is smart enough that we've not had to deal with this problem, but SQLite, for example, needs assistance.
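To make the cost of "multiplying up" concrete, here is a minimal sketch. The names multiplyUp and buildLimitClause are hypothetical helpers invented for illustration and are not part of Grafast; the sketch only mirrors the behaviour described above, where a step that receives a batch must check every entry before it can inline the value.

```typescript
// Hypothetical helper, not part of Grafast: repeat one root-level value
// once for every entry in a larger layer.
function multiplyUp<T>(value: T, size: number): T[] {
  return Array.from({ length: size }, () => value);
}

// Hypothetical guard: a step that wants to inline `first` as a literal
// has to prove that every entry in the batch is identical.
function buildLimitClause(firstBatch: readonly number[]): string {
  const head = firstBatch[0];
  if (head === undefined || !firstBatch.every((v) => v === head)) {
    throw new Error("Cannot inline `first`: batch values differ");
  }
  return `limit ${head}`;
}

// 10 posts, so the argument value 2 is repeated 10 times.
const firstForEachPost = multiplyUp(2, 10);
const limitClause = buildLimitClause(firstForEachPost); // "limit 2"
```

The explicit check is linear in the batch size and has to run on every execution, which is the overhead that unary steps are meant to remove.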

The current situation is far from ideal, and it makes it hard to support arbitrary data sources via Grafast.

This PR introduces the concept of "unary" steps - these are steps within the execution plan which are guaranteed to always have exactly one value in their batch. We currently determine these dynamically (we don't just use them for variableValues/context/rootValue/arguments, but also for derivatives of these). When it comes time to execute a step we pass it the "multiplied up" values for non-unary steps, and we pass it the single value for unary steps. This allows the step to automatically know that this 2 is safe to use for all inputs, building an SQL expression with fewer placeholders and more literals. When a step adds a dependency, it may require that the step is unary, allowing it to make assumptions during execution.
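As a rough illustration of how "unary-ness" could propagate to derivatives, the sketch below uses a deliberately simplified model. SketchStep, isUnarySketch and requireUnary are hypothetical names, and the rule shown (a step is unary when it does not fan out and all of its dependencies are unary) is an assumption made for this example, not a description of Grafast's actual algorithm.

```typescript
// Hypothetical, simplified model of a step; not Grafast's real classes.
interface SketchStep {
  id: string;
  dependencies: SketchStep[];
  // True for steps that fan out over a list (for example an item step).
  fansOut: boolean;
}

// Assumed rule for this sketch: a step is unary when it does not fan out
// and every one of its dependencies is unary.
function isUnarySketch(step: SketchStep): boolean {
  if (step.fansOut) return false;
  return step.dependencies.every((dep) => isUnarySketch(dep));
}

// Hypothetical equivalent of "adding a dependency that must be unary".
function requireUnary(dep: SketchStep): SketchStep {
  if (!isUnarySketch(dep)) {
    throw new Error(`Step ${dep.id} is not unary`);
  }
  return dep;
}

const variableValuesStep: SketchStep = { id: "variableValues", dependencies: [], fansOut: false };
// A derivative of a unary step is itself unary.
const firstArgStep: SketchStep = { id: "first", dependencies: [variableValuesStep], fansOut: false };

const allPostsStep: SketchStep = { id: "allPosts", dependencies: [], fansOut: false };
// Iterating over the list of posts changes the layer size.
const postItemStep: SketchStep = { id: "postItem", dependencies: [allPostsStep], fansOut: true };
const authorIdStep: SketchStep = { id: "authorId", dependencies: [postItemStep], fansOut: false };
```

With this model, requireUnary(firstArgStep) succeeds, while requireUnary(authorIdStep) throws because that step sits downstream of a fan-out.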

One major advantage of this approach, other than the above, is that it means the need to "eval" values is diminished - we can instead take the unary value at runtime and derive the required actions from that.

However, what if a unary value actually is a list? For example, "friends" might be a unary value: [[Alice, Bob, Carl]] - the batch only contains one entry, but that entry is an array. We clearly can't use the type of the entries in the values argument to execute(count, values, extra) to determine whether the related step is unary or not. Instead, we change the entries in the values tuple to be objects that differentiate between batched and unbatched.
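One way to picture such wrapper objects is a tagged union. The sketch below borrows the isBatch and .at(i) names that appear in this PR's commit messages, but the exact shape is an assumption; SketchExecutionValue, batchValue, unaryValue and greetAll are hypothetical and written only for this example.

```typescript
// Assumed shapes: a tag distinguishes a batch from a single (unary) value,
// so a unary value that happens to be an array is never mistaken for a batch.
type BatchValue<T> = { isBatch: true; entries: readonly T[]; at(i: number): T };
type UnaryValue<T> = { isBatch: false; value: T; at(i: number): T };
type SketchExecutionValue<T> = BatchValue<T> | UnaryValue<T>;

function batchValue<T>(entries: readonly T[]): BatchValue<T> {
  return { isBatch: true, entries, at: (i) => entries[i] as T };
}

function unaryValue<T>(value: T): UnaryValue<T> {
  // A unary value answers with the same value for every index.
  return { isBatch: false, value, at: () => value };
}

// The unary value is itself a list, but it is still one value, not a batch.
const friendsValue = unaryValue(["Alice", "Bob", "Carl"]);
const postIdsValue = batchValue([101, 102, 103]);

// A consumer can read both kinds uniformly through .at(i).
function greetAll(
  count: number,
  ids: SketchExecutionValue<number>,
  label: SketchExecutionValue<string>,
): string[] {
  const results: string[] = [];
  for (let i = 0; i < count; i++) {
    results.push(`${label.at(i)} #${ids.at(i)}`);
  }
  return results;
}

const greetings = greetAll(3, postIdsValue, unaryValue("post"));
```

Because the tag lives on the wrapper rather than on the wrapped data, a step can branch on isBatch when it wants to treat a unary input specially (for example to inline a literal), and fall back to .at(i) when it does not care.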

To keep this from being a breaking change, we introduce the executeV2 method which accepts this new shape, and we have a fallback executeV2 which automatically transforms (via multiplying up) the unaries to feed into the legacy execute method. Everyone should move to using executeV2 for efficiency's sake (multiplying up increases the burden on the garbage collector).
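The fallback can be sketched as a thin adapter. Everything here is a simplified assumption: LegacyStep, SketchValue and fallbackExecuteV2 are hypothetical names, the extra argument of execute(count, values, extra) is omitted for brevity, and results are treated as synchronous.

```typescript
// Assumed wrapper shape for this sketch (see the PR text: entries of
// `values` distinguish batched from unbatched).
type SketchValue<T = unknown> =
  | { isBatch: true; entries: readonly T[] }
  | { isBatch: false; value: T };

// Hypothetical legacy interface: every dependency arrives as a full list.
interface LegacyStep {
  execute(count: number, values: ReadonlyArray<readonly unknown[]>): unknown[];
}

// Fallback adapter: multiply unary values up to `count` entries so that a
// legacy execute method keeps working unchanged.
function fallbackExecuteV2(
  step: LegacyStep,
  details: { count: number; values: readonly SketchValue[] },
): unknown[] {
  const { count, values } = details;
  const legacyValues = values.map((v) =>
    v.isBatch ? v.entries : Array.from({ length: count }, () => v.value),
  );
  return step.execute(count, legacyValues);
}

// A legacy step that adds its two dependencies pairwise.
const legacyAddStep: LegacyStep = {
  execute(count, values) {
    const lefts = values[0] as readonly number[];
    const rights = values[1] as readonly number[];
    const sums: number[] = [];
    for (let i = 0; i < count; i++) {
      sums.push((lefts[i] as number) + (rights[i] as number));
    }
    return sums;
  },
};

const sumsViaFallback = fallbackExecuteV2(legacyAddStep, {
  count: 3,
  values: [
    { isBatch: true, entries: [1, 2, 3] },
    { isBatch: false, value: 10 },
  ],
});
```

Note that the adapter allocates a fresh array of count entries for each unary input on every call; this is the garbage collector pressure that a step avoids by implementing executeV2 directly.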

TODO:

Change all execute methods to executeV2
Change all this.execute = methods to executeV2
Change all stream methods to streamV2
Either have PostGraphile leverage unaries for more optimal SQL, or file an issue about it
Either reduce the number of .eval...() calls, or file an issue about it

Performance impact

Significant! Unknown! Probably bad! Also some of the previous optimizations have been disabled because they don't work for the new pattern.

Right now I'm focussed on getting Grafast the features it needs; once everything is in place I'll go through and refactor and do performance optimization again.

Security impact

None known.

Checklist

My code matches the project's code style and yarn lint:fix passes.
I've added tests for the new feature, and yarn test passes.
I have detailed the new feature in the relevant documentation.
I have added this feature to 'Pending' in the RELEASE_NOTES.md file (if one exists).
If this is a breaking change I've explained why.

changeset-bot · 2024-02-27T10:41:56Z

🦋 Changeset detected

Latest commit: 94f2a36

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 17 packages

Name	Type
tamedevil	Patch
pg-sql2	Patch
postgraphile	Patch
@dataplan/json	Patch
@dataplan/pg	Patch
grafast	Patch
ruru-components	Patch
graphile-build-pg	Patch
graphile-utils	Patch
pgl	Patch
graphile	Patch
@localrepo/grafast-bench	Patch
@grafserv/persisted	Patch
grafserv	Patch
@localrepo/grafast-website	Patch
graphile-build	Patch
graphile-export	Patch

benjie added 20 commits February 20, 2024 15:52

Add test

72a7895

Handle errors in toStringMeta

2c063e4

Allow wrapping any step with access without path

23c8b9d

Unary dependencies almost working

9b1c0fb

Fix LayerPlan optimized

a87c024

Not sure that this is really needed, and it is expensive... But 🤷

085c4aa

Fix storing errors from unary steps

fbb3a77

Add missing size=1

b03673a

More unary stuff

fe7613a

Don't log warnings in test

4d9f83d

Improve error message

3d805f7

Fix new bucket with unary root step

b552551

Update mermaid plans

3736539

Use test mode; use pretty SQL in test mode

7b6a4e4

Prevent defer hang

b09c19d

Fix comparison of errors

fcdcc94

Fix the bug

d72a869

Various unary fixes

432ee9d

Update mermaid plans

1a0a4cb

Remove unused variables

a842b4c

benjie added 8 commits February 27, 2024 10:57

Unaries isn't a record, it's a list the same length as values

45f6099

Change executeV2 to have a single object argument

f3bcd13

New format of executeV2

46eb1aa

Convert various steps to use executeV2

002bcc3

A couple more executes

ba7b70c

Catch another scenario

f03546d

Stream to streamV2

ad2d10b

Lint fixes

6b384d7

benjie mentioned this pull request Feb 27, 2024

Have PgSelectStep leverage unary steps #1974

Open

benjie added 8 commits February 27, 2024 14:24

docs(changeset): Add te.debug helper for debugging a te expression vi…

b788dd8

…a printing a formatted version.

docs(changeset): Fix processing of GRAPHILE_ENV to allow "test"

94a0506

docs(changeset): Add 'unary steps' concept to codebase and refactor t…

a0e82b9

…o using new executeV2 execution method which leverages them. Backwards compatibility maintained, but users should move to executeV2.

Async execute -> async executeV2

caffbc2

async stream -> async streamV2

8ec9704

Smoosh unaries back into values again; add .at(i) accessor for ease

6bb7ca2

Lint

386e794

Update docs

f4bebc2

benjie marked this pull request as ready for review February 28, 2024 19:50

benjie added 12 commits February 28, 2024 19:57

Remove comments

e958af3

Introduce new indexMap function and use it for all executeV2 methods

4fb4e41

Move docs to using indexMap

503c873

Minor edits

8b797ed

Restore old behavior where data contained promises

4e3fccb

Simplify isBatch expressions down to .at()

6ce79d5

Minor optimizations

b8cfdcf

Rename command that confuses people

458bed2

Add website commands for people

5696496

Clarification and better advice

3b392a6

This error code wasn't used in the end

71ac409

Lint/efficiency

94f2a36

benjie merged commit a858c13 into main Mar 1, 2024
24 checks passed

benjie deleted the unary-steps branch March 1, 2024 09:13

benjie mentioned this pull request Mar 1, 2024

Fix bugs in unary step logic #1977

Merged

5 tasks

This was referenced Apr 2, 2024

Early exit #2013

Closed

Refactor unary logic #1995

Merged

benjie mentioned this pull request May 10, 2024

Epic: removing $step.eval() #2060

Open

16 tasks
