Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unary steps #1973

Merged
merged 48 commits into from
Mar 1, 2024
Merged

Unary steps #1973

merged 48 commits into from
Mar 1, 2024

Conversation

benjie
Copy link
Member

@benjie benjie commented Feb 27, 2024

Description

In Grafast, steps depend on other steps to form a DAG which we call the execution plan. At execution time, the steps get executed in "layers" where all steps in the same layer will have the same "size" (i.e. will process the same number of values). The root layer will always be one value (there's one "variableValues", one "rootValue", one "context", one set of input arguments, etc) but as execution goes through lists (and thus __ItemStep steps), polymorphism, and other concerns the size of a layer may change, and thus steps may process a larger number of items.

It turns out that the root bucket, the layer that always has size 1, is super critical, and we want to be able to make decisions on it. Previously we "multiply up" root values into the buckets that depend on them, for example if you have a query { allPosts { authors(first: 2) { id } } } then that number 2 for the >allPosts>authors(first:) argument would need a representation for each of the posts, so assuming there were 10 posts the steps involved with calculating >allPosts>authors would receive [2, 2, 2, 2, 2, 2, 2, 2, 2, 2] as the batched values for first. (Obviously this can get a lot larger in many circumstances.) If this step then wants to use this value hardcoded into an expression, for example select ... limit 2, it cannot trust that the value is always the same value - it must explicitly check. Fortunately PostgreSQL is smart enough that we've not had to deal with this problem, but SQLite, for example, needs assistance.

The current situation is far from ideal, and it makes it hard to support arbitrary data sources via Grafast.

This PR introduces the concept of "unary" steps - these are steps within the execution plan which are guaranteed to always have exactly one value in their batch. We currently determine these dynamically (we don't just use them for variableValues/context/rootValue/arguments, but also for derivatives of these). When it comes time to execute a step we pass it the "multiplied up" values for non-unary steps, and we pass it the single value for unary steps. This allows the step to automatically know that this 2 is safe to use for all inputs, building an SQL expression with fewer placeholders and more literals. When a step adds a dependency, it may require that the step is unary, allowing it to make assumptions during execution.

One major advantage of this approach, other than the above, is that it means the needs to "eval" values is diminished - we can instead take the unary value at runtime and derive the required actions from that.

However, what if a uniry value actually is a list. For example "friends" might be a unary value: [[Alice, Bob, Carl]] - the batch only contains one entry, but that entry is an array. We clearly can't use the type of the entries in the values argument to execute(count, values, extra) to determine whether the related step is unary or not. Instead, we change the entries in the values tuple to be objects that differentiate between batched and unbatched.

To keep this from being a breaking change, we introduce the executeV2 method which accepts this new shape, and we have a fallback executeV2 which automatically transforms (via multiplying up) the unaries to feed into the legacy execute method. Everyone should move to using executeV2 for efficiency sake (multiplying up increases burden on the garbage collector).

TODO:

  • Change all execute methods to executeV2
  • Change all this.execute = methods to executeV2
  • Change all stream methods to streamV2
  • Either have PostGraphile leverage unaries for more optimal SQL, or file an issue about it
  • Either reduce the number of .eval...() calls, or file an issue about it

Performance impact

Significant! Unknown! Probably bad! Also some of the previous optimizations have been disabled because they don't work for the new pattern.

Right now I'm focussed on getting Grafast the features it needs, once everything is in place I'll go through and refactor and do performance optimization again.

Security impact

None known.

Checklist

  • My code matches the project's code style and yarn lint:fix passes.
  • I've added tests for the new feature, and yarn test passes.
  • I have detailed the new feature in the relevant documentation.
  • I have added this feature to 'Pending' in the RELEASE_NOTES.md file (if one exists).
  • If this is a breaking change I've explained why.

Copy link

changeset-bot bot commented Feb 27, 2024

🦋 Changeset detected

Latest commit: 94f2a36

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 17 packages
Name Type
tamedevil Patch
pg-sql2 Patch
postgraphile Patch
@dataplan/json Patch
@dataplan/pg Patch
grafast Patch
ruru-components Patch
graphile-build-pg Patch
graphile-utils Patch
pgl Patch
graphile Patch
@localrepo/grafast-bench Patch
@grafserv/persisted Patch
grafserv Patch
@localrepo/grafast-website Patch
graphile-build Patch
graphile-export Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@benjie benjie marked this pull request as ready for review February 28, 2024 19:50
@benjie benjie merged commit a858c13 into main Mar 1, 2024
24 checks passed
@benjie benjie deleted the unary-steps branch March 1, 2024 09:13
@benjie benjie mentioned this pull request Mar 1, 2024
5 tasks
This was referenced Apr 2, 2024
@benjie benjie mentioned this pull request May 10, 2024
16 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

"Global dependencies"
1 participant