feat(query-engine-wasm): [Performance variant] Refactor psl::ValidatedSchema for query-engine-wasm to allow byte schema deserialization #4708

Closed
wants to merge 16 commits into from

@jkomyno jkomyno commented Feb 8, 2024

WORK IN PROGRESS

This is a variant of #4706, optimised for performance and memory safety.
This PR closes https://github.com/prisma/team-orm/issues/892.
This PR deprecates #4696 and #4603.


TODOs:


Technical Background

As of February 20, 2024, Prisma reads the schema.prisma file into a struct called psl_core::ValidatedSchema. This struct is the result of a multi-stage pipeline, as it:

  • Parses the schema.prisma textual contents via pest, a parser crate dependency that accounts for ~72 KB of the Wasm Query Engine before gzip compression is applied.
  • Builds the AST
  • Validates the AST (with each validation rule carrying lots of static strings for error reporting purposes)
  • Attaches additional useful objects to the schema information (such as the active connector)
pub struct ValidatedSchema {
    pub configuration: psl_core::Configuration,
    pub connector: &'static dyn psl_core::datamodel_connector::Connector, 
    pub db: parser_database::ParserDatabase,   
    relation_mode: psl_core::datamodel_connector::RelationMode,
}

The psl_core::ValidatedSchema struct is ubiquitous, as it’s used by every engine:

  • The formatter / validator
  • The Schema Engine (former Migration & Introspection Engines)
  • The Query Engine(s).

In reality, all Query Engines, and the Wasm Query Engine in particular, need less schema information than what psl parses, constructs, and provides. There is thus a size-optimisation opportunity.

In particular:

  • The configuration: Configuration field is overkill, as query engines only need to read the preview_features contained therein.

  • The connector: &'static dyn datamodel_connector::Connector field is also “too powerful” for the query engine, which, for a given active connector, only needs to:

    • Read its provider name
    • Know its capabilities (e.g., does the connector support autoincrement columns?)
    • Parse native types (e.g., DateTime)
    • Check support for certain referential actions (for relationMode = "prisma")
    • [New] Know how to behave when the nativeJoins preview feature is in use

    Also, most methods exposed by datamodel_connector::Connector are used for validation purposes.

pub struct ValidatedSchema {
    // query engines just need to derive `preview_features` from `configuration`
    pub configuration: Configuration,

    // query engines need way fewer features than what `Connector` offers
    pub connector: &'static dyn datamodel_connector::Connector, 

    pub db: parser_database::ParserDatabase,   
    relation_mode: datamodel_connector::RelationMode,
}
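
To make the list above concrete, here is a minimal sketch of what a reduced connector surface could look like. Every name in it (QueryEngineConnector, Capability, NativeType, ToyConnector, describe) is hypothetical and chosen for illustration only; none of them is the actual psl_core API, and the nativeJoins behaviour is left out for brevity.

```rust
/// Hypothetical capability flags (illustrative, not the psl_core list).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Capability {
    AutoIncrement,
    Json,
}

/// Hypothetical referential actions, relevant for relationMode = "prisma".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReferentialAction {
    Cascade,
    Restrict,
    SetNull,
}

/// Hypothetical native types the toy connector understands.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NativeType {
    DateTime,
    Text,
}

/// The reduced surface a query engine needs from the active connector.
/// Validation-only methods are intentionally absent.
pub trait QueryEngineConnector {
    fn provider_name(&self) -> &'static str;
    fn has_capability(&self, capability: Capability) -> bool;
    fn parse_native_type(&self, name: &str) -> Option<NativeType>;
    fn supports_referential_action(&self, action: ReferentialAction) -> bool;
}

/// A made-up connector used only to exercise the trait.
pub struct ToyConnector;

impl QueryEngineConnector for ToyConnector {
    fn provider_name(&self) -> &'static str {
        "toy"
    }

    fn has_capability(&self, capability: Capability) -> bool {
        matches!(capability, Capability::AutoIncrement)
    }

    fn parse_native_type(&self, name: &str) -> Option<NativeType> {
        match name {
            "DateTime" => Some(NativeType::DateTime),
            "Text" => Some(NativeType::Text),
            _ => None,
        }
    }

    fn supports_referential_action(&self, action: ReferentialAction) -> bool {
        matches!(action, ReferentialAction::Cascade | ReferentialAction::Restrict)
    }
}

/// Callers only see the trait object, as with `&'static dyn Connector`.
pub fn describe(connector: &dyn QueryEngineConnector) -> String {
    format!(
        "{} (autoincrement: {})",
        connector.provider_name(),
        connector.has_capability(Capability::AutoIncrement)
    )
}
```

The point of the sketch is only that a trait with a handful of read-only methods is enough for the use cases listed; anything used solely during validation would not need to be linked into the Wasm binary.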

Problems & Past Attempts

  • Generating a ValidatedSchema value from the String content of a schema.prisma file is an expensive operation, both in terms of size and CPU performance. This operation happens when constructing any new PrismaClient instance, so, on Edge Cloud platforms, it essentially translates to a PSL parsing step per HTTP request.
  • We already disabled some validations in the past (for query-engine-wasm only, see feat(query-engine-wasm): shrink gzipped Wasm size down to 1.3MB #4552), shaving off ~1 MB before gzip, which indicates that PSL validations tend to be very expensive in terms of size. Yet, many more validations are currently in place to construct fields like configuration.
  • We already tried avoiding parsing text with pest by deserializing a byte-encoded Prisma schema instead (see tmp: Try to encode schema into bincode #4603), with only slight size improvements, but we restricted our attention to the AST (which is just a portion of the db: parser_database::ParserDatabase field of psl::ValidatedSchema). This means that the PrismaClient constructor would still have to perform additional validations and object constructions in order to create a psl::ValidatedSchema value.
  • We could combine the knowledge gained from these previous attempts with some creativity to potentially improve both size and CPU performance at once.

Novel Ideas

  1. Use minimal versions of psl_core::ValidatedSchema and psl_core::datamodel_connector::Connector for the Query Engine.
  2. Extend the byte-encoding idea to the whole psl_core::ValidatedSchema, not just the AST, but don’t (de)serialize the fields that the Query Engine doesn't need.
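
As a rough illustration of the second idea, the sketch below round-trips a tiny schema-like struct through a byte buffer. It is not the encoding used by this PR (which relies on serde); it swaps in a hand-rolled, length-prefixed format so that the example needs nothing beyond the standard library. MinimalSchema and its fields are hypothetical stand-ins.

```rust
/// Hypothetical minimal schema: only data a query engine would read.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MinimalSchema {
    pub provider: String,
    pub preview_features: Vec<String>,
}

/// Append a string as a little-endian u32 length followed by UTF-8 bytes.
fn write_str(buf: &mut Vec<u8>, s: &str) {
    buf.extend_from_slice(&(s.len() as u32).to_le_bytes());
    buf.extend_from_slice(s.as_bytes());
}

/// Read a little-endian u32 at `pos`, advancing `pos` on success.
fn read_u32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let end = pos.checked_add(4)?;
    let b = bytes.get(*pos..end)?;
    *pos = end;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Read a length-prefixed UTF-8 string at `pos`, advancing `pos` on success.
fn read_str(bytes: &[u8], pos: &mut usize) -> Option<String> {
    let len = read_u32(bytes, pos)? as usize;
    let end = pos.checked_add(len)?;
    let s = std::str::from_utf8(bytes.get(*pos..end)?).ok()?.to_owned();
    *pos = end;
    Some(s)
}

impl MinimalSchema {
    /// Encode only the fields kept in this struct; anything the query
    /// engine does not need never reaches the buffer.
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut buf = Vec::new();
        write_str(&mut buf, &self.provider);
        buf.extend_from_slice(&(self.preview_features.len() as u32).to_le_bytes());
        for feature in &self.preview_features {
            write_str(&mut buf, feature);
        }
        buf
    }

    /// Decode without parsing or validating any schema text.
    /// Returns None on truncated, malformed, or trailing input.
    pub fn from_bytes(bytes: &[u8]) -> Option<MinimalSchema> {
        let mut pos = 0;
        let provider = read_str(bytes, &mut pos)?;
        let count = read_u32(bytes, &mut pos)? as usize;
        let mut preview_features = Vec::new();
        for _ in 0..count {
            preview_features.push(read_str(bytes, &mut pos)?);
        }
        if pos != bytes.len() {
            return None;
        }
        Some(MinimalSchema { provider, preview_features })
    }
}
```

Decoding here only checks that the buffer is structurally well formed. Semantic validity is assumed to have been established when the bytes were produced, which mirrors the trade-off this PR makes.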

Results of this PR

  • It splits psl_core::datamodel_connector::Connector into two traits, one of which, ValidatedConnector, only exposes the methods required by query-engine-wasm. This split accounts for a reduction of ~14.5 KB after gzip.
  • It introduces a smaller, (de)serializable alternative to psl_core::ValidatedSchema, called ValidatedSchemaForQE, that only holds what's necessary for query-engine-wasm. (De)serialization is handled via serde, as usual.
  • An additional trait, ValidSchema, is also introduced. This was necessary as query-engine-node-api and query-engine need a few more functionalities and validations than query-engine-wasm, and because every Query Engine shares a few utilities (whose function signatures I have changed from psl::ValidatedSchema to dyn psl::ValidSchema).
  • It allows query-engine-wasm to load a ValidatedSchemaForQE value directly from a byte buffer, avoiding any validation. Of course, we expect prisma generate to create a binary dump of its schema.prisma file, and ensure its validity.
  • It introduces compiler-cli, a playground for byte (de)serialization of schema.prisma files.

A few of the new names introduced by this PR can probably be improved, clarity-wise. Comments are welcome.
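
For readers unfamiliar with the pattern, the signature change from a concrete schema type to a trait object can be sketched as follows. The trait name ValidSchema comes from this PR, but its methods, the FullSchema and SlimSchema structs, and the uses_preview_feature helper are assumptions made up for this example and do not reflect the real code.

```rust
/// Shared read-only view over a schema. The method set is hypothetical.
pub trait ValidSchema {
    fn provider_name(&self) -> &str;
    fn preview_features(&self) -> &[String];
}

/// Stand-in for the full schema used by the native engines.
pub struct FullSchema {
    pub source_text: String,
    pub provider: String,
    pub preview_features: Vec<String>,
}

/// Stand-in for the reduced schema loaded by the Wasm engine.
pub struct SlimSchema {
    pub provider: String,
    pub preview_features: Vec<String>,
}

impl ValidSchema for FullSchema {
    fn provider_name(&self) -> &str {
        &self.provider
    }
    fn preview_features(&self) -> &[String] {
        &self.preview_features
    }
}

impl ValidSchema for SlimSchema {
    fn provider_name(&self) -> &str {
        &self.provider
    }
    fn preview_features(&self) -> &[String] {
        &self.preview_features
    }
}

/// A shared utility: it accepts any schema through the trait object,
/// so it no longer depends on one concrete struct.
pub fn uses_preview_feature(schema: &dyn ValidSchema, feature: &str) -> bool {
    schema
        .preview_features()
        .iter()
        .any(|f| f.as_str() == feature)
}
```

Dynamic dispatch keeps a single compiled copy of each shared utility, at the cost of a vtable lookup per call. Generics would be the alternative if monomorphised copies were acceptable size-wise.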

codspeed-hq bot commented Feb 8, 2024

CodSpeed Performance Report

Merging #4708 will not alter performance

Comparing feat/qe-specific-psl-performance (dca41aa) with main (5a9203d)

Summary

✅ 11 untouched benchmarks

github-actions bot commented Feb 8, 2024

WASM Size

| Engine | This PR | Base branch | Diff |
|---|---|---|---|
| Postgres | 1.905MiB | 2.062MiB | -160.769KiB |
| Postgres (gzip) | 755.069KiB | 814.224KiB | -59.156KiB |
| Mysql | 1.907MiB | 2.044MiB | -140.919KiB |
| Mysql (gzip) | 753.759KiB | 806.078KiB | -52.319KiB |
| Sqlite | 1.876MiB | 2.005MiB | -132.013KiB |
| Sqlite (gzip) | 744.105KiB | 792.464KiB | -48.360KiB |

@jkomyno jkomyno added this to the 5.10.0 milestone Feb 12, 2024
@jkomyno jkomyno closed this Feb 12, 2024
@jkomyno jkomyno reopened this Feb 12, 2024

2 participants