Experiment with multi-file schema handling in PSL #4243

Merged
jkomyno merged 6 commits into main from psl-multi-file-schema on Apr 8, 2024

@tomhoule commented Sep 15, 2023

This commit implements multi-file schema handling in the Prisma Schema Language.

At a high level, psl::validate_multi_file() is an alternative to psl::validate() that, instead of a single string, accepts something morally equivalent to:

{
  "./prisma/schema/a.prisma": "datasource db { ... }",
  "./prisma/schema/nested/b.prisma": "model Test { ... }"
}

There are tests for PSL validation with multiple schema files, but most of the rest of the engines still consume the single-file psl::validate(). The implementation and the return type are shared between psl::validate_multi_file() and psl::validate(), so the change is completely transparent apart from passing a list of (file_name, file_contents) pairs instead of a single string. The psl::validate() entry point should behave exactly like psl::validate_multi_file() called with a single file named schema.prisma; in particular, it has the exact same return type.
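A minimal sketch of how the single-file entry point can be a thin wrapper over the multi-file one, so both share one implementation and one return type. SourceFile, ValidatedSchema and the toy block-counting "analysis" are hypothetical simplifications, not the real psl types, and the sketch omits any other parameters the real functions may take.

```rust
/// Hypothetical stand-in for psl's SourceFile: it just owns the schema text.
#[derive(Debug, Clone, PartialEq)]
pub struct SourceFile(pub String);

/// Hypothetical, heavily simplified result type shared by both entry points.
#[derive(Debug)]
pub struct ValidatedSchema {
    /// (file name, contents) pairs, in the order they were passed in.
    pub files: Vec<(String, SourceFile)>,
    /// Toy stand-in for real analysis: top-level blocks found across all files.
    pub top_level_blocks: usize,
}

/// Multi-file entry point: takes a list of (file_name, file_contents) pairs.
pub fn validate_multi_file(files: &[(String, SourceFile)]) -> ValidatedSchema {
    // Count lines that open a model, enum, datasource or generator block.
    let keywords = ["model ", "enum ", "datasource ", "generator "];
    let top_level_blocks = files
        .iter()
        .flat_map(|(_, src)| src.0.lines())
        .filter(|line| keywords.iter().any(|kw| line.trim_start().starts_with(*kw)))
        .count();
    ValidatedSchema { files: files.to_vec(), top_level_blocks }
}

/// Single-file entry point: same return type, one file named "schema.prisma".
pub fn validate(schema: SourceFile) -> ValidatedSchema {
    validate_multi_file(&[("schema.prisma".to_owned(), schema)])
}
```

Because validate() only wraps its input, any behaviour difference between the two entry points can only come from the file name, which is what makes the single-file path safe to keep during the migration.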

Implementation

This is achieved by extending Span to contain a FileId in addition to its start and end offsets. The FileId uniquely identifies a file and its parsed SchemaAst inside ParserDatabase. The identifier types for AST items in ParserDatabase are also extended with the FileId, so that items can be referred to unambiguously across the whole (multi-file) schema. After the analysis phase (the parser_database crate), consumers of the analyzed schema become multi-file aware transparently: no change is necessary in the other engines.
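To illustrate why the FileId has to travel with both spans and item identifiers, here is a small sketch. The names FileId, Span, AstModelId and ModelId, and the tuple representation, are assumptions chosen for illustration; the real parser_database types may be shaped differently.

```rust
/// Hypothetical: index of a file (and its parsed AST) inside the parser database.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct FileId(pub u32);

/// A byte range inside one specific file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
    pub file_id: FileId,
}

impl Span {
    pub fn new(start: usize, end: usize, file_id: FileId) -> Self {
        Span { start, end, file_id }
    }

    /// Offsets alone are ambiguous across files, so containment also checks the file.
    pub fn contains(&self, file_id: FileId, offset: usize) -> bool {
        self.file_id == file_id && self.start <= offset && offset < self.end
    }
}

/// Hypothetical per-file AST index of a model (its position among that file's top-level items).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct AstModelId(pub u32);

/// Hypothetical schema-wide model identifier: the per-file AST id is no longer
/// unique once there are several files, so it is paired with the FileId.
pub type ModelId = (FileId, AstModelId);
```

The first model of a.prisma and the first model of b.prisma both have AstModelId(0); only the FileId component tells them apart.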

The only changes required at scattered points across the codebase are at the psl::validate() call sites, which will need to pass a Vec<(Box<Path>, SourceFile)> instead of a single SourceFile. This PR does not deal with that, but it makes these call sites easy to find by the entry points they use: psl::validate(), psl::parse_schema() and the various *_assert_single() methods on ParserDatabase.
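A sketch of the input shape the migrated call sites would build, plus a hypothetical *_assert_single()-style guard for code that still assumes one file. The helpers single_file and assert_single, and the simplified SourceFile, are illustrative assumptions, not the actual ParserDatabase methods.

```rust
use std::path::Path;

/// Hypothetical stand-in for psl's SourceFile.
#[derive(Debug, Clone, PartialEq)]
pub struct SourceFile(pub String);

/// The shape a multi-file call site passes in: (path, contents) pairs.
pub type SchemaFiles = Vec<(Box<Path>, SourceFile)>;

/// Hypothetical adapter for a call site that still holds one schema string.
pub fn single_file(path: &str, contents: &str) -> SchemaFiles {
    vec![(Path::new(path).into(), SourceFile(contents.to_owned()))]
}

/// Hypothetical `*_assert_single()`-style accessor: code that is not yet
/// multi-file aware keeps assuming exactly one file and fails loudly otherwise.
pub fn assert_single(files: &SchemaFiles) -> &SourceFile {
    assert_eq!(files.len(), 1, "expected exactly one schema file, got {}", files.len());
    &files[0].1
}
```

Panicking rather than silently picking the first file keeps the remaining single-file assumptions visible once real multi-file input starts reaching them.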

The PR contains tests confirming that schema analysis, validation and diagnostic display work as expected across multiple files.
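Displaying a diagnostic across files comes down to resolving a span's FileId back to the file's name and contents before turning the byte offset into a line and column. A minimal sketch, assuming hypothetical Span, FileId and Diagnostic types and a made-up output format (the engines have their own diagnostic pretty-printer):

```rust
/// Hypothetical: FileId indexes into the list of (name, contents) pairs.
#[derive(Debug, Clone, Copy)]
pub struct FileId(pub usize);

#[derive(Debug, Clone, Copy)]
pub struct Span {
    pub start: usize,
    pub end: usize,
    pub file_id: FileId,
}

pub struct Diagnostic {
    pub message: String,
    pub span: Span,
}

/// Render "error: <message>\n  --> <file>:<line>:<column>" with 1-based line/column.
/// Assumes span.start lies on a char boundary; slicing panics otherwise.
pub fn render(files: &[(String, String)], diag: &Diagnostic) -> String {
    let (name, contents) = &files[diag.span.file_id.0];
    let before = &contents[..diag.span.start];
    let line = before.matches('\n').count() + 1;
    let line_start = before.rfind('\n').map_or(0, |nl| nl + 1);
    let column = before[line_start..].chars().count() + 1;
    format!("error: {}\n  --> {}:{}:{}", diag.message, name, line, column)
}
```

Without the FileId, the same offset would point into every file at once, so the renderer could not tell which file to quote.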

Status of this PR

This is going to be directly mergeable after review, and it will not affect the current schema handling behaviour when dealing with a single schema file.

Next steps

  • Replace all calls to psl::validate() with calls to psl::validate_multi_file().
  • The *_assert_single() calls should be progressively replaced with their multi-file counterparts across engines.
  • The language server should start sending multiple files to prisma-schema-wasm in all calls. This is not in the spirit of the language server spec, but it is the most immediate solution. range_to_span() in prisma-fmt will also need to become multi-file aware by taking a FileId parameter.
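The last step could look roughly like the sketch below: the caller tells range_to_span() which file an editor range belongs to, and the resulting Span carries that FileId. Position and Range are minimal stand-ins for the LSP types (zero-based line, and character counted in UTF-16 code units, the LSP default encoding); the signature is an assumption, not the actual prisma-fmt function.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FileId(pub usize);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
    pub file_id: FileId,
}

/// LSP-style position: zero-based line, zero-based UTF-16 code unit offset.
#[derive(Debug, Clone, Copy)]
pub struct Position {
    pub line: u32,
    pub character: u32,
}

#[derive(Debug, Clone, Copy)]
pub struct Range {
    pub start: Position,
    pub end: Position,
}

/// Convert an LSP position to a byte offset into `text`; None if the line does not exist.
fn position_to_offset(text: &str, pos: Position) -> Option<usize> {
    // Find the byte offset where the requested line starts.
    let mut line_start = 0;
    for _ in 0..pos.line {
        line_start += text[line_start..].find('\n')? + 1;
    }
    let rest = &text[line_start..];
    let line = &rest[..rest.find('\n').unwrap_or(rest.len())];
    // Walk the line, counting UTF-16 code units until `character` is reached.
    let mut utf16 = 0u32;
    for (byte_idx, ch) in line.char_indices() {
        if utf16 >= pos.character {
            return Some(line_start + byte_idx);
        }
        utf16 += ch.len_utf16() as u32;
    }
    // Past the last character: clamp to the end of the line.
    Some(line_start + line.len())
}

/// Hypothetical multi-file-aware range_to_span: the caller says which file the range is in.
pub fn range_to_span(range: Range, text: &str, file_id: FileId) -> Option<Span> {
    Some(Span {
        start: position_to_offset(text, range.start)?,
        end: position_to_offset(text, range.end)?,
        file_id,
    })
}
```

Taking the FileId as an explicit parameter keeps the offset arithmetic unchanged; only the caller, which knows which document the request came from, has to supply the extra piece of context.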

Links

Relevant issue: prisma/prisma#2377

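Also see the internal design doc: https://www.notion.so/prismaio/Multi-file-Schema-24d68fe8664048ad86252fe446caac24?d=68ef128f25974e619671a9855f65f44d#2889a038e68c4fe1ac9afe3cd34978bd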

Close prisma/team-orm#1034

@tomhoule requested a review from a team as a code owner on September 15, 2023 11:18
@tomhoule added this to the 5.4.0 milestone on Sep 15, 2023

codspeed-hq bot commented Sep 15, 2023

CodSpeed Performance Report

Merging #4243 will degrade performance by 5.45%

Comparing psl-multi-file-schema (3fee3aa) with main (3d92748)

Summary

❌ 3 regressions
✅ 8 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark        main      psl-multi-file-schema   Change
large_read       7.7 ms    8.1 ms                  -5.03%
build (medium)   12.2 ms   12.8 ms                 -5.01%
build (small)    1.4 ms    1.5 ms                  -5.45%

@janpio changed the title from "Implement multi-file schema handling in PSL" to "Experiment with multi-file schema handling in PSL" on Sep 15, 2023
@tomhoule requested a review from a team as a code owner on March 18, 2024 16:16
@tomhoule requested review from jkomyno and removed the request for a team on March 18, 2024 16:16

github-actions bot commented Mar 18, 2024

WASM Query Engine file size

Engine            This PR      Base branch   Diff
Postgres          2.124MiB     2.081MiB      43.961KiB
Postgres (gzip)   836.162KiB   821.679KiB    14.483KiB
Mysql             2.092MiB     2.051MiB      42.054KiB
Mysql (gzip)      823.329KiB   808.751KiB    14.578KiB
Sqlite            1.987MiB     1.946MiB      42.143KiB
Sqlite (gzip)     784.102KiB   769.762KiB    14.340KiB

github-actions bot commented Mar 18, 2024

❌ WASM query-engine performance will worsen by 1.53%

Full benchmark report
DATABASE_URL="postgresql://postgres:postgres@localhost:5432/bench?schema=imdb_bench&sslmode=disable" \
node --experimental-wasm-modules query-engine/driver-adapters/executor/dist/bench.mjs
cpu: AMD EPYC 7763 64-Core Processor
runtime: node v18.20.0 (x64-linux)

benchmark                   time (avg)             (min … max)       p75       p99      p999
-------------------------------------------------------------- -----------------------------
• movies.findMany() (all - ~50K)
-------------------------------------------------------------- -----------------------------
Web Assembly: Baseline     364 ms/iter       (360 ms … 372 ms)    368 ms    372 ms    372 ms
Web Assembly: Latest       454 ms/iter       (451 ms … 459 ms)    458 ms    459 ms    459 ms
Web Assembly: Current      453 ms/iter       (450 ms … 458 ms)    457 ms    458 ms    458 ms
Node API: Current          194 ms/iter       (190 ms … 198 ms)    197 ms    198 ms    198 ms

summary for movies.findMany() (all - ~50K)
  Web Assembly: Current
   2.34x slower than Node API: Current
   1.24x slower than Web Assembly: Baseline
   1x faster than Web Assembly: Latest

• movies.findMany({ take: 2000 })
-------------------------------------------------------------- -----------------------------
Web Assembly: Baseline  14'661 µs/iter (14'479 µs … 15'266 µs) 14'698 µs 15'266 µs 15'266 µs
Web Assembly: Latest    18'206 µs/iter (18'062 µs … 18'565 µs) 18'245 µs 18'565 µs 18'565 µs
Web Assembly: Current   18'374 µs/iter (18'048 µs … 20'018 µs) 18'368 µs 20'018 µs 20'018 µs
Node API: Current        7'900 µs/iter   (7'680 µs … 8'279 µs)  7'952 µs  8'279 µs  8'279 µs

summary for movies.findMany({ take: 2000 })
  Web Assembly: Current
   2.33x slower than Node API: Current
   1.25x slower than Web Assembly: Baseline
   1.01x slower than Web Assembly: Latest

• movies.findMany({ where: {...}, take: 2000 })
-------------------------------------------------------------- -----------------------------
Web Assembly: Baseline   2'309 µs/iter   (2'166 µs … 3'920 µs)  2'319 µs  3'442 µs  3'920 µs
Web Assembly: Latest     2'865 µs/iter   (2'753 µs … 4'961 µs)  2'842 µs  3'418 µs  4'961 µs
Web Assembly: Current    2'862 µs/iter   (2'748 µs … 4'709 µs)  2'835 µs  3'643 µs  4'709 µs
Node API: Current        1'430 µs/iter   (1'322 µs … 2'279 µs)  1'413 µs  1'954 µs  2'279 µs

summary for movies.findMany({ where: {...}, take: 2000 })
  Web Assembly: Current
   2x slower than Node API: Current
   1.24x slower than Web Assembly: Baseline
   1x faster than Web Assembly: Latest

• movies.findMany({ include: { cast: true } take: 2000 }) (m2m)
-------------------------------------------------------------- -----------------------------
Web Assembly: Baseline     566 ms/iter       (559 ms … 591 ms)    564 ms    591 ms    591 ms
Web Assembly: Latest       753 ms/iter       (748 ms … 765 ms)    755 ms    765 ms    765 ms
Web Assembly: Current      779 ms/iter       (772 ms … 795 ms)    795 ms    795 ms    795 ms
Node API: Current          473 ms/iter       (450 ms … 509 ms)    501 ms    509 ms    509 ms

summary for movies.findMany({ include: { cast: true } take: 2000 }) (m2m)
  Web Assembly: Current
   1.65x slower than Node API: Current
   1.38x slower than Web Assembly: Baseline
   1.03x slower than Web Assembly: Latest

• movies.findMany({ where: {...}, include: { cast: true } take: 2000 }) (m2m)
-------------------------------------------------------------- -----------------------------
Web Assembly: Baseline  78'075 µs/iter (77'705 µs … 78'501 µs) 78'415 µs 78'501 µs 78'501 µs
Web Assembly: Latest       106 ms/iter       (105 ms … 106 ms)    106 ms    106 ms    106 ms
Web Assembly: Current      110 ms/iter       (109 ms … 111 ms)    111 ms    111 ms    111 ms
Node API: Current       61'632 µs/iter (61'121 µs … 62'071 µs) 61'743 µs 62'071 µs 62'071 µs

summary for movies.findMany({ where: {...}, include: { cast: true } take: 2000 }) (m2m)
  Web Assembly: Current
   1.78x slower than Node API: Current
   1.4x slower than Web Assembly: Baseline
   1.04x slower than Web Assembly: Latest

• movies.findMany({ take: 2000, include: { cast: { include: { person: true } } } })
-------------------------------------------------------------- -----------------------------
Web Assembly: Baseline   1'018 ms/iter   (1'005 ms … 1'039 ms)  1'026 ms  1'039 ms  1'039 ms
Web Assembly: Latest     1'255 ms/iter   (1'244 ms … 1'276 ms)  1'270 ms  1'276 ms  1'276 ms
Web Assembly: Current    1'296 ms/iter   (1'287 ms … 1'314 ms)  1'302 ms  1'314 ms  1'314 ms
Node API: Current          860 ms/iter       (834 ms … 891 ms)    886 ms    891 ms    891 ms

summary for movies.findMany({ take: 2000, include: { cast: { include: { person: true } } } })
  Web Assembly: Current
   1.51x slower than Node API: Current
   1.27x slower than Web Assembly: Baseline
   1.03x slower than Web Assembly: Latest

• movie.findMany({ where: { ... }, take: 2000, include: { cast: { include: { person: true } } } })
-------------------------------------------------------------- -----------------------------
Web Assembly: Baseline     143 ms/iter       (142 ms … 144 ms)    144 ms    144 ms    144 ms
Web Assembly: Latest       174 ms/iter       (173 ms … 175 ms)    175 ms    175 ms    175 ms
Web Assembly: Current      180 ms/iter       (179 ms … 181 ms)    181 ms    181 ms    181 ms
Node API: Current          107 ms/iter       (105 ms … 109 ms)    109 ms    109 ms    109 ms

summary for movie.findMany({ where: { ... }, take: 2000, include: { cast: { include: { person: true } } } })
  Web Assembly: Current
   1.69x slower than Node API: Current
   1.26x slower than Web Assembly: Baseline
   1.04x slower than Web Assembly: Latest

• movie.findMany({ where: { reviews: { author: { ... } }, take: 100 }) (to-many -> to-one)
-------------------------------------------------------------- -----------------------------
Web Assembly: Baseline   1'055 µs/iter     (982 µs … 1'836 µs)  1'045 µs  1'675 µs  1'836 µs
Web Assembly: Latest     1'349 µs/iter   (1'291 µs … 1'844 µs)  1'357 µs  1'653 µs  1'844 µs
Web Assembly: Current    1'361 µs/iter   (1'298 µs … 1'884 µs)  1'362 µs  1'741 µs  1'884 µs
Node API: Current          793 µs/iter     (700 µs … 1'345 µs)    802 µs  1'167 µs  1'345 µs

summary for movie.findMany({ where: { reviews: { author: { ... } }, take: 100 }) (to-many -> to-one)
  Web Assembly: Current
   1.72x slower than Node API: Current
   1.29x slower than Web Assembly: Baseline
   1.01x slower than Web Assembly: Latest

• movie.findMany({ where: { cast: { person: { ... } }, take: 100 }) (m2m -> to-one)
-------------------------------------------------------------- -----------------------------
Web Assembly: Baseline   1'031 µs/iter     (981 µs … 1'704 µs)  1'028 µs  1'455 µs  1'704 µs
Web Assembly: Latest     1'366 µs/iter   (1'318 µs … 2'111 µs)  1'369 µs  1'708 µs  2'111 µs
Web Assembly: Current    1'335 µs/iter   (1'298 µs … 1'587 µs)  1'346 µs  1'522 µs  1'587 µs
Node API: Current          790 µs/iter     (724 µs … 1'171 µs)    809 µs    853 µs  1'171 µs

summary for movie.findMany({ where: { cast: { person: { ... } }, take: 100 }) (m2m -> to-one)
  Web Assembly: Current
   1.69x slower than Node API: Current
   1.3x slower than Web Assembly: Baseline
   1.02x faster than Web Assembly: Latest

After changes in 3fee3aa

@SevInf self-assigned this on Mar 20, 2024
@jkomyno left a comment

Let's merge?

A comment by @SevInf was marked as duplicate.

@SevInf left a comment

Yes, let's

@jkomyno merged commit dcdb692 into main on Apr 8, 2024
205 of 207 checks passed
@jkomyno deleted the psl-multi-file-schema branch on April 8, 2024 08:42
Labels: none. Projects: none. Linked issues: none. 3 participants.