Adds integrity check of formatter as a CI/CD step #416

tomasnyberg · 2025-03-19T15:39:27Z

Description

As discussed in three different syncs (XD) and in various slack threads:

We would like to add a CI/CD step that automatically runs the "verify formatting" check that we have so far been running manually on local files. The check runs the following integrity checks on a large number of sample queries (real queries from Aura that we got from the product-analytics people):

- The AST should not change as a result of formatting
- Formatting should be idempotent
- Formatting should not change the number of non-whitespace characters in a query
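
A minimal TypeScript sketch of how these three checks could be expressed for a single query. The `Formatter` and `AstSerializer` types, `checkIntegrity`, and the toy stand-ins are hypothetical names for illustration; they are not the repository's formatter, parser, or the actual contents of verificationCheck.ts.

```typescript
// Hypothetical signatures: the real formatter and parser live in the repository.
type Formatter = (query: string) => string;
type AstSerializer = (query: string) => string;

interface IntegrityResult {
  astUnchanged: boolean;
  idempotent: boolean;
  sameNonWhitespaceCount: boolean;
}

// Counts every character that is not whitespace.
function countNonWhitespace(text: string): number {
  return text.replace(/\s/g, '').length;
}

// Runs the three integrity checks on one query.
function checkIntegrity(
  query: string,
  format: Formatter,
  serializeAst: AstSerializer,
): IntegrityResult {
  const formatted = format(query);
  return {
    // 1. The AST should not change as a result of formatting.
    astUnchanged: serializeAst(query) === serializeAst(formatted),
    // 2. Formatting an already formatted query should be a no-op.
    idempotent: format(formatted) === formatted,
    // 3. Only whitespace may be added or removed.
    sameNonWhitespaceCount:
      countNonWhitespace(query) === countNonWhitespace(formatted),
  };
}

// Toy stand-ins (assumptions): collapse whitespace, and treat the token list as the "AST".
const toyFormat: Formatter = (q) => q.trim().split(/\s+/).join(' ');
const toySerializeAst: AstSerializer = (q) =>
  JSON.stringify(q.trim().split(/\s+/));

const result = checkIntegrity('MATCH   (n)\n RETURN n', toyFormat, toySerializeAst);
console.log(result);
```

Passing the formatter and the AST serializer in as parameters keeps the sketch independent of any particular parser; a real check would plug in the project's own functions and throw on the first failing query.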

This integrity check should happen on any change to the src/formatting folder (see the new file formatting-integrity-check.yaml)

Why has this not been introduced earlier?

- The sample queries were not sufficiently anonymized; they still contained node and relationship names
- The sample queries contained lots of duplicates, meaning any attempt would waste a lot of compute looking at queries that are identical in terms of their AST (and thus not very interesting to verify)
- Storing a large blob of test data directly in git seemed unreasonable, and we did not want to add complexity to the project by e.g. having something on S3

Why is it being introduced now; how have we solved / changed our opinions on these issues?

- By running an "anonymizing visitor" on the queries, I was able to remove any custom node/relationship names
- Through the same visitor, I could replace all of them with aaaaa and deduplicate queries by matching strings
- We came to the conclusion that storing a reasonably sized (< 10 MB) file should not pose any significant problems with things such as cost.
  - S3 would of course be cheaper, but for < 10 MB the costs are negligible anyway; 1 GB on Git LFS costs $0.07 per month, with an additional $0.0875 per GiB transfer charge. If we assume that those 10 MB are transferred 1000 times each month (I'm assuming GitHub charges for the data it needs to transfer for its action runners?), they will cost us an additional 0.01 GB × 1000 × $0.0875/GB = $0.875 per month.
    - Note that this PR does not use Git LFS; I just couldn't find a reference for what GitHub charges for storing regular files, so I used the Git LFS numbers for reference.
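
The cost estimate above can be reproduced with a few lines of TypeScript. All figures are the ones quoted in the description (Git LFS prices used as a stand-in, and an assumed 1000 transfers per month); the variable names are illustrative, and GB and GiB are treated as interchangeable, as in the original estimate.

```typescript
// Figures quoted in the PR description; Git LFS pricing is only a reference point.
const fileSizeGb = 0.01; // 10 MB expressed in GB
const transfersPerMonth = 1000; // assumed number of transfers per month
const transferPricePerGb = 0.0875; // USD per GB transferred
const storagePricePerGbMonth = 0.07; // USD per GB stored per month

// Monthly transfer cost: size × number of transfers × price per GB.
const transferCost = fileSizeGb * transfersPerMonth * transferPricePerGb;
// Monthly storage cost for the same file.
const storageCost = fileSizeGb * storagePricePerGbMonth;

console.log(`transfer: $${transferCost.toFixed(3)} per month`);
console.log(`storage: $${storageCost.toFixed(4)} per month`);
```

Under these assumptions the transfer charge dominates, and both amounts stay below one dollar per month.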

Testing

- I introduced a meaningless change in the formatting visitor to break the tests, and it failed with a nice output: https://github.com/neo4j/cypher-language-support/actions/runs/13951159540/job/39050765538
  - see the commit "introduce dumb change to check that it fails"
- I introduced a meaningless change that does not break the formatter, and it passed nicely: https://github.com/neo4j/cypher-language-support/actions/runs/13951484722/job/39051899077
  - see the commit "add meaningless change to verify happy path"
- I removed both of those changes, and the verification run was no longer triggered

changeset-bot · 2025-03-19T15:39:31Z

⚠️ No Changeset found

Latest commit: 4eba9da

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

tomasnyberg · 2025-03-20T08:09:49Z

package.json

    "watch": "turbowatch ./turbowatch.ts",
    "test": "turbo run test",
    "test:e2e": "turbo run test:e2e",
+    "test:formattingIntegrity": "ts-node ./packages/language-support/src/tests/formatting/verification/verificationCheck.ts",


I'm not sure if there's a more idiomatic way to run a file as a script? Or maybe it shouldn't be run as a script at all, maybe there's a better way? Happy to take any suggestions here.

In other cases, I've made the script build js and then execute it. We've also needed to use cross-env for some cases (see the gen-parser script, although being on mac I don't really know why it's needed 😅 )

For the record I think it's fine to leave as is here

cross-env is only needed for setting environment variables (otherwise we cannot set them for builds on Windows)

If we move to Node 22, we can also run it without ts-node by using --experimental-strip-types

tomasnyberg · 2025-03-20T08:10:34Z

packages/language-support/src/tests/formatting/verification/verificationCheck.ts

+---------   QUERY BEFORE START  ------------
+${query}
+---------   QUERY BEFORE END    ----------
+
+---------   QUERY FORMATTED START  ------------
+${formatted}
+---------   QUERY FORMATTED END    ----------


This leaves some pretty nice output which should make it easy to debug, see this one for instance:

https://github.com/neo4j/cypher-language-support/actions/runs/13951159540/job/39050765538
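
A rough sketch of how a report in this shape could be assembled. `buildFailureReport` is a hypothetical helper name; only the delimiter lines are taken from the diff above, and the surrounding error handling in verificationCheck.ts is not shown here.

```typescript
// Hypothetical helper: wraps the original and the formatted query in the
// delimiter lines shown in the diff, so both are easy to find in CI logs.
function buildFailureReport(query: string, formatted: string): string {
  return [
    '---------   QUERY BEFORE START  ------------',
    query,
    '---------   QUERY BEFORE END    ----------',
    '',
    '---------   QUERY FORMATTED START  ------------',
    formatted,
    '---------   QUERY FORMATTED END    ----------',
  ].join('\n');
}

// Example usage with a made-up query.
const report = buildFailureReport('MATCH (n)   RETURN n', 'MATCH (n)\nRETURN n');
console.log(report);
```

Printing both versions between fixed markers makes it possible to copy the offending query straight out of the job log and reproduce the failure locally.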

OskarDamkjaer

I think it looks good (with a few small tweaks). Let's see what @ncordon thinks 😄

OskarDamkjaer · 2025-03-20T13:41:02Z

.github/workflows/ciformatting.yaml

+    branches:
+      - main
+    paths:
+      - 'packages/language-support/src/formatting/**'
+
+  pull_request:
+    branches:
+      - main
+    paths:
+      - 'packages/language-support/src/formatting/**'


Do we need it to run on both pushes and pull requests? I think only pull requests would be good enough. Perhaps we'd also want it as part of the release checks?

Any particular reason why you think it shouldn't run on both PRs and commits? My reasoning was that it might help catch cases where a check on a PR is outdated, and I don't think it's too expensive. It's also faster than the e2e tests.

I just think it's a bit wasteful, but feel free to leave it in

@ncordon What's your opinion on this?

OskarDamkjaer · 2025-03-20T13:44:07Z

package.json

    "watch": "turbowatch ./turbowatch.ts",
    "test": "turbo run test",
    "test:e2e": "turbo run test:e2e",
+    "test:formattingIntegrity": "ts-node ./packages/language-support/src/tests/formatting/verification/verificationCheck.ts",


In other cases, I've made the script build js and then execute it. We've also needed to use cross-env for some cases (see the gen-parser script, although being on mac I don't really know why it's needed 😅 )

OskarDamkjaer · 2025-03-20T13:44:18Z

package.json

    "watch": "turbowatch ./turbowatch.ts",
    "test": "turbo run test",
    "test:e2e": "turbo run test:e2e",
+    "test:formattingIntegrity": "ts-node ./packages/language-support/src/tests/formatting/verification/verificationCheck.ts",


For the record I think it's fine to leave as is here

ncordon

It looks good to me, but just so you know, the file is not stored with LFS (which it could be). I don't know if that was the intention, given the description of the PR?

tomasnyberg · 2025-03-25T10:58:53Z

It looks good to me, but just so you know, the file is not stored with LFS (which it could be). I don't know if that was the intention, given the description of the PR?

Yeah I couldn't find what the exact pricing was for regular objects in a git repo so I used the LFS numbers as a reference. The assumption I'm making is that the price is comparable when also factoring in the cost of gh actions. The magnitude of the numbers is the most important thing I think

tomasnyberg added 5 commits March 18, 2025 13:52

add sample queries

bc6d2a9

move sample_queries.json to tests

2693ab9

move the sample queries.json file (again...)

2388b8f

add file that runs all queries and throws on errorgs

899c55e

add the github action, seems to be working (not without the java thin…

20bbad4

…g though

tomasnyberg added 12 commits March 19, 2025 16:43

clean up ci.yaml (remove comment, blank line)

17edd03

better name for the verification check gsjob itself

0d9d61e

format verificationCheck.ts

1c668c0

refactor verificaitonCheck a bit

0f8f735

correct name for job

9d8f93e

introduce dumb change to check that it fails

58f2aa6

try to get check to run only on certain changes

ebe9508

remove paths to check if that's the issue

6ab7ec9

correct path

5c95913

remove commented out visit

62ed742

remove commented out part

9fd7150

add meaningless change to verify happy path

15eada0

tomasnyberg changed the title ~~[Draft for gh actions, do not merge yet] verification check for formatter~~ [Draft for gh actions, do not merge yet] Add integrity check of formatter as a CI/CD step Mar 19, 2025

tomasnyberg changed the title ~~[Draft for gh actions, do not merge yet] Add integrity check of formatter as a CI/CD step~~ [Draft for gh actions, do not merge yet] Adds integrity check of formatter as a CI/CD step Mar 19, 2025

tomasnyberg added 2 commits March 19, 2025 17:39

remove menaingless change

9054ab9

add anonymized comments to the sample queries

68d226e

tomasnyberg changed the title ~~[Draft for gh actions, do not merge yet] Adds integrity check of formatter as a CI/CD step~~ Adds integrity check of formatter as a CI/CD step Mar 20, 2025

tomasnyberg commented Mar 20, 2025

OskarDamkjaer approved these changes Mar 20, 2025

rename ciformatting.yaml -> formatting-integrity-check.yaml

4eba9da

ncordon self-assigned this Mar 20, 2025

ncordon approved these changes Mar 24, 2025

tomasnyberg merged commit e02a1c1 into neo4j:main Mar 25, 2025
3 checks passed
