Contributor

@tomasnyberg tomasnyberg commented Mar 19, 2025

Description

As discussed in three different syncs (XD) and in various slack threads:

We would like to add a CI/CD step that automatically runs the "verify formatting" check that we have so far been running manually against local files. The check runs the following integrity checks on a large number of sample queries (real queries from Aura that we got from the product-analytics people):

  1. The AST should not change as a result of formatting
  2. Formatting should be idempotent
  3. Formatting should not change the number of non-whitespace characters in a query
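
A minimal sketch of how the three checks above could be expressed for a single query. The `Formatter` and `AstPrinter` types and the `checkIntegrity` function are hypothetical names used for illustration; the actual formatter, parser, and verification script in the repository are not shown in this conversation and may look quite different. The toy formatter and tokenizer at the end only exist so that the sketch runs on its own.

```typescript
// Hypothetical signatures, assumed for this sketch only.
type Formatter = (query: string) => string;
// Returns some canonical string form of the query's AST.
type AstPrinter = (query: string) => string;

// Count the characters in `text` that are not whitespace.
function countNonWhitespace(text: string): number {
  return text.replace(/\s/g, '').length;
}

// Returns the list of violated checks; an empty list means the query passed.
function checkIntegrity(
  query: string,
  format: Formatter,
  printAst: AstPrinter,
): string[] {
  const failures: string[] = [];
  const formatted = format(query);

  // 1. The AST should not change as a result of formatting.
  if (printAst(query) !== printAst(formatted)) {
    failures.push('AST changed');
  }
  // 2. Formatting should be idempotent.
  if (format(formatted) !== formatted) {
    failures.push('not idempotent');
  }
  // 3. The number of non-whitespace characters should not change.
  if (countNonWhitespace(query) !== countNonWhitespace(formatted)) {
    failures.push('non-whitespace character count changed');
  }
  return failures;
}

// Toy stand-ins (not the real formatter or parser): collapse whitespace,
// and treat the whitespace-separated token sequence as the "AST".
const collapseWhitespace: Formatter = (q) => q.trim().split(/\s+/).join(' ');
const printTokens: AstPrinter = (q) => q.trim().split(/\s+/).join('|');

const exampleFailures = checkIntegrity(
  'MATCH  (n)\nRETURN n',
  collapseWhitespace,
  printTokens,
);
console.log(exampleFailures); // []
```

Returning a list of failures instead of throwing on the first one makes it possible to report every violated check for a query in a single run.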

This integrity check should happen on any change to the src/formatting folder (see the new file formatting-integrity-check.yaml)

Why has this not been introduced earlier?

  1. The sample queries were not sufficiently anonymized; they still contained node and relationship names
  2. The sample queries contained lots of duplicates, meaning any attempt would waste a lot of compute looking at queries that are identical in terms of their AST (and thus not very interesting to verify)
  3. Storing a large blob of test data directly in git seemed unreasonable, and we did not want to add complexity to the project by e.g. having something on s3.

Why is it being introduced now; how have we solved / changed our opinions on these issues?

  1. By running an "anonymizing visitor" on the queries I was able to remove any custom node/relationship names
  2. Through the same visitor as above I could replace all of them with aaaaa and deduplicate queries by matching strings
  3. We came to the conclusion that storing a reasonably sized (< 10 MB) file should not pose any significant problems with things such as cost.
    • S3 would of course be cheaper, but for < 10 MB the costs are negligible anyway: 1 GB on Git LFS costs $0.07 per month, with an additional $0.0875 per GiB transfer charge. If we assume that those 10 MB are transferred 1000 times each month (I'm assuming GitHub charges for the data it needs to transfer to its action runners?), they will cost us an additional 0.01 GB × 1000 × $0.0875/GB = $0.875 per month.
      • Note that this PR does not use Git LFS; I just couldn't find a reference for what GitHub charges for storing regular files, so I used the Git LFS numbers for reference.
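
Points 1 and 2 above (anonymizing, then deduplicating by string match) can be sketched roughly as follows. This is a simplified stand-in, not the visitor used in this PR: it rewrites anything that looks like a `:Label` or `:REL_TYPE` token with a regex instead of walking the parse tree, so it would also touch things like map values or colons inside string literals. The function names `anonymizeLabels` and `dedupeQueries` are made up for illustration.

```typescript
// Crude stand-in for the anonymizing visitor (assumption: a regex is good
// enough for a sketch; the real implementation works on the parse tree).
function anonymizeLabels(query: string): string {
  return query.replace(/:\s*[A-Za-z_][A-Za-z0-9_]*/g, ':aaaaa');
}

// Deduplicate by exact string match after anonymization,
// keeping the first occurrence of each query.
function dedupeQueries(queries: string[]): string[] {
  const seen = new Set<string>();
  const unique: string[] = [];
  for (const query of queries) {
    const anonymized = anonymizeLabels(query);
    if (!seen.has(anonymized)) {
      seen.add(anonymized);
      unique.push(anonymized);
    }
  }
  return unique;
}

const sampleQueries = [
  'MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a',
  'MATCH (a:Movie)-[:RATED]->(b:User) RETURN a',
  'MATCH (a:Person) RETURN a',
];
// The first two queries become identical once labels are anonymized,
// so only two distinct queries remain.
const uniqueQueries = dedupeQueries(sampleQueries);
console.log(uniqueQueries.length); // 2
```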

Testing

changeset-bot bot commented Mar 19, 2025

⚠️ No Changeset found

Latest commit: 4eba9da

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@tomasnyberg tomasnyberg changed the title [Draft for gh actions, do not merge yet] verification check for formatter [Draft for gh actions, do not merge yet] Add integrity check of formatter as a CI/CD step Mar 19, 2025
@tomasnyberg tomasnyberg changed the title [Draft for gh actions, do not merge yet] Add integrity check of formatter as a CI/CD step [Draft for gh actions, do not merge yet] Adds integrity check of formatter as a CI/CD step Mar 19, 2025
@tomasnyberg tomasnyberg changed the title [Draft for gh actions, do not merge yet] Adds integrity check of formatter as a CI/CD step Adds integrity check of formatter as a CI/CD step Mar 20, 2025
"watch": "turbowatch ./turbowatch.ts",
"test": "turbo run test",
"test:e2e": "turbo run test:e2e",
"test:formattingIntegrity": "ts-node ./packages/language-support/src/tests/formatting/verification/verificationCheck.ts",
Contributor Author

I'm not sure if there's a more idiomatic way to run a file as a script? Or maybe it shouldn't be run as a script at all, maybe there's a better way? Happy to take any suggestions here.

Collaborator

In other cases, I've made the script build js and then execute it. We've also needed to use cross-env for some cases (see the gen-parser script, although being on mac I don't really know why it's needed 😅 )

Collaborator

For the record I think it's fine to leave as is here

Contributor

cross-env is only to use environment variables (we cannot set them for builds on Windows otherwise)

Collaborator

If we move to Node 22, we can also run it without ts-node by using --experimental-strip-types

Comment on lines +8 to +14
--------- QUERY BEFORE START ------------
${query}
--------- QUERY BEFORE END ----------
--------- QUERY FORMATTED START ------------
${formatted}
--------- QUERY FORMATTED END ----------
Contributor Author

This leaves some pretty nice output which should make it easy to debug, see this one for instance:

https://github.com/neo4j/cypher-language-support/actions/runs/13951159540/job/39050765538

Collaborator

@OskarDamkjaer OskarDamkjaer left a comment

I think it looks good (with a few small tweaks). Let's see what @ncordon thinks 😄

Comment on lines 3 to 12
    branches:
      - main
    paths:
      - 'packages/language-support/src/formatting/**'

  pull_request:
    branches:
      - main
    paths:
      - 'packages/language-support/src/formatting/**'
Collaborator

Do we need both pushes and pull requests? I think only pull requests would be good enough. Perhaps we'd also want it as part of the release checks?

Contributor Author

Any particular reason why you think it shouldn't run on both PRs and commits? My reasoning was that it might help catch cases where a check on a PR is outdated, and I don't think it's too expensive. It's also faster than the e2e tests.

Collaborator

I just think it's a bit wasteful, but feel free to leave it in

Contributor Author

@ncordon What's your opinion on this?

@ncordon ncordon self-assigned this Mar 20, 2025
Contributor

@ncordon ncordon left a comment

It looks good to me, but just so you know, the file is not stored with LFS, which it could be. I don't know if that was the intention, given the description of the PR?

Contributor Author

tomasnyberg commented Mar 25, 2025

> It looks good to me, but just so you know, the file is not stored with LFS, which it could be. I don't know if that was the intention, given the description of the PR?

Yeah, I couldn't find the exact pricing for regular objects in a git repo, so I used the LFS numbers as a reference. The assumption I'm making is that the price is comparable when also factoring in the cost of GitHub Actions. The magnitude of the numbers is the most important thing, I think.
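
To make the order of magnitude concrete, here is a small sketch of the estimate from the PR description. The two prices are the Git LFS figures quoted there and are only a proxy; the function name `monthlyCostUsd` is made up for illustration, and GB and GiB are treated as the same unit, as in the original estimate.

```typescript
// Figures quoted in the PR description (Git LFS pricing, used as a proxy).
const STORAGE_USD_PER_GB_MONTH = 0.07;
const TRANSFER_USD_PER_GB = 0.0875;

// Rough monthly cost of storing one file and transferring it N times.
function monthlyCostUsd(fileSizeGb: number, transfersPerMonth: number) {
  const storage = fileSizeGb * STORAGE_USD_PER_GB_MONTH;
  const transfer = fileSizeGb * transfersPerMonth * TRANSFER_USD_PER_GB;
  return { storage, transfer, total: storage + transfer };
}

// 10 MB (0.01 GB) transferred 1000 times per month.
const estimate = monthlyCostUsd(0.01, 1000);
console.log(estimate.transfer.toFixed(3)); // "0.875"
console.log(estimate.storage.toFixed(4)); // "0.0007"
```

Even if the real prices differ by a small factor, the total stays around a dollar per month under these assumptions.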

@tomasnyberg tomasnyberg merged commit e02a1c1 into neo4j:main Mar 25, 2025
3 checks passed