
Move to one-file-per-query for sqlx-cli prepare #570

Closed
mehcode opened this issue Jul 27, 2020 · 18 comments · Fixed by #2363

@mehcode
Member

mehcode commented Jul 27, 2020

This seems to have a lot of advantages and I can't remember why we didn't do this originally.

  • git merge won't get in the way
  • orphaned query files are already a problem, so this is a net zero
  • we already collect to multiple files during build
  • we could make cargo build transparently always do cargo sqlx prepare and leave the latter as a "make sure" / verify step
@abonander
Collaborator

git merge won't get in the way

It will for schema changes where the queries were unaffected. Then you'll have potentially dozens of files to resolve merge conflicts on rather than one.

In general, I think we agreed that a single file was better because it could easily be skimmed over in diffs and if a merge conflict did arise it would be a single "accept ours"/"accept theirs" resolution.

@jplatte
Contributor

jplatte commented Jul 27, 2020

Having worked with this for a bit, it's always annoying to have to go back when CI failed because of a stale cache. I would be very happy if the separate cargo sqlx prepare step was not needed.

In general, I think we agreed that a single file was better because it could easily be skimmed over in diffs and if a merge conflict did arise it would be a single "accept ours"/"accept theirs" resolution.

I think what I would do after resolving the rest of the merge conflicts (if any) is just delete .sqlx / sqlx-data.json and regenerate. With editor integration that auto-runs cargo check, and with the files always being written, maybe I wouldn't even have to do anything in the directory case? Either way, I don't see it becoming harder with lots of files (except when resolving conflicts in the GitHub web interface, but I generally don't do that much, and I imagine one would often fall back to local resolution for more complex scenarios anyway).

@abonander
Collaborator

If the files were contained in a standard directory then it would probably be possible to set default resolution rules with .gitattributes or the like.
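
Such a rule could look like the following sketch. This is an assumption, not anything sqlx ships: the `.sqlx/` path and the `ours` driver setup are illustrative only.

```
# .gitattributes — on conflict, keep our side of generated query files
.sqlx/*.json merge=ours
```

Git does not define the `ours` low-level merge driver by default; it can be enabled with `git config merge.ours.driver true`. Since the files are regenerated anyway, which side "wins" a conflict barely matters.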

I'm wary of the macros automatically creating such a directory outside of target/ though. Perhaps they shouldn't output any files if the directory doesn't exist, and then cargo sqlx prepare could create that directory.

As for the directory name, maybe .sqlx? The leading dot would hide it by default on *nix, I'm not sure if we want that or not.

@abonander
Collaborator

abonander commented Jul 27, 2020

The reason the macros don't directly write to one file is that they might potentially be run in parallel (not currently, I don't think, but possibly in the foreseeable future), so if they all tried to rewrite it they would probably clobber each other. However, if the sqlx-data format supported appends (CSV maybe?), I believe those are atomic, so multiple processes writing to the same file would be fine. Then cargo sqlx prepare could just go through and clean up the file afterwards.

Although the more that I think about it, this would end up just being an inferior implementation of writing everything to a directory anyway.

@jplatte
Contributor

jplatte commented Jul 29, 2020

Although the more that I think about it, this would end up just being an inferior implementation of writing everything to a directory anyway.

Sounds like changing to one file per query is realistic then? Would this be something that might end up in 0.4, or would you rather release 0.4 with sqlx-data.json and change to single file per query for 0.5, if at all?

@jplatte
Contributor

jplatte commented Jul 30, 2020

I'm wary of the macros automatically creating such a directory outside of target/ though. Perhaps they shouldn't output any files if the directory doesn't exist, and then cargo sqlx prepare could create that directory.

If that's all it does, that seems like a pointless abstraction though. In that case the documentation could just mention that users can create .sqlx if they want the query cache.

@abonander
Collaborator

abonander commented Aug 18, 2020

Actually, I don't think we have to do anything special to arrive at a file format that's good for appends. We could just atomically append JSON objects to a file, each object essentially being the same as the current sqlx-data.json format but with an individual query each, then cargo sqlx prepare can merge these objects together.

serde_json provides a way to parse multiple values from a stream already, the macros on a non-compacted file would just do a linear search to find the data for their query: https://docs.rs/serde_json/1.0.57/serde_json/de/struct.StreamDeserializer.html
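
A minimal sketch of that lookup, using only std instead of serde_json's StreamDeserializer: each invocation appends one JSON object per line, and a macro scans line by line for its own query. The record shape (a `"query"` field) is hypothetical, not sqlx's actual format.

```rust
// Linear search over an append-only, line-delimited JSON file.
// serde_json's StreamDeserializer would replace the naive string
// matching used here; this only illustrates the access pattern.
fn find_record<'a>(data: &'a str, query: &str) -> Option<&'a str> {
    // {:?} on a &str produces a JSON-compatible quoted string for
    // simple ASCII queries, which is all this sketch needs.
    let needle = format!("\"query\":{:?}", query);
    data.lines().find(|line| line.contains(&needle))
}

fn main() {
    let file = concat!(
        "{\"query\":\"SELECT 1\",\"describe\":{}}\n",
        "{\"query\":\"SELECT * FROM users\",\"describe\":{}}\n",
    );
    let hit = find_record(file, "SELECT * FROM users").unwrap();
    println!("{hit}");
}
```

A non-compacted file may contain duplicate records for the same query; taking the first (or last) match and letting `prepare` deduplicate later would both work under this scheme.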

Then a user could have a pre-commit hook that runs cargo sqlx prepare --compact to merge the entries without touching the database.

@mehcode any objections?

@abonander
Collaborator

Hmm, on further research it seems like we may not be able to guarantee atomicity of writes to files anyway. It's very OS and filesystem specific and not at all specified.

Multiple files it is then.

@abonander
Collaborator

As per the discussion on Discord, we still have the issue with concurrent writes with multiple files, if the filenames are the hashes of the queries (which is necessary for the macros looking up the query data).
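
For illustration, hash-derived filenames might look like this. std's DefaultHasher stands in for whatever hash sqlx actually uses, and the `query-{hash}.json` naming is hypothetical; the point is that identical queries map to the same file, which is exactly the concurrent-write hazard.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Derive a cache filename from the query text. DefaultHasher and the
// "query-{hash}.json" pattern are stand-ins, not sqlx's real scheme.
fn cache_filename(query: &str) -> String {
    let mut hasher = DefaultHasher::new();
    query.hash(&mut hasher);
    format!("query-{:016x}.json", hasher.finish())
}

fn main() {
    let a = cache_filename("SELECT * FROM users WHERE id = $1");
    let b = cache_filename("SELECT * FROM users WHERE id = $1");
    // Two macro invocations with the same query target the same file.
    assert_eq!(a, b);
    println!("{a}");
}
```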

We could make the filenames the spans of the macro invocations, which should be unique, but that has other issues:

  1. it duplicates data for identical queries from different macro invocations
  2. it relies on the Debug representation of Span, which isn't guaranteed to be a valid filename fragment (that could be resolved by hashing the debug repr, I guess); it also assumes spans are stable across compilation runs on different machines
  3. making changes unrelated to macro invocations in the same file will move spans around, changing the filename, creating even more duplicate files which would have to be cleaned up in a separate step
  4. if you don't recompile between your last change and the time of commit, the data will be stale anyway

Instead, I propose we encourage users to configure Git to run cargo sqlx prepare as a pre-commit hook. I don't like pre-commit hooks myself because sometimes I just want to commit and push broken code when I need to step away from the computer and they're usually configured to block commits if they fail, but this commit hook doesn't need to do that; it just means that CI will fail (which it should for broken code anyway).
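
A non-blocking hook along those lines might look like the following sketch (the hook path is standard Git; everything else here is illustrative, and failure deliberately never aborts the commit):

```
#!/bin/sh
# .git/hooks/pre-commit (hypothetical): refresh the offline query data,
# but never block the commit even if preparation fails.
cargo sqlx prepare ||
    echo "warning: cargo sqlx prepare failed; CI may catch stale query data"
exit 0
```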

@jplatte
Contributor

jplatte commented Sep 14, 2020

@abonander: What about first writing to a temporary file and then moving that to the actual location? I think I've seen that used to work around concurrent write issues before.

Personally I'm pretty sure I won't use a pre-commit hook for something that takes as long as cargo sqlx prepare because it would disturb my workflow – in particular interactive rebases where I squash or amend a few commits would become extremely annoying.

@jplatte
Contributor

jplatte commented Nov 18, 2020

There's one more advantage to this that hasn't been mentioned: It should improve performance quite a bit for projects with lots of queries, since every macro call only has to parse (possibly with debug-compiled parsing code) its own JSON metadata, not a JSON file with all of the queries for the entire application.

@jplatte
Contributor

jplatte commented Dec 11, 2020

I assume with one file per query, always rebuilding the entire crate also becomes unnecessary, right? I'm not entirely sure but I think it has a decent impact on both perf and cache disk usage.

@abonander @mehcode Would you be interested in a PR that implements this, using the aforementioned strategy of writing to a temporary file and then moving it to the actual location to avoid issues with concurrent writes to the same file?

@mehcode
Member Author

mehcode commented Dec 11, 2020

I'm personally for this, yes. I imagine a sqlx/ folder that is always populated during cargo build and that is either gitignored or not, depending on preference. E.g., totally remove the concept of an explicit online mode; it's just seamless.

@jplatte
Contributor

jplatte commented Dec 12, 2020

This is in scope for 0.4 too, right? I would be happy to also then rebase it on top of 0.5, but if possible would like to see another 0.4.x release that has this.

Edit: Actually, it would probably be too confusing to have the sqlx directory just appear for people who activated the offline feature. So for this to become available in a 0.4.x, the old behaviour would have to still be supported 🤔

@jplatte
Contributor

jplatte commented Dec 29, 2020

Since 0.5 is actually not far off, I'm now working on this and am already positive I'll have a PR ready later 🙂

@argv-minus-one
Contributor

@abonander

Hmm, on further research it seems like we may not be able to guarantee atomicity of writes to files anyway. It's very OS and filesystem specific and not at all specified.

That's true, but replacing a file is atomic. To do that:

  1. Use Builder::tempfile_in from the tempfile crate to create a random-named temporary file (which is atomic) in the same folder (in order to guarantee that the temporary file is on the same file system as the final file, which it must be for step 4 to work).
  2. Write into the file (which isn't atomic but doesn't need to be).
  3. flush the file to ensure that the file's contents actually make it to disk.
  4. persist the file with its final name (which is atomic as long as steps 2 and 3 happen before step 4).

This is guaranteed to result in one of the following:

  • The file is completely replaced.
  • The original file is left untouched. (This will happen if replacement fails.)
  • The original file is left untouched and a stale temporary file is left behind. (This will happen if replacement fails and NamedTempFile's destructor fails to delete the temporary file, usually because of abrupt process termination, OS crash, power failure, or I/O error. It is possible to avoid this on Linux using O_TMPFILE, but the author of tempfile wants consistent behavior across platforms, and I don't think any other platform has equivalent functionality.)

None of these results involve the original file being only partially overwritten. Hence, replacing the file is atomic.
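
The steps above can be sketched with std alone. The thread recommends the tempfile crate (Builder::tempfile_in plus persist); this std-only version uses a per-process temp name instead of tempfile's random names, so it is a simplification, not the suggested implementation.

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

// Write-then-rename so the target file is replaced atomically.
fn atomic_write(path: &Path, contents: &[u8]) -> std::io::Result<()> {
    // 1. Create the temp file in the target's directory so the final
    //    rename never crosses a filesystem boundary.
    let dir = path
        .parent()
        .filter(|p| !p.as_os_str().is_empty())
        .unwrap_or(Path::new("."));
    let tmp_path = dir.join(format!(".tmp-{}", std::process::id()));
    let mut tmp = fs::File::create(&tmp_path)?;
    // 2. Write the contents (not atomic, but nothing reads the temp file).
    tmp.write_all(contents)?;
    // 3. Flush to disk before the rename makes the data visible.
    tmp.sync_all()?;
    drop(tmp);
    // 4. Atomic replacement: readers see either the old file or the new one.
    fs::rename(&tmp_path, path)
}

fn main() -> std::io::Result<()> {
    let target = std::env::temp_dir().join("query-cache-demo.json");
    atomic_write(&target, b"{\"query\":\"SELECT 1\"}")?;
    println!("{}", fs::read_to_string(&target)?);
    Ok(())
}
```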

As per the discussion on Discord, we still have the issue with concurrent writes with multiple files, if the filenames are the hashes of the queries (which is necessary for the macros looking up the query data).

I haven't seen the Discord discussion, but if you mean the hash could collide, that shouldn't be an issue if you use a cryptographic hash like SHA256. It's incredibly hard to find collisions in such a hash even intentionally, let alone accidentally.

@abonander
Collaborator

We ultimately came to the conclusion that the concurrent write issue is rather moot since macros aren't expanded in parallel and probably won't be for some time.

It's incredibly hard to find collisions in such a hash even intentionally, let alone accidentally.

I'm not worried about hash collisions, which are certainly unlikely with SHA256, but rather the user copy-pasting the exact same query into multiple invocations, which is orders of magnitude more likely.

@argv-minus-one
Copy link
Contributor

That shouldn't be a problem, as long as the query data files are replaced atomically (as described above).
