
feat(client): Use json_each for batch insert of subscription data #1157

Closed
wants to merge 5 commits

Conversation

Contributor

@msfstef msfstef commented Apr 15, 2024

Addresses #616

I've benchmarked the difference for our _applySubscriptionData method as a whole. Below is a small bar chart of results using better-sqlite3 in Node: the previous and new methods are within ±2% of each other across all ranges (and that is inserting simple child and parent records).

[Bar chart: _applySubscriptionData benchmark, previous vs. json_each method, better-sqlite3 in Node]

I've also benchmarked it using wa-sqlite with our canonical items example: at 50-100k items, the initial sync with the new json_each approach was consistently ~10% faster. For small numbers of records the JSON serialization and deserialization adds a small overhead, but with larger numbers of records I suspect we gain speed because there are fewer round trips between client code and SQLite.

Overall, my main argument for this method over the previous one is that it requires fewer magic numbers and maxSqlParameters configurations, is relatively easy to understand, and relies more on SQLite than on client code.
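For context, the core of the json_each approach can be sketched in a few lines. The client is TypeScript, but a minimal Python sketch using the stdlib sqlite3 module illustrates the same technique; the `items` table and column names are placeholders, not the actual schema:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, value TEXT)")

rows = [{"id": str(i), "value": f"item {i}"} for i in range(1000)]

# One statement, one bound parameter: the JSON-serialized batch.
# json_each() expands the JSON array into one row per element, so the
# statement size no longer depends on the number of records.
conn.execute(
    """
    INSERT INTO items (id, value)
    SELECT json_extract(e.value, '$.id'),
           json_extract(e.value, '$.value')
    FROM json_each(?) AS e
    """,
    (json.dumps(rows),),
)
print(conn.execute("SELECT count(*) FROM items").fetchone()[0])  # 1000
```

Note this requires SQLite's JSON functions (built in by default since 3.38, available via the JSON1 extension before that).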

@msfstef msfstef requested a review from kevin-dp April 15, 2024 13:42
Contributor Author

msfstef commented Apr 15, 2024

Investigated the failing e2e tests: blobs cannot be trivially serialized to JSON, and properly deserializing them would require knowing what was serialized and how. The appeal of this approach's simplicity is lost once we need to introduce those kinds of checks and conversions. The same applies to special JS number values (Infinity etc.).
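The failure modes are easy to reproduce. A short Python sketch (illustrative only; the client hits the equivalent problem in JS) shows both: blobs have no JSON representation without an agreed encoding convention, and infinities serialize to a non-standard token that SQLite's JSON functions reject as malformed:

```python
import json

# Raw bytes cannot pass through JSON without an encoding convention
# (e.g. base64 or hex) that the deserializing SQL would also need to know.
try:
    json.dumps({"payload": b"\x00\x01"})
except TypeError as exc:
    print("blob:", exc)

# Python emits the non-standard token Infinity by default; SQLite's
# json_each() would reject the resulting string as malformed JSON.
print(json.dumps(float("inf")))  # prints: Infinity
```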

Delegating to the SQLite wrapper's bind parameter implementation is a safer choice.
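That alternative, a multi-row VALUES insert chunked to stay under the bind-parameter cap, can be sketched as follows. Again in Python for a self-contained example; `max_sql_parameters` is a hypothetical stand-in for the client's maxSqlParameters setting, and the table is a placeholder:

```python
import sqlite3

# Conservative cap: older SQLite builds default SQLITE_MAX_VARIABLE_NUMBER
# to 999 (builds since 3.32 allow 32766).
max_sql_parameters = 999

def batch_insert(conn, table, columns, rows):
    """Multi-row VALUES insert, chunked to respect the bind-parameter cap."""
    rows_per_stmt = max_sql_parameters // len(columns)
    row_ph = "(" + ", ".join(["?"] * len(columns)) + ")"
    for start in range(0, len(rows), rows_per_stmt):
        chunk = rows[start:start + rows_per_stmt]
        sql = (
            f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
            + ", ".join([row_ph] * len(chunk))
        )
        conn.execute(sql, [v for row in chunk for v in row])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, payload BLOB)")
# Blobs and floats bind natively -- no JSON round trip required.
batch_insert(conn, "items", ["id", "payload"],
             [(i, b"\x00\xff") for i in range(2500)])
print(conn.execute("SELECT count(*) FROM items").fetchone()[0])  # 2500
```

The magic number is still there, but the wrapper's bind implementation handles blobs and special values for free.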

It would still be possible to use this approach for the shadow table, which by definition contains only strings and numbers, but I'm not sure that having two batch-insert methods with similar performance has any benefit; it only complicates the codebase.

If you agree @kevin-dp we can close the source issue?

@samwillis samwillis self-requested a review April 15, 2024 16:01
Contributor

@samwillis samwillis left a comment


As this touches the same code as #1038, which introduces a query builder for the snapshot queries, it should not be merged until after that lands, as PG on the client is the priority right now. I've not looked at how this PR would need to change to be compatible with it, or whether we can have alternative methods for both SQLite and Postgres.

It would be interesting to benchmark the same JSON based method on top of Postgres and PGlite too.

Contributor Author

msfstef commented Apr 16, 2024

@samwillis if you look at my comment above, inserting data using json_each requires first serializing our input and then deserializing it within SQLite (or soon Postgres).

Given the minimal performance gains from this approach (at least in my preliminary benchmarking), once we need to handle special cases for serialization and deserialization that also depend on the SQLite version and its JSON functions, the "simplification" benefit is largely lost. Sticking with SQLite parameter binding, and dealing with the limit on the number of bound parameters, seems much safer.

If you agree I will close this PR and the referenced issue as well.

@kevin-dp
Contributor

Hi Stefanos, I agree with closing this for now. It looks appealing at first, but once we have to start special-casing, the simplicity benefit is lost. If we were to introduce this purely for performance reasons then, as mentioned by Sam, we would also want to benchmark it on PG and PGlite.
