feat(client): Use json_each for batch insert of subscription data #1157
Conversation
Investigated the failing e2e tests. Since blobs cannot be trivially serialized, and properly deserializing them would require knowing what is being serialized and how, the appeal of this approach's simplicity is lost to the need for those checks and conversions. The same applies to special JS number values (Infinity, etc.). Delegating to the SQLite wrapper's bind-parameter implementation is the safer choice. It would still be possible to use this approach for the shadow table, since it is simple and by definition contains only strings and numbers, but I'm not sure there is any benefit in having two batch-insert methods with similar performance; it only complicates the codebase. If you agree @kevin-dp we can close the source issue?
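To make the serialization problem concrete, here is a small Python sketch (the client itself is TypeScript, but the issue is language-independent): a blob value is not JSON-serializable, so pushing rows through a JSON payload forces an encode/decode convention that both the serializing and deserializing sides must know about, per blob column.

```python
import json
import base64

# A row containing a BLOB column (bytes) cannot go through JSON as-is:
row = {"id": 1, "payload": b"\x00\x01\xff"}
try:
    json.dumps(row)
except TypeError:
    pass  # bytes are not JSON-serializable

# Making it work requires knowing which columns hold blobs and encoding them,
# plus symmetric knowledge on the reading side to decode them back.
# "payload" is special-cased here purely for illustration:
encoded = dict(row, payload=base64.b64encode(row["payload"]).decode("ascii"))
round_tripped = json.loads(json.dumps(encoded))  # works only because we special-cased it
```

This is exactly the kind of per-column check-and-convert logic that erodes the simplicity argument for the JSON-based path.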
As this touches the same code as #1038, which introduces a query builder for the snapshot queries, it should not be merged until after that one, since PG on the client is the priority right now. I've not looked at how this PR would need to change to be compatible with it, or whether we could have alternative methods for both SQLite and Postgres.
It would be interesting to benchmark the same JSON based method on top of Postgres and PGlite too.
@samwillis if you look at my comment above: given the minimal performance gains from this approach (at least in my preliminary benchmarking), once we need to handle special cases for serialization and deserialization, which also depend on the SQLite version and its JSON functions, the "simplification" benefit is mostly lost. Sticking with SQLite parameter binding is, I think, much safer, and it is better to deal with the limit on the number of parameters to bind. If you agree I will close this PR and the referenced issue as well.
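For reference, the parameter-binding alternative and its limit handling look roughly like this. A Python sketch, not the client's actual code: the batch_insert helper and the MAX_SQL_PARAMETERS constant are illustrative stand-ins for the client's maxSqlParameters handling (real SQLite builds default the host-parameter limit to 999 or 32766, depending on version).

```python
import sqlite3

MAX_SQL_PARAMETERS = 6  # illustrative stand-in for SQLite's host-parameter limit


def batch_insert(conn, table, columns, rows):
    """Insert rows in chunks so each statement stays under the parameter limit."""
    rows_per_stmt = MAX_SQL_PARAMETERS // len(columns)
    placeholder_row = "(" + ", ".join("?" * len(columns)) + ")"
    for i in range(0, len(rows), rows_per_stmt):
        chunk = rows[i : i + rows_per_stmt]
        sql = (
            f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
            + ", ".join([placeholder_row] * len(chunk))
        )
        # Flatten the chunk into one positional-argument list; the driver
        # binds bytes as BLOBs natively, so no JSON special-casing is needed.
        conn.execute(sql, [v for row in chunk for v in row])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, value BLOB)")
batch_insert(conn, "items", ["id", "value"], [(i, bytes([i])) for i in range(10)])
```

The cost of this path is the chunking arithmetic and one round trip per chunk, which is the trade-off the benchmarks below measure.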
Hi Stefanos, I agree with you on closing this for now. It looks appealing at first, but if we have to start special-casing, the simplicity benefit is lost. If we were to introduce this purely for performance reasons, then, as mentioned by Sam, we would also want to benchmark it on PG and PGlite.
Addresses #616
I've benchmarked the difference for our _applySubscriptionData method as a whole. Below you can see a small bar chart of results using better_sqlite3 in Node, where the previous and new methods are within ±2% of each other across all ranges (and that is inserting simple child and parent records). I've also benchmarked it using wa-sqlite with our canonical items example, and at 50-100k items the initial sync with the new json_each approach was consistently ~10% faster. For small numbers of records the JSON serialization and deserialization might add a small overhead, but with larger numbers of records I suspect we gain speed because there are fewer back-and-forths with SQLite.

Overall, my main argument for this method over the previous one is that it requires fewer magic numbers and maxSqlParameters configurations, is relatively easy to understand, and relies more on SQLite than on client code.
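The shape of the json_each approach, sketched in Python rather than the client's TypeScript, and assuming the SQLite build includes the JSON1 functions (json_each, json_extract): the whole batch is serialized once and bound as a single parameter, so there is no chunking and no parameter-count bookkeeping.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, value TEXT)")

rows = [{"id": i, "value": f"item-{i}"} for i in range(1000)]

# One statement, one bound parameter, regardless of batch size:
# json_each iterates the array server-side and json_extract pulls the columns.
conn.execute(
    """
    INSERT INTO items (id, value)
    SELECT json_extract(e.value, '$.id'), json_extract(e.value, '$.value')
    FROM json_each(?) AS e
    """,
    (json.dumps(rows),),
)
```

This works cleanly here only because every column is a string or a number; as discussed above, blobs and special JS number values would need extra encoding conventions on both sides.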