
feat(client): Use json_each for batch insert of subscription data #1157

Closed
wants to merge 5 commits

Conversation

Contributor

@msfstef msfstef commented Apr 15, 2024

Addresses #616

I've benchmarked the difference for our _applySubscriptionData method as a whole. Below is a small bar chart of results using better-sqlite3 in Node: the previous and new methods are within ±2% of each other across all ranges (and that is inserting simple child and parent records).

[Bar chart: _applySubscriptionData benchmark, previous vs. json_each method, better-sqlite3 in Node]

I've also benchmarked it using wa-sqlite with our canonical items example: at 50-100k items, the initial sync with the new json_each approach was consistently ~10% faster. For small numbers of records the JSON serialization and deserialization adds a small overhead, but with larger numbers of records I suspect we gain speed because there are fewer round trips between client code and SQLite.

Overall, my main argument for this method over the previous one is that it requires fewer magic numbers and maxSqlParameters configurations, is relatively easy to understand, and relies more on SQLite than on client code.
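For context, the core of the json_each approach can be sketched in a few lines. The client is TypeScript, but a minimal Python sketch using the stdlib sqlite3 module illustrates the same technique; the `items` table and column names are placeholders, not the actual schema:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, value TEXT)")

rows = [{"id": str(i), "value": f"item {i}"} for i in range(1000)]

# One statement, one bound parameter: the JSON-serialized batch.
# json_each() expands the JSON array into one row per element, so the
# statement size no longer depends on the number of records.
conn.execute(
    """
    INSERT INTO items (id, value)
    SELECT json_extract(e.value, '$.id'),
           json_extract(e.value, '$.value')
    FROM json_each(?) AS e
    """,
    (json.dumps(rows),),
)
print(conn.execute("SELECT count(*) FROM items").fetchone()[0])  # 1000
```

Note this requires SQLite's JSON functions (built in by default since 3.38, available via the JSON1 extension before that).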

@msfstef msfstef requested a review from kevin-dp April 15, 2024 13:42
Contributor Author

msfstef commented Apr 15, 2024

Investigated the failing e2e tests: blobs cannot be trivially serialized to JSON, and properly deserializing them would require knowing what was serialized and how. The appeal of this approach's simplicity is lost once we need to introduce those kinds of checks and conversions. The same applies to special JS number values (Infinity etc.).
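The failure modes are easy to reproduce. A short Python sketch (illustrative only; the client hits the equivalent problem in JS) shows both: blobs have no JSON representation without an agreed encoding convention, and infinities serialize to a non-standard token that SQLite's JSON functions reject as malformed:

```python
import json

# Raw bytes cannot pass through JSON without an encoding convention
# (e.g. base64 or hex) that the deserializing SQL would also need to know.
try:
    json.dumps({"payload": b"\x00\x01"})
except TypeError as exc:
    print("blob:", exc)

# Python emits the non-standard token Infinity by default; SQLite's
# json_each() would reject the resulting string as malformed JSON.
print(json.dumps(float("inf")))  # prints: Infinity
```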

Delegating to the SQLite wrapper's bind parameter implementation is a safer choice.
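That alternative, a multi-row VALUES insert chunked to stay under the bind-parameter cap, can be sketched as follows. Again in Python for a self-contained example; `max_sql_parameters` is a hypothetical stand-in for the client's maxSqlParameters setting, and the table is a placeholder:

```python
import sqlite3

# Conservative cap: older SQLite builds default SQLITE_MAX_VARIABLE_NUMBER
# to 999 (builds since 3.32 allow 32766).
max_sql_parameters = 999

def batch_insert(conn, table, columns, rows):
    """Multi-row VALUES insert, chunked to respect the bind-parameter cap."""
    rows_per_stmt = max_sql_parameters // len(columns)
    row_ph = "(" + ", ".join(["?"] * len(columns)) + ")"
    for start in range(0, len(rows), rows_per_stmt):
        chunk = rows[start:start + rows_per_stmt]
        sql = (
            f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
            + ", ".join([row_ph] * len(chunk))
        )
        conn.execute(sql, [v for row in chunk for v in row])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, payload BLOB)")
# Blobs and floats bind natively -- no JSON round trip required.
batch_insert(conn, "items", ["id", "payload"],
             [(i, b"\x00\xff") for i in range(2500)])
print(conn.execute("SELECT count(*) FROM items").fetchone()[0])  # 2500
```

The magic number is still there, but the wrapper's bind implementation handles blobs and special values for free.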

It would still be possible to use this approach for the shadow table, which by definition contains only strings and numbers, but I'm not sure that having two batch-insert methods with similar performance has any benefit; it only complicates the codebase.

If you agree @kevin-dp we can close the source issue?

@samwillis samwillis self-requested a review April 15, 2024 16:01
Contributor

@samwillis samwillis left a comment


As this touches the same code as #1038, which introduces a query builder for the snapshot queries, it should not be merged until after that lands, as PG on the client is the priority right now. I've not looked at how this PR would need to change to be compatible with it, or whether we can have alternative methods for both SQLite and Postgres.

It would be interesting to benchmark the same JSON based method on top of Postgres and PGlite too.

Contributor Author

msfstef commented Apr 16, 2024

@samwillis if you look at my comment above, inserting data using json_each requires first serializing our input and then deserializing it within SQLite (or soon Postgres).

Given the minimal performance gains from this approach (at least in my preliminary benchmarking), once we need to handle special cases for serialization and deserialization that also depend on the SQLite version and its JSON functions, the "simplification" benefit is largely lost. Sticking with SQLite parameter binding, and dealing with the limit on the number of bound parameters, seems much safer.

If you agree I will close this PR and the referenced issue as well.

@kevin-dp
Contributor

Hi Stefanos, I agree with closing this for now. It looks appealing at first, but once we have to start special-casing, the simplicity benefit is lost. If we were to introduce this purely for performance reasons then, as mentioned by Sam, we would also want to benchmark it on PG and PGlite.
