Benchmark bulk entity create API #1095
I have some basic CSV parsing in Frontend in place, and I've been benchmarking that parsing. I also tried sending the parsed entities to Backend, and so far I'm seeing two limits. First, I'm able to send 6553 entities, but there's a Postgres error for 6554 entities. That's consistent with the limit mentioned above. Happy to share my JSON data if that'd be helpful! Secondly, even though central-backend/lib/http/service.js, line 33 (at f98187d)
I just ran into the 6553-entity max and the 250kb JSON limit myself. Removing that limit, I've so far been able to upload 6553 entities with 1000 properties, but not 1500 properties.
Based on discussion yesterday, we're hoping to be able to bulk-create 50,000 entities with 20 properties each, where each value is 100 characters.
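For scale, a quick back-of-envelope estimate (not from the discussion itself, and assuming one byte per character): 50,000 entities × 20 properties × 100 characters is about 100,000,000 bytes, roughly 100 MB of raw values before JSON keys, quoting, and metadata are added. That already sits at the nginx limit discussed further down.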
This seems like a good idea to have in general. Can we lift/override the limit only for this endpoint?
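A minimal sketch of what a per-endpoint override could look like, assuming the 250kb cap comes from an Express JSON body parser; the route path, limit values, and handler below are illustrative, not Central's actual configuration:

```js
// Sketch only: assumes the 250kb cap is enforced by Express's JSON body parser.
// The route path, limit values, and handler are illustrative, not Central's code.
const express = require('express');
const service = express();

// Route-level parser with a larger limit, registered before the app-wide default
// so requests to the bulk endpoint never hit the small parser.
service.post(
  '/v1/projects/:projectId/datasets/:name/entities',
  express.json({ limit: '100mb' }),
  (req, res) => {
    // A bulk create handler would run here; req.body is the parsed request body.
    res.sendStatus(201);
  }
);

// Everything else keeps the small default.
service.use(express.json({ limit: '250kb' }));

service.listen(8383); // port is arbitrary for this sketch
```

Registration order matters here: if the small app-wide parser ran first, it would reject large bodies with a 413 before the route-level parser ever saw them.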
Two thoughts about the nginx limit:
We discussed this today and concluded that it'd be fine to leave the nginx limit where it is today, at 100 MB. We've seen that the size of the JSON request body is usually larger than the CSV file, so this will probably limit the size of CSV files to less than 100 MB. (I think the JSON is larger in large part because certain strings are repeated for each entity:
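For reference, assuming the nginx limit referred to here is the standard `client_max_body_size` directive, leaving it at 100 MB means the config keeps something like `client_max_body_size 100m;` in the relevant server or location block, and nginx rejects larger request bodies with a 413 before they ever reach Backend.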
I've filed getodk/central#611 for this idea, but for now, we're just going to implement getodk/central#610. I've filed that as a separate issue so that we can continue to use this issue to discuss other limits and aspects of benchmarking.
I just wanted to leave a comment with some observations from @alxndrsn on Slack. @alxndrsn uploaded a CSV file of 100K entities:
It'd be interesting to learn more about why processing the file takes so long.
Questions about bulk upload limits:

- Optimization idea: There was an optimization using `sql.unnest` made to another part of Backend for inserting many form fields at once. Should that approach be used here, too? (See the sketch after this list.) https://github.com/gajus/slonik?tab=readme-ov-file#inserting-large-number-of-rows
  Commit where this was added to `insertMany`: 222b2a8
- What is the upper limit? 10 parameters per entity def --> maybe ~6K entities able to be inserted at once without the unnest optimization? Things to try!!
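As a rough illustration of the slonik pattern linked above: the table and column names below are hypothetical, not the actual entity schema, and the function is only a sketch of the technique, not Central's `insertMany`.

```js
// Sketch only: hypothetical table/columns; the pattern follows the slonik
// README section "Inserting large number of rows" via sql.unnest.
const { sql, createPool } = require('slonik');

const insertEntities = (pool, entities) =>
  pool.query(sql`
    INSERT INTO entities (uuid, label, creator)
    SELECT *
    FROM ${sql.unnest(
      entities.map(({ uuid, label, creator }) => [uuid, label, creator]),
      ['uuid', 'text', 'text']
    )}
  `);

// Without unnest, a multi-row VALUES list binds (columns x rows) parameters, so
// at ~10 parameters per entity def the 65535-parameter protocol cap is reached
// around floor(65535 / 10) = 6553 rows. With unnest, the whole batch is passed
// as a fixed, small number of array parameters, so row count is no longer
// bounded by the bind-parameter limit.
```

That arithmetic also lines up with the 6553-vs-6554 behavior reported at the top of this issue.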
Related idea: Should `createMany` and `createNew` (multiple vs. single entity creation) be combined? Does it make sense to use the multi case for a single entity, or is `createNew` better optimized for the single-entity scenario?
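On that related idea, a purely hypothetical sketch of the "combine them" option: the single-entity path just delegates to the bulk path with a one-element array, and benchmarking would show whether that costs anything versus a dedicated single-row insert. The names echo the discussion above, but the signatures are invented for illustration.

```js
// Hypothetical sketch: names mirror createMany/createNew from the discussion,
// but the signatures and bodies are invented here, not taken from central-backend.
const createMany = (pool, entities) =>
  insertEntities(pool, entities); // batched insert, e.g. the unnest sketch above

const createNew = (pool, entity) =>
  createMany(pool, [entity]); // single entity as the n = 1 case of the bulk path
```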