Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster upsert #37

Merged
merged 13 commits into from
Mar 31, 2022
Merged

Faster upsert #37

merged 13 commits into from
Mar 31, 2022

Conversation

berekuk
Copy link
Collaborator

@berekuk berekuk commented Mar 29, 2022

PR for #30.

  • DELETE FROM table instead of dropping & recreating
  • bulk inserts with 20-rows chunks (see pgBulkInsert code which replaced pgInsert)
  • wrap upserts in transactions
  • also: measureTime higher order function for common "Took X seconds, or Y minutes." logging pattern
  • also: more typings

DELETE FROM by itself haven't brought any performance improvements, in fact, it made the code significantly slower (but it's still the right thing to do, transaction safety-wise, otherwise frontpage update could coincide with merge job and then the frontpage would be empty).

Bulk inserts are fast. In my local tests time for polymarket upserts went down from 35 seconds to 5.

PS: This PR is stacked on top of #34.

@netlify
Copy link

netlify bot commented Mar 29, 2022

Deploy Preview for metaforecast ready!

Name Link
🔨 Latest commit 111aa8c
🔍 Latest deploy log https://app.netlify.com/sites/metaforecast/deploys/624490718818330009ebc2c2
😎 Deploy Preview https://deploy-preview-37--metaforecast.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@berekuk
Copy link
Collaborator Author

berekuk commented Mar 29, 2022

Ugh, npm decided to remove some stuff from my package-lock.json for some reason, I thought it was fine but it isn't, and netlify build fails now.

Upd: oh, it installed @next/swc-darwin-arm64 instead of linux version.

@berekuk
Copy link
Collaborator Author

berekuk commented Mar 30, 2022

Rebased on top of fresh #34 changes.

One more benchmark: on my local db (incomplete, includes only some platforms):

Inserted 1988 rows with approximate cummulative size 3 MB into latest.combined.
...
Took 18.949 seconds, or 0.3158166666666667 minutes.

On prod:

Inserted 5001 rows with approximate cummulative size 13.9 MB into latest.combined.
...
Took 330.071 seconds, or 5.5011833333333335 minutes.

@berekuk
Copy link
Collaborator Author

berekuk commented Mar 30, 2022

Also: this speed-up might be almost entirely due to ping distance from Heroku to DO.

I checked the ping distance with tcp-ping from Heroku to DO and it's not good:

~ $ npm i tcp-ping

[...]

~ $ node
Welcome to Node.js v16.14.2.
Type ".help" for more information.
> const {ping}=require('tcp-ping');
undefined
> ping({address: 'postgres-red-do-user-10290909-0.b.db.ondigitalocean.com', port: 25060}, (err, data) => {console.log(data)})
undefined
> {
  address: 'postgres-red-do-user-10290909-0.b.db.ondigitalocean.com',
  port: 25060,
  attempts: 10,
  avg: 92.8678741,
  max: 96.414132,
  min: 89.406278,
  results: [
    { seq: 0, time: 92.711855 },
    { seq: 1, time: 93.537151 },
    { seq: 2, time: 89.406278 },
    { seq: 3, time: 92.171002 },
    { seq: 4, time: 90.283298 },
    { seq: 5, time: 93.572547 },
    { seq: 6, time: 89.975433 },
    { seq: 7, time: 95.535087 },
    { seq: 8, time: 96.414132 },
    { seq: 9, time: 95.071958 }
  ]
}

So, for 5000 rows inserted one by one it would take... actually, 450 seconds instead 330, not sure what that's about :)

This might be the good reason to move from Heroku to DO, or at least to look into whether there's a Heroku-DO datacenter pair with lower ping distance.

Also, this might eventually cause us to move API back from Netlify closer to the DB (but still keep it implemented with Next.js); if our APIs will become more complicated and require multiple DB queries then ping distance will heavily affect its performance.

Single-query requests shouldn't be affected by this, though: Netlify probably runs its functions on edge close to the user, and user->db ping distance is unavoidable (at least without much more complicated setups with db replicas on each continent, which we surely don't want to deal with now).

@NunoSempere
Copy link
Collaborator

Yeah, I previouusly prefered to have my DO instance in Europe (data protection laws are better). But I changed it to the US in the new team database.

@NunoSempere NunoSempere merged commit 8060303 into master Mar 31, 2022
@berekuk berekuk mentioned this pull request Apr 2, 2022
@berekuk berekuk deleted the faster-upsert branch April 3, 2022 00:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants