Parallelize loading from S3 for better performance #4
Conversation
This PR improves loading speed (near/near-lake-framework-js#4). It also allows building the code when installing the npm package from source.
@vgrichina Thanks for looking into it! Let's address the comments and merge it
src/s3fetchers.ts
Outdated
@@ -21,12 +21,13 @@ import { normalizeBlockHeight, parseBody } from "./utils";
 export async function listBlocks(
     client: S3Client,
     bucketName: string,
-    startAfter: BlockHeight
+    startAfter: BlockHeight,
+    limit = 10
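For context, the new `limit` parameter presumably caps how many block folders one S3 listing call returns. The sketch below illustrates the idea with a stand-in client interface; the `S3Like` type, the `FakeS3Client`-style mock, and the zero-padded key layout are illustrative assumptions, not the project's actual code (the real version uses `S3Client` and `ListObjectsV2Command` from `@aws-sdk/client-s3`):

```typescript
// Minimal stand-in types so the sketch is self-contained.
type BlockHeight = number;

interface ListRequest {
  Bucket: string;
  MaxKeys: number; // the `limit` parameter maps onto MaxKeys
  StartAfter: string;
  Delimiter: string;
}

interface S3Like {
  listObjects(
    req: ListRequest
  ): Promise<{ CommonPrefixes?: { Prefix: string }[] }>;
}

// Assumption: block heights are stored as zero-padded folder prefixes.
const normalizeBlockHeight = (h: BlockHeight) =>
  h.toString().padStart(12, "0");

export async function listBlocks(
  client: S3Like,
  bucketName: string,
  startAfter: BlockHeight,
  limit = 10
): Promise<BlockHeight[]> {
  const { CommonPrefixes = [] } = await client.listObjects({
    Bucket: bucketName,
    MaxKeys: limit, // fetch at most `limit` block folders per call
    StartAfter: normalizeBlockHeight(startAfter),
    Delimiter: "/",
  });
  // Parse the numeric height back out of each zero-padded prefix.
  return CommonPrefixes.map((p) => parseInt(p.Prefix));
}
```

A larger `limit` means fewer round-trips per listed block, which is what the 10-vs-200 discussion below is about.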
Let's default to 200, since it makes fewer requests and boosts the throughput quite a bit.
@khorolets Let's make this parameter configurable on the Rust side as well and update the default to 200 (somehow we used 100 there, but chose 10 in the JS version).
Looks like 200 might be a bit suboptimal when dealing with meatier blocks; users may need to experiment on their side to tune it. E.g., at block #46661963 it seems that 100 works better.
Maybe it just needs another change to avoid blocking until all of them are loaded.
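One way to avoid blocking on a whole batch is a bounded worker pool that keeps a fixed number of downloads in flight and starts a new one as soon as any slot frees up. This is a sketch of that alternative, not the PR's code; `loadBlock` is a hypothetical placeholder for the real S3 fetch:

```typescript
// Sketch: keep up to `concurrency` block downloads in flight at once.
// A new download starts as soon as any worker finishes its current one,
// so one slow block no longer stalls an entire batch.
export async function loadAllBlocks<T>(
  heights: number[],
  loadBlock: (height: number) => Promise<T>,
  concurrency = 10
): Promise<T[]> {
  const results: T[] = new Array(heights.length);
  let next = 0; // index of the next height to schedule

  // Each worker repeatedly claims the next unclaimed height.
  // JS is single-threaded, so reading and incrementing `next` with no
  // intervening await is safe.
  async function worker(): Promise<void> {
    while (next < heights.length) {
      const i = next++;
      results[i] = await loadBlock(heights[i]);
    }
  }

  const workers = Array.from(
    { length: Math.min(concurrency, heights.length) },
    () => worker()
  );
  await Promise.all(workers);
  return results;
}
```

With this shape, the `limit`/batch-size debate above becomes a question of how many downloads to keep in flight, rather than how long to block between batches.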
Many thanks!
P.S.: I've addressed the review suggestion from @frol
Note that this is a relatively naive method which unnecessarily waits for one batch to complete before starting to load the next batch.
However, it still improves performance significantly: from about 2 blocks/second to 12+ blocks/second on my machine.
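The batched approach described above can be sketched roughly as follows (this is an illustration of the pattern, not the PR's literal code, and `loadBlock` is a hypothetical stand-in for the real S3 fetch):

```typescript
// Sketch of the batched approach: fetch `batchSize` blocks in parallel
// with Promise.all, but wait for the whole batch before starting the
// next one -- the "naive" behavior noted above.
export async function loadBlocksInBatches<T>(
  heights: number[],
  loadBlock: (height: number) => Promise<T>,
  batchSize = 10
): Promise<T[]> {
  const results: T[] = [];
  for (let i = 0; i < heights.length; i += batchSize) {
    const batch = heights.slice(i, i + batchSize);
    // All blocks within the batch are fetched concurrently...
    const loaded = await Promise.all(batch.map(loadBlock));
    // ...but the next batch only starts after the slowest one finishes.
    results.push(...loaded);
  }
  return results;
}
```

Even with that per-batch stall, running `batchSize` fetches concurrently instead of one at a time is enough to explain the roughly 6x speedup reported above.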
When measured on the Hetzner box I use: