Make ETL file size configurable #6927

Merged
merged 11 commits into from Mar 13, 2024

Conversation

SozinM
Contributor

@SozinM SozinM commented Mar 2, 2024

closes #6696

@SozinM SozinM marked this pull request as ready for review March 2, 2024 07:24
@DaniPopes DaniPopes requested a review from joshieDo March 6, 2024 13:46
Member

@DaniPopes DaniPopes left a comment

lgtm, pending @joshieDo and conflicts

book/run/config.md (outdated, resolved)
book/run/config.md (outdated, resolved)
crates/config/src/config.rs (outdated, resolved)
Collaborator

@joshieDo joshieDo left a comment

great! HeaderStage also uses ETL, so it would require it as well.

Personally, I'd rather have a config shared by all stages, rather than per stage as it seems. wdyt @shekhirin

@shekhirin
Collaborator

shekhirin commented Mar 6, 2024

> great! HeaderStage also uses ETL, so it would require it as well.
>
> Personally, I'd rather have a config shared by all stages, rather than per stage as it seems. wdyt @shekhirin

that makes sense, I agree. ETL doesn't depend on the stage workload type, and the config only depends on the available resources.
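
To make the shared-config idea concrete, here is a minimal sketch of one ETL setting handed to every stage that needs it. The `EtlConfig` type, its `file_size` field, the stand-in stage structs, and `build_stages` are all hypothetical names chosen for illustration; they are not taken from the reth codebase.

```rust
/// Hypothetical shared ETL settings (illustrative only, not reth's actual type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EtlConfig {
    /// Size in bytes used by the ETL data collectors.
    pub file_size: usize,
}

/// Stand-in for a stage that uses ETL.
pub struct HeaderStage {
    pub etl: EtlConfig,
}

/// Stand-in for a second stage that uses ETL.
pub struct TransactionLookupStage {
    pub etl: EtlConfig,
}

/// Builds both stages from the same config value, so the setting is
/// defined once and depends only on available resources, not on the stage.
pub fn build_stages(etl: EtlConfig) -> (HeaderStage, TransactionLookupStage) {
    (HeaderStage { etl }, TransactionLookupStage { etl })
}
```

Because the config is `Copy`, each stage holds its own copy of the same value, which is one simple way to avoid per-stage settings drifting apart.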

Changes from @shekhirin

Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>
@SozinM
Contributor Author

SozinM commented Mar 6, 2024

@joshieDo Will do, but note that the current default size for these temporary folders is 100 MiB each, so my change will increase the total from 200 MiB to 1 GiB.
cc @shekhirin

hash_collector: Collector::new(100 * (1024 * 1024)),

@SozinM
Contributor Author

SozinM commented Mar 6, 2024

I refactored this and created a dedicated `etl` config section for all common ETL-related settings.
I use this config field in DefaultStage and TransactionLookupStage. I didn't find an easy way to use DefaultStage fields from TransactionLookupStage, so please suggest a cleaner approach if you can think of one.

Comment on lines 89 to 90
hash_collector: Collector::new(tempdir.clone(), etl_file_size),
header_collector: Collector::new(tempdir, etl_file_size),
Collaborator

@joshieDo what do you think about using etl_file_size / 2 here, so that the ETL file size from the config means the maximum disk space to allocate for ETL files per stage?

Collaborator

yeah, agree

Contributor Author

Done
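
The `etl_file_size / 2` suggestion in this thread could look roughly like the sketch below. The `Collector` here is a simplified stand-in that only records the size it was given (the temp-directory argument from the diff is omitted), and `HeaderStage::new` is an assumed constructor, not the code that was merged.

```rust
/// Simplified stand-in for an ETL collector; it only stores its size setting.
pub struct Collector {
    pub file_size: usize,
}

impl Collector {
    pub fn new(file_size: usize) -> Self {
        Self { file_size }
    }
}

/// Stand-in for a stage that owns two collectors.
pub struct HeaderStage {
    pub hash_collector: Collector,
    pub header_collector: Collector,
}

impl HeaderStage {
    /// Splits the configured size evenly between the two collectors, so the
    /// configured value applies to the stage as a whole rather than to each
    /// collector separately.
    pub fn new(etl_file_size: usize) -> Self {
        let per_collector = etl_file_size / 2;
        Self {
            hash_collector: Collector::new(per_collector),
            header_collector: Collector::new(per_collector),
        }
    }
}
```

Note that integer division rounds down, so an odd `etl_file_size` leaves one byte unused; that is harmless for a size setting of this magnitude.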

Collaborator

@shekhirin shekhirin left a comment

only documentation nits, LGTM otherwise!

book/run/config.md (outdated, resolved)

```toml
[stages.etl]
# The size of temporary file in bytes for ETL data collector.
```
Collaborator

because we do / 2 now, it's about the total size of ETL files that the node can allocate, and not just one

Collaborator

@joshieDo joshieDo Mar 13, 2024

> it's about the total size of ETL files

it's not though, it's the threshold size before creating a new ETL file, important so memory isn't blown
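
The threshold semantics described here can be illustrated with a small sketch: entries accumulate in memory, and once the buffered bytes reach the configured size, the buffer is sorted and moved into a new "file". This is an assumption-laden toy, not reth's collector: files are modeled as in-memory vectors instead of being written to disk, and the method names are made up for the example.

```rust
/// Toy collector: buffers key/value pairs and starts a new "file" once the
/// buffered bytes reach `file_size`. Files are in-memory vectors here; a real
/// collector would write them to a temporary directory instead.
pub struct Collector {
    file_size: usize,
    buffer: Vec<(Vec<u8>, Vec<u8>)>,
    buffered_bytes: usize,
    files: Vec<Vec<(Vec<u8>, Vec<u8>)>>,
}

impl Collector {
    pub fn new(file_size: usize) -> Self {
        Self { file_size, buffer: Vec::new(), buffered_bytes: 0, files: Vec::new() }
    }

    /// Adds one entry and flushes if the threshold has been reached.
    pub fn insert(&mut self, key: Vec<u8>, value: Vec<u8>) {
        self.buffered_bytes += key.len() + value.len();
        self.buffer.push((key, value));
        if self.buffered_bytes >= self.file_size {
            self.flush();
        }
    }

    /// Sorts the buffered entries and moves them into a new "file",
    /// which releases the memory held by the buffer.
    pub fn flush(&mut self) {
        if self.buffer.is_empty() {
            return;
        }
        self.buffer.sort();
        self.files.push(std::mem::take(&mut self.buffer));
        self.buffered_bytes = 0;
    }

    pub fn file_count(&self) -> usize {
        self.files.len()
    }

    pub fn buffered_bytes(&self) -> usize {
        self.buffered_bytes
    }
}
```

In this model the setting bounds how much data sits in memory between flushes, while the number of files, and therefore total disk usage, grows with the amount of data collected.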

book/cli/reth/stage/run.md (outdated, resolved)
@SozinM
Contributor Author

SozinM commented Mar 11, 2024

Okay, I tried to rebase and all hell broke loose :)
So I will try again tomorrow

@joshieDo
Collaborator

joshieDo commented Mar 12, 2024

high prio, so i'll be taking over the rebase and push it. thanks!

@joshieDo joshieDo added this pull request to the merge queue Mar 13, 2024
Merged via the queue into paradigmxyz:main with commit 5d6ac4c Mar 13, 2024
27 checks passed
@SozinM
Contributor Author

SozinM commented Mar 13, 2024

thank you @joshieDo !

@gakonst
Member

gakonst commented Mar 13, 2024

Thanks for taking this on @SozinM !!

Development

Successfully merging this pull request may close these issues.

Make ETL file size configurable
5 participants