subdir for temp files #959

Closed
koen84 opened this issue Aug 21, 2020 · 11 comments · Fixed by #965

@koen84 commented Aug 21, 2020

I'm running turbo-geth on a 1 TB drive; with 197 GiB in temp files it got filled to the brim and got stuck. It would be great if temp-file disk usage took the available space into account.

Regardless, it would be better if temp files got their own subdir (within the data folder), so it's easier to mount on a different disk, considering their large size.
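
For illustration, a minimal Go sketch of the layout being asked for here: a dedicated temp subdir inside the datadir, so that the subdir alone can be mounted on another disk. The `etl-temp` name and both helper functions are hypothetical, not turbo-geth's actual code.

```go
package etl

import (
	"os"
	"path/filepath"
)

// TempDir creates (if needed) and returns a dedicated subdirectory of the
// datadir for temp files, so it can be a separate mount point.
func TempDir(datadir string) (string, error) {
	dir := filepath.Join(datadir, "etl-temp") // hypothetical subdir name
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return "", err
	}
	return dir, nil
}

// NewTempFile creates a sort-buffer file inside that subdir instead of
// directly next to the database file.
func NewTempFile(datadir string) (*os.File, error) {
	dir, err := TempDir(datadir)
	if err != nil {
		return nil, err
	}
	return os.CreateTemp(dir, "tg-sync-sortable-buf-*")
}
```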

@mandrigin (Collaborator)

That's a bit weird; I'm running a node on a 1 TB drive too and have enough space left (about 200 GB).

But I agree with your point about temp files.

@koen84 (Author) commented Aug 22, 2020

My main server has 656 GB of data + 197 GB of temp files (+ OS / swap) = a full drive.
My secondary only has 13 GB of temp files.

Both managed to reach a full sync and maintain it, though for some reason the production server ran its disk full. (The other one is less powerful yet has more disk space.)

BTW, is there any reason the data is one giant file rather than the typical arsenal of smaller files? This makes it incompatible with COW filesystems (BTRFS, ZFS, etc.).

@AskAlexSharov (Collaborator) commented Aug 22, 2020

The reason for the gigantic DB file is transactions. But I believe we can split it into at least two files: the blockchain itself (blocks, ~120 GB) and everything else. That certainly won't happen in the near future; we need to be super careful with such a change.

@AskAlexSharov (Collaborator)

About BTRFS: it's an interesting but hard question. LMDB is a B+ tree and BTRFS is a B-tree, and a tree on top of a tree must be either redundant or weird. But I don't believe that BTRFS and ZFS can't handle 1 TB files. Or do you mean their cool features stop being cool?

@AskAlexSharov (Collaborator)

Temporary files are 128 MB each.

@mandrigin (Collaborator)

Temp files aren't persisted, though. They are created during some stages of sync (their size depends on how much data you are syncing) and then they should be removed.
So if you sync from genesis to the current HEAD block, you might end up with a couple hundred GB of temp files during some phases, when we generate indexes; that's mostly because we generate indexes for the whole chain at that point.
After you catch up and the sync goes from, say, block 10,001,000 to 10,002,000, we only need to generate indexes for 1,000 blocks, so obviously the temp files will be much smaller.

Temp files are used in a couple of stages and during DB migrations, not only for indexes, but the idea is the same.
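
For context, these buffer files come from a standard external-sort pattern: entries accumulate in memory, and once the buffer hits a size limit it is sorted and flushed to a temp-file chunk; the sorted chunks are merge-read later. A minimal sketch of that general pattern follows; the types, names, and threshold are illustrative assumptions, not turbo-geth's actual ETL code.

```go
package etl

import (
	"bufio"
	"os"
	"sort"
)

// FlushThreshold is an illustrative in-memory limit; once the accumulated
// entries exceed it, the caller flushes the buffer via FlushSortedChunk.
const FlushThreshold = 128 << 20 // ~128 MiB

// FlushSortedChunk sorts the in-memory buffer and writes it out as one
// temp-file chunk. The caller merge-reads all chunks afterwards and is
// responsible for deleting them.
func FlushSortedChunk(dir string, buf []string) (string, error) {
	sort.Strings(buf)
	f, err := os.CreateTemp(dir, "sortable-buf-*")
	if err != nil {
		return "", err
	}
	defer f.Close()
	w := bufio.NewWriter(f)
	for _, entry := range buf {
		if _, err := w.WriteString(entry + "\n"); err != nil {
			return "", err
		}
	}
	if err := w.Flush(); err != nil {
		return "", err
	}
	return f.Name(), nil // chunk file to merge (and delete) later
}
```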

@mandrigin (Collaborator)

But sure, it definitely makes sense to put the temp files in a subdir; I'll take a look at that.

@koen84 (Author) commented Aug 23, 2020

@AskAlexSharov 1 TB files on a COW filesystem like BTRFS would surely be brutal for performance, seeing they'd get many small writes? My main reason for wanting to explore this avenue is snapshotting, and send/receive thereof, so that in case of issues I can easily revert.

@mandrigin the amount of temp files is currently still growing. I had successfully completed a full sync before, so it has only been catching up since. (Until it got stuck on the full disk, which I've since managed to resume from.)

My chaindata is 657 GiB.
The tg-sync-sortable-buf<9 digits> files amount to 272 GiB, each 283-285 MiB in size. And while I see the log mention that some are being removed, the total is still growing; it seems only newly created files get cleaned up, while the old ones all remain. Did it leave behind files it forgot about? What's the effect of stopping turbo-geth, removing all these files, and starting it again?
I've filed this last part as #969, since it might be an issue unrelated to the subdir question.

@AlexeyAkhunov (Contributor)

Thanks for your report! The old temp files are not cleaned up automatically, so at the moment they need to be removed manually. If you stop turbo-geth and remove the files, it will not have any adverse effect.
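
A hedged sketch of what an automated version of that manual cleanup could look like: on startup, remove leftover buffer files by the name prefix reported above. The function is hypothetical and assumes the node is not running against the same directory.

```go
package etl

import (
	"os"
	"path/filepath"
	"strings"
)

// CleanupStaleBuffers removes sort-buffer files left over from a previous
// run. Per the comment above, deleting them while turbo-geth is stopped
// has no adverse effect.
func CleanupStaleBuffers(dir string) error {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return err
	}
	for _, e := range entries {
		if e.IsDir() || !strings.HasPrefix(e.Name(), "tg-sync-sortable-buf") {
			continue
		}
		if err := os.Remove(filepath.Join(dir, e.Name())); err != nil {
			return err
		}
	}
	return nil
}
```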

@AskAlexSharov (Collaborator)

I'm sure that 1 TB of ordinary files and 1 TB of mmap'ed files are not the same kind of terabyte, because only the OS reads and writes the latter, not the application, and I'm sure the OS integrates with BTRFS for this case. But yes, we definitely need to verify that snapshotting works well.

@AskAlexSharov (Collaborator)

What I really mean: for incremental snapshotting, BTRFS doesn't just need small files but knowledge of what changed, and the OS knows which pages of an mmap'ed file are new and when they were updated.
