
Report: GC times and improvements from recent optimizations. #3462

Open
kevina opened this issue Dec 3, 2016 · 9 comments
Labels
need/analysis Needs further analysis before proceeding

Comments

@kevina
Contributor

kevina commented Dec 3, 2016

Repos were created by doing the following:

1000x: random 4000 > afile && ipfs add afile
100x: random-files -q -dirs 4 -files 35 -depth 5 -random-fanout -random-size d && ipfs add -r d
100x: random-files -q -dirs 4 -files 35 -depth 5 -random-fanout -random-size d && ipfs add --pin=false -r d
10x: random-files -q -dirs 4 -files 100 -depth 7 -random-fanout -random-size d && ipfs add -r d
10x: random-files -q -dirs 4 -files 100 -depth 7 -random-fanout -random-size d && ipfs add --pin=false -r d
2x: random-files -q -dirs 4 -files 100 -depth 10 -random-fanout -random-size d && ipfs add -r d
2x: random-files -q -dirs 4 -files 100 -depth 10 -random-fanout -random-size d && ipfs add --pin=false -r d

Other notes:

  • The -seed option was used in random and random-files to get deterministic results.
  • Repos were created both with and without the --raw-leaves option.
  • Tests were run both before and after my optimizations landed.

Implementation notes:

  • Due to XXX, three IPFS repos were created: one with normal leaves, one with the incorrect Cid for raw leaves, and one with the correct Cid.
  • All three repos were on a 32 GiB xfs partition on Linux so that I could easily back up the partition to undo the effects of a GC run.

This creates a repo with around 1.6 million small blocks (under 4k), perfect for stress testing the pinner and filesystem. About 50% of the content is pinned.

The GC is done in basically two passes. The first pass collects the colored set, that is, the set of blocks to keep; the second pass scans the blockstore and removes any blocks not in that set. I measured each pass separately. For the first pass, the only thing that made a measurable difference was whether raw leaves were used, which led to a speedup of over an order of magnitude.

| What | Normal Leaves | Raw Leaves | Speedup |
| --- | --- | --- | --- |
| Get Colored Set (First Time) | 282s | 20.7s | 13.6x |
| Get Colored Set (In Cache) | 26.3s | 2.1s | 12.5x |
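
For reference, here is a minimal sketch of the two-pass structure described above. The `Blockstore` and `Pinner` interfaces are hypothetical stand-ins for the real go-ipfs types; only the shape of the mark (color) and sweep passes is meant to be accurate:

```go
package gcsketch

// Blockstore and Pinner are hypothetical stand-ins for the real go-ipfs
// interfaces; only the shape of the two passes is meant to be accurate.
type Blockstore interface {
	AllKeys() []string // enumerate every block key in the store
	Delete(key string)
}

type Pinner interface {
	RecursiveKeys() []string   // roots of recursively pinned DAGs
	Links(key string) []string // child blocks referenced by a block
}

// gc colors (marks) everything reachable from the pin roots, then sweeps
// the blockstore, deleting anything that was not colored.
func gc(bs Blockstore, p Pinner) {
	// Pass 1: build the colored set -- the blocks to KEEP -- by walking
	// the DAG down from every recursive pin root.
	colored := make(map[string]bool)
	stack := p.RecursiveKeys()
	for len(stack) > 0 {
		k := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if colored[k] {
			continue
		}
		colored[k] = true
		stack = append(stack, p.Links(k)...)
	}

	// Pass 2: scan the blockstore and delete every block that is not
	// in the colored set.
	for _, k := range bs.AllKeys() {
		if !colored[k] {
			bs.Delete(k)
		}
	}
}
```

The raw-leaves speedup in the first pass is consistent with this structure: a raw-leaf Cid tells the walker up front that the block has no links, so (as noted below) only a small fraction of blocks ever need to be read from the store to complete the traversal.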

For the second pass, when there is a significant number of blocks to delete, the time spent deleting the files outweighs all other factors. In addition, the time spent is highly dependent on the filesystem used. For the xfs partition I got different numbers depending on which repo was used; the time was anywhere from 20 to 75 minutes. I do not know what can be done about deletion time as long as we stick to one file per block. Increasing the prefix length in flatfs might help (there were on average 3,100 files per directory), but I have not investigated that.
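
To make the prefix-length idea concrete, here is an illustrative sketch of a prefix-style shard function (hypothetical, not the actual go-ds-flatfs code): flatfs names each shard directory after a slice of the block's key, so the prefix length directly controls how many directories the blocks fan out over, and hence the average number of files per directory.

```go
package main

import "fmt"

// shardPrefix is an illustrative prefix-style shard function: with
// prefixLen 5, a key like "CIQA2GKSM..." is stored under directory
// "CIQA2". A longer prefix means more shard directories and therefore
// fewer files in each one.
func shardPrefix(key string, prefixLen int) string {
	if len(key) < prefixLen {
		return key
	}
	return key[:prefixLen]
}

func main() {
	fmt.Println(shardPrefix("CIQA2GKSM5JXRAVNIQZ", 5)) // CIQA2
	fmt.Println(shardPrefix("CIQA2GKSM5JXRAVNIQZ", 6)) // CIQA2G
}
```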

After the initial GC was run and there was nothing left to delete, my optimizations had an effect on the second pass:

| What | Pre Opt | Post Opt | Speedup |
| --- | --- | --- | --- |
| Normal Leaves, First Time | 5.7s | 2.6s | 2.2x |
| Normal Leaves, Second Time | 5.8s | 2.5s | 2.3x |
| Raw Leaves, First Time | 217s | 4.4s | 49x |
| Raw Leaves, Second Time | 5.8s | 2.2s | 2.6x |

For the case with normal leaves, since everything is pinned, the first pass to get the colored set already loaded everything into the OS cache, so the second-time numbers are about the same as the first.

When raw leaves are used, the story is very different. Only a small fraction of the blocks needed to be read from the store for the first pass, so during the second pass there was still a lot of disk I/O.

I double-checked the first-time results with raw leaves; there really is that large a speedup after my optimizations. This is most likely because, after my optimizations, the inodes for the files did not need to be read from disk just to list the blocks in the flatfs datastore.

In all cases, when everything is in the OS cache the speedup from my optimizations is somewhere between 2x and 3x.

@kevina
Contributor Author

kevina commented Dec 3, 2016

Actually, I did discover a problem when using raw leaves. Here are some preliminary numbers:

Total delete time with normal leaves: 23 minutes
Total delete time with raw leaves: 65 minutes

The problem is that with raw leaves most of the blocks end up in the AFZBE directory in flatfs. Here is a count of some of the dirs with raw leaves:

AFZBE 1568285
CIQA2 100
CIQA3 78
CIQA4 68

and without raw leaves

CIQA2 3160
CIQA3 3173
CIQA4 3162
CIQA5 3160
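
Counts like these can be gathered by walking the flatfs blocks directory. A minimal sketch, assuming the default repo location (the real path depends on $IPFS_PATH):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	// Assumed default repo location; override via $IPFS_PATH as needed.
	blocks := filepath.Join(os.Getenv("HOME"), ".ipfs", "blocks")
	dirs, err := os.ReadDir(blocks)
	if err != nil {
		panic(err)
	}
	for _, d := range dirs {
		if !d.IsDir() {
			continue
		}
		entries, err := os.ReadDir(filepath.Join(blocks, d.Name()))
		if err != nil {
			continue
		}
		// Print each shard directory name and the number of files in it.
		fmt.Println(d.Name(), len(entries))
	}
}
```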

I will report more later.

@whyrusleeping
Member

Ah, interesting. We're going to have to increase the shard width for cidV1... good catch.

@kevina
Contributor Author

kevina commented Dec 3, 2016

> Ah, interesting. We're going to have to increase the shard width for cidV1... good catch.

@whyrusleeping see #3463 and let's take the discussion on how to handle it over there.

@kevina
Contributor Author

kevina commented Dec 3, 2016

After looking more closely at the deletion times and double-checking the results, I determined that in one case having most of the files in a single directory actually helped runtime:

Total delete time with normal leaves: 23 minutes
Total delete time with raw leaves (incorrect Cid): 65 minutes
Total delete time with raw leaves (correct Cid): 8 minutes

Weird. I guess it is highly dependent on how xfs chose to store the files in the large directory.

(Having so many files in a single directory also explains why ext4 was performing badly. I even managed to trigger the dir_index hash bug.)

@whyrusleeping
Member

@kevina It would be really nice to have a script that creates these files, runs the tests you were running, and outputs timings. This would be a wonderful thing to put into an automated performance regression test (cc @victorbjelkholm), where we could run a set of tests on a specific machine to get performance numbers for each commit (or maybe a nightly checkpoint of master).

That said, these numbers look great! I'm really happy to have all of this in :)

@kevina
Contributor Author

kevina commented Dec 5, 2016

@whyrusleeping here is the script to create the repo:
https://gist.github.com/kevina/63efcab752d1a15e2fde6651b76d2a80

For the numbers I just used a hacked-up version of IPFS.

@rht
Contributor

rht commented Jan 30, 2017

@kevina

Would be really great if you could wrap the script in a repo. Someone needs to do this, as I'm kind of tired of waiting for arbitrary, non-transparent approval from the members/those-with-commit-access, or of this having to be implemented by those people for it not to be "wrong" (apologies beforehand if this statement is not in accordance with code-of-conduct.md, but I think it is fair to say what I believe to be the case while remaining respectful, and to be objectively corrected if it is not the case). I have several local benchmark scripts that I could put in. There is also the network bitswap duplicate benchmark by @ion1 buried somewhere. Benchmarks are a means of structured progress (see e.g. the NIST hash function competition for hash functions, the Stanford Question Answering Dataset benchmark for AI reading comprehension, or http://www.spacex.com/hyperloop for ... hyperloop): anyone can objectively verify the results, and they reduce reliance on centralized reviews, since 1. the maintainers might sometimes be wrong, and 2. automation reduces their workload. Finally, regardless of how I stated all this, I hope there will be a comprehensive benchmark suite, for this reason (I did not think of this leveling use case when I stated it was redundant with the natural-datasets test case in #3621 (comment)).

@whyrusleeping
Member

@rht If you want to help out with automated performance benchmarks you should take part in our ongoing CI/CD sprint. It would be great to collect these various benchmarks together and have them be automatically run for each new PR.

@rht
Contributor

rht commented Jan 30, 2017

(I am in fact in the middle of curating all these scattered benchmarks into https://github.com/rht/sfpi-benchmark)
