This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

swarm/storage: Support for uploading 100gb files #1395

Merged
merged 23 commits into from
Jun 18, 2019

Conversation

jmozah
Collaborator

@jmozah jmozah commented May 14, 2019

Fixes #1356

Before: uploading a 100 GB file fails.
After: a 100 GB upload passes in a 2-node cluster.

supersedes #1357

  1. Go routine leaks
     ===============

Problem: The goroutines spawned in storeChunk() of hasherstore.go were unbounded, i.e. a new goroutine was spawned for every chunk, leading to millions of goroutines when uploading a large file.

Fix: The number of goroutines now has an upper limit, with a queue providing back pressure. At most 150 workers are spawned (128 for the data chunks in a branch, plus a few for the tree chunks). Each worker picks a chunk from the worker queue and pushes it to storage. When all chunks have been processed, the workers wait for the close signal from the Wait() routine and exit. The old contents of the Wait() function were moved to a new function called startWait() so that it can independently collect the chunk stats. We tried to use tags for the chunk stats, but unit testing became too challenging because stats are incremented all over the storage code.
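As a rough illustration of the bounded-worker scheme described above (a minimal sketch with hypothetical names like `storeChunks`; the real hasherstore.go workers push actual chunks to a ChunkStore and report results via startWait()):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Cap on concurrent storage workers, as in the PR: 128 data chunks
// per branch plus a few extra for tree chunks.
const noOfStorageWorkers = 150

// storeChunks is an illustrative stand-in for the fixed storeChunk() path:
// a fixed pool of workers drains a shared queue instead of spawning one
// goroutine per chunk.
func storeChunks(chunks []int) int64 {
	queue := make(chan int)
	var wg sync.WaitGroup
	var stored int64

	for i := 0; i < noOfStorageWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range queue { // each worker picks chunks until the queue closes
				atomic.AddInt64(&stored, 1) // stand-in for pushing the chunk to storage
			}
		}()
	}
	for _, c := range chunks {
		queue <- c // unbuffered send: back pressure when all workers are busy
	}
	close(queue) // the close signal: workers drain the queue and exit
	wg.Wait()
	return stored
}

func main() {
	fmt.Println(storeChunks(make([]int, 1000))) // prints 1000
}
```

The key property is that goroutine count stays constant regardless of file size, while the unbuffered channel naturally throttles the producer.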

  2. Chunk corruption when uploading files greater than 8.5 GB
     =========================================================

Problem: Whenever a file greater than approximately 8.5 GB was uploaded and then downloaded, the diff between the two would not match. This did not occur with smaller files.

Fix: The buildTree() function in pyramid.go is called every time the number of data chunks reaches 128 or the file ends. This function is responsible for creating the tree chunk that binds the created data chunks into the rest of the Merkle tree. The first loop in this function determines how far up the tree has wrapped. For example, if the current branch is the 127th, then by adding a new tree chunk you not only have to add the data chunks to this new tree chunk, but, since this is a chunk boundary, you also go one level up. In some cases, adding a final chunk closes many levels at the top.
Example: If levels 0, 1, 2 and 3 each have 127 branches, adding a final tree chunk at level 0 closes levels 0, 1, 2 and 3. This was not handled properly in the code. Now, whenever a tree chunk is created, we walk up (to at most 128 levels) to see whether this addition triggers a snowball effect on the levels above.
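The cascade can be sketched with a small hypothetical helper (not the pyramid.go code): a new tree chunk at level 0 wraps that level if it reaches the branch limit, which carries a chunk into the level above, and so on up the tree:

```go
package main

import "fmt"

// closedLevels counts how many levels wrap when one new tree chunk is
// added at level 0. perLevel holds the current tree-chunk count at each
// level; branches is 128 in Swarm. Illustrative only.
func closedLevels(perLevel []int, branches int) int {
	closed := 0
	carry := 1 // the newly added tree chunk
	for lvl := 0; lvl < len(perLevel) && carry == 1; lvl++ {
		if perLevel[lvl]+carry == branches {
			closed++ // level is full: it wraps into a chunk one level up
		} else {
			carry = 0 // no wrap here, so nothing propagates further
		}
	}
	return closed
}

func main() {
	// Levels 0..3 each hold 127 tree chunks: one more chunk closes all four.
	fmt.Println(closedLevels([]int{127, 127, 127, 127}, 128)) // prints 4
	// A single level with only 5 chunks: nothing closes.
	fmt.Println(closedLevels([]int{5}, 128)) // prints 0
}
```

The original code effectively only checked one level up; the bug surfaced once files were large enough for the cascade to span several levels.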

Review threads (outdated, resolved):
swarm/swarm.go
swarm/storage/localstore/localstore.go
swarm/storage/hasherstore.go (×4)
@acud
Member

acud commented May 14, 2019

nonsense
nonsense previously approved these changes May 15, 2019
Contributor

@nonsense nonsense left a comment


SGTM. Number of goroutines is indeed lower with this change.

It'd be nice to improve the documentation of this module.

Member

@acud acud left a comment


almost there, a few more minor things @jmozah <3

Review threads (outdated, resolved):
swarm/storage/hasherstore.go (×4)
acud
acud previously approved these changes Jun 3, 2019
Contributor

@nolash nolash left a comment


In #1356 you mention:

  1. Go routine leaks
  2. Chunk corruption when uploading file greater than 8.5gb in size
  3. DBCapacity lower than uploaded file size issue

It would be nice to have a better description of what exactly in the implementation causes the problems in 1. and 2. Being unfamiliar with this code, I don't know how to perform anything but a very superficial review. I've requested more detailed information from you twice already, but can't say I see much of an improvement. I'm not going to ask a third time. So I'll mark this review as "comment" instead of "changes requested", in case the lack of better descriptions means you are OK with a merely superficial review; in that case, don't mind me and go ahead and merge.

By the way, it seems issue 3. above is abandoned? I saw a question from @acud related to this in a previous version of this PR, but no conclusion:

#1357 (comment)

type hasherStore struct {
	store     ChunkStore
	tag       *chunk.Tag
	toEncrypt bool
	doWait    sync.Once
Contributor


Please add a comment.

Why do we need this now and not before?

Collaborator Author


This is to trigger the singleton startWait(), which keeps track of the results of all the storage goroutines. Please look at the latest push; I regressed when moving the code through several swarm branches.

Contributor


I regressed when moving the code through several swarm branches.

?

Collaborator Author


I PR'd my initial work to the previous repo, but when I moved to the new repo I regressed. :-)


"github.com/ethersphere/swarm/chunk"
"github.com/ethersphere/swarm/storage/encryption"
"golang.org/x/crypto/sha3"
)

const (
noOfStorageWorkers = 150
Contributor


Why this number?

Collaborator Author

@jmozah jmozah Jun 11, 2019


Data chunks are batched in groups of 128 to form a tree chunk. For the storage to stay in sync with the chunker's speed, it's better to push all 128 data chunks using separate goroutines of their own. I added a few more workers to take care of the tree chunks. This allows both the chunker and the storage to move in steps of 128.

@@ -204,7 +204,7 @@ func (pc *PyramidChunker) decrementWorkerCount() {

 func (pc *PyramidChunker) Split(ctx context.Context) (k Address, wait func(context.Context) error, err error) {
 	pc.wg.Add(1)
-	pc.prepareChunks(ctx, false)
+	go pc.prepareChunks(ctx, false)
Contributor


Is the introduction of concurrency here relevant to solving the bug, or is it unrelated?

Collaborator Author


This is primarily for improving concurrency.

Contributor

@nolash nolash Jun 13, 2019


That much is obvious :)

prepareChunks is not documented, so it's difficult to say what its total scope is. However, further down in this function you block until whatever is started there finishes. Why and how, then, does introducing this new goroutine improve performance? Is it merely in order to start the next goroutine and enter the select earlier? Doesn't that really mean that prepareChunks is in reality a synchronous function?
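A minimal sketch of the concern being raised here: launching work in a goroutine and then immediately blocking until it completes is observationally a synchronous call (illustrative only, not the chunker code):

```go
package main

import "fmt"

// runAsyncThenWait spawns work in a goroutine but blocks until it
// finishes, so the caller gains no concurrency from the goroutine.
func runAsyncThenWait(work func()) {
	done := make(chan struct{})
	go func() {
		work()
		close(done)
	}()
	<-done // block here: effectively a synchronous call
}

func main() {
	n := 0
	runAsyncThenWait(func() { n++ }) // guaranteed complete before we continue
	fmt.Println(n) // prints 1
}
```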

Collaborator Author


True. I can't remember why I added this :-)

@@ -539,6 +539,15 @@ func (pc *PyramidChunker) buildTree(isAppend bool, ent *TreeEntry, chunkWG *sync
 	if lvlCount >= pc.branches {
 		endLvl = lvl + 1
 		compress = true

+		// Move up the chunk level to see if there is any boundary wrapping
+		for uprLvl := endLvl; uprLvl < pc.branches; uprLvl++ {
Contributor


I still have no idea how this works.

What is endLvl? Is a level a level in the tree? Why does it then start at 128 (chunkSize / hashSize)? How does pc.chunkLevel relate to this?

I am not sure how it is possible for someone not familiar with this code to verify that these changes are sane and correct without a better description of the actual detailed issue and how it is solved. Sorry.

Collaborator Author


Level always refers to tree chunk levels. You can read the big comment at the top of pyramid.go to understand it better. I am sorry for the way the chunker is written; I will someday make it more readable for you :-)

Collaborator Author


When the number of data chunks produced reaches 128, buildTree() is called. This creates a new tree chunk and packs its data with 128 pointers and other metadata. Finally it checks whether the newly created tree chunk is the 128th chunk in that level; if it is, it goes one level up, creates a new tree chunk and binds all the tree chunks one level below.

This check has to be done from the present level of the tree chunk up to the highest affected level (endLvl). So the loop you see from level to endLvl does that.

Contributor

@nolash nolash Jun 13, 2019


I understand. Partly.

if lvlCount >= pc.branches {

Here lvlCount is the number of input data chunks that have not yet been "built" into a tree? You write:

When the number of data chunks produced reaches 128

Why is this then >= and not ==?

for uprLvl := endLvl; uprLvl < pc.branches; uprLvl++ { 

What is the effect of using pc.branches as the boundary for this loop?

@nolash
Contributor

nolash commented Jun 7, 2019

By the way, I don't know if it's related, but tests are failing: https://travis-ci.org/ethersphere/swarm/jobs/540798086

@acud
Member

acud commented Jun 7, 2019

By the way, I don't know if it's related, but tests are failing: https://travis-ci.org/ethersphere/swarm/jobs/540798086

It is related to tags. I'll follow up with zahoor on this.

@acud acud assigned jmozah and acud Jun 10, 2019
@jmozah
Collaborator Author

jmozah commented Jun 11, 2019

@nolash Sorry for not replying to you earlier; I didn't notice your requests for documentation. Here it comes. If you feel this will help maintain the code, I can add it to the code too. Personally, I would not bloat the code with so much text.

  1. Go routine leaks

Problem: The goroutines spawned in storeChunk() of hasherstore.go were unbounded, i.e. a new goroutine was spawned for every chunk, leading to millions of goroutines when uploading a large file.

Fix: The number of goroutines now has an upper limit, with a queue providing back pressure. At most 150 workers are spawned (128 for the data chunks in a branch, plus a few for the tree chunks). Each worker picks a chunk from the worker queue and pushes it to storage. When all chunks have been processed, the workers wait for the close signal from the Wait() routine and exit. The old contents of the Wait() function were moved to a new function called startWait() so that it can independently collect the chunk stats. We tried to use tags for the chunk stats, but unit testing became too challenging because stats are incremented all over the storage code.

  2. Chunk corruption when uploading files greater than 8.5 GB

Problem: Whenever a file greater than approximately 8.5 GB was uploaded and then downloaded, the diff between the two would not match. This did not occur with smaller files.

Fix: The buildTree() function in pyramid.go is called every time the number of data chunks reaches 128 or the file ends. This function is responsible for creating the tree chunk that binds the created data chunks into the rest of the Merkle tree. The first loop in this function determines how far up the tree has wrapped. For example, if the current branch is the 127th, then by adding a new tree chunk you not only have to add the data chunks to this new tree chunk, but, since this is a chunk boundary, you also go one level up. In some cases, adding a final chunk closes many levels at the top.
Example: If levels 0, 1, 2 and 3 each have 127 branches, adding a final tree chunk at level 0 closes levels 0, 1, 2 and 3. This was not handled properly in the code. Now, whenever a tree chunk is created, we walk up (to at most 128 levels) to see whether this addition triggers a snowball effect on the levels above.

@acud acud dismissed stale reviews from nonsense and themself via 4c2cfe7 June 12, 2019 19:16
@nolash
Contributor

nolash commented Jun 13, 2019

@jmozah thanks

If level 0,1,2,3 has 127 branches

You mean here if the data length of the levels modulo 128 is 127?

In some cases, when you add a final chunk, you close many levels on the top.

In fact this only happens when you have two or more levels of intermediate chunks, right? Especially with complex concepts like these, we should take care to be explicit about such things when we describe them.

@nonsense nonsense added this to the 0.4.2 milestone Jun 14, 2019
@acud acud requested a review from zelig June 17, 2019 09:11
@@ -238,7 +238,7 @@ func TestRandomData(t *testing.T) {
 // This test can validate files up to a relatively short length, as tree chunker slows down drastically.
 // Validation of longer files is done by TestLocalStoreAndRetrieve in swarm package.
 //sizes := []int{1, 60, 83, 179, 253, 1024, 4095, 4096, 4097, 8191, 8192, 8193, 12287, 12288, 12289, 524288, 524288 + 1, 524288 + 4097, 7 * 524288, 7*524288 + 1, 7*524288 + 4097}
-sizes := []int{1, 60, 83, 179, 253, 1024, 4095, 4097, 8191, 8192, 12288, 12289, 524288}
+sizes := []int{1, 60, 83, 179, 253, 1024, 4095, 4097, 8191, 8192, 12288, 12289, 524288, 2345678}
Contributor


This is also passing fine on master, not sure why we are adding another test case?

Collaborator Author


It does not affect the old code. This was added here to catch bugs in this PR. We have 150 storage goroutines now, and a size of 2345678 bytes (roughly 573 chunks of 4 KB) will exhaust all 150 of them; if any one of them is not released properly, the test case will fail.

Contributor


OK, fair enough. It doesn't make the test suite much slower on Travis, so it's fine.

@jmozah please add it to the commented line above as well. We don't run all these tests because Travis is slow, but it is expected that if you touch this code, you run them manually locally.

This code is not changed often, so we decided this is an OK compromise to make.

@jmozah
Collaborator Author

jmozah commented Jun 17, 2019

@jmozah thanks

If level 0,1,2,3 has 127 branches

You mean here if the data length of the levels modulo 128 is 127?

I meant the number of tree chunks in the levels.

In some cases, when you add a final chunk, you close many levels on the top.

In fact this only happens when you have two or more levels of intermediate chunks, right? Especially with complex concepts like these, we should take care to be explicit about such things when we describe them.

Not really. In this particular bug, it went three levels up.
In the example below:

[Screenshot 2019-06-17: diagram of tree chunk levels]

The image only shows tree chunks; data chunks sit one level below the lowest tree level.

Let's assume the green part of the tree already exists. When a tree chunk is added at level 0 (the red chunk), it triggers a closure on level 1, level 2 and level 3 (the orange chunks). Level 4 then gets one more chunk, so a new level 4 is formed and the tree gets reorganised.

The code for these orange chunks was already present, but the loop used to go up only one level. With this fix, it goes up multiple levels and takes care of it.

I hope this description is explicit enough and helps you understand the fix.

@nonsense
Contributor

@jmozah thanks for the description of the bugs this is fixing and for the PR in general. It is much clearer to me now than before.

It would have been nice to have a test for bug 2, but I realise it might not be trivial.

@nolash
Contributor

nolash commented Jun 17, 2019

@jmozah ah yes three layers 4096*(128^3) = ~8GB of course sorry
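The back-of-the-envelope capacity behind this threshold can be written out (chunk size 4096 bytes, 128 branches, three intermediate levels, per the figures quoted in this thread):

```go
package main

import "fmt"

// capacityBytes returns the maximum file size addressable by a pyramid
// with the given number of intermediate levels: chunkSize * branches^levels.
func capacityBytes(chunkSize, branches, levels int64) int64 {
	c := chunkSize
	for i := int64(0); i < levels; i++ {
		c *= branches
	}
	return c
}

func main() {
	// 4096 * 128^3 = 8589934592 bytes, i.e. 8 GiB (~8.6 GB), which is why
	// the corruption showed up around the ~8.5 GB mark.
	fmt.Println(capacityBytes(4096, 128, 3)) // prints 8589934592
}
```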

@jmozah
Collaborator Author

jmozah commented Jun 17, 2019

It would have been nice if there is a test for bug 2, but I realise it might not be trivial.

Yes, this is hard to test in a unit test, but it is a good candidate for a smoke test / functional test if we have them.

@jmozah
Collaborator Author

jmozah commented Jun 17, 2019

@jmozah ah yes three layers 4096*(128^3) = ~8GB of course sorry

Yeah, the error was happening somewhere close to that number.

@acud acud merged commit f57d4f0 into ethersphere:master Jun 18, 2019