swarm/storage: Support for uploading 100gb files #1395
Conversation
@jmozah also please note that the Travis build is failing: https://travis-ci.org/ethersphere/go-ethereum/builds/532362762?utm_source=github_status&utm_medium=notification
SGTM. Number of goroutines is indeed lower with this change.
It'd be nice to improve the documentation of this module.
almost there, few more minor things @jmozah <3
In #1356 you mention:
- Go routine leaks
- Chunk corruption when uploading file greater than 8.5gb in size
- DBCapacity lower than uploaded file size issue
It would be nice to have a better description of what exactly in the implementation causes the problems in 1. and 2. Being unfamiliar with this code, I don't know how to perform anything but a very superficial review. I've requested more detailed information from you twice already, but can't say I see much of an improvement. I'm not going to ask a third time. So I'll mark this review as "comment" instead of "changes requested," in case the lack of better descriptions means you are ok with a merely superficial review, in which case don't mind me and go ahead and merge.
By the way, it seems issue 3. above is abandoned? I saw a question from @acud related to this in a previous version of this PR, but no conclusion:
type hasherStore struct {
	store     ChunkStore
	tag       *chunk.Tag
	toEncrypt bool
	doWait    sync.Once
Please comment.
Why do we need this now and not before?
This is to trigger the singleton startWait(), which keeps track of results of all the storage go-routines. Please look at the latest push. I regressed when moving the code through several swarm branches.
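A minimal sketch of that pattern, using hypothetical names (store, Put, results) rather than the actual hasherstore.go types: sync.Once guarantees the collector goroutine is started exactly once, no matter how many goroutines store chunks concurrently.

```go
package main

import (
	"fmt"
	"sync"
)

// Illustrative store: the first call to Put lazily starts a single
// result-collecting goroutine, however many goroutines call Put.
type store struct {
	doWait  sync.Once
	results chan error // hypothetical channel carrying per-chunk results
}

func (s *store) Put(chunk []byte) {
	s.doWait.Do(func() { go s.startWait() }) // startWait runs exactly once
	s.results <- nil                         // pretend the chunk was stored successfully
}

// startWait collects the results of all storage goroutines.
func (s *store) startWait() {
	for err := range s.results {
		if err != nil {
			fmt.Println("store error:", err)
		}
	}
}

func main() {
	s := &store{results: make(chan error, 16)}
	s.Put([]byte("hello"))
	close(s.results)
}
```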
> I regressed when moving the code through several swarm branches.

?
I PR'd my initial work to the previous repo. But when I moved to the new repo, I regressed. :-)
storage/hasherstore.go
Outdated
	"github.com/ethersphere/swarm/chunk"
	"github.com/ethersphere/swarm/storage/encryption"
	"golang.org/x/crypto/sha3"
)

const (
	noOfStorageWorkers = 150
Why this number?
Data chunks are batched as 128 to form a tree chunk. If we want this storage to be in sync with the chunker speed, it's better to have all 128 data chunks pushed using separate go-routines of their own. I added a few more to take care of the tree chunks. This will allow both the chunker and the storage to move in steps of 128.
@@ -204,7 +204,7 @@ func (pc *PyramidChunker) decrementWorkerCount() {

func (pc *PyramidChunker) Split(ctx context.Context) (k Address, wait func(context.Context) error, err error) {
	pc.wg.Add(1)
-	pc.prepareChunks(ctx, false)
+	go pc.prepareChunks(ctx, false)
Is the introduction of concurrency here relevant to solving the bug, or is it unrelated?
This is primarily for improving concurrency.
That much is obvious :)
`prepareChunks` is not documented, so it's difficult to say what its total scope is. However, further down in this function you are blocking until whatever is started there finishes. Why and how, then, does introducing this new thread improve the performance? Is it merely in order to execute the next goroutine and enter the select earlier? Doesn't that really mean that `prepareChunks` in reality is a synchronous function?
True. I can't remember why I added this :-)
@@ -539,6 +539,15 @@ func (pc *PyramidChunker) buildTree(isAppend bool, ent *TreeEntry, chunkWG *sync
	if lvlCount >= pc.branches {
		endLvl = lvl + 1
		compress = true

		// Move up the chunk level to see if there is any boundary wrapping
		for uprLvl := endLvl; uprLvl < pc.branches; uprLvl++ {
I still have no idea how this works.
What is `endLvl`? Is a level a level in the tree? Why does it then start at 128 (`chunkSize / hashSize`)? How does `pc.chunkLevel` relate to this?
I am not sure how it is possible for someone not familiar with this code to verify that these changes are sane and correct without a better description of the actual detailed issue and how it is solved. Sorry.
Level always refers to tree chunk levels. You can read the huge comment at the top of pyramid.go to understand it more. I am sorry for the way the chunker is written. I will someday make it more readable for you :-)
When the number of data chunks produced reaches 128, buildTree() is called. This creates a new tree chunk and packs its data with 128 pointers and other metadata. Finally, it checks whether the newly created tree chunk is the 128th chunk in that level; if it is, it goes one level up, creates a new tree chunk and binds all the tree chunks one level below.
This check has to be done from the present level of the tree chunk up to the highest affected level (endLvl). So the loop you see from level to endLvl does that.
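A much-simplified sketch of this roll-up, with illustrative names only (not the actual pyramid.go code): chunks are counted per level, and whenever a level reaches 128 it collapses into one parent chunk, possibly cascading through several levels.

```go
package main

import "fmt"

const branches = 128 // pointers per tree chunk

// levelCount[i] is the number of chunks accumulated at tree level i.
// addChunk records one new chunk at the given level and, whenever a
// level reaches 128 chunks, "wraps" it into a single parent chunk one
// level up -- possibly cascading through several levels.
func addChunk(levelCount []int, lvl int) {
	levelCount[lvl]++
	for ; lvl < len(levelCount)-1; lvl++ {
		if levelCount[lvl] < branches {
			return // no boundary reached at this level
		}
		// 128 chunks at this level collapse into one parent chunk.
		levelCount[lvl] = 0
		levelCount[lvl+1]++
	}
}

func main() {
	counts := make([]int, 5)
	// Simulate levels 0..2 being one chunk short of a boundary,
	// then adding the final chunk that closes all of them at once.
	counts[0], counts[1], counts[2] = branches-1, branches-1, branches-1
	addChunk(counts, 0)
	fmt.Println(counts) // [0 0 0 1 0]
}
```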
I understand. Partly.

`if lvlCount >= pc.branches {`

Here `lvlCount` is the amount of input data chunks that have not yet been "built" as a tree? You write:

> When the number of data chunks produced reaches 128

Why is this then `>=` and not `==`?

`for uprLvl := endLvl; uprLvl < pc.branches; uprLvl++ {`

What is the effect of using `pc.branches` as the boundary for this loop?
By the way, I don't know if it's related, but tests are failing: https://travis-ci.org/ethersphere/swarm/jobs/540798086
It is related to
@nolash Sorry for not replying to you earlier. I didn't notice your requests for documentation. Here it comes. If you feel this will help maintain the code, I can add it in the code too. Personally, I would not bloat the code with so much text.
Problem: The go-routines spawned in storeChunk() of hasherstore.go were not controlled, i.e. for every chunk a new go-routine was spawned, leading to millions of go-routines when you upload a large file. Fix: This was fixed by putting a max limit on the number of go-routines and using a queue to provide back pressure. So a max of 150 routines will be spawned (128 for the data chunks in a branch + a few for the tree chunks). Every routine picks a chunk from the worker queue and pushes it to the storage. When all the chunks are done, they wait for the close signal from the Wait() routine and die. The old contents of the Wait() function were moved to a new function called startWait() so that it can independently collect the chunk stats. We tried to use tags for the chunk stats, but unit testing became too challenging as stats are incremented all over the storage code.
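A minimal sketch of that bounded-worker pattern, with illustrative names and a toy store function (not the actual hasherstore.go code): a fixed pool of goroutines drains a shared chunk queue, and closing the queue is the signal for the workers to exit.

```go
package main

import (
	"fmt"
	"sync"
)

const noOfStorageWorkers = 150 // 128 data chunks per branch + headroom for tree chunks

// storeAll spawns a bounded number of workers that drain the chunk
// queue, instead of one goroutine per chunk. Closing the queue is the
// signal for the workers to exit.
func storeAll(chunks <-chan []byte, store func([]byte)) {
	var wg sync.WaitGroup
	for i := 0; i < noOfStorageWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for c := range chunks {
				store(c)
			}
		}()
	}
	wg.Wait()
}

func main() {
	queue := make(chan []byte, noOfStorageWorkers)
	go func() {
		for i := 0; i < 1000; i++ {
			queue <- []byte{byte(i)} // enqueue chunks as the chunker produces them
		}
		close(queue) // upload finished: let the workers drain and exit
	}()

	var mu sync.Mutex
	stored := 0
	storeAll(queue, func([]byte) { mu.Lock(); stored++; mu.Unlock() })
	fmt.Println("stored", stored, "chunks with at most", noOfStorageWorkers, "goroutines")
}
```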
Problem: Whenever a file greater than approx. 8.5 GB was uploaded and then downloaded, the diffs would not match. This did not occur with smaller files. Fix: The buildTree() function in pyramid.go is called every time the data chunks reach 128 or the file is over. This function is responsible for creating the tree chunk which binds the created data chunks into the rest of the Merkle tree. The first loop in this function determines how far the tree has wrapped. For example: if the current branch is the 127th, then by adding a new tree chunk you not only have to add the data chunks to this new tree chunk, but also, since this is a chunk boundary, you go one level up. In some cases, when you add a final chunk, you close many levels at the top.
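As a back-of-the-envelope check (an assumption, not stated in the PR): with 4096-byte chunks and 128 branches, three completely filled tree levels cover 128³ data chunks, which is 8 GiB ≈ 8.59 GB, close to the ~8.5 GB threshold where uploads started to fail.

```go
package main

import "fmt"

func main() {
	const branches, chunkSize = 128, 4096
	// Bytes covered by three completely filled levels of tree chunks.
	bytes := int64(branches) * branches * branches * chunkSize
	fmt.Printf("%d bytes = %.2f GB\n", bytes, float64(bytes)/1e9) // 8589934592 bytes = 8.59 GB
}
```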
@jmozah thanks
You mean here if the data length of the levels modulo 128 is 127?
In fact this only happens when you have two levels of intermediate chunks or more, right? Especially in complex concepts like these we should take care to be explicit about such things when we describe them.
@@ -238,7 +238,7 @@ func TestRandomData(t *testing.T) {
	// This test can validate files up to a relatively short length, as tree chunker slows down drastically.
	// Validation of longer files is done by TestLocalStoreAndRetrieve in swarm package.
	//sizes := []int{1, 60, 83, 179, 253, 1024, 4095, 4096, 4097, 8191, 8192, 8193, 12287, 12288, 12289, 524288, 524288 + 1, 524288 + 4097, 7 * 524288, 7*524288 + 1, 7*524288 + 4097}
-	sizes := []int{1, 60, 83, 179, 253, 1024, 4095, 4097, 8191, 8192, 12288, 12289, 524288}
+	sizes := []int{1, 60, 83, 179, 253, 1024, 4095, 4097, 8191, 8192, 12288, 12289, 524288, 2345678}
This is also passing fine on `master`, not sure why we are adding another test case?
It does not affect the old code. This was added here to catch bugs in this PR. We have 150 storage threads now, and this size, 2345678, will exhaust the 150 storage go-routines; if any one of them is not released properly, the test case will fail.
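Rough arithmetic behind that claim (an estimate assuming 4096-byte data chunks, not taken from the PR): 2345678 bytes is about 573 data chunks, well above the 150 storage workers, so every worker has to be reused and released correctly.

```go
package main

import "fmt"

func main() {
	const chunkSize = 4096 // assumed data chunk size
	size := 2345678
	chunks := (size + chunkSize - 1) / chunkSize
	// 573 data chunks for a 2345678-byte file, versus 150 storage workers,
	// so each worker must be reused (and released) several times.
	fmt.Printf("%d-byte file -> %d data chunks (150 workers)\n", size, chunks)
}
```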
OK, fair enough, it doesn't make the test suite much slower on Travis, so it's fine.
@jmozah please add it in the commented line above as well - we don't run all these tests, because Travis is slow, but it is expected that if you touch this code, you run them manually locally.
This code is not changed often, so we decided this is an OK compromise to make.
I meant the number of tree chunks in the levels...
Not really. In this particular bug, it went three levels up. The image only shows tree chunks; data chunks are one level below the lowest tree level. Let's assume the green part of the tree already exists. When a tree chunk is added at level 0 (the red chunk), it triggers a closure on levels 1, 2 & 3 (the orange chunks). Level 4 will get one more chunk, so a new level called level 4 can be formed and the tree gets organised. The code for these orange chunks was already present, but the loop used to go up only one level. With this fix, it goes up multiple levels and takes care of it. Hope this description is explicit enough and helps you understand the fix.
@jmozah thanks for the description of the bugs this is fixing and for the PR in general. It is much clearer to me now than before. It would have been nice if there were a test for bug 2, but I realise it might not be trivial.
@jmozah ah yes three layers
Yes. This is hard to test in a unit test. But it is a good candidate to add to a smoke test / functional test if we have them.
Yeah, the error was happening somewhere close to that number.
Fixes #1356
Before: Upload 100gb file fails
After: 100gb upload passes in a 2 node cluster.
supersedes #1357
===============
Problem: The go-routines spawned in storeChunk() of hasherstore.go were not controlled, i.e. for every chunk a new go-routine was spawned, leading to millions of go-routines when you upload a large file.
Fix: This was fixed by putting a max limit on the number of go-routines and using a queue to provide back pressure. So a max of 150 routines will be spawned (128 for the data chunks in a branch + a few for the tree chunks). Every routine picks a chunk from the worker queue and pushes it to the storage. When all the chunks are done, they wait for the close signal from the Wait() routine and die. The old contents of the Wait() function were moved to a new function called startWait() so that it can independently collect the chunk stats. We tried to use tags for the chunk stats, but unit testing became too challenging as stats are incremented all over the storage code.
===============
Problem: Whenever a file greater than approx. 8.5 GB was uploaded and then downloaded, the diffs would not match. This did not occur with smaller files.
Fix: The buildTree() function in pyramid.go is called every time the data chunks reach 128 or the file is over. This function is responsible for creating the tree chunk which binds the created data chunks into the rest of the Merkle tree. The first loop in this function determines how far the tree has wrapped. For example: if the current branch is the 127th, then by adding a new tree chunk you not only have to add the data chunks to this new tree chunk, but also, since this is a chunk boundary, you go one level up. In some cases, when you add a final chunk, you close many levels at the top.
Ex: If levels 0, 1, 2 and 3 each have 127 branches, then by adding a final tree chunk at level 0, levels 0, 1, 2 and 3 all become closed. This was not properly handled in the code. Now, whenever a tree chunk is created, we go back up to the 128th level to see if this addition is going to trigger a snowball effect on the rest of the top levels.