This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

swarm/storage: Support for uploading 100gb files #1395

Merged
merged 23 commits into from
Jun 18, 2019

Conversation

jmozah
Collaborator

@jmozah jmozah commented May 14, 2019

Fixes #1356

Before: uploading a 100 GB file fails.
After: a 100 GB upload passes in a 2-node cluster.

supersedes #1357

  1. Go routine leaks
     ===============

Problem: The goroutines spawned in storeChunk() of hasherstore.go were unbounded, i.e. a new goroutine was spawned for every chunk, leading to millions of goroutines when uploading a large file.

Fix: The number of goroutines now has an upper limit, with a queue providing back pressure. At most 150 workers are spawned (128 for the data chunks in a branch, plus a few for the tree chunks). Each worker picks a chunk from the worker queue and pushes it to storage. When all chunks have been processed, the workers wait for the close signal from the Wait() routine and exit. The old contents of the Wait() function were moved to a new function called startWait() so that it can independently collect the chunk stats. We tried to use tags for the chunk stats, but unit testing became too challenging because stats are incremented all over the storage code.
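As a rough illustration of the bounded-worker scheme described above (a minimal sketch with hypothetical names like `storeChunks`; the real hasherstore.go workers push actual chunks to a ChunkStore and report results via startWait()):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Cap on concurrent storage workers, as in the PR: 128 data chunks
// per branch plus a few extra for tree chunks.
const noOfStorageWorkers = 150

// storeChunks is an illustrative stand-in for the fixed storeChunk() path:
// a fixed pool of workers drains a shared queue instead of spawning one
// goroutine per chunk.
func storeChunks(chunks []int) int64 {
	queue := make(chan int)
	var wg sync.WaitGroup
	var stored int64

	for i := 0; i < noOfStorageWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range queue { // each worker picks chunks until the queue closes
				atomic.AddInt64(&stored, 1) // stand-in for pushing the chunk to storage
			}
		}()
	}
	for _, c := range chunks {
		queue <- c // unbuffered send: back pressure when all workers are busy
	}
	close(queue) // the close signal: workers drain the queue and exit
	wg.Wait()
	return stored
}

func main() {
	fmt.Println(storeChunks(make([]int, 1000))) // prints 1000
}
```

The key property is that goroutine count stays constant regardless of file size, while the unbuffered channel naturally throttles the producer.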

  2. Chunk corruption when uploading files greater than 8.5 GB
     =========================================================

Problem: Whenever a file greater than approximately 8.5 GB was uploaded and then downloaded, the diff between the two would not match. This did not occur with smaller files.

Fix: The buildTree() function in pyramid.go is called every time the number of data chunks reaches 128 or the file ends. This function is responsible for creating the tree chunk that binds the created data chunks into the rest of the Merkle tree. The first loop in this function determines how far up the tree has wrapped. For example, if the current branch is the 127th, then by adding a new tree chunk you not only have to add the data chunks to this new tree chunk, but, since this is a chunk boundary, you also go one level up. In some cases, adding a final chunk closes many levels at the top.
Example: If levels 0, 1, 2 and 3 each have 127 branches, adding a final tree chunk at level 0 closes levels 0, 1, 2 and 3. This was not handled properly in the code. Now, whenever a tree chunk is created, we walk up (to at most 128 levels) to see whether this addition triggers a snowball effect on the levels above.
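The cascade can be sketched with a small hypothetical helper (not the pyramid.go code): a new tree chunk at level 0 wraps that level if it reaches the branch limit, which carries a chunk into the level above, and so on up the tree:

```go
package main

import "fmt"

// closedLevels counts how many levels wrap when one new tree chunk is
// added at level 0. perLevel holds the current tree-chunk count at each
// level; branches is 128 in Swarm. Illustrative only.
func closedLevels(perLevel []int, branches int) int {
	closed := 0
	carry := 1 // the newly added tree chunk
	for lvl := 0; lvl < len(perLevel) && carry == 1; lvl++ {
		if perLevel[lvl]+carry == branches {
			closed++ // level is full: it wraps into a chunk one level up
		} else {
			carry = 0 // no wrap here, so nothing propagates further
		}
	}
	return closed
}

func main() {
	// Levels 0..3 each hold 127 tree chunks: one more chunk closes all four.
	fmt.Println(closedLevels([]int{127, 127, 127, 127}, 128)) // prints 4
	// A single level with only 5 chunks: nothing closes.
	fmt.Println(closedLevels([]int{5}, 128)) // prints 0
}
```

The original code effectively only checked one level up; the bug surfaced once files were large enough for the cascade to span several levels.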

Review threads (outdated, resolved):
swarm/swarm.go
swarm/storage/localstore/localstore.go
swarm/storage/hasherstore.go (×4)
@acud
Member

acud commented May 14, 2019

nonsense
nonsense previously approved these changes May 15, 2019
Contributor

@nonsense nonsense left a comment


SGTM. Number of goroutines is indeed lower with this change.

It'd be nice to improve the documentation of this module.

Member

@acud acud left a comment


almost there, a few more minor things @jmozah <3

Review threads (outdated, resolved):
swarm/storage/hasherstore.go (×4)
acud
acud previously approved these changes Jun 3, 2019
Contributor

@nolash nolash left a comment


In #1356 you mention:

  1. Go routine leaks
  2. Chunk corruption when uploading file greater than 8.5gb in size
  3. DBCapacity lower than uploaded file size issue

It would be nice to have a better description of what exactly in the implementation causes the problems in 1. and 2. Being unfamiliar with this code, I don't know how to perform anything but a very superficial review. I've requested more detailed information from you twice already, but can't say I see much of an improvement. I'm not going to ask a third time. So I'll mark this review as "comment" instead of "changes requested", in case the lack of better descriptions means you are OK with a merely superficial review; in that case, don't mind me and go ahead and merge.

By the way, it seems issue 3. above is abandoned? I saw a question from @acud related to this in a previous version of this PR, but no conclusion:

#1357 (comment)

type hasherStore struct {
	store     ChunkStore
	tag       *chunk.Tag
	toEncrypt bool
	doWait    sync.Once
Contributor


Please add a comment.

Why do we need this now and not before?

Collaborator Author


This is to trigger the singleton startWait(), which keeps track of the results of all the storage goroutines. Please look at the latest push; I regressed when moving the code through several swarm branches.

Contributor


I regressed when moving the code through several swarm branches.

?

Collaborator Author


I PR'd my initial work to the previous repo, but when I moved to the new repo I regressed. :-)


"github.com/ethersphere/swarm/chunk"
"github.com/ethersphere/swarm/storage/encryption"
"golang.org/x/crypto/sha3"
)

const (
noOfStorageWorkers = 150
Contributor


Why this number?

Collaborator Author

@jmozah jmozah Jun 11, 2019


Data chunks are batched in groups of 128 to form a tree chunk. For the storage to stay in sync with the chunker's speed, it's better to push all 128 data chunks using separate goroutines of their own. I added a few more workers to take care of the tree chunks. This allows both the chunker and the storage to move in steps of 128.

@@ -204,7 +204,7 @@ func (pc *PyramidChunker) decrementWorkerCount() {

 func (pc *PyramidChunker) Split(ctx context.Context) (k Address, wait func(context.Context) error, err error) {
 	pc.wg.Add(1)
-	pc.prepareChunks(ctx, false)
+	go pc.prepareChunks(ctx, false)
Contributor


Is the introduction of concurrency here relevant to solving the bug, or is it unrelated?

Collaborator Author


This is primarily for improving concurrency.

Contributor

@nolash nolash Jun 13, 2019


That much is obvious :)

prepareChunks is not documented, so it's difficult to say what its total scope is. However, further down in this function you block until whatever is started there finishes. Why and how, then, does introducing this new goroutine improve performance? Is it merely in order to start the next goroutine and enter the select earlier? Doesn't that really mean that prepareChunks is in reality a synchronous function?
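A minimal sketch of the concern being raised here: launching work in a goroutine and then immediately blocking until it completes is observationally a synchronous call (illustrative only, not the chunker code):

```go
package main

import "fmt"

// runAsyncThenWait spawns work in a goroutine but blocks until it
// finishes, so the caller gains no concurrency from the goroutine.
func runAsyncThenWait(work func()) {
	done := make(chan struct{})
	go func() {
		work()
		close(done)
	}()
	<-done // block here: effectively a synchronous call
}

func main() {
	n := 0
	runAsyncThenWait(func() { n++ }) // guaranteed complete before we continue
	fmt.Println(n) // prints 1
}
```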

Collaborator Author


True. I can't remember why I added this :-)

@@ -539,6 +539,15 @@ func (pc *PyramidChunker) buildTree(isAppend bool, ent *TreeEntry, chunkWG *sync
 	if lvlCount >= pc.branches {
 		endLvl = lvl + 1
 		compress = true

+		// Move up the chunk level to see if there is any boundary wrapping
+		for uprLvl := endLvl; uprLvl < pc.branches; uprLvl++ {
Contributor


I still have no idea how this works.

What is endLvl? Is a level a level in the tree? Why does it then start at 128 (chunkSize / hashSize)? How does pc.chunkLevel relate to this?

I am not sure how it is possible for someone not familiar with this code to verify that these changes are sane and correct without a better description of the actual detailed issue and how it is solved. Sorry.

Collaborator Author


Level always refers to tree chunk levels. You can read the big comment at the top of pyramid.go to understand it better. I am sorry for the way the chunker is written; I will someday make it more readable for you :-)

Collaborator Author


When the number of data chunks produced reaches 128, buildTree() is called. This creates a new tree chunk and packs its data with 128 pointers and other metadata. Finally it checks whether the newly created tree chunk is the 128th chunk in that level; if it is, it goes one level up, creates a new tree chunk and binds all the tree chunks one level below.

This check has to be done from the present level of the tree chunk up to the highest affected level (endLvl). So the loop you see from level to endLvl does that.

Contributor

@nolash nolash Jun 13, 2019


I understand. Partly.

if lvlCount >= pc.branches {

Here lvlCount is the number of input data chunks that have not yet been "built" into a tree? You write:

When the number of data chunks produced reaches 128

Why is this then >= and not ==?

for uprLvl := endLvl; uprLvl < pc.branches; uprLvl++ { 

What is the effect of using pc.branches as the boundary for this loop?

@nolash
Contributor

nolash commented Jun 7, 2019

By the way, I don't know if it's related, but tests are failing: https://travis-ci.org/ethersphere/swarm/jobs/540798086

@acud
Member

acud commented Jun 7, 2019

By the way, I don't know if it's related, but tests are failing: https://travis-ci.org/ethersphere/swarm/jobs/540798086

It is related to tags. I'll follow up with zahoor on this.

@acud acud assigned jmozah and acud Jun 10, 2019
@jmozah
Collaborator Author

jmozah commented Jun 11, 2019

@nolash Sorry for not replying to you earlier; I didn't notice your requests for documentation. Here it comes. If you feel this will help maintain the code, I can add it to the code too. Personally, I would not bloat the code with so much text.

  1. Go routine leaks

Problem: The goroutines spawned in storeChunk() of hasherstore.go were unbounded, i.e. a new goroutine was spawned for every chunk, leading to millions of goroutines when uploading a large file.

Fix: The number of goroutines now has an upper limit, with a queue providing back pressure. At most 150 workers are spawned (128 for the data chunks in a branch, plus a few for the tree chunks). Each worker picks a chunk from the worker queue and pushes it to storage. When all chunks have been processed, the workers wait for the close signal from the Wait() routine and exit. The old contents of the Wait() function were moved to a new function called startWait() so that it can independently collect the chunk stats. We tried to use tags for the chunk stats, but unit testing became too challenging because stats are incremented all over the storage code.

  2. Chunk corruption when uploading files greater than 8.5 GB

Problem: Whenever a file greater than approximately 8.5 GB was uploaded and then downloaded, the diff between the two would not match. This did not occur with smaller files.

Fix: The buildTree() function in pyramid.go is called every time the number of data chunks reaches 128 or the file ends. This function is responsible for creating the tree chunk that binds the created data chunks into the rest of the Merkle tree. The first loop in this function determines how far up the tree has wrapped. For example, if the current branch is the 127th, then by adding a new tree chunk you not only have to add the data chunks to this new tree chunk, but, since this is a chunk boundary, you also go one level up. In some cases, adding a final chunk closes many levels at the top.
Example: If levels 0, 1, 2 and 3 each have 127 branches, adding a final tree chunk at level 0 closes levels 0, 1, 2 and 3. This was not handled properly in the code. Now, whenever a tree chunk is created, we walk up (to at most 128 levels) to see whether this addition triggers a snowball effect on the levels above.

@acud acud dismissed stale reviews from nonsense and themself via 4c2cfe7 June 12, 2019 19:16
@nolash
Contributor

nolash commented Jun 13, 2019

@jmozah thanks

If level 0,1,2,3 has 127 branches

You mean here if the data length of the levels modulo 128 is 127?

In some cases, when you add a final chunk, you close many levels on the top.

In fact this only happens when you have two or more levels of intermediate chunks, right? Especially with complex concepts like these, we should take care to be explicit about such things when we describe them.

@nonsense nonsense added this to the 0.4.2 milestone Jun 14, 2019
@acud acud requested a review from zelig June 17, 2019 09:11
@@ -238,7 +238,7 @@ func TestRandomData(t *testing.T) {
 // This test can validate files up to a relatively short length, as tree chunker slows down drastically.
 // Validation of longer files is done by TestLocalStoreAndRetrieve in swarm package.
 //sizes := []int{1, 60, 83, 179, 253, 1024, 4095, 4096, 4097, 8191, 8192, 8193, 12287, 12288, 12289, 524288, 524288 + 1, 524288 + 4097, 7 * 524288, 7*524288 + 1, 7*524288 + 4097}
-sizes := []int{1, 60, 83, 179, 253, 1024, 4095, 4097, 8191, 8192, 12288, 12289, 524288}
+sizes := []int{1, 60, 83, 179, 253, 1024, 4095, 4097, 8191, 8192, 12288, 12289, 524288, 2345678}
Contributor


This is also passing fine on master, not sure why we are adding another test case?

Collaborator Author


It does not affect the old code. This was added here to catch bugs in this PR. We have 150 storage goroutines now, and a size of 2345678 bytes (roughly 573 chunks of 4 KB) will exhaust all 150 of them; if any one of them is not released properly, the test case will fail.

Contributor


OK, fair enough. It doesn't make the test suite much slower on Travis, so it's fine.

@jmozah please add it to the commented line above as well. We don't run all these tests because Travis is slow, but it is expected that if you touch this code, you run them manually locally.

This code is not changed often, so we decided this is an OK compromise to make.

@jmozah
Collaborator Author

jmozah commented Jun 17, 2019

@jmozah thanks

If level 0,1,2,3 has 127 branches

You mean here if the data length of the levels modulo 128 is 127?

I meant the number of tree chunks in the levels.

In some cases, when you add a final chunk, you close many levels on the top.

In fact this only happens when you have two or more levels of intermediate chunks, right? Especially with complex concepts like these, we should take care to be explicit about such things when we describe them.

Not really. In this particular bug, it went three levels up.
In the example below:

[Screenshot 2019-06-17: diagram of tree chunk levels]

The image only shows tree chunks; data chunks sit one level below the lowest tree level.

Let's assume the green part of the tree already exists. When a tree chunk is added at level 0 (the red chunk), it triggers a closure on level 1, level 2 and level 3 (the orange chunks). Level 4 then gets one more chunk, so a new level 4 is formed and the tree gets reorganised.

The code for these orange chunks was already present, but the loop used to go up only one level. With this fix, it goes up multiple levels and takes care of it.

I hope this description is explicit enough and helps you understand the fix.

@nonsense
Contributor

@jmozah thanks for the description of the bugs this is fixing and for the PR in general. It is much clearer to me now than before.

It would have been nice to have a test for bug 2, but I realise it might not be trivial.

@nolash
Contributor

nolash commented Jun 17, 2019

@jmozah ah yes three layers 4096*(128^3) = ~8GB of course sorry
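The back-of-the-envelope capacity behind this threshold can be written out (chunk size 4096 bytes, 128 branches, three intermediate levels, per the figures quoted in this thread):

```go
package main

import "fmt"

// capacityBytes returns the maximum file size addressable by a pyramid
// with the given number of intermediate levels: chunkSize * branches^levels.
func capacityBytes(chunkSize, branches, levels int64) int64 {
	c := chunkSize
	for i := int64(0); i < levels; i++ {
		c *= branches
	}
	return c
}

func main() {
	// 4096 * 128^3 = 8589934592 bytes, i.e. 8 GiB (~8.6 GB), which is why
	// the corruption showed up around the ~8.5 GB mark.
	fmt.Println(capacityBytes(4096, 128, 3)) // prints 8589934592
}
```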

@jmozah
Collaborator Author

jmozah commented Jun 17, 2019

It would have been nice if there is a test for bug 2, but I realise it might not be trivial.

Yes, this is hard to test in a unit test, but it is a good candidate for a smoke test / functional test if we have them.

@jmozah
Collaborator Author

jmozah commented Jun 17, 2019

@jmozah ah yes three layers 4096*(128^3) = ~8GB of course sorry

Yeah, the error was happening somewhere close to that number.

@acud acud merged commit f57d4f0 into ethersphere:master Jun 18, 2019