This repository has been archived by the owner on Oct 30, 2018. It is now read-only.

Reliability Issues Overview #398

Closed
14 of 30 tasks
braydonf opened this issue Mar 29, 2017 · 8 comments

@braydonf
Contributor

braydonf commented Mar 29, 2017

Uploads

  • Issues with timeouts for storage offers
    • People can write scripts that upload many tiny (and large) files at the same time that will cause network congestion for offers (DONE Rate limiter testing #410)
    • People with freeTier set to false do not have any rate limiting (DONE Rate limiter testing #410)
    • Current release of the GUI is unstable and may go offline (DONE [WIP] v5.0.0 storjshare-gui#483)
    • Sending of PUBLISH messages doesn't scale; contracts need to be made in advance, with uploads filling up smaller portions of the contract. Farmers would establish a contract with the bridge as part of startup. (See Publish/offer message propagation timeouts core#693)
    • Larger shards fail to upload to farmers (can be solved by partial data pushes for resuming) (???)
    • Larger shards don't get many offers, because the maximum size of a bucket in kfs is 1/256 of the total available space in kfs
    • More nodes are needed to increase capacity and to make routing more efficient (Erasure encoding will increase the capacity of the network by being more efficient with mirroring. Efforts here should still be helpful though.)
      • Farmers are reluctant to join the network because they want a more reliable prediction of the return on investment (???)
  • The original farmer that receives the data may disappear right after the data was uploaded (DONE Implement reed solomon encoding libstorj#274)
  • Some transfers may be slow during initial upload (improve selection of offers to more reliable and faster nodes) (DONE Configure minimum speed threshold for transfers libstorj#312)
  • Sometimes there are not enough mirrors created for a shard on upload, and there isn't any process that will increase the mirror number (Process that monitors established mirror counts per shard #383) (DONE not an issue w/ Implement reed solomon encoding libstorj#274)

Metadata

Downloads

Note: This will be edited to keep track of the current status.

@braydonf braydonf added this to the v6.0.0 milestone Mar 29, 2017
@littleskunk
Collaborator

littleskunk commented Mar 30, 2017

More nodes are needed to increase capacity and to make routing more efficient

I don't agree on that. We don't need more farmers. The problem is that the bridge only accepts the first X OFFERs and drops all the others. The bridge prefers farmers with a low ping. Even if we increase the number of farmers, the bridge will still prefer a tiny low-ping group.

Let's use a simple example to explain it. 100 farmers close to the bridge (low ping) are sending OFFERs. 500 farmers far away from the bridge (high ping) are sending OFFERs as well, but they are always too late and will never get a free slot in the mirror queue. This works as long as the 100 farmers close to the bridge have free space to create all 5 mirrors. Once they are full, the bridge has a problem: 100 queued mirrors, but not one of them has the free space to create a new mirror. In the end, that means an increased chance of lost shards.

At the same time, the 100 low-ping farmers have to handle high upload and download traffic. The result is slow upload and download speeds while the resources of all other farmers go unused.
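
The example above can be sketched as a toy simulation. This is only an illustration of the argument, not the bridge's actual selection code: the `Farmer` class, the ping values, the queue size of 100 and the 3 free slots per farmer are all assumptions made up for the sketch.

```python
from dataclasses import dataclass

@dataclass
class Farmer:
    name: str
    ping_ms: int     # assumed round-trip time to the bridge
    free_slots: int  # assumed number of shards this farmer can still store

def accept_offers(farmers, queue_size):
    """Keep only the first `queue_size` OFFERs; lower ping arrives first."""
    return sorted(farmers, key=lambda f: f.ping_ms)[:queue_size]

def upload_shard(farmers, queue_size=100, copies=6):
    """Try to place 1 shard + 5 mirrors on queued farmers; return copies stored."""
    stored = 0
    for farmer in accept_offers(farmers, queue_size):
        if stored == copies:
            break
        if farmer.free_slots > 0:
            farmer.free_slots -= 1
            stored += 1
    return stored

near = [Farmer(f"near-{i}", ping_ms=20, free_slots=3) for i in range(100)]
far = [Farmer(f"far-{i}", ping_ms=200, free_slots=3) for i in range(500)]
network = near + far

results = [upload_shard(network) for _ in range(60)]
unused_far_slots = sum(f.free_slots for f in far)

print("first upload stored", results[0], "copies")
print("last upload stored", results[-1], "copies")
print("unused slots on far farmers:", unused_far_slots)
```

In this model the queue is always filled by the 100 near farmers, so once their slots are used up an upload stores nothing even though every far farmer still has all of its space.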

See #344

@44203
Contributor

44203 commented Mar 30, 2017

More nodes are needed to increase capacity and to make routing more efficient

I don't agree on that. We don't need more farmers.

This is a general statement, not necessarily a bridge-specific problem. You might not agree, but I'd assert that the statement is correct. At the current network scale, without user rate-limiting, a single user can DoS the network by uploading thousands of small files in parallel. This is because network chatter is a function of demand for shard storage and not size of any given shard. The demand must be met with more participation (not capacity) to keep the network available for others. More nodes means that less of the network has to relay the contract publication, because the neighborhoods are denser.
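
A rough sketch of why chatter follows the number of shards rather than their size. It assumes one contract publication per shard and a made-up shard size of 8 MiB; `SHARD_SIZE` and `publications` are hypothetical names, not part of the Storj code.

```python
import math

SHARD_SIZE = 8 * 1024 * 1024  # hypothetical shard size in bytes

def publications(file_sizes, shard_size=SHARD_SIZE):
    """Count contract publications, assuming one per shard and at least one per file."""
    return sum(max(1, math.ceil(size / shard_size)) for size in file_sizes)

many_small_files = [1024] * 5000  # 5000 files of 1 KiB each
one_file = [1024 * 5000]          # the same number of bytes in a single file

print(publications(many_small_files))
print(publications(one_file))
```

Both uploads carry the same number of bytes, but under these assumptions the first one triggers 5000 publications and the second only one.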

In short, we always need more farmers - a bigger network increases both reliability and security.

@braydonf
Contributor Author

I ran into an issue today in a small local testing network with 4 farmers, each with 100GB allocated, but I had issues uploading a 4GB file. It would seem that 4 farmers should be able to handle it; with all the mirrors it would still only be 24GB. More digging here may be needed.

@littleskunk
Collaborator

@braydonf

It would seem that 4 farmers should be able to handle it

Each farmer will send one OFFER. That way you can get 1 shard + 3 mirrors. For all 5 mirrors you would need at least 6 farmers.
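
The arithmetic behind this, as a small sketch. It assumes each farmer holds at most one copy of a shard because it sends a single OFFER; the function names are made up for illustration.

```python
def achievable_copies(farmers, mirrors=5):
    """Copies that can be placed when every farmer holds at most one copy."""
    wanted = 1 + mirrors  # the original shard plus its mirrors
    return min(farmers, wanted)

def total_stored_gb(file_gb, mirrors=5):
    """Space used across the network for a file plus all of its mirrors."""
    return file_gb * (1 + mirrors)

print(achievable_copies(4))  # 4 farmers: 1 shard + 3 mirrors
print(achievable_copies(6))  # 6 farmers: 1 shard + 5 mirrors
print(total_stored_gb(4))    # the 24GB mentioned for a 4GB file
```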

@44203
Contributor

44203 commented Mar 31, 2017

@braydonf I think that's probably related to the default maxOfferConcurrency. With only 4 farmers, I'd imagine the rate at which the contracts go out could exceed this value and end up with farmers refusing to send an offer.
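
A minimal sketch of that idea, assuming a farmer simply refuses to send an offer while it already has `maxOfferConcurrency` offers in flight. The `OfferGate` class and the limit of 3 are illustrative assumptions, not the real farmer implementation or its default.

```python
class OfferGate:
    """Hypothetical per-farmer limit on offers that are in flight at once."""

    def __init__(self, max_offer_concurrency):
        self.max_offer_concurrency = max_offer_concurrency
        self.in_flight = 0

    def try_offer(self):
        """Return True if an offer is sent, False if the farmer refuses."""
        if self.in_flight >= self.max_offer_concurrency:
            return False
        self.in_flight += 1
        return True

    def finish(self):
        """Mark one in-flight offer as completed."""
        self.in_flight = max(0, self.in_flight - 1)

def offers_per_contract(num_farmers, limit, burst):
    """Offers received for each of `burst` contracts published before any offer completes."""
    gates = [OfferGate(limit) for _ in range(num_farmers)]
    return [sum(gate.try_offer() for gate in gates) for _ in range(burst)]

print(offers_per_contract(num_farmers=4, limit=3, burst=5))
```

With only 4 farmers and a burst of 5 contracts, the later contracts get no offers at all in this model, because every farmer is already at its limit.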

@braydonf
Contributor Author

braydonf commented Apr 20, 2017

Bumped up the maxOfferConcurrency to 16, decreased the mirror count to 0 in bridge, increased the farmers from 4 to 16, increased the available space for each from 5GB to 256GB and the issue went away, and offers were made again. It ended up being the last change to the allocated space that made the difference.
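
One possible reading of why the allocated space mattered, based on the 1/256 bucket limit noted in the issue description. The sketch assumes a shard has to fit into a single kfs bucket and uses a made-up 500MB shard; the function names are hypothetical.

```python
KFS_BUCKET_FRACTION = 256  # from the issue: max bucket size is 1/256 of available space
GB = 1000 ** 3
MB = 1000 ** 2

def max_bucket_bytes(allocated_bytes):
    """Largest bucket a farmer can hold for a given allocation."""
    return allocated_bytes // KFS_BUCKET_FRACTION

def can_offer(allocated_bytes, shard_bytes):
    """Assumption: a farmer only offers for shards that fit into one bucket."""
    return shard_bytes <= max_bucket_bytes(allocated_bytes)

shard = 500 * MB  # hypothetical shard size

print(max_bucket_bytes(5 * GB) / MB, "MB per bucket with 5GB allocated")
print(max_bucket_bytes(256 * GB) / MB, "MB per bucket with 256GB allocated")
print(can_offer(5 * GB, shard), can_offer(256 * GB, shard))
```

Under these assumptions a 5GB farmer has buckets of roughly 19.5MB and cannot offer for the shard, while a 256GB farmer has 1GB buckets and can.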

@littleskunk
Collaborator

I uploaded a 64-byte file today. Only 1 mirror was created.

libstorj-1.0.0-rc2
info	
title	"Storj Bridge"
version	"5.12.1"
description	"Access the Storj network using a simple REST API."
x-protocol-version	"1.1.0"
x-core-version	"6.4.2"
storj --debug list-mirrors cb1d4eed725887b892cdf3d4 b16f24adb682540b8caafa49
Unlock passphrase:
Established
-----------
Shard: 0
Hash: 0c639ea9abfab009fb7f042e6c7d8c4bddc50108
        storj://realshels.hopto.org:4232/0c8941dc15c438f636648ebad26514b08113b5ac

Available
---------
Shard: 0

@braydonf
Contributor Author

Closing and breaking out into specific issues.


3 participants