
Are 100 GB sequential writes possible with a seeded non-IPFS host and 80 servers pulling? #105

Closed
gerrickw opened this issue Mar 30, 2016 · 4 comments

@gerrickw

gerrickw commented Mar 30, 2016

Attempting to see whether my use case is possible, what methods could make it possible, or whether this would be a bad solution. This will be multiple questions, but all toward the same goal.

Use case: I would have 80 servers attempting to download a 100 GB file as quickly as possible in an intranet environment.

Questions:

  1. While downloading, is it possible to force sequential writes for spinning disks with a 1 GB memory limit (so not much room for disk cache)? Since the disks are not SSDs, sequential writes will help disk I/O.
  2. Is an 80-server swarm possible where each server pulls from any of the 80 servers that have already downloaded a given chunk of data from the single seed? So if all 80 servers start downloading at once, I would hope they would spread the download load among themselves.
  3. Is it possible for the 80 servers to start the download from a seed that does not have IPFS installed, such as one hosted over HTTP, HTTPS, or a Hadoop connection? Or could some metadata files (checksums, blocks, etc.) be generated beforehand?
  4. If number 3 is not possible, can a host add a file to IPFS and let others start fetching before the add has finished?

Thanks for any help people can supply -- even answering a single question will be helpful.

@whyrusleeping

whyrusleeping commented Mar 30, 2016

  1. I'm not super certain what you mean by sequential writes. Do you mean fetching each block in order?
  2. Yes, that is possible and how things should happen. Currently bitswap isn't the smartest protocol, so there may be some wasted bandwidth going around (nodes receiving the same block from multiple other peers), but better bitswap strategies going forward will improve this dramatically.
  3. No. Content requested through IPFS needs to exist in IPFS before (or after) it's requested. IPFS can't pull content from an external source automatically. (Yet?)
  4. Currently we cannot do 'streaming' adds: since the IPFS data structure is a merkledag, you need the entire structure before you can know the hash of its root. You could, however, stream each chunk's hash out of band to the other servers to start the fetch (a sketch of this follows below). This functionality is planned but not yet implemented.
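
A minimal sketch of that out-of-band approach, assuming every server runs the ipfs CLI and using an arbitrary 256 KiB chunk size (both are illustrative assumptions, not anything the protocol requires):

# On the seed: split the file into fixed-size chunks, add each chunk as
# soon as it exists, and stream its hash to the peers out of band.
$ split -b 262144 bigfile.bin chunk.
$ for c in chunk.*; do ipfs add -q "$c"; done > hashes.txt
# (ship hashes.txt, or stream it line by line, to the 80 servers)

# On each peer: fetch the chunks in order and append them sequentially,
# which also gives the spinning disks in-order writes.
$ while read h; do ipfs cat "$h" >> bigfile.bin; done < hashes.txt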

@gerrickw
Author

gerrickw commented Mar 30, 2016

  1. I'm not super certain what you mean by sequential writes. Do you mean fetching each block in order?

Correct, or at least somewhat in order, so that once enough data has been downloaded in order a write operation can occur. That would be better optimized for spinning media than jumping between sectors. I suppose the same would apply when streaming video.
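
A possible workaround (a sketch, not something confirmed in this thread): ipfs cat streams a file's bytes in order, so redirecting its output to the disk keeps the write side strictly sequential even if the underlying blocks arrive out of order. QmExampleHash is a placeholder, not a real hash.

# Out-of-order blocks may be buffered in memory on the read side, but the
# bytes written to the spinning disk arrive strictly in file order.
$ ipfs cat QmExampleHash > /data/bigfile.bin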

  3. No. Content requested through IPFS needs to exist in IPFS before (or after) it's requested. IPFS can't pull content from an external source automatically. (Yet?)

Seems like this would be an easy way, in an Internet environment, for a user to move over to IPFS completely if it supported the HTTP protocol. Example: start the local daemon and browse websites; the daemon notices it does not have some CDN content, starts downloading it, and then serves it to the user normally. Others in your swarm could then pull that CDN content from your daemon.

  4. Currently we cannot do 'streaming' adds: since the IPFS data structure is a merkledag, you need the entire structure before you can know the hash of its root. You could, however, stream each chunk's hash out of band to the other servers to start the fetch. This functionality is planned but not yet implemented.

Oh interesting, is there a GitHub ticket I could follow for this functionality?

Thanks for your help

@madavieb

madavieb commented May 23, 2017

@NatoBoram

NatoBoram commented Nov 22, 2018

You can add files from a URL to IPFS via urlstore.

USAGE
  ipfs urlstore add <url> - Add URL via urlstore.

SYNOPSIS
  ipfs urlstore add [--trickle | -t] [--] <url>

ARGUMENTS

  <url> - URL to add to IPFS

OPTIONS

  -t, --trickle bool - Use trickle-dag format for dag generation.

DESCRIPTION

  Add URLs to ipfs without storing the data locally.
  
  The URL provided must be stable and ideally on a web server under your
  control.
  
  The file is added using raw-leaves but otherwise using the default
  settings for 'ipfs add'.
  
  The file is not pinned, so this command should be followed by an 'ipfs
  pin add'.
  
  This command is considered temporary until a better solution can be
  found.  It may disappear or the semantics can change at any
  time.

Merkle-dag

$ ipfs urlstore add https://discordapp.com/assets/9c38ca7c8efaed0c58149217515ea19f.png
zb2rhmKRyibA2tfdE1cFzcLSnhhXpUKMrZcaYPkq92ZB8rSrh

Trickle-dag

$ ipfs urlstore add -t https://discordapp.com/assets/9c38ca7c8efaed0c58149217515ea19f.png
zdj7WcqTgCmCLRajzvT3HeS9jUUMDoS4zz5UAF6vVu9ceYtEd
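
As the help text above notes, the file is not pinned by urlstore, so a follow-up pin (here using the Merkle-dag hash returned above) keeps the entry from being garbage-collected:

$ ipfs pin add zb2rhmKRyibA2tfdE1cFzcLSnhhXpUKMrZcaYPkq92ZB8rSrh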
