This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

node.get(hash) and its async iterator hang when data is over 1.5MB #2814

Closed
sebastiendan opened this issue Mar 4, 2020 · 11 comments

@sebastiendan

sebastiendan commented Mar 4, 2020

Hey,

I'm working on a web app that lets a user benchmark the performance of IPFS (only js-ipfs for now).
On start the app spawns two local nodes (nodeA and nodeB).

The user selects a buffer size and then triggers an operation that, in Node.js, creates and starts a sequential queue of the following steps:

  • fill the buffer with n times the current date (to get content that is unique on the network)
  • nodeA writes the buffer (nodeA.add(buffer)) and returns the operation time
  • nodeB reads the buffer (nodeB.get(cid)) and returns the operation time
  • a third remote node is reached through a custom HTTP API, reads the buffer and returns the operation time
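
The buffer-fill step can be sketched as follows (the function name `makeUniqueBuffer` and the size handling are hypothetical, not taken from the project):

```javascript
// Sketch of the buffer-fill step: repeat the current date until the
// buffer reaches the requested size, so the content is unique on the
// network for every run. `makeUniqueBuffer` is a made-up name.
function makeUniqueBuffer (sizeInMB) {
  const stamp = new Date().toISOString()
  const targetBytes = Math.floor(sizeInMB * 1024 * 1024)
  const repeats = Math.ceil(targetBytes / stamp.length)
  return Buffer.from(stamp.repeat(repeats).slice(0, targetBytes))
}

const buffer = makeUniqueBuffer(1.5)
console.log(buffer.length) // 1572864
```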

I have noticed that any buffer over about 1.5MB blocks the queue at the nodeB.get step.

Expanding the get code (i.e. not using it-all and it-concat), I get this:

public async get(hash: string) {
  for await (const file of this._node.get(hash)) {
    const content = []

    for await (const chunk of file.content) {
      console.log('chunk:\n', chunk.length)
      content.push(chunk.length)
    }
  }
}

With this console.log I can see that the inner for await...of loop always stops after 5 iterations.

In short, here are my test results:

| Buffer size (MB) | Working? | Number of chunks | Size of last chunk (bytes) |
| --- | --- | --- | --- |
| 1 | Yes | 4 | 213568 |
| 1.5 | Yes | 5 | 189280 |
| 1.57 | Yes | 5 | 259280 |
| 1.58 (and above) | No | 5 | 262144 |

262144 bytes is the maximum chunk size (every chunk received before the last one has this size), so I find it intriguing that the get request blocks after 5 fully streamed chunks.
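
The chunk sizes in the table are consistent with a fixed 262144-byte chunker. A self-contained stand-in for `file.content` (the names `expectedChunkSizes` and `fakeContent` are made up for illustration; real chunks come from the IPFS node) shows how a total of 1,307,856 bytes yields four full chunks plus a 259,280-byte tail:

```javascript
// Hypothetical stand-in for file.content: yields fixed-size chunks of
// 262144 bytes, with a smaller final chunk, like the sizes in the table.
const CHUNK_SIZE = 262144

function expectedChunkSizes (totalBytes, chunkSize = CHUNK_SIZE) {
  const sizes = []
  for (let offset = 0; offset < totalBytes; offset += chunkSize) {
    sizes.push(Math.min(chunkSize, totalBytes - offset))
  }
  return sizes
}

async function * fakeContent (totalBytes) {
  for (const size of expectedChunkSizes(totalBytes)) {
    yield Buffer.alloc(size) // placeholder bytes, not real file data
  }
}

async function main () {
  const sizes = []
  for await (const chunk of fakeContent(1307856)) {
    sizes.push(chunk.length)
  }
  console.log(sizes.join(', ')) // 262144, 262144, 262144, 262144, 259280
}

main()
```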

I did a parallel test consisting of manually triggering the reads on the remote node (meaning nodeA writes, nodeB tries to read and hangs, and in the meantime I reach the remote node and read from there):

The first request hangs. Stopping it and requesting again works fine.

Unfortunately I cannot measure the read performance in this situation, as the content has already been partially (if not mostly) downloaded by the node during the broken first request (I assume this because the read time is much shorter than in my other non-breaking test results).

Any idea why the node.get hangs there? Any advice for further investigation?
(FYI go-ipfs has no issue downloading my >1.5MB data, even on the first try)

Edit: here is a light early version of the project

@achingbrain
Member

Could you share a runnable example of this problem please, e.g. a git repo with the project and some instructions on how to run it?

There are tests in the interop suite that transfer much larger files between js and go, js and js, etc., and they pass, so the problem may not be 100% obvious.

@sebastiendan
Author

@achingbrain I dropped a light early version on my github profile
https://github.com/sebastiendan/ipfs-perfs

@achingbrain
Member

Great, that's really helpful - thanks. I think you've uncovered a race condition in js-ipfs-bitswap. Just writing up a PR with a fix.

@achingbrain
Member

This app is really nice, by the way. I noticed a few things that could be improved and have opened a PR - sebastiendan/ipfs-perfs#1

@sebastiendan
Author

sebastiendan commented Mar 5, 2020

Thanks for the PR, I'll give it a try today.

My main project is to have two users exchange data over the IPFS network through our software stack (we'll integrate the IPFS client into our stack), or to have a single user store data for themselves.

How closely would you say your proposed changes fit this non-testing use case? I don't want to go too far tweaking the configuration for better performance while moving away from normal usage.

@sebastiendan
Author

I saw your second PR about UnhandledPromiseRejection errors; I couldn't find their cause in my code.

@sebastiendan
Author

Also a side note: below are the results I get for a 1MB buffer.

~1.6s for getting a 1MB buffer is quite low performance, isn't it?
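
As a quick sanity check on that claim (assuming the ~1.6s figure from the screenshot below), the implied throughput works out to well under 1 MB/s:

```javascript
// Implied throughput for the reported run: ~1 MB transferred in ~1.6 s,
// i.e. roughly 0.63 MB/s.
const megabytes = 1
const seconds = 1.6
const throughputMBps = megabytes / seconds
console.log(throughputMBps)
```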

[Screenshot: Screen Shot 2020-03-05 at 11 47 35]

@sebastiendan
Author

One more thing I just noticed: I wanted to test the same operations through the nodes' HTTP API instead of using their node instances from the js client, but I cannot reach either the API or the gateway locally (trying the addresses specified in config.Addresses).

Any reason why?

@achingbrain
Member

How close would you say your proposed changes would fit this non-testing use case?

If you want to transfer data over the public network, you'll need to connect to the bootstrap nodes to enable joining the network and probably enable preloading on the ipfs.add call. Pinning is not strictly necessary, unless you plan to run ipfs.repo.gc.
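
A sketch of what that might look like when creating the node, as a configuration fragment only (option names as commonly used by js-ipfs around this time; verify them against the README of the version you install):

```javascript
// Sketch only, not the project's actual code: enable preloading and
// keep the default public bootstrap list so the node can join the
// public network. Run inside an async function.
const IPFS = require('ipfs')

async function startPublicNode () {
  const node = await IPFS.create({
    preload: { enabled: true }, // announce added content to preload nodes
    config: {
      // Leaving config.Bootstrap unset keeps the default public bootstrap
      // peers; setting it to [] (as test setups often do) isolates the node.
    }
  })
  return node
}
```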

I wanted to test the same operations through the HTTP API

If you're going to swap between different implementations in your tests, you should look into using ipfsd-ctl as it will simplify things for you.

I cannot reach either the API or the gateway

You are running an in-process node which does not start the HTTP API or the gateway. To see how js-IPFS does this, see the daemon CLI command.

An easier way to get an HTTP API running is to use ipfsd-ctl as suggested above with the js type. Note that this will spawn js-IPFS in a separate process.
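
For illustration, a hedged sketch of spawning a daemon that way, as an options fragment (field names are taken from the ipfsd-ctl README of that era and may differ in other versions, so treat this as an assumption to verify):

```javascript
// Hypothetical ipfsd-ctl options for spawning js-IPFS as a separate
// process with its HTTP API enabled. Verify the field names against
// the ipfsd-ctl version you install.
const { createController } = require('ipfsd-ctl')

async function spawnJsDaemon () {
  const ipfsd = await createController({
    type: 'js', // spawn js-IPFS (as opposed to 'go' or in-process 'proc')
    ipfsModule: require('ipfs'),
    ipfsHttpModule: require('ipfs-http-client'),
    ipfsBin: require.resolve('ipfs/src/cli/bin.js')
  })
  return ipfsd // ipfsd.api is an HTTP client bound to the daemon
}
```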

@sebastiendan
Author

This project is exactly what I need. However, I cannot make it work for now (I created an issue, perhaps related to your own PR from last month).

@jacobheun
Contributor

Closing due to inactivity, please reopen if this is still a problem.
