Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: file upload without buffering #1534

Merged
merged 4 commits into from
Aug 31, 2020
Merged

fix: file upload without buffering #1534

merged 4 commits into from
Aug 31, 2020

Conversation

Gozala
Copy link
Contributor

@Gozala Gozala commented Jul 1, 2020

Version 2

This pull request takes advantage of following changes

Please note that ipfs/js-ipfs#3184 is coming in ipfs-http-client@45.0.1 and to observe the fix you'd need to modify dependency in package.json as follows. I do not think we need to block this on release since minor version bump will be picked up automatically.

diff --git a/package.json b/package.json
index d87f56c..e129beb 100644
--- a/package.json
+++ b/package.json
@@ -47,7 +47,7 @@
     "ip": "^1.1.5",
     "ipfs-css": "^1.2.0",
     "ipfs-geoip": "^4.1.0",
-    "ipfs-http-client": "^45.0.0",
+    "ipfs-http-client": "next",
     "ipfs-provider": "^1.0.0",
     "ipld-explorer-components": "1.6.0",
     "is-binary": "^0.1.0",

Also note that while pull request introducing upload / download progress events is still pending, we don't need to block this because:

  1. Prior versions of ipfs-http-client just ignores those handlers.
  2. Progress handling is not in place anyway.

Therefor we can land this and once ipfs-http-client starts emitting upload / download progress events they will just start emitting FILES_WRITE_UPDATED actions.

Everything below is no longer reflects current state of the pull request, but keeping it around for history.

Version 1 (Retaining for historical)

This pull request explores a ways to upload files without unnecessary buffering. High level overview of changes below:

  • ipfs.add has a complex API that can take AsyncIterable of files with AsyncIterable contents. Optimize such API has several issues:

    1. Optimizing such API is very complex, as all the current input normalization would have to be thrown out of the window and each input type would require separate specialized code path. Simply put it would increase implementation complexity quite a bit.
    2. Even if optimizations are made (despite introduced complexity) it would be really easy to fall of happy path by mixing inputs that can't be optimized.

    For above reasons this draft pull in experimental ipfs-lite-http-client providing much more simplified ipfs.add functionality that only accepts inputs optimal for web. It is not here to stay, but here to make a case that ipfs.add API is inadequate for use cases like this.

  1. filesToStreams function used to turn DOM File into structure with contents of it represented as a stream. Which in turn than had to be buffered by an ipfs-http-client. This function is gone now and File instances are used instead. **However I'm not confident there is no code paths that assume old file like structures instead of File instances.

  2. I can successfully upload files with multiple gigs in size, it does take quite a bit time and I see no progress reports. ipfs-lite-http-client does report progress updates and I can see those being called however I can't seem to figure out where dispatch of FILES_WRITE_UPDATED leads to. I suspect handler of that action doesn't get what it expects and that is why progress doesn't get reported till the very end. Help here would be welcome.

might fix #1529

@Gozala Gozala marked this pull request as draft July 1, 2020 01:21
@Gozala
Copy link
Contributor Author

Gozala commented Jul 1, 2020

I can successfully upload files with multiple gigs in size, it does take quite a bit time and I see no progress reports. ipfs-lite-http-client does report progress updates and I can see those being called however I can't seem to figure out where dispatch of FILES_WRITE_UPDATED leads to. I suspect handler of that action doesn't get what it expects and that is why progress doesn't get reported till the very end. Help here would be welcome.

per @olizilla import / add button on the files view listened for FILES_WRITE_UPDATED events to show a progress bar-ish animation, but that no longer seems to be the case.

@rafaelramalho19 any idea what am I doing wrong in my pull request, that causes the import view not to show until upload is fully complete ?

@Gozala
Copy link
Contributor Author

Gozala commented Jul 1, 2020

Here is the average timing ~45 seconds, I'm observing when adding 2gig file, which feels a lot! Especially considering the fact there is absolutely no feedback in the UI that anything is happening. (Timing on webui.ipfs.io)

image

Edit: Turns out I was testing with a version that did not include my changes and timing looks significantly better ~19secs, although still not great.

image

@Gozala
Copy link
Contributor Author

Gozala commented Jul 1, 2020

Contrast with adding via CLI is massive extra 4317 seconds

time ipfs add 2g.bloat --quiet
QmTbKT35oQzKgR34LUHaPdtrFrphjxiQJLrz6QMyU4kQNC
ipfs add 2g.bloat --quiet  0.71s user 1.49s system 13% cpu 16.138 total

I do not exactly know what is the problem here, but clearly this needs investigation.

@Gozala
Copy link
Contributor Author

Gozala commented Jul 1, 2020

Checked also with curl which takes around ~3secs, which is kind of throughput I would like to see with webUI.

time curl -X POST -F file=@2g.bloat http://127.0.0.1:5001/api/v0/add\?stream-channels\=true\&pin\=false\&progress\=true\&wrap-with-directory\=false
{"Name":"2g.bloat","Bytes":2147483648}
{"Name":"2g.bloat","Hash":"QmTbKT35oQzKgR34LUHaPdtrFrphjxiQJLrz6QMyU4kQNC","Size":"2147994442"}
curl -X POST -F file=@2g.bloat   0.60s user 2.92s system 12% cpu 28.167 total

@Gozala
Copy link
Contributor Author

Gozala commented Jul 1, 2020

Wrote a simple node server that just calculates number of bytes received and writes back chunks back
{ Bytes: bytes, Name: "2g.bloat" }

Which leads to upload times between 10s to 12s

image

With that 19s to do actual IPFS add doesn't seem all that bad. I also see from server output that browser is streaming data in chunks, that being said it is still unclear why it is taking this long.

@Gozala
Copy link
Contributor Author

Gozala commented Jul 1, 2020

@Gozala
Copy link
Contributor Author

Gozala commented Jul 1, 2020

I did bit more investigation with my node server and it appears that

Chrome uploads 2gigs file in 108237 chunks (~19kb per chunk)
curl uploads same file in 32945 chunks (65kb per chunk)

This leads me believe that difference in chunking could lead to observed throughput difference. I am unaware if there is any way to control chunking of XHR upload, so I'm out of ideas.

@Gozala
Copy link
Contributor Author

Gozala commented Jul 14, 2020

@rafaelramalho19 any idea what am I doing wrong in my pull request, that causes the import view not to show until upload is fully complete ?

Relevant pointer to #1495 from @rafaelramalho19

This was referenced Jul 24, 2020
@Gozala
Copy link
Contributor Author

Gozala commented Aug 10, 2020

Blocked on ipfs/js-ipfs-utils#54

@Gozala Gozala marked this pull request as ready for review August 10, 2020 21:15
@Gozala Gozala self-assigned this Aug 10, 2020
@Gozala
Copy link
Contributor Author

Gozala commented Aug 10, 2020

Something is not working, dropping a file seems to start spinning CPU and tab is unresponsive. I think there's non terminating recursion somewhere. 😭

Never mind. Turns out native form data patch has not made it to ipfs-http-client@45.0.0 and what I observed was buffering loop which on 3gig file looked like non terminating recursion (because of generator polyfills).

With the following change things work as expected (matching previously reported numbers)

diff --git a/package.json b/package.json
index d87f56c..e129beb 100644
--- a/package.json
+++ b/package.json
@@ -47,7 +47,7 @@
     "ip": "^1.1.5",
     "ipfs-css": "^1.2.0",
     "ipfs-geoip": "^4.1.0",
-    "ipfs-http-client": "^45.0.0",
+    "ipfs-http-client": "next",
     "ipfs-provider": "^1.0.0",
     "ipld-explorer-components": "1.6.0",
     "is-binary": "^0.1.0",

@lidel lidel added this to the v2.11 milestone Aug 11, 2020
Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 👍

@rafaelramalho19 mind takign a look and if ok merging this?

CI did not execute because this is an old PR Irakli did from a fork, but I've run E2E tests locally and confirmed it is all green.

It would be nice to add server timing API into IPFS HTTP API.

Ack, created ipfs/in-web-browsers#167 to discuss that further

Comment on lines +287 to +288
// Just estimate download size to be around 10% of upload size.
const downloadSize = uploadSize * 10 / 100
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: what does "download" mean in this context?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It refers to the response body estimate. Which would translate to progressbar speeding up towards the end.

@lidel lidel changed the title File upload without buffering fix: file upload without buffering Aug 12, 2020
@rafaelramalho19
Copy link
Contributor

Looks good! I can finally upload 10gb files on both MacOS and Windows 🙏

After this gets unblocked feel free to merge 😄

@Gozala
Copy link
Contributor Author

Gozala commented Aug 12, 2020

After this gets unblocked feel free to merge 😄

Can you please elaborate what you mean ? We can land this now and once new ipfs-http-client is published things would actually stop buffering. Alternatively we could wait for ipfs-http-client to be published first and then bump dependency version. However since it's going to minor version update I don't think it's worth waiting.

@lidel
Copy link
Member

lidel commented Aug 31, 2020

@Gozala was new ipfs-http-client published with the changes we were waitig for?
If so, can you switch this PR to it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve file add so it can handle gigs of data
3 participants