Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

An example of streaming form data using http post request #2202

Closed
Raynos opened this issue Aug 1, 2023 · 34 comments · Fixed by #3086
Closed

An example of streaming form data using http post request #2202

Raynos opened this issue Aug 1, 2023 · 34 comments · Fixed by #3086
Labels
Docs Changes related to the documentation enhancement New feature or request

Comments

@Raynos
Copy link

Raynos commented Aug 1, 2023

This would solve...

I spend two hours with a coworker trying to figure out how to upload a file using multi part formdata encoding and gave up and loaded the entire buffer into memory as a blob using the web api

The implementation should look like...

An example and test demonstrating how to do this without loading the 200mb file into an in memory Buffer.

I have also considered...

Using mikeal/request from 2012.

Additional context

@Raynos Raynos added the enhancement New feature or request label Aug 1, 2023
@metcoder95 metcoder95 added the Docs Changes related to the documentation label Aug 1, 2023
@mcollina
Copy link
Member

mcollina commented Aug 1, 2023

@Raynos this is about fetch or undici.request()?

@climba03003
Copy link
Contributor

Scanning through the code, the current FormData, File and Blob API buffering the values on constructor internally.
By using those WebAPI, it doesn't seems to provide streaming feature.

@metcoder95
Copy link
Member

If using fetch, you can pass an instance of RedeableStream, although a Redeable.toWeb are experimental, a wrapper on top of the RedeableStream is possible.

I'm thinking something like this:

new ReadableStream({
    async pull(controller) {
      for await (let chunk of stream) {
		controller.enqueue(chunk)
	  };
    },
  });
}

Haven't tested, but should be doable

@mcollina
Copy link
Member

mcollina commented Aug 2, 2023

A part from that, maybe we might want to document & test how to do it.
I didn't see a "quick" way to do it in undici, so maybe we might want to add something for it there too.

@climba03003
Copy link
Contributor

If using fetch, you can pass an instance of RedeableStream, although a Redeable.toWeb are experimental, a wrapper on top of the RedeableStream is possible.

Using ReadableStream means we need to provide the header for each form-data parts and piping each file readable to it.
So, it is not a good solution.

Streaming should be supported by File or FormData natively.
But scanning through the Web API spec, it is actually telling you to buffer the byte sequence. I am a bit surprise for it.

I am not sure if File snapshot means to be capture the state and use it later when FormData actually consume it.
Web API spec should clearly provide an instruction for streaming input.

A part from that, maybe we might want to document & test how to do it.

One of the easiest way is use form-data package. It provides FormData that support readable stream.

const FormData = require('form-data')
const fs = require('fs')
const { fetch } = require('undici')

  const form = new FormData();
  form.append('field', 'foo');
  form.append('file', fs.createReadStream('./index.js'));

  const readable = new ReadableStream({
    async pull(controller) {
      return new Promise(function(resolve) {
        form.on('data', function(chunk) {
          controller.enqueue(chunk)
        })
        form.once('end', function() {
          resolve()
        })
        form.resume()
      })
    }
  })
  
  fetch('http://localhost:3000/form-data', {
    method: 'POST',
    headers: form.getHeaders(),
    body: readable,
    duplex: 'half'
  })
  .then((res) => res.text())
  .then(console.log)
 

@metcoder95
Copy link
Member

metcoder95 commented Aug 2, 2023

But scanning through the Web API spec, it is actually telling you to buffer the byte sequence. I am a bit surprise for it.

Not an expert on the spec, but going through it, it seems it aims to be lazy-enqueued, waiting until some implementation disturbs the stream. Although I'm sure that Blob and its subclasses (File) are somehow backed by files.

Beyond that, I believe this is currently possibly without relying on the experimental fs.readAsBlob that backs the given Blob with a file on disk.

@Raynos
Copy link
Author

Raynos commented Aug 2, 2023

So I ran into this issue with form-data being streams 1 and not streams 2 and ragequit.

I did not think of wrapping it in ReadableStream.

Either undici or request would help a lot.

We eventually used https://stackoverflow.com/a/75795888 which is an undocumented work around ...

For example mikeal request jjust documents how to use formdata in the readme and its great ( https://github.com/request/request#multipartform-data-multipart-form-uploads )

@Raynos
Copy link
Author

Raynos commented Aug 2, 2023

@climba03003
Copy link
Contributor

climba03003 commented Aug 3, 2023

Not an expert on the spec, but going through it, it seems it aims to be lazy-enqueued

It is the problem of constructor, it only accept something that already buffered inside memory. So, there is no way for a File or Blob to lazy consume the stream when needed.

image
image

@KhafraDev
Copy link
Member

KhafraDev commented Aug 3, 2023

The buffering in memory is why I generally recommend not using FormData (& .formData()) in node...

Although I'm sure that Blob and its subclasses (File) are somehow backed by files.

AFAICT the file-backed Blob implementation only takes effect when specifically creating a blob via fs.openAsBlob. Otherwise, it should still be an in-memory blob.

So, there is no way for a File or Blob to lazy consume the stream when needed.

I think this should work:

import { openAsBlob } from 'node:fs'

const blob = await openAsBlob('README.md')

const fd = new FormData()
fd.set('file', blob)

of course at some point it'll still be buffered in memory

or this

const blob1 = await openAsBlob('...')
const blob2 = await openAsBlob('...')

const file = new File([blob1, blob2], 'file-amalgamate.txt')

@metcoder95
Copy link
Member

So, there is no way for a File or Blob to lazy consume the stream when needed.

Good catch. Maybe some tricks can be done here as how it is applied to openAsBlob, but at the current state and following the spec, I agree this seems to be not possible

AFAICT the file-backed Blob implementation only takes effect when specifically creating a blob via fs.openAsBlob. Otherwise, it should still be an in-memory blob.

Sure! I was mostly wondering if maybe browsers do it in this way

@mcollina
Copy link
Member

mcollina commented Aug 3, 2023

I think there are two takes in this feedback:

  1. develop & document how to do multipart http request on top of undici.request
  2. ddevelop & document how to do multipart http request on top of fetch

As both provide a subpar experience in this case.

(This shows that most of the maintaienrs of this seldomly send multipart requests from a server).

@Raynos
Copy link
Author

Raynos commented Aug 3, 2023

The buffering in memory is why I generally recommend not using FormData (& .formData()) in node...

I am uploading audio files to the ACR audible fingerprinting external HTTP API.

I wish I didn't need to do formdata either, we looked at just taking their curl example from their documentation and spawning it as a child process because at least it doesnt buffer the entire audio file into memory ...

@Raynos
Copy link
Author

Raynos commented Aug 3, 2023

The other off hand comment about the confusion I have with the README is its own issue now #2208

@Raynos
Copy link
Author

Raynos commented Aug 3, 2023

The workaround I found exploits (

undici/lib/client.js

Lines 1475 to 1480 in 59abe3f

} else if (util.isBlobLike(body)) {
if (typeof body.stream === 'function') {
writeIterable({ body: body.stream(), client, request, socket, contentLength, header, expectsPayload })
} else {
writeBlob({ body, client, request, socket, contentLength, header, expectsPayload })
}
)

    const formData = new FormData()
    formData.append('user_id', 42)
    formData.set('audio', {
        [Symbol.toStringTag]: 'File',
        name: 'music.mp3',
        stream: () => createReadStream('./music.mp3'),
    })

    const response = await fetch('http://api.example.com', {
        method: 'POST',
        body: formData,
    })

The fact that the web blob api happens to return a stream, to stream over the in memory bytes allows us to stream over file IO or network IO instead.

@KhafraDev
Copy link
Member

KhafraDev commented Aug 6, 2023

does this example not work (haven't tried it out myself)?

import { openAsBlob } from 'node:fs'

const formData = new FormData()
formData.append('user_id', 42)
formData.set('audio', await openAsBlob('./music.mp3'))

const response = await fetch('http://api.example.com', {
    method: 'POST',
    body: formData,
})

@climba03003
Copy link
Contributor

climba03003 commented Aug 6, 2023

does this example not work (haven't tried it out myself)?

Yes, it can runs normally.
But logging the server received chunk indicate it only received one whole chunk of formdata.
I am not sure if that means the file still buffered by FormData.

Edit: checked by the memory usage, and a big file. It is actually streaming the data.

import { createServer } from 'http'
import { openAsBlob } from 'node:fs'

const http = createServer(function(req, res) {
  req.on('data', (chunk) => {
    res.write(chunk)
  })
  req.once('end', () => {
    res.end(() => {
      http.close()
    })
  })
}).listen(3000)

const formData = new FormData()
formData.append('field', 42)
formData.set('file', await openAsBlob('./index.mjs'))

const response = await fetch('http://127.0.0.1:3000', {
    method: 'POST',
    body: formData,
})
console.log(await response.text())

@Raynos
Copy link
Author

Raynos commented Aug 7, 2023

Synchronous stat operations on the file when the Blob is created, and before each read in order to detect whether the file data has been modified on disk.

The documentation for open as blob does not mention its streaming.

Also it's an experimental method, also it claims to do fs.statSync() in my event loop driven server.

Ok i tried to read the implementation for this sync fs stat but it goes deep into the C++ , a sync stat in C++ is very different then one in the JS event loop

@Raynos
Copy link
Author

Raynos commented Aug 7, 2023

Also this experimental method does not exist on nodejs 16 or 18

@eddienubes
Copy link
Contributor

Hi guys, just stumbled upon this issue and wonder how to proceed.
I think there're 2 ways to implement form data streaming with undici:

  • Use third-party dependency like form-data, which will compose all key/values pairs into a stream of proper formatted chunks, and then pass this stream to undici's body parameter. (I'm solely talking about .request() API here)
  • The other way around, and I think more desirable here, is to avoid resorting to other packagers, but rather to incorporate streaming feature directly into undici's own FormData.

Also, I'm not sure what's the philosophy here, does undici aim to work mostly with WebAPI of node, or it's open for node-specific expansions?

@mcollina
Copy link
Member

mcollina commented Feb 8, 2024

@KhafraDev wdyt? I'm ok in adding support for Node.js stream in our own FormData, as it makes a lot of sense.

In parallel, we should likely add support for FormData in undici.request().

@climba03003
Copy link
Contributor

climba03003 commented Feb 8, 2024

@mcollina FormData works properly with request.

import { createServer } from 'http'
import { openAsBlob } from 'node:fs'
import { text } from 'node:stream/consumers'
import { request } from 'undici'

const http = createServer(function(req, res) {
  req.on('data', (chunk) => {
    res.write(chunk)
  })
  req.once('end', () => {
    res.end(() => {
      http.close()
    })
  })
}).listen(3000)

const formData = new FormData()
formData.append('field', 42)
formData.set('file', await openAsBlob('./index.mjs'))

// const response = await fetch('http://127.0.0.1:3000', {
//     method: 'POST',
//     body: formData,
// })
// console.log(await response.text())

const response = await request('http://127.0.0.1:3000', {
  method: 'POST',
  body: formData
})
console.log(await text(response.body))

Reference:

undici/lib/core/request.js

Lines 177 to 187 in 7308f53

if (!extractBody) {
extractBody = require('../fetch/body.js').extractBody
}
const [bodyStream, contentType] = extractBody(body)
if (this.contentType == null) {
this.contentType = contentType
this.headers += `content-type: ${contentType}\r\n`
}
this.body = bodyStream.stream
this.contentLength = bodyStream.length

@mcollina
Copy link
Member

mcollina commented Feb 8, 2024

Then it's just actually something to document.

@eddienubes
Copy link
Contributor

eddienubes commented Feb 8, 2024

But openAsBlob is still experimental and available only in v19+.
Also, it returns a promise and not just a Blob. Not a big deal, yet...

@mcollina
Copy link
Member

mcollina commented Feb 8, 2024

The alternative is adding support for Node.js streams to our FormData.
Wdyt @KhafraDev ?

@KhafraDev
Copy link
Member

KhafraDev commented Feb 8, 2024

undici.request works with FormData already (oh I see this is already mentioned.. I added it a while ago). Not a fan of adding support for node streams in FormData.

@eddienubes
Copy link
Contributor

If there is a shortage of manpower, I can volunteer implementing it.

@KhafraDev
Copy link
Member

The issue is deviating from the spec

@mcollina
Copy link
Member

mcollina commented Feb 8, 2024

I think we should just document the use of:

@eddienubes
Copy link
Contributor

eddienubes commented Feb 8, 2024

Alright, so referring to my previous question:

Also, I'm not sure what's the philosophy here, does undici aim to work mostly with WebAPI of node, or it's open for node-specific expansions?

Does undici strictly follow WebAPI spec only?

Sorry if it's a wrong thread.

@KhafraDev
Copy link
Member

to the best of our ability

@mcollina
Copy link
Member

mcollina commented Feb 8, 2024

undici.request does not. However for all fetch/WHATWG APIs we do

@JaoodxD
Copy link
Contributor

JaoodxD commented Apr 4, 2024

I think we should just document the use of:

Seems like it works totaly fine even with native FormData.

import fs from 'node:fs'

const file = await fs.openAsBlob('./big.csv')
const body = new FormData()
body.set('file', file, 'big.csv')

const res = await fetch('http://example.com', {
  method: 'POST',
  body
})

I would like to mention it in docs.
Is it fine to add it right after Async Iterable as another example?

@mcollina
Copy link
Member

go for it!

JaoodxD added a commit to JaoodxD/undici that referenced this issue Apr 10, 2024
Based on climba03003 example from here nodejs#2202 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs Changes related to the documentation enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants