
Implement multipart parsing #974

Closed
ronag opened this issue Aug 17, 2021 · 45 comments · Fixed by #1606
Labels: enhancement (New feature or request), fetch, Status: help-wanted (This issue/pr is open for contributions)

@ronag (Member)

ronag commented Aug 17, 2021

Our extractBody method is missing support for parsing multipart/form-data responses.

ronag added the enhancement, fetch, and Status: help-wanted labels on Aug 17, 2021
@ronag (Member, Author)

ronag commented Aug 17, 2021

I'm not sure who would send multipart/form-data from a server, or why, so I would appreciate some requests for this. It doesn't seem to be a very useful feature.

@octet-stream (Contributor)

Hey, it seems like node-fetch is going to have a multipart parser soon, if this PR gets merged. Maybe this would be helpful for you somehow?

@jimmywarting (Contributor)

jimmywarting commented Oct 15, 2021

I think it seems useful; I mean, https://www.npmjs.com/package/form-data has 4.5M downloads/week, plus it's a spec'd thing.

I have long wanted a formData method in node-fetch.

@MrWaip

MrWaip commented Dec 22, 2021

Hello. The Telegram Bot API accepts only FormData for files:
https://core.telegram.org/bots/api#inputfile

@blittle

blittle commented Jan 19, 2022

I'm not sure who would send multipart/form-data from a server, or why, so I would appreciate some requests for this. It doesn't seem to be a very useful feature.

Correct me if I'm wrong, but isn't the bigger use case the server parsing multipart form data from the client? The example from the node-fetch PR implementation is:

import http from 'node:http'
import { Response } from 'node-fetch'

http.createServer(async function (req, res) {
  const formData = await new Response(req, { headers: req.headers }).formData()
  const file = formData.get('avatar')

  const arrayBuffer = await file.arrayBuffer()
  const text = await file.text()
  const whatwgReadableStream = file.stream()
})

It would be great if undici had the same support for multipart form data.

@jimmywarting (Contributor)

jimmywarting commented Jan 19, 2022

The example should actually be something like this:

http.createServer(async function (req, res) {
  // note: req.protocol / req.get() / req.originalUrl are Express-style helpers;
  // with a bare http server use `http://${req.headers.host}${req.url}` instead
  const url = `${req.protocol}://${req.get('host')}${req.originalUrl}`
  const request = new Request(url, { body: req, method: req.method, headers: req.headers })
  const formData = await request.formData()
  const file = formData.get('avatar')

  const arrayBuffer = await file.arrayBuffer()
  const text = await file.text()
  const whatwgReadableStream = file.stream()
})

But it's easier/quicker to construct a Response, which doesn't need any URL or extra options.
The idea of having formdata parsing and shipping body.formData() comes from service workers:
a service worker is meant to intercept the request, parse the payload when needed, and insert any missing fields, like an authentication key or something.

I'm not sure who would send multipart/form-data from a server, or why, so I would appreciate some requests for this. It doesn't seem to be a very useful feature.

It's not about responding with a multipart/form-data response from a server, as in fetch(url).then(res => res.formData()),
because again it would be weird if a server responded with multipart/form-data. It's rather about parsing incoming requests to the http server, which a lot of folks use Node.js for.

I have long wished for a body.formData(), but I never wanted that kind of functionality for getting such responses back from the server. It was because I wished for the server's IncomingMessage to mimic a service worker request and behave more like the Request class... where parsing the IncomingMessage with body.formData() is essential, so you wouldn't need busboy, formidable, or any other library that you also need to learn.

But then again, for a server to respond with multipart/form-data isn't all that bad. It's a great thing to use for sending multiple files back to the client, now that fetch also has .formData(). And I have used it once on the web for transferring files as well.

// server.js
import http from 'node:http'
import stream from 'node:stream'

http.createServer(async function (req, res) {
  const fd = new FormData()
  fd.set('/uploads/logo.png', new File(...))
  fd.set('/uploads/movie-introduction.mp4', new File(...))
  const { body } = new Response(fd)
  stream.Readable.from(body).pipe(res)
})

// client.js
const res = await fetch('/get-files')
const fd = await res.formData()
const files = [...fd]
// use the files for something

This is much easier than bringing in a zip library for packing and unpacking on both the server and the client if you for any reason need to get multiple files from the server. I have done this once before myself. But this is unheard of, because never once have we had xhr.responseType = 'FormData', and not many use service workers to intercept the request payload. This truly is missing functionality that we haven't tapped into yet: parsing formdata in the other direction as well (from server to client).


I also added it because it's part of the spec, which might be the biggest reason to implement it, so folks get what they expect.

@ronag (Member, Author)

ronag commented Jan 19, 2022

@jimmywarting: I wouldn't mind adding support for it. Would you be OK if someone ported https://github.com/node-fetch/node-fetch/blob/main/src/utils/multipart-parser.js & https://github.com/node-fetch/node-fetch/pull/1314/files over to undici? @mcollina wdyt?

@jimmywarting (Contributor)

jimmywarting commented Jan 19, 2022

Sure, be my guest, take what you want. The source is based on a quite old version of formidable that I patched quite a bit in 2017 (to get rid of some core Node.js parts like Buffer and Streams) so that it would make a friendlier browser version for the github/fetch polyfill, but it never got merged. It was patched even more when folks reviewed my PR to node-fetch, for some performance improvements.

Also take a look into busboy to see if their new parser is any faster; I haven't tested it against node-fetch's new parser.

I think the owner of busboy said that it does some extra stuff that doesn't conform to the FormData parser spec, like supporting non-UTF8 encodings.


I also have another question... would this formData() parser make it into stream consumers as well somehow?

import { formData } from 'node:stream/consumers'
const fd = await formData(iterable, { boundary: req.boundary })
fd // is an instance of FormData

Think it would also be cool to have something more low-level, like:

import { formData, blob } from 'node:stream/consumers'

// returns another asyncIterable (maybe the stream consumer name `formData` is the wrong name for it)
const asyncIterable = formData(req, { boundary: req.boundary })

for await (const [name, entry, meta] of asyncIterable) {
  if (typeof entry === 'string') {
    // it's a normal regular string; should this also be streamable? what if the payload is really large?
  } else {
    // it's either a File class or something that is streamable/iterable as well
    const b = await blob(entry, { type: meta.contentType || meta.type }) // wish the type option existed here (it's missing)
    new File([b], meta.fileName, { type: b.type })
  }
}

Another solution could be to treat every entry the same, where only the meta changes depending on whether it's a file vs a normal field:

import { multipart, blob, text } from 'node:stream/consumers'

// returns another asyncIterable
const asyncIterable = multipart(req, { boundary: req.boundary })

for await (const [entry, meta] of asyncIterable) {
  const headers = meta.headers // an instance of the fetch `Headers` class
  const type = headers.get('content-type')
  const name = meta.fieldName

  // the filename (and the utf8 filename variant) is parsed out of the content-disposition header
  const fileName = meta.fileName // undefined | string

  const item = await (fileName ? blob(entry) : text(entry))
}

I think I like this ☝️ solution more than the previous two examples.

@blittle

blittle commented Jan 19, 2022

We'd like to have multipart support for Shopify Hydrogen, where our API route handlers are just functions that take undici Request objects. See Shopify/hydrogen#450

@KhafraDev (Member)

KhafraDev commented Jan 21, 2022

I have a working parser that just needs tests written before a PR (no code used from other parsers, which should help with license issues), although the tests keep stalling tap (the same issue as client-fetch.js, which causes the current failures in CI).

import { createServer } from 'node:http'
import { readFileSync } from 'node:fs'
import assert from 'node:assert'

const server = createServer(async (req, res) => {
    const chunks = []
    for await (const chunk of req) chunks.push(chunk)

    const resp = new Response(Buffer.concat(chunks), { headers: req.headers })
    const form = await resp.formData()

    const same = form.get('c')
    const hope = readFileSync('./index.html', 'utf-8')

    assert.strictEqual(same.name, 'index.html')
    assert(hope === await same.text())
    assert(form.get('a') === 'b')

    res.end('lol')
    server.close()
}).listen(0)

const form = new FormData()
form.set('a', 'b')
form.set('c', new Blob([readFileSync('./index.html')]), 'index.html')

// send POST req with form body here...

I'm just worried, since it's less than 100 lines, that there may be edge cases I've missed, although I've hopefully made it fool-proof 😅.

@mcollina (Member)

It needs to be a streaming client - the code in busboy shows more or less how complex that is. It might be interesting to try implementing it using async iterators.
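For illustration, here is a rough, hypothetical sketch of that async-iterator shape (simplified: no header parsing, no streaming within a part, and each yielded part still starts with the CRLF that terminated the boundary line):

async function * splitParts (source, boundary) {
  const delimiter = Buffer.from(`\r\n--${boundary}`)
  // seed with CRLF so the very first `--boundary`, which has no preceding CRLF, still matches
  let buffer = Buffer.from('\r\n')

  for await (const chunk of source) {
    buffer = Buffer.concat([buffer, chunk])

    let index
    while ((index = buffer.indexOf(delimiter)) !== -1) {
      const part = buffer.subarray(0, index)
      buffer = buffer.subarray(index + delimiter.length)
      if (part.length > 0) yield part // skips the (usually empty) preamble before the first boundary
    }
  }
}

Each yielded Buffer holds one part's headers plus body; splitting on the first blank line (\r\n\r\n) would separate the two. A real parser also has to handle the closing `--boundary--` marker and malformed input.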

@jimmywarting (Contributor)

jimmywarting commented Jan 21, 2022

Another thing I have been thinking about doing in node-fetch, when parsing FormData and encountering very large files, is to somehow offload the data into a temporary file instead of allocating everything in memory... that might be next on my agenda...
That's where FileLike objects would come into play... I really wish that this got fixed in Node.js.

@KhafraDev (Member)

KhafraDev commented Jan 21, 2022

It needs to be a streaming client - the code in busboy shows more or less how complex that is. It might be interesting to try implementing it using async iterators.

I added the ability to use async iterators, which each yield an object close to:

{
  headers: [Object: null prototype] {
    'Content-Disposition': 'form-data',
    name: 'c',
    filename: 'index.html',
    'Content-Type': 'application/octet-stream'
  },
  content: '... file contents...'
}

although I loop over the entire buffer body before parsing, so I need to combine that logic 🤔
& I'm unsure which of those values are headers, so I just lumped them together

@mcollina (Member)

Buffering the full body is not going to work well for Node.js, unfortunately: it can end up using quite a lot of memory.

@jimmywarting (Contributor)

jimmywarting commented Jan 21, 2022

I agree with @mcollina.
I think the content: '... file contents...' should be streamable as well... and it should most certainly not be a string in the case where you upload binary data. An async iterator that yields Uint8Array would be preferred.

@KhafraDev (Member)

KhafraDev commented Jan 21, 2022

I think I can do that, although using formidable/busboy/dicer, etc., is probably a better choice in this situation. The fetch spec does require parsing formdata responses, though.

@Rich-Harris

I'll add SvelteKit to the list of frameworks that would definitely benefit from this — our best workaround is to monkey-patch the undici Request object's formData method, and create a node-fetch Request with the raw data in order to use its formData method: sveltejs/kit#5292

Obviously this is sub-optimal since it doesn't permit streaming parsing, and it adds bloat to apps, but it'll do for the time being!

@mcollina (Member)

Essentially we need to support await response.formData(). Is that correct?
@KhafraDev, you did a lot of work on spec compatibility; would you like to take this on?

I'm ok to implement this here, even if it would have some major caveats for big files.

@jimmywarting (Contributor)

I'm ok to implement this here, even if it would have some major caveats for big files.

Would it be OK to write files to a tmp directory, use finalizers, and then remove them once they're no longer referenced? I kind of did that in this PR: node-fetch/fetch-blob#137

This feature would kind of be blocked by Getting Files backed up by the filesystem.

@kibertoad (Contributor)

Is it safe to assume the availability of a tmp directory in cloud environments? Immutable containers are not widely used?
I wonder if it could have a fallback or an override option.

@mcollina (Member)

Would it be OK to write files to a tmp directory, use finalizers, and then remove them once they're no longer referenced? I kind of did that in this PR: node-fetch/fetch-blob#137

I would prefer not to implement that here, but rather provide some hook that an external module can plug into to implement that behavior. There are likely so many different ways this can be done that I do not want to support them all in this module.

One note: you want the files to be removed anyway on process end; take a look at https://github.com/mcollina/on-exit-leak-free as it allows you to have a finalizer that is run on process exit.

This feature would kind of be blocked by Getting Files backed up by the filesystem.

Indeed, that should be implemented first.
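A minimal sketch of that hook idea, assuming on-exit-leak-free's register(obj, fn) API (the object is held weakly, so it can still be collected, and the finalizer runs at process exit); trackTempFile is a hypothetical helper name:

import { unlinkSync } from 'node:fs'
import { register } from 'on-exit-leak-free'

// Hypothetical: tie a temp file's lifetime to the Blob that wraps it.
function trackTempFile (blob, path) {
  register(blob, function cleanup () {
    // best-effort removal at process exit; the file may already be gone
    try { unlinkSync(path) } catch {}
  })
}

Note the cleanup closure deliberately captures only `path`, not `blob`, so the Blob itself stays collectable.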


@mcollina (Member)

mcollina commented Aug 9, 2022

What we need is somebody with time to implement this. Can somebody from Shopify, Cloudflare, Vercel (just to name a few of the companies that +1 here) help?

JacobMGEvans added commits to cloudflare/workers-sdk that referenced this issue Aug 10-12, 2022

Added `wrangler init --from-dash <worker-name>`, which allows initializing a wrangler project from a pre-existing worker in the dashboard.

Resolves #1624
Discussion: #1623

Notes: `multipart/form-data` parsing is [not currently supported in Undici](nodejs/undici#974), so a temporary workaround to slice off top and bottom boundaries is in place.
@JacobMGEvans

What we need is somebody with time to implement this. Can somebody from Shopify, Cloudflare, Vercel (just to name a few of the companies that +1 here) help?

We will be picking this up if no one else has already; either myself or @cameron-robey.

@KhafraDev (Member)

I have a WIP implementation of a FormData parser at https://github.com/KhafraDev/undici/tree/formdata-parser

It's surprisingly small for its capabilities (basic parsing, even with mismatched chunks) and should be easy enough to maintain. It has an API that is a mix between formidable (writing chunks to the parser directly) and busboy (using an EventEmitter).

Currently it buffers the body in one big chunk but, similarly to busboy, it will be moved to its own EventEmitter. For fetch's use case this doesn't matter, as the .formData() algorithm is extremely inefficient anyway, and any effort to optimize each part's body will be nullified when re-constructing the FormData object. Even browsers and Deno parse formdata synchronously and in one big chunk. (On a side note, my bad attempt from earlier in this thread would actually fit our use case since, unbeknownst to me, that was how many other platforms did it.)

I did write it to be used outside of fetch, although I have no idea if it will/should be exposed. It needs tests, header parsing, error & end events, and likely much more I can't think of off the top of my head. It's also more maintenance for the undici maintainers (not just me necessarily).

Usage currently works along the lines of

// note: FormDataParser comes from the WIP branch linked above, not a published undici export
import { FormDataParser, FormData, request } from 'undici'
import { createServer } from 'http'
import { once } from 'events'

const server = createServer(async (req, res) => {
  const parser = new FormDataParser(req.headers['content-type'])

  parser.on('header', (header) => {
    /* do something with header */
  })
  parser.on('body', (chunk) => {
    /* do something with body */
  })

  for await (const chunk of req) {
    parser.write(chunk)
  }

  res.end('got formdata')
}).listen(0)

await once(server, 'listening')

const fd = new FormData()
fd.set('key', 'and value')

const blob = new Blob([/* some data */])
fd.set('file', blob, 'readme.md')

await request(`http://localhost:${server.address().port}`, {
  method: 'POST',
  body: fd
})

tl;dr I made a small async FormData parser that needs a lot more work. Also, busboy is a better solution for now, regardless of my opinion on adding dependencies for a niche feature.

@jimmywarting (Contributor)

jimmywarting commented Aug 27, 2022

I think I would prefer busboy; it's a very fast multipart parser that scans using streamsearch (Boyer-Moore-Horspool), which is very efficient (it searches bytes instead of strings). Plus, it also handles different kinds of encodings (utf8 and 7bit).
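For reference, the streaming shape busboy exposes looks roughly like this (busboy 1.x event names; the handler is a sketch, not undici code):

import { createServer } from 'node:http'
import busboy from 'busboy'

createServer((req, res) => {
  const bb = busboy({ headers: req.headers })

  bb.on('file', (name, stream, info) => {
    // info is { filename, encoding, mimeType }
    stream.resume() // the file stream must be consumed; pipe it somewhere real
  })
  bb.on('field', (name, value) => {
    console.log('field', name, value)
  })
  bb.on('close', () => res.end('done'))

  req.pipe(bb)
}).listen(3000)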

@KhafraDev (Member)

KhafraDev commented Aug 27, 2022

which is very efficient

This is irrelevant with regard to .formData(): it's an inefficient method which is nearly ENTIRELY unfit for servers. Other implementations use synchronous parsing of FormData strings, without chunking or regard for efficiency (see: Deno, Firefox, and WebKit). Note: I couldn't find info on how Chromium parses FormData; however, I assume it would be the same.

Also note that the experimental/WIP FormData parser I worked on does use bytes to compare buffers (albeit in the form of Buffer.prototype.indexOf and Buffer.prototype.includes, rather than streamsearch; I'm unsure of the performance implications).


However, as I recommended in the comment before, using busboy for now is fine:
... Also busboy is a better solution for now...

@kibertoad (Contributor)

This is irrelevant with regard to .formData(): it's an inefficient method which is nearly ENTIRELY unfit for servers.

Can you clarify this statement? Is it impossible to implement it efficiently, or would doing so require significantly more effort?

@KhafraDev (Member)

KhafraDev commented Aug 27, 2022

It is impossible to implement it efficiently - at the time of writing - for the reasons outlined here. Essentially, even if you parse it asynchronously & chunk the body data, you will still end up with Blob or File objects that could contain multiple GBs of data.

Without blobs being backed by the filesystem, memory utilization is quite poor (see nodejs/node#37340 (comment)). I know @jimmywarting has made some effort to have this implemented in node core, but unfortunately no one has tackled it just yet.


On the server, it doesn't seem to matter. There are better ways of parsing FormData in node and Deno. BodyMixin.formData() should only be implemented for cross-compatibility with browsers, and should not be recommended in any other scenario.

mcollina added this to the fetch milestone on Aug 29, 2022
@PodaruDragos

So now that undici has multipart/form-data using busboy, is @KhafraDev's answer okay?
For example, is this approach reliable for getting a file and writing it to disk?

import { createServer } from 'node:http';
import { writeFile } from 'node:fs/promises';

createServer(async (request, response) => {
  const chunks: Buffer[] = [];

  for await (const chunk of request) chunks.push(chunk);

  const resp = new Response(Buffer.concat(chunks), {
    headers: request.headers as HeadersInit
  });
  const form = await resp.formData();

  const file = form.get('a-file') as File;
  const fileArrayBuffer = await file.arrayBuffer();

  await writeFile(file.name, Buffer.from(fileArrayBuffer));

  response.end('done');

}).listen(3000);

@KhafraDev (Member)

KhafraDev commented Aug 31, 2023

I would recommend using busboy or @fastify/busboy directly, which suit Node.js more than .formData() (which I have a strong disdain for).

@jimmywarting (Contributor)

jimmywarting commented Aug 31, 2023

I think I would have done it more like this:

import { createServer } from 'node:http'
import fs from 'node:fs'

createServer(async (request, response) => {
  const resp = new Response(request, { headers: request.headers })
  const fd = await resp.formData()
  const file = fd.get('a-file')
  await fs.promises.writeFile(file.name, file.stream())
  response.end('done')
}).listen(3000)

Response and writeFile both accept an AsyncIterable that yields Uint8Arrays,
so there is no need to read all the chunks and concat them into a single large buffer:

  const chunks: Buffer[] = [];
  for await (const chunk of request) chunks.push(chunk);
  const resp = new Response(Buffer.concat(chunks), {

and likewise there is no need to read the whole file's data into an arrayBuffer():

  const fileArrayBuffer = await file.arrayBuffer();
  await writeFile(file.name, Buffer.from(fileArrayBuffer));

I think that res.formData() is actually a great option as a dependency-free solution using only web tech.
The only pity is that if you upload a really large file, the whole FormData you get from res.formData(), with all the Files, will be living in memory. So I also agree with @KhafraDev about the usage when the files are really big.

undici / fetch / formData have the potential to parse the multipart upload and write all the files to a temporary location on disk,

then populate the FormData with disk-based blobs:

// kind of doing this:
const fd = new FormData()
const blob = await fs.openAsBlob(path, { type })
fd.append('a-file', blob, filename)
// later, resolve `res.formData()` with this FormData
return fd

This was something that I had in the pipeline for node-fetch, and it is also the reason why I created createTemporaryBlob / createTemporaryFile in fetch-blob.

But I haven't gotten around to implementing it yet.
I was planning on doing it for both res.formData() and also res.blob(),
but only conditionally:
I would start by reading some chunks into memory, creating a few in-memory parts, and if they grew over a certain size then I would offload the data to disk.

This would match more closely how Chrome operates when using .formData() and also .blob().
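A hypothetical sketch of that threshold strategy (not undici's actual code): it assumes the part arrives as an async generator, so the inner loop resumes the same iterator, and it uses fs.openAsBlob (Node 19.8+) for the disk-backed Blob; backpressure is ignored for brevity.

import { createWriteStream, openAsBlob } from 'node:fs'
import { finished } from 'node:stream/promises'
import { tmpdir } from 'node:os'
import { join } from 'node:path'
import { randomUUID } from 'node:crypto'

const THRESHOLD = 1024 * 1024 // 1 MiB, an arbitrary cut-off for illustration

async function partToBlob (chunks, type) {
  const buffered = []
  let size = 0

  for await (const chunk of chunks) {
    buffered.push(chunk)
    size += chunk.byteLength

    if (size > THRESHOLD) {
      // too big for memory: spill what we have, then drain the rest to disk
      const path = join(tmpdir(), `part-${randomUUID()}`)
      const out = createWriteStream(path)
      for (const c of buffered) out.write(c)
      for await (const c of chunks) out.write(c)
      out.end()
      await finished(out)
      return openAsBlob(path, { type }) // a Blob that reads lazily from disk
    }
  }

  return new Blob(buffered, { type }) // small part: keep it in memory
}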

@KhafraDev (Member)

I disagree. It's an awful solution that should have never landed in undici. I regret not blocking the PR that implemented it until we could find common ground.

  • It's not spec compliant (remember this)
  • It's orders of magnitude slower than other spec-compliant solutions...
  • The only requests for it to be implemented were from users of projects like SvelteKit or Miniflare who needed compatibility with other runtimes. This could have been easily polyfilled.

For large files, a streaming implementation doesn't matter whatsoever. It's all in memory anyway, as you can't opt in for node to save the Blobs that FormData creates as file-based blobs. For small files, a streaming implementation doesn't make sense, for obvious reasons.

undici / fetch / formData have the potential to parse the multipart upload and write all the files to a temporary location on disk

Which is easier said than done. How do we decide which blobs use the file-based solution? Is it even faster? Do we leave it up to the users, who would have to install undici anyway (and could rather have just installed @fastify/busboy), to configure it themselves? How do we handle permissions, filesystem errors, cleaning up files, etc.?

this would match more closely how Chrome operates when using .formData() and also .blob()

Chrome's .formData parser is also synchronous, as it should be according to the spec, as it is in every other browser and server environment, so comparing undici/node to any other implementation is not a good comparison. I don't see a reason why we should follow Chrome with regard to how it offloads files while we are completely ignoring the spec.

@KhafraDev (Member)

KhafraDev commented Aug 31, 2023

I'd also like to mention that, unlike busboy or @fastify/busboy, there is no way to limit the size of a file, the number of files or fields, the file name length, or any other limit that these packages have. It's a giant security risk to use it in your http server.
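For comparison, these are the kinds of limits busboy accepts (busboy 1.x options; the values here are illustrative), none of which .formData() can express:

import { createServer } from 'node:http'
import busboy from 'busboy'

createServer((req, res) => {
  const bb = busboy({
    headers: req.headers,
    limits: {
      fileSize: 10 * 1024 * 1024, // max bytes per file
      files: 5,                   // max number of file parts
      fields: 20,                 // max number of non-file parts
      fieldSize: 1024 * 1024      // max bytes per field value
    }
  })

  bb.on('file', (name, stream, info) => {
    stream.on('limit', () => {
      // fileSize was hit; busboy truncates the stream at the limit
    })
    stream.resume()
  })
  bb.on('close', () => res.end('done'))

  req.pipe(bb)
}).listen(3000)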

@jimmywarting (Contributor)

jimmywarting commented Aug 31, 2023

How do we handle permissions?

🤷‍♂️

Filesystem errors?

It could fall back to in-memory blobs.

Cleaning up files?

The process of cleaning up files can be managed using finalizers, as exemplified in this reference; losing all references to the blob could delete the file.

Deciding which blobs use the file-based solution?

For determining the appropriate solution for blobs, particularly in the context of res.blob() in Chrome, factors such as the content-length can play a role. If the data is not compressed and the content-length is available, a direct data pipe to the disk could be considered. However, for compressed data (gzip, br, deflate), or when there's no content-length due to streaming, an approach involving memory storage might be preferred. This could entail reading the stream into memory until a certain threshold, then transferring the remaining data to a disk-based file. Alternatively, a combination of in-memory and disk-based blobs could be used: return new Blob([in_memory, disk_based_blob]).

For res.formData(), a straightforward streaming process can be adopted. Upon encountering the start of a file within the data, chunks can be processed incrementally. If a part grows substantial, the data already processed can be written to a disk-based blob, followed by piping the rest of the data to that blob. Similar strategies to those described above can be applied.


I'd also like to mention that, unlike busboy or @fastify/busboy, there is no way to limit the size of a file, the number of files or fields, the file name length, or any other limit that these packages have.

Well, at least you could examine the overall content-length to set a total data limit.
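A minimal sketch of that guard, assuming the client actually sends content-length (a chunked body has none and is rejected here):

import { createServer } from 'node:http'

const MAX_BODY = 4 * 1024 ** 3 // 4 GiB overall cap, illustrative

createServer(async (req, res) => {
  const length = Number(req.headers['content-length'])
  if (!Number.isFinite(length) || length > MAX_BODY) {
    res.statusCode = 413 // Payload Too Large
    return res.end()
  }
  const fd = await new Response(req, { headers: req.headers }).formData()
  res.end(`got ${[...fd.keys()].length} fields`)
}).listen(3000)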

Alternative Solutions

There could also be a completely different approach to this: a separate handling mechanism that absolves the developer from grappling with these complexities. If Node.js featured a "Blob Storage" concept, it could autonomously decide whether to page BlobParts to disk to manage excessive data allocation. This would allow developers to create blob chunks without needing to worry about offloading data to disk.

So you would just create many blob chunks, and the "core" would take care of automatically deciding whether it should offload some data to disk or not, and it could happen without you even knowing that it has happened.

Google provides relevant documentation on this approach here and here.

In this alternative scenario, the primary responsibility would be generating blob chunks, while the core system would manage disk paging automatically. An example implementation is shown below:

async blob() {
  const chunks = []
  for await (const uint8_chunk of getStream(this)) {
    // creating many small blobs "could" automatically start paging to disk
    const small_chunk = new Blob([uint8_chunk])
    chunks.push(small_chunk)
  }

  // this final blob would just hold many references to where each chunk is on disk,
  // so it would not hold much data or memory; keeping it in memory would be fine
  return new Blob(chunks)
}

This approach presents advantages by handling the intricacies globally, simplifying memory and disk usage for everyone by removing the need to think about these concerns.


Chrome's .formData parser is also synchronous, as it should be according to the spec,

Whether the parser is synchronous or asynchronous in accordance with the specification isn't the main concern. The paramount consideration is that the parser functions as anticipated by the specification. If different approaches yield the same outcome, the manner in which it's achieved becomes irrelevant.

@KhafraDev (Member)

KhafraDev commented Aug 31, 2023

First off, I'd like to mention that the solutions offered either do not work in this context or are more complicated than using one of the aforementioned libraries. Chrome also works in an entirely different environment than node, where there is no better solution than this overly complicated approach. Node has better alternatives, even if we were to implement everything here (which I am strongly against, due to how complex it is).

The process of cleaning up files can be managed using finalizers, as exemplified in this reference; losing all references to the blob could delete the file.

There is no guarantee that those finalizers are called. If the process crashes or is killed externally, is there any guarantee that they're called? If not, do we just ignore those temporary files and let them build up? We're still facing issues with WeakRefs and finalizers in other parts of fetch.

Well, at least you could examine the overall content-length to set a total data limit.

This unfortunately isn't even close to a viable alternative.

Whether the parser is synchronous or asynchronous in accordance with the specification isn't the main concern.

It is the main concern. The spec, for everything else, is the foremost and most important thing we regard. I see no reason why it should be different for .formData, but it is.

The paramount consideration is that the parser functions as anticipated by the specification.

It doesn't. There are 4 WPTs that do not pass for .formData; this is an absurdly high number of failures considering the number of tests (14), and compared to the other mixin methods, which are spec-compliant.

@jimmywarting (Contributor)

jimmywarting commented Aug 31, 2023

This unfortunately isn't even close to a viable alternative.

I know, but this is all you've got to go on... it isn't much, but at least it's something. It could work for a simple check, such as not allowing uploads of more than 4 GiB...

If the process crashes or is killed externally, is there any guarantee that they're called?

I haven't tested every corner case yet. All I have so far is a simple

import { rmdirSync } from 'node:fs'

process.once('exit', () => {
  tempDir && rmdirSync(tempDir, { recursive: true })
})

to clean up temp files on exit.

There could be an alternative solution to this:

  1. create a file and open a file descriptor (fd)
  2. immediately unlink the path
  3. any remaining fd can then read/write to this file, as long as it's open, while the file is flagged for deletion

If the process has opened a file and that file was flagged for deletion (using functions like unlink() or remove() in C), the file will be deleted from the filesystem when the last reference to it is closed. This is because the file's directory entry is removed as part of the deletion process.

So if the process suddenly exits or is killed, the file will be deleted afterwards.
So maybe I will refactor my own code to use that instead of fs.promises.writeFile(); a sketch is below.

Finalizers could still be used to close the fd when a reference to a blob is GC'd.
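A sketch of that open-then-unlink trick (POSIX semantics; Windows will typically refuse to unlink an open file):

import { open, unlink } from 'node:fs/promises'
import { tmpdir } from 'node:os'
import { join } from 'node:path'
import { randomUUID } from 'node:crypto'

async function anonymousTempFile () {
  const path = join(tmpdir(), `tmp-${randomUUID()}`)
  const handle = await open(path, 'w+')
  await unlink(path) // remove the directory entry; the open handle keeps the data alive
  // read/write through `handle` (handle.read() / handle.write()); once the last
  // descriptor closes, including on crash or kill, the OS reclaims the storage
  return handle
}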

It doesn't. There are 4 WPTs that do not pass for .formData; this is an absurdly high number of failures considering the number of tests (14), and compared to the other mixin methods, which are spec-compliant.

I'm curious what these 4 are; mind sharing any links?
