Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement FileReaderSync #1830

Closed
JacobMGEvans opened this issue Dec 22, 2022 · 27 comments
Closed

Implement FileReaderSync #1830

JacobMGEvans opened this issue Dec 22, 2022 · 27 comments
Labels
enhancement New feature or request

Comments

@JacobMGEvans
Copy link

JacobMGEvans commented Dec 22, 2022

This would solve...

Currently I am doing something that involves keeping everything out of Promise land, however requires reading a Blob

The implementation should look like...

Should look much like the FileReader however have sync behavior.
https://developer.mozilla.org/en-US/docs/Web/API/FileReaderSync

I have also considered...

Currently attempting to make a workaround with the given FileReader available with Unidici as of now.

Additional context

ex.

function mockFormDataToString(this: FormData) {
	const entries = [];
	for (const [key, value] of this.entries()) {
		if (value instanceof Blob) {
			const reader = new FileReaderSync();
			reader.readAsText(value);
			const result = reader.result;

			entries.push([key, result]);
		} else {
			entries.push([key, value]);
		}
	}
	return JSON.stringify({
		__formdata: entries,
	});
}
@JacobMGEvans JacobMGEvans added the enhancement New feature or request label Dec 22, 2022
@Ethan-Arrowood
Copy link
Collaborator

👍 on adding this api. TBH we shouldn't have landed the file api without it in the first place. Are you interested in implementing the api or would you like to leave it for someone else?

@JacobMGEvans
Copy link
Author

👍 on adding this api. TBH we shouldn't have landed the file api without it in the first place. Are you interested in implementing the api or would you like to leave it for someone else?

Oh hey Ethan, want too kinda, have the bandwidth... probably not.

@jimmywarting

This comment was marked as resolved.

@jimmywarting
Copy link
Contributor

jimmywarting commented Dec 22, 2022

Ok!

So here is the solution i came up with:

  • Blobs are transferable to worker threads, and posting (transfering) stuff is done synchronous 🚀
  • And a node script can be halted using Atomics + shared ArrayBuffers.
  • So using this we can start a new worker with some initial data.
  • wait for the blob to be read.
  • and then send it back by transferring the arraybuffer instead of copying the data over.
  • it requires an extra file (to run the worker), an extra module (aka sync filereader) and your own code...
/*! Read blob sync in NodeJS. MIT License. Jimmy Wärting <https://jimmy.warting.se/opensource> */

// Worker Thread (blob-worker.js)

const { workerData } = require('worker_threads')

const { signal, port, blob } = workerData

blob.arrayBuffer().then(ab => {
  // Post the result back to the main thread before unlocking 'signal'
  port.postMessage(ab, [ab])
  port.close()

  // Change the value of signal[0] to 1
  Atomics.store(signal, 0, 1)

  // This will unlock the main thread when we notify it
  Atomics.notify(signal, 0)
})
/*! Read blob sync in NodeJS. MIT License. Jimmy Wärting <https://jimmy.warting.se/opensource> */

// Module (FileReaderSync.js)

const {
  Worker,
  receiveMessageOnPort,
  MessageChannel,
} = require("worker_threads")

/**
 * @param {Blob} blob
 * @returns {ArrayBuffer}
 */
function read(blob) {
  const subChannel = new MessageChannel()
  const signal = new Int32Array(new SharedArrayBuffer(4))
  signal[0] = 0

  const worker = new Worker('./blob-worker.js', {
    transferList: [subChannel.port1],
    workerData: {
      signal,
      port: subChannel.port1,
      blob,
    },
  })

  // Sleep until the other thread sets signal[0] to 1
  Atomics.wait(signal, 0, 0)

  // Close the worker thread
  worker.terminate()

  return receiveMessageOnPort(subChannel.port2)?.message
}

class FileReaderSync {
  readAsArrayBuffer(blob) {
    this.result = read(blob)
  }

  readAsDataURL(blob) {
    const ab = read(blob)
    this.result = `data:${blob.type};base64,${Buffer.from(ab).toString('base64')}`
  }

  readAsText(blob) {
    const ab = read(blob)
    this.result = new TextDecoder().decode(ab)
  }

  // Should not be used, use readAsArrayBuffer instead
  // readAsBinaryString(blob) { ... }
}

exports.FileReaderSync = FileReaderSync
// Main Thread (your code)
const {FileReaderSync} = require('./FileReaderSync.js')

const blob = new Blob(['abc'], {type: 'text/plain'})

const frs = new FileReaderSync()
frs.readAsText(blob)

console.log(frs.result)

@jimmywarting
Copy link
Contributor

IMO i think that a sync file reader should not be implemented at all into NodeJS
It would only encourage bad practices i suppose

@KhafraDev
Copy link
Member

I think FileReaderSync is a huge footgun - similar to other synchronous apis (like XMLHttpRequest with async=false).

Unlike XMLHttpRequest, synchronously reading a blob is only available in workers:

[Exposed=(DedicatedWorker,SharedWorker)]

(from its webidl definition)


As the person who implemented FileReader initially, I don't think it was a good fit for undici. I think WebSocket, fetch, and even a FormData parser (💨) make sense to have - they revolve around different pieces of HTTP/1.1. FileReader... doesn't, although the logic at the time was fair: undici already implements File (which is part of the same FileAPI spec), why not implement FileReader also?

Now, considering that node has its own File implementation - also implemented by me - it would be a much better fit in core. Plus, FileReaderSync likely needs some C++ to be limited to workers (and even read the blob synchronously).

@KhafraDev
Copy link
Member

IMO i think that a sync file reader should not be implemented at all into NodeJS
It would only encourage bad practices i suppose

The base FileReader is a perfectly fine API IMO. Would you be willing to expand on what you mean by "bad practices"?

@jimmywarting
Copy link
Contributor

For browser (in the main thread) then you could do something how i posted on this old gist

const blob = new Blob(['123'])
const xhr = new XMLHttpRequest()
const url = URL.createObjectURL(blob)

// required if you need to read binary data:
xhr.overrideMimeType('text/plain; charset=x-user-defined') 
xhr.open('GET', url, false)
xhr.send()

const uint8 = Uint8Array.from(xhr.response, c => c.charCodeAt(0))
const text = new TextDecoder().decode(uint8)
const arrayBuffer = uint8.buffer
const json = JSON.parse(text)

@jimmywarting
Copy link
Contributor

Would you be willing to expand on what you mean by "bad practices"?

Probably the same reason as you wrote

"I think FileReaderSync is a huge footgun - similar to other synchronous apis"

Also as i mention earlier, Again if we get support for blob's backed up by the fs later,
then it's possible that you would mount a cloud network disc and create blob from a those paths.

if you try to read this files synchronous, then it will block the main thread for a long time.

@KhafraDev
Copy link
Member

KhafraDev commented Dec 22, 2022

FileReaderSync is fundamentally different from FileReader. You can tell there was consideration while adding it (unlike sync XMLHttpRequest) to not be a footgun - only being exposed in worker threads. FileReader is a perfectly fine, asynchronous API. Plus, every browser and Deno implements it.

@jimmywarting
Copy link
Contributor

jimmywarting commented Dec 22, 2022

i just think that the FileReader is such a legacy thing nowdays when we got the new blob(text/arrayBuffer/stream) reading methods. Deno didn't really want to have it either, And they did also prefer the new blob read methods. but they added it b/c they wanted aws and other library to be working in deno too, that's why they added it. and probably also to pass the WPT test.

I think that WPT should change there blob/file test cases to use the new blob reading methods and flesh out the FileReader to be something separate

@KhafraDev
Copy link
Member

Deno didn't really want to have it either

I don't see any disdain from the Deno maintainers about adding it, actually they all seemed to support adding it. The reasons you listed for them adding it don't seem negative - actually the opposite: it increases cross-environment compatibility. I don't see the downsides of this, regardless of if it's legacy.

FileReader is such a legacy thing nowdays

I don't think that this makes adding it any less valuable. I could see it being added in core while being flagged as legacy, similar to atob/btoa.

@jimmywarting
Copy link
Contributor

only being exposed in worker threads

Service worker dose not have it...
My view on NodeJS is more like a Service Worker, then a Web Worker. mostly b/c you use it to write server in backends

@jimmywarting
Copy link
Contributor

jimmywarting commented Dec 22, 2022

denoland/deno#5249 (comment) aws-sdk was their compelling reason.

also see w3c/ServiceWorker#735

@KhafraDev
Copy link
Member

KhafraDev commented Dec 22, 2022

Cross compatibility seems like a strong use case as well for implementing it. edit: it referring to FileReader in core.

Service worker dose not have it...

It's not a service worker because it allows synchronous, blocking work: "any type of synchronous requests must not be initiated inside of a service worker". However it's not really close to anything the browser implements, so you could call it whatever and you'd probably be partially correct.

Anyways, that seems like a more compelling reason not to implement FileReaderSync. Even if we did implement it, there's no chance that it would ever get past the tsc, lol.

@jimmywarting
Copy link
Contributor

jimmywarting commented Dec 22, 2022

ok... so if we did by any means implement a sync File Reader (which i think we should not anyways), then maybe it could just be available inside of worker_threads?

@KhafraDev
Copy link
Member

KhafraDev commented Dec 22, 2022

I think one of us (or maybe both of us) is getting confused. I don't think FileReaderSync should be implemented. I think FileReader (the async one) should be, but in nodejs core, not undici.

@jimmywarting
Copy link
Contributor

jimmywarting commented Dec 22, 2022

yea maybe. i have mostly been talking about sync file reader. but also both.

Things we agree on are:

  • FileReaderSync should not be implemented ✅
  • The sync FileReader should be handled by nodejs core (not undici) ✅

but IMO i don't think there is any reason to keep using the async FileReader either when we already have something better.
So no to both sync and async FileReader 😜

@KhafraDev
Copy link
Member

FileReaderSync should be implemented

I don't think so - it wouldn't even solve the original feature request. After all, you have to open a worker thread and then wait for the 'data' event, meaning you're... still reading it asynchronously. This might not be an issue if you're in a worker thread already, but it's added complexity for a niche use case. The real solution is to use your version as posted above, which will block the main thread.

The sync FileReader should be handled by nodejs core (not undici)

Agreed on this completely. Undici should have never implemented any of the FileAPI (File & FileReader), and rather contributed upstream instead. Although, with webidl, implementing everything in undici is far easier and more correct than in node core.

@jimmywarting
Copy link
Contributor

ups i did a typo, meant should not

@JacobMGEvans
Copy link
Author

Ok!

So here is the solution i came up with:

  • Blobs are transferable to worker threads, and posting (transfering) stuff is done synchronous 🚀
  • And a node script can be halted using Atomics + shared ArrayBuffers.
  • So using this we can start a new worker with some initial data.
  • wait for the blob to be read.
  • and then send it back by transferring the arraybuffer instead of copying the data over.
  • it requires an extra file (to run the worker), an extra module (aka sync filereader) and your own code...
/*! Read blob sync in NodeJS. MIT License. Jimmy Wärting <https://jimmy.warting.se/opensource> */

// Worker Thread (blob-worker.js)

const { workerData } = require('worker_threads')

const { signal, port, blob } = workerData

blob.arrayBuffer().then(ab => {
  // Post the result back to the main thread before unlocking 'signal'
  port.postMessage(ab, [ab])
  port.close()

  // Change the value of signal[0] to 1
  Atomics.store(signal, 0, 1)

  // This will unlock the main thread when we notify it
  Atomics.notify(signal, 0)
})
/*! Read blob sync in NodeJS. MIT License. Jimmy Wärting <https://jimmy.warting.se/opensource> */

// Module (FileReaderSync.js)

const {
  Worker,
  receiveMessageOnPort,
  MessageChannel,
} = require("worker_threads")

/**
 * @param {Blob} blob
 * @returns {ArrayBuffer}
 */
function read(blob) {
  const subChannel = new MessageChannel()
  const signal = new Int32Array(new SharedArrayBuffer(4))
  signal[0] = 0

  const worker = new Worker('./blob-worker.js', {
    transferList: [subChannel.port1],
    workerData: {
      signal,
      port: subChannel.port1,
      blob,
    },
  })

  // Sleep until the other thread sets signal[0] to 1
  Atomics.wait(signal, 0, 0)

  // Close the worker thread
  worker.terminate()

  return receiveMessageOnPort(subChannel.port2)?.message
}

class FileReaderSync {
  readAsArrayBuffer(blob) {
    this.result = read(blob)
  }

  readAsDataURL(blob) {
    const ab = read(blob)
    this.result = `data:${blob.type};base64,${Buffer.from(ab).toString('base64')}`
  }

  readAsText(blob) {
    const ab = read(blob)
    this.result = new TextDecoder().decode(ab)
  }

  // Should not be used, use readAsArrayBuffer instead
  // readAsBinaryString(blob) { ... }
}

exports.FileReaderSync = FileReaderSync
// Main Thread (your code)
const {FileReaderSync} = require('./FileReaderSync.js')

const blob = new Blob(['abc'], {type: 'text/plain'})

const frs = new FileReaderSync()
frs.readAsText(blob)

console.log(frs.result)

Ill try this and report back, the rest of this conversation was... interesting.

@JacobMGEvans
Copy link
Author

JacobMGEvans commented Dec 23, 2022

In my case this polyfill seems to be hanging on the Atomics.wait(signal, 0, 0) seems like the Worker script isn't running 🤷🏻‍♂️

Edit: NVM

@KhafraDev
Copy link
Member

Here's an alternative way that does the bare minimum:

index.js
const { Worker, receiveMessageOnPort } = require('worker_threads')
const { join } = require('path')

const blob = new Blob([Buffer.alloc(2 ** 31 - 1).fill('abcdefg')])

const shared = new SharedArrayBuffer(4)
const { port1: localPort, port2: workerPort } = new MessageChannel()

const path = join(__dirname, 'worker.mjs')

const w = new Worker(path, {
  workerData: { shared, blob, port: workerPort },
  transferList: [workerPort]
})

const int32 = new Int32Array(shared)
Atomics.wait(int32, 0, 0)

const { message } = receiveMessageOnPort(localPort)

console.log(message)
worker.mjs
import { workerData } from 'worker_threads'

const { shared, blob, port } = workerData

port.postMessage({
    body: await blob.arrayBuffer()
})

const int32 = new Int32Array(shared)
Atomics.notify(int32, 0)

@jimmywarting
Copy link
Contributor

looking at your code i just notices that one is a cjs and the other is esm. then i can't help but think that there should be a option for type: "module" just like in the browsers. i though so also when i wrote mine.

new Worker(url, { type: 'module' })

@JacobMGEvans
Copy link
Author

JacobMGEvans commented Dec 28, 2022

Got it working for my case, thanks for the polyfill @jimmywarting. I will leave this open depending on what the maintainers decide, feel free to close it or change it over to a discussion, my issue is resolved.

@ronag ronag closed this as completed Dec 28, 2022
@jimmywarting
Copy link
Contributor

Your welcome!

@jimmywarting
Copy link
Contributor

Just want to share something with you that i have just created:

Let me introduce: to-sync

Now if you want to read the blob in a sync manner then it is just as simple as doing:

import { createWorker } from 'to-sync'

const syncify = createWorker()
const read = syncify(async blob => {
  const ab = await blob.arrayBuffer()
  return new Uint8Array(ab)
})

console.log(read(new Blob(['abc']))) // Uint8Array

or if you like it tiny:

import { createWorker } from 'to-sync'

const readSync = createWorker()(blob => blob.arrayBuffer().then(r => new Uint8Array(ab))

console.log(readSync(new Blob(['abc']))) // Uint8Array

Makes all this abstraction way easier :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants