Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fromFile broken on Node 14 #29

Closed
timsuchanek opened this issue May 6, 2020 · 8 comments · Fixed by #34
Closed

fromFile broken on Node 14 #29

timsuchanek opened this issue May 6, 2020 · 8 comments · Fixed by #34

Comments

@timsuchanek
Copy link

Because of this bug, hasha doesn't work with Node 14 right now: nodejs/node#33263

@jasnell
Copy link

jasnell commented May 6, 2020

As I point out in the discussion in nodejs/node#33263 there are a couple of reasons for this that need to be looked into more on the Node.js side but... the more general issue here is that passing Buffer instances via the transferList the way hasha is doing is generally not safe. The reason is because it is not known if those Buffer instances own their underlying ArrayBuffer and BackingStore or not. When a Buffer (or any TypedArray) is transferred like this, there are a range of issues that can occur that we're just now starting to see. As for why this only started happening in v14.x, that's because there was a change to the way the backing stores are handled at the v8 level.

Essentially, what needs to happen to fix this in the hasha code is to change https://github.com/sindresorhus/hasha/blob/master/thread.js#L15-L16 such that it either:

a) creates a new Uint8Array that is a copy of the data, and transfers that new Uint8Array

for example.. from the example I posted in nodejs/node#33263:

const fs = require('fs')
const crypto = require('crypto')
const { parentPort } = require('worker_threads')

parentPort.on('message', (message) => {
  const hasher = crypto.createHash('sha256')
  fs.createReadStream('example.txt')
    .pipe(hasher)
    .on('finish', () => {
      const { buffer } = hasher.read()
      const buf = new Uint8Array(buffer);  // Create a copy
      parentPort.postMessage({ value: buf }, [buf.buffer])
    })
})

b) Avoids transferring the buffer at all and instead returns the hex-encoded string:

const fs = require('fs')
const crypto = require('crypto')
const { parentPort } = require('worker_threads')
const { pipeline } = require('stream')

parentPort.on('message', (message) => {
  const input = fs.createReadStream('example.txt')
  const hasher = crypto.createHash('sha256')
  pipeline(input, hasher, (err) => {
    if (err) {
      // handle the error appropriately
      return;
    }
    // Pass a hex of the hash rather than the buffer
    parentPort.postMessage({ value: hasher.digest().toString('hex')});
  });
})

Overall, however, given what thread.js in hasha is doing, I really have to question the value of using the worker thread in this way. Because the file read and hashing operations are already async, there is little performance gain to be had here with the current approach.

@sindresorhus ... if you'd allow me a little bit of self promotion, it might be worth taking a look at piscina ... it could make working with worker threads here a bit easier overall.

@sindresorhus
Copy link
Owner

if you'd allow me a little bit of self promotion, it might be worth taking a look at piscina ... it could make working with worker threads here a bit easier overall.

I wish that had existed when we first added worker support here. It looks great.

@sindresorhus
Copy link
Owner

// @stroncium

@stroncium
Copy link
Contributor

Ok, first of all, sorry for the delays.

@timsuchanek Thanks for all the work reporting and reproducing this.

@jasnell The point there is to move hashing out of main thread, not reading. I might be mistaken about original implementation or not up to date with latest changes, but as far as I know crypto's hashes will do the hashing on the main thread.

@sindresorhus Definitely me missing the part where hash might make it's own choices regarding creating a buffer. #34 fixes the issue using buffer clone approach.

@jasnell
Copy link

jasnell commented Oct 6, 2020

but as far as I know crypto's hashes will do the hashing on the main thread.

For the moment, yes. But just a heads up, Web Crypto API support is about to land in master, which does implement crypto functions off the main thread.

@stroncium
Copy link
Contributor

@jasnell Nice to know. Though I'm kinda suspicious it won't work too well for file streams(each chunk = some sync between threads as I believe it's unfeasible to somehow unload IO to the same thread crypto processing will run on).

@jasnell
Copy link

jasnell commented Oct 6, 2020

yes, definitely. There's a definite overhead there. I've been thinking about possible alternatives that will be more feasible once the Web Crypto pr lands. It changes many of the internals. For instance, I've been thinking about an async job that'll take an FD as input and digest the entire thing without pulling any of the chunks of data into JS-land

@stroncium
Copy link
Contributor

@jasnell Well, that's what we're doing here(except in js-land). Regarding non-js version, adding it to standard library feels like a bit of overkill to me. I'd expect it to be a relatively rare case for users to hash(/other crypto primitives) big enough files for js overhead to really matter and when it does matter doing it outside std lib is pretty straightforward(separate process or native library).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants