incorrect results when hashing Buffer inputs #2

Closed
fschwiet opened this Issue Jul 9, 2013 · 7 comments

Projects

None yet

2 participants

@fschwiet
Contributor
fschwiet commented Jul 9, 2013

I was using node-sha1 awhile, its quite nice. When I needed to hash files larger than I could load into memory, I switched over to using crypto and noticed I would get different results. I found that when I used sha1 against Buffer objects, I was getting incorrect results. This is demonstrated by this code sample, where "node-sha1 utf8 result" matches "crypto sha1" result, and the "node-sha1 buffer result" is different/incorrect.
I suppose the moral is don't use node-sha1 against buffers. Maybe the library could be improved to support this or protect against the misuse?

var path = require("path");
var fs = require("fs");
var sha1 = require("sha1");
var crypto = require("crypto");

var filename = process.argv[2];

var contents = fs.readFileSync(filename);
var contentsUtf8 = fs.readFileSync(filename, { encoding: 'utf8'});
console.log("node-sha1 buffer result:", sha1(contents));
console.log("node-sha1 utf8 result:", sha1(contentsUtf8));

var fileStream = fs.createReadStream(filename);
//fileStream.setEncoding('utf8');

var cryptoHasher = crypto.createHash('sha1');

fileStream.on('data', function(chunk) {
    cryptoHasher.update(chunk);
});

fileStream.on('end', function(){
    console.log("crypto sha1", cryptoHasher.digest('hex'));
});
@fschwiet
Contributor
fschwiet commented Jul 9, 2013

example output:

node-sha1 result: 0aeb7632c18215144438163fbe917f6e5e40b97c
node-sha1 utf8 result: 855f5018722318f55c98b2805aa91f72dea4e72b
crypto sha1 855f5018722318f55c98b2805aa91f72dea4e72b
@fschwiet
Contributor
fschwiet commented Jul 9, 2013

Note: the fileStream.setEncoding('utf8') call does not affect the result from crypto library, which I take as evidence they're handling both formats consistently.

@pvorb
Owner
pvorb commented Jul 9, 2013

As the API section of the README says, it's intended to be used with Strings. If the argument is not a string, its toString() method will be called implicitly, so the hash depends on that. If you want it to work with buffers, decode your buffer to a string (as you did) or consider using crypto. I don't plan to handle node's buffers, since this library can also be used in the browser.

@pvorb
Owner
pvorb commented Jul 9, 2013

If you really need this, I'd accept a pull request.

@pvorb
Owner
pvorb commented Jul 9, 2013

Also have a look at charenc which already does string to binary conversion and vice versa.

@fschwiet
Contributor

I agree that sha1 is working as advertised. I put together a pull request though, mainly because that was easier than explaining why I wouldn't be doing so :). The benefit would be preventing people from making the same mistake I did.

@pvorb
Owner
pvorb commented Jul 11, 2013

That's fine.

@pvorb pvorb closed this Jul 11, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment