Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect md5 value for strings greater than 823 bytes? #4

Closed
kikito opened this issue Apr 5, 2015 · 33 comments
Closed

Incorrect md5 value for strings greater than 823 bytes? #4

kikito opened this issue Apr 5, 2015 · 33 comments

Comments

@kikito
Copy link
Owner

kikito commented Apr 5, 2015

@lxss has reported here that md5 is returning the wrong md5 value for long enough strings.

require "md5"
local md5 = require 'md5'

local str = "";
for i=1,823 do
  str = str .. "1";
end
print(md5.sumhexa(str)); --823 correct
str = str .. "1";
print(md5.sumhexa(str)); --824 error

@lxss: How are you calculating the "correct" value? What values are you getting with md5.lua?

@lxss
Copy link

lxss commented Apr 5, 2015

823:
lua = 72bc510bad1c67c3185deb3861c9ab1d
php = 72bc510bad1c67c3185deb3861c9ab1d
824:
lua = 50937e9b88d5c64d3fc519c2717d3cd0
php = a126fd3611ab8d9b7e8a3384e2fa78a0

@lxss
Copy link

lxss commented Apr 5, 2015

"md5.lua 1.0.0"

@kikito
Copy link
Owner Author

kikito commented Apr 5, 2015

Hi lxss, thanks for reporting. I have tested this with both LuaJIT and Lua. I got different results!

$ luajit test.lua
72bc510bad1c67c3185deb3861c9ab1d
a126fd3611ab8d9b7e8a3384e2fa78a0
$ lua test.lua                                                                                                                              
72bc510bad1c67c3185deb3861c9ab1d
50937e9b88d5c64d3fc519c2717d3cd0

It seems that there is an error somewhere in the "pure lua" implementation of binary functions. I will have to give this a look, but I don't have time now.

In the meantime, if you can, try using LuaJIT instead of pure Lua.

@lxss
Copy link

lxss commented Apr 5, 2015

I wrote lua project is based on the third-party platform, so I can only use lua libraries
I went to try to search the bit lua

@lxss
Copy link

lxss commented Apr 5, 2015

I failed, I use bit32 lua libraries
824 calculation is correct, the test result error 2048, a lot of 0.
AM 3:00
I went to bed.
Thank you for your lua
I get up in the continue to study

@kikito
Copy link
Owner Author

kikito commented Apr 5, 2015

I have made some advancements; the bit libraries used in lua 5.2 and luajit throw different results.

At some point in the calculation 817 << 22 needs to be computed. This throws different results in Lua and LuaJIT:

$ lua -v
Lua 5.2.3  Copyright (C) 1994-2013 Lua.org, PUC-Rio
$ lua -e "print(require('bit32').lshift(817, 22))"
3426746368
$ luajit -v
LuaJIT 2.0.3 -- Copyright (C) 2005-2014 Mike Pall. http://luajit.org/
$ luajit -e "print(require('bit').lshift(817, 22))"
-868220928

It seems that internally Lua 5.2 represents numbers as unsigned ints, while LuaJIT uses signed ints. Hence the different results. I have no idea about how to make them return the same. I will investigate.

@lxss
Copy link

lxss commented Apr 6, 2015

Ok, I will also go to try to solve.
I use you write sha1.lua temporary solution file checksum.

@lxss
Copy link

lxss commented Apr 6, 2015

Lua 5.2 default is 64-bit
If it is a 32-bit, the result is correct.
No search how to convert 32-bit

@pablomayobre
Copy link
Contributor

Is the problem in bit32 or bit?

If it is in bit this fixes it (returns the same value as Lua bit32):

Replace this:

local ok, bit = pcall(require, 'bit')
if not ok then ok, bit = pcall(require, 'bit32') end

With this:

local ok, bit = pcall(require, 'bit')
if ok then
  local oldbit, bit = bit, {}
  local number = function(n)
    return tonumber('0x'..oldbit.tohex(n),16)
  end

  bit.bor = function(...) return number(oldbit.bor(...)) end
  bit.band = function(...) return number(oldbit.band(...)) end
  bit.bnot = function(...) return number(oldbit.bnot(...)) end
  bit.bxor = function(...) return number(oldbit.bxor(...)) end
  bit.rshift = function(...) return number(oldbit.rshift(...)) end
  bit.lshift = function(...) return number(oldbit.lshift(...)) end
else
  ok, bit = pcall(require, 'bit32')
end

If the problem is in the other two (bit32 and pure Lua) Then you'll need this function:

normalize = bit.tobit or function (n)
  if n > 0x7fffffff then
    return - (bit_not(n) + 1)
  else
    return n
end

You should normalize all the bitops outputs. So in order to use this function you should check that the library loaded is not LuaJIT BitOps but bit32 or pure Lua

@lxss
Copy link

lxss commented Apr 6, 2015

OK,I go to try

kikito added a commit that referenced this issue Apr 6, 2015
@kikito
Copy link
Owner Author

kikito commented Apr 6, 2015

Hey @lxss,

I have included (a version of) the change proposed by @positive07 and the 1111... test seems to pass just fine now in 5.1, 5.2 and LuaJIT. Apparently it was only failing on Lua 5.2, because of its non-standard representation of bits using unsigned items.

The fix is on master, and I have also released a new version to luarocks.

Let me know if the new version fixes your problem, so I can close this issue.

@kikito kikito mentioned this issue Apr 6, 2015
@pablomayobre
Copy link
Contributor

I'm glad that hack worked haha!

@pablomayobre
Copy link
Contributor

@kikito You are using normalize for bit32 but also for bit (that doesn't need it) which might come as an slow down which may be negligible but should be taken into account.

Doesnt the pure Lua implementation have the same problem as the bit32?

@kikito
Copy link
Owner Author

kikito commented Apr 6, 2015

@positive07 Nice catch!

The only Lua which needed "fixing" was Lua 5.2; LuaJIT and Lua 5.1 both work with unsigned (if you give a look at the 5.1 code it contains things similar to your fix - it does if n < 0 then <stuff>.

I have included an extra if now. LuaJIT simply assigns the functions, while Lua 5.2 normalizes them.

@pablomayobre
Copy link
Contributor

Oh I see now! Nice!

@jayceefun
Copy link

Hello, i got a wrong value too. when i do md5 to a large string (string length: 188731). It's all zero value.

@kikito
Copy link
Owner Author

kikito commented Apr 29, 2015

Hi @jayceefun, can you paste the value somewhere (like in a gist)? If I can't reproduce it, I can't fix it.

@jayceefun
Copy link

I can paste it here. It's a lua file actually, and i just wanna encrypt the whole file to a md5 value. But i got this value: "00000000000000000000000000000000"

@kikito
Copy link
Owner Author

kikito commented May 4, 2015

@jayceefun I have removed your file because it was too long (made scrolling through the issue problematic) and it is possible that github issues add/remove spaces and the like (so the md5 will be different). Please upload it to a gist or some other place which does not reformat text and put the link here.

Also, It would help a lot to know which version of Lua you are using and how exactly you are calculating the md5 of the file in question (the fact that you get all 0s suggests that there might be some other problem somewhere).

@jayceefun
Copy link

@kikito The lua version is 5.1 and I'll check my code again. Thanks for your help :)

@jayceefun
Copy link

@kikito
local f = io.input(filePath, "r")
local fileData = io.read("*all")
local md5Str = md5.sumhexa(fileData)

This is my simple code.. and when the file is not too large it shows correct value of MD5.

@jayceefun
Copy link

I've tried, if the string length more than 63543 , return "00000000000000000000000000000000"

@kikito
Copy link
Owner Author

kikito commented May 4, 2015

@jayceefun This is probably not it, but just in case: try opening the files in binary mode:

local f = io.input(filePath, "rb") -- notice the "rb" instead of just "r"

@jayceefun
Copy link

@kikito it seems that lua stack has a limit size, i'm not sure. Maybe I should cut the file in pieces to calculate, i tried "rb", still got all 0.

@pablomayobre
Copy link
Contributor

@kikito I'm thinking the issue is with:

  1. A string being too long
  2. A number becoming too big and being turned into math.huge or Inf
  3. A table getting too many elements and Lua not handling them right

So in any case an overflow

@stejacob
Copy link

Hi,

Are you planning to add an update() function in the md5 library instead? That would alleviate the issue with very long strings. We could just do a for loop and pass in the chunks of the string.

Regards.

@kikito
Copy link
Owner Author

kikito commented Nov 19, 2015

Hi, can you elaborate? I don't understand what you mean
On Nov 19, 2015 2:00 PM, "stejacob" notifications@github.com wrote:

Hi,

Are you planning to add an update() function in the md5 library instead?
That would alleviate the issue with very long strings. We could just to a
for loop and pass in the chunks of the string.

Regards.


Reply to this email directly or view it on GitHub
#4 (comment).

@stejacob
Copy link

Basically, when we need to generate an MD5 from a very long string, we would split the long string into smaller chunks and pass each chunk to the MD5 in an iterative approach. Therefore, at each iteration, we would update the Hash value by looking at the current checksum state.

For example (something like that):
//=========================
For each block of string Do
md5.update(block of string)
End
local md5Hex = md5.generateHex()
//=========================

Source:
http://stackoverflow.com/questions/2214259/combining-md5-hash-values

Hope that explains better what I mean. By the way, very good job on the MD5 library. Thank you very much.

@tst2005
Copy link

tst2005 commented Nov 19, 2015

FYI, I found alternative implementation https://github.com/tst2005/lua-lockbox/blob/master/lockbox/digest/md5.lua

@shixiaomao
Copy link

Someone solved this problem? I want know how to solve it.

@ghost
Copy link

ghost commented Jul 1, 2016

This patch fixes it for me. Not sure how it will work in a 64 bit machine.

--- md5org.lua  2016-07-01 09:03:17.000000000 +0200
+++ md5.lua 2016-07-01 08:56:52.000000000 +0200
@@ -346,7 +346,7 @@
   c=z(i,c,d,a,b,X[ 2],15,t[63])
   b=z(i,b,c,d,a,X[ 9],21,t[64])

-  return A+a,B+b,C+c,D+d
+  return bit_and(A+a,-1),bit_and(B+b,-1),bit_and(C+c,-1),bit_and(D+d,-1)
 end

 ----------------------------------------------------------------

The problem appears to be that the results of the additions are often out of the 32-bit range. I haven't checked how the out-of-range results propagate to produce erroneous results. It appears that at least two several turns are necessary.

@ghost
Copy link

ghost commented Jul 1, 2016

After a bit more research, I've found that the cause is that the values grow and grow until the precision of the mantissa is not enough to hold the complete integer values. Printing a,b,c,d on each iteration sufficed to show this. The code breaks when reaching 253, because past that number, there are integers that are not representable in double precision.

Based on that information, a more robust patch would be:

--- md5org.lua  2016-07-01 09:03:17.000000000 +0200
+++ md5.lua 2016-07-01 17:05:30.000000000 +0200
@@ -346,7 +346,7 @@
   c=z(i,c,d,a,b,X[ 2],15,t[63])
   b=z(i,b,c,d,a,X[ 9],21,t[64])

-  return A+a,B+b,C+c,D+d
+  return (A+a)%0x100000000,(B+b)%0x100000000,(C+c)%0x100000000,(D+d)%0x100000000
 end

@kikito
Copy link
Owner Author

kikito commented Jul 2, 2016

@pgimeno thanks for taking the time to investigate this. Would you mind sending a pull request? That way you will appear on the contributors list

Also, if you can, please include a test in the PR which fails with the current implementation but works with yours; probably doing something similar to this line.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants