New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
computeMD5 with special characters has 2 different outputs #1527
Comments
Thanks @fabiencrassat for bringing this issue to our attention. And thanks for making the recreation scenario. It definitely helps. We will add this to our list of issues. |
Good! Thank you @molivasdat And I'm very curious to understand what will be your analysis. Regards, |
Hi @fabiencrassat Which version of java were you using for the tests? |
Hi, On windows it is:
For Linux I can do it only next week :( Regards, |
Hello @molivasdat, Hope you are being well! I have your answer for linux, I'm using a docker image with IBM jdk-8.251. Hope it will help you :) Regards, |
@fabiencrassat Thanks for getting back to us. Looks like both windows and Linux are using the same major version 1.8 |
Yes ;) |
The goal of the MD5Util class is to be a low-level "just encode the bytes" given helper function. Any "smoothing" of cross-os differences like line ending standardization etc. should be handled upstream, such as in My guess is that there are byte-order differences or encoding differences between the different platforms that makes java generate different bytes for the same input strings and that's why we get the difference. From a library-api standpoint, perhaps it would help to remove the MD5Util.computeMD5(string) function completely? If we just have the InputStream version, it would be more obvious that the function is just computing a value based on bytes and it is up to the caller to be careful of what those bytes are. In the javadoc, we can direct them to either use a ByteArrayInputStream for reading from a string or use the CheckSum.compute() method does smoothing of strings. Internal testing shows that the CheckSum.compute() method is giving the same values across different OSs For the tests you are adding, @fabiencrassat, do they give the same values across platforms if you use the InputStream version of the method and specify the encoding? Also, are the tests there in an attempt to isolate a problem you are seeing in general liquibase usage? Or were you exploring how the method works out of curiosity? |
Hello and sorry for the delay, First, maybe I did not present well the description of my ticket because yes this is a problem that my team encountered when changing the operating system of our Jenkins (Windows to Linux). Since then, Liquibase's generateSQL has been in error because the MD5s of the files compared with the database are different. We therefore are doing our builds locally on our Windows computers. And yes it's also curiosity to better understand your software because I'm always interested in presenting problems with reproducibility. But because I don't know your architecture maybe I added some tests in the wrong place and I'm sorry. @nvoxland, coming back to the CheckSum.compute() method, it does not take special characters (accents, ...) in these tests. So I added them for the method here: https://github.com/fabiencrassat/liquibase/blob/md5encoding/liquibase-core/src/test/java/liquibase/change/CheckSumTest.java#L104 Please feel free to continue to share your thoughts here, I always like to understand the why and how of things. Fabien P.S .: I hope my English is good enough! |
Yes, your english is great @fabiencrassat . Far better than my french :) It makes sense that we'd get different checksums with different encodings of strings, since that could be giving us different bytes that we checksum. The code upstream from the CheckSum should be in charge of ensuring a consistent charset for the stream. We've been making improvements to that since 4.1.1, can you see if you are still seeing the problem with the newest release? If you are, can you tell me how the problem changeset(s) are set up? What type of changes are in there? The problem will more likely actually lie more with sqlFile or the createView or whatever code. |
Closing due to lack of follow-up |
Environment
Liquibase Version: master branch and last tag is v4.1.1
Liquibase Integration & Version: maven 3.6.3
Liquibase Extension(s) & Version: none
Database Vendor & Version: none
Operating System Type & Version: Windows 10 & Ubuntu 18.04
Description
I discovered that the md5 checksum can be different between Windows 10 and Ubuntu 18.04 when we deal with special characters.
I discovered this issue when my team changed their Jenkins pipeline from Windows OS to Linux OS. The generateSQL process was finishing with md5 checksum error. After investigation I saw it was due to accent characters.
Steps To Reproduce
To be very efficient (I hope), I forked your repository and added unit tests to reproduce easily the issue.
What to do:
mvn -Dtest='liquibase.util.MD5UtilTest' test
intoliquibase-core
folder.Do it on Windows OS and Linux OS and you will see the output (commented in
MD5UtilTest.java
file too).Actual Behavior
A string with special characters has two different computeMD5 outputs, depending of the stream encoding and the OS.
Expected/Desired Behavior
A string with special characters should have the same computeMD5 outputs, without concern of stream encoding or the OS.
Additional Context
I can not imagine the impacts of a change like that. That's why I prefer to show you with unit tests. And don't hesitate to explain these impacts here, I will be very happy to understand better!
Sorry for my english, I'm a French old developers! And take care of you all!!
The text was updated successfully, but these errors were encountered: