Checksum calculation affected by environment locale #1806

Closed
mauritz-lovgren opened this issue Apr 18, 2021 · 8 comments

mauritz-lovgren commented Apr 18, 2021

Environment

Liquibase Version:
4.3.3

Liquibase Integration & Version:
Apache Maven 3.8.1 (05c21c65bdfed0f71a2f2ada8b84da59348c4c5d)
Maven home: /usr/local/Cellar/maven/3.8.1/libexec
Java version: 15.0.2, vendor: N/A, runtime: /usr/local/Cellar/openjdk/15.0.2/libexec/openjdk.jdk/Contents/Home
Default locale: nb_NO, platform encoding: UTF-8
OS name: "mac os x", version: "11.2.3", arch: "x86_64", family: "mac"

Database Vendor & Version:
MySQL JDBC connector 8.0.23, MySQL server version: 5.7.34

Operating System Type & Version:
Apple macOS Big Sur 11.2.3

Description

Checksum calculation behaves differently depending on the terminal locale settings, regardless of the encoding settings applied in the Maven project (UTF-8). As a result, Liquibase commands that work in one environment can fail in another environment where the only difference is the value of the LC_CTYPE (or LANG) environment variable.
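To illustrate why the locale can change a checksum, here is a minimal Java sketch. It is an assumption about the mechanism, not Liquibase's actual checksum routine: the same UTF-8 bytes are decoded once as UTF-8 and once as ISO-8859-1 (standing in for a wrong platform default), and each decoded string is hashed with MD5. The class and method names are hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChecksumDemo {
    // Hex-encoded MD5 of the string's UTF-8 bytes (simplified stand-in for a changeset checksum).
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    // Decodes raw file bytes with the given charset, then checksums the decoded text.
    static String checksumOf(byte[] fileBytes, Charset charset) {
        return md5Hex(new String(fileBytes, charset));
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of javac's own encoding.
        // The text is a shortened version of the Norwegian message in the changeset.
        byte[] fileBytes = "Datoen kan ikke v\u00e6re mer enn 200 \u00e5r tilbake"
                .getBytes(StandardCharsets.UTF_8);

        String asUtf8 = checksumOf(fileBytes, StandardCharsets.UTF_8);
        String asLatin1 = checksumOf(fileBytes, StandardCharsets.ISO_8859_1);

        System.out.println("decoded as UTF-8:      " + asUtf8);
        System.out.println("decoded as ISO-8859-1: " + asLatin1);
        System.out.println("checksums match: " + asUtf8.equals(asLatin1)); // prints false
    }
}
```

Pure ASCII text decodes identically under both charsets, which would explain why only the changesets containing å, æ and ä fail validation.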

Steps To Reproduce

Execute mvn liquibase:updateSQL on changeset source files that contain non-ASCII UTF-8 characters (in our case, native Swedish and Norwegian characters inside custom SQL changesets), using different terminal LC_CTYPE (or LANG) values, like this:

LC_CTYPE=nb_NO.UTF-8 mvn liquibase:updateSQL
  • fails with checksum error
LC_CTYPE=UTF-8 mvn liquibase:updateSQL
  • succeeds with no errors

Actual Behavior

In our case, the command that uses LC_CTYPE=nb_NO.UTF-8 (as described above) fails producing the following error in the build output:

[ERROR] Failed to execute goal org.liquibase:liquibase-maven-plugin:4.3.3:updateSQL (default-cli) on project stamina: 
[ERROR] Error setting up or running Liquibase:
[ERROR] Validation Failed:
[ERROR]      2 change sets check sum
[ERROR]           db/migration/4.0.x/4.0.6.sql::SD-495daterangevalidation::mauritz was: 8:43f8396fd8bfa0c53d370374729af8f4 but is now: 8:3324783cfbbbecc4b7686e2f9cbafd80
[ERROR]           db/migration/4.0.x/4.0.13.sql::SD-419::paal was: 8:add472d94b3af98ea97563f05980a6a1 but is now: 8:84bdb044920324ffacc733d22e053b43

One of the changesets that fails with checksum errors is the following (it contains the native Norwegian and Swedish characters å, æ and ä):

-- changeset mauritz:SD-495daterangevalidation runOnChange:false
-- comment Correction of illegal date values to avoid future validation errors on existing data
UPDATE ansatt SET foedselsdato = '1850-01-01' WHERE foedselsdato IS NOT NULL AND foedselsdato < '1850-01-01';
UPDATE ansatt_aud SET foedselsdato = '1850-01-01' WHERE foedselsdato IS NOT NULL AND foedselsdato < '1850-01-01';
UPDATE ansatt SET foedselsdato = '2100-01-01' WHERE foedselsdato IS NOT NULL AND foedselsdato > '2100-01-01';
UPDATE ansatt_aud SET foedselsdato = '2100-01-01' WHERE foedselsdato IS NOT NULL AND foedselsdato > '2100-01-01';

UPDATE ansatt SET ansettelse_dato = '1850-01-01' WHERE ansettelse_dato IS NOT NULL AND ansettelse_dato < '1850-01-01';
UPDATE ansatt_aud SET ansettelse_dato = '1850-01-01' WHERE ansettelse_dato IS NOT NULL AND ansettelse_dato < '1850-01-01';
UPDATE ansatt SET ansettelse_dato = '2100-01-01' WHERE ansettelse_dato IS NOT NULL AND ansettelse_dato > '2100-01-01';
UPDATE ansatt_aud SET ansettelse_dato = '2100-01-01' WHERE ansettelse_dato IS NOT NULL AND ansettelse_dato > '2100-01-01';

UPDATE ansatt SET slutt_dato = '1850-01-01' WHERE slutt_dato IS NOT NULL AND slutt_dato < '1850-01-01';
UPDATE ansatt_aud SET slutt_dato = '1850-01-01' WHERE slutt_dato IS NOT NULL AND slutt_dato < '1850-01-01';
UPDATE ansatt SET slutt_dato = '2100-01-01' WHERE slutt_dato IS NOT NULL AND slutt_dato > '2100-01-01';
UPDATE ansatt_aud SET slutt_dato = '2100-01-01' WHERE slutt_dato IS NOT NULL AND slutt_dato > '2100-01-01';

UPDATE ansatt SET ansiennitets_dato = '1850-01-01' WHERE ansiennitets_dato IS NOT NULL AND ansiennitets_dato < '1850-01-01';
UPDATE ansatt_aud SET ansiennitets_dato = '1850-01-01' WHERE ansiennitets_dato IS NOT NULL AND ansiennitets_dato < '1850-01-01';
UPDATE ansatt SET ansiennitets_dato = '2100-01-01' WHERE ansiennitets_dato IS NOT NULL AND ansiennitets_dato > '2100-01-01';
UPDATE ansatt_aud SET ansiennitets_dato = '2100-01-01' WHERE ansiennitets_dato IS NOT NULL AND ansiennitets_dato > '2100-01-01';

DELETE FROM resource_message WHERE message_key = 'validation.within_4_centuries';
INSERT INTO resource_message (message_key, message_value, key_bundle) SELECT 'validation.within_4_centuries', 'Datoen kan ikke være mer enn 200 år tilbake eller framover i tid', id FROM resource_bundle WHERE locale = 'no';
INSERT INTO resource_message (message_key, message_value, key_bundle) SELECT 'validation.within_4_centuries', 'The date can not be more than 200 years back or forward in time', id FROM resource_bundle WHERE locale = 'en';
INSERT INTO resource_message (message_key, message_value, key_bundle) SELECT 'validation.within_4_centuries', 'Datumet kan inte vara mera än 200 år tillbaka eller framåt i tid', id FROM resource_bundle WHERE locale = 'sv';
-- rollback ;

We are using the following encoding setting in our Maven pom.xml:

...
<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
...

All our source files are stored using UTF-8 encoding.

Expected/Desired Behavior

The checksum calculation should not be affected by the client terminal's language/locale settings. It should use the source file encoding configured in the Maven project (in our case UTF-8) to ensure stable builds across environments.

@ckulenkampff

I am not sure, maybe this is related to #1760?

@mauritz-lovgren
Author

> I am not sure, maybe this is related to #1760?

We are using the <include file="<uniquefilenamehere>.sql" relativeToChangelogFile="true"/> tag, which does not seem to have the encoding attribute that the sqlFile tag has (compare https://docs.liquibase.com/concepts/advanced/include.html with https://docs.liquibase.com/change-types/community/sql-file.html).

@ckulenkampff

Ah, I see. As far as I can see, liquibase.parser.core.sql.SqlChangeLogParser.parse(String, ChangeLogParameters, ResourceAccessor) uses the default file encoding when parsing SQL files; there is no concept of an encoding at all. So I don't think it's just the checksum calculation: the whole file is parsed with the "wrong" encoding. I guess the INSERT would contain wrongly rendered characters when you execute it against a fresh database. The wrong checksum is just a symptom of this.
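The effect described here can be reproduced outside Liquibase with a minimal Java sketch (hypothetical class and method names; this is not SqlChangeLogParser's code). A file is written as UTF-8 and read back with an explicit UTF-8 charset, with ISO-8859-1 as an example of a wrong charset, and with whatever the JVM default happens to be. Decoding with ISO-8859-1 turns each of ä and å into two characters, which is the text that would end up in the INSERT.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SqlFileDecoding {
    // Reads a changelog file with an explicitly chosen charset instead of the JVM default.
    static String readSql(Path file, Charset charset) throws IOException {
        return new String(Files.readAllBytes(file), charset);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("changeset", ".sql");
        try {
            // Store the Swedish message as UTF-8, like the reporter's source files.
            String original = "'Datumet kan inte vara mera \u00e4n 200 \u00e5r tillbaka'";
            Files.write(file, original.getBytes(StandardCharsets.UTF_8));

            String explicitUtf8 = readSql(file, StandardCharsets.UTF_8);
            String wrongCharset = readSql(file, StandardCharsets.ISO_8859_1);
            String platformDefault = readSql(file, Charset.defaultCharset());

            System.out.println("explicit UTF-8 round-trips: " + explicitUtf8.equals(original)); // true
            // a-umlaut (U+00E4) comes back as U+00C3 U+00A4: mojibake that would be inserted.
            System.out.println("ISO-8859-1 round-trips:     " + wrongCharset.equals(original)); // false
            System.out.println("default " + Charset.defaultCharset() + " round-trips: "
                    + platformDefault.equals(original)); // depends on the environment
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```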

@mauritz-lovgren
Author

Yes, I suspect that the SQL statements are affected as well. It seems that LC_CTYPE affects the encoding used during the Maven build, even though UTF-8 has been specified in the pom.xml. Maybe I need to set other encoding options for Maven/Java as well, like -Dfile.encoding=UTF-8.
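Whether a flag such as -Dfile.encoding=UTF-8 actually reaches the JVM that runs the plugin can be checked with a small diagnostic like the one below (a hypothetical helper, not part of Liquibase or Maven). It reports the file.encoding property, the resulting default charset, and the locale variables discussed in this thread. Since the Maven plugin runs inside Maven's own JVM, the flag would presumably need to go into MAVEN_OPTS rather than be passed as mvn -D..., because the default charset is normally fixed when the JVM starts.

```java
import java.nio.charset.Charset;

public class EncodingReport {
    // Builds a short report of the encoding-related settings this JVM sees.
    static String report() {
        StringBuilder sb = new StringBuilder();
        sb.append("file.encoding  = ").append(System.getProperty("file.encoding")).append('\n');
        sb.append("defaultCharset = ").append(Charset.defaultCharset()).append('\n');
        // Locale variables from the shell; "null" means the variable is not set.
        for (String name : new String[] {"LANG", "LC_ALL", "LC_CTYPE"}) {
            sb.append(name).append(" = ").append(System.getenv(name)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Running it as LC_CTYPE=nb_NO.UTF-8 java EncodingReport and again as java -Dfile.encoding=UTF-8 EncodingReport would show whether the locale or the flag determines the default charset on a given machine.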

@molivasdat
Contributor

Hi @mauritz72, thanks for alerting us to this issue. We will add it to our list of issues to process. In the meantime, it looks like a workaround is to set LC_CTYPE=UTF-8 in your environments.

@FBurguer

There are a few things to discuss on this issue:

  1. Liquibase ignores <sourceEncoding>.
  2. I got this message, which is incorrect: [INFO] Char encoding not set! The created file will be system dependent!
    It is not system dependent; it defaults to UTF-8.
  3. It is true that Liquibase is looking at LC_CTYPE, and it shouldn't.

We are going to investigate these points further.

@FBurguer

I got this incorrect message when setting LC_CTYPE=nb_NO.UTF-8: [INFO] Char encoding not set! The created file will be system dependent! We are planning to fix this in the near future.

Current versions of Liquibase default to UTF-8 rather than the system charset, which was the default from 4.0 until about 4.6. To use a different default encoding, use the liquibase.fileEncoding setting. If you still see Liquibase not defaulting to UTF-8 and/or not respecting the fileEncoding setting in the current version, let us know.

Thanks!
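The resolution rule described in this comment (UTF-8 unless the liquibase.fileEncoding setting says otherwise, never the locale) can be sketched as follows. This is a hypothetical helper, not Liquibase's configuration code, and it assumes the setting is passed as a JVM system property named liquibase.fileEncoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FileEncodingSetting {
    // Hypothetical resolution rule: an explicit setting wins, otherwise UTF-8.
    // The platform/locale default is deliberately never consulted.
    static Charset resolve(String configured) {
        if (configured == null || configured.isBlank()) {
            return StandardCharsets.UTF_8;
        }
        // Throws IllegalCharsetNameException or UnsupportedCharsetException for bad names.
        return Charset.forName(configured.trim());
    }

    public static void main(String[] args) {
        // Assumption: the setting arrives as -Dliquibase.fileEncoding=...
        Charset cs = resolve(System.getProperty("liquibase.fileEncoding"));
        System.out.println("changelog files will be decoded as " + cs);
    }
}
```

Using a fixed constant as the fallback, instead of Charset.defaultCharset(), is what keeps decoding (and therefore checksums) identical on machines with different LC_CTYPE values.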

@FBurguer FBurguer self-assigned this Aug 19, 2022
@nvoxland
Contributor

I created #3189 to remove the problem warning, so closing this issue.

Labels
None yet
Projects
Archived in project
Development

No branches or pull requests

6 participants