Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore line endings when calculating checksums #253

Closed
flyway opened this issue Jun 25, 2013 · 36 comments
Closed

Ignore line endings when calculating checksums #253

flyway opened this issue Jun 25, 2013 · 36 comments

Comments

@ghost
Copy link
Collaborator

@ghost ghost commented Jun 25, 2013

Original author: micha...@utest.com (September 28, 2011 15:49:50)

We ran into the following problem.

Developer on Mac/Linux creates a sql migration file and checks it into Git. It runs on production (which is on Linux) just fine.

Then a Windows developer takes a copy of the production database and loads it into their local machine. But when they pulled the sql migration file from Git, it automatically converted the line endings to CRLF (Windows style). Now Flyway validate and migrate fail saying there is a checksum mismatch.

One solution I can think of would be to ignore line endings when doing the checksum.

Original issue: http://code.google.com/p/flyway/issues/detail?id=167

@ghost
Copy link
Collaborator Author

@ghost ghost commented Jun 25, 2013

From axel.fontaine.business@gmail.com on January 02, 2012 07:04:09
There are two workarounds for this:

  • Don't use Git with core.autocrlf
  • Set the checksum to null on the DEV DB after the restore from PROD

Would any of these be a workable solution for you?

@ghost
Copy link
Collaborator Author

@ghost ghost commented Jun 25, 2013

From micha...@utest.com on January 03, 2012 15:02:13
Hi Axel,

Thanks for looking into it.

The workaround that we went with was updating our Maven configuration to always convert line endings to unix style on all of our db scripts before running Flyway on them. It is working fine for us, so this is not an important issue any more.

@flyway flyway closed this Jun 25, 2013
@emmanueltouzery
Copy link

@emmanueltouzery emmanueltouzery commented Oct 23, 2014

would you accept a patch (on which I didn't even start looking into) that would change flyway so that it would compute the checksum on the SQL string with all the \r removed? That would effectively fix the problem completely and allow the checksums to perform their purpose (check which SQL queries were run) instead of making sure the companies using the tool use a single OS or have a perfect git autocrlf setup :-)

@axelfontaine
Copy link
Contributor

@axelfontaine axelfontaine commented Oct 23, 2014

The only kind of patch I would I accept, would have to be transparent for existing users. Forcing them to recaclculate the checksums or perform other manual actions would not be acceptable.

@emmanueltouzery
Copy link

@emmanueltouzery emmanueltouzery commented Oct 23, 2014

We can think of two possible implementations:

  1. new option, off by default, that computes checksums without \r and \n. Requires resetting existing checksums in DBs or to apply only on new databases.
  2. change the default path, no option: instead of comparing the stored checksum to the checksum of the just-applied migration, we'd compare the stored checksum to the checksum of the just-applied migration, the just-applied migration with carriage returns forced to CR and the just-applied migration with carriage returns forced to CRLF, accepting any of those as valid.

the first solution would also "fix" a case we actually spotted, that a developer made a migration with mixed carriage returns, having both CR and CRLF in a single file. But it does break backwards compatibility and introduces a new codepath through that option.

Would you accept any of those?

@axelfontaine
Copy link
Contributor

@axelfontaine axelfontaine commented Oct 23, 2014

To be honest, neither sounds very appealing.

  1. I'd like to keep the number of configuration options to a minimum, replacing themm with intelligent solutions where possible
  2. This sounds like a recipe for disaster with edge-cases (current db checksums with mixed line-endings, ...)

What I may accept would look like this:

  • checksums are always computed after stripping \r
  • on first run with this new functionality (part of 4.0 upgrade?), checksums in DB are aligned to those in file system. This must be rock solid to avoid major disruptions.

Cheers
Axel

@emmanueltouzery
Copy link

@emmanueltouzery emmanueltouzery commented Oct 23, 2014

By the way, note that the git crlf options do not in fact fully solve the problem. The reason is that in the repository and for developers which pull the migrations, the CRLF will be fixed. But git does not overwrite the file that the author wrote on his own disk. And if he runs flyway migrate from his own machine as the author, git offers us no guarantees as to which line ending he has or not.

@emmanueltouzery
Copy link

@emmanueltouzery emmanueltouzery commented Oct 23, 2014

That's indeed the best solution, it does sounds like more work than what we envisioned. Any ETA for 4.0 so that we consider our options?

@axelfontaine
Copy link
Contributor

@axelfontaine axelfontaine commented Oct 23, 2014

6 months from now more or less...

On Thu, Oct 23, 2014 at 10:41 AM, emmanueltouzery notifications@github.com
wrote:

That's indeed the best solution, it does sounds like more work than what
we envisioned. Any ETA for 4.0 so that we consider our options?


Reply to this email directly or view it on GitHub
#253 (comment).

Boxfuse GmbH, Groffstraße 14, 80638 München, Deutschland
USt-IdNr.: DE284939966, St.Nr.: 143/121/71601
Geschäftsführer: Axel Fontaine, Handelsgericht: München, HRB 200564

@emmanueltouzery
Copy link

@emmanueltouzery emmanueltouzery commented Oct 23, 2014

Given the scope and the late reward, not sure we'll do it, but it's good to have a plan and we'll see. Thank you!

@axelfontaine
Copy link
Contributor

@axelfontaine axelfontaine commented Oct 23, 2014

You also have the option to sponsor the development of this feature, if you
would like me to do the work. Besides generally helping the project, you
would get it sooner, without the need to analyse, design, develop, test,
maintain and document this feature yourself. Feel free to contact me by
email (axel@boxfuse.com) if you are interested.

On Thu, Oct 23, 2014 at 10:42 AM, Axel Fontaine axel@boxfuse.com wrote:

6 months from now more or less...

On Thu, Oct 23, 2014 at 10:41 AM, emmanueltouzery <
notifications@github.com> wrote:

That's indeed the best solution, it does sounds like more work than what
we envisioned. Any ETA for 4.0 so that we consider our options?


Reply to this email directly or view it on GitHub
#253 (comment).

Boxfuse GmbH, Groffstraße 14, 80638 München, Deutschland
USt-IdNr.: DE284939966, St.Nr.: 143/121/71601
Geschäftsführer: Axel Fontaine, Handelsgericht: München, HRB 200564

Boxfuse GmbH, Groffstraße 14, 80638 München, Deutschland
USt-IdNr.: DE284939966, St.Nr.: 143/121/71601
Geschäftsführer: Axel Fontaine, Handelsgericht: München, HRB 200564

@sebdotv
Copy link

@sebdotv sebdotv commented Nov 10, 2014

sadly, I too was expecting Flyway to ignore EOL when computing checksums

@axelfontaine axelfontaine reopened this Nov 10, 2014
@axelfontaine axelfontaine added this to the Flyway 4.0 milestone Nov 10, 2014
@kylecordes
Copy link

@kylecordes kylecordes commented Feb 1, 2015

I am really surprised to run in to this problem - it is exactly the sort of thing I expected to "Just Work".

I am even more surprised to see resistance to fixing it!

Hopefully the suggest workaround of converting all files to LF-only, before letting Flyway see them, won't be too awful to hack together.

@axelfontaine
Copy link
Contributor

@axelfontaine axelfontaine commented Feb 1, 2015

Hi Kyle,

you may be surprised and you're probably thinking that since it's only a matter of adding script.replace("\r", "") it should be fixed in an instant. However there is more than meets the eye. This seemingly trivial fix invalidates the metadata table on every single Flyway installation out there. And so great care must be taken to make this transition as smooth as possible, with no surprises. And since Flyway follows the tried and true principles of semantic versioning, it is something that'll have to wait until the next major release.

Cheers
Axel

@kylecordes
Copy link

@kylecordes kylecordes commented Feb 1, 2015

While that smallest fix has the problem above, the software is already completely broken in the way you describe for anyone working across platforms. So while invalidating metadata is a bad thing, it is already invalidated for lots of people.

There is an easy fix could be made, even without a major version: add an optional setting that brings in the new behavior now. Well I very much agree with the bias against adding settings, a "stop being completely broken on cross platform projects" setting would be welcomed, even by the most vigorous opponent of settings.

There is also a more involved approach, which avoids the setting and produces immediately good results: start calculating the checksum both ways, and accept either is a match. This would be perfectly backward-compatible, so it would not require a major version, and it would fix the problem without invalidating any old metadata.

@axelfontaine
Copy link
Contributor

@axelfontaine axelfontaine commented Feb 2, 2015

Hi Kyle,

Truth be told the problem is really Git's "convenient" autocrlf feature. However, since it dominates the market, I agree we need to compensate for it within Flyway, so that users have a good experience.

Adding a temporary band-aid property in 3.2 is not an option. And unfortunately your last suggestion, which sounds very good on paper, is not fail-proof either. Example:

A windows & a linux user share the same DB. The windows user migrates it first with CRLF files. The linux user can now never validate the migrations. That is unless we go further and re-add the CRs in the migration files. But then by that point I must say, let's all just wait until April when this is fixed properly in 4.0 :-)

Cheers
Axel

@rallengrc
Copy link

@rallengrc rallengrc commented Aug 27, 2015

There are two workarounds for this:
Don't use Git with core.autocrlf
Set the checksum to null on the DEV DB after the restore from PROD

I set the checksums to null and now its more broken than before I did that. What can I do to get the checksums recalculated? I'm running flyway from a WAR file on deployment and it breaks the app since it won't start.

@rallengrc
Copy link

@rallengrc rallengrc commented Aug 27, 2015

@IRus Thanks, I did not realize that command could be run from WAR file but it did work with beans:bean class="org.flywaydb.core.Flyway" init-method="repair" id="flyway" bean entry in my spring config

jsklassen pushed a commit to jsklassen/flyway that referenced this issue Sep 16, 2015
@liefke
Copy link

@liefke liefke commented Oct 23, 2015

If I understand this thread correctly: The only reason that prevents fixing this issue is backward compatibility.

It seems you need to think about a strategy for working with backward compatibility anyway:

  • Either keep the structure and format of schema_version for all times
  • Or allow changes for major versions and provide a migration tool
  • Or add a flyway_version column to schema_version which helps to identify the format of the corresponding row

From my point of view the first option is the worst one, because it prevents improvements. I'm indifferent about the last two, because the migration tool would need an explicit action but keeps the remaining code clean from any "historic garbage", while the last one could help to build automatic migration or version switches.

@hohwille
Copy link

@hohwille hohwille commented Dec 15, 2015

I also ran into this problem and am quite annoyed. Accepting #1031 is better than doing nothing.
Even better would be to add a property to disable ignoring of CR so one could gain backward compatibility and add this to the release notes.

@mymasse
Copy link

@mymasse mymasse commented Dec 18, 2015

Running into this problem just now... Some devs use Windows and some Linux having a flag to make it ignore CR would be the best solution... And like previous commenters would make it backwards compatible.

@valery-barysok
Copy link

@valery-barysok valery-barysok commented Dec 29, 2015

Running into this issue too...Sadly that it has no option for resolving this issue. And everyone have to do ugly fixes for that. For instance, put fixed SqlMigrationResolver on classpath before flyway or repack flyway during build process.

I vote for internal option (that will be removed in next version) that can ignore LE.

@mac2000
Copy link

@mac2000 mac2000 commented Jan 2, 2016

Wondering can it be done in some kind of pluggable way. E.g. will be nice if I will be able to put some kind of jar somewhere with custom checksum calculation to override native one.

In my case I do want to strip not only new line symbols but also all space symbols before checksum calculation to avoid tab vs. space indentation in different editors.

@kylecordes
Copy link

@kylecordes kylecordes commented Jan 2, 2016

Stripping all whitespace is great - clearly better than stripping only just CR.

This project has a bunch of recent commits, maybe there is hope for this long-known issue!

axelfontaine added a commit to flyway/flywaydb.org that referenced this issue Jan 20, 2016
@axelfontaine
Copy link
Contributor

@axelfontaine axelfontaine commented Jan 20, 2016

Fixed. Line-endings are ignored and individual lines are stripped of leading and trailing whitespace before calculating the checksum. The upgrade to 4.0 will force a full checksum recalculation on existing metadata tables.

@mac2000
Copy link

@mac2000 mac2000 commented Jan 23, 2016

@axelfontaine it is great news but to be fair that fix not going to solve issue (at least if we are speaking about white space symbols), for example if you are using something like SQL Management Studio - it will format your code in one way, but if in same time I will use something like DataGrip it will reformat code in another way. For example code like this:

CREATE TABLE Acme (
....Username.NVARCHAR(100),
....Age.INT
)

may become

CREATE TABLE Acme (
....Username.NVARCHAR(100),
....Age......INT
)

and checksums will be broken.

To me its seems that all whitespace symbols should be trimmed before calculating checksums rather than forcing people to use tabs/spaces/editorconfigs

@mymasse
Copy link

@mymasse mymasse commented Jan 25, 2016

I don't agree with you @mac2000
this:

insert into table values ('....Test....');

Is not the same as:

insert into table values ('Test');

You can't blindly trim all whitespace. Your problem is that both formatters are not the same. This fix resolve GIT line endings being different on Windows vs Linux.

@mac2000
Copy link

@mac2000 mac2000 commented Jan 25, 2016

@mymasse you are totally right, but that can be respected while trimming all other whitespaces, and sorry for offtopic just wanted to share my thoughts of how can checksum calculation be improved.

@axelfontaine
Copy link
Contributor

@axelfontaine axelfontaine commented Feb 1, 2016

Come to think of it, I have now also removed the line trimming as it could potentially lead to different multiline string literals have the same checksum. Think of this issue as a workaround for the Git autocrlf "feature".

And yes, in an ideal world, Flyway would semantically parse the SQL file and ignore differences in formatting, spaces vs tabs, header comments (but not those inside Oracle packages and not the MySQL comment directives), complete vs concatenated strings with the same contents, etc.

However to avoid opening up a bottomless pit, for Flyway 4.0 this will be an autocrlf workaround and nothing more.

@mymasse
Copy link

@mymasse mymasse commented Feb 1, 2016

@axelfontaine Wow missed that one from your original comment, thought you fixed it by only ignoring line endings... By the way when roughly can we expect version 4.0? Couldn't find any info about a release date

@hohwille
Copy link

@hohwille hohwille commented Feb 1, 2016

However to avoid opening up a bottomless pit, for Flyway 4.0 this will be an autocrlf workaround and nothing more.

Feedback: This is totally fine with me. One can use additional tools to keep SQLs normalized. You could also say the normalizing EOL can be done by tools such as git. However, this is a very common pitfall and can easily happen when building on different platforms (one release on windows and one on linux). As this one is easy to fix, it is great that flyway is doing so and saving lots of headaches in the future.
In an ideal world it would also be nice to not take pointless whitespace changes or comment changes into checksum but at least this should be out of scope for this issue that needs a more urgent fix.

@ctapobep
Copy link

@ctapobep ctapobep commented Feb 25, 2016

Waiting to see Flyway 4.0, this fix will be very helpful :)

@kiview
Copy link

@kiview kiview commented Mar 1, 2016

How does this fix perform when upgrading to 4.0? Will it still be compatible to existing checksums?

@axelfontaine
Copy link
Contributor

@axelfontaine axelfontaine commented Mar 1, 2016

@kiview All checksums are automatically recalculated and updated with the new algorithm on first run

@steliomo
Copy link

@steliomo steliomo commented Mar 7, 2018

Hi I simple fix this issue by running flyway:repair then flyway:migrate

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet