Unexpected results on repo with filename case changes #24

Closed
m4tiz5zktonj opened this issue Nov 27, 2019 · 7 comments

@m4tiz5zktonj

Hello.

I've encountered strange results after running filter-repo on a repo that contains commits with filename case changes.

I'm using the latest stable releases of git, python, and filter-repo on Windows 10 1909.

$ git --version
git version 2.24.0.windows.2

$ python --version
Python 3.8.0

$ python git_filter_repo.py --version
f3e8e0f8a87c

Steps to reproduce:

  1. Initialize new repo.
git init test
cd test
  1. Create test.txt and trash.txt files.
echo test > test.txt
echo trash > trash.txt
git add .
git commit -m "add test.txt and trash.txt"
  1. Rename test.txt to Test.txt. I had to do it via two renames and an amend, because I don't know another way to do that, at least on Windows.
git mv test.txt Test1.txt
git commit -m "rename test.txt -> Test.txt"
git mv Test1.txt Test.txt
git commit --amend --no-edit
  1. Remove trash.txt file.
git rm trash.txt
git commit -m "remove trash.txt"
  1. Modify Test.txt.
echo more >> Test.txt
git add .
git commit -m "add more to Test.txt"
  1. Rename Test.txt back to test.txt.
git mv Test.txt test2.txt
git commit -m "rename Test.txt -> test.txt"
git mv test2.txt test.txt
git commit --amend --no-edit
  1. Modify test.txt again.
echo new >> test.txt
git add .
git commit -m "add new to test.txt"
  1. Show the history.
git log --oneline -p

It should be similar to this:

f7e0103 (HEAD -> master) add new to test.txt
diff --git a/test.txt b/test.txt
index 9fbef1c..eae8904 100644
--- a/test.txt
+++ b/test.txt
@@ -1,2 +1,3 @@
 test
 more
+new

715bc1e rename Test.txt -> test.txt
diff --git a/Test.txt b/test.txt
similarity index 100%
rename from Test.txt
rename to test.txt

799cdea add more to Test.txt
diff --git a/Test.txt b/Test.txt
index 9daeafb..9fbef1c 100644
--- a/Test.txt
+++ b/Test.txt
@@ -1 +1,2 @@
 test
+more

695b3d9 remove trash.txt
diff --git a/trash.txt b/trash.txt
deleted file mode 100644
index fad67c0..0000000
--- a/trash.txt
+++ /dev/null
@@ -1 +0,0 @@
-trash

f375c0e rename test.txt -> Test.txt
diff --git a/test.txt b/Test.txt
similarity index 100%
rename from test.txt
rename to Test.txt

be4133d add test.txt and trash.txt
diff --git a/test.txt b/test.txt
new file mode 100644
index 0000000..9daeafb
--- /dev/null
+++ b/test.txt
@@ -0,0 +1 @@
+test
diff --git a/trash.txt b/trash.txt
new file mode 100644
index 0000000..fad67c0
--- /dev/null
+++ b/trash.txt
@@ -0,0 +1 @@
+trash
  1. Show the tree's id and contents of HEAD commit.
$ git log --format=raw -1 | grep tree | cut -d ' ' -f 2
b90b63e43b3accb1add5108e94f8f394bf4f4146

$ git ls-tree $(git log --format=raw -1 | grep tree | cut -d ' ' -f 2)
100644 blob eae8904154c5ee09ed95ad74668597f83b8059fc    test.txt
  1. Now run filter-repo to completely remove trash.txt from history.
python git_filter_repo.py --path trash.txt --invert-paths --force

Expected results:

  • trash.txt is completely gone from history;
  • the tree object of the current HEAD is the same as it was before running filter-repo;
  • test.txt is in the tree of HEAD and its history contains single-line additions in the appropriate commits.
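
The second expectation can be checked mechanically. A minimal sketch, assuming it runs inside the repository; `head_tree` is a hypothetical helper name, and `git rev-parse 'HEAD^{tree}'` prints the same id that the log/grep/cut pipeline in the steps above extracts:

```shell
# Hypothetical helper: print the tree id of the current HEAD commit.
head_tree() {
    git rev-parse 'HEAD^{tree}'
}

# Intended use (not executed here, since it rewrites history):
#   before=$(head_tree)
#   python git_filter_repo.py --path trash.txt --invert-paths --force
#   after=$(head_tree)
#   [ "$before" = "$after" ] && echo "tree unchanged" || echo "tree changed"
```

Removing trash.txt should leave the HEAD tree untouched here because the file was already deleted before the last commit.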

Actual results:

  • trash.txt is completely gone from history: everything is OK here;
  • tree object of current HEAD differs: see below;
  • test.txt is not in the tree of HEAD; it is replaced with Test.txt, and its history contains multi-line additions: have a look at the diffs of the "rename test.txt -> Test.txt" and "rename Test.txt -> test.txt" commits below.

Here are git log and git ls-tree outputs on modified repo:

a73a788 (HEAD -> master) add new to test.txt
diff --git a/Test.txt b/Test.txt
index 9fbef1c..eae8904 100644
--- a/Test.txt
+++ b/Test.txt
@@ -1,2 +1,3 @@
 test
 more
+new

f1af009 rename Test.txt -> test.txt
003e481 add more to Test.txt
diff --git a/Test.txt b/Test.txt
new file mode 100644
index 0000000..9fbef1c
--- /dev/null
+++ b/Test.txt
@@ -0,0 +1,2 @@
+test
+more

2369434 rename test.txt -> Test.txt
diff --git a/test.txt b/test.txt
deleted file mode 100644
index 9daeafb..0000000
--- a/test.txt
+++ /dev/null
@@ -1 +0,0 @@
-test

a5ec46d add test.txt and trash.txt
diff --git a/test.txt b/test.txt
new file mode 100644
index 0000000..9daeafb
--- /dev/null
+++ b/test.txt
@@ -0,0 +1 @@
+test
$ git log --format=raw -1 | grep tree | cut -d ' ' -f 2
b90b63e43b3accb1add5108e94f8f394bf4f4146

$ git ls-tree $(git log --format=raw -1 | grep tree | cut -d ' ' -f 2)
100644 blob eae8904154c5ee09ed95ad74668597f83b8059fc    Test.txt

I think this behavior of filter-repo is not intended. Or am I missing something?

Thanks in advance.

@newren
Owner

newren commented Nov 28, 2019

What's the output of git config core.ignorecase? I wonder if I need to override that and set it to false...

@m4tiz5zktonj
Author

$ git config --local core.ignoreCase
true

$ git config --global core.ignoreCase
<empty output>

I haven't changed it, so it falls back to the defaults of the git installation and of the .git folder template used for new repositories.

@newren
Owner

newren commented Nov 28, 2019

Yeah, that's what I expected. When you clone a repository, git attempts to check if the filesystem is case-insensitive and sets that config value at clone time for the local repo accordingly.
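
To illustrate the kind of probe described above, here is a rough sketch. It is not git's actual detection code; `fs_is_case_insensitive` and the probe file names are made up for this example. It creates a file in a scratch directory and checks whether the same name in a different case resolves to it:

```shell
# Hypothetical probe: returns 0 if the filesystem holding "$1" (default: the
# current directory) treats names case-insensitively, 1 if not, 2 on error.
fs_is_case_insensitive() {
    base=${1:-.}
    dir=$(mktemp -d "$base/caseprobe.XXXXXX") || return 2
    : > "$dir/probe"
    # On a case-insensitive filesystem the upper-case name finds the same file.
    if [ -e "$dir/PROBE" ]; then result=0; else result=1; fi
    rm -rf "$dir"
    return "$result"
}

# Example use:
#   if fs_is_case_insensitive .; then echo "case-insensitive"; fi
```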

It looks like git in config.c sets ignore_case based on the setting of core.ignoreCase, and then fast-import.c uses fspathncmp() to compare entries, which when ignore_case is true translates to strncasecmp(). That means fast-import is treating two files within a single commit as the same file. fast-export will emit both a delete-old-file and create-new-file-with-same-contents directive (likely in alphabetical order of the filenames) whenever it sees a rename, but fast-import is basically ignoring whichever of those directives came first since it treats them as the same file and considers the second as an override.
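
A toy model of that effect, for illustration only. This is not fast-import's code; `apply_directives` and the simplified `D path` / `M path` directive lines are assumptions made for the sketch. It keeps only the last directive seen for each path and optionally folds case when comparing paths:

```shell
# Toy model: read "D path" / "M path" lines on stdin and keep the last
# directive per path. With fold=1, paths are compared case-insensitively,
# so a later directive overrides an earlier one that differs only in case.
apply_directives() {
    fold=$1
    awk -v fold="$fold" '
        {
            key = (fold == 1) ? tolower($2) : $2
            if (!(key in seen)) { order[++n] = key; seen[key] = 1 }
            last[key] = $0
        }
        END { for (i = 1; i <= n; i++) print last[order[i]] }
    '
}

# Case-sensitive comparison keeps both halves of the rename:
printf 'D test.txt\nM Test.txt\n' | apply_directives 0
# Case-insensitive comparison collapses them into a single directive:
printf 'D test.txt\nM Test.txt\n' | apply_directives 1
```

With case folding on, one half of the rename is lost, which is consistent with the broken "rename" commits in the report.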

Can you try unsetting the local setting of core.ignoreCase (git config --unset core.ignoreCase), then running filter-repo, then setting core.ignoreCase again (git config core.ignoreCase true)?
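
Those three steps could be wrapped so the original value is always put back. A minimal sketch, assuming it runs inside the repository; `with_ignorecase_unset` is a hypothetical helper name, and it only touches the repo-local setting, so a global or system-level core.ignoreCase would still apply:

```shell
# Hypothetical helper: run a command with the repo-local core.ignoreCase
# unset, then restore the previous value (if there was one).
with_ignorecase_unset() {
    old=$(git config --local --get core.ignoreCase) || old=
    if [ -n "$old" ]; then
        git config --local --unset core.ignoreCase
    fi
    "$@"
    status=$?
    if [ -n "$old" ]; then
        git config --local core.ignoreCase "$old"
    fi
    return "$status"
}

# Example use:
#   with_ignorecase_unset python git_filter_repo.py --path trash.txt --invert-paths --force
```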

@m4tiz5zktonj
Author

Yes, you're right.
With core.ignoreCase unset, everything works fine on the test repo: the tree is the same and git log is correct.
On my real giant repo everything seems OK too; at least the tree has not changed, as I expected, and the history of several problematic files looks normal.
Thank you very much!

@vfarafonov

Would be great to add this to the manual or readme. Or even embed it in the script.

I am using git-filter-repo as part of a migration of two Android repositories to a monorepo. I had issues with the project building on the CI server (hosted on Linux) while it was running fine on my local machine (macOS). I had to spend almost two days to figure out that somewhere in the middle of a path a directory name was upper case, while it is lower case in the original repo.
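
A quick way to hunt for this kind of mismatch is to list paths that differ only by case. A rough sketch; `case_variants` is a hypothetical helper, not part of git or filter-repo, and it works on any newline-separated path list:

```shell
# Hypothetical helper: read paths on stdin and print every spelling whose
# lower-cased form is shared with at least one other spelling.
case_variants() {
    LC_ALL=C sort -u | awk '
        { key = tolower($0); count[key]++; paths[key] = paths[key] $0 "\n" }
        END { for (key in count) if (count[key] > 1) printf "%s", paths[key] }
    ' | LC_ALL=C sort
}

# Example use: every path ever touched in history that has a case variant.
#   git log --all --name-only --format= | case_variants
```

Feeding it the combined `git ls-tree -r --name-only` output of the original and the rewritten repository would flag files and directories whose case changed during the rewrite.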

@newren
Owner

newren commented Dec 4, 2019

Would be great to add this to the manual or readme. Or even embed it in the script.

I am using git-filter-repo as part of a migration of two Android repositories to a monorepo. I had issues with the project building on the CI server (hosted on Linux) while it was running fine on my local machine (macOS). I had to spend almost two days to figure out that somewhere in the middle of a path a directory name was upper case, while it is lower case in the original repo.

Sorry that insane broken-by-design filesystems caused you so much pain; it's a pity we can't just get rid of those and instead have to waste so many engineering cycles working around them. But since they won't be going away, it does make sense to try to work around them where practical. Anyway, embedding the workaround in git-filter-repo is precisely the plan; this issue was left open as a reminder to do that.

@newren
Owner

newren commented Dec 27, 2019

Okay, I added '-c core.ignorecase=false' to the command line for the git fast-import invocation, which should prevent anyone else from running into this issue (well, assuming they're using the version from master or some future release of filter-repo).
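
For anyone driving fast-export and fast-import by hand, the same per-invocation override can be sketched as below. This is not filter-repo's code; `reimport_case_sensitive` and the destination path are made up for the example, which copies all history into a fresh repository:

```shell
# Sketch: export every ref from the current repo and import it into a new
# repo at "$1", forcing case-sensitive path handling for this one
# fast-import run regardless of what core.ignoreCase is set to there.
reimport_case_sensitive() {
    dest=$1
    git init -q "$dest" &&
    git fast-export --all |
        git -C "$dest" -c core.ignorecase=false fast-import --quiet
}

# Example use:
#   reimport_case_sensitive ../test-copy
```

The `-c` option applies only to that single git command, so nothing in either repository's config file is changed.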

Thanks for the detailed report, @m4tiz5zktonj !
