Add support for file names that contain international characters. #200

Merged
merged 3 commits into from Aug 12, 2012

5 participants

Contributor

Greetings!

We are using git-tfs on our current project and are very happy with it! However, some of our file names contain Norwegian characters. When we clone and pull from TFS, these file names come out correct on disk but wrong in the git trees, so git always lists the actual files as untracked. Since recent git versions store file names as UTF-8 in tree objects, this patch fixes the problem, and it has worked well for us so far.

People who use only ASCII characters in their file names should be completely unaffected by this patch, since any string containing only ASCII characters encodes to the same byte sequence in ASCII and UTF-8.
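Here is a minimal Python sketch of the byte-level mismatch described above. It assumes that the names ended up in the git trees in the Windows ANSI code page cp1252 instead of UTF-8. The thread does not state this, so treat it as a hypothesis. The sample names come from the tests discussed later in the thread.

```python
# Sketch: the same file name as bytes in UTF-8 (what recent git expects in
# tree objects) versus cp1252 (ASSUMED legacy Windows ANSI code page).
INTERNATIONAL = "ÆØÅ/äöü.txt"
ASCII_ONLY = "Folder/File.txt"


def encodings_agree(name: str) -> bool:
    """True if the UTF-8 and cp1252 byte sequences for name are identical."""
    return name.encode("utf-8") == name.encode("cp1252")


def decodes_as_utf8(raw: bytes) -> bool:
    """True if raw is a valid UTF-8 byte sequence."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    # Non-ASCII names produce different bytes under the two encodings...
    print(encodings_agree(INTERNATIONAL))  # False
    # ...and the cp1252 bytes are not even valid UTF-8.
    print(decodes_as_utf8(INTERNATIONAL.encode("cp1252")))  # False
    # ASCII-only names are byte-for-byte identical in both encodings.
    print(encodings_agree(ASCII_ONLY))  # True
```

The last check is the compatibility argument from the paragraph above. For ASCII-only names the two encodings are indistinguishable, so switching to UTF-8 leaves their tree entries unchanged.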

Contributor
sc68cal commented Aug 8, 2012

Would you be willing to test out #181 as well? I'm tempted to merge both that PR and this and hope that we can put these issues to bed.

Contributor
sc68cal commented Aug 8, 2012

Sorry, I meant #182.

Owner
spraints commented Aug 8, 2012

Can you add a test to CloneTests for this?

Contributor

I will if I can get CloneProjectWithChangesets() to work. It fails on my machine with an unexpected hash (6ac86b02d2a5a51edcd34d6723db3ee9375df254 instead of dd806911118e6fa16d028b322ad91360d56ea47b), even before applying this patch. Do you have any idea what the cause may be? The test output is shown below.

PATH: C:\Applications\Development\git-tfs\GitTfsTest\bin\Debug;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Applications\Development\Git\cmd;C:\Applications\Development\Git\bin;c:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\;c:\Program Files\Microsoft SQL Server\100\Tools\Binn\;c:\Program Files\Microsoft SQL Server\100\DTS\Binn\;c:\Program Files (x86)\Microsoft ASP.NET\ASP.NET Web Pages\v1.0\;C:\Applications\Internet\PuTTY;c:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\VSShell\Common7\IDE\;c:\Program Files (x86)\Microsoft SQL Server\100\DTS\Binn\;C:\Applications\Development\git-tfs\GitTfs\bin\Debug
>> C:\Applications\Development\git\cmd\git.cmd tfs --debug clone http://does/not/matter $/MyProject

C:\Users\Eldhuset_a\AppData\Local\Temp\tmpCE7F.tmp>@git.exe %*
git command: Starting process: git init
Initialized empty Git repository in C:/Users/Eldhuset_a/AppData/Local/Temp/tmpCE7F.tmp/MyProject/.git/
git command time: [00:00:00.3440000] init
git command: Starting process: git config core.autocrlf false
git command time: [00:00:00.1810000] config core.autocrlf false
git command: Starting process: git config core.ignorecase false
git command time: [00:00:00.0790000] config core.ignorecase false
git command: Starting process: git config --list
git command time: [00:00:00.0880000] config --list
git command: Starting process: git config tfs-remote.default.url http://does/not/matter
git command time: [00:00:00.0400000] config tfs-remote.default.url http://does/not/matter
git command: Starting process: git config tfs-remote.default.repository $/MyProject
git command time: [00:00:00.0410000] config tfs-remote.default.repository $/MyProject
git command: Starting process: git config tfs-remote.default.fetch refs/remotes/default/master
git command time: [00:00:00.0420000] config tfs-remote.default.fetch refs/remotes/default/master
git command: Starting process: git config --list
git command time: [00:00:00.0450000] config --list
Fetching from TFS remote default
git command: Starting process: git log --no-color --pretty=medium refs/remotes/tfs/default
git stderr: fatal: ambiguous argument 'refs/remotes/tfs/default': unknown revision or path not in the working tree.
git stderr: Use '--' to separate paths from revisions
git command time: [00:00:00.0440000] log --no-color --pretty=medium refs/remotes/tfs/default
An error occurred while loading head refs/remotes/tfs/default (maybe it doesn't exist?): Sep.Git.Tfs.Core.GitCommandException: Command exited with error code: 128
   at Sep.Git.Tfs.Core.GitHelpers.Close(Process process) in C:\Applications\Development\git-tfs\GitTfs\Core\GitHelpers.cs:line 205
   at Sep.Git.Tfs.Core.GitHelpers.<>c__DisplayClass8.<CommandOutputPipe>b__7() in C:\Applications\Development\git-tfs\GitTfs\Core\GitHelpers.cs:line 59
   at Sep.Git.Tfs.Core.GitHelpers.Time(String[] command, Action action) in C:\Applications\Development\git-tfs\GitTfs\Core\GitHelpers.cs:line 182
   at Sep.Git.Tfs.Core.GitHelpers.CommandOutputPipe(Action`1 handleOutput, String[] command) in C:\Applications\Development\git-tfs\GitTfs\Core\GitHelpers.cs:line 54
   at Sep.Git.Tfs.Core.GitRepository.GetParentTfsCommits(String head, Boolean includeStubRemotes) in C:\Applications\Development\git-tfs\GitTfs\Core\GitRepository.cs:line 249
git command: Starting process: git log --no-color --pretty=medium refs/remotes/tfs/default..HEAD
git stderr: fatal: ambiguous argument 'refs/remotes/tfs/default..HEAD': unknown revision or path not in the working tree.
git stderr: Use '--' to separate paths from revisions
git command time: [00:00:00.0350000] log --no-color --pretty=medium refs/remotes/tfs/default..HEAD
An error occurred while loading head refs/remotes/tfs/default..HEAD (maybe it doesn't exist?): Sep.Git.Tfs.Core.GitCommandException: Command exited with error code: 128
   at Sep.Git.Tfs.Core.GitHelpers.Close(Process process) in C:\Applications\Development\git-tfs\GitTfs\Core\GitHelpers.cs:line 205
   at Sep.Git.Tfs.Core.GitHelpers.<>c__DisplayClass8.<CommandOutputPipe>b__7() in C:\Applications\Development\git-tfs\GitTfs\Core\GitHelpers.cs:line 59
   at Sep.Git.Tfs.Core.GitHelpers.Time(String[] command, Action action) in C:\Applications\Development\git-tfs\GitTfs\Core\GitHelpers.cs:line 182
   at Sep.Git.Tfs.Core.GitHelpers.CommandOutputPipe(Action`1 handleOutput, String[] command) in C:\Applications\Development\git-tfs\GitTfs\Core\GitHelpers.cs:line 54
   at Sep.Git.Tfs.Core.GitRepository.GetParentTfsCommits(String head, Boolean includeStubRemotes) in C:\Applications\Development\git-tfs\GitTfs\Core\GitRepository.cs:line 249
info: refs/remotes/tfs/default: Getting changesets from 1 to current ...
git command: Starting process: git update-index -z --index-info
   U 6fe1070c3fd527ca710d87c47c4055d2766faa41 = Folder/File.txt
   U b86a8ffd1cb40086ff8539e4f8777f8faff30f7e = README
git command time: [00:00:00.0950000] update-index -z --index-info
git command: Starting process: git write-tree
git command time: [00:00:00.0640000] write-tree
git command: Starting process: git commit-tree 41ab05d8f2a0f7f7f3a39c623e94fee68f64797e
git command time: [00:00:00.0450000] commit-tree 41ab05d8f2a0f7f7f3a39c623e94fee68f64797e
git command: Starting process: git update-ref -m C2 refs/remotes/tfs/default 6ac86b02d2a5a51edcd34d6723db3ee9375df254
git command time: [00:00:00.0570000] update-ref -m C2 refs/remotes/tfs/default 6ac86b02d2a5a51edcd34d6723db3ee9375df254
C2 = 6ac86b02d2a5a51edcd34d6723db3ee9375df254
GC Countdown: 0
git command: Starting process: git gc --auto
git command time: [00:00:00.0340000] gc --auto
git command: Starting process: git merge refs/remotes/tfs/default
git command time: [00:00:00.0740000] merge refs/remotes/tfs/default

C:\Users\Eldhuset_a\AppData\Local\Temp\tmpCE7F.tmp>@set ErrorLevel=%ErrorLevel%

C:\Users\Eldhuset_a\AppData\Local\Temp\tmpCE7F.tmp>@rem Restore the original console codepage.

C:\Users\Eldhuset_a\AppData\Local\Temp\tmpCE7F.tmp>@chcp %cp_oem% > nul < nul
Owner
spraints commented Aug 9, 2012

Hrm. I guess the commit created by CloneProjectWithChangesets isn't as invariant as I expected: something about the commit created on your machine differs from the one created on mine.

I guess a better assertion would be that HEAD has certain properties, such as an expected tree and an expected commit message. I'm not sure when I'll have time to do this, but I'll try to get it done within the next week.
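Here is a minimal Python sketch of why asserting on a commit hash is brittle, and what a property-based check could look like. It rebuilds a parentless commit object from the tree, identity and message shown in the `git cat-file -p` output later in this thread. It then hashes the object the way git does: SHA-1 over `commit <size>\0` followed by the body. The helper names are hypothetical, not git-tfs code. The sketch also does not claim to reproduce the exact hashes from the thread.

```python
import hashlib

TREE = "41ab05d8f2a0f7f7f3a39c623e94fee68f64797e"
MESSAGE = "First commit\n\ngit-tfs-id: [http://does/not/matter]$/MyProject;C2\n"


def commit_body(tree: str, timestamp: int, message: str) -> bytes:
    """Build a parentless commit object body (hypothetical helper)."""
    ident = f"Unknown TFS user <todo> {timestamp} +0000"
    text = f"tree {tree}\nauthor {ident}\ncommitter {ident}\n\n{message}"
    return text.encode("utf-8")


def object_id(kind: str, body: bytes) -> str:
    """git object id: SHA-1 of '<kind> <size>\\0' followed by the body."""
    header = f"{kind} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def head_properties(body: bytes) -> dict:
    """Extract the properties a sturdier test might assert on (hypothetical)."""
    headers, _, message = body.decode("utf-8").partition("\n\n")
    fields = dict(line.split(" ", 1) for line in headers.splitlines())
    return {"tree": fields["tree"], "message": message}


if __name__ == "__main__":
    # Two author/committer timestamps that appear in this thread's cat-file output.
    a = commit_body(TREE, 1325502732, MESSAGE)
    b = commit_body(TREE, 1325524332, MESSAGE)
    print(object_id("commit", a) == object_id("commit", b))  # False: hash covers the timestamp
    print(head_properties(a) == head_properties(b))          # True: tree and message agree
```

A check along the lines of `head_properties` tolerates clock and time zone differences between machines while still catching a wrong tree or message.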

Contributor
sc68cal commented Aug 9, 2012

Dang it, wrong issue. I meant #183?

Owner
spraints commented Aug 9, 2012

It could.

Contributor

Here's the structure of the repository after running CloneProjectWithChangesets(). Can you tell where it goes wrong?

$ git rev-list --objects --all
6ac86b02d2a5a51edcd34d6723db3ee9375df254
41ab05d8f2a0f7f7f3a39c623e94fee68f64797e
933c9ddf9ae05dd3204d76757fe53fe889804d6b Folder
6fe1070c3fd527ca710d87c47c4055d2766faa41 Folder/File.txt
b86a8ffd1cb40086ff8539e4f8777f8faff30f7e README

$ git cat-file -p 6ac86b02d2a5a51edcd34d6723db3ee9375df254
tree 41ab05d8f2a0f7f7f3a39c623e94fee68f64797e
author Unknown TFS user <todo> 1325502732 +0000
committer Unknown TFS user <todo> 1325502732 +0000

First commit

git-tfs-id: [http://does/not/matter]$/MyProject;C2

$ git cat-file -p 41ab05d8f2a0f7f7f3a39c623e94fee68f64797e
040000 tree 933c9ddf9ae05dd3204d76757fe53fe889804d6b    Folder
100644 blob b86a8ffd1cb40086ff8539e4f8777f8faff30f7e    README

$ git cat-file -p 933c9ddf9ae05dd3204d76757fe53fe889804d6b
100644 blob 6fe1070c3fd527ca710d87c47c4055d2766faa41    File.txt

$ git cat-file -p 6fe1070c3fd527ca710d87c47c4055d2766faa41
File contents

$ git cat-file -p b86a8ffd1cb40086ff8539e4f8777f8faff30f7e
tldr
Owner

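It looks like a time zone difference. Here's my repository: the tree is identical, but the author and committer timestamps are 21600 seconds (six hours) later.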

$ git rev-list --objects --all
dd806911118e6fa16d028b322ad91360d56ea47b
41ab05d8f2a0f7f7f3a39c623e94fee68f64797e
933c9ddf9ae05dd3204d76757fe53fe889804d6b Folder
6fe1070c3fd527ca710d87c47c4055d2766faa41 Folder/File.txt
b86a8ffd1cb40086ff8539e4f8777f8faff30f7e README

$ git cat-file -p dd80691111
tree 41ab05d8f2a0f7f7f3a39c623e94fee68f64797e
author Unknown TFS user <todo> 1325524332 +0000
committer Unknown TFS user <todo> 1325524332 +0000

First commit

git-tfs-id: [http://does/not/matter]$/MyProject;C2
Contributor

Indeed it was (and you are apparently located at UTC-5): with DateTime.Parse("2012-01-02 12:12:12 -05:00"), the test passes! I hope to find time to write the tests in the course of the next week.
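Here is a small Python sketch of the arithmetic behind the time zone diagnosis. The same wall-clock time, 2012-01-02 12:12:12 (the value in the DateTime.Parse call above), yields the owner's timestamp at UTC-5 and the contributor's at UTC+1. The thread does not say the contributor's machine was on UTC+1 (Norwegian standard time in January); that is inferred from the numbers.

```python
from datetime import datetime, timedelta, timezone


def unix_time(hours_from_utc: int) -> int:
    """Unix timestamp of 2012-01-02 12:12:12 local time at the given UTC offset."""
    tz = timezone(timedelta(hours=hours_from_utc))
    return int(datetime(2012, 1, 2, 12, 12, 12, tzinfo=tz).timestamp())


if __name__ == "__main__":
    owner = unix_time(-5)        # equals the timestamp in the owner's commit
    contributor = unix_time(+1)  # equals the timestamp in the contributor's commit
    print(owner, contributor, owner - contributor)  # 1325524332 1325502732 21600
```

Passing an explicit offset, as the DateTime.Parse string above does, removes the dependence on the local time zone of whichever machine runs the test.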

Contributor

Turns out I couldn't resist the temptation to do it now... See e79056a and ca490a2.

Contributor

Note that:

  • CloneProjectWithInternationalCharactersInFileNamesAndFolderNames() verifies my fix
  • CloneProjectWithInternationalCharactersInCommitMessages() verifies #182
  • CloneProjectWithInternationalCharactersInFileContents() passes without either fix

For all tests, I checked the resulting working directory, and all international characters were handled correctly.

(An improvement I thought of too late would have been to use characters outside the Western European extensions of ASCII...)
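To illustrate the point, here is a Python sketch. It assumes cp1252 as the Western European code page in question. Norwegian and German letters all have cp1252 encodings, so code that silently falls back to that code page can still appear to work for them. Cyrillic letters cannot be encoded in cp1252 at all, so they would expose such a fallback immediately. The sample names are hypothetical.

```python
def representable_in_cp1252(name: str) -> bool:
    """True if every character of name has a cp1252 (single-byte) encoding."""
    try:
        name.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    print(representable_in_cp1252("ÆØÅ/äöü.txt"))  # True: Western European letters fit
    print(representable_in_cp1252("Привет.txt"))    # False: Cyrillic does not
```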

Contributor
sc68cal commented Aug 10, 2012

@aasmundeldhuset You need to rebase your changes.

Contributor

I haven't sent any pull requests before, so I'm not familiar with how it should be done. Currently, the history looks like A[master]---B---C---D[international_filenames], where B, C, and D are my commits. What should the end result look like?

Owner

@sc68cal I assume you're talking about GitHub saying that it can't be automatically merged. It might be solvable by merging master into @aasmundeldhuset's branch. I'll take a look.

Thanks for the tests!

Contributor
sc68cal commented Aug 10, 2012

Your master branch is out of date. Please do a fetch, then rebase your changes on top of the updated master branch.

Contributor
sc68cal commented Aug 10, 2012

@spraints He's got a conflict in CloneTests.cs that he needs to resolve.

Owner

When I run the test locally, I get an error in CloneProjectWithInternationalCharactersInFileNamesAndFolderNames. It's creating a directory called "A+A~A.", which doesn't match the directory name in the test. Is there something I need to do to make git on Windows use the right file name? Or maybe git isn't configured correctly. Here are my HEAD tree and objects:

$ git ls-tree -r HEAD
100644 blob 6fe1070c3fd527ca710d87c47c4055d2766faa41    "\303\206\303\230\303\205/\303\244\303\266\303\274.txt"

$ git rev-list --objects --all
2697fc1748a13832ef25804ef2d3be65a7cd3129
14f207f532105e6df76cf69d6481d84b9e5b17ad
038503eb8e97ddf6819024d21851e94179ea24e9 ÆØÅ
6fe1070c3fd527ca710d87c47c4055d2766faa41 ÆØÅ/äöü.txt
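The quoted name in the `git ls-tree` output is not corruption. With the default `core.quotePath` setting, git prints bytes of 0x80 and above as octal escapes inside double quotes. `git config core.quotePath false` makes git print such names unescaped. The following is a minimal Python sketch that turns the quoted path back into text. It relies on a hypothetical helper that handles only octal, `\n`, `\t`, `\"` and `\\` escapes. The result shows that the tree entry holds the UTF-8 encoding of ÆØÅ/äöü.txt.

```python
import re

OCTAL = re.compile(r"[0-7]{3}")
SIMPLE_ESCAPES = {"n": b"\n", "t": b"\t", '"': b'"', "\\": b"\\"}


def unquote_git_path(quoted: str) -> str:
    """Decode a C-style quoted path as printed by git (hypothetical helper)."""
    if len(quoted) >= 2 and quoted[0] == quoted[-1] == '"':
        quoted = quoted[1:-1]  # strip the surrounding double quotes
    raw = bytearray()
    i = 0
    while i < len(quoted):
        if quoted[i] == "\\" and OCTAL.fullmatch(quoted, i + 1, i + 4):
            raw.append(int(quoted[i + 1:i + 4], 8))  # one byte per \ooo escape
            i += 4
        elif quoted[i] == "\\" and quoted[i + 1:i + 2] in SIMPLE_ESCAPES:
            raw += SIMPLE_ESCAPES[quoted[i + 1]]
            i += 2
        else:
            raw += quoted[i].encode("utf-8")
            i += 1
    return raw.decode("utf-8")  # the tree entry stores UTF-8 bytes


if __name__ == "__main__":
    entry = r'"\303\206\303\230\303\205/\303\244\303\266\303\274.txt"'
    assert unquote_git_path(entry) == "ÆØÅ/äöü.txt"
```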
Owner

I see the conflict. A simple merge of the main repo's master into the international_filenames branch would work, too; rebasing a pushed branch is not good practice. I don't care which you use, but a rebase isn't the only way to do it.

@aasmundeldhuset something like this:

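$ git checkout international_filenames
$ git fetch origin # assuming that origin = git://github.com/git-tfs/git-tfs.git
$ git merge origin/master
# resolve the conflict in GitTfsTest/Integration/CloneTests.cs, then:
$ git add GitTfsTest/Integration/CloneTests.cs
$ git commit
$ git push yourfork international_filenames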

"Is there something I need to do to make git+windows be able to use the right file name?"

@spraints Which version of msysgit are you using? See
https://github.com/msysgit/msysgit/wiki/Git-for-Windows-Unicode-Support#wiki-Git_for_Windows_Unicode_Support
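One plausible reading of the "A+A~A." directory name, offered as a hypothesis rather than a confirmed diagnosis: git wrote the UTF-8 bytes of ÆØÅ unchanged, and Windows interpreted them in the ANSI code page (assumed here to be cp1252). Reading those bytes as cp1252 gives Ã†Ã˜Ã…, and a display limited to ASCII could plausibly reduce that to A+A~A. The Python sketch below shows only the misreading step. It also shows that the bytes themselves survive and are merely misinterpreted.

```python
NAME = "ÆØÅ"


def misread_as_cp1252(name: str) -> str:
    """Decode the UTF-8 bytes of name as if they were cp1252 text (mojibake)."""
    return name.encode("utf-8").decode("cp1252")


if __name__ == "__main__":
    print(NAME.encode("utf-8").hex(" "))  # c3 86 c3 98 c3 85
    garbled = misread_as_cp1252(NAME)
    assert garbled == "Ã†Ã˜Ã…"
    # Re-encoding as cp1252 recovers the original UTF-8 bytes unchanged.
    assert garbled.encode("cp1252").decode("utf-8") == NAME
```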

Contributor
sc68cal commented Aug 11, 2012

Heads up - #204 also has some work in this area that we'll have to sort out.

spraints added a commit that referenced this pull request on Aug 12, 2012:
aba2485 Merge #200 (Conflicts: GitTfsTest/Integration/CloneTests.cs)
spraints merged commit e79056a into git-tfs:master on Aug 12, 2012
Owner

I was on 1.7.9. I upgraded to 1.7.11, and it's all green now! Thanks, everyone!

Contributor
sc68cal commented Aug 12, 2012

All green on 1.7.10 as well.

Just for information: on 1.7.11, a git tfs clone of a large repository with the latest version of git-tfs showed strange errors. However, merging bb908dd ("Add support for file names that contain international characters.") onto a branch started from v0.14.0 gave me no problems at all. Thanks for this fix!

Contributor

@rogermelet: Glad to hear that someone has already found it useful! Thanks for including the patch, @spraints.

Contributor
sc68cal commented Aug 16, 2012

Awesome - how's that for validation @aasmundeldhuset? Congrats!

Contributor

Hehe - thanks, @sc68cal; I hadn't imagined anyone else benefiting from it this soon... :-)
