Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalize to LF in the repository. New policy: files are binary unless #2323

Closed
wants to merge 2 commits into from
Closed

Conversation

ghost
Copy link

@ghost ghost commented May 28, 2013

explicitly marked as text in .gitattributes.

@johnbartholomew
Copy link
Contributor

Python and Perl scripts should also be text.

@johnbartholomew
Copy link
Contributor

Possibly also Objective-C files (*.m) of which we have one for the OS X build.

Given the problems last time I'd like to be extra-careful with this, which may require some coordination. I'd like sign-off from two people (the merger and one other) on the actual final commits. Preferably to check this: apply the gitattributes file yourself and do your own normalisation and check that you end up with the same tree as the PR. The merge should then be done as a fast-forward to guarantee no conflicts or weirdness, and so that master ends up at the exact state that has been signed-off.

When we've confirmed that the gitattributes file itself is configured the way we want, then the commits will need to be remade on top of master to allow for the fast-forward merge.

/cc @robn @laarmen @Luomu

@ghost
Copy link
Author

ghost commented May 29, 2013

The marking and the normalization steps are separate. To avoid any confusion, I've made a list of files which do have CRs, using this command:

rgrep -la --exclude-dir=.git \
  --exclude='*.[pP][nN][gG]' \
  --exclude='*.ogg' \
  --exclude='*.blend' \
  --exclude='*.[ot]tf' \
  --exclude='*.ico' \
  --exclude='*.bmp' \
  --exclude='*.dll' \
  --exclude='*.lib' \
  --exclude='*.icns' \
  --exclude='*.hmap' \
  $(printf \\r) .

These are the only files that can possibly need normalization. No file outside that list should appear in the normalization commit. Every file marked as text in .gitattributes and present in that list should appear.

Here's the list:

./win32/vc2012/pioneer.sln
./win32/vc2012/jenkins/jenkins.vcxproj
./win32/vc2012/pioneer.vcxproj.filters
./win32/vc2012/lua.vcxproj
./win32/vc2012/collider/collider.vcxproj
./win32/vc2012/jobswarm/jobswarm.vcxproj
./win32/vc2012/jobswarm/jobswarm.vcxproj.filters
./win32/vc2012/gui/gui.vcxproj
./win32/vc2012/terrain/terrain.vcxproj.filters
./win32/vc2012/terrain/terrain.vcxproj
./win32/vc2012/text/text.vcxproj
./win32/vc2012/text/text.vcxproj.filters
./win32/vc2012/graphics/graphics.vcxproj
./win32/vc2012/newmodel/newmodel.vcxproj
./win32/vc2012/newmodel/newmodel.vcxproj.filters
./win32/vc2012/pioneer.vcxproj
./win32/vc2012/galaxy/galaxy.vcxproj
./win32/vc2012/perlintest.vcxproj
./win32/vc2012/gameui/gameui.vcxproj.filters
./win32/vc2012/gameui/gameui.vcxproj
./win32/vc2012/ui/ui.vcxproj.filters
./win32/vc2012/ui/ui.vcxproj
./win32/vc2012/perlintest.vcxproj.filters
./win32/vc2012/miniz.vcxproj
./win32/vc2008/pioneer-msvc-9.0.sln
./win32/vc2008/perlintest.vcproj
./win32/vc2008/sbreviewer-msvc-9.0.vcproj
./win32/vc2008/pioneer-msvc-9.0.vcproj
./win32/uitest/uitest.vcxproj
./win32/uitest/uitest.vcxproj.filters
./win32/vc2010/Release.props
./win32/vc2010/pioneer.sln
./win32/vc2010/jenkins/jenkins.vcxproj
./win32/vc2010/jenkins/jenkins.vcxproj.filters
./win32/vc2010/pioneer.vcxproj.filters
./win32/vc2010/lua.vcxproj
./win32/vc2010/Libs_Pioneer.props
./win32/vc2010/collider/collider.vcxproj
./win32/vc2010/collider/collider.vcxproj.filters
./win32/vc2010/gui/gui.vcxproj.filters
./win32/vc2010/gui/gui.vcxproj
./win32/vc2010/terrain/terrain.vcxproj.filters
./win32/vc2010/terrain/terrain.vcxproj
./win32/vc2010/text/text.vcxproj
./win32/vc2010/text/text.vcxproj.filters
./win32/vc2010/graphics/graphics.vcxproj
./win32/vc2010/graphics/graphics.vcxproj.filters
./win32/vc2010/newmodel/newmodel.vcxproj
./win32/vc2010/newmodel/newmodel.vcxproj.filters
./win32/vc2010/PreRelease.props
./win32/vc2010/Debug.props
./win32/vc2010/pioneer.vcxproj
./win32/vc2010/galaxy/galaxy.vcxproj.filters
./win32/vc2010/galaxy/galaxy.vcxproj
./win32/vc2010/gameui/gameui.vcxproj.filters
./win32/vc2010/gameui/gameui.vcxproj
./win32/vc2010/Libs_Thirdparty_Debug.props
./win32/vc2010/ui/ui.vcxproj.filters
./win32/vc2010/ui/ui.vcxproj
./win32/vc2010/pch.props
./win32/vc2010/miniz.vcxproj
./win32/vc2010/Libs_Thirdparty.props
./win32/vc2010/common.props
./Quickstart.txt
./src/WorldView.h
./AUTHORS.txt
./Modelviewer.txt
./README.txt
./contrib/miniz/miniz.h
./contrib/miniz/miniz.c
./licenses/GPL-3.txt
./Changelog.txt
./data/music/README.music.txt
./data/ships/dsminer.lua
./data/models/stations/new_ground/landPad6_mesh.DAE
./data/models/stations/new_ground/landPad6_mesh_lores.DAE
./data/models/stations/new_ground/landPad6_collision.DAE
./data/models/stations/big_crappy_spacestation/big_crappy_spacestation.model
./data/models/stations/hoop_spacestation/collision.obj
./data/models/ships/dsminer/DSMinerHullLOD2.obj
./data/models/ships/dsminer/DSMinerHullLOD1.obj
./data/models/ships/dsminer/DSMinerDecalLOD2.obj
./data/models/ships/dsminer/DSMinerHull.obj
./data/models/ships/dsminer/DSMinerCollision.obj
./data/models/buildings/kcity/kbuilding03.model
./data/models/buildings/kcity/kbuilding01.model
./data/models/buildings/kcity/kbuilding02.model
./data/models/buildings/vlastan/newbuilding1.model
./data/models/buildings/vlastan/newbuilding8.model
./data/models/buildings/vlastan/newbuilding4.model
./data/models/buildings/vlastan/newbuilding9.model
./data/models/buildings/vlastan/newbuilding11.model
./data/models/buildings/vlastan/newbuilding5.model
./data/models/buildings/vlastan/newbuilding3.model
./data/models/buildings/vlastan/newbuilding2.model
./data/models/buildings/vlastan/newbuilding6.model
./data/models/buildings/vlastan/newbuilding7.model
./data/models/buildings/vlastan/newbuilding10.model
./data/lang/Hrvatski.txt
./data/lang/Russian.txt
./data/lang/Spanish.txt
./data/lang/Magyar.txt
./data/lang/Deutsch.txt
./data/lang/English.txt
./data/lang/Catala.txt
./data/fonts/sdf_definition.txt

That pretty much covers the normalization part, which depends on what files are marked as text, that is, the contents of the .gitattributes file.

Now there's an argument against *.vcproj, *.vcxproj, *.vcxproj.filters, *.sln and *.props files (collectively, textual VC files), on one hand, and against .obj and .DAE files (collectively, textual 3D files) being marked as text. I disagree with the arguments.

Textual VC files matter little, as it's extremely unlikely that someone under an OS different than Windows will use or modify them, unless there's some sort of cross-compiler or something like that that uses/generates them. But the argument was that merge conflicts are almost impossible to resolve, and that happens either with binary or with text files, so that's not a reason to exclude them. Merge conflicts in that case should be handled by either overwriting or ditching the whole file, just as with binary files. The text attribute in .gitattributes only influences line endings, not conflict handling. That's an argument for using the merge attribute on them; it does not have to do with the text attribute. I haven't looked into the merge attribute in depth yet.

As for textual 3D files, I don't think that they should be kept as binary. The idea of a file being text is that it is checked out from the repository in the OS's native line ending format. Currently, there are .obj files with CRs and .obj files without CRs in the repository, as well as .DAE files with CRs and .DAE files without CRs. In the current situation, if someone regenerates from Linux a .DAE or .obj file that was generated under Windows, git will interpret that every line in the whole file has changed, when possibly only a minuscule part did actually change. And same if a Linux-saved file is regenerated from Windows. So, by checking out the files in the OS's native format, git would work properly with them, minimizing the differences, without interfering with normal workflow.

An argument can be made that an .obj file could potentially be a binary file if Intel OMF files are ever included in the repository (that's an unfortunate extension clash). It's highly unlikely that OMF files are ever included, but if that's to be prevented, .obj files could be marked as text=auto. Wavefront .obj files will surely be correctly interpreted by git as text, as there's little chance that a character outside the printable ASCII range or CR/LF appears there, not even an Unicode one. And Intel OMF .obj files have records with a type indicator that is always above 0x80, so it's highly unlikely to be ever interpreted as text.

Now, as for what files should be marked as text, I've also made a list of the extensions excluding the same binary files as above, and the extensionless files present in the repository, using this command:

find * ! -type d \
  ! -iname '*.png' \
  ! -name '*.ogg' \
  ! -name '*.blend' \
  ! -name '*.[ot]tf' \
  ! -name '*.ico' \
  ! -name '*.bmp' \
  ! -name '*.dll' \
  ! -name '*.lib' \
  ! -name '*.icns' \
  ! -name '*.hmap' \
  | sed -r 's#^.*/([^/]*)$#\1#' \
  | sed -r 's#^[^.]*\.#.#' \
  | sort | uniq

(The first sed line removes the path; the second sed line removes everything before the first dot only if there's a dot. I've separated them just for clarity when reviewed.)

Conservatively, that command removes up to the first dot, not up to the last. The list doesn't include dotfiles in the top dir, mainly to prevent find from recursing into .git. Here's the result:

.0.sln
.0.txt
.0.vcproj
.1.txt
.DAE
.ac
.am
.c
.common
.cpp
.css
.dae
.exclude
.frag
.frag.glsl
.gitignore
.glsl
.h
.hpp
.inc.h
.ini
.install
.lua
.m
.model
.mtl
.music.txt
.obj
.pbxproj
.pch
.pl
.plist
.props
.py
.rc
.rtf
.sh
.sln
.strings
.svg
.txt
.vcproj
.vcxproj
.vcxproj.filters
.vert
.vert.glsl
.xcworkspacedata
.xib
README
bootstrap
changelog
compat
control
copyright
docs
format
generic-exec
pioneer
rules

After reviewing them one by one, I think that every file in that list should be marked as text, excluding the dupes (.0.sln, .0.txt, .0.vcproj, .1.txt, .frag.glsl, .vert.glsl, .music.txt, .inc.h).

The proposed .gitattributes would thus be:

* -text
README         text
bootstrap      text
changelog      text
compat         text
control        text
copyright      text
docs           text
format         text
generic-exec   text
pioneer        text
rules          text
.gitignore     text
.gitattributes text

*.ac         text
*.am         text
*.c          text
*.common     text
*.cpp        text
*.css        text
*.DAE        text
*.dae        text
*.exclude    text
*.frag       text
*.glsl       text
*.h          text
*.hpp        text
*.ini        text
*.install    text
*.lua        text
*.m          text
*.model      text
*.mtl        text
*.obj        text=auto
*.pbxproj    text
*.pch        text
*.pl         text
*.plist      text
*.props      text
*.py         text
*.rc         text
*.rtf        text
*.sh         text
*.sln        text
*.strings    text
*.svg        text
*.txt        text
*.vcproj     text
*.vcxproj    text
*.vcxproj.filters text
*.vert       text
*.xcworkspacedata text
*.xib        text

Followed by a normalization commit that includes the whole list of files listed at the beginning.

[Edited for grammar]

@ghost
Copy link
Author

ghost commented May 29, 2013

After an in-depth review of https://www.kernel.org/pub/software/scm/git/docs/gitattributes.html I think that the first line should instead say:

* binary

which is equivalent to: * -text -diff -merge.

The page also says that

When more than one pattern matches the path, a later line overrides an earlier line. This overriding is done per attribute.

Since overriding is done per attribute, and only the text attribute is declared, the above .gitattributes is wrong, as it doesn't include diff and merge. And to avoid marking each file as 'text diff merge', we should also make a custom attribute for our text files, which following the example in the DEFINING MACRO ATTRIBUTES section, should probably look like this:

[attr]pitext text diff merge

Then we can change the rest of lines to include pitext instead of text.

If merge conflicts are a concern for textual VC files, their lines can be changed to:

*.props      pitext -merge
*.sln        pitext -merge
*.vcproj     pitext -merge
*.vcxproj    pitext -merge
*.vcxproj.filters pitext -merge

Pedro Gimeno added 2 commits May 29, 2013 21:57
…attributes

Modified as per discussion of PR #2323.
The list of committed files matches the list of files that need normalization posted in PR #2323.
@ghost
Copy link
Author

ghost commented May 29, 2013

The PR is amended as described. The normalized files list, which was created by git following the procedure in http://stackoverflow.com/questions/1510798/trying-to-fix-line-endings-with-git-filter-branch-but-having-no-luck/1511273#1511273 , matches exactly the list of files having CRs obtained with rgrep (compared after some grep/sort/cut trickery). Note that that list was obtained from a fresh clone with core.autocrlf unset, so that no CRLF conversion had taken place.

For reproducibility:

git status | grep modified: | cut -c 15- | sort > sorted1.txt

is the list after committing the new .gitattributes file and after following the normalization steps except the commit itself, and yields the same file as the following command when run on a fresh clone:

rgrep -la --exclude-dir=.git \
  --exclude='*.[pP][nN][gG]' \
  --exclude='*.ogg' \
  --exclude='*.blend' \
  --exclude='*.[ot]tf' \
  --exclude='*.ico' \
  --exclude='*.bmp' \
  --exclude='*.dll' \
  --exclude='*.lib' \
  --exclude='*.icns' \
  --exclude='*.hmap' \
  $(printf \\r) . | cut -c 3- | sort > sorted2.txt

robn pushed a commit to robn/pioneer that referenced this pull request Jun 3, 2013
@robn
Copy link
Member

robn commented Jun 3, 2013

My own normalisation commit on top of master is at robn/gitattributes. Diffing against @41f0a70 shows the changes I'd expect (picodds, some model system changes, etc). So I think we can say I got something close enough to the "correct" tree.

I'm fine with the contents of .gitattributes. I don't personally agree with some of the choices around VC project files, .obj, etc, but I don't care strongly enough to argue the point.

I will update the build scripts to ensure that .txt files are converted to CRLF for Windows builds.

@Luomu
Copy link
Member

Luomu commented Jul 10, 2013

Done. In the end I went with a simpler attributes file, because I wasn't sure if we need anything more complicated. If that turns out to be the case, we can try the more complex attributes file presented here.

I hope we can keep going forward now.

@Luomu Luomu closed this Jul 10, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants