Commits on Jul 9, 2012
  1. @tboegi @gitster

    git on Mac OS and precomposed unicode

    tboegi committed with gitster Jul 8, 2012
    Mac OS X mangles file names containing unicode on file systems HFS+,
    VFAT or SAMBA.  When a file using unicode code points outside ASCII
    is created on a HFS+ drive, the file name is converted into
    decomposed unicode and written to disk. No conversion is done if
    the file name is already decomposed unicode.
    Calling open("\xc3\x84", ...) with a precomposed "Ä" yields the same
    result as open("\x41\xcc\x88",...) with a decomposed "Ä".
    As a consequence, readdir() returns the file names in decomposed
    unicode, even if the user expects precomposed unicode.  Unlike on
    HFS+, Mac OS X stores files on a VFAT drive (e.g. a USB drive) in
    precomposed unicode, but readdir() still returns file names in
    decomposed unicode.  When a git repository is stored on a network
    share using SAMBA, file names are sent over the wire and written to
    disk on the remote system in precomposed unicode, but Mac OS X
    readdir() returns decomposed unicode to be compatible with its
    behaviour on HFS+ and VFAT.
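    To make the mismatch concrete, here is a small sketch using the two
    byte sequences quoted above.  The helper name same_bytes() is made up
    for illustration; it only shows that a byte-wise comparison treats
    the two spellings of "Ä" as different strings.

```c
#include <string.h>

/* "Ä" as the single precomposed code point U+00C4, encoded in UTF-8. */
static const char precomposed_a_umlaut[] = "\xc3\x84";

/* "Ä" as "A" (U+0041) plus combining diaeresis (U+0308), encoded in UTF-8. */
static const char decomposed_a_umlaut[] = "\x41\xcc\x88";

/*
 * Hypothetical helper: compare two pathnames byte by byte, the way a
 * plain strcmp()-based lookup would.
 */
static int same_bytes(const char *a, const char *b)
{
	return strcmp(a, b) == 0;
}
```

    With these definitions, same_bytes(precomposed_a_umlaut,
    decomposed_a_umlaut) is false, although both names refer to the same
    file on HFS+.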
    The unicode decomposition causes many problems:
    - The names "git add" and other commands get from the end user are
      often in precomposed form (the decomposed form is not easily input
      from the keyboard), but when the commands read the filesystem to
      see whether what they are about to update the index with is
      already there, readdir() returns the decomposed form, which is
      different.
    - Similarly, "git log", "git mv" and all other commands need to
      compare pathnames found on the command line (often but not always
      in precomposed form; command line input resulting from globbing
      may be in decomposed form) with pathnames found in the tree
      objects (which should be in precomposed form to be compatible
      with other systems and for consistency in general).
    - The same holds for names stored in the index, which should be
      precomposed, and may need to be compared with the names read from
      readdir().
    NFS mounted from Linux is fully transparent and does not suffer from
    the above.
    As Mac OS X treats precomposed and decomposed file names as equal,
    we can
     - wrap readdir() on Mac OS X to return the precomposed form, and
     - normalize decomposed form given from the command line also to the
       precomposed form,
    to ensure that all pathnames used in Git are always in the
    precomposed form.  This behaviour can be requested by setting
    "core.precomposedunicode" configuration variable to true.
    The code in compat/precomposed_utf8.c implements basically 4 new
    functions: precomposed_utf8_opendir(), precomposed_utf8_readdir(),
    precomposed_utf8_closedir() and precompose_argv().  The first three
    are to wrap opendir(3), readdir(3), and closedir(3) functions.
    The argv[] conversion makes it possible to use the TAB filename
    completion done by the shell on the command line.  It also tolerates
    other tools which use readdir() to feed decomposed file names into git.
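    Both the readdir() wrapper and the argv[] conversion only have work
    to do for names that contain bytes outside ASCII, because pure ASCII
    names are identical in precomposed and decomposed form.  A minimal
    sketch of such a pre-check follows; the function name is an
    assumption made for this example and is not taken from the commit.

```c
#include <stddef.h>

/*
 * Hypothetical pre-check: return 1 if the NUL-terminated name contains
 * at least one byte with the high bit set (i.e. a non-ASCII byte in
 * UTF-8), 0 otherwise.  A NULL name is treated as "nothing to convert".
 */
static int name_has_non_ascii(const char *name)
{
	if (!name)
		return 0;
	for (; *name; name++)
		if ((unsigned char)*name & 0x80)
			return 1;
	return 0;
}
```

    A wrapper could return names for which this check is false
    unchanged and only run the (more expensive) conversion on the rest.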
    When creating a new git repository with "git init" or "git clone",
    "core.precomposedunicode" will be set to "false".
    The user needs to activate this feature manually.  She typically
    sets core.precomposedunicode to "true" on HFS and VFAT, or file
    systems mounted via SAMBA.
    Helped-by: Junio C Hamano <>
    Signed-off-by: Torsten Bögershausen <>
    Signed-off-by: Junio C Hamano <>
Commits on Feb 23, 2011
  1. @peff @gitster

    strbuf: add fixed-length version of add_wrapped_text

    peff committed with gitster Feb 23, 2011
    The function strbuf_add_wrapped_text takes a NUL-terminated
    string. This makes it annoying to wrap strings we have as a
    pointer and a length.
    Refactoring strbuf_add_wrapped_text and all of its
    sub-functions to handle fixed-length strings turned out to
    be really ugly. So this implementation is lame; it just
    strdups the text and operates on the NUL-terminated version.
    This should be fine as the strings we are wrapping are
    generally pretty short.  If it becomes a problem, we can
    optimize later.
    Signed-off-by: Jeff King <>
    Signed-off-by: Junio C Hamano <>
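    The copy-and-delegate approach described in this commit can be
    sketched as follows.  count_words() is a made-up stand-in for a
    routine that only accepts NUL-terminated input; it is not git code.
    The sketch uses malloc() plus memcpy() rather than strdup(), since
    the input is not assumed to be NUL-terminated.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for an existing routine that needs a NUL-terminated string. */
static size_t count_words(const char *text)
{
	size_t n = 0;
	int in_word = 0;

	for (; *text; text++) {
		if (*text == ' ' || *text == '\n' || *text == '\t') {
			in_word = 0;
		} else if (!in_word) {
			in_word = 1;
			n++;
		}
	}
	return n;
}

/*
 * Fixed-length variant: make a NUL-terminated copy of the first len
 * bytes, delegate to the existing routine, then free the copy.
 * Returns (size_t)-1 if the copy cannot be allocated.
 */
static size_t count_words_len(const char *text, size_t len)
{
	char *tmp = malloc(len + 1);
	size_t n;

	if (!tmp)
		return (size_t)-1;
	memcpy(tmp, text, len);
	tmp[len] = '\0';
	n = count_words(tmp);
	free(tmp);
	return n;
}
```

    The extra allocation is the price of not touching the existing
    routine, which matches the trade-off the message describes for
    short strings.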
Commits on Mar 2, 2010
  1. @gitster

    Merge branch 'rs/optim-text-wrap'

    gitster committed Mar 2, 2010
    * rs/optim-text-wrap:
      utf8.c: speculatively assume utf-8 in strbuf_add_wrapped_text()
      utf8.c: remove strbuf_write()
      utf8.c: remove print_spaces()
      utf8.c: remove print_wrapped_text()
Commits on Feb 20, 2010
  1. @gitster

    utf8.c: remove print_wrapped_text()

    René Scharfe committed with gitster Feb 19, 2010
    strbuf_add_wrapped_text() is called only from print_wrapped_text()
    without a strbuf (in which case it writes its results to stdout).
    At its only callsite, supply a strbuf, call strbuf_add_wrapped_text()
    directly and remove the wrapper function.
    Signed-off-by: Rene Scharfe <>
    Signed-off-by: Junio C Hamano <>
Commits on Jan 12, 2010
  1. @gitster

    utf8.c: mark file-local function static

    gitster committed Jan 11, 2010
    Signed-off-by: Junio C Hamano <>
Commits on Oct 19, 2009
  1. @dscho @gitster

    Add strbuf_add_wrapped_text() to utf8.[ch]

    dscho committed with gitster Nov 10, 2008
    The newly added function can rewrap text according to a given first-line
    indent, other-indent and text width.
    Signed-off-by: Johannes Schindelin <>
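    As a rough illustration of rewrapping with a first-line indent, an
    indent for the following lines and a text width, here is a
    simplified sketch.  It is not the function added by this commit:
    the name wrap_ascii() and its signature are assumptions, it writes
    into a caller-supplied buffer instead of a strbuf, and it counts
    bytes as columns, so it is only correct for ASCII text.

```c
#include <stddef.h>
#include <string.h>

/* Append one byte to out, keeping it NUL-terminated and within cap bytes. */
static void put_char(char *out, size_t cap, size_t *pos, char c)
{
	if (*pos + 1 < cap) {
		out[(*pos)++] = c;
		out[*pos] = '\0';
	}
}

/*
 * Rewrap the words of text into out.  The first line starts with
 * indent1 spaces, every later line with indent2 spaces, and a line
 * is broken before a word that would end past column width.
 * A word longer than the available space is placed on its own line.
 */
static void wrap_ascii(char *out, size_t cap, const char *text,
		       int indent1, int indent2, int width)
{
	size_t pos = 0, len, j;
	int col = 0, line_has_word = 0, i;

	if (cap)
		out[0] = '\0';
	for (i = 0; i < indent1; i++, col++)
		put_char(out, cap, &pos, ' ');
	for (;;) {
		text += strspn(text, " \n");   /* skip separators */
		len = strcspn(text, " \n");    /* length of next word */
		if (!len)
			break;
		if (line_has_word && col + 1 + (int)len > width) {
			put_char(out, cap, &pos, '\n');
			for (col = 0; col < indent2; col++)
				put_char(out, cap, &pos, ' ');
		} else if (line_has_word) {
			put_char(out, cap, &pos, ' ');
			col++;
		}
		for (j = 0; j < len; j++)
			put_char(out, cap, &pos, text[j]);
		col += (int)len;
		text += len;
		line_has_word = 1;
	}
}
```

    For example, wrapping "aaa bbb ccc ddd" with indent1 = 2,
    indent2 = 4 and width = 10 puts "aaa bbb" on the first line and
    "ccc" and "ddd" on one indented line each.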
Commits on Feb 5, 2009
  1. @geofft @gitster

    utf8: add utf8_strwidth()

    geofft committed with gitster Jan 30, 2009
    I'm about to use this pattern more than once, so make it a common function.
    Signed-off-by: Geoffrey Thomas <>
    Signed-off-by: Junio C Hamano <>
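    The pattern of summing per-character display widths over a whole
    string can be sketched like this.  The function below is a
    hypothetical approximation, not git's implementation: it counts
    every code point as one column, except the combining diacritical
    marks U+0300..U+036F, which count as zero.  Wide (e.g. East Asian)
    characters and other zero-width ranges are deliberately ignored.

```c
/*
 * Approximate display width of a NUL-terminated UTF-8 string.
 * Assumption for this sketch: one column per code point, zero columns
 * for combining diacritical marks (U+0300..U+036F).  A stray or
 * malformed byte is counted as one column.
 */
static int approx_strwidth(const char *s)
{
	const unsigned char *p = (const unsigned char *)s;
	int width = 0;

	while (*p) {
		unsigned long cp;
		int extra;

		if (*p < 0x80) {
			cp = *p; extra = 0;
		} else if ((*p & 0xe0) == 0xc0) {
			cp = *p & 0x1f; extra = 1;
		} else if ((*p & 0xf0) == 0xe0) {
			cp = *p & 0x0f; extra = 2;
		} else if ((*p & 0xf8) == 0xf0) {
			cp = *p & 0x07; extra = 3;
		} else {
			cp = 0xfffd; extra = 0;   /* stray byte */
		}
		p++;
		/* Fold in continuation bytes; stops at NUL or a non-continuation. */
		while (extra-- > 0 && (*p & 0xc0) == 0x80)
			cp = (cp << 6) | (*p++ & 0x3f);
		if (cp < 0x0300 || cp > 0x036f)
			width++;
	}
	return width;
}
```

    Under these assumptions both the precomposed "\xc3\x84" and the
    decomposed "\x41\xcc\x88" spelling of "Ä" occupy one column.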
Commits on Jan 7, 2008
  1. @gitster

    utf8_width(): allow non NUL-terminated input

    gitster committed Jan 2, 2008
    The original interface assumed that the input string is
    always terminated with a NUL, but that wasn't too useful.
    Signed-off-by: Junio C Hamano <>
  2. @gitster

    utf8: pick_one_utf8_char()

    gitster committed Jan 6, 2008
    The utf8_width() function was doing two different things: picking a
    valid character from a UTF-8 stream, and computing the display width
    of that character.  This splits the former into a separate function.
    Signed-off-by: Junio C Hamano <>
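    Picking one valid character from a UTF-8 stream that is given as a
    pointer plus a remaining length (rather than a NUL-terminated
    string) might look like the sketch below.  The name
    next_code_point(), its signature and its error convention are
    assumptions made for this example.

```c
#include <stddef.h>

/*
 * Decode one code point from the first *remaining bytes at *start.
 * On success, advance *start, reduce *remaining and return the code
 * point.  On invalid, overlong or truncated input, return -1 and
 * leave *start and *remaining untouched.
 */
static long next_code_point(const char **start, size_t *remaining)
{
	const unsigned char *s = (const unsigned char *)*start;
	size_t need, i;
	long cp, min;

	if (!*remaining)
		return -1;
	if (s[0] < 0x80) {
		cp = s[0]; need = 1; min = 0;
	} else if ((s[0] & 0xe0) == 0xc0) {
		cp = s[0] & 0x1f; need = 2; min = 0x80;
	} else if ((s[0] & 0xf0) == 0xe0) {
		cp = s[0] & 0x0f; need = 3; min = 0x800;
	} else if ((s[0] & 0xf8) == 0xf0) {
		cp = s[0] & 0x07; need = 4; min = 0x10000;
	} else {
		return -1;   /* continuation byte or invalid lead byte */
	}
	if (need > *remaining)
		return -1;   /* sequence is cut off by the length limit */
	for (i = 1; i < need; i++) {
		if ((s[i] & 0xc0) != 0x80)
			return -1;
		cp = (cp << 6) | (s[i] & 0x3f);
	}
	/* Reject overlong forms, surrogates and values past U+10FFFF. */
	if (cp < min || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff))
		return -1;
	*start += need;
	*remaining -= need;
	return cp;
}
```

    A width routine can then call this in a loop and look up the width
    of each returned code point, which keeps decoding and width
    computation separate.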
Commits on Feb 28, 2007
  1. @dscho

    Actually make print_wrapped_text() useful

    dscho committed with Junio C Hamano Feb 27, 2007
    Now, it returns the current column, does not add a newline, and you can
    pass a negative indent, to indicate that the indent was already printed.
    With this, you can actually continue in the middle of a paragraph, not
    having to print everything into a buffer first.
    Signed-off-by: Johannes Schindelin <>
    Signed-off-by: Junio C Hamano <>
Commits on Dec 30, 2006
  1. commit-tree: cope with different ways "utf-8" can be spelled.

    Junio C Hamano committed Dec 30, 2006
    People can spell i18n.commitencoding differently from what we
    internally have ("utf-8") to mean UTF-8.  Try to accept them and
    treat them equally.
    Signed-off-by: Junio C Hamano <>
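    Accepting several spellings of the same encoding name could be done
    along these lines.  The helper name and the exact set of accepted
    spellings ("utf-8" and "utf8", in any letter case) are assumptions
    for this sketch; the commit message does not list them.

```c
#include <strings.h>

/*
 * Hypothetical helper: return non-zero if name is a spelling of UTF-8
 * that this sketch accepts.  name must not be NULL.
 */
static int names_utf8(const char *name)
{
	return !strcasecmp(name, "utf-8") || !strcasecmp(name, "utf8");
}
```

    strcasecmp() is the POSIX case-insensitive comparison declared in
    <strings.h>; on platforms without it another case-insensitive
    comparison would be needed.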
Commits on Dec 26, 2006
  1. Move encoding conversion routine out of mailinfo to utf8.c

    Junio C Hamano committed Dec 23, 2006
    This moves the body of convert_to_utf8() routine used in mailinfo
    to the utf8.c i18n library.
    Signed-off-by: Junio C Hamano <>
Commits on Dec 24, 2006
  1. @dscho

    commit-tree: encourage UTF-8 commit messages.

    dscho committed with Junio C Hamano Dec 22, 2006
    Introduce is_utf8() to check if a text looks like it is encoded
    in UTF-8 and utf8_width() to count display width, and implement
    print_wrapped_text() using them.
    git-commit-tree warns if the commit message does not minimally
    conform to the UTF-8 encoding when i18n.commitencoding is either
    unset, or set to "utf-8".
    Signed-off-by: Junio C Hamano <>