Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
Commits on Nov 1, 2011
  1. Junio C Hamano

    name-hash.c: always initialize dir_next pointer

    Johannes Sixt authored gitster committed
    Test t2021-checkout-overwrite.sh reveals a segfault in 'git add' on a
    case-insensitive file system when git is compiled with XMALLOC_POISON
    defined. The reason is that 2548183 (fix phantom untracked files when
    core.ignorecase is set) added a new member dir_next to struct cache_entry,
    but forgot to initialize it in all cases.
    
    Signed-off-by: Johannes Sixt <j6t@kdbg.org>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commits on Oct 8, 2011
  1. Jeff King Junio C Hamano

    fix phantom untracked files when core.ignorecase is set

    peff authored gitster committed
    When core.ignorecase is turned on and there are stale index
    entries, "git commit" can sometimes report directories as
    untracked, even though they contain tracked files.
    
    You can see an example of this with:
    
        # make a case-insensitive repo
        git init repo && cd repo &&
        git config core.ignorecase true &&
    
        # with some tracked files in a subdir
        mkdir subdir &&
        > subdir/one &&
        > subdir/two &&
        git add . &&
        git commit -m base &&
    
        # now make the index entries stale
        touch subdir/* &&
    
        # and then ask commit to update those entries and show
        # us the status template
        git commit -a
    
    which will report "subdir/"  as untracked, even though it
    clearly contains two tracked files. What is happening in the
    commit program is this:
    
      1. We load the index, and for each entry, insert it into the index's
         name_hash. In addition, if ignorecase is turned on, we make an
         entry in the name_hash for the directory (e.g., "contrib/"), which
         uses the following code from 5102c61's hash_index_entry_directories:
    
            hash = hash_name(ce->name, ptr - ce->name);
            if (!lookup_hash(hash, &istate->name_hash)) {
                    pos = insert_hash(hash, &istate->name_hash);
    		if (pos) {
    			ce->next = *pos;
    			*pos = ce;
    		}
            }
    
         Note that we only add the directory entry if there is not already an
         entry.
    
      2. We run add_files_to_cache, which gets updated information for each
         cache entry. It helpfully inserts this information into the cache,
         which calls replace_index_entry. This in turn calls
         remove_name_hash() on the old entry, and add_name_hash() on the new
         one. But remove_name_hash doesn't actually remove from the hash, it
         only marks it as "no longer interesting" (from cache.h):
    
          /*
           * We don't actually *remove* it, we can just mark it invalid so that
           * we won't find it in lookups.
           *
           * Not only would we have to search the lists (simple enough), but
           * we'd also have to rehash other hash buckets in case this makes the
           * hash bucket empty (common). So it's much better to just mark
           * it.
           */
          static inline void remove_name_hash(struct cache_entry *ce)
          {
                  ce->ce_flags |= CE_UNHASHED;
          }
    
         This is OK in the specific-file case, since the entries in the hash
         form a linked list, and we can just skip the "not here anymore"
         entries during lookup.
    
         But for the directory hash entry, we will _not_ write a new entry,
         because there is already one there: the old one that is actually no
         longer interesting!
    
      3. While traversing the directories, we end up in the
         directory_exists_in_index_icase function to see if a directory is
         interesting. This in turn checks index_name_exists, which will
         look up the directory in the index's name_hash. We see the old,
         deleted record, and assume there is nothing interesting. The
         directory gets marked as untracked, even though there are index
         entries in it.
    
    The problem is in the code I showed above:
    
            hash = hash_name(ce->name, ptr - ce->name);
            if (!lookup_hash(hash, &istate->name_hash)) {
                    pos = insert_hash(hash, &istate->name_hash);
    		if (pos) {
    			ce->next = *pos;
    			*pos = ce;
    		}
            }
    
    Having a single cache entry that represents the directory is
    not enough; that entry may go away if the index is changed.
    It may be tempting to say that the problem is in our removal
    method; if we removed the entry entirely instead of simply
    marking it as "not here anymore", then we would know we need
    to insert a new entry. But that only covers this particular
    case of remove-replace. In the more general case, consider
    something like this:
    
      1. We add "foo/bar" and "foo/baz" to the index. Each gets
         their own entry in name_hash, plus we make a "foo/"
         entry that points to "foo/bar".
    
      2. We remove the "foo/bar" entry from the index, and from
         the name_hash.
    
      3. We ask if "foo/" exists, and see no entry, even though
         "foo/baz" exists.
    
    So we need that directory entry to have the list of _all_
    cache entries that indicate that the directory is tracked.
    So that implies making a linked list as we do for other
    entries, like:
    
      hash = hash_name(ce->name, ptr - ce->name);
      pos = insert_hash(hash, &istate->name_hash);
      if (pos) {
    	  ce->next = *pos;
    	  *pos = ce;
      }
    
    But that's not right either. In fact, it shows a second bug
    in the current code, which is that the "ce->next" pointer is
    supposed to be linking entries for a specific filename
    entry, but here we are overwriting it for the directory
    entry. So the same cache entry ends up in two linked lists,
    but they share the same "next" pointer.
    
    As it turns out, this second bug can't be triggered in the
    current code. The "if (pos)" conditional is totally dead
    code; pos will only be non-NULL if there was an existing
    hash entry, and we already checked that there wasn't one
    through our call to lookup_hash.
    
    But fixing the first bug means taking out that call to
    lookup_hash, which is going to activate the buggy dead code,
    and we'll end up splicing the two linked lists together.
    
    So we need to have a separate next pointer for the list in
    the directory bucket, and we need to traverse that list in
    index_name_exists when we are looking up a directory.
    
    This bloats "struct cache_entry" by a few bytes. Which is
    annoying, because it's only necessary when core.ignorecase
    is enabled. There's not an easy way around it, short of
    separating out the "next" pointers from cache_entry entirely
    (i.e., having a separate "cache_entry_list" struct that gets
    stored in the name_hash). In practice, it probably doesn't
    matter; we have thousands of cache entries, compared to the
    millions of objects (where adding 4 bytes to the struct
    actually does impact performance).
    
    Signed-off-by: Jeff King <peff@peff.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commits on Oct 6, 2010
  1. Josh Jensen Junio C Hamano

    Add case insensitivity support for directories when using git status

    jjensen authored gitster committed
    When using a case preserving but case insensitive file system, directory
    case can differ but still refer to the same physical directory.  git
    status reports the directory with the alternate case as an Untracked
    file.  (That is, when mydir/filea.txt is added to the repository and
    then the directory on disk is renamed from mydir/ to MyDir/, git status
    shows MyDir/ as being untracked.)
    
    Support has been added in name-hash.c for hashing directories with a
    terminating slash into the name hash. When index_name_exists() is called
    with a directory (a name with a terminating slash), the name is not
    found via the normal cache_name_compare() call, but it is found in the
    slow_same_name() function.
    
    Additionally, in dir.c, directory_exists_in_index_icase() allows newly
    added directories deeper in the directory chain to be identified.
    
    Ultimately, it would be better if the file list was read in case
    insensitive alphabetical order from disk, but this change seems to
    suffice for now.
    
    The end result is the directory is looked up in a case insensitive
    manner and does not show in the Untracked files list.
    
    Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com>
    Signed-off-by: Johannes Sixt <j6t@kdbg.org>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commits on Apr 9, 2008
  1. Junio C Hamano

    Make hash_name_lookup able to do case-independent lookups

    Linus Torvalds authored gitster committed
    Right now nobody uses it, but "index_name_exists()" gets a flag so
    you can enable it on a case-by-case basis.
    
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
  2. Junio C Hamano

    Make "index_name_exists()" return the cache_entry it found

    Linus Torvalds authored gitster committed
    This allows verify_absent() in unpack_trees() to use the hash chains
    rather than looking it up using the binary search.
    
    Perhaps more importantly, it's also going to be useful for the next phase,
    where we actually start looking at the cache entry when we do
    case-insensitive lookups and checking the result.
    
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
  3. Junio C Hamano

    Move name hashing functions into a file of its own

    Linus Torvalds authored gitster committed
    It's really totally separate functionality, and if we want to start
    doing case-insensitive hash lookups, I'd rather do it when it's
    separated out.
    
    It also renames "remove_index_entry()" to "remove_name_hash()", because
    that really describes the thing better. It doesn't actually remove the
    index entry, that's done by "remove_index_entry_at()", which is something
    very different, despite the similarity in names.
    
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
Something went wrong with that request. Please try again.