Skip to content

Commit

Permalink
diffcore-rename: compute dir_rename_guess from dir_rename_counts
Browse files Browse the repository at this point in the history
dir_rename_counts has a mapping of a mapping, in particular, it has
   old_dir => { new_dir => count }
We want a simple mapping of
   old_dir => new_dir
based on which new_dir had the highest count for a given old_dir.
Compute this and store it in dir_rename_guess.

This is the final piece of the puzzle needed to make our guesses at
which directory files have been moved to when basenames aren't unique.

For the testcases mentioned in commit 557ac03 ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:       12.775 s ±  0.062 s    12.596 s ±  0.061 s
    mega-renames:    188.754 s ±  0.284 s   130.465 s ±  0.259 s
    just-one-mega:     5.599 s ±  0.019 s     3.958 s ±  0.010 s

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
  • Loading branch information
newren authored and gitster committed Feb 27, 2021
1 parent 333899e commit 81afdf7
Showing 1 changed file with 41 additions and 4 deletions.
45 changes: 41 additions & 4 deletions diffcore-rename.c
Expand Up @@ -389,6 +389,24 @@ static void dirname_munge(char *filename)
*slash = '\0';
}

static const char *get_highest_rename_path(struct strintmap *counts)
{
int highest_count = 0;
const char *highest_destination_dir = NULL;
struct hashmap_iter iter;
struct strmap_entry *entry;

strintmap_for_each_entry(counts, &iter, entry) {
const char *destination_dir = entry->key;
intptr_t count = (intptr_t)entry->value;
if (count > highest_count) {
highest_count = count;
highest_destination_dir = destination_dir;
}
}
return highest_destination_dir;
}

static void increment_count(struct dir_rename_info *info,
char *old_dir,
char *new_dir)
Expand Down Expand Up @@ -512,6 +530,8 @@ static void initialize_dir_rename_info(struct dir_rename_info *info,
struct strset *dirs_removed,
struct strmap *dir_rename_count)
{
struct hashmap_iter iter;
struct strmap_entry *entry;
int i;

if (!dirs_removed) {
Expand Down Expand Up @@ -558,6 +578,23 @@ static void initialize_dir_rename_info(struct dir_rename_info *info,
rename_dst[i].p->one->path,
rename_dst[i].p->two->path);
}

/*
* Now we collapse
* dir_rename_count: old_directory -> {new_directory -> count}
* down to
* dir_rename_guess: old_directory -> best_new_directory
* where best_new_directory is the one with the highest count.
*/
strmap_for_each_entry(info->dir_rename_count, &iter, entry) {
/* entry->key is source_dir */
struct strintmap *counts = entry->value;
char *best_newdir;

best_newdir = xstrdup(get_highest_rename_path(counts));
strmap_put(&info->dir_rename_guess, entry->key,
best_newdir);
}
}

void partial_clear_dir_rename_count(struct strmap *dir_rename_count)
Expand Down Expand Up @@ -682,10 +719,10 @@ static int idx_possible_rename(char *filename, struct dir_rename_info *info)
* rename.
*
* This function, idx_possible_rename(), is only responsible for (4).
* The conditions/steps in (1)-(3) will be handled via setting up
* dir_rename_count and dir_rename_guess in a future
* initialize_dir_rename_info() function. Steps (0) and (5) are
* handled by the caller of this function.
* The conditions/steps in (1)-(3) are handled via setting up
* dir_rename_count and dir_rename_guess in
* initialize_dir_rename_info(). Steps (0) and (5) are handled by
* the caller of this function.
*/
char *old_dir, *new_dir;
struct strbuf new_path = STRBUF_INIT;
Expand Down

0 comments on commit 81afdf7

Please sign in to comment.