Teach git status to show ignored directories when showing all untracked files #1243

Closed

@jamill jamill commented Jul 18, 2017

I wanted to put the general idea out for initial feedback. While this PR adds another flag to git status, I think a similar idea could be applied to improve the performance of git status --ignored in general.

The git status command exposes the option to report ignored and
untracked files. When reporting ignored files, it can optionally report all untracked
files (--untracked-files=all), but this results in all ignored files being
reported as well.

Our application (Visual Studio) needs all untracked files listed
individually, but does not need all individual ignored files.
Reporting all ignored files can affect the time it takes for status
to run. Here are some measurements on a representative repository:

| Command | Reported ignored entries | Time (s) |
| ------- | ------------------------ | -------- |
| 1       | 0                        | 1.3      |
| 2       | 1024                     | 4.2      |
| 3       | 174904                   | 7.5      |
| 4       | 1046                     | 1.6      |

Commands:

  1. status
  2. status --ignored
  3. status --ignored --untracked-files=all
  4. status --ignored --untracked-files=all --show-ignored-directory

The code change adds a new flag (--show-ignored-directory) that is interpreted when
running git status with the --ignored and --untracked-files flags. With it, status does not report the individual files contained in a folder that matches an ignore pattern.

If all the files of a folder match an ignore pattern (rather than the folder itself), they are still listed individually. This is why the output with this new flag does not exactly match the output of git status --ignored. One further change that could be made would be to report just the directory if all contained entries are ignored (i.e. make the output of status with this flag match the output of plain git status --ignored).
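
To make the reporting rule concrete, here is a minimal standalone sketch in C. It is not the code in this PR: `collapse_ignored` and `containing_ignored_dir` are hypothetical helpers, and the list of ignored directories is assumed to be known up front (each entry ending in `/`). Files below an ignored directory are reported once as that directory, while files that are ignored individually stay listed one by one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of Git: returns the entry of ignored_dirs
 * (each ending in '/') that is a prefix of path, or NULL if there is none.
 */
static const char *containing_ignored_dir(const char *path,
					  const char *const *ignored_dirs,
					  size_t ndirs)
{
	size_t i;

	for (i = 0; i < ndirs; i++) {
		size_t len = strlen(ignored_dirs[i]);

		if (len > 0 && !strncmp(path, ignored_dirs[i], len))
			return ignored_dirs[i];
	}
	return NULL;
}

/*
 * Collapse a list of ignored file paths. Files below an ignored directory
 * are reported as that directory (once); other ignored files are kept
 * as-is. out must have room for n entries. Returns the number of entries
 * written to out.
 */
size_t collapse_ignored(const char *const *paths, size_t n,
			const char *const *ignored_dirs, size_t ndirs,
			const char **out)
{
	size_t i, j, nout = 0;

	assert(paths && out);
	for (i = 0; i < n; i++) {
		const char *dir = containing_ignored_dir(paths[i],
							 ignored_dirs, ndirs);
		const char *entry = dir ? dir : paths[i];

		/* Skip entries that were already reported. */
		for (j = 0; j < nout; j++)
			if (!strcmp(out[j], entry))
				break;
		if (j == nout)
			out[nout++] = entry;
	}
	return nout;
}
```

For example, with `build/` as the only ignored directory, the inputs `build/a.o`, `build/b.o`, `logs/x.log` and `tmp.swp` collapse to three entries: `build/`, `logs/x.log` and `tmp.swp`. The two files that match a file pattern without their folder being ignored remain individual entries, which mirrors the difference from plain `git status --ignored` described above.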

@jamill
Author

jamill commented Jul 18, 2017

/cc @jeffhostetler

dir.c Outdated
* check if the directory is empty or not. If directory
* is not empty, we know it is a non-empty excluded directory.
*/
if (stop_at_first_file) {

This comment was marked as off-topic.

@ermeckle ermeckle left a comment

I spent a while verifying the logic, and it seems to be solid.

builtin/commit.c Outdated
@@ -1360,6 +1361,8 @@ int cmd_status(int argc, const char **argv, const char *prefix)
OPT_COLUMN(0, "column", &s.colopts, N_("list untracked files in columns")),
OPT_BOOL(0, "no-lock-index", &no_lock_index,
N_("do not lock the index")),
OPT_BOOL(0, "show-ignored-directory", &collapse_ignored_directories,

This comment was marked as off-topic.

dir.h Outdated
@@ -152,7 +152,8 @@ struct dir_struct {
DIR_COLLECT_IGNORED = 1<<4,
DIR_SHOW_IGNORED_TOO = 1<<5,
DIR_COLLECT_KILLED_ONLY = 1<<6,
-DIR_KEEP_UNTRACKED_CONTENTS = 1<<7
+DIR_KEEP_UNTRACKED_CONTENTS = 1<<7,
+DIR_COLLAPSE_IGNORED = 1<<8

This comment was marked as off-topic.

@@ -0,0 +1,117 @@
#!/bin/sh

This comment was marked as off-topic.

dir.c Outdated
if (!(dir->flags & DIR_NO_GITLINKS)) {
unsigned char sha1[20];
if (resolve_gitlink_ref(dirname, "HEAD", sha1) == 0)
return exclude ? path_excluded : path_untracked;
}

This comment was marked as off-topic.

dir.c Outdated
@@ -1615,6 +1646,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
struct strbuf *path,
int baselen,
const struct pathspec *pathspec)

This comment was marked as off-topic.

dir.c Outdated
const struct pathspec *pathspec)
{
struct cached_dir cdir;
enum path_treatment state, subdir_state, dir_state = path_none;
struct strbuf path = STRBUF_INIT;
int max_dir_state;

This comment was marked as off-topic.

dir.c Outdated
if (subdir_state > dir_state)
dir_state = subdir_state;
}

if (check_only) {

This comment was marked as off-topic.

dir.c Outdated
break;
}

This comment was marked as off-topic.

dir.c Outdated
if (dir_state == path_untracked) {
if (cdir.fdir) {
add_untracked(untracked, path.buf + baselen);
}

This comment was marked as off-topic.

dir.c Outdated
* check if the directory is empty or not. If directory
* is not empty, we know it is a non-empty excluded directory.
*/
if (stop_at_first_file) {

This comment was marked as off-topic.

@@ -33,6 +33,12 @@ The notable options are:
Similar to `DIR_SHOW_IGNORED`, but return ignored files in `ignored[]`
in addition to untracked files in `entries[]`.

`DIR_SHOW_IGNORED_DIRECTORY`:::

Only has meaning if `DIR_SHOW_IGNORED_TOO` is also set; if this is set,

This comment was marked as off-topic.

dir.c Outdated
if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES) {
if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
return exclude ? path_excluded : path_untracked;
else

This comment was marked as off-topic.

dir.c Outdated
dir_state = max_dir_state;

if (dir_state == max_dir_state) {
if (dir_state == path_untracked) {

This comment was marked as off-topic.

@jamill jamill force-pushed the status_ignored_directory branch 2 times, most recently from 1b44b28 to 81d295b Compare July 21, 2017 17:33
@PhilipOakley
Don't forget the check-ignore function... though it may not be working properly anyway ;-)

See the thread on the main Git list that follows up on a Stack Overflow user's issue: Expected behavior of "git check-ignore"...

@jamill jamill changed the title [WIP] Teach git status to show ignored directories when showing all untracked files Teach git status to show ignored directories when showing all untracked files Aug 2, 2017
dir/dir/untracked dir/dir/untracked_ignored \
dir/dir/ignored_dir \
dir/dir/ignored_with_sub_untracked \
dir/dir/ignored_with_sub_untracked/untracked \

This comment was marked as off-topic.

dir.c Outdated
dirname + baselen, len - baselen);
return read_directory_recursive(dir, istate, dirname, len,
untracked, 1, 0, pathspec);
}

This comment was marked as off-topic.

dir.c Outdated
}
if (exclude &&
(dir->flags & DIR_SHOW_IGNORED_TOO) &&
(dir->flags & DIR_SHOW_IGNORED_DIRECTORY)) {

This comment was marked as off-topic.

@@ -1832,12 +1844,32 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
subdir_state =
read_directory_recursive(dir, istate, path.buf,
path.len, ud,
-check_only, pathspec);
+check_only, stop_at_first_file, pathspec);
if (subdir_state > dir_state)
dir_state = subdir_state;
}

if (check_only) {

This comment was marked as off-topic.


touch tracked/tracked \
ignored/ignored_1.ign \
ignored/ignored_2.ign tracked_ignored/tracked_1 \

This comment was marked as off-topic.

untracked_ignored/ignored_1.ign \
untracked_ignored/ignored_2.ign \
ignored_dir/test \
ignored_dir/test.unignore \

This comment was marked as off-topic.

The git status command exposes the option to report ignored and
untracked files. When reporting ignored files, it can report all
untracked files individually (--untracked-files=all), but this results
in all ignored files being reported as well. This teaches Git to
optionally show all untracked files, but not show individual ignored
files contained in directories that match an ignore rule.

Motivation:
Our application (Visual Studio) needs all untracked files listed
individually, but does not need all ignored files listed individually.
Reporting all ignored files can affect the time it takes for status
to run. For a representative repository, here are some measurements
showing a large perf improvement for this scenario:

| Command | Reported ignored entries | Time (s) |
| ------- | ------------------------ | -------- |
| 1       | 0                        | 1.3      |
| 2       | 1024                     | 4.2      |
| 3       | 174904                   | 7.5      |
| 4       | 1046                     | 1.6      |

Commands:
 1) status
 2) status --ignored
 3) status --ignored --untracked-files=all
 4) status --ignored --untracked-files=all --show-ignored-directory

This change exposes a --show-ignored-directory flag to the git status
command. This flag is used when running git status with the --ignored
and --untracked-files options so that individual ignored files
contained in directories that match an ignore pattern are not listed.

Part of the perf improvement comes from the tweak to
read_directory_recursive to stop scanning the file system after it
encounters the first file. When a directory is ignored, all it needs to
determine is if the directory is empty or not. The logic currently keeps
scanning the file system until it finds an untracked file. However, as
the directory is ignored, all the contained contents are also marked
excluded. For ignored directories that contain a large number of files,
this can take some time.

Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Jameson Miller <jamill@microsoft.com>
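
The stop-at-first-file idea from the commit message can be sketched in isolation. This is not Git's read_directory_recursive: `dir_has_any_entry` is a hypothetical helper built on POSIX `opendir`/`readdir`, shown only to illustrate that deciding whether an ignored directory is empty needs at most one real entry, however many files the directory holds.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/*
 * Hypothetical helper, not Git's read_directory_recursive(): returns 1 if
 * the directory holds at least one entry besides "." and "..", 0 if it is
 * empty, and -1 if it cannot be opened. The scan stops at the first entry.
 */
int dir_has_any_entry(const char *path)
{
	DIR *d;
	struct dirent *e;
	int found = 0;

	assert(path);
	d = opendir(path);
	if (!d)
		return -1;
	while ((e = readdir(d)) != NULL) {
		if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, ".."))
			continue;
		found = 1;	/* one entry is enough; do not read the rest */
		break;
	}
	closedir(d);
	return found;
}
```

A caller that only needs to classify an ignored directory as empty or non-empty could use the return value directly, instead of walking every contained file as the earlier logic did.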
@dscho
Member

dscho commented Aug 5, 2017

This will be part of Git for Windows v2.14.0. Thanks!

@dscho dscho added this to the v2.14.0 milestone Aug 5, 2017
dscho added a commit to git-for-windows/build-extra that referenced this pull request Aug 5, 2017
The experimental option [`--show-ignored-directory` was added to `git
status`](git-for-windows/git#1243) to show
only the name of ignored directories when the option `--untracked=all`
is used.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
dscho added a commit that referenced this pull request Aug 5, 2017
It is totally my fault that I failed to notice the updated PR at
#1243. This backports the fixes
into the next Git for Windows version's commit graph.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@dscho
Member

dscho commented Aug 5, 2017

Sorry, I missed this update... Fixed via f0a126c

@dscho dscho closed this Aug 5, 2017