browsing annexed files can be slow #20

Closed
anarcat opened this Issue Jun 27, 2018 · 6 comments

anarcat commented Jun 27, 2018

@l can be pretty slow. Even on a modestly large repository, it takes longer than tolerable. In my use case, I want to browse a repository, see which files are present and which are not, and "get" some of those files. As things work right now, that can take more than a minute: in an offhand test, listing a 6,500-file repository took 80 seconds. Yet listing the files with git-annex alone is much, much faster:

$ time sh -c 'git annex find --include "*" | wc -l'
6551
0.38user 0.04system 0:00.40elapsed 105%CPU (0avgtext+0avgdata 51916maxresident)k
0inputs+0outputs (0major+3300minor)pagefaults 0swaps

That's 400 milliseconds, 200 times faster. At first I thought this was because magit-annex uses git annex list instead, but that doesn't seem to explain it:

$ time sh -c 'git annex list | wc -l'
6559
7.98user 1.08system 0:08.27elapsed 109%CPU (0avgtext+0avgdata 37332maxresident)k
0inputs+0outputs (0major+41074minor)pagefaults 0swaps

That is still about 10 times faster than magit-annex, so something fishy is happening when magit-annex lists the files...

I am not sure how to fix this. From reading the source code, there should be a list-specific popup that would allow me to restrict the list to a subdirectory. This would help, but I can't figure out how to trigger that. Doing @l triggers the full list. Calling M-x magit-annex-list-popup also creates a full list.

Fundamentally, there's something inefficient in the way magit-annex lists files that makes it difficult to use. It would be great to have a smoother experience there, as there are very few interactive interfaces for operating on git-annex files on Linux. Mac OS X has the incredible git-annex-turtle, but there's nothing even remotely resembling it on Linux...

Thanks!

anarcat commented Jun 27, 2018

Ideally, this process would be done in two asynchronous steps: git annex find --include="*" would first run to fetch the list of all files, and git annex whereis could then figure out where the files are. git annex list is nice, but it's hard to parse and not designed to be consumed by porcelain like magit-annex. The other two commands have --json output that should be easier to parse...
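
A minimal sketch of that two-step flow, written in Python purely for illustration (magit-annex itself is Emacs Lisp, and none of this is its actual code). It assumes that git annex find --include=* prints one annexed path per line, and that git annex whereis --json prints one JSON object per file with a "file" key and a "whereis" list whose entries carry a "description" key. The function names are hypothetical.

```python
import json
import subprocess
from typing import Dict, Iterable, List


def list_annexed_files(repo: str = ".") -> List[str]:
    """Step 1 (hypothetical helper): list every annexed path quickly.

    --include=* makes git annex find report files whether or not their
    content is present locally, matching the timing test in this issue.
    """
    result = subprocess.run(
        ["git", "annex", "find", "--include=*"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def parse_whereis_json(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each file to the descriptions of repositories holding its content.

    Assumed record shape (one JSON object per line):
    {"file": "...", "whereis": [{"description": "..."}, ...], ...}
    """
    locations: Dict[str, List[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        locations[record["file"]] = [
            remote.get("description", "") for remote in record.get("whereis", [])
        ]
    return locations


def annex_locations(repo: str = ".") -> Dict[str, List[str]]:
    """Step 2 (hypothetical helper): fetch all locations in one whereis call."""
    result = subprocess.run(
        ["git", "annex", "whereis", "--json"],
        cwd=repo, capture_output=True, text=True,
    )
    # Deliberately no check=True: whereis may exit non-zero when some file
    # has no known copy, yet still print records for the other files.
    return parse_whereis_json(result.stdout.splitlines())
```

The point of the split is that a UI could show the names from list_annexed_files() right away and fill in locations once annex_locations() returns, rather than blocking on a single combined listing.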

Member

kyleam commented Jun 27, 2018

@l can be pretty slow. Even on a modestly large repository, it takes longer than tolerable.

Thanks for the information. I use git-annex for large files, but I rarely have even a modestly large number of files. Trying it out on a test repo with 7007 annex files created with

 for dname in a b c d e f g; do mkdir $dname && for i in {0..1000}; do od -vAn -N4 -tu4 </dev/urandom >$dname/rand$i; done; done

I agree that it's not pleasant (~10 seconds for me).

I am not sure how to fix this. From reading the source code, there should be a list-specific popup that would allow me to restrict the list to a subdirectory. This would help, but I can't figure out how to trigger that. Doing @l triggers the full list. Calling M-x magit-annex-list-popup also creates a full list.

You can access it with @ l C-u d (i.e., use a prefix argument to get the popup rather than the default action) or with M-x magit-annex-list-dir-files. In the past I've considered moving away from the "default action unless a prefix argument is given" behavior to the standard "default action if a prefix argument is given, otherwise popup" behavior. I'll plan to do so now.

Fundamentally, there's something inefficient in the way magit-annex lists files

Nearly all the time is spent washing the lines (magit-annex-list-wash-line).

Ideally, this process would be done in two asynchronous steps: git annex find --include="*" would first run to fetch the list of all files, and git annex whereis could then figure out where the files are.

I'll think about this more, but I don't think there's any straightforward way to make a Magit washer asynchronous. But I'm open to alternative ways to display the information. Even without the asynchronous steps, I think a command that uses whereis --json underneath should get a good speedup.
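
Outside of Magit, the same --json output can also be consumed incrementally, which is one way to get a progressively filled display without making a washer asynchronous. The Python sketch below is only an illustration: it assumes the same one-JSON-object-per-line shape (a "file" key and a "whereis" list of objects with a "description" key), and the function names are hypothetical.

```python
import json
import subprocess
from typing import Iterable, Iterator, List, Tuple


def iter_whereis_records(stream: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (file, repository descriptions) as each JSON line is read."""
    for line in stream:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        descriptions = [r.get("description", "") for r in record.get("whereis", [])]
        yield record["file"], descriptions


def stream_whereis(repo: str = ".") -> Iterator[Tuple[str, List[str]]]:
    """Run git annex whereis --json and yield records while it is running."""
    with subprocess.Popen(
        ["git", "annex", "whereis", "--json"],
        cwd=repo, stdout=subprocess.PIPE, text=True,
    ) as proc:
        assert proc.stdout is not None  # guaranteed by stdout=PIPE
        # Reading the pipe line by line lets the caller update a display
        # per file instead of waiting for the whole command to finish.
        yield from iter_whereis_records(proc.stdout)
```

A caller would loop with for path, where in stream_whereis("/path/to/repo"): and update its view per file. How promptly lines arrive still depends on how git-annex buffers its output when writing to a pipe.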

anarcat commented Jun 27, 2018

ah! so it's actually @ C-u l d. I had tried C-u @ l without luck.

Changing the default seems like a good idea.

I don't know how fast Emacs is at parsing JSON, but yeah, hopefully that would be better... I would rather have only filenames without tracking information and a fast display than a display that takes more than a second to load in all but the most degenerate cases...

Member

kyleam commented Jun 27, 2018

ah! so it's actually @ C-u l d.

Yep, sorry for the typo in my response. Glad you got to the right invocation anyway.

I would rather have only filenames without tracking information

For a display that shows only a list of filenames, I don't see an advantage over dired/git-annex-el. I suppose the main reason you want a magit buffer to display the files is that you want to see just annex files and you have lots of non-annex files in the repo? Or is it that you want to view all the repo's annex files at once? In either case, I still think it'd make sense to build off of dired. (We could adjust magit-annex-file-action-popup to be more useful if called from dired buffers by making file actions consider marked files or the file at point in dired buffers.)

anarcat commented Jun 27, 2018

kyleam added a commit that referenced this issue Jul 11, 2018

magit-annex-list-popup: Don't default to an action
Behave like most other popups in Magit do (magit-show-refs-popup is
the exception).  This gives more exposure to
magit-annex-list-dir-files, which is useful in repos where parsing the
entire list output is too slow.

Re: #20

kyleam added a commit that referenced this issue Jul 11, 2018

Make file commands more useful from Dired buffers
The two main things here are (1) taking defaults from the Dired buffer
and (2) updating the Dired listing after an action so that
git-annex.el fontification and type changes (in the case of lock and
unlock) are up to date.

At this point, executing a command from Dired always displays the
process buffer.  I think this is convenient because in Dired buffers,
unlike in Magit buffers, the process buffer isn't available with a
single key press.  But if that annoys some users, we can consider
adjusting this (e.g., showing the process buffer if there is an error
and otherwise letting asynchronous calls be handled by the user's
magit-process-popup-time setting).

Re: #20

Member

kyleam commented Jul 15, 2018

OK, magit-annex-list-popup will now show the popup by default, making it easier to call magit-annex-list-dir-files. And magit-annex-file-action-popup is now more useful from Dired buffers. Both should help work around magit-annex-list-files slowness. Please re-open this issue if these things, along with Dired/git-annex.el, don't end up working for you.

kyleam closed this Jul 15, 2018