The loop detection should be definitely implemented. I've studied the problem a little and looked through grep source to learn how they do it.
Their solution is the following: create a hash table and while entering each directory save the pair [device number and inode of the dir] as a value in the hash table (actually, it's a set as there are no explicit keys). After the directory is 'grepped', the entry is deleted from the hash table.
And if we enter the directory and there's already a corresponding entry in the hash table, then probably there are symlinks that form a loop (or directory hierarchy may be corrupted).
Therefore, it would be awesome to add the similar feature to ag.
I'm ready to dive into the problem and propose a working solution. But I have no desire to write a hash table from scratch. It would be great to use some small and fast implementation. I haven't chosen any yet, also I'm not good at licensing stuff.
Your suggestions are welcome.
+1 I like the idea of a hash table.
Only the main thread needs to read or modify the hash table, so there shouldn't be any concurrency issues.
I don't want to add another dependency to Ag, so I'd prefer a relatively simple hash library that could be bundled as a single source file (and header, of course).
I'm pretty busy with work stuff now, so you're more than welcome to try building this. If you want to hash-out some details in real-time (pun not intended), I'm ggreer on Freenode. I can also help you with any questions you have about the code. Although with you in Russia and me in California, it could be difficult to align schedules.
I've found a small header-only library called uthash (http://troydhanson.github.com/uthash/). It's distributed under BSD revised license.
Here's my first version of symlink loop detection implementation using this library: rev112/the_silver_searcher@98fcb1f
It works OK with my test cases. If everything looks good I'll make a pull request.
Hecks yes. Submit the pull request.