Ignore *.min.js files #142

Closed

9 participants
Contributor

salty-horse commented Nov 21, 2010

The commit adds *.min.js files to the ignore list.
These are usually minified JavaScript, which makes no sense to grep: the file is usually generated from another file in the same repository, and the text has no newlines, so any match prints one enormous line. It's a constant annoyance.
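For illustration only, the suffix rule can be sketched in a few lines of Python. This is not ack's actual Perl code or the regex from the commit; the helper name `is_minified_name` and the case-insensitive match are assumptions made for this example.

```python
import re

# Hypothetical helper, not part of ack: matches names ending in ".min.js".
# Case-insensitive matching is an assumption of this sketch.
_MIN_JS = re.compile(r"\.min\.js$", re.IGNORECASE)


def is_minified_name(filename: str) -> bool:
    """Return True if the file name follows the *.min.js convention."""
    return _MIN_JS.search(filename) is not None


names = ["jquery.js", "jquery.min.js", "lib/ext-all.js", "admin.js"]
# Keep only the files that a suffix-based ignore rule would still search.
print([n for n in names if not is_minified_name(n)])
# prints ['jquery.js', 'lib/ext-all.js', 'admin.js']
```

Note that `admin.js` is kept because the pattern requires a literal dot before `min`.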

Contributor

felixge commented Apr 1, 2011

+1 - I need this functionality as well. In fact, I've slightly improved the involved regex:

felixge/ack@510a6f4

+1 I need this functionality as well. Please accept into core. Until then I'll use @felixge's fork.

+1

Contributor

felixge commented Apr 28, 2011

It might be even nicer to have this configurable via .ackrc. I can try to hack a proper patch for that if there is a chance of it getting merged.

Collaborator

hoelzro commented Jun 27, 2011

+1. Would it be possible to detect minified JS files without the .min.js extension as well?

Groxx commented Jul 24, 2011

@hoelzro the best way I can think of would be to read the first few hundred characters, and check the amount of whitespace. If less than X%, assume it's minified.

But this requires checking all the js files every time, or storing a list on the filesystem somewhere. Not great for speed or cleanliness. Personally, I'm seeing how hard it would be to make ack check the current folder for a .ackrc file, and not just ~/.ackrc. ~/.ackrc isn't any good for projects, as you can't add project-specific ignores or refine types or anything, and can't check it into your repo.
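A rough sketch of the per-project lookup described above, written in Python rather than ack's Perl. The function name `find_ackrc` and the "project file wins over home file" precedence are assumptions of this example, not ack behaviour.

```python
from pathlib import Path
from typing import Optional


def find_ackrc(start: Path, home: Path) -> Optional[Path]:
    """Prefer a .ackrc in `start` (the project folder); fall back to `home`.

    Hypothetical lookup order for illustration; returns None if neither exists.
    """
    for candidate in (start / ".ackrc", home / ".ackrc"):
        if candidate.is_file():
            return candidate
    return None
```

It could be called as `find_ackrc(Path.cwd(), Path.home())`. Whether the two files should be merged instead of one shadowing the other is a separate design question this sketch does not address.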

joliss commented Jul 24, 2011

> @hoelzro the best way I can think of would be to read the first few hundred characters, and check the amount of whitespace. If less than X%, assume it's minified.

Sometimes there are human-readable copyright headers though. You could
read 1K and count the whitespace (or newlines), but I wonder if it's
too fragile a heuristic.
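To make the whitespace heuristic concrete, here is a minimal Python sketch. The function name `looks_minified`, the 1024-byte sample, and the 5% threshold (standing in for "X%") are all assumptions picked for illustration; nobody in this thread proposed specific values.

```python
def looks_minified(path: str, sample_size: int = 1024, min_ratio: float = 0.05) -> bool:
    """Guess whether a file is minified from the whitespace ratio of its head.

    Reads at most `sample_size` bytes and counts spaces, tabs, and line
    breaks. Both default values are arbitrary assumptions.
    """
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    if not sample:
        return False  # an empty file is not treated as minified
    # Iterating over bytes yields ints; `in` checks membership by byte value.
    whitespace = sum(1 for b in sample if b in b" \t\r\n")
    return whitespace / len(sample) < min_ratio
```

As noted above, a readable copyright header at the top of an otherwise minified file can push the ratio over the threshold, so this sketch shares the fragility being discussed.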

Being able to just exclude based on the .min.js extension would be
very helpful though. I can't for the life of me imagine why someone
would want to match inside a .min.js file.

joliss commented Jul 24, 2011

Btw, @petdance, any comments?

Owner

petdance commented Jul 25, 2011

I'm loath to make changes like this before ack 2.0. But this might be worth doing.

Owner

petdance commented Jul 25, 2011

@hoelzro: How would you suggest we detect it?

Collaborator

hoelzro commented Jul 25, 2011

Tough to say. What heuristics do grep/file/revision control systems use? I think they check for extremely long lines, for some definition of extreme. As @Groxx said, this would require checking every js file for the condition, but we'd be searching them if this feature were not added. One way (probably not the best, though) would be to store search results for a given file until we're confident the heuristic has been met, or won't be met. If the heuristic has been met, we drop the search results we've gathered and move on to the next file. If not, we spit out the search results so far and continue as normal.
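A simplified Python sketch of the "buffer results, then drop them if the heuristic fires" idea. The name `search_unless_minified` and the 1000-character limit for an "extremely long" line are assumptions of this example, and unlike the description above it holds results until the end of the file instead of flushing early once it is confident.

```python
import re
from typing import Iterable, List, Tuple


def search_unless_minified(
    lines: Iterable[str], pattern: str, max_line_len: int = 1000
) -> List[Tuple[int, str]]:
    """Collect (line number, text) matches, discarding them all if any line
    is longer than `max_line_len` (an arbitrary, assumed threshold)."""
    regex = re.compile(pattern)
    matches: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        if len(line) > max_line_len:
            return []  # heuristic met: drop buffered results, skip the file
        if regex.search(line):
            matches.append((lineno, line.rstrip("\n")))
    return matches  # heuristic never met: report the results as normal
```

The cost is that nothing is printed for a file until it has been read completely, which is one reason the approach is described as "probably not the best".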

@hoelzro Freegrep reads BUFSIZE bytes into a buffer, then checks that each character is either printable, whitespace, or \b (backspace): https://github.com/howardjp/freegrep/blob/master/binary.c#L35 .
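Reading that description as "a buffer is text if every byte is printable, whitespace, or a backspace", the check might look like this in Python. This is a sketch of that reading, not a port of freegrep's C code; `looks_binary` is a made-up name and the byte set assumes plain ASCII.

```python
import string

# ASCII printable characters plus whitespace (string.printable covers both),
# plus backspace (0x08). Treating everything else as binary is an assumption
# based on the description above.
_TEXT_BYTES = frozenset(string.printable.encode("ascii")) | {0x08}


def looks_binary(data: bytes) -> bool:
    """Return True if any byte falls outside the assumed text byte set."""
    return any(b not in _TEXT_BYTES for b in data)
```

Minified JavaScript consists of printable characters, so a check like this classifies it as text; it separates binary from text but does not identify minified files.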

I'd go read the grep source, but their repo browser is down and I don't feel like installing CVS. Welcome to 2011, GNU! http://cvs.savannah.gnu.org/viewvc/grep/

Contributor

salty-horse commented Jul 25, 2011

Rewinding the discussion a bit, can anyone provide a real-world example of a minified js file that isn't suffixed with .min.js?

I don't understand the requirement for a special algorithm. My guesstimate says filtering for *.min.js will catch 99% of the cases.

joliss commented Jul 25, 2011

> Rewinding the discussion a bit, can anyone provide a real-world example of a minified js file that isn't suffixed with .min.js?

Sure -- there was one in the Capybara project until I removed it a few
days back: teamcapybara/capybara@fcdeeca

> I don't understand the requirement for a special algorithm.
> My guesstimate says filtering for *.min.js will catch 99% of the cases.

I still agree with this though. It's easy to rename JS files to follow
the .min.js convention after all.

Jo Liss
http://opinionated-programmer.com/

I'm fine with the naming convention too, I just thought it was an interesting challenge to go read the grep code :)

BTW, and purely for our edification, grep does something simpler, looking for \0 in the first N bytes of a file:

memchr (bufbeg, eol ? '\0' : '\200', buflim - bufbeg)

(I don't immediately see why eol can be false, or what purpose '\200' serves. Anyway, eol is '\n' by default)

Where buflim-bufbeg should be the same size as a page of memory, if I'm scanning the file correctly.

http://cvs.savannah.gnu.org/viewvc/grep/grep/src/grep.c?view=markup
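The NUL-byte test is easy to sketch in Python. This only mirrors the `'\0'` branch of the `memchr` call quoted above; the function name `has_nul_byte` and the 4096-byte sample (standing in for "a page of memory") are assumptions of this example.

```python
def has_nul_byte(path: str, sample_size: int = 4096) -> bool:
    """Return True if a NUL byte occurs in the first `sample_size` bytes.

    4096 is an assumed stand-in for the page-sized buffer mentioned above.
    """
    with open(path, "rb") as f:
        return b"\0" in f.read(sample_size)
```

Like the printable-character check, this flags binary files only. A minified .js file contains no NUL bytes, so it would still be searched.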

Collaborator

hoelzro commented Jul 25, 2011

ext-all.js in the ExtJS distribution is compressed, but doesn't end with .min.js. It's the reason I brought it up. =)

Owner

petdance commented Jul 25, 2011

Something else to consider: Does foo.min.js show up in ack -f --js ?

@petdance

$ mkdir test
$ touch test/foo.min.js
$ ack -f --js
test/foo.min.js

Owner

petdance commented Jul 25, 2011

Right, but should it? Remember that -f is "all the files that ack would search."

Well, then that seems to suggest the answer :)

(As in, my preference would be: don't search min.js, and don't return it on -f)

Owner

petdance commented Jul 25, 2011

But then is that surprising when you expect that ack -f --js will return all javascript files? :-)

Then again, .min.js aren't source code, are they?

Collaborator

hoelzro commented Jul 25, 2011

In my opinion, searching .min.js files is analogous to searching .o files.

Owner

petdance commented Jul 25, 2011

Right. I just wonder if it will screw anyone up.

> In my opinion, searching .min.js files is analogous to searching .o files.

+1

Owner

petdance commented Jul 25, 2011

Agreed, AND it still is a .js file. So I'm not sure how to deal with that.

Collaborator

hoelzro commented Jul 26, 2011

@petdance I thought I read that Ack 2.0 will support some sort of plugin system. If that is the case, I'd favor holding off on this feature, as it would introduce complexity in Ack 1.x that could be reimplemented as a plugin in Ack 2.0. Also, users could then disable this functionality if they don't want it.

joliss commented Jul 29, 2011

> Right. I just wonder if it will screw anyone up.

Try:

wget https://ajax.googleapis.com/ajax/libs/jquery/1.6.2/jquery.min.js && ack jquery

and marvel at the output. I can't imagine anyone actually relying on
Ack output such as this one.

Owner

petdance commented Sep 18, 2011

This pull request has been dealt with elsewhere.

petdance closed this Sep 18, 2011
