Ignore *.min.js files #142

Closed
wants to merge 1 commit into from

9 participants

@salty-horse

The commit adds *.min.js files to the ignore list.
These are usually minified JavaScript, which there is little point in grepping: the file is usually generated from another file in the same repository, and the text has no newlines, so every match prints the whole file. It's a constant annoyance.
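For readers who want to try the rule outside ack, here is a minimal Python sketch of the filename filter. The patterns are copied from the `is_searchable` hunk in this pull request; the Python port, the `UNSEARCHABLE` name, and the function signature are illustrative assumptions, not ack's actual code.

```python
import re

# Patterns mirrored from the is_searchable() hunk in this PR.
# The Python port and the names used here are illustrative only.
UNSEARCHABLE = [
    re.compile(r"^#.*#$"),        # Emacs swap files
    re.compile(r"^core\.\d+$"),   # core dumps
    re.compile(r"[._].*\.swp$"),  # Vi(m) swap files
    re.compile(r"[.]min\.js$"),   # minified JavaScript (the line this PR adds)
]


def is_searchable(filename: str) -> bool:
    """Return False when a bare filename (not a full path) matches an ignore pattern."""
    return not any(pattern.search(filename) for pattern in UNSEARCHABLE)
```

Because the pattern requires a literal dot before `min`, a file named plain `min.js` would still be searched.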

@felixge

+1 - I need this functionality as well. In fact, I've slightly improved the involved regex:

felixge@510a6f4

@kevinold

+1 I need this functionality as well. Please accept into core. Until then I'll use @felixge's fork.

@felixge

It might be even nicer to have this configurable via .ackrc. I can try to hack a proper patch for that if there is a chance of it getting merged.

@hoelzro
Collaborator

+1. Would it be possible to detect minified JS files without the .min.js extension as well?

@Groxx

@hoelzro the best way I can think of would be to read the first few hundred characters, and check the amount of whitespace. If less than X%, assume it's minified.
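A rough Python sketch of that whitespace check, to make the idea concrete. The 512-byte sample and the 5% cutoff are placeholder assumptions standing in for "a few hundred characters" and "X%"; neither value comes from this thread, and the function names are hypothetical.

```python
def sample_looks_minified(sample: bytes, min_whitespace_ratio: float = 0.05) -> bool:
    """Guess whether a leading sample of a file is minified.

    The 5% default is an assumed stand-in for the unspecified "X%".
    """
    if not sample:
        return False
    # Iterating over bytes yields ints; `in` on a bytes literal checks byte values.
    whitespace = sum(1 for byte in sample if byte in b" \t\r\n")
    return whitespace / len(sample) < min_whitespace_ratio


def file_looks_minified(path: str, sample_size: int = 512) -> bool:
    """Read only the first sample_size bytes (an assumed size) and apply the check."""
    with open(path, "rb") as handle:
        return sample_looks_minified(handle.read(sample_size))
```

Minifiers keep the spaces that the language requires (for example after `return` or `var`), so a cutoff this low could miss some minified files; the threshold would need tuning against real ones.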

But this requires checking all the js files every time, or storing a list on the filesystem somewhere. Not great for speed or cleanliness. Personally, I'm seeing how hard it would be to make ack check the current folder for a .ackrc file, and not just ~/.ackrc. ~/.ackrc isn't any good for projects, as you can't add project-specific ignores or refine types or anything, and can't check it into your repo.
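The per-project lookup described above could look something like this Python sketch. It is not how ack resolves its config; the `find_ackrc` name and the "project file wins over home file" precedence are assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional


def find_ackrc(cwd: str, home: str) -> Optional[Path]:
    """Prefer a project-local .ackrc in cwd, falling back to the one in home.

    The precedence order is an assumption, not ack's documented behavior.
    """
    for directory in (Path(cwd), Path(home)):
        candidate = directory / ".ackrc"
        if candidate.is_file():
            return candidate
    return None
```

A project-local file found this way could be checked into the repository alongside the code it configures.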

@joliss

Btw, @petdance, any comments?

@petdance
Owner

I'm loath to make changes like this before ack 2.0. But this might be worth doing.

@petdance
Owner

@hoelzro: How would you suggest we detect it?

@hoelzro
Collaborator

Tough to say. What heuristics do grep/file/revision control systems use? I think they check for extremely long lines, for some definition of extreme. As @Groxx said, this would require checking every js file for the condition, but we'd be searching them if this feature were not added. One way (probably not the best, though) would be to store search results for a given file until we're confident the heuristic has been met, or won't be met. If the heuristic has been met, we drop the search results we've gathered and move on to the next file. If not, we spit out the search results so far and continue as normal.
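The hold-then-flush idea above can be sketched in Python as a generator. The 1000-character cutoff for an "extremely long" line and the 50-line decision window are assumed values, and `search_unless_minified` is a hypothetical name; nothing here is taken from ack.

```python
import re
from typing import Iterable, Iterator, List, Tuple

Hit = Tuple[int, str]


def search_unless_minified(lines: Iterable[str], pattern: str,
                           max_len: int = 1000, inspect: int = 50) -> Iterator[Hit]:
    """Yield (line number, text) matches, unless an overlong line appears early.

    max_len and inspect are assumed thresholds for the heuristic.
    """
    regex = re.compile(pattern)
    held: List[Hit] = []  # matches withheld while the heuristic is undecided
    deciding = True
    for lineno, line in enumerate(lines, start=1):
        if deciding and lineno > inspect:
            # Window passed without an overlong line: release what was held.
            deciding = False
            yield from held
            held.clear()
        if deciding and len(line) > max_len:
            return  # heuristic met: drop held matches and skip the file
        if regex.search(line):
            hit = (lineno, line.rstrip("\n"))
            if deciding:
                held.append(hit)
            else:
                yield hit
    yield from held  # short file: the window never closed, so flush now
```

The cost is that output for every file is delayed until the window closes, which is the trade-off the comment above hints at.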

@llimllib

@hoelzro Freegrep reads BUFSIZ bytes into a buffer, then checks that each character is printable, whitespace, or a backspace (\b): https://github.com/howardjp/freegrep/blob/master/binary.c#L35 .
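A Python approximation of that kind of check, for illustration. It assumes the C locale (printable means bytes 0x20 through 0x7E) and assumes that backspace is tolerated as text; both are my reading of the linked code rather than something stated in this thread, and `looks_binary` is a hypothetical name.

```python
# Bytes treated as text: printable ASCII, C-locale whitespace, and backspace.
# The exact set is an assumption about the linked freegrep check.
TEXT_BYTES = frozenset(range(0x20, 0x7F)) | frozenset(b"\t\n\v\f\r\b")


def looks_binary(sample: bytes) -> bool:
    """Return True if any byte in the sample falls outside the text set."""
    return any(byte not in TEXT_BYTES for byte in sample)
```

Minified JavaScript is still plain text, so a check like this separates binary files from text files but would not flag a minified file on its own.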

I'd go read the grep source, but their repo browser is down and I don't feel like installing CVS. Welcome to 2011, GNU! http://cvs.savannah.gnu.org/viewvc/grep/

@salty-horse

Rewinding the discussion a bit, can anyone provide a real-world example of a minified js file that isn't suffixed with .min.js?

I don't understand the requirement for a special algorithm. My guesstimate says filtering for *.min.js will catch 99% of the cases.

@llimllib

I'm fine with the naming convention too, I just thought it was an interesting challenge to go read the grep code :)

@llimllib

BTW, and purely for our edification, grep does something simpler, looking for \0 in the first N bytes of a file:

memchr (bufbeg, eol ? '\0' : '\200', buflim - bufbeg)

(I don't immediately see why eol can be false, or what purpose '\200' serves. Anyway, eol is '\n' by default)

Where buflim-bufbeg should be the same size as a page of memory, if I'm scanning the file correctly.

http://cvs.savannah.gnu.org/viewvc/grep/grep/src/grep.c?view=markup
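The NUL-byte test described above is simple enough to sketch in a few lines of Python. The 4096-byte read is an assumed page size, and both function names are hypothetical; the real buffer size depends on the platform and on grep's own buffering.

```python
def contains_nul(buffer: bytes) -> bool:
    """Mirror the memchr() call for the default case: look for a NUL byte."""
    return b"\0" in buffer


def first_page_has_nul(path: str, page_size: int = 4096) -> bool:
    """Check only the first page_size bytes of the file (4096 is an assumption)."""
    with open(path, "rb") as handle:
        return contains_nul(handle.read(page_size))
```

Like the freegrep check, this detects binary data rather than minification, so it would not help with ext-all.js or similar files.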

@hoelzro
Collaborator

ext-all.js in the ExtJS distribution is compressed, but doesn't end with .min.js. It's the reason I brought it up. =)

@petdance
Owner

Something else to consider: Does foo.min.js show up in ack -f --js ?

@llimllib

@petdance

$ mkdir test
$ touch test/foo.min.js
$ ack -f --js
test/foo.min.js

@petdance
Owner

Right, but should it? Remember that -f is "all the files that ack would search."

@llimllib

Well, then that seems to suggest the answer :)

(As in, my preference would be: don't search min.js, and don't return it on -f)

@petdance
Owner

But then is that surprising when you expect that ack -f --js will return all javascript files? :-)

Then again, .min.js files aren't source code, are they?

@hoelzro
Collaborator

In my opinion, searching .min.js files is analogous to searching .o files.

@petdance
Owner

Right. I just wonder if it will screw anyone up.

@djanowski

> In my opinion, searching .min.js files is analogous to searching .o files.

+1

@petdance
Owner

Agreed, AND it still is a .js file. So I'm not sure how to deal with that.

@hoelzro
Collaborator

@petdance I thought I read that Ack 2.0 will support some sort of plugin system. If that is the case, I'd favor holding off on this feature, as it would introduce complexity in Ack 1.x that could be reimplemented as a plugin in Ack 2.0. Also, users could then disable this functionality if they don't want it.

@petdance
Owner

This pull request has been dealt with elsewhere.

@petdance petdance closed this
Commits on Nov 21, 2010
  1. @salty-horse
Showing with 19 additions and 12 deletions.
  1. +6 −3 Ack.pm
  2. +5 −3 ack
  3. +4 −3 ack-help-dirs.txt
  4. +4 −3 ack-help.txt
9 Ack.pm
@@ -558,6 +558,7 @@ Recognized files:
/#.+#$/ - Emacs swap files
/[._].*\.swp$/ - Vi(m) swap files
/core\.\d+$/ - core dumps
+ /[.]min\.js$/ - Minified javascript files
Note that I<$filename> must be just a file, not a full path.
@@ -572,6 +573,7 @@ sub is_searchable {
return if $filename =~ m{^#.*#$}o;
return if $filename =~ m{^core\.\d+$}o;
return if $filename =~ m{[._].*\.swp$}o;
+ return if $filename =~ /[.]min\.js$/;
return 1;
}
@@ -810,10 +812,11 @@ File inclusion/exclusion:
$ignore_dirs
Files not checked for type:
- /~\$/ - Unix backup files
- /#.+#\$/ - Emacs swap files
+ /~\$/ - Unix backup files
+ /#.+#\$/ - Emacs swap files
/[._].*\\.swp\$/ - Vi(m) swap files
- /core\\.\\d+\$/ - core dumps
+ /core\\.\\d+\$/ - core dumps
+ /[.]min\\.js\$/ - Minified javascript files
Miscellaneous:
--noenv Ignore environment variables and ~/.ackrc
8 ack
@@ -1642,6 +1642,7 @@ sub is_searchable {
return if $filename =~ m{^#.*#$}o;
return if $filename =~ m{^core\.\d+$}o;
return if $filename =~ m{[._].*\.swp$}o;
+ return if $filename =~ /[.]min\.js$/;
return 1;
}
@@ -1845,10 +1846,11 @@ File inclusion/exclusion:
$ignore_dirs
Files not checked for type:
- /~\$/ - Unix backup files
- /#.+#\$/ - Emacs swap files
+ /~\$/ - Unix backup files
+ /#.+#\$/ - Emacs swap files
/[._].*\\.swp\$/ - Vi(m) swap files
- /core\\.\\d+\$/ - core dumps
+ /core\\.\\d+\$/ - core dumps
+ /[.]min\\.js\$/ - Minified javascript files
Miscellaneous:
--noenv Ignore environment variables and ~/.ackrc
7 ack-help-dirs.txt
@@ -110,10 +110,11 @@ File inclusion/exclusion:
~.dot, .git, .hg, _MTN, ~.nib, .pc, ~.plst, RCS, SCCS, _sgbak and .svn
Files not checked for type:
- /~$/ - Unix backup files
- /#.+#$/ - Emacs swap files
+ /~$/ - Unix backup files
+ /#.+#$/ - Emacs swap files
/[._].*\.swp$/ - Vi(m) swap files
- /core\.\d+$/ - core dumps
+ /core\.\d+$/ - core dumps
+ /[.]min\.js$/ - Minified javascript files
Miscellaneous:
--noenv Ignore environment variables and ~/.ackrc
7 ack-help.txt
@@ -110,10 +110,11 @@ File inclusion/exclusion:
~.dot, .git, .hg, _MTN, ~.nib, .pc, ~.plst, RCS, SCCS, _sgbak and .svn
Files not checked for type:
- /~$/ - Unix backup files
- /#.+#$/ - Emacs swap files
+ /~$/ - Unix backup files
+ /#.+#$/ - Emacs swap files
/[._].*\.swp$/ - Vi(m) swap files
- /core\.\d+$/ - core dumps
+ /core\.\d+$/ - core dumps
+ /[.]min\.js$/ - Minified javascript files
Miscellaneous:
--noenv Ignore environment variables and ~/.ackrc