Support for searching in gzipped files #141

Merged
merged 8 commits into from Feb 9, 2013

Projects

None yet

3 participants

@nall
Contributor
nall commented Jan 22, 2013

I've added new functionality will search the contents of gzipped files. It has some initial infrastructure in place for other compressed files (ZIP, UNIX compress), but these aren't implemented yet (looks like miniz http://code.google.com/p/miniz might actually be useful here).

This change adds a new command line argument (-z, --search-zip). Searching in zipped files is off by default currently.

This change adds a new dependency on zlib. I've updated configure.ac to reflect this.

Most of the new code is in decompress.c, but I did write a small little wrapper around search_buf in search.c to check for whether the buffer contains compressed data.

I hope this is useful -- it certainly is to me!

@nall
Contributor
nall commented Jan 22, 2013

Hang tight -- just realized I have a mem leak.

@ggreer
Owner
ggreer commented Jan 22, 2013

I want this feature.

@nall
Contributor
nall commented Jan 22, 2013

OK, I think it's pretty good to go now. Give it a review and see what you think.

@ggreer
Owner
ggreer commented Jan 22, 2013

There's no rush. I doubt I'll be able to fully review this tonight.

Ag survived a year without this feature. It can survive another few days. :)

@ggreer ggreer commented on an outdated diff Jan 22, 2013
src/decompress.c
+ inflateEnd(&stream);
+ goto error_out;
+ }
+
+ stream.avail_out = result_size / 2;
+ stream.next_out = &result[stream.total_out];
+ ret = inflate(&stream, Z_SYNC_FLUSH);
+ log_debug("inflate ret = %d", ret);
+ switch(ret) {
+ case Z_STREAM_ERROR: {
+ log_err("Found stream error while decompressing zlib stream: %s", stream.msg);
+ inflateEnd(&stream);
+ goto error_out;
+ }
+ case Z_NEED_DICT:
+ ret = Z_DATA_ERROR;
@ggreer
ggreer Jan 22, 2013 Owner

It looks like this is supposed to fall though and go to error_out. In that case, nothing looks at ret. You can kill this line.

@ggreer ggreer commented on the diff Jan 22, 2013
src/search.c
+ if(!opts.search_zip_files) {
+ log_debug("File %s is zipped. Skipping...", dir_full_path);
+ return;
+ }
+ _buf = decompress(zip_type, buf, buf_len, dir_full_path, &_buf_len);
+ if(_buf == NULL || _buf_len == 0) {
+ log_warn("Cannot decompress zipped file %s", dir_full_path);
+ return;
+ }
+ }
+
+ search_nonzipped_buf(_buf, _buf_len, dir_full_path);
+
+ /* Check if this is the _buf we allocated in decompress. If so, free it */
+ if(_buf != buf) {
+ free(_buf);
@ggreer
ggreer Jan 22, 2013 Owner

Freeing a const char*? Can you do that? (apparently you can. the compiler just warns).

To avoid the warning, I suggest declaring _buf a char* and explicitly casting buf to non-const. char* _buf = (char*)buf; or something like that.

@nall
Contributor
nall commented Jan 22, 2013

Thanks for the good comments. I've updated my repo.

@sorin-ionescu

I prefer the UNIX philosophy — do one thing and do it well. This should go into a separate tool called zag that will replicate zgrep, see man zgrep.

@ggreer ggreer commented on the diff Feb 6, 2013
src/search.c
void search_buf(const char *buf, const int buf_len,
const char *dir_full_path) {
+ char* _buf = (char*)buf;
+ int _buf_len = buf_len;
+ ag_compression_type zip_type = AG_NO_COMPRESSION;
+
+ zip_type = is_zipped((void*)_buf, _buf_len);
@ggreer
ggreer Feb 6, 2013 Owner

Hmm... it's much more efficient to check opts.search_zip_files first instead of scanning every buffer for zip headers.

@ggreer
Owner
ggreer commented Feb 6, 2013

I still want to merge this, with one caveat: -a and -u shouldn't search compressed files. -az or -uz would be the way to search all files including zipped ones.

Apologies for not getting back to this sooner. I've been busy with work stuff.

@nall
Contributor
nall commented Feb 8, 2013

Do you mean --az and --uz since short options can't be two characters?

@ggreer
Owner
ggreer commented Feb 8, 2013

I meant you'd use -a -z or -u -z, which can be combined to -az or -uz.

@ggreer
Owner
ggreer commented Feb 9, 2013

Actually, since it's just a two-line change I'll merge and fix it myself.

Well done.

@ggreer ggreer merged commit e80a032 into ggreer:master Feb 9, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment