a text scanning suite
Latest commit 5b93a35 Apr 20, 2011 @mstone Record that tarball_traversal.o depends on libarchive and libz.
Reported-by: Christopher Witter <christopher.witter@gmail.com>
config Initial public release. Feb 15, 2011
contrib Initial public release. Feb 15, 2011
docs Initial public release. Feb 15, 2011
t Avoid needlessly recomputing end iterators. Mar 4, 2011
INSTALL Initial public release. Feb 15, 2011
LICENSE Initial public release. Feb 15, 2011
Makefile Record that tarball_traversal.o depends on libarchive and libz. Apr 20, 2011
README Update the README to mention the relationship to find+grep. Feb 19, 2011
binder.cc Initial public release. Feb 15, 2011
binder.h Initial public release. Feb 15, 2011
cgen.h Initial public release. Feb 15, 2011
compat_fstatat.h Initial public release. Feb 15, 2011
compat_lseek64.h Initial public release. Feb 15, 2011
compat_openat.h Initial public release. Feb 15, 2011
config.Darwin.mk Initial public release. Feb 15, 2011
config.Linux.mk Initial public release. Feb 15, 2011
config.cc Initial public release. Feb 15, 2011
config.h Initial public release. Feb 15, 2011
config.lua Initial public release. Feb 15, 2011
configure Initial public release. Feb 15, 2011
counter.cc Initial public release. Feb 15, 2011
counter.h Initial public release. Feb 15, 2011
cpplint.py Initial public release. Feb 15, 2011
decider.cc Avoid relying on an uninitialized 'struct stat'. Mar 9, 2011
decider.h Avoid relying on an uninitialized 'struct stat'. Mar 9, 2011
decode.cc Avoid needlessly recomputing end iterators. Mar 4, 2011
decode.h Initial public release. Feb 15, 2011
encode.cc Avoid needlessly recomputing end iterators. Mar 4, 2011
encode.h Initial public release. Feb 15, 2011
fake_fdopendir.cc Initial public release. Feb 15, 2011
fake_fdopendir.h Initial public release. Feb 15, 2011
fd_scannable.cc Initial public release. Feb 15, 2011
fd_scannable.h Initial public release. Feb 15, 2011
fd_traversal.cc Avoid relying on an uninitialized 'struct stat'. Mar 9, 2011
fd_traversal.h Initial public release. Feb 15, 2011
highlight.cc Avoid needlessly recomputing end iterators. Mar 4, 2011
hits_model.cc Initial public release. Feb 15, 2011
hits_model.h Initial public release. Feb 15, 2011
log.cc Avoid needlessly recomputing end iterators. Mar 4, 2011
log.h Initial public release. Feb 15, 2011
luaa.cc Initial public release. Feb 15, 2011
luaa.h Initial public release. Feb 15, 2011
maintainer Initial public release. Feb 15, 2011
nat.c Initial public release. Feb 15, 2011
nat.h Initial public release. Feb 15, 2011
path_dir_pair.h Initial public release. Feb 15, 2011
path_model.cc Initial public release. Feb 15, 2011
path_model.h Initial public release. Feb 15, 2011
regex_scanner.cc Avoid needlessly recomputing end iterators. Mar 4, 2011
regex_scanner.h Initial public release. Feb 15, 2011
report.cc Initial public release. Feb 15, 2011
sample_model.cc Initial public release. Feb 15, 2011
sample_model.h Initial public release. Feb 15, 2011
scan_dir.cc Initial public release. Feb 15, 2011
scan_storage.cc Initial public release. Feb 15, 2011
scan_tarball.cc Initial public release. Feb 15, 2011
scannable.cc Initial public release. Feb 15, 2011
scannable.h Initial public release. Feb 15, 2011
scanner.cc Initial public release. Feb 15, 2011
scanner.h Initial public release. Feb 15, 2011
scanner_mode.cc Initial public release. Feb 15, 2011
scanner_mode.h Initial public release. Feb 15, 2011
sensor_model.cc Initial public release. Feb 15, 2011
sensor_model.h Initial public release. Feb 15, 2011
sensors.cc Initial public release. Feb 15, 2011
sensors.h Initial public release. Feb 15, 2011
sqlite3_compat.h Initial public release. Feb 15, 2011
stream_traversal.cc Avoid relying on an uninitialized 'struct stat'. Mar 9, 2011
stream_traversal.h Initial public release. Feb 15, 2011
summarize.cc Avoid needlessly recomputing end iterators. Mar 4, 2011
system.hh Initial public release. Feb 15, 2011
tarball_scannable.cc Initial public release. Feb 15, 2011
tarball_scannable.h Initial public release. Feb 15, 2011
tarball_scanner.cc Initial public release. Feb 15, 2011
tarball_scanner.h Initial public release. Feb 15, 2011
tarball_traversal.cc Initial public release. Feb 15, 2011
tarball_traversal.h Initial public release. Feb 15, 2011
traversal.cc Initial public release. Feb 15, 2011
traversal.h Initial public release. Feb 15, 2011
vscan-view Initial public release. Feb 15, 2011

README

% VSCAN README
% Michael Stone <mistone@akamai.com>
% February 1, 2011

`vscan` is a toolkit for making fast but crude measurements of the prevalence
of named textual features in algorithmically selected samples of large corpora.

It's useful in the same places as `find` and `grep`, but it's designed to yield
more informative reports, e.g., by letting you name the patterns that you're
searching for and by storing the resulting matches in a SQLite database for
later correlation with upload or modification logs.
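
To make the idea concrete, here is a rough sketch in Python of what "named
patterns plus a SQLite table of hits" buys you. It is not `vscan` code and it
does not reflect `vscan`'s real schema or configuration format: the pattern
names, the `hits` and `modlog` tables, their columns, and the sample files are
all assumptions made up for illustration.

```python
import re
import sqlite3

# Hypothetical named patterns (illustrative only; not taken from config.lua).
PATTERNS = {
    "js_eval_unescape": re.compile(r"eval\s*\(\s*unescape\s*\("),
    "md5_call": re.compile(r"\bMD5_(?:Init|Update|Final)\b"),
}


def scan(files):
    """Yield (path, pattern_name, position) for each match in a path -> text dict."""
    for path, text in files.items():
        for name, rx in PATTERNS.items():
            for match in rx.finditer(text):
                yield path, name, match.start()


def build_db(files, modlog):
    """Store hits and a modification log in an in-memory SQLite database.

    Both table layouts are assumptions for this sketch, not vscan's schema.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE hits (path TEXT, pattern TEXT, pos INTEGER)")
    db.execute("CREATE TABLE modlog (path TEXT, user TEXT, mtime TEXT)")
    db.executemany("INSERT INTO hits VALUES (?, ?, ?)", scan(files))
    db.executemany("INSERT INTO modlog VALUES (?, ?, ?)", modlog)
    return db


def hits_by_user(db):
    """Correlate hits with the modification log: hit counts per user and pattern."""
    return db.execute(
        "SELECT m.user, h.pattern, COUNT(*) "
        "FROM hits h JOIN modlog m ON m.path = h.path "
        "GROUP BY m.user, h.pattern "
        "ORDER BY m.user, h.pattern"
    ).fetchall()


# Made-up sample corpus and modification log.
files = {
    "pub/index.js": "var x = 1; eval(unescape('%61'));",
    "src/hash.c": "MD5_Init(&ctx); MD5_Update(&ctx, buf, n); MD5_Final(out, &ctx);",
    "src/ok.c": "int main(void) { return 0; }",
}
modlog = [
    ("pub/index.js", "ftpuser1", "2011-02-01"),
    ("src/hash.c", "dev2", "2011-01-15"),
    ("src/ok.c", "dev2", "2011-01-16"),
]

db = build_db(files, modlog)
rows = hits_by_user(db)
print(rows)
```

Because every hit carries the name of the pattern that produced it, the final
query can group by pattern and join against whatever log you have on hand,
which is awkward to do with the raw output of `find` and `grep`.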

So far, we've used it, with some success, for

  a) hunting for JavaScript malware in FTP-accessible file systems and for

  b) hunting for call-sites of deprecated cryptographic primitives in large
     collections of source code.

To install `vscan`, please follow the instructions in the `INSTALL` file
located alongside this `README` or check to see whether `vscan` is available
through your favorite package manager.

For information on how to use `vscan`, please see the overview and
command-specific documentation in the `docs/` subdirectory of the source code.
(Also, take a look at the `config.lua` file alongside this `README` -- it's got
some fun examples of nasty JavaScript patterns!)

Finally, please write if you have trouble getting `vscan` to work or if you've
done cool things with `vscan` that we might want to merge -- we'd love to hear
from you!