Commits on Dec 3, 2011
  1. @danlucraft

    Bump version

    danlucraft committed Dec 3, 2011
  2. @danlucraft
  3. @danlucraft

    Merge remote-tracking branch 'taw/master'

    danlucraft committed Dec 3, 2011
  4. @danlucraft

    Add a --lua option

    danlucraft committed Dec 3, 2011
  5. @danlucraft

    Fix up specs

    danlucraft committed Dec 3, 2011
  6. @danlucraft

    Merge pull request #2 from FooBarWidget/master

    Fixed Ruby 1.9 encoding problems
    danlucraft committed Dec 3, 2011
Commits on Sep 3, 2010
  1. @FooBarWidget
Commits on Jul 29, 2010
  1. @taw

    A few bugs caused by combinations of -m/-v/-c options fixed.

    Both these work in ack:
    * -v -c displayed incorrect results if multiple matches per line occurred.
    * -m -A/-C didn't display context after last match.
    * -v -c -m treated lines after last match as missing instead of non-matching.
    -m can be interpreted in two ways:
    * Lines after limit still match, just don't print them (ack interpretation)
    * Lines after limit just don't match (what I made rak do)
    I don't see any way to support any kind of --eval together with the ack interpretation.
    --eval 'break' means code after that should never execute.
    This only affects a few edge cases like:
    * matches in -A/-C context past -m limit
    * -v -c -m counts
    * -c -m counts
    I'm not convinced -c -m is such a sensible combination.
    -A/-C -m is rather sensible, but neither way seems to me
    to be obviously wrong (except with --eval).
    taw committed Jul 29, 2010
  2. @taw

    Finishing spec cleanup.

    As we never leave ANSI escape codes untouched,
    and we almost always want indentation, make rak()
    do both by default. Now specs really look pretty good.
    Also replaced split("\n") by String#lines (which keeps final \n) -
    makes code somewhat cleaner in a few places.
    taw committed Jul 29, 2010
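A small illustration of the split("\n") versus String#lines difference mentioned in the commit above. The sample text is made up for this sketch; it only shows that lines keeps the trailing newline on each element, so joining the pieces gives back the original string.

```ruby
# Sample output text, invented for this illustration.
text = "1| foo\n2| bar\n"

split_lines = text.split("\n")  # newlines are dropped
kept_lines  = text.lines.to_a   # each element keeps its final "\n"

p split_lines
p kept_lines
p kept_lines.join == text       # nothing was lost, so joining round-trips
```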
  3. @taw

    Big spec cleanup.

    * Extra test added checking that all supported options are mentioned in help message.
    * To make test readable, <<-END.unindent(6) + properly indented text everywhere.
    * rak() command to do suitable ansi stripping itself.
    * RSpec supports syntax like:
        foo.should include bar
      which seems more readable than:
        foo.include?(bar).should be_true
    This isn't how I want the spec to look; I'm just committing because
    it's a good cleanup milestone.
    taw committed Jul 29, 2010
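The <<-END.unindent(6) idiom from the commit above can be sketched roughly as follows. The commit does not show how unindent is defined, so this is a hypothetical stand-in written as a plain function (unindent(text, n)) rather than a String method; it strips up to n leading spaces from every line.

```ruby
# Hypothetical helper: remove up to n leading spaces from each line,
# so heredocs can be indented along with the surrounding spec code.
def unindent(text, n)
  text.gsub(/^ {0,#{n}}/, "")
end

expected = unindent(<<-END, 6)
      1| foo
      2| bar
END

puts expected
```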
  4. @taw
Commits on Jul 27, 2010
  1. @taw

    Merge remote branch 'danlucraft/master'

    Conflicts (trivial):
    taw committed Jul 27, 2010
  2. @taw

    First attempt at making --eval interface for filtering with arbitrary…

    … Ruby code.
    It's sort of like -nle/-ple. It's controlled mainly with next.
    Simple examples (if rak_spec.rb is too unreadable):
    Print lines 15..25 (only line 20 is highlighted)
      rak -C5 --eval 'next unless $. == 20'
    Grep numbers.
    Except skip every section demarcated by =begin and =end.
    (skipped lines could still show up as context)
      rak --eval 'next if $_[/^=begin/]..$_[/^=end/]; next unless $_ =~ /\d+/'
    Look for fails but only in first 1000 lines per file.
      rak --eval 'break if $. >= 1000; next unless $_ =~ /\bfail/i'
    This next/break would be quite straightforward except we don't
    actually want to next/break File#each_line's loop - we just want
    to go to no-match case once (for next) or for the rest of file
    (for break).
    We should probably really break once after-context clears,
    for the sake of performance.
    (By the way -m in rak actually breaks, and by doing so it doesn't
    print context requested. It works in ack, but it keeps highlighting -
    I'm not sure if this is right)
    This --eval doesn't have an elegant solution for multiple matches.
    $_.scan(/x/) { matches << $~ } is the best it can do.
    It really should be something simple like /x/g.
    This interface wasn't really all that well thought out, I just hacked
    it together and it seems to work for now.
    Full --eval API:
    * $_ is current line. $. is current line number. $~ is nil.
    * next means no match
    * break means no match for this line and for the rest of file
    * finishing without next or break means a match
    * if you put something in matches, we're done
      (horrible things will happen if these don't correspond to current state of $_)
    * otherwise if $~ is not-nil - it's considered a match
      (likewise, horrible things will happen if $~ is not against current $_)
    * otherwise, the entire line is considered a single match.
    taw committed Jul 27, 2010
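One way the next/break semantics described above could be implemented is by generating code that wraps the user snippet in a while loop, so that next and break never reach File#each_line's loop. This is only a sketch under assumptions: eval_outcome is a hypothetical helper, not rak's actual code generator, and the snippet sees the current line as a local variable named line rather than as $_.

```ruby
# Hypothetical helper: classify one line as :match, :next or :break.
def eval_outcome(snippet, line)
  code = <<-RUBY
    outcome = nil
    ran = false
    while true
      if ran            # we only come back here if the snippet called next
        outcome = :next
        break
      end
      ran = true
      #{snippet}
      outcome = :match  # the snippet finished without next or break
      break
    end
    outcome || :break   # loop left with nothing recorded: snippet called break
  RUBY
  eval(code)            # runs in this method's binding, so `line` is visible
end

p eval_outcome('next unless line =~ /\d+/', "no digits here")
p eval_outcome('next unless line =~ /\d+/', "line 42")
p eval_outcome('break if line.start_with?("__END__")', "__END__")
```

A caller would treat :next as "no match for this line" and :break as "no match for this line and every later line in the file", while still reading on if trailing context has to be printed.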
Commits on Jul 25, 2010
  1. @taw

    Code generation cleanup.

    I was mostly trying to figure out why rak is slower than ack.
    Turns out it's mostly Ruby i/o being slower, and these tiny changes
    have nearly no measurable effect. Oh well.
    taw committed Jul 25, 2010
  2. @taw

    Use Pathname, not String for pathnames.

    Dir["#{path}/*"] will fail if path contains any unusual characters,
    like {}s from firefox extension ids for example.
    Now it works and is even marginally nicer.
    taw committed Jul 25, 2010
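A minimal sketch of the problem described in the commit above. The directory name is invented to resemble an extension id, and list_entries is a hypothetical helper; the point is only that Pathname#children lists a directory without interpreting its name as a glob pattern.

```ruby
require "pathname"
require "tmpdir"

# Hypothetical helper: list a directory without any glob interpretation.
def list_entries(dir)
  Pathname(dir).children.sort
end

Dir.mktmpdir do |tmp|
  odd = File.join(tmp, "{ec8030f7}")  # braces in a real directory name
  Dir.mkdir(odd)
  File.write(File.join(odd, "install.rdf"), "")

  globbed = Dir["#{odd}/*"]           # braces are glob syntax here, so this
                                      # is expected to miss the directory
  listed  = list_entries(odd)
  puts "glob found #{globbed.size} entries, Pathname found #{listed.size}"
end
```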
  3. @taw

    Option to print skipped files.

    Weird file extensions happen all the time, and it's often difficult
    to figure out if rak doesn't find a match because it's not there,
    or because it doesn't recognize some extension. --skipped will
    provide a quick way to check that.
    Related - added some extra extensions for SML.
    taw committed Jul 25, 2010
Commits on Jul 23, 2010
  1. @taw

    More source file types.

    I merged three lists:
    * One from rak
    * One from ack (except special types like --text/--binary/--skipped)
    * One I made up for a script for grepping my own code repository some time ago
      (only those types that seem likely to be useful more generally)
    In case you're wondering .l / .y / .mll / .mly are lex/yacc for C/ocaml.
    taw committed Jul 23, 2010
  2. @taw
  3. @danlucraft

    Fix help

    danlucraft committed Jul 23, 2010
  4. @taw
  5. @taw

    Major refactoring of file finding part. It was all so entangled that …

    …it wasn't at all
    clear what it was doing, and it turned out it was doing many dubious things.
    Some decisions the current version makes could be argued with too, but it should at least
    be a lot clearer what it does.
    * Except when sorting, files are yielded to matching or printing
      immediately. This makes a huge difference in cases like grepping server
      logs over sshfs - something I do pretty much every day.
    * File matches .x only if it ends with .x.
      Old rak treated all .css files as .c files.
      (also take a look at extension_regexp method)
    * Always check shebang for type if unknown, even with -a.
      Otherwise -a can actually cause fewer files to match than not
      using -a, and cause other weirdness. For example
      --perl matched #!perl scripts, but -a --perl did not, leading to
    * VCS check is only performed on descend. If you explicitly
      ask for .svn - or for something inside .svn - you should get it.
    * Regexp vs Oniguruma::ORegexp stuff mostly moved aside to separate
    * Somehow this didn't make any spec fail. You should recheck that
      it does what it's supposed to anyway. Big changes that don't cause
      test failures are suspicious ;-)
    I think this closes list of issues I had with rak.
    taw committed Jul 23, 2010
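The .css versus .c point above comes down to anchoring the extension at the end of the file name. The commit mentions an extension_regexp method but does not show it, so the version below is a guess at its spirit, not the real implementation.

```ruby
# Hypothetical reimplementation: match only names that END with one of
# the given extensions, escaping any regexp metacharacters in them.
def extension_regexp(extensions)
  alternatives = extensions.map { |ext| Regexp.escape(ext) }.join("|")
  /\.(?:#{alternatives})\z/
end

c_files = extension_regexp(%w[c h])

p "main.c"    =~ c_files   # matches at the final ".c"
p "style.css" =~ c_files   # nil: ".c" is not at the end of the name
p "style.css" =~ /\.c/     # an unanchored pattern wrongly matches
```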
Commits on Jul 22, 2010
  1. @taw

    I want to make rak able to match files immediately once found,

    not with a huge delay.
    Code related to that is now very messy, so first a few batches of refactoring.
    Some comments:
    * You cannot have both :do_search and :print_filelist be true
      It's possible that neither is true (help messages etc.) but then
      we instantly quit. Using that, a lot of cleanup.
    * xs.each{|x| puts x}
      is exactly what puts does on collections already
      puts xs
    * Testing if(x == false) is usually a bad idea. A lot of operations
      don't really bother being consistent about returning false vs nil,
      or true vs anything-but-false-and-nil. Better test if(!x) or unless(x).
    * huge cascades of nested if/elses can usually be simplified a lot.
    * when doing open(file).readline there's not much reason to rescue
      out of readline's problems but not of open's
      By the way ruby files don't need closing - gc handles them really well,
      If you write to file then open(fn,'w'){|f| ... } ensures close
      on block exit, and so you know file state. For just reading,
      open(fn).read_something works pretty much as well.
      (whichever is cleaner)
    taw committed Jul 22, 2010
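Two of the points above, shown on made-up values: a failed regexp match returns nil rather than false, so an == false test silently misses it, and puts already prints each element of a collection on its own line.

```ruby
result = "ruby" =~ /x/                  # no match: the result is nil, not false

missed_by_equality = (result == false)  # false, so the failure goes unnoticed
caught_by_negation = !result            # true, works for both nil and false

p missed_by_equality
p caught_by_negation

xs = ["a.rb", "b.rb"]
puts xs                                 # same output as xs.each { |x| puts x }
```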
  2. @taw

    Not only is all != re not a valid shortcut, it also slows rak down a …

    and I cannot think of any case where it actually improves performance.
    For piping in real time - rak just couldn't do that.
    For huge files, rak would run out of memory instead of just keeping a few lines at a time.
    Or if we only cared about the first N - then just reading so much more data in
    was pointless.
    Not to mention regexps that had exponential backtracking - they are totally fine
    if matched against one line at a time, but a huge file?
    Anyway, now it's faster, correct, and can handle streams.
    taw committed Jul 22, 2010
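A rough sketch of the line-at-a-time approach argued for above, assuming a hypothetical first_matches helper. It reads from any IO-like object, keeps only the current line in memory, and stops reading as soon as enough matches are found, so it also works on pipes.

```ruby
require "stringio"

# Hypothetical helper: collect [line number, text] for the first
# `limit` matching lines, reading no further than necessary.
def first_matches(io, regexp, limit)
  found = []
  io.each_line.with_index(1) do |line, number|
    found << [number, line.chomp] if line =~ regexp
    break if found.size >= limit
  end
  found
end

log = StringIO.new("ok\nfail one\nok\nFAIL two\nfail three\n")
p first_matches(log, /\bfail/i, 2)
```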
  3. @taw
  4. @taw

    Got rid of extra \n on the end.

    Also made -v use the same separator newlines as normal match.
    In both cases new behaviour is exactly what ack does.
    taw committed Jul 22, 2010
  5. @taw

    With tests showing failures this time.

    You need to parenthesize a regexp before prepending or appending
    ^$\b etc.
    $ echo "ruby" | rak_before -s 'x|y'
    $ echo "ruby" | rak_after -s 'x|y'
    Now this solves -s/-e/-x, but with -w it's a more complicated
    I started cross-checking these things against other
    regexp tools and failures are everywhere.
    Bug fixed in this patch, ack also has it
      $ echo 123 | egrep -w "1|3"
      $ echo 123 | pcregrep -w "1|3"
      $ echo 123 | ack -w "1|3"
      $ echo 123 | rak_before -w "1|3"
      $ echo 123 | rak_after -w "1|3"
    ack also has another bug on its own
    it skips \b if *regexp* starts/ends with non-\w
    That is it thinks -w of /12?/ is /\b12?/
      $ echo 123 | egrep -w "12?"
      $ echo 123 | pcregrep -w "12?"
      $ echo 123 | ack -w "12?"
      $ echo 123 | rak_before -w "12?"
      $ echo 123 | rak_after -w "12?"
    So -w definitely doesn't mean "one word"
      $ echo "0 0" | egrep -w "0.0"
      0 0
      $ echo "0 0" | pcregrep -w "0.0"
      0 0
      $ echo "0 0" | ack -w "0.0"
      0 0
      $ echo "0 0" | rak_before -w "0.0"
      1|0 0
      $ echo "0 0" | rak_after -w "0.0"
      1|0 0
    ack/rak say '' is -w , egrep/pcregrep say '' is not -w
      $ echo 123 | egrep -w "1?"
      $ echo 123 | pcregrep -w "1?"
      $ echo 123 | ack -w "1?"
      $ echo 123 | rak_before -w "1?"
      $ echo 123 | rak_after -w "1?"
    Similar, except ack matches 1 due to bug, rak matches ''
      $ echo "1xx" | egrep -w "^\d*"
      $ echo "1xx" | pcregrep -w "^\d*"
      $ echo "1xx" | ack -w "^\d*"
      $ echo "1xx" | rak_before -w "^\d*"
      $ echo "1xx" | rak_after -w "^\d*"
    And this is just ridiculous. Now egrep matches '', but rak doesn't?
      $ echo " 1xx" | egrep -w "^\d*"
      $ echo " 1xx" | pcregrep -w "^\d*"
      $ echo " 1xx" | ack -w "^\d*"
      $ echo " 1xx" | rak_before -w "^\d*"
      $ echo " 1xx" | rak_after -w "^\d*"
    And to make confusion complete:
      $ echo "1xx" | egrep "\b^\d*\b"
      $ echo "1xx" | pcregrep "\b^\d*\b"
      $ echo "1xx" | ack "\b^\d*\b"
      $ echo "1xx" | rak_before "\b^\d*\b"
      $ echo "1xx" | rak_after "\b^\d*\b"
    By the way /\b./ matches "1" but not " " in perl/ruby/egrep
    regexps - start and end of string are treated as non-words.
    (also \b^ = ^\b - they are zero length so their order doesn't matter)
    If you have more clue than me, do tell.
    taw committed Jul 22, 2010
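The parenthesizing rule at the top of the commit above can be sketched like this. wrap is a hypothetical helper, not rak's code; it puts the user pattern in a non-capturing group before adding a prefix or suffix, so alternation inside the pattern cannot escape the anchors.

```ruby
# Hypothetical helper: group the pattern before adding anchors.
def wrap(pattern, prefix = "", suffix = "")
  Regexp.new("#{prefix}(?:#{pattern})#{suffix}")
end

naive_start = Regexp.new("^" + "x|y")           # parsed as (^x)|(y)
fixed_start = wrap("x|y", "^")                  # parsed as ^(x|y)

naive_word = Regexp.new("\\b" + "1|3" + "\\b")  # parsed as (\b1)|(3\b)
fixed_word = wrap("1|3", "\\b", "\\b")          # parsed as \b(1|3)\b

p naive_start =~ "ruby"   # matches the "y" in the middle of the line
p fixed_start =~ "ruby"   # nil
p naive_word  =~ "123"    # matches the leading "1"
p fixed_word  =~ "123"    # nil
```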
  6. @danlucraft

    Update version to 1.1

    danlucraft committed Jul 22, 2010
  7. @taw

    All specs now pass.

    * Pathname from Ruby stdlib is an instant solution to the huge mess of File.blahs/Dir.blehs
    * Using supposedly correct way of finding Ruby binary (supports ruby1.9, jruby, ruby.exe etc.).
    * All specs standardized on the behaviour that a group like:
      1| line
      2| line
      is always followed by empty line.
      I don't think the last one should - newline should be separator not terminator,
      but this is another issue.
    * Reason #1 why Ruby is awesome for Unix scripting is that it makes it so easy
      to totally avoid manipulating global state. Changing directory in any
      way other than Dir.chdir(dir){ ... } or exactly once on start/exit is just so shell...
      Current directory is global state of the worst kind - affecting
      almost every i/o operation. Never do that.
    * ENV isn't as bad as chdir, but it's far better to encapsulate it.
      Here oddly Perl pwns Ruby with its local $ENV{A}='B' that restores
      either old value or its absence on exit.
      Ruby code for that would be something horrible like:
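      def with_changed_env(key, *args)
        existed, old = ENV.has_key?(key), ENV[key]
        begin
          # No value given means "unset the variable for the block".
          if args.empty? then ENV.delete(key) else ENV[key] = args[0] end
          yield
        ensure
          # Restore the old value, or its absence.
          if existed then ENV[key] = old else ENV.delete(key) end
        end
      end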
      But we know RAK_TEST doesn't exist when started, so it's simplified.
    Anyway, there is no global mutable state left now,
    and all specs pass.
    taw committed Jul 22, 2010
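For the Dir.chdir(dir){ ... } form recommended above, a minimal sketch with a hypothetical entries_in helper: the directory change is confined to the block, and the previous working directory is restored when the block exits, even on an exception.

```ruby
require "tmpdir"

# Hypothetical helper: list a directory's entries by name,
# changing directory only for the duration of the block.
def entries_in(dir)
  Dir.chdir(dir) { Dir.glob("*").sort }
end

Dir.mktmpdir do |tmp|
  File.write(File.join(tmp, "a.txt"), "")
  before = Dir.pwd
  p entries_in(tmp)
  p Dir.pwd == before   # the caller's working directory is untouched
end
```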
  8. @taw
  9. @danlucraft
  10. @danlucraft
  11. @danlucraft

    Clean up tests

    danlucraft committed Jul 22, 2010
  12. @taw

    Fixed XML files autodetection.

    Ruby doesn't support /\Q<?xml/ from Perl's regexps.
    To get \Q effect you need to backslash manually /<\?xml/,
    call Regexp.escape("<?xml"), or just ignore regexps
    and do something like line["<?xml"].
    taw committed Jul 22, 2010
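The three alternatives to Perl's \Q listed above, side by side. The sample lines and the xml_checks helper are made up for this sketch; all three checks should agree on whether a line contains the literal text "<?xml".

```ruby
# Hypothetical helper returning the result of each approach as a boolean.
def xml_checks(line)
  [
    !!(line =~ /<\?xml/),                              # manual backslash
    !!(line =~ Regexp.new(Regexp.escape("<?xml"))),    # Regexp.escape
    !!line["<?xml"],                                   # plain substring lookup
  ]
end

p xml_checks(%(<?xml version="1.0"?>))
p xml_checks("<html>")
```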
  13. @taw

    It's a big interconnected patch, but these issues are related.

    To find multiple matches you cannot cut string after first match,
    and match against the rest. If you do that /^x/ will match 'xxx'
    three times. The first match is done correctly, so same lines get printed,
    but highlighting and --output will be broken.
    This also led to infinite loops if regexp could match empty string
    (eg. matching empty lines with /^$/, (?=) look-ahead hackery, or common mistakes)
    (pcregrep seems to have the same bug)
    If last match also matches final \n, puts'ing its post_match will
    produce double \ns in output.
    If matched line contained any tabs (like far too much code),
    then prefixing it with anything will just break it.
    Now ignoring this issue entirely seems to be the standard practice.
    I'd still prefer to fix it, as workarounds are really hard:
    * there's no way to pre-expand tabs before rak if more than one file is searched
    * piping to tab expansion disables highlighting
    * Even with highlighting forced, results of tab expansion on output
      would still be completely wrong because of line prefixes.
      (not to mention added difficulty of expanding tabs in
      presence of unprintable escape codes)
    * expanding tabs in original files would be a good idea,
      except when it's someone else's code repository ;-)
    So it has to be either done in rak, or stay broken.
    Expanding tabs is usually simple (String#expand_tabs).
    There is however a big complication:
    * We cannot expand tabs before matching, as that would change
      what matches and what doesn't.
    * We cannot expand tabs between matching and highlighting,
      as match indexes refer to old string, and completely
      incorrect things would get highlighted.
    * We cannot expand between highlighting and printing,
      unless we teach String#expand_tabs which characters
      are unprintable highlighting codes.
    We can cut string into pieces on match boundaries and expand these separately.
    Recursive formula is only slightly nontrivial, as we need to count offsets
    after not before expansion:
        ax = a.expand_tabs(i)
        bx = b.expand_tabs(i+ax.size)
    It's not terribly pretty but it works just fine.
    taw committed Jul 22, 2010
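The piecewise tab expansion described above can be sketched as follows. String#expand_tabs is not part of core Ruby and the commit does not show its definition, so expand_tabs and expand_pieces here are hypothetical stand-ins, and the tab width of 8 is an assumption. Each piece is expanded starting from the column where the previous expanded piece ended, which is the "count offsets after, not before, expansion" rule.

```ruby
# Hypothetical helper: expand tabs in text that starts at start_column.
def expand_tabs(text, start_column = 0, tab_width = 8)
  column = start_column
  text.each_char.map do |char|
    if char == "\t"
      pad = tab_width - (column % tab_width)  # spaces up to the next tab stop
      column += pad
      " " * pad
    else
      column += 1
      char
    end
  end.join
end

# Expand pieces (text cut on match boundaries) one by one, carrying the
# column forward using the size of each piece AFTER expansion.
def expand_pieces(pieces, tab_width = 8)
  column = 0
  pieces.map do |piece|
    expanded = expand_tabs(piece, column, tab_width)
    column += expanded.size
    expanded
  end
end

pieces = ["a\t", "b", "\tc"]   # before match, match, after match
p expand_pieces(pieces)
p expand_pieces(pieces).join == expand_tabs(pieces.join)
```

Because the pieces stay separate, highlighting codes can be wrapped around the middle piece after expansion without disturbing the column arithmetic.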
  14. @taw

    Improved shebang matches:

    * Accept any numbers after executable name (ruby1.9, python3.0 etc.)
    * Added /sh/ for shell and /make/ for Makefile
    * Made it case insensitive ( uses that...)
    * If something has shebang, it's almost certainly code, so it should be searched by default.
    taw committed Jul 22, 2010
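A possible shape for the shebang rules listed above, as a sketch only. The SHEBANG_TYPES table and shebang_type are hypothetical and much smaller than whatever rak really uses; they show an interpreter name followed by optional version digits, matched case-insensitively.

```ruby
# Hypothetical table: file type => interpreter name pattern.
SHEBANG_TYPES = {
  ruby:   "ruby",
  python: "python",
  shell:  "[a-z]*sh",   # sh, bash, zsh, ...
  make:   "make",
}

# Return the detected type for a file's first line, or nil.
def shebang_type(first_line)
  return nil unless first_line.start_with?("#!")
  SHEBANG_TYPES.each do |type, name|
    # Name must follow a "/" or whitespace and may carry a version suffix.
    return type if first_line =~ %r{(?:/|\s)#{name}[\d.]*(?:\s|\z)}i
  end
  nil
end

p shebang_type("#!/usr/bin/env ruby1.9")
p shebang_type("#!/usr/bin/python3.0")
p shebang_type("#!/bin/bash")
p shebang_type("puts 1")
```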