
Commit

Initial commit
mbrubeck committed Jan 13, 2010
0 parents commit a15eeb7
Showing 4 changed files with 175 additions and 0 deletions.
13 changes: 13 additions & 0 deletions OutlineGrep.cabal
@@ -0,0 +1,13 @@
Name: OutlineGrep
Version: 1.0
Description: Search through indentation-structured text files.
License: MIT
License-file: README.markdown
Author: Matt Brubeck
Maintainer: mbrubeck@limpet.net
Build-Type: Simple
Cabal-Version: >=1.2
Data-Files: compleat_setup
Executable ogrep
  Main-is: OutlineGrep.lhs
  Build-Depends: base >= 3 && < 5, haskell98, regex-posix
85 changes: 85 additions & 0 deletions OutlineGrep.lhs
@@ -0,0 +1,85 @@
#!/usr/bin/env runhaskell

This is a short program that acts like "grep," but treats its input as an
outline based on indentation and preserves this structure in its output.

> import Data.Char (isSpace)
> import System.Environment (getArgs)
> import System.IO
> import Text.Regex.Posix ((=~))

An outline has three parts: The first node, its children (another outline),
and the remaining nodes (also an outline). The outline with zero nodes is
called Empty. An outline is basically a list of trees.

> data Outline a = Empty | Outline a (Outline a) (Outline a)

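Because an outline is essentially a list of trees, it maps directly onto the Forest type from Data.Tree in the containers package. The sketch below is only an illustration under that assumption: toForest and sample are hypothetical names that are not part of ogrep, and the Outline declaration is repeated so the snippet can be loaded into GHCi on its own. Evaluating putStr (drawForest (toForest sample)) there draws the tree as ASCII art.

```haskell
import Data.Tree (Forest, Tree (..), drawForest)

-- Same shape as the Outline type above: the first node, its children,
-- and the remaining nodes.
data Outline a = Empty | Outline a (Outline a) (Outline a)

-- Hypothetical helper: a node's children become its subforest, and the
-- remaining nodes become the trees that follow it in the list.
toForest :: Outline a -> Forest a
toForest Empty = []
toForest (Outline root children rest) =
  Node root (toForest children) : toForest rest

-- A small outline: "a" with one child "b", followed by a sibling "c".
sample :: Outline String
sample = Outline "a" (Outline "b" Empty Empty) (Outline "c" Empty Empty)
```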
This helper function tells whether an outline is empty.

> empty :: Outline a -> Bool
> empty Empty = True
> empty _ = False

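We nest input lines into a tree according to the amount of leading whitespace
on each line. Every whitespace character counts as one column, so a tab
counts the same as a single space; mixing tabs and spaces can therefore
produce the wrong nesting.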

> indentLevel :: String -> Int
> indentLevel = length . takeWhile isSpace

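To make the tab caveat concrete: indentLevel gives a line indented with one tab a level of 1, so it counts as less indented than a line with two leading spaces, even though most editors display it further to the right. The hypothetical indentLevelTabs below shows one possible alternative that expands each tab to the next tab stop; the 8-column tab width is an assumption, and ogrep itself does not do this.

```haskell
import Data.Char (isSpace)

-- The program's rule: every leading whitespace character counts as one column.
indentLevel :: String -> Int
indentLevel = length . takeWhile isSpace

-- Hypothetical alternative (not used by ogrep): a tab advances to the next
-- tab stop, assuming tab stops every 8 columns; other whitespace counts as 1.
indentLevelTabs :: String -> Int
indentLevelTabs = go 0
  where
    go col ('\t' : cs) = go ((col `div` 8 + 1) * 8) cs
    go col (c : cs) | isSpace c = go (col + 1) cs
    go col _ = col
```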
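readNodes reads a series of sibling nodes, each indented at least as far as a
specified column, together with their children. It stops at the first line
indented less than that column and returns the nodes as an outline, along
with the remaining unread lines.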

> readNodes :: Int -> [String] -> (Outline String, [String])
> readNodes col [] = (Empty, [])
> readNodes col (x:xs) =
>   let n = indentLevel x in
>   if n < col then (Empty, x:xs)
>   else let (children, xs') = readNodes (n+1) xs
>            (rest, xs'') = readNodes col xs'
>        in ((Outline x children rest), xs'')

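As a worked example of readNodes, the sketch below repeats the definitions above and adds a hypothetical depths helper that lists each node with its depth in the tree (0 for top level). With the input lines "a", "   b", " c", "d", the line " c" becomes a sibling of "   b" rather than a child or a top-level node: it is indented less than b, but not less than the column (1) at which b and its siblings are read. So example evaluates to [(0,"a"),(1,"   b"),(1," c"),(0,"d")].

```haskell
import Data.Char (isSpace)

data Outline a = Empty | Outline a (Outline a) (Outline a)

indentLevel :: String -> Int
indentLevel = length . takeWhile isSpace

-- Copied from the program above.
readNodes :: Int -> [String] -> (Outline String, [String])
readNodes _ [] = (Empty, [])
readNodes col (x:xs) =
  let n = indentLevel x in
  if n < col then (Empty, x:xs)
  else let (children, xs') = readNodes (n+1) xs
           (rest, xs'')    = readNodes col xs'
       in (Outline x children rest, xs'')

-- Hypothetical helper: every node paired with its depth (0 = top level),
-- in the order the lines appeared.
depths :: Outline a -> [(Int, a)]
depths = go 0
  where
    go _ Empty = []
    go d (Outline root children rest) =
      (d, root) : go (d + 1) children ++ go d rest

example :: [(Int, String)]
example = depths (fst (readNodes 0 ["a", "   b", " c", "d"]))
```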
To read an outline from a file, we read all the top-level nodes.

> readOutline :: String -> Outline String
> readOutline = fst . readNodes 0 . lines

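To print an outline, we flatten it back into a list of lines (each node, then
its children, then the remaining nodes) and print each line. The lines still
carry their original leading whitespace, since readOutline never strips it,
so the indentation is preserved.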

> prettyPrint :: Outline String -> String
> prettyPrint = unlines . flatten
>
> flatten :: Outline a -> [a]
> flatten Empty = []
> flatten (Outline root children rest) =
>   root : flatten children ++ flatten rest

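Since readNodes consumes every line and flatten visits the nodes in the order they were read, reading and then printing an outline should reproduce its input exactly, provided the input ends with a newline (unlines always adds a final one). The sketch below states that property as a hypothetical roundTrips check; the other definitions are copied from above so it stands alone.

```haskell
import Data.Char (isSpace)

data Outline a = Empty | Outline a (Outline a) (Outline a)

indentLevel :: String -> Int
indentLevel = length . takeWhile isSpace

readNodes :: Int -> [String] -> (Outline String, [String])
readNodes _ [] = (Empty, [])
readNodes col (x:xs) =
  let n = indentLevel x in
  if n < col then (Empty, x:xs)
  else let (children, xs') = readNodes (n+1) xs
           (rest, xs'')    = readNodes col xs'
       in (Outline x children rest, xs'')

readOutline :: String -> Outline String
readOutline = fst . readNodes 0 . lines

flatten :: Outline a -> [a]
flatten Empty = []
flatten (Outline root children rest) =
  root : flatten children ++ flatten rest

prettyPrint :: Outline String -> String
prettyPrint = unlines . flatten

-- Hypothetical check: printing a freshly read outline gives back the input.
-- Holds for newline-terminated (or empty) input.
roundTrips :: String -> Bool
roundTrips s = prettyPrint (readOutline s) == s
```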
To prune an outline, we remove any subtree that contains no matching nodes.
This leaves the matching nodes and all their ancestors.

> prune :: (a -> Bool) -> Outline a -> Outline a
> prune p Empty = Empty
> prune p (Outline root children rest) =
>   let rest' = prune p rest
>       children' = prune p children in
>   if p root || not (empty children')
>     then (Outline root children' rest')
>     else rest'

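To try pruning without the regex-posix package, the sketch below swaps the =~ regex match used by main for a plain substring test (isInfixOf from Data.List). The input is modeled on the README's example, and grepOutline and sampleLines are hypothetical names. grepOutline "work" sampleLines keeps the two matching lines plus their ancestors and drops everything else, including "Visit with family.", which is only a sibling of a matching line.

```haskell
import Data.Char (isSpace)
import Data.List (isInfixOf)

data Outline a = Empty | Outline a (Outline a) (Outline a)

empty :: Outline a -> Bool
empty Empty = True
empty _ = False

indentLevel :: String -> Int
indentLevel = length . takeWhile isSpace

readNodes :: Int -> [String] -> (Outline String, [String])
readNodes _ [] = (Empty, [])
readNodes col (x:xs) =
  let n = indentLevel x in
  if n < col then (Empty, x:xs)
  else let (children, xs') = readNodes (n+1) xs
           (rest, xs'')    = readNodes col xs'
       in (Outline x children rest, xs'')

flatten :: Outline a -> [a]
flatten Empty = []
flatten (Outline root children rest) =
  root : flatten children ++ flatten rest

-- Keep a node if it matches or if any of its descendants survived pruning.
prune :: (a -> Bool) -> Outline a -> Outline a
prune _ Empty = Empty
prune p (Outline root children rest) =
  let rest'     = prune p rest
      children' = prune p children
  in if p root || not (empty children')
       then Outline root children' rest'
       else rest'

-- Hypothetical wrapper: substring match instead of a POSIX regex.
grepOutline :: String -> [String] -> [String]
grepOutline needle = flatten . prune (needle `isInfixOf`) . fst . readNodes 0

sampleLines :: [String]
sampleLines =
  [ "2009-01-01", "  New Year's Day!", "    No work today."
  , "    Visit with family.", "2009-01-02", "  Grocery store and library."
  , "2009-01-03", "  Stay home.", "2009-01-04", "  Back to work."
  , "  Remember to set an alarm."
  ]
```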
Our main program takes a regex as its first argument and reads an outline
from the file named by the second argument (or from standard input if no file
is given). It prunes the outline, keeping each line that matches the regex
along with its ancestors, and prints the result.

> main :: IO ()
> main = do
>   (pattern:fileNames) <- getArgs
>   handle <- input fileNames
>   s <- hGetContents handle
>   let o = readOutline s
>   let o' = prune (=~ pattern) o
>   putStr $ prettyPrint o'

Note: Only the first file name is used; any further arguments are ignored.
(I should fix this.)
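
One possible fix, sketched under assumptions: replace input with a hypothetical inputs function that returns the contents of every named file (or of standard input when none is named), then prune and print each file's outline separately, so that lines from one file can never be nested under a node from another. This is not part of the current program.

```haskell
import System.IO (hGetContents, stdin)

-- Hypothetical replacement for `input`: one String per input file,
-- or standard input's contents when no file names are given.
inputs :: [FilePath] -> IO [String]
inputs []        = fmap (: []) (hGetContents stdin)
inputs fileNames = mapM readFile fileNames

-- In main, each file could then be handled on its own, for example:
--   contents <- inputs fileNames
--   mapM_ (putStr . prettyPrint . prune (=~ pattern) . readOutline) contents
```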

> input :: [String] -> IO Handle
> input [] = return stdin
> input (fileName:_) = openFile fileName ReadMode
73 changes: 73 additions & 0 deletions README.markdown
@@ -0,0 +1,73 @@
Outline Grep
============

Given a text file with a Python-style indentation structure, `ogrep`
searches the file for a regular expression. It prints matching lines, with
their "parent" lines as context. For example, if input.txt looks like this:

    2009-01-01
      New Year's Day!
        No work today.
        Visit with family.
    2009-01-02
      Grocery store and library.
    2009-01-03
      Stay home.
    2009-01-04
      Back to work.
      Remember to set an alarm.

then `ogrep work input.txt` will produce the following output:

    2009-01-01
      New Year's Day!
        No work today.
    2009-01-04
      Back to work.

Installation
------------

Get the source code: `git clone git://github.com/mbrubeck/outline-grep.git; cd
outline-grep`

OS X or Windows users, download the [Haskell Platform][1]. (Mac OS X 10.6 may
require a [workaround][2] for 64-bit compatibility.) Then type `cabal
install`.

Debian/Ubuntu users, run: `sudo aptitude install libghc6-regex-posix-dev`,
then run:

    ./Setup.lhs configure
    ./Setup.lhs build
    sudo ./Setup.lhs install

Or to try the program without building and installing it, just run
`./OutlineGrep.lhs` in the source directory after installing the Haskell
Platform (or the libghc6-regex-posix-dev package).

Copyright
---------

Copyright (c) 2009 Matt Brubeck

Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following
conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
4 changes: 4 additions & 0 deletions Setup.lhs
@@ -0,0 +1,4 @@
#!/usr/bin/env runhaskell

> import Distribution.Simple
> main = defaultMain
