Commit
0 parents
commit a15eeb7
Showing
4 changed files
with
175 additions
and
0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
Name: OutlineGrep | ||
Version: 1.0 | ||
Description: Search through indentation-structured text files. | ||
License: MIT | ||
License-file: README.markdown | ||
Author: Matt Brubeck | ||
Maintainer: mbrubeck@limpet.net | ||
Build-Type: Simple | ||
Cabal-Version: >=1.2 | ||
Data-Files: compleat_setup | ||
Executable ogrep | ||
  Main-is: OutlineGrep.lhs | ||
  Build-Depends: base >= 3 && < 5, haskell98, regex-posix |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,85 @@ | ||
#!/usr/bin/env runhaskell | ||
|
||
This is a short program that acts like "grep," but treats its input as an | ||
outline based on indentation, and preserves this structure in its output. | ||
|
||
> import Data.Char | ||
> import IO | ||
> import System | ||
> import Text.Regex.Posix | ||
|
||
An outline has three parts: The first node, its children (another outline), | ||
and the remaining nodes (also an outline). The outline with zero nodes is | ||
called Empty. An outline is basically a list of trees. | ||
|
||
> data Outline a = Empty | Outline a (Outline a) (Outline a) | ||
|
||
This helper function tells if an outline is empty. | ||
|
||
> empty Empty = True | ||
> empty _ = False | ||
|
||
We group input lines into nodes by the amount of leading whitespace. A tab | ||
counts the same as a space, so you may not want to mix tabs and spaces. | ||
|
||
> indentLevel :: String -> Int | ||
> indentLevel = length . takeWhile isSpace | ||
|
||
readNodes reads a series of nodes starting at a specified column. It returns | ||
the nodes as an outline, along with any remaining lines. | ||
|
||
> readNodes :: Int -> [String] -> (Outline String, [String]) | ||
> readNodes col [] = (Empty, []) | ||
> readNodes col (x:xs) = | ||
>     let n = indentLevel x in | ||
>     if n < col then (Empty, x:xs) | ||
>     else let (children, xs') = readNodes (n+1) xs | ||
>              (rest, xs'') = readNodes col xs' | ||
>          in ((Outline x children rest), xs'') | ||
|
||
To read an outline from a file, we read all the top-level nodes. | ||
|
||
> readOutline :: String -> Outline String | ||
> readOutline = fst . readNodes 0 . lines | ||
|
||
To print an outline, we just turn it into a list and print each line. | ||
(Indentation is preserved by the readOutline function.) | ||
|
||
> prettyPrint :: Outline String -> String | ||
> prettyPrint = unlines . flatten | ||
> | ||
> flatten :: Outline a -> [a] | ||
> flatten Empty = [] | ||
> flatten (Outline root children rest) = | ||
>     root : flatten children ++ flatten rest | ||
|
||
To prune an outline, we remove any subtree that contains no matching nodes. | ||
This leaves the matching nodes and all their ancestors. | ||
|
||
> prune :: (a -> Bool) -> Outline a -> Outline a | ||
> prune p Empty = Empty | ||
> prune p (Outline root children rest) = | ||
>     let rest' = prune p rest | ||
>         children' = prune p children in | ||
>     if p root || not (empty children') | ||
>         then (Outline root children' rest') | ||
>         else rest' | ||
|
||
Our main program takes a regex as its first argument, and reads an outline | ||
from the file named by the second argument (default stdin). It prunes the | ||
outline using the regex, and prints the result. | ||
|
||
> main = do | ||
>     (pattern:fileNames) <- getArgs | ||
>     handle <- input fileNames | ||
>     s <- hGetContents handle | ||
>     let o = readOutline s | ||
>     let o' = prune (=~ pattern) o | ||
>     putStr $ prettyPrint o' | ||
|
||
Note: Only the first filename is used; any remaining arguments are ignored. | ||
(I should fix this.) | ||
|
||
> input :: [String] -> IO Handle | ||
> input [] = return stdin | ||
> input args = openFile (head args) ReadMode |
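
The read, prune, and flatten steps described in the file above can be tried without the regex or Haskell 98 dependencies. The following is a minimal sketch, not the committed program: it restates the outline functions with explicit modern imports, and it makes two assumptions that are not in the commit. `ogrepLines` is a hypothetical helper that uses a plain substring test (`isInfixOf`) in place of `=~`, and `sample` uses an assumed two-space indentation for the README's example input.

```haskell
import Data.Char (isSpace)
import Data.List (isInfixOf)

-- An outline: first node, its children, and the remaining sibling nodes.
data Outline a = Empty | Outline a (Outline a) (Outline a)

isEmpty :: Outline a -> Bool
isEmpty Empty = True
isEmpty _     = False

-- Leading whitespace count; a tab counts the same as a space.
indentLevel :: String -> Int
indentLevel = length . takeWhile isSpace

-- Read nodes indented to at least `col`; return them plus the unread lines.
readNodes :: Int -> [String] -> (Outline String, [String])
readNodes _ [] = (Empty, [])
readNodes col (x:xs)
  | n < col   = (Empty, x:xs)
  | otherwise = (Outline x children rest, xs'')
  where
    n               = indentLevel x
    (children, xs') = readNodes (n + 1) xs
    (rest, xs'')    = readNodes col xs'

-- Depth-first listing: a node, then its children, then its siblings.
flatten :: Outline a -> [a]
flatten Empty = []
flatten (Outline root children rest) =
  root : flatten children ++ flatten rest

-- Keep a node if it matches or if any descendant survives pruning.
prune :: (a -> Bool) -> Outline a -> Outline a
prune _ Empty = Empty
prune p (Outline root children rest)
  | p root || not (isEmpty children') = Outline root children' rest'
  | otherwise                         = rest'
  where
    children' = prune p children
    rest'     = prune p rest

-- Hypothetical helper (assumption): substring match instead of a regex.
ogrepLines :: String -> [String] -> [String]
ogrepLines needle = flatten . prune (needle `isInfixOf`) . fst . readNodes 0

-- The README's example input, with an assumed two-space indentation.
sample :: [String]
sample =
  [ "2009-01-01"
  , "  New Year's Day!"
  , "    No work today."
  , "    Visit with family."
  , "2009-01-02"
  , "  Grocery store and library."
  , "2009-01-03"
  , "  Stay home."
  , "2009-01-04"
  , "  Back to work."
  , "    Remember to set an alarm."
  ]

main :: IO ()
main = putStr (unlines (ogrepLines "work" sample))
```

Because each node stores its original line, indentation included, the surviving lines print exactly as they appeared in the input. Swapping `isInfixOf` back for `(=~ pattern)` from `Text.Regex.Posix` should restore the regex behaviour of the committed program.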
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
Outline Grep | ||
============ | ||
|
||
Given a text file with a Python-style indentation structure, `ogrep` | ||
searches the file for a regular expression. It prints matching lines, with | ||
their "parent" lines as context. For example, if input.txt looks like this: | ||
|
||
    2009-01-01 | ||
      New Year's Day! | ||
        No work today. | ||
        Visit with family. | ||
    2009-01-02 | ||
      Grocery store and library. | ||
    2009-01-03 | ||
      Stay home. | ||
    2009-01-04 | ||
      Back to work. | ||
        Remember to set an alarm. | ||
|
||
then `ogrep work input.txt` will produce the following output: | ||
|
||
    2009-01-01 | ||
      New Year's Day! | ||
        No work today. | ||
    2009-01-04 | ||
      Back to work. | ||
|
||
Installation | ||
------------ | ||
|
||
Get the source code: `git clone git://github.com/mbrubeck/outline-grep.git; cd | ||
outline-grep` | ||
|
||
OS X or Windows users, download the [Haskell Platform][1]. (Mac OS X 10.6 may | ||
require a [workaround][2] for 64-bit compatibility.) Then type `cabal | ||
install`. | ||
|
||
Debian/Ubuntu users, run: `sudo aptitude install libghc6-regex-posix-dev`, | ||
then run: | ||
|
||
    ./Setup.lhs configure | ||
    ./Setup.lhs build | ||
    sudo ./Setup.lhs install | ||
|
||
Or to try the program without building and installing it, just run | ||
`./OutlineGrep.lhs` in the source directory after installing the Haskell | ||
Platform (or the libghc6-regex-posix-dev package). | ||
|
||
Copyright | ||
--------- | ||
|
||
Copyright (c) 2009 Matt Brubeck | ||
|
||
Permission is hereby granted, free of charge, to any person | ||
obtaining a copy of this software and associated documentation | ||
files (the "Software"), to deal in the Software without | ||
restriction, including without limitation the rights to use, | ||
copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
copies of the Software, and to permit persons to whom the | ||
Software is furnished to do so, subject to the following | ||
conditions: | ||
|
||
The above copyright notice and this permission notice shall be | ||
included in all copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, | ||
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES | ||
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND | ||
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT | ||
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, | ||
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING | ||
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR | ||
OTHER DEALINGS IN THE SOFTWARE. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
#!/usr/bin/env runhaskell | ||
|
||
> import Distribution.Simple | ||
> main = defaultMain |