1 parent f854f17 commit 4fa335784771dc179a9b5095175f4bc02b57ab43 @ozataman committed Apr 6, 2012
Showing with 39 additions and 55 deletions.
  1. +27 −50 README.markdown
  2. +12 −5 csv-conduit.cabal
@@ -20,29 +20,23 @@ This library is an attempt to close these gaps.
## This package
-csv-enumerator is an enumerator-based CSV parsing library that is easy to use,
-flexible and fast. Furthermore, it provides ways to use constant-space during
-operation, which is absolutely critical in many real world use cases.
+csv-conduit is a conduits based CSV parsing library that is easy to
+use, flexible and fast. Furthermore, it provides ways to use
+constant-space during operation, which is absolutely critical in many
+real world use cases.
### Introduction
-* ByteStrings are used for everything
+* The CSVeable typeclass implements the key operations.
+* CSVeable is parameterized on both a stream type and a target CSV row type.
* There are 2 basic row types and they implement *exactly* the same operations,
so you can chose the right one for the job at hand:
- - type MapRow = Map ByteString ByteString
- - type Row = [ByteString]
-* Folding over a CSV file can be thought of as the most basic operation.
-* Higher level convenience functions are provided to "map" over CSV files,
- modifying and transforming them along the way.
-* Helpers are provided for simple input/output of CSV files for simple use
- cases.
-* For extreme / advanced use cases, the user can drop down to the
- Enumerator/Iteratee level and do interleaved IO among other things.
-### API Docs
-The API is quite well documented and I would encourage you to keep it handy.
+ - type MapRow t = Map t t
+ - type Row t = [t]
+* You basically use the Conduits defined in this library to do the
+ parsing from a CSV stream and rendering back into a CSV stream.
+* Use the full flexibility and modularity of conduits for sources and sinks.
### Speed
@@ -57,42 +51,25 @@ regressions or optimization opportunities.
{-# LANGUAGE OverloadedStrings #-}
- import Data.CSV.Enumerator
- import Data.Char (isSpace)
- import qualified Data.Map as M
- import Data.Map ((!))
- -- Naive whitespace stripper
- strip = reverse . B.dropWhile isSpace . reverse . B.dropWhile isSpace
+ import Data.Conduit.Text
+ import Data.Conduit.Binary
+ import Data.Conduit
+ import Data.CSV.Conduit
+ -- Let's simply stream from a file, parse the CSV, reserialize it
+ -- and push back into another file.
+ test :: IO ()
+ test = runResourceT $
+ sourceFile "test/BigFile.csv" $=
+ decode utf8 $=
+ (intoCSV defCSVSettings
+ :: forall m. MonadResource m => Conduit Text m (MapRow Text)) $=
+ fromCSV defCSVSettings $=
+ encode utf8 $$
+ sinkFile "test/BigFileOut.csv"
- -- A function that takes a row and "emits" zero or more rows as output.
- processRow :: MapRow -> [MapRow]
- processRow row = [M.insert "Column1" fixedCol row]
- where fixedCol = strip (row ! "Column1")
- main = mapCSVFile "InputFile.csv" defCSVSettings procesRow "OutputFile.csv"
and we are done.
-Further examples to be provided at a later time.
-### TODO - Next Steps
-* Refactor all operations to use iterCSV as the basic building block --
- in progress.
-* The CSVeable typeclass can be refactored to have a more minimal definition.
-* Get mapCSVFiles out of the typeclass if possible.
-* Need to think about specializing an Exception type for the library and
- properly notifying the user when parsing-related problems occur.
-* Some operations can be further broken down to their atoms, increasing the
- flexibility of the library.
-* Operating on Text in addition to ByteString would be phenomenal.
-* A test-suite needs to be added.
-* Some benchmarking would be nice.
-Any and all kinds of help is much appreciated!
@@ -33,7 +33,18 @@ Description:
* Fast operation
- This library is an attempt to close these gaps.
+ This library is an attempt to close these gaps. Please note that
+ this library started its life based on the enumerator package and
+ has recently been ported to work with conduits instead. In the
+ process, it has been greatly simplified thanks to the modular nature
+ of the conduits library.
+ .
+ Following the port to conduits, the library has also gained the
+ ability to parameterize on the stream type and work both with
+ ByteString and Text.
For more documentation and examples, check out the README at:
@@ -42,10 +53,6 @@ Description:
- The API is fairly well documented and I would encourage you to keep your
- haddocks handy. If you run into problems, just email me or holler over at
- #haskell.
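
Note on the two row types introduced in the README hunk above: a minimal sketch of how `Row t` and `MapRow t` relate, assuming only the `containers` package. The type aliases are redeclared locally to mirror the README's definitions rather than imported from csv-conduit, and `toMapRow` is a hypothetical helper for illustration, not part of the library's API.

```haskell
import qualified Data.Map as M

-- Local stand-ins mirroring the README's definitions;
-- these are NOT imported from csv-conduit.
type Row t = [t]
type MapRow t = M.Map t t

-- Hypothetical helper: pair a header row with a data row,
-- giving a map from column name to cell value.
toMapRow :: Ord t => Row t -> Row t -> MapRow t
toMapRow header row = M.fromList (zip header row)

main :: IO ()
main = do
  let header = ["name", "city"] :: Row String
      row    = ["Ada", "London"] :: Row String
  -- prints [("city","London"),("name","Ada")]
  print (M.toList (toMapRow header row))
```

Because both aliases are parameterized on `t`, the same sketch would work for `ByteString` or `Text` cells, which is the flexibility the cabal description attributes to the conduit port.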
