parse returns a lazy seq of vectors, delimiter support, parsing strings. #2

Merged
merged 6 commits into from

2 participants

@RobinRamael

Hi,

I added the possibility to let parse return a lazy-seq of vectors instead of maps,
added basic support for using custom delimiters and support for parsing strings instead of just reading from files.
I hope this project isn't dead, it's the only parser that worked for me...
(Disregard the other pull request I closed; there were problems with it.)

Robin

RobinRamael added some commits
@RobinRamael RobinRamael parse can return vectors and custom delimiters
Added the possibility to let parse return a lazy-seq of vectors instead of maps and
added support for using custom delimiters.
05cbbd3
@RobinRamael RobinRamael Added support for stringparsing instead of only reading from files. 145c856
@grinnbearit
Owner

Not at all, I just haven't seen this code for a while

Do you feel that instead of adding multiple readers (for files, strings, URLs, etc.) it could be made more general? Just accepting a reader, for instance.

Also, the vectors are a good idea. How about trying optional keyword params instead of positional arguments? That way it could be changed later to allow more options without breaking existing code.
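
To illustrate the keyword-params suggestion, here is a minimal sketch. `describe-options` is a hypothetical function, not part of opencsv-clj; it only shows how rest-argument map destructuring with `:or` supplies defaults, so that further options could be added later without touching existing call sites.

```clojure
;; Hypothetical function, for illustration only.
;; Callers pass options as trailing keyword/value pairs; any option
;; that is left out falls back to the default given in :or.
(defn describe-options
  [source & {:keys [mapped delimiter quoter]
             :or {mapped false delimiter \, quoter \"}}]
  {:source source
   :mapped mapped
   :delimiter delimiter
   :quoter quoter})

;; No options: every default applies.
(describe-options "in.csv")

;; Override a single option; the others keep their defaults.
(describe-options "in.csv" :delimiter \;)
```

An older call such as `(describe-options "in.csv")` keeps working unchanged if a fourth option is added to `:keys` later, which is the compatibility argument made above.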

@grinnbearit grinnbearit reopened this
@RobinRamael

You're right, just accepting a reader and arguments would be best.
I see the usage then being something like (csv/parse reader :mapped (true/false) :delimiter \; :end-of-line \\r).

I probably pushed this too soon, as I'm still figuring out what I need the parser to do myself. I'll push my work to this request when I'm sure.

@grinnbearit
Owner
RobinRamael added some commits
@RobinRamael RobinRamael Parse now takes a reader and options.
This restructures the ideas of my last two commits:
- Parse now takes a reader and the options :mapped, :delimiter and :quoter.
- Added two helper-functions parse-file and parse-string which take a
  filename and a string resp.
- Updated the documentation for those functions.
4bc3e35
@RobinRamael RobinRamael Added tests. 24313df
@RobinRamael RobinRamael Updated README. 10aa3c9
@RobinRamael

There we go. I've implemented the parse function with the options, added the parse-file and parse-string helper-functions, added tests and updated the readme.
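
As a rough sketch of how a helper can wrap a reader-based function and pass its options straight through, consider the following. `toy-parse` and `toy-parse-string` are hypothetical stand-ins written for this example: `toy-parse` merely splits lines on the delimiter and ignores quoting, so it is not a CSV parser and does not use opencsv.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for a reader-based parser.
;; It splits each line on the delimiter and ignores quoting.
(defn toy-parse
  [reader & {:keys [delimiter] :or {delimiter \,}}]
  (let [pattern (re-pattern (java.util.regex.Pattern/quote (str delimiter)))]
    (map #(str/split % pattern)
         (line-seq (java.io.BufferedReader. reader)))))

;; The helper collects the trailing options in `opts` and forwards
;; them untouched with `apply`, so it never needs to know their names.
(defn toy-parse-string
  [s & opts]
  (apply toy-parse (java.io.StringReader. s) opts))

(toy-parse-string "a;b\nc;d" :delimiter \;)
```

Because the helper only forwards `opts`, adding an option to the underlying function would not require changing the helper.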

@grinnbearit grinnbearit merged commit d17346e into grinnbearit:master
@grinnbearit
Owner

looks great! merged

Commits on May 21, 2011
  1. @RobinRamael

    parse can return vectors and custom delimiters

    RobinRamael authored
    Added the possibility to let parse return a lazy-seq of vectors instead of maps and
    added support for using custom delimiters.
  2. @RobinRamael
Commits on May 22, 2011
  1. @RobinRamael

    Parse now takes a reader and options.

    RobinRamael authored
    This restructures the ideas of my last two commits:
    - Parse now takes a reader and the options :mapped, :delimiter and :quoter.
    - Added two helper-functions parse-file and parse-string which take a
      filename and a string resp.
    - Updated the documentation for those functions.
  2. @RobinRamael

    Added tests.

    RobinRamael authored
  3. @RobinRamael

    Updated README.

    RobinRamael authored
  4. @RobinRamael

    The readme is in markdown.

    RobinRamael authored
31 README
@@ -1,31 +0,0 @@
-# opencsv-clj
-
-A lazy opencsv (http://opencsv.sourceforge.net/) wrapper in Clojure
-
-## Usage
-
-(require '[opencsv-clj.core :as csv])
-(take 1 (csv/parse "filename"))
-
-## Installation
-
-leiningen
-
-[opencsv-clj "1.0.0-SNAPSHOT"]
-
-maven
-
-<dependency>
- <groupId>opencsv-clj</groupId>
- <artifactId>opencsv-clj</artifactId>
- <version>1.0.0-SNAPSHOT</version>
-</dependency>
-
-## License
-
-Copyright (C) 2010 Sidhant Godiwala
-
-Distributed under the Eclipse Public License, the same as Clojure.
-
-
-
56 README.md
@@ -0,0 +1,56 @@
+# opencsv-clj
+
+A lazy opencsv (http://opencsv.sourceforge.net/) wrapper in Clojure
+
+## Usage
+
+(require '[opencsv-clj.core :as csv])
+
+Get a lazy seq of the lines from a csv file:
+ (csv/parse-file "filename")
+
+or directly from a string:
+ (csv/parse-string "a,b,c")
+ => (["a" "b" "c"])
+
+or from any reader you want, really:
+ (csv/parse (MyAwesomeReader.))
+
+These three functions can take three options:
+
+* :mapped - when true, the function returns a map for each line with the csv's header (first line) as keys (default: false)
+
+ (get (second (csv/parse-string "1,2,3\n2,4,6\n3,6,9" :mapped true)) "2")
+ => "6"
+
+* :delimiter - the delimiter char used by the parser (default \,)
+
+ (csv/parse-string "a+b+c" :delimiter \+)
+ => (["a" "b" "c"])
+
+* :quoter - the quote character used by the parser (default \")
+
+ (csv/parse-string "'a,a,a','1,2,3','I,II,III'" :quoter \')
+ => (["a,a,a" "1,2,3" "I,II,III"])
+
+
+
+## Installation
+
+Leiningen:
+
+ [opencsv-clj "1.0.0-SNAPSHOT"]
+
+Maven:
+
+ <dependency>
+ <groupId>opencsv-clj</groupId>
+ <artifactId>opencsv-clj</artifactId>
+ <version>1.0.0-SNAPSHOT</version>
+ </dependency>
+
+## License
+
+Copyright (C) 2010 Sidhant Godiwala
+
+Distributed under the Eclipse Public License, the same as Clojure.
45 src/opencsv_clj/core.clj
@@ -4,23 +4,42 @@
(:import [au.com.bytecode.opencsv CSVReader CSVWriter])
(:require [clojure.java.io :as io]))
-(defn- read-csv [path]
+(defn- read-csv [reader delimiter quoter]
"Reads a csv file and generates a lazy sequence of rows"
- (let [buffer (CSVReader. (io/reader path))]
+ (let [buffer (CSVReader. reader delimiter quoter)]
(lazy-seq
(loop [res []]
(if-let [nxt (.readNext buffer)]
- (recur (conj res (seq nxt)))
- res)))))
+ (recur (conj res (seq nxt)))
+ res)))))
(defn- parse-csv [csv-seq]
"Converts a lazy sequence of rows to a lazy sequence of maps"
(let [header (first csv-seq)]
(map #(zipmap header %) (rest csv-seq))))
-(defn parse [path]
- "Converts a csv file to a lazy sequence of maps of the values where the keys are the items in the header row"
- (parse-csv (read-csv path)))
+(defn parse
+ "Converts a csv reader to a lazy sequence of vectors of the values or the values mapped to the header.
+ Options:
+ :mapped - if true, returns the values of each row mapped to the header (default false)
+ :delimiter - the delimiter char used by the parser (default \\,)
+ :quoter - the quote char used by the parser (default \\\")"
+ [reader & {:keys [mapped delimiter quoter]
+ :or {mapped false delimiter \, quoter \"}}]
+ (let [csv-seq (read-csv reader delimiter quoter)]
+ (if mapped
+ (parse-csv csv-seq)
+ (map vec csv-seq))))
+
+(defn parse-file
+ "Converts a csv file to a lazy sequence of vectors of the values, or of maps keyed by the header row when :mapped is true. Takes the same options as (parse)"
+ [f & opts]
+ (apply parse (io/reader f) opts))
+
+(defn parse-string
+ "Converts a csv string to a lazy sequence of vectors of the values, or of maps keyed by the header row when :mapped is true. Takes the same options as (parse)"
+ [s & opts]
+ (apply parse (java.io.StringReader. s) opts))
(defn dump
"Dumps a sequence of maps to a file given by path, pass a header to specify the order and columns to be written"
@@ -30,9 +49,9 @@
([path csv-seq header]
(with-open [writer (io/writer path)]
(let [csv-writer (CSVWriter. writer)]
- (.. csv-writer (writeNext (into-array (map str header))))
- (doseq [entry (map (comp into-array
- (fn [csv-entry] (map (comp str #(get csv-entry %))
- header)))
- csv-seq)]
- (.. csv-writer (writeNext entry)))))))
+ (.. csv-writer (writeNext (into-array (map str header))))
+ (doseq [entry (map (comp into-array
+ (fn [csv-entry] (map (comp str #(get csv-entry %))
+ header)))
+ csv-seq)]
+ (.. csv-writer (writeNext entry)))))))
37 test/opencsv_clj/core_test.clj
@@ -2,5 +2,38 @@
(:use [opencsv-clj.core] :reload-all)
(:use [clojure.test]))
-(deftest replace-me ;; FIXME: write
- (is false))
+(deftest basic-functionality
+ (is (= '(["a" "b" "c"]) (parse-string "a,b,c")))
+ (is (= '(["", ""]) (parse-string ",")))
+ (is (= '() (parse-string ""))))
+
+
+(deftest mapping-to-headers
+ (is (= "e"
+ (get
+ (first (parse-string "a,b,c\nd,e,f" :mapped true))
+ "b"))))
+
+(deftest quoting
+ (is (= '(["a,a", "b,b", "c,c"]) (parse-string "\"a,a\",\"b,b\",\"c,c\""))))
+
+(deftest different-tokens
+ (is (= '(["a" "b" "c"] ["d" "e" "f"])
+ (parse-string "'a';'b';'c'\n'd';'e';'f'"
+ :delimiter \;
+ :quoter \'
+ ))))
+
+(deftest all-together-now
+ (is (= "awesome"
+ (get
+ (first
+ (parse-string "+bears: dancing+:+bears: attacking+\n+awesome+:+not so awesome+"
+ :mapped true
+ :delimiter \:
+ :quoter \+))
+ "bears: dancing"))))
+
+
+;; TODO: test that proves that even on a large file, no
+;; OutOfMemoryErrors are thrown, which should be the case.
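
A note on the TODO above: in this diff, `read-csv` wraps a `loop` in `lazy-seq`, so the loop runs to the end of the input and accumulates every row in a vector the first time the sequence is touched. The sketch below shows one way to produce rows one at a time instead. `lazy-rows` is a hypothetical helper, and a `BufferedReader` stands in for the CSV reader so that the example runs without opencsv.

```clojure
;; Hypothetical helper: `read-next` is any zero-argument function that
;; returns the next row, or nil at end of input. Each step realizes
;; exactly one row; the rest of the sequence stays unrealized.
(defn lazy-rows
  [read-next]
  (lazy-seq
    (when-let [row (read-next)]
      (cons row (lazy-rows read-next)))))

;; Stand-in source: plain lines from a BufferedReader.
(def rdr (java.io.BufferedReader. (java.io.StringReader. "a\nb\nc")))
(def rows (lazy-rows #(.readLine rdr)))

(first rows) ; reads only the first line from rdr
```

With opencsv, `read-next` could presumably be `#(.readNext buffer)`, the same call the diff already makes. The underlying reader has to stay open for as long as the sequence is being consumed.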