-- | This module implements encoding and decoding of CSV data. The
-- implementation is RFC 4180 compliant, with the following
-- extensions:
--
-- * Empty lines are ignored.
--
-- * Non-escaped fields may contain any characters except
-- double-quotes, commas, carriage returns, and newlines.
--
-- * Escaped fields may contain any characters (but double-quotes
-- need to be escaped).
module Data.Csv
(
-- * Usage example
-- $example
-- * Treating CSV data as opaque byte strings
-- $generic-processing
-- * Custom type conversions
-- $customtypeconversions
-- * Encoding and decoding
-- $encoding
decode
, decodeByName
, encode
, encodeByName
-- ** Encoding and decoding options
-- $options
, DecodeOptions(..)
, defaultDecodeOptions
, decodeWith
, decodeByNameWith
, EncodeOptions(..)
, defaultEncodeOptions
, encodeWith
, encodeByNameWith
-- * Core CSV types
, Csv
, Record
, Field
, Header
, Name
, NamedRecord
-- * Type conversion
-- $typeconversion
-- ** Index-based record conversion
-- $indexbased
, FromRecord(..)
, Parser
, runParser
, index
, (.!)
, ToRecord(..)
, record
, Only(..)
-- ** Name-based record conversion
-- $namebased
, FromNamedRecord(..)
, lookup
, (.:)
, ToNamedRecord(..)
, namedRecord
, namedField
, (.=)
-- ** Field conversion
-- $fieldconversion
, FromField(..)
, ToField(..)
) where
import Prelude hiding (lookup)
import Data.Csv.Conversion
import Data.Csv.Encoding
import Data.Csv.Types
-- $example
--
-- A short encoding usage example:
--
-- > >>> encode $ fromList [("John" :: Text, 27), ("Jane", 28)]
-- > Chunk "John,27\r\nJane,28\r\n" Empty
--
-- Since string literals are overloaded, we have to supply a type
-- signature here; otherwise the compiler can't deduce which string
-- type (e.g. 'String' or 'Text') we want to use. In most cases the
-- type can be inferred from the context and you can omit type
-- signatures.
--
-- A short decoding usage example:
--
-- > >>> decode False "John,27\r\nJane,28\r\n" :: Either String (Vector (Text, Int))
-- > Right (fromList [("John",27),("Jane",28)])
--
-- We pass 'False' as the first argument to indicate that the CSV
-- input data isn't preceded by a header.
--
-- In practice, the return type of 'decode' rarely needs to be given,
-- as it can often be inferred from the context.
-- $generic-processing
--
-- Sometimes you might want to work with a CSV file whose contents
-- are unknown to you. For example, you might want to remove the
-- second column of a file without knowing anything about its content.
-- To parse a CSV file to a generic representation, just convert each
-- record to a @'Vector' 'ByteString'@ value, like so:
--
-- > >>> decode False "John,27\r\nJane,28\r\n" :: Either String (Vector (Vector ByteString))
-- > Right (fromList [fromList ["John","27"],fromList ["Jane","28"]])
--
-- As the example output above shows, all the fields are returned as
-- uninterpreted 'ByteString' values.
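--
-- As a minimal sketch of the column-removal case mentioned above, the
-- generic representation can be filtered by position and encoded
-- again. Here @dropSecond@ and @stripSecond@ are hypothetical helpers,
-- "Data.Vector" is assumed to be imported qualified as @V@, and
-- "Data.ByteString.Lazy" qualified as @L@:
--
-- > -- Drop the field at index 1 (the second column) from one record.
-- > dropSecond :: Vector ByteString -> Vector ByteString
-- > dropSecond = V.ifilter (\ i _ -> i /= 1)
-- >
-- > -- Decode without a header, drop the column, and re-encode.
-- > stripSecond :: L.ByteString -> Either String L.ByteString
-- > stripSecond input = encode . V.map dropSecond <$> decode False input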
-- $customtypeconversions
--
-- Most of the time the existing 'FromField' and 'ToField' instances
-- do what you want. However, if you need to parse a different format
-- (e.g. hex) but use a type (e.g. 'Int') for which there's already a
-- 'FromField' instance, you need to use a @newtype@. Example:
--
-- > newtype Hex = Hex Int
-- >
-- > parseHex :: ByteString -> Parser Int
-- > parseHex = ...
-- >
-- > instance FromField Hex where
-- >     parseField s = Hex <$> parseHex s
--
-- Instead of giving an explicit type signature, you can pattern match
-- on the @newtype@ constructor to indicate which type conversion you
-- want to have the library use:
--
-- > case decode False "0xff,0xaa\r\n0x11,0x22\r\n" of
-- >     Left err -> putStrLn err
-- >     Right v  -> forM_ v $ \ (Hex val1, Hex val2) ->
-- >         print (val1, val2)
--
-- To ignore a column you can use the unit type '()'. It always
-- decodes successfully. Note that it lacks a corresponding 'ToField'
-- instance. It serves as a placeholder to indicate that there's
-- something in the column, but you don't care what.
--
-- > case decode False "foo,1\r\nbar,22" of
-- >     Left err -> putStrLn err
-- >     Right v  -> forM_ v $ \ ((), i) -> print (i :: Int)
-- $encoding
--
-- Encoding and decoding are both two-step processes. To encode a value, it
-- is first converted to a generic representation, using either
-- 'ToRecord' or 'ToNamedRecord'. The generic representation is then
-- encoded as CSV data. To decode a value the process is reversed and
-- either 'FromRecord' or 'FromNamedRecord' is used instead. Both
-- these steps are combined in the 'encode' and 'decode' functions.
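--
-- To illustrate the intermediate step, the generic representation can
-- be inspected directly. This is only a sketch: it assumes the
-- @OverloadedStrings@ extension is enabled and relies on the tuple
-- instances of 'ToRecord' and 'FromRecord':
--
-- > >>> toRecord ("John" :: Text, 27 :: Int)
-- > fromList ["John","27"]
-- > >>> runParser (parseRecord (record ["John", "27"])) :: Either String (Text, Int)
-- > Right ("John",27)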
-- $typeconversion
--
-- There are two ways to convert CSV records to and from
-- user-defined data types: index-based conversion and name-based
-- conversion.
-- $indexbased
--
-- Index-based conversion lets you convert CSV records to and from
-- user-defined data types by referring to a field's position (its
-- index) in the record. The first column in a CSV file is given index
-- 0, the second index 1, and so on.
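--
-- A minimal sketch of index-based instances. The @Person@ type is
-- hypothetical and used only for illustration; "Data.Vector" is
-- assumed to be imported qualified as @V@, and 'mzero' comes from
-- "Control.Monad":
--
-- > data Person = Person { name :: !Text, age :: !Int }
-- >
-- > instance FromRecord Person where
-- >     parseRecord v
-- >         | V.length v == 2 = Person <$> v .! 0 <*> v .! 1
-- >         | otherwise       = mzero  -- wrong number of fields
-- >
-- > instance ToRecord Person where
-- >     toRecord (Person n a) = record [toField n, toField a]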
-- $namebased
--
-- Name-based conversion lets you convert CSV records to and from
-- user-defined data types by referring to a field's name. The names
-- of the fields are defined by the first line in the file, also known
-- as the header. Name-based conversion is more robust to changes in
-- the file structure (e.g. reordering or addition of columns), but
-- can be a bit slower.
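--
-- A minimal sketch of name-based instances. The @Person@ type and the
-- column names @name@ and @age@ are hypothetical, and the
-- @OverloadedStrings@ extension is assumed to be enabled so that the
-- string literals can be used as field names:
--
-- > data Person = Person { name :: !Text, age :: !Int }
-- >
-- > instance FromNamedRecord Person where
-- >     parseNamedRecord m = Person <$> m .: "name" <*> m .: "age"
-- >
-- > instance ToNamedRecord Person where
-- >     toNamedRecord (Person n a) = namedRecord ["name" .= n, "age" .= a]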
-- $options
--
-- These functions can be used to control how data is encoded and
-- decoded. For example, they can be used to encode data in a
-- tab-separated format instead of in a comma-separated format.
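--
-- For instance, tab-separated input might be decoded as sketched
-- below. This assumes that the delimiter field of 'DecodeOptions' is
-- named @decDelimiter@ and holds a byte, that 'ord' is imported from
-- "Data.Char", and that @OverloadedStrings@ is enabled; @tabOptions@
-- is a hypothetical name:
--
-- > tabOptions :: DecodeOptions
-- > tabOptions = defaultDecodeOptions { decDelimiter = fromIntegral (ord '\t') }
-- >
-- > people :: Either String (Vector (Text, Int))
-- > people = decodeWith tabOptions False "John\t27\r\nJane\t28\r\n"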
-- $fieldconversion
--
-- The 'FromField' and 'ToField' classes define how to convert between
-- 'Field's and values you care about (e.g. 'Int's). Most of the time
-- you don't need to write your own instances as the standard ones
-- cover most use cases.
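--
-- As a small illustration using the standard 'Int' instances (a
-- sketch that assumes @OverloadedStrings@ is enabled so that the
-- string literal is read as a 'Field'):
--
-- > >>> runParser (parseField "27" :: Parser Int)
-- > Right 27
-- > >>> toField (27 :: Int)
-- > "27"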