CSV Parser Parameters

Alf Eaton edited this page Mar 5, 2014 · 14 revisions

Notes

Where a single character is specified below, some parsers also accept a list of characters, a regular expression, a function or a class.

File reading

| name | description | default | notes |
| --- | --- | --- | --- |
| encoding | file encoding | ASCII, system, UTF-8 | originally ASCII, but JS parsers use UTF-8 as default |
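
As a rough illustration of why the encoding parameter matters, the following sketch decodes the same bytes with two encodings before parsing. It uses Python's standard `csv` and `io` modules; the sample data and the `read_rows` helper are hypothetical and not taken from any parser listed here.

```python
import csv
import io

# Hypothetical sample: one data row with non-ASCII characters, encoded as UTF-8.
raw = "name,city\nZoë,Kraków\n".encode("utf-8")

def read_rows(data, encoding):
    """Decode bytes with the given encoding, then parse them as CSV."""
    # newline="" leaves line endings untouched so the csv module can handle them.
    text = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    return list(csv.reader(text))

utf8_rows = read_rows(raw, "utf-8")

# Decoding the same bytes as ASCII fails on the first non-ASCII byte.
try:
    read_rows(raw, "ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

A parser that defaults to ASCII would need the encoding set explicitly for this input, while one that defaults to UTF-8 would read it unchanged.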

Parsing (dialect)

Common

| name | description | default | notes |
| --- | --- | --- | --- |
| delimiter/separator | column separator | , | |
| enclosure/quotechar/quote | field enclosure | " | double = escaped |
| newline/lineTerminator/rowDelimiter/row_sep | line ending | CRLF, \n | \n, \r, \r\n, auto, unix, mac, windows, unicode |
| trim | ignore whitespace at start/end of value | both? | start, end, both, none |
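
The common parameters above map onto Python's standard `csv` module roughly as follows. This is a minimal sketch with made-up sample data. The `csv` reader has no trim option, so whitespace is stripped by hand here, and it recognises `\r` or `\n` as end-of-line on its own rather than taking a line-ending setting.

```python
import csv
import io

# Hypothetical sample: ";" as the column separator, "'" as the enclosure,
# CRLF line endings, and one value padded with spaces.
sample = "id;label\r\n1;'a;b'\r\n2;'it''s'\r\n3;  padded  \r\n"

reader = csv.reader(
    io.StringIO(sample, newline=""),  # newline="" keeps CRLF intact for the parser
    delimiter=";",                    # delimiter/separator
    quotechar="'",                    # enclosure/quotechar/quote
    doublequote=True,                 # a doubled enclosure is an escaped enclosure
)

# trim = both: strip whitespace at the start and end of every value.
rows = [[value.strip() for value in row] for row in reader]
```

The second data row shows the "double = escaped" note in practice: `'it''s'` is read as `it's`.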

Less common

| name | description | default | notes |
| --- | --- | --- | --- |
| escape | escape character | | |
| comment | character at start of line that denotes comment | | |
| quoting | whether fields are enclosed in quotes | special | always, sometimes (special characters), sometimes (non-integer), specific columns, never |
| length/field_size_limit | maximum line length to read to look for a row separator | 0,null | |
| skipBlankRows/skip_blanks/IgnoreEmptyLines | skip blank lines, or return as an array (empty or with a null value) | | might be separation between multiple tables in the same file |
| skip_lines | regexp of lines to skip | | |
| heading | whether/how many heading rows to read | false/0 | boolean/integer |
| fields/names/columns/keys/headings | field names | | |
| doublequote | if the enclosure character is repeated to be escaped | | |
| skipLines/HeaderLines | the number of lines/rows at the start of the file to skip | | |
| rows/fragment | range/subset of rows to return | | |
| columns/usecols | range/subset of columns to return | | |
| emptyCells/fill | (boolean) trim empty cells from the end of the row, add empty cells to the end of the row | | |
| nrows | number of rows to read | | |
| auto_non_chars | characters to ignore when attempting to auto-detect delimiter | | |
| prefix | prefix to skip on each line | | |
| skipfooter | number of rows at end of file to skip | | |
| flags | flags for reading file stream | | |
| columnPrefix | prefix to add to column numbers when no header names | | |
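
Several of these options can be layered on top of a basic reader. The sketch below is one possible arrangement using Python's standard `csv` and `itertools` modules. The `read_csv` function and its parameter names are hypothetical and only loosely follow the names in the table. Filtering comments line by line, as done here, assumes no quoted field spans several lines.

```python
import csv
import io
import itertools

def read_csv(text, comment="#", skip_lines=0, nrows=None, skip_blanks=True):
    """Hypothetical wrapper illustrating a few of the less common options."""
    lines = io.StringIO(text, newline="")
    # skipLines: drop a fixed number of lines at the start of the input.
    lines = itertools.islice(lines, skip_lines, None)
    # comment: drop lines that start with the comment character.
    lines = (line for line in lines if not line.startswith(comment))
    # escape + doublequote: use a backslash escape instead of doubled quotes.
    rows = csv.reader(lines, escapechar="\\", doublequote=False)
    if skip_blanks:
        # csv.reader yields an empty list for a blank line.
        rows = (row for row in rows if row)
    # nrows: stop after this many rows (None reads everything).
    return list(itertools.islice(rows, nrows))

# Hypothetical input: a preamble line, a comment, a header, a blank line, three rows.
sample = 'generated by export tool\n# a comment\nid,text\n\n1,"say \\"hi\\""\n2,b\n3,c\n'

rows = read_csv(sample, skip_lines=1, nrows=3)
```

With `skip_blanks=False` the blank line would instead come back as an empty list, which matches the note that a blank row may separate multiple tables in the same file.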

Converting (mapping to object/index/data typing)

| name | description | default | notes |
| --- | --- | --- | --- |
| convertEncoding | whether to convert the encoding | | |
| transform/converters | mapping of fields to converter functions | | |
| sanitize | whether to sanitize fields | | |
| dtype | data type of column | | |
| multi-index header rows | array of row locations | | |
| index_column | column(s) to use for the index | | |
| na_values | values to recognize as null | | |
| true_values | values to recognize as true | | |
| false_values | values to recognize as false | | |
| parse_dates | columns to parse as dates, may combine columns | | |
| date_parser | function to use for parsing date columns | | |
| dayfirst | DD/MM format dates | | |
| thousands | thousands separator | | |
| decimal | decimal point character | | |
| CurrencyTokens | currency units to be skipped when importing numerical values | | |
| DateStringFormat | date format | | |
| EmptyField | how to represent empty fields | | |
| Numeric | whether to import data fields as numbers if possible | | |
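
To make the conversion options more concrete, here is a minimal sketch of how null values, boolean values, per-field converters and thousands/decimal characters might be applied after parsing. It uses only Python's standard library. The `convert` and `to_number` helpers, the value sets and the sample data are all assumptions made for illustration, not the behaviour of any particular parser.

```python
import csv
import io

# Hypothetical settings, named after the parameters in the table above.
NA_VALUES = {"", "NA", "null"}
TRUE_VALUES = {"true", "yes"}
FALSE_VALUES = {"false", "no"}

def to_number(value, thousands=",", decimal="."):
    """Parse a number written with the given thousands and decimal characters."""
    normalised = value.replace(thousands, "").replace(decimal, ".")
    return float(normalised)

def convert(value, converter=None):
    """Apply null and boolean recognition, then an optional per-field converter."""
    if value in NA_VALUES:
        return None
    if value.lower() in TRUE_VALUES:
        return True
    if value.lower() in FALSE_VALUES:
        return False
    return converter(value) if converter else value

# Hypothetical input using "." for thousands and "," for the decimal point.
sample = "item;price;in_stock\nwidget;1.234,50;yes\ngadget;NA;no\n"

# transform/converters: mapping of field names to converter functions.
converters = {"price": lambda v: to_number(v, thousands=".", decimal=",")}

reader = csv.DictReader(io.StringIO(sample, newline=""), delimiter=";")
records = [
    {name: convert(value, converters.get(name)) for name, value in row.items()}
    for row in reader
]
```

Checking the null and boolean sets before the field converter keeps a value such as `NA` in a numeric column from reaching `float` and raising an error.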

Links

CSV Application Support - Wikipedia

CSV Dialect Description Format

CSV JSON Table Schema

ODI CSV Validation Research
