# Introduction

The `csv` module can handle delimiters other than a comma, such as a tab. It can also handle other aspects of parsing, such as quoting, line terminators, and the treatment of initial spaces. We can alter any or all of these aspects when creating a CSV reader or writer.
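
As a minimal sketch of this, several formatting parameters can be passed as keyword arguments when creating a writer. The pipe delimiter and the sample row are illustrative choices only.

```python
import csv
import io

# Write to an in-memory buffer so the result is easy to inspect
out = io.StringIO()
writer = csv.writer(
    out,
    delimiter='|',           # illustrative choice: pipe instead of comma
    quoting=csv.QUOTE_ALL,   # quote every field
    lineterminator='\n',     # newline instead of the default '\r\n'
)
writer.writerow(['501', 'karen', 'bash'])

written = out.getvalue()
print(repr(written))
```

The same keyword arguments are accepted by `csv.reader`.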

# Built-in Dialects

`csv` comes with three built-in dialects:

In [1]:
import csv
csv.list_dialects()

['excel', 'excel-tab', 'unix']

In [2]:
help(csv.excel)

Help on class excel in module csv:

class excel(Dialect)
 |  Describe the usual properties of Excel-generated CSV files.
 |  
 |  Method resolution order:
 |      excel
 |      Dialect
 |      builtins.object
 |  
 |  Data and other attributes defined here:
 |  
 |  delimiter = ','
 |  
 |  doublequote = True
 |  
 |  lineterminator = '\r\n'
 |  
 |  quotechar = '"'
 |  
 |  quoting = 0
 |  
 |  skipinitialspace = False
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from Dialect:
 |  
 |  __init__(self)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors inherited from Dialect:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
 |  

In the help text above for the `csv.excel` dialect, note the delimiter, quoting, line terminator, and initial-space settings (`quoting = 0` is the value of `csv.QUOTE_MINIMAL`). Together, settings like these make up a dialect.
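
The same settings can also be inspected programmatically, which makes it easier to compare dialects side by side. The sketch below uses `csv.get_dialect` to look up each registered dialect by name; the `FIELDS` tuple and the `settings` dictionary are helper names introduced here for illustration.

```python
import csv

# Attribute names taken from the help text for csv.excel
FIELDS = ('delimiter', 'quotechar', 'doublequote',
          'skipinitialspace', 'lineterminator', 'quoting')

settings = {}
for name in csv.list_dialects():
    dialect = csv.get_dialect(name)
    # Collect the settings of each dialect into a plain dictionary
    settings[name] = {field: getattr(dialect, field) for field in FIELDS}
    print(name, settings[name])
```

Comparing the rows shows, for example, that `excel-tab` differs from `excel` only in its delimiter, while `unix` uses `'\n'` as the line terminator and quotes all fields.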

# Using a Built-in Dialect

Let's say we are dealing with a tab-delimited text file. We can handle it in two different ways.

## Specifying the Delimiter when Creating a Reader or Writer

In this method, we specify the delimiter as part of creating a reader or writer:

In [3]:
import csv

buffer = """
501\tkaren\tbash
502\tjohn\ttcsh
""".strip().splitlines()

for row in csv.reader(buffer, delimiter='\t'):
    print(row)

['501', 'karen', 'bash']
['502', 'john', 'tcsh']


## Specifying a Dialect when Creating a Reader or Writer

Fortunately, the built-in `csv.excel_tab` dialect handles tab-delimited data, so we can pass it through the `dialect=` parameter:

In [4]:
import csv

buffer = """
501\tkaren\tbash
502\tjohn\ttcsh
""".strip().splitlines()

for row in csv.reader(buffer, dialect=csv.excel_tab):
    print(row)

['501', 'karen', 'bash']
['502', 'john', 'tcsh']
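
Two variations on this may be useful. A registered dialect can be referred to by its name (here `'excel-tab'`), and keyword arguments given alongside `dialect=` override individual settings of that dialect. The following is a small sketch using the same sample data; the variable names are illustrative.

```python
import csv

buffer = ['501\tkaren\tbash', '502\tjohn\ttcsh']

# A registered dialect can be referred to by its name
by_name = list(csv.reader(buffer, dialect='excel-tab'))

# Keyword arguments override individual settings of the chosen dialect
overridden = list(csv.reader(buffer, dialect='excel', delimiter='\t'))

print(by_name)
print(overridden)
```

Both calls parse the rows the same way as the `csv.excel_tab` example.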


# Using a Custom Dialect

In a project where we repeatedly deal with a particular format, we can define and register a custom dialect, which promotes reusability and consistency. For example, if we need to read semicolon-delimited data and would also like to skip initial spaces, we can create a module that defines and registers this dialect:

In [24]:
# Contents of csv_dialects.py:
print(open('csv_dialects.py').read())

# csv_dialects.py
import csv

SEMICOLON = 'semicolon'
SPACE = 'space'

csv.register_dialect(SEMICOLON, delimiter=';', skipinitialspace=True)
csv.register_dialect(SPACE, delimiter=' ', skipinitialspace=True)
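
As an alternative sketch, a dialect can also be described as a class and then registered. The class name `Semicolon` and the registered name `'semicolon-class'` are hypothetical, chosen so they do not clash with the names in `csv_dialects.py`. Subclassing `csv.excel` is an assumption made here so that all other settings keep the Excel defaults.

```python
import csv

class Semicolon(csv.excel):
    """Hypothetical class-based counterpart of the 'semicolon' dialect."""
    delimiter = ';'
    skipinitialspace = True

# Hypothetical name, kept distinct from 'semicolon'
csv.register_dialect('semicolon-class', Semicolon)

rows = list(csv.reader(['501; karen; bash'], dialect='semicolon-class'))
print(rows)
```

The keyword form is shorter. The class form may be preferable when a dialect needs a docstring or serves as a base for further variants.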



After registering the dialects, we can use them:

In [25]:
import csv
import csv_dialects

buffer = """
501; karen; bash
502; john; tcsh
""".strip().splitlines()

for row in csv.reader(buffer, dialect=csv_dialects.SEMICOLON):
    print(row)

['501', 'karen', 'bash']
['502', 'john', 'tcsh']


In the output above, notice how the reader skips (ignores) the initial spaces. The `SPACE` dialect does the same for columns separated by one or more spaces:

In [27]:
import csv
import csv_dialects

buffer = """
501  karen  bash
502  john   tcsh
""".strip().splitlines()

for row in csv.reader(buffer, dialect=csv_dialects.SPACE):
    print(row)

['501', 'karen', 'bash']
['502', 'john', 'tcsh']
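
A registered dialect works for writing as well as reading. In the minimal sketch below, the dialect is registered inline only to keep the example self-contained; in a project it would come from importing `csv_dialects`.

```python
import csv
import io

# Registered inline for a self-contained example; normally this happens
# when csv_dialects is imported
csv.register_dialect('semicolon', delimiter=';', skipinitialspace=True)

out = io.StringIO()
writer = csv.writer(out, dialect='semicolon')
writer.writerow(['501', 'karen', 'bash'])

written = out.getvalue()
print(repr(written))
```

The line terminator is `'\r\n'` because settings not given to `csv.register_dialect` keep the `excel` defaults.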
