# Agenda: CSV files

1. A little background on CSV
2. `read_csv` and `to_csv`
3. `sep`
4. `header`
5. `index_col`
6. `usecols`
7. `names`
8. `dtype`
9. `engine`
10. `parse_dates`
11. `date_format`
12. `comment`

# What are CSV files?

CSV originally stood for "comma-separated values," but I've also heard it expanded as "character-separated values," since the separator isn't always a comma.

The idea is that we have a text-file format that contains our records:

- Every record is one line
- On each line, we have multiple fields
- Fields are separated by commas

On the face of it, that seems reasonable! But there are a few problems:

- Because it's a text format, not a binary format, Pandas needs to figure out what type of values should be in each column
- Sometimes, we (or the authors of the file) want to use a different separator. I'm partial to `'\t'` (tab), because it's unlikely to be used in the data.
- Sometimes, the column names are included, and sometimes they aren't.
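
Each of these problems maps onto a `read_csv` parameter from the agenda. Here is a minimal sketch, assuming a made-up tab-separated file with no header row. The data and the column names `name`, `zip_code`, and `score` are invented for illustration, and `io.StringIO` stands in for a real file on disk.

```python
import io

import pandas as pd

# Hypothetical tab-separated data with no header row
data = "alice\t02134\t9.5\nbob\t10001\t7.25\n"

df = pd.read_csv(
    io.StringIO(data),
    sep="\t",                             # fields are separated by tabs, not commas
    header=None,                          # the first line is data, not column names
    names=["name", "zip_code", "score"],  # so we supply the column names ourselves
    dtype={"zip_code": str},              # keep leading zeros instead of inferring integers
)

print(df.dtypes)
print(df)
```

Without the `dtype` argument, the `zip_code` column would likely be inferred as integers, and `02134` would lose its leading zero.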

You might think that CSV files aren't that complex; once you know the separator and a few other pieces of information, you should be able to read one into Python with `str.split`.

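The problem is that CSV allows the separator (a comma, or whatever else we're using) to appear *inside* of a data field, as long as that field is wrapped in double quotes. `str.split` knows nothing about quoting, so it splits on every separator it sees.

`read_csv` handles a separator inside of a quoted field just fine.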

In [1]:
s = 'ab,cde,fg'

s.split(',')

['ab', 'cde', 'fg']

In [2]:
# a field can contain a comma if we wrap it in double quotes, but str.split doesn't know that
s = 'ab,"cd,e",fg'
s.split(',')

['ab', '"cd', 'e"', 'fg']
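
For comparison, here is a small sketch of the same string passed through `read_csv`, which does understand the quotes. It wraps the string in `io.StringIO` so that pandas can treat it like a file, and passes `header=None` because the single line is a record rather than a row of column names; both choices are assumptions made for this example.

```python
import io

import pandas as pd

s = 'ab,"cd,e",fg'

# header=None: treat the only line as a record, not as column names
df = pd.read_csv(io.StringIO(s), header=None)

# The quoted comma stays inside the middle field, so we get three fields
fields = df.iloc[0].tolist()
print(fields)
```

Where `str.split` produced four pieces, `read_csv` returns three fields, and the middle one comes back as `cd,e` with the quotes removed.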