# CSVs in Python

- [Overview](#overview)
- [Reading CSVs](#reading-csvs)
  - [csv.reader](#reader)
  - [csv.DictReader](#dictreader)
- [Writing CSVs](#writing-csvs)
  - [csv.writer](#writer)
  - [csv.DictWriter](#dictwriter)

<h2 id="overview">Overview</h2>

Files containing comma-separated values, more commonly known as CSVs, are among the most common and easiest-to-use data formats. Government agencies and other data sources often publish data as CSVs, so it's important to know how to work with them.

Fortunately, the [Python Standard Library](https://docs.python.org/3/library/index.html) provides a built-in [csv module][] that makes these files much easier to work with. The `csv` module spares coders low-level, tedious chores such as handling newline characters and splitting lines on commas or other delimiters. It also handles trickier scenarios, such as CSVs in which columns are separated by commas *and* some field values also contain commas.

[csv module]: https://docs.python.org/3/library/csv.html
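To illustrate the comma-inside-a-field case, here is a minimal sketch. The `description` column and its value are hypothetical and are not part of the sample data used later. The text is held in an `io.StringIO` buffer so the snippet runs without a file; `csv.reader` accepts any iterable of lines, including an open file.

```python
import csv
import io

# Hypothetical CSV text in which one field contains a comma;
# by convention, such a field is wrapped in double quotes.
raw = 'animal,description\nnarwhal,"tusked, arctic whale"\n'

# Naive approach: splitting on every comma breaks the quoted field apart
naive = raw.splitlines()[1].split(',')
print(naive)  # ['narwhal', '"tusked', ' arctic whale"']

# csv.reader understands the quoting and keeps the field intact
rows = list(csv.reader(io.StringIO(raw)))
print(rows)  # [['animal', 'description'], ['narwhal', 'tusked, arctic whale']]
```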

The [csv module][] works in tandem with the built-in `open` function typically used for [reading and writing files](python_file_io.ipynb). It can simply be [imported and used](python_libraries.ipynb) in Python scripts, the interactive interpreter, or Jupyter notebooks without any extra installation steps.

The examples below demonstrate the `csv` module using [files/data/animal_ratings.csv](files/data/animal_ratings.csv), which contains the following data:

```
animal,awesomeness
cat,5
cougar,10
dog,8
snake,2
narwhal,11
```

<h2 id="reading-csvs">Reading CSVs</h2>

<h3 id="reader">Reader</h3>

The `csv` module provides two primary ways to read a CSV. The first, [csv.reader][], simply reads each row of a CSV and returns its values as a list of strings.

[csv.reader]: https://docs.python.org/3/library/csv.html#csv.reader

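```python
import csv

# newline='' lets the csv module handle line endings itself
with open('files/data/animal_ratings.csv', 'r', newline='') as source_file:
    reader = csv.reader(source_file)
    for row in reader:
        print(row)  # each row is a list of strings
```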

A few important notes on the above code:

* `csv.reader` is passed an open file "handle", as opposed to a file path
* Rows are automatically split on the delimiter (a comma in this case)
* Newline characters are automatically stripped
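The delimiter is configurable. As a sketch, assuming a tab-separated variant of a few rows from the sample data (held in memory here rather than read from a file), the `delimiter` argument tells `csv.reader` which character separates fields:

```python
import csv
import io

# Hypothetical tab-separated version of a few rows from animal_ratings.csv
tsv_text = 'animal\tawesomeness\ncat\t5\ndog\t8\n'

# delimiter='\t' splits each line on tabs instead of commas
rows = list(csv.reader(io.StringIO(tsv_text), delimiter='\t'))
for row in rows:
    print(row)
```

The same keyword works when `csv.reader` is given a file handle from `open(..., newline='')`.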

Because `csv.reader` returns rows as lists of values, we must access individual values in each row by their index position (e.g. `row[0]`). Index-based code is harder to read and more brittle. For example, if a government agency restructures a CSV by re-ordering columns or adding new ones, code that relies on index positions will need to be updated.
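A minimal sketch of index-based access follows. It embeds the sample data in an `io.StringIO` buffer so it runs on its own, and it skips the header with `next()` before converting each rating to an integer (the `ratings` dictionary is just an illustrative name).

```python
import csv
import io

# Contents of animal_ratings.csv, held in memory for a self-contained example
data = 'animal,awesomeness\ncat,5\ncougar,10\ndog,8\nsnake,2\nnarwhal,11\n'

reader = csv.reader(io.StringIO(data))
header = next(reader)  # consume the header row so it isn't treated as data

ratings = {}
for row in reader:
    # row[0] is the animal and row[1] its rating; values arrive as strings
    ratings[row[0]] = int(row[1])

print(header)
print(ratings)
```

If the two columns were ever swapped, `int(row[1])` would be handed an animal name and this loop would fail, which is the brittleness described above.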

<h3 id="dictreader">DictReader</h3>

When working with CSVs, it's often more useful, readable and future-proof to look up field values by column name. The `csv` module provides [DictReader][] for just this purpose.

[DictReader]: https://docs.python.org/3/library/csv.html#csv.DictReader

```python
import csv

with open('files/data/animal_ratings.csv', 'r', newline='') as source_file:
    dict_reader = csv.DictReader(source_file)
    for row in dict_reader:
        print(row)  # each row is a dict keyed by column name
```

Some important notes on the above:

* `DictReader` accepts an open file handle
* It returns a [dictionary](python_dict_basics.ipynb) for each row, containing a `key:value` pair for each item in the row.
* `DictReader` assumes the first row contains column names, or headers, and automatically assigns the appropriate header as the key for each value in the row.

Using `DictReader` frees us from handling the header row ourselves, for example when counting the data rows in a file. It also makes our code more readable and helps "future-proof" it in case a government agency restructures the CSV.
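As a sketch of both points, the snippet below counts data rows and finds the highest-rated animal by column name. The sample data is again embedded in an `io.StringIO` buffer so the example is self-contained; the variable names are illustrative.

```python
import csv
import io

# Contents of animal_ratings.csv, held in memory for a self-contained example
data = 'animal,awesomeness\ncat,5\ncougar,10\ndog,8\nsnake,2\nnarwhal,11\n'

dict_reader = csv.DictReader(io.StringIO(data))
print(dict_reader.fieldnames)  # column names taken from the header row

row_count = 0
top_animal, top_score = None, -1
for row in dict_reader:
    row_count += 1  # the header is never yielded as a row, so no adjustment needed
    score = int(row['awesomeness'])  # look up by column name, not position
    if score > top_score:
        top_animal, top_score = row['animal'], score

print(row_count, top_animal, top_score)
```

Because the lookups use `row['animal']` and `row['awesomeness']`, re-ordering the columns in the source file would not change the result.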

<h2 id="writing-csvs">Writing CSVs</h2>

The `csv` module can also generate CSV files. Mirroring its reading tools, it can write rows structured either as lists or as dictionaries.

<h3 id="writer">Writer</h3>

The example below writes a short list of animal ratings, including a header row, to a new file.

```python
import csv

animals = [['animal', 'awesomeness'], ['cat', '5'], ['dog', '10']]
# newline='' prevents blank lines between rows on Windows
with open('animals_shortlist.csv', 'w', newline='') as new_file:
    writer = csv.writer(new_file)
    for row in animals:
        writer.writerow(row)
```

A few notes on the above:

* We included the header row as the first item in the `animals` list.
* Like `csv.reader`, [csv.writer][] takes an open file handle rather than a file path.
* We called the [writerow](https://docs.python.org/3/library/csv.html#csv.csvwriter.writerow) method inside a `for` loop, but it's possible to skip the loop entirely and write all rows at once with the [writerows](https://docs.python.org/3/library/csv.html#csv.csvwriter.writerows) method.

[csv.writer]: https://docs.python.org/3/library/csv.html#csv.writer

The code above generates `animals_shortlist.csv` with the following content:

```
animal,awesomeness
cat,5
dog,10
```
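For comparison, here is a sketch of the `writerows` approach mentioned above. To keep it self-contained, it writes to an in-memory `io.StringIO` buffer instead of a file; swapping in `open('animals_shortlist.csv', 'w', newline='')` would write the same content to disk.

```python
import csv
import io

animals = [['animal', 'awesomeness'], ['cat', '5'], ['dog', '10']]

# In-memory buffer standing in for an open file
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(animals)  # writes every row in a single call, no loop needed

print(buffer.getvalue())
```

By default `csv.writer` ends each row with `\r\n`, which is why files passed to it should be opened with `newline=''`; otherwise, on Windows, the text layer translates the `\n` again and blank lines appear between rows.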

<h3 id="dictwriter">DictWriter</h3>

Similar to its counterpart on the reading side, [csv.DictWriter][] can make your code more readable and future-proof by allowing you to write rows structured as dictionaries.

[csv.DictWriter]: https://docs.python.org/3/library/csv.html#csv.DictWriter

Here's an example:

```python
import csv

animals = [
    {'animal': 'cat', 'awesomeness': '5'},
    {'animal': 'dog', 'awesomeness': '10'}
]
# newline='' prevents blank lines between rows on Windows
with open('animals_shortlist.csv', 'w', newline='') as new_file:
    col_headers = ['animal', 'awesomeness']
    dict_writer = csv.DictWriter(new_file, fieldnames=col_headers)
    dict_writer.writeheader()  # the header row is not written automatically
    dict_writer.writerows(animals)
```

`csv.DictWriter` is used much like `csv.writer`, except that we must pass the column headers to the `fieldnames` argument. We also have to call `writeheader` to write the header row, since `DictWriter` does not write it automatically.
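The `fieldnames` list also controls column order, regardless of the key order inside each dictionary. The minimal sketch below writes to an in-memory buffer and deliberately omits a key from one row; the `restval='unknown'` filler is an illustrative choice (the default fills missing keys with an empty string).

```python
import csv
import io

# Keys in the first dict are out of order; the second dict lacks 'awesomeness'
animals = [
    {'awesomeness': '5', 'animal': 'cat'},
    {'animal': 'narwhal'},
]

buffer = io.StringIO()
# fieldnames fixes the column order; restval fills in any missing keys
dict_writer = csv.DictWriter(buffer, fieldnames=['animal', 'awesomeness'],
                             restval='unknown')
dict_writer.writeheader()
dict_writer.writerows(animals)

print(buffer.getvalue())
```

Keys that are *not* listed in `fieldnames` are handled by the `extrasaction` parameter, whose default `'raise'` causes a `ValueError`.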