# Reading CSV files

**Prerequisites**: Basic Python, lists, dicts

## Introduction

The process of converting an object to text data is known as **serialisation**.

The process of converting text data to an object is known as **deserialisation**.

Python provides modules to ease the serialisation of data to and deserialisation of data from commonly used formats.

Two of the most commonly used modules are `csv` and `json`. Both handle **text-based formats**, which store object data as human-readable text.

## CSV format

**CSV** stands for **C**omma-**S**eparated **V**alues. This format is commonly used for table exports and for datasets in data science.

In this format, data is stored in the form of rows.  
Each text item in the row is separated by a comma.  
Items with commas in them (e.g. names, addresses) must be enclosed in double quotes (`"`).  
An optional header row may be placed at the top of the file.

### Format example

    student_name,enrollment_year,class
    Student1,2020,2004
    Student2,2020,2005
    Student3,2020,2005
    ...
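To see the quoting rule in action, here is a minimal sketch that writes one row containing a comma and reads it back. The `address` column and its value are made up for illustration, and an in-memory `io.StringIO` buffer stands in for a real file.

```python
import csv
import io

# An in-memory text buffer stands in for a file opened with newline=''
buffer = io.StringIO(newline='')
writer = csv.writer(buffer)
writer.writerow(['student_name', 'address'])  # hypothetical columns
# The address contains a comma, so the writer wraps it in double quotes
writer.writerow(['Student1', '1 Example Road, Example Town'])

csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers the original item, comma included
rows = list(csv.reader(io.StringIO(csv_text, newline='')))
print(rows[1])  # ['Student1', '1 Example Road, Example Town']
```

The quotes exist only in the CSV text; they are not part of the item once the row has been read back.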

## Using the `csv` module: reader and writer

The `csv` module provides the `reader()` and `writer()` functions. `reader()` returns an iterable object that yields one row at a time. An iterable is an object that can be looped over with a `for` loop. A Python list is an iterable, but an iterable need not be a list. `writer()` returns a writer object with `writerow()` and `writerows()` methods.
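A small sketch of what "an iterable need not be a list" means for the object returned by `csv.reader()`. The sample data is hypothetical and held in memory rather than in a file.

```python
import csv
import io

# Hypothetical sample data held in memory instead of a file on disk
sample = io.StringIO("a,b\n1,2\n3,4\n", newline='')

csv_reader = csv.reader(sample)
is_list = isinstance(csv_reader, list)
print(is_list)  # False

# Looping over the reader consumes it, one row at a time
first_pass = [row for row in csv_reader]
# A second loop finds nothing left to read
second_pass = [row for row in csv_reader]

print(first_pass)   # [['a', 'b'], ['1', '2'], ['3', '4']]
print(second_pass)  # []
```

If the rows are needed more than once, one option is to store them in a list first, for example with `list(csv_reader)`.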

When opening a CSV file, call `open()` with the optional argument `newline=''` (see https://stackoverflow.com/a/3191811 for a detailed explanation).

Official documentation for the `csv` module: https://docs.python.org/3/library/csv.html

To read data from a .csv file as a list of rows:

```python
import csv
with open('data.csv', 'r', newline='') as f:
    csv_reader = csv.reader(f)
    for row in csv_reader:
        # each row is a list of strings
        ...
```
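When the file has a header row, one common approach is to take it off with `next()` and collect the remaining rows into a list. The sketch below uses the rows from the format example above, held in an in-memory buffer rather than in `data.csv`.

```python
import csv
import io

# In-memory stand-in for a CSV file that has a header row
sample = io.StringIO(
    "student_name,enrollment_year,class\n"
    "Student1,2020,2004\n"
    "Student2,2020,2005\n",
    newline='',
)

csv_reader = csv.reader(sample)
header = next(csv_reader)   # first row: the column names
rows = list(csv_reader)     # every remaining row, as lists of strings

print(header)  # ['student_name', 'enrollment_year', 'class']
print(rows[0])  # ['Student1', '2020', '2004']

# Every item is a string, so numbers need explicit conversion
years = [int(row[1]) for row in rows]
print(years)  # [2020, 2020]
```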

To write a list of rows:

```python
# Assuming your data is a list of rows with the variable name `data`
import csv
with open('data.csv', 'w', newline='') as f:
    csv_writer = csv.writer(f)
    for row in data:
        # assuming each row is a list of strings
        csv_writer.writerow(row)
```
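As an alternative to the loop, the writer object also has a `writerows()` method that writes a whole list of rows in one call. This is a sketch with made-up sample data, writing to an in-memory buffer instead of a file.

```python
import csv
import io

# Hypothetical sample data: a header row followed by data rows
data = [
    ['student_name', 'enrollment_year', 'class'],
    ['Student1', '2020', '2004'],
    ['Student2', 2020, 2005],   # non-string items are converted with str()
]

# In-memory stand-in for a file opened with newline=''
buffer = io.StringIO(newline='')
csv_writer = csv.writer(buffer)
csv_writer.writerows(data)

written_lines = buffer.getvalue().splitlines()
print(written_lines)
```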

## Using the `csv` module: DictReader and DictWriter

The `csv` module also provides the `DictReader` and `DictWriter` classes, which make it easy to handle each row as a dictionary, with column headers as keys.

Reading rows as dicts with `csv.DictReader`:

```python
import csv
with open('data.csv', 'r', newline='') as f:
    # The csv file is assumed to have a header row
    csv_dictreader = csv.DictReader(f)
    for row_dict in csv_dictreader:
        # each row is a dict with col headers as keys
        ...
```
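A short sketch of what each row dict looks like, using the rows from the format example held in an in-memory buffer. Collecting the rows with `list()` is just one way to keep them for later use.

```python
import csv
import io

# In-memory stand-in for a CSV file that has a header row
sample = io.StringIO(
    "student_name,enrollment_year,class\n"
    "Student1,2020,2004\n"
    "Student2,2020,2005\n",
    newline='',
)

csv_dictreader = csv.DictReader(sample)
records = list(csv_dictreader)   # a list of dicts, one per data row

# The header row is used for the keys and is not returned as a record
print(csv_dictreader.fieldnames)  # ['student_name', 'enrollment_year', 'class']
print(records[0]['student_name'])  # Student1
print(len(records))  # 2
```

As with `csv.reader`, all values are strings, so `records[0]['enrollment_year']` is `'2020'` rather than the integer `2020`.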

Writing a list of dicts with `csv.DictWriter`:

```python
# Assuming your data is a list of dicts with the variable name `data`
import csv
with open('data.csv', 'w', newline='') as f:
    # DictWriter requires a list of header names
    # these can be generated from the .keys() method of
    # the row dict, and converted to a list
    fieldnames = ['key1', 'key2']
    dict_writer = csv.DictWriter(f, fieldnames=fieldnames)
    dict_writer.writeheader()
    for row in data:
        # assuming each row is a dict
        dict_writer.writerow(row)
```
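The comment about generating header names from `.keys()` can be sketched as follows. The sample data is made up, and the sketch assumes every row dict has the same keys as the first one; it writes to an in-memory buffer instead of a file.

```python
import csv
import io

# Hypothetical sample data: every dict is assumed to share the same keys
data = [
    {'student_name': 'Student1', 'enrollment_year': 2020},
    {'student_name': 'Student2', 'enrollment_year': 2020},
]

# Take the header names from the first row dict
fieldnames = list(data[0].keys())

# In-memory stand-in for a file opened with newline=''
buffer = io.StringIO(newline='')
dict_writer = csv.DictWriter(buffer, fieldnames=fieldnames)
dict_writer.writeheader()
dict_writer.writerows(data)   # writes all the row dicts in one call

written_lines = buffer.getvalue().splitlines()
print(written_lines)
```

By default, `DictWriter` raises a `ValueError` if a row dict contains a key that is not in `fieldnames`, so it is worth checking that assumption on real data.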

## Exercise: Reading and Exploring Weather Data

1. **Load the CSV File**: Use the `csv` module to read the `HistoricalDailyWeatherRecords.csv` file into `weatherdata`, a list of rows.

2. **Inspect the Data**:
   - Display the first 5 rows of `weatherdata`.
   - Print the column names by reading the header row.

3. **Data Cleaning**:
   - Identify rows with missing values (rows that contain `"na"` in every rainfall column).
     - Remove these rows from `weatherdata`.
   - Convert the remaining weather data values:
     - `"na"` --> `None`
     - decimal numbers --> `float`

4. **Basic Analysis**:
   - Calculate monthly rainfall for the Admiralty weather station.
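As a hint for the value conversion in step 3, here is one possible helper, offered as a sketch rather than the intended solution. The name `clean_value` and the sample row are hypothetical; the actual column layout of `HistoricalDailyWeatherRecords.csv` is not assumed here.

```python
def clean_value(text):
    """Convert one raw CSV string.

    "na" becomes None, text that parses as a number becomes a float,
    and anything else is returned unchanged.
    """
    if text == 'na':
        return None
    try:
        return float(text)
    except ValueError:
        # Not a number (e.g. a station name or a date), keep the string
        return text


# Hypothetical row for illustration only, not taken from the real file
raw_row = ['Admiralty', '2020-01-01', '0.2', 'na']
cleaned_row = [clean_value(item) for item in raw_row]
print(cleaned_row)  # ['Admiralty', '2020-01-01', 0.2, None]
```

Note that this helper also turns whole numbers such as `'2020'` into `2020.0`, so it is best applied only to the columns that hold measurements.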
