# __Parsing CSV with Python__

---
__CSV = Comma-Separated Values__

Generally, CSV refers to any plain-text file that:

- Stores one row of data on each line
- Separates the values within a row using a __delimiter character such as , or ; or \t (tab) or :__

CSV files may or may not have a header row that gives the column names.

It is very easy to write programs that create CSV files and append data to existing ones.

There is no single standard that all CSV files follow, so __you can never assume that two CSV files
from different sources have the same format.__
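To illustrate why the format of each file matters, here is a small sketch that parses the same made-up table once with commas and once with semicolons. It uses io.StringIO as an in-memory stand-in for a real file; the data is invented for this example.

```python
import csv
import io

# Two made-up versions of the same table, using different delimiters.
comma_data = "name,age\nReodor,90\n"
semicolon_data = "name;age\nReodor;90\n"

# The csv module has to be told which delimiter a file uses (the default is ",").
comma_rows = list(csv.reader(io.StringIO(comma_data)))
semicolon_rows = list(csv.reader(io.StringIO(semicolon_data), delimiter=";"))

# Reading the semicolon file with the default delimiter does not fail,
# it just returns each whole line as a single value.
wrong_rows = list(csv.reader(io.StringIO(semicolon_data)))

print(comma_rows)      # [['name', 'age'], ['Reodor', '90']]
print(semicolon_rows)  # [['name', 'age'], ['Reodor', '90']]
print(wrong_rows)      # [['name;age'], ['Reodor;90']]
```

Note that a wrong delimiter produces no error, only silently wrong rows, which is why it pays to inspect a new CSV file before processing it.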

---


## __Reading data from a CSV file with csv__


There are two choices: reading each row as a list or as a dictionary.

- The reader() function returns an iterator that yields each row of the CSV file as a list
- The DictReader class returns an iterator that yields each row of the CSV file as a dictionary

Both approaches accept a file-like reference to the file itself (for example the file object returned by open()).

__NOTE:__ All values are read as strings.
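Because every value comes back as a string, numbers must be converted explicitly before doing arithmetic. A minimal sketch, using made-up data in an io.StringIO object in place of a real file:

```python
import csv
import io

# A small made-up CSV held in memory; io.StringIO behaves like an open file.
data = io.StringIO("Name,Age\nReodor,90\nSolan,45\n")

ages = []
for row in csv.DictReader(data):
    # row["Age"] is the string "90", not the number 90, so convert it explicitly.
    ages.append(int(row["Age"]))

print(ages)       # [90, 45]
print(sum(ages))  # 135
```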


__Documentation:__ <https://docs.python.org/3/library/csv.html>




### __A closer look at csv.reader:__

Function: csv.reader(file, delimiter=",")

- file: File-like reference to the file, which should be opened with newline="".
- delimiter: Character that separates the values within a row (defaults to ",").

The function returns an iterator that can be processed by for-loops and by functions
accepting iterables.

- __Each row__ is read as a __list of values__, with one element per value on that line.
This is usually, but not necessarily, the number of columns in the file.
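csv.reader does not pad or validate rows, so each list is only as long as the line it came from. A small sketch with made-up data (io.StringIO again stands in for a file) where the last line is missing a value:

```python
import csv
import io

# Made-up data where the last line has only two values.
data = io.StringIO("a,b,c\n1,2,3\n4,5\n")

rows = list(csv.reader(data))
for row in rows:
    print(len(row), row)
# 3 ['a', 'b', 'c']
# 3 ['1', '2', '3']
# 2 ['4', '5']
```

If your code relies on every row having the same length, check len(row) yourself.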


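```python
import csv

# Example using the reader() function.
# eggs.csv is the space-separated file written in the csv.writer example,
# so we read it with the same delimiter and quote character.

with open("eggs.csv", newline="") as csvfile:
    reader = csv.reader(csvfile, delimiter=" ", quotechar="|")
    for row in reader:
        # Each row is a list, so values are accessed by index, not by column name
        print(", ".join(row))
```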

### __A closer look at csv.DictReader:__

Constructor: csv.DictReader(file, fieldnames=None)

- file: File-like reference to file

The constructor returns an iterable object that can be processed by for-loops and functions
accepting iterables.

- __Each row__ is read as a __dictionary of values__, where the keys are taken from the
header row of the CSV or from the optional __fieldnames__ parameter.

csv.DictReader expects a header row in the CSV, which is used as the keys of the dictionaries.
If there is no header row, pass __fieldnames__ as an optional parameter to the constructor; otherwise the first data row will be used as the header.
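The difference fieldnames makes can be seen with a file that has no header row. In this sketch the names are made up and io.StringIO stands in for an open file:

```python
import csv
import io

# Made-up data with no header row.
no_header = "Ada,Lovelace\nAlan,Turing\n"

# Supplying fieldnames tells DictReader to treat every line as data.
rows = list(csv.DictReader(io.StringIO(no_header), fieldnames=["first_name", "last_name"]))
print(rows)
# [{'first_name': 'Ada', 'last_name': 'Lovelace'}, {'first_name': 'Alan', 'last_name': 'Turing'}]

# Without fieldnames, the first data line is silently used as the header.
wrong_rows = list(csv.DictReader(io.StringIO(no_header)))
print(wrong_rows)
# [{'Ada': 'Alan', 'Lovelace': 'Turing'}]
```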

```python
import csv

# Example using the DictReader class:

with open("names.csv", newline="") as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        # Each row is a dictionary keyed by the names in the header row
        print(row["first_name"], row["last_name"])
```

---

## __Writing data to a CSV file with csv__

There are two choices: writing each row as a list or as a dictionary.

- The writer() function returns a writer object that accepts rows as lists.
- The DictWriter class accepts rows as dictionaries.

Both approaches accept a file-like reference to the file itself.

__NOTE:__ csv.writer will not raise an exception if you write rows with different numbers of values, so it is up to you to keep the rows consistent.
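A short sketch of this behaviour, writing to an io.StringIO buffer instead of a real file so the result can be inspected directly:

```python
import csv
import io

buffer = io.StringIO()  # in-memory stand-in for a file opened with newline=""
writer = csv.writer(buffer)

writer.writerow(["Name", "Age", "Birthdate"])
writer.writerow(["Reodor", "90"])  # one value short: no exception is raised

print(repr(buffer.getvalue()))
# 'Name,Age,Birthdate\r\nReodor,90\r\n'
```

The default line terminator is "\r\n", which is also why the file should be opened with newline="" (so Python does not translate the line endings a second time).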


### __A closer look at csv.writer__

Function: csv.writer(file, delimiter=",")

- file: File-like reference to the file, which should be opened with newline="".
- delimiter: Character that separates the values within a row (defaults to ",").

Methods:

- .writerow(): Expects a list of values for one row
    - Each subsequent call writes the row below the previous one.
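One detail of .writerow() worth seeing: values that contain the delimiter are quoted automatically, and non-string values are converted to strings. A minimal sketch with made-up values, using io.StringIO instead of a file:

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=",", lineterminator="\n")

# The second value contains the delimiter, so the writer wraps it in quotes
# (the default quoting is csv.QUOTE_MINIMAL).
writer.writerow(["Spam", "Baked Beans, Lovely Spam"])
writer.writerow(["Eggs", 42])  # non-string values are converted with str()

print(buffer.getvalue())
# Spam,"Baked Beans, Lovely Spam"
# Eggs,42

# Reading the text back restores the original values (as strings).
round_trip = list(csv.reader(io.StringIO(buffer.getvalue())))
print(round_trip)  # [['Spam', 'Baked Beans, Lovely Spam'], ['Eggs', '42']]
```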

```python
import csv

# Example using the writer() function:

with open("eggs.csv", "w", newline="") as csvfile:
    # Values are separated by spaces; values that need quoting are wrapped in "|"
    writer = csv.writer(csvfile, delimiter=" ", quotechar="|", quoting=csv.QUOTE_MINIMAL)
    writer.writerow(["Spam"] * 5 + ["Baked Beans"])
    writer.writerow(["Spam", "Lovely Spam", "Wonderful Spam"])
```



### __A closer look at csv.DictWriter__

Constructor: csv.DictWriter(file, fieldnames)

- file: File-like reference to file, which should be opened with newline="".
- fieldnames: List of column names for the CSV file.

The constructor itself does not write any data to the file!

Methods:

- .writeheader(): Writes a header row containing the fieldnames.
- .writerow(): Expects a dictionary of values for one row. Keys that are not in fieldnames
raise a ValueError, and missing keys are written as empty values by default.
    - Each subsequent call writes the row below the previous one.
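The sketch below shows both cases with made-up rows: a missing key is filled with the restval parameter (an empty string by default), while an extra key raises ValueError because extrasaction defaults to "raise". An io.StringIO buffer stands in for the output file.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["first_name", "last_name"], lineterminator="\n")
writer.writeheader()

# A missing key is filled with restval, which defaults to an empty string.
writer.writerow({"first_name": "Lovely"})

# A key that is not in fieldnames raises ValueError, and nothing is written for that row.
extra_key_error = None
try:
    writer.writerow({"first_name": "Baked", "last_name": "Beans", "age": 3})
except ValueError as e:
    extra_key_error = e

print(buffer.getvalue())
# first_name,last_name
# Lovely,
print(extra_key_error)
```

Passing extrasaction="ignore" to the constructor would drop unknown keys instead of raising.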

```python
import csv

# Example using the DictWriter class:

with open("names.csv", "w", newline="") as csvfile:
    fieldnames = ["first_name", "last_name"]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    writer.writeheader()
    writer.writerow({"first_name": "Baked", "last_name": "Beans"})
    writer.writerow({"first_name": "Lovely", "last_name": "Spam"})
    writer.writerow({"first_name": "Wonderful", "last_name": "Spam"})
```


---

# **EXAMPLES**

1. Write a function read_csv that accepts filename as a parameter and prints each line in the file given by filename. Try to use both csv.reader() and csv.DictReader(). Create your own CSV files to test, both with and without a header!

```python
import csv

# Using the csv.reader()
def read_csv(filename):
    with open(filename, "r", newline="") as fp:
        reader = csv.reader(fp)
        for row in reader:
            print(row)

read_csv("Filer/test.csv")
```

['Name', 'Age', 'Birthdate']
['Reodor', '90', '1932-08-08']
['Solan', '45', '1965-01-02']


```python
# Using the csv.DictReader
def read_csv(filename):
    with open(filename, "r", newline="") as fp:
        reader = csv.DictReader(fp)
        for row in reader:
            print(row)

read_csv("Filer/test.csv")
```

{'Name': 'Reodor', 'Age': '90', 'Birthdate': '1932-08-08'}
{'Name': 'Solan', 'Age': '45', 'Birthdate': '1965-01-02'}


2. Add a parameter n_lines to the function so that it only prints the first n_lines rows of the file. If n_lines is None, it should print all the lines in the file.

```python
import csv

def read_csv(filename, n_lines=None):
    with open(filename, "r", newline="") as fp:
        reader = csv.reader(fp, delimiter=",")
        # Skip the first row, since it is the header. next() returns None for an empty file.
        next(reader, None)
        for count, row in enumerate(reader):
            if n_lines is not None and count >= n_lines:
                break  # stop reading once n_lines rows have been printed
            print(row)

read_csv("Filer/test.csv", 1)
```


['Reodor', '90', '1932-08-08']
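An alternative (not part of the original exercise) is itertools.islice from the standard library, which takes the first n items of any iterator and treats a stop value of None as "no limit". In this sketch first_rows is a hypothetical helper name, and the io.StringIO data is made up to stand in for Filer/test.csv:

```python
import csv
import io
from itertools import islice

def first_rows(fp, n_lines=None):
    """Hypothetical helper: return the first n_lines data rows (after the header) of an open CSV file."""
    reader = csv.reader(fp)
    next(reader, None)  # skip the header row
    # islice(reader, None) yields every row; islice(reader, n) stops after n rows
    return list(islice(reader, n_lines))

data = io.StringIO("Name,Age\nReodor,90\nSolan,45\n")
print(first_rows(data, 1))  # [['Reodor', '90']]
```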


3. Add a new parameter to the function read_csv named filename_out. If filename_out is not None, the function should write each line to the file specified by filename_out instead of printing it.

```python
import csv

def write_csv(rows, filename_out):
    with open(filename_out, "w", newline="") as fp:
        writer = csv.writer(fp, delimiter=",", lineterminator="\n")
        # Here rows is a list of lists, so we can write them all at once with writerows()
        # We could also iterate over the rows list and use writer.writerow(row)
        writer.writerows(rows)

def read_csv(filename, n_lines=None, filename_out=None):
    output = []
    with open(filename, "r", newline="") as fp:
        reader = csv.reader(fp)
        next(reader, None)  # skip the header row
        for count, row in enumerate(reader):
            if n_lines is None or count < n_lines:
                output.append(row)
    if filename_out:
        # Write the rows to the output file instead of printing them
        write_csv(output, filename_out)
    else:
        for row in output:
            print(row)

read_csv("Filer/test.csv", 1, "Filer/new_file.csv")
```


4. Expand the function to check if the filename_out already exists. If it does, it should append the rows (without the header) to the file instead of overwriting it.

```python
import csv
from os import path

def write_csv(rows, filename_out, header=None):
    # New in this task: append if the output file already exists, otherwise create it
    file_exists = path.exists(filename_out)
    filemode = "a" if file_exists else "w"
    with open(filename_out, filemode, newline="") as fp:
        writer = csv.writer(fp, delimiter=",", lineterminator="\n")
        # Only write the header when creating a new file, never when appending
        if header and not file_exists:
            writer.writerow(header)
        writer.writerows(rows)

def read_csv(filename, n_lines=None, filename_out=None):
    rows = []
    with open(filename, "r", newline="") as fp:
        reader = csv.reader(fp, delimiter=",")
        header = next(reader, None)  # keep the header separate from the data rows
        for row_num, row in enumerate(reader):
            if n_lines is None or row_num < n_lines:
                rows.append(row)
    if filename_out:
        write_csv(rows, filename_out, header)

read_csv(filename="Filer/test.csv", filename_out="Filer/new_file.csv", n_lines=2)
```


5. Add a parameter cols to the function read_csv. cols should be a list of ints that specify the indices of the columns that should be included. For instance, read_csv(filename="test.csv", n_lines=10, filename_out="out.csv", cols=[0,2,3]) should only write the first, third and fourth columns (remember that Python uses zero-based indexing).

```python
import csv
from os import path

def write_csv(rows, filename_out, cols=None, header=None):
    file_exists = path.exists(filename_out)
    filemode = "a" if file_exists else "w"

    # Only include the cols that we care about.
    # We do this by iterating over each row, and picking out the values at the indices in cols.
    if cols:
        for i, row in enumerate(rows):
            new_row = [row[col] for col in cols]  # This could be replaced with a for loop instead.
            # Alternative solution with for loop:
            # new_row = []
            # for col in cols:
            #     new_row.append(row[col])
            rows[i] = new_row
        if header:
            header = [header[col] for col in cols]

    with open(filename_out, filemode, newline="") as fp:
        writer = csv.writer(fp, delimiter=",", lineterminator="\n")
        if header and not file_exists:
            writer.writerow(header)
        writer.writerows(rows)

def read_csv(filename, n_lines=None, filename_out=None, cols=None):
    rows = []
    with open(filename, "r", newline="") as fp:
        reader = csv.reader(fp, delimiter=",")
        header = next(reader, None)  # keep the header separate from the data rows
        for row_num, row in enumerate(reader):
            if n_lines is None or row_num < n_lines:
                rows.append(row)
    if filename_out:
        write_csv(rows, filename_out, cols, header)

read_csv(filename="Filer/test.csv", filename_out="Filer/new_file.csv", n_lines=2, cols=[0, 2])
```


6. Experiment with transforming the values of each row. For instance, try adding a column with integers to your CSV and write each integer squared (i.e. i*i) to the output CSV. Also try printing the total sum of that integer column.

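```python
import csv

def transform_int(my_num):
    # Wrap the number in a dictionary so it can be written with csv.DictWriter
    return {"Num": str(my_num)}

lines = []
total = 0
with open("input.csv", "r", newline="") as infile:
    reader = csv.DictReader(infile)
    for row in reader:
        # row: {"Num": "1", "Garbage": "25"}  (all values are read as strings)
        try:
            my_num = int(row["Num"])
        except ValueError:
            print(f"Could not convert {row['Num']!r} to int, skipping row.")
            continue  # without this, my_num would be undefined or left over from the previous row
        total += my_num
        lines.append(transform_int(my_num ** 2))

# "output.csv" is only an example name for the output file
with open("output.csv", "w", newline="") as outfile:
    writer = csv.DictWriter(outfile, fieldnames=["Num"])
    writer.writeheader()
    writer.writerows(lines)

print(lines)
print("Sum of the Num column:", total)
```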

7. Add a column "Date" to your CSV and fill it with values. Try writing that date as three separate columns "Year", "Month", and "Day" in the output CSV.

```python
import csv
from datetime import datetime

def read_csv(filename):
    rows = []
    with open(filename, "r", newline="") as fp:
        reader = csv.DictReader(fp, delimiter=",")
        for row in reader:
            # Parse the date string, e.g. "1932-08-08"
            dateobj = datetime.strptime(row["Birthdate"], "%Y-%m-%d")
            # Split the date into three new columns
            row["Year"] = dateobj.year
            row["Month"] = dateobj.month
            row["Day"] = dateobj.day
            rows.append(row)
    return rows

def write_csv(rows, filename_out):
    if not rows:
        return  # nothing to write, and no keys to use as fieldnames
    with open(filename_out, "w", newline="") as fp:
        # Use the keys of the first row (including Year, Month and Day) as the header
        writer = csv.DictWriter(fp, fieldnames=list(rows[0].keys()), lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)

out_rows = read_csv("Filer/test.csv")

write_csv(out_rows, "Filer/output_with_dates.csv")
```