# Files

In this section we will cover how to work with files in Python, including:

* How to read and write files
* How to parse JSON and CSV files


## Files

Use the built-in `open()` function to open a file for reading or writing. See the [library reference for `open()`](https://docs.python.org/3/library/functions.html#open) for the full list of parameters.

The `open()` function returns a [file object](https://docs.python.org/3/glossary.html#term-file-object).

In [None]:
f = open('data/iris.csv')
print(f)

Use the `.read()` method to read the entire file contents into a single string.

In [None]:
s = f.read()
s

Use the `.readline()` method to read line-by-line.

In [None]:
f.seek(0) # Go back to the beginning of the file

print(f.readline(), end='')
print(f.readline(), end='')
print(f.readline(), end='')
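As a small sketch of what `.readline()` returns, the example below writes a hypothetical two-line file to a temporary directory (the file name and contents are assumptions made for illustration, not part of the iris data) and reads it back. Each call returns one line including its trailing newline, and an empty string once the end of the file is reached.

```python
import os
import tempfile

# Hypothetical two-line sample standing in for a real data file.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.csv")

    f = open(path, "w")
    f.write("a,b\n1,2\n")
    f.close()

    f = open(path)
    first = f.readline()   # first line, trailing newline included
    second = f.readline()  # second line
    at_end = f.readline()  # nothing left to read
    f.close()

print(repr(first))
print(repr(second))
print(repr(at_end))
```

Because the empty string is falsy, it can serve as the stopping condition in a `while` loop that reads one line at a time.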

A file object is also iterable, so a `for` loop can process the file one line at a time.

In [None]:
f.seek(0)

for line in f:
    print(line, end="")

Use the `.close()` method to close a file.

In [None]:
f.close()

## File Context Managers

In most programming languages, a file must be opened and closed as two separate operations.

Python file objects support the *context management* protocol. On entering a `with` block, the file object returned by `open()` is bound to the name after `as`. On leaving the block, the file is closed automatically, even if an exception was raised inside it.

In [None]:
with open('data/iris.csv') as fin:
    for line in fin:
        parts = line.split(",")
        print(parts[0], end=" ")

In [None]:
fin.read()  # Raises ValueError because the file is already closed
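The sketch below is one way to see what the `with` statement does for a file. It uses a hypothetical temporary file (the name and contents are assumptions for illustration). The first part spells out roughly equivalent manual code with `try`/`finally`; the second part raises an error inside a `with` block and then checks that the file was closed anyway.

```python
import os
import tempfile

error_message = None

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.txt")  # hypothetical sample file
    with open(path, "w") as fout:
        fout.write("hello\n")

    # Roughly what the with statement does for a file, written by hand.
    f = open(path)
    try:
        manual_text = f.read()
    finally:
        f.close()  # runs whether or not read() raised

    # The file is closed even though the block exits with an exception.
    try:
        with open(path) as fin:
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass

    closed_after_error = fin.closed

    # Reading from the closed file raises ValueError.
    try:
        fin.read()
    except ValueError as exc:
        error_message = str(exc)

print(manual_text, end="")
print(closed_after_error)
print(error_message)
```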

A single `with` statement can open more than one file.

In [None]:
with open('data/iris.csv') as fin, open('data/iris2.csv', 'w') as fout:
    for line in fin:
        parts = line.split(",")
        fout.write(f"{parts[0]}\n")

In [None]:
! cat data/iris2.csv

## Working with CSV files

The Python standard library includes the `csv` module for working with comma-separated value (CSV) files.

Using the module's `DictReader` class, we can read each row into a dictionary keyed by the column names from the header row.

In [None]:
import csv

with open("data/iris.csv") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row)
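One detail worth seeing in isolation is that `DictReader` returns every value as a string, so numeric columns need an explicit conversion. The sketch below reads from an in-memory string instead of a file; the header and the single row are an assumed sample modeled on the iris columns used in this section.

```python
import csv
import io

# Assumed sample: a header row plus one data row in the iris column layout.
sample = (
    "SepalLength,SepalWidth,PetalLength,PetalWidth,Class\n"
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
)

reader = csv.DictReader(io.StringIO(sample))
rows = list(reader)

first = rows[0]
print(reader.fieldnames)            # column names taken from the header row
print(repr(first["PetalLength"]))   # values are strings

petal_length = float(first["PetalLength"])  # convert before doing arithmetic
print(petal_length)
```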

The `csv` module also provides a `DictWriter` class, the counterpart to `DictReader`, which writes dictionaries out as rows.

To demonstrate the writer, let's create a new iris CSV data file that computes a rough petal area (length x width).

In [None]:
with open("data/iris.csv") as fin, open("data/iris2.csv", "w") as fout:
    reader = csv.DictReader(fin)
    fieldnames = reader.fieldnames + ["PetalArea"]

    writer = csv.DictWriter(fout, fieldnames=fieldnames, lineterminator="\n")  # "\n" avoids extra blank lines on Windows
    writer.writeheader()
    
    for row in reader:
        row["PetalArea"] = round(float(row["PetalLength"]) * float(row["PetalWidth"]), 2)
        writer.writerow(row)

In [None]:
! cat data/iris2.csv
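To see exactly what `DictWriter` produces without touching the file system, the same steps can be sketched against in-memory buffers. The two-column input here is an assumed, cut-down sample rather than the real iris file.

```python
import csv
import io

# Assumed sample with only the two columns needed for the area calculation.
sample = "PetalLength,PetalWidth\n1.4,0.2\n"

reader = csv.DictReader(io.StringIO(sample))
fieldnames = reader.fieldnames + ["PetalArea"]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
writer.writeheader()

for row in reader:
    area = float(row["PetalLength"]) * float(row["PetalWidth"])
    row["PetalArea"] = round(area, 2)
    writer.writerow(row)

output = buffer.getvalue()
print(output, end="")
```

The header row comes from `fieldnames`, and each dictionary is written in that column order regardless of the order of its keys.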

## Working with JSON data

The standard library also includes the `json` module for working with JSON data.

The `json` module converts Python objects such as lists and dictionaries to their JSON string representation, and parses JSON strings back into Python objects.

In [None]:
# data is a Python list of dictionaries
data = [
    {
        "SepalLength": 5.1,
        "SepalWidth": 3.5,
        "PetalLength": 1.4,
        "PetalWidth": 0.2,
        "Class": "Iris-setosa",
    },
    {
        "SepalLength": 4.9,
        "SepalWidth": 3.0,
        "PetalLength": 1.4,
        "PetalWidth": 0.2,
        "Class": "Iris-setosa",
    },  
]

print(type(data))

In [None]:
import json

# string representation of data
data_as_str = json.dumps(data)

print(data_as_str)

Use `json.dump()` to write JSON data directly to a file.

In [None]:
with open("data/iris.json", "w") as f:
    json.dump(data, f, indent=2, sort_keys=True)

In [None]:
! cat data/iris.json
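The opposite direction uses `json.loads()` for strings and `json.load()` for files. The sketch below round-trips a single assumed record (a shortened version of the iris dictionaries above) through a string and through a hypothetical temporary file.

```python
import json
import os
import tempfile

# Assumed sample record, shortened from the iris dictionaries above.
record = {"PetalLength": 1.4, "PetalWidth": 0.2, "Class": "Iris-setosa"}

# String round trip: dumps() serializes, loads() parses.
text = json.dumps([record])
restored = json.loads(text)

# File round trip: dump() writes to a file object, load() reads from one.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.json")  # hypothetical file name
    with open(path, "w") as f:
        json.dump([record], f, indent=2, sort_keys=True)
    with open(path) as f:
        from_file = json.load(f)

print(type(restored))
print(restored == [record])
print(from_file == [record])
```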

## Exercise 1

Convert the `iris.csv` file into an equivalent `iris.json` file.

In [None]:
# Your code here

In [None]:
! cat data/iris.json

In [None]:
# Show the answer
! cat answers/files_1.py