# Files and Exceptions

# Reading and writing files

We breezed through reading files when we talked about pandas.  Now we'll go back and explain more carefully what's going on, and use the opportunity to talk about exceptions.

There will come a time when you want your calculations to survive the termination of your program.  You'll want to write something to disk.

First, we can create a CSV file by hand to work with.  The format is little more than comma-separated values on each line, although a value that itself contains a comma has to be put in quotes.  For now, we can avoid that complication with a file like this:

```
a,b,c
d,e,f
g,h,i
```

This file is readable in any text editor, which is generally true of CSV files.



Next, if you're working in Google Colab, you need to upload this file to your Colab session before we can use it.  On the other hand, if you're working locally, it just needs to be in the directory you launched Jupyter Notebook from.

In [None]:
# Skip this cell if not working in Google Colab
from google.colab import files

uploaded = files.upload() # pick small_csv.csv

We could import our file to a Pandas DataFrame, but we're just going to explore working with CSVs directly for now.

To open a file like this, we can use open() and "with."  open() takes a filename string and returns a file object; if the file can't be found when opening for reading, it raises a FileNotFoundError instead.  "with" is a handy keyword that cleans up everything associated with an object once its indented block is done.

In [None]:
import csv

def demo_read_csv(filename):
    # newline='' lets the csv module handle line endings itself
    with open(filename, mode='r', newline='') as my_csv:
        reader = csv.reader(my_csv)
        for record in reader:
            first, second, third = record
            print(f'{first};{second};{third}')

demo_read_csv('small_csv.csv')

The reader returns the file one line at a time, with each line returned as a list of strings.
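
Because every value arrives as a string, numeric data has to be converted explicitly. The following is a minimal sketch of that step; it uses io.StringIO as a stand-in for an open file (an assumption made so the example needs nothing on disk), and the numbers are made up for illustration.

```python
import csv
import io

# io.StringIO behaves like an open text file, so csv.reader accepts it.
fake_file = io.StringIO("1,2,3\n4,5,6\n")

rows = []
for record in csv.reader(fake_file):
    # Each record is a list of strings such as ['1', '2', '3'],
    # so we convert every value to an int ourselves.
    rows.append([int(value) for value in record])

print(rows)  # [[1, 2, 3], [4, 5, 6]]
```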


We can also write a CSV file instead of reading one.  We need to change the mode from "r" for read to "w" for write (which overwrites any existing file of that name), and change our reader object to a writer.  The CSV format is rather simple if you don't have to deal with values that include commas, but the CSV writer will handle these, too.


In [None]:
# vals1, vals2 are lists of strings
def demo_write_csv(filename, vals1, vals2):
    # newline='' prevents extra blank lines between rows on Windows
    with open(filename, mode='w', newline='') as my_csv:
        writer = csv.writer(my_csv)
        writer.writerow(vals1)
        writer.writerow(vals2)

vals1demo = ['peach','pear','plum']
vals2demo = ['strawberry','orange','grape']

demo_write_csv('fruits.csv',vals1demo,vals2demo)
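
To see how the writer copes with a value that contains a comma, here is a small sketch. It writes into an io.StringIO buffer rather than a real file (an assumption, purely to keep the example self-contained), and the fruit values are made up.

```python
import csv
import io

# Write one row whose middle value contains a comma.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['apple', 'pear, ripe', 'plum'])

written = buffer.getvalue()
print(written)  # apple,"pear, ripe",plum

# Reading the text back recovers the original three values.
row = next(csv.reader(io.StringIO(written)))
print(row)  # ['apple', 'pear, ripe', 'plum']
```

The writer quotes only the value that needs it, and the reader strips the quotes again, so the comma inside the value is not mistaken for a separator.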


In both the read and write cases, the "with" keyword closes the file for us when we exit the block, freeing it up for the operating system to allow another program to use the file.  (In general, a "with" statement calls the object's `__enter__()` method when the block starts and its `__exit__()` method when the block is done, however the block is left.)
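
The cleanup behavior can be observed directly. The sketch below is illustrative only: Announcer is a hypothetical class written for this example that records when its `__enter__()` and `__exit__()` methods are called, and the temporary directory is there just so the file demo leaves nothing behind.

```python
import os
import tempfile

class Announcer:
    """Hypothetical context manager that logs when it is entered and exited."""
    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append('enter')
        return self  # this is the object bound by "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append('exit')
        return False  # do not suppress exceptions raised in the block

with Announcer() as tracker:
    tracker.events.append('body')

print(tracker.events)  # ['enter', 'body', 'exit']

# File objects follow the same protocol: leaving the block closes the file.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')
    with open(path, 'w') as handle:
        handle.write('hello')
    closed_after_block = handle.closed

print(closed_after_block)  # True
```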


We can see the file in its directory with the !ls command.  The ! tells Google Colab (or Jupyter) that what follows is a system command of the kind you could use at the command line (Terminal on a Mac), and ls is a command to list the contents of the current directory.

In [None]:
!ls

CSV isn't the only viable format for writing out data.  Another popular option is JSON (JavaScript Object Notation).  JSON is popular as a platform-independent way to transfer key-value pairs; the Twitter API uses it, for example.  A JSON object is like a dictionary in that it stores property names (keys) and values associated with those keys.

To write a JSON object to file, you need only call json.dump on a dictionary holding the key-value pairs, also providing the file to dump the JSON into.


In [None]:
import json

# "data" is a dictionary; we avoid naming it "dict", which would shadow the built-in type
def demo_dump_json(filename, data):
    with open(filename, 'w') as myfile:
        json.dump(data, myfile)

dict_demo = {
    'a': 3,
    'b': 7,
    'c': 10
}

demo_dump_json('sample.json', dict_demo)

# to target file: {"a": 3, "b": 7, "c": 10}


In [None]:
!ls

Values in JSON objects can be strings, numbers, Boolean values (but lowercase), null, arrays, or other JSON objects.  Thus they can potentially communicate richer structure than a CSV.  They're often how a variety of cloud-based services communicate.
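
As a rough illustration of these value types, the sketch below serializes a made-up record with json.dumps (which returns a string instead of writing to a file). The record and its field names are hypothetical.

```python
import json

# A made-up record covering each kind of JSON value.
record = {
    'name': 'plum',                # string
    'count': 3,                    # number
    'ripe': True,                  # Boolean, written as lowercase true
    'price': None,                 # None is written as null
    'tags': ['fruit', 'stone'],    # list becomes a JSON array
    'origin': {'country': 'FR'},   # nested dictionary becomes a nested object
}

text = json.dumps(record)
print(text)

# Parsing the text gives back an equal dictionary.
restored = json.loads(text)
print(restored == record)  # True
```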



The fancy word for converting data into a form that can be stored in a file or sent over a network is serialization.  Python also has its own Python-specific method of serialization called pickling, but it is a little too powerful: unpickling data from an untrusted source can cause arbitrary code to execute.  Pickling now seems to be less used, in favor of platform-independent formats.



JSON files can be read back into dictionaries as well.


In [None]:
def demo_read_json(filename):
    with open(filename, 'r') as myfile:
        my_dict = json.load(myfile)
    return my_dict

my_dict = demo_read_json('sample.json')
my_dict

The dictionary we wrote to file in the previous step can be read right back as a dictionary.
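
One caveat worth knowing: the round trip is exact only for data that JSON can represent directly. The sketch below, using a made-up dictionary, shows two common cases where the result differs from the original: non-string keys and tuples.

```python
import json

# A made-up dictionary with an integer key and a tuple value.
original = {1: 'one', 'pair': (2, 3)}

# dumps/loads mimic writing to a file and reading it back.
restored = json.loads(json.dumps(original))

# JSON keys are always strings, and JSON has arrays but no tuples.
print(restored)  # {'1': 'one', 'pair': [2, 3]}
```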

# Exercise
Try dumping a dictionary into a .json file on Google Colab, check that it's there with ls, then read it back.

In [None]:
mydict = {"foo": 2, "bar": 5}
# TODO to JSON and !ls

In [None]:
# TODO read back the dictionary

# Directly to DataFrame

As we did in a previous lecture, you can load a CSV into a DataFrame without using the methods discussed in this lecture; pd.read_csv() reads directly into a DataFrame.

In [None]:
import pandas as pd
df = pd.read_csv("fruits.csv", names = ["fruit1", "fruit2","fruit3"])
df.head()

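Writing a DataFrame to CSV is similarly straightforward.  By default, to_csv() also writes the row index as the first column; pass index=False if you don't want it.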

In [None]:
df.to_csv('fruits2.csv')
!ls

# Intro to Exceptions

Input and Output (IO) is generally a place where things may not work as expected -- the looked-for file isn't there, or we didn't get permission to write. When things don't go as planned, an exception is thrown.

Exceptions are objects, and there are multiple kinds depending on the error that occurred: FileNotFoundError, ZeroDivisionError, and ValueError are examples (this last occurs if you try to parse a non-integer string as an integer). If an exception occurs and it isn't "caught," the program immediately terminates, reporting where the error occurred in a way you're familiar with from debugging.

If an exception is caused by a bug, you should fix the bug instead of catching the exception. But if the exception can happen because of bad input or bad circumstances, then your program should catch it and respond gracefully.

The "try" keyword comes before a block of code that could throw an exception. If an exception is thrown, it can be "caught" with an "except" block after the try block. Execution will jump to the "except" when an exception in the try block occurs.
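
As a minimal sketch of this pattern, here is the integer-parsing case mentioned earlier. The function name parse_int_or_none is hypothetical, chosen for this example.

```python
def parse_int_or_none(text):
    """Return text parsed as an int, or None if it is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        # Execution jumps here when int() cannot parse the string.
        return None

print(parse_int_or_none('42'))    # 42
print(parse_int_or_none('plum'))  # None
```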

Our JSON reader could have two obvious errors: the file doesn't exist, or the file isn't JSON. We can catch these errors in this way:

In [None]:
def safe_read_json(filename):
    try:
        with open(filename, 'r') as myfile:
            my_dict = json.load(myfile)
    except FileNotFoundError:
        print("File not found: " + filename)
        return None
    except json.decoder.JSONDecodeError:
        print("JSON error in " + filename)
        return None
    return my_dict

safe_read_json('not_found.json')

The outcome may look similar, but this way the program doesn't crash: it prints a more informative message for the user and returns None, so the calling code can carry on.

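(Note that "except" without a specific exception following it will catch all exceptions, including ones like KeyboardInterrupt that you usually don't want to intercept, so it's generally better to name the exceptions you expect.)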
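
When you do need a broader net, one common alternative to a bare "except" is "except Exception as error", which also gives you the exception object to inspect. The sketch below is illustrative; describe_failure is a hypothetical helper that runs a function and reports what went wrong.

```python
def describe_failure(action):
    """Run action() and describe any ordinary exception it raises."""
    try:
        action()
    except Exception as error:
        # "as error" binds the exception object so we can look at it.
        return f'{type(error).__name__}: {error}'
    return 'ok'

print(describe_failure(lambda: 1 / 0))   # ZeroDivisionError: division by zero
print(describe_failure(lambda: 1 + 1))   # ok
```

Unlike a bare "except", this form lets KeyboardInterrupt and SystemExit pass through, since they do not derive from Exception.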


Two other keywords associated with exceptions are "else" and "finally."  An "else" block can appear after the except blocks to say what should happen if no exception was raised.  A "finally" block can appear after all of that to give code that runs regardless, even when an earlier block returns; it's usually some kind of cleanup.  Both are used somewhat infrequently.

In [None]:
def safe_read_json_last_call(filename):
    try:
        with open(filename, 'r') as myfile:
            my_dict = json.load(myfile)
    except FileNotFoundError:
        print("File not found: " + filename)
        return None
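    else:
        print("No errors!")
        return my_dict  # hand back the data on success
    finally:
        # runs whether or not an exception occurred, even after a return
        print("End of demo!")
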
safe_read_json_last_call('sample.json')
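
The order in which these blocks run can be traced with a small sketch. The function trace_blocks is hypothetical and uses division only as a convenient way to trigger, or not trigger, an exception.

```python
def trace_blocks(divisor):
    """Record which of the try/except/else/finally blocks execute."""
    steps = []
    try:
        steps.append('try')
        10 / divisor  # raises ZeroDivisionError when divisor is 0
    except ZeroDivisionError:
        steps.append('except')
    else:
        steps.append('else')
    finally:
        steps.append('finally')
    return steps

print(trace_blocks(2))  # ['try', 'else', 'finally']
print(trace_blocks(0))  # ['try', 'except', 'finally']
```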

You should never use exceptions for the normal, expected running of your program - they're for use in surprising, unexpected situations.  The way they jump around in the code is undesirable, but it's better than the program crashing.

# Exercise

Modify the CSV reading function demo_read_csv so that it catches a FileNotFoundError and a ValueError (for wrong number of items in a line). Print an error message in each case.

In [None]:
def demo_read_csv(filename):
    with open(filename, mode='r', newline='') as my_csv:
        reader = csv.reader(my_csv)
        for record in reader:
            first, second, third = record
            print(f'{first};{second};{third}')

demo_read_csv("fruits.csv")