<a href="https://colab.research.google.com/github/nicsim22/DS110-Content/blob/main/Lecture17FilesAndExceptions_nosol.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Files and Exceptions

<i>"So, you break into the networks of companies that have the data you need?" Cynthia said, looking up from her laptop nervously.  "I mean, <b>we</b> break in to those networks?" </i>

*Aubin shrugged.  "Facebook, Google, Amazon ... they're all sitting on treasure troves of data for tracking diseases, but they lack the incentive to collaborate on that particular issue.  So we liberate the data and put it to good use.  There's a reason SAGE is so secretive.  We break the law for good."*

<i>"And so when my machine says 'Connecting to Google,' what it means is 'Currently breaking into Google.'"</i>

<i>"Well, yes.  So your code might have to quit in a hurry, gracefully.  Can you be sure to read files just a bit at a time, and close everything up in a hurry if you get interrupted?"</i>

# Reading and writing files

We breezed through reading files when we talked about Pandas.  Now we'll try to go back and more carefully explain what's going on, and also use this opportunity to talk about Exceptions.

First, we can create a CSV (comma-separated values) file by hand to work with.  The format is little more than comma-separated values on each line, although a value that itself contains a comma must be wrapped in quotes.  Our example has no such values, so we can simply create a file named small_csv.csv like this:

```
Gold,Kevin,DS340
Chatterjee,Tanima,DS120
Goldner,Kira,DS320
```

This file is readable in any text editor, which is generally true of CSV files.



Next, if you're working in Google Colab, you need to upload this file to the Colab session before we can use it (the upload cell places it in the runtime's working directory, not in Google Drive).  On the other hand, if you're working locally, it just needs to be in the same directory that you launched Jupyter notebook from.

In [None]:
# Skip this cell if not working in Google Colab
from google.colab import files

uploaded = files.upload() # pick small_csv.csv

Saving small_csv.csv to small_csv.csv


To open a file like this, we can use open() and "with."  open() takes a filename string and returns a file object if it was found.  "with" is a handy keyword that cleans up everything associated with an object once its indented block is done.

In [None]:
import csv

def demo_read_csv(filename):
    # "with" signals that we're about to use something that must be cleaned up afterward.
    # mode='r' reads a file; mode='w' writes to a file.
    with open(filename, mode='r') as my_csv:
        reader = csv.reader(my_csv)  # csv.reader reads the file one row at a time
        for record in reader:
            out = ''
            for item in record: #basic iteration through a list
                out += item + ';'
            print(out)
    # my_csv would be cleaned up here

demo_read_csv('small_csv.csv')

Gold;Kevin;DS340;
Chatterjee;Tanima;DS120;
Goldner;Kira;DS320;


The reader object returns the file one line at a time, with each line returned as a list of strings.  (Our sample file happens to have 3 columns, but the code works for any number of columns.)

We could write instead of read a CSV file.  We need to change the mode from "r" for read to "w" for write, and change our reader object to a writer.  The CSV format is rather simple if you don't have to deal with values that include commas, but the CSV writer will handle these, too.


In [None]:
# vals1, vals2 are lists of strings
def demo_write_csv(filename, vals1, vals2):
    # mode='w' opens the file for writing; newline='' is what the csv module's
    # documentation recommends so that no extra blank lines appear on some platforms
    with open(filename, mode='w', newline='') as my_csv:
        writer = csv.writer(my_csv)  # a writer instead of a reader; rows go to my_csv
        writer.writerow(vals1)  # writerow takes a list of values for one whole row (row 1)
        writer.writerow(vals2)  # row 2 of the new CSV

vals1demo = ['peach','pear','plum']
vals2demo = ['strawberry','orange','grape']

demo_write_csv('fruits.csv',vals1demo,vals2demo)

!ls
!cat fruits.csv  # echoes the file contents: row 1 is vals1demo, row 2 is vals2demo

fruits.csv  sample_data  small_csv.csv
peach,pear,plum
strawberry,orange,grape
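
The earlier claim that the CSV writer handles values containing commas can be checked with a small sketch.  The row contents are made up for illustration, and io.StringIO stands in for a real file so that nothing is written to disk.

```python
import csv
import io

# An in-memory text buffer stands in for a file opened with mode='w'.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['Gold, Kevin', 'DS340'])  # the first value contains a comma

csv_text = buffer.getvalue()
print(csv_text)  # the comma-containing value is wrapped in double quotes

# Reading the text back recovers the original two values.
rows = list(csv.reader(io.StringIO(csv_text)))
print(rows)
```

The quotes appear only around the value that needs them, and the reader strips them again on the way back in.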


In both the read and write cases, the **"with" keyword closes the file** for us when we exit the block, freeing it up for the operating system to allow another program to use the file.  (*with* will, in general, call the method named \_\_exit\_\_() for its object when the block is done.)
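
To make the \_\_exit\_\_() remark concrete, here is a minimal sketch of an object that works with "with."  The class NoisyResource is hypothetical and exists only to record the order in which things happen; a real file object follows the same protocol, closing itself in \_\_exit\_\_().

```python
class NoisyResource:
    """A hypothetical stand-in for a file object, used only to show the protocol."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append('enter')
        return self  # this is the value bound by "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append('exit')
        return False  # do not suppress any exception raised inside the block


with NoisyResource() as resource:
    resource.events.append('inside block')

print(resource.events)  # ['enter', 'inside block', 'exit']
```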


We can see the file in its directory with the !ls command.  ! alerts Google Colab that what follows is a system command of the kind you could use at the command line (Terminal in Mac), and ls is a command to list the contents of the current directory.

In [None]:
!ls

fruits.csv  sample_data  small_csv.csv


CSV is a very popular format for datasets that are in the public domain, such as those available on Kaggle.  Here, we load a truncated version of a CSV that contains data for 911 calls in Montgomery County, Pennsylvania.

In [None]:
from google.colab import files

uploaded = files.upload()

Saving 911.csv to 911.csv


In [None]:
import csv
# 911.csv from https://www.kaggle.com/datasets/mchirico/montcoalert?resource=download (Kaggle hosts many datasets)
# Truncated to 20000 lines so the file isn't so big
with open('911.csv', mode='r') as my_csv:
    reader = csv.reader(my_csv)
    count = 0   # Print just first real line as an example - first line is header
    for record in reader:
        count += 1
        if count == 2:
            for item in record:
                print(item)
            break
# Fields are latitude, longitude, description, zip code, title, timestamp, township, address,
# and a column that's always 1 for some reason

40.2978759
-75.5812935
REINDEER CT & DEAD END;  NEW HANOVER; Station 332; 2015-12-10 @ 17:10:52;
19525
EMS: BACK PAINS/INJURY
2015-12-10 17:10:52
NEW HANOVER
REINDEER CT & DEAD END
1
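
As an alternative to counting lines, the header can be pulled off with next() before touching the data.  This is only a sketch: the text below is a made-up, three-column stand-in for 911.csv, and its column names are assumptions rather than the file's real header.

```python
import csv
import io

# Made-up stand-in for the first two lines of a CSV with a header row.
sample = io.StringIO("lat,lng,zip\n40.2978759,-75.5812935,19525\n")

reader = csv.reader(sample)
header = next(reader)        # first line: column names
first_record = next(reader)  # second line: first real record

# Pair each column name with its value.
for name, value in zip(header, first_record):
    print(name + ': ' + value)
```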


CSV isn't the only viable format for writing out data.  Another popular option is **JSON (JavaScript Object Notation)**.  JSON is popular as a platform-independent way to transfer key-value pairs; the Twitter API uses it, for example.  A JSON object is **like a dictionary** in that it **stores property names (keys) and values associated with those keys**.

In short, JSON is roughly the file equivalent of a dictionary.

To write a JSON object to a file, you need only **call json.dump** on a dictionary holding the key-value pairs, also providing the file to dump the JSON into.


In [None]:
import json

def demo_dump_json(filename, data):  # "data" rather than "dict", which would shadow the built-in type
    with open(filename, 'w') as myfile:
        json.dump(data, myfile)

release_years = {
    'Metroid': 1986,
    'Persona 3': 2006,
    "Baldur's Gate 3": 2023
}

demo_dump_json('release.json', release_years)

# to target file:
# {"Metroid": 1986, "Persona 3": 2006, "Baldur's Gate 3": 2023}


In [None]:
#!ls
!cat release.json

{"Metroid": 1986, "Persona 3": 2006, "Baldur's Gate 3": 2023}

Values in JSON objects can be strings, numbers, Boolean values (written in lowercase, as true and false), null, arrays, or other JSON objects.  Thus they can potentially communicate richer structure than a CSV.  Many cloud-based services use JSON to communicate.
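
A short sketch of how these value types look once serialized.  The record below is invented for illustration, and json.dumps is used instead of json.dump so the result comes back as a string rather than going to a file.

```python
import json

# A made-up record mixing the value types JSON supports.
game = {
    'title': 'Metroid',
    'year': 1986,
    'finished': False,             # Python bool
    'rating': None,                # Python None
    'tags': ['action', 'classic'], # Python list
    'notes': {'hours': 12.5},      # nested dictionary
}

text = json.dumps(game)
print(text)  # False is written as false, None as null

# Loading the string gives back an equal dictionary.
print(json.loads(text) == game)
```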



The technical term for **converting data into a form that can be written to a file or sent over a network** is **serialization**.

Python also has a Python-specific method of serialization called pickling, but it is a little too powerful: unpickling data from an untrusted source can cause arbitrary code to execute.  For exchanging data, platform-independent formats such as JSON are generally the safer choice.



JSON files can be read into dictionaries as well.


In [None]:
def demo_read_json(filename):
    with open(filename, 'r') as myfile:
        my_dict = json.load(myfile)
    return my_dict

my_dict = demo_read_json('release.json')
my_dict

{'Metroid': 1986, 'Persona 3': 2006, "Baldur's Gate 3": 2023}

The dictionary we wrote to file in the previous step can be read right back as a dictionary.
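
The round trip is exact here because the keys are strings and the values are plain numbers.  As a hedged illustration of where it is not exact, the made-up dictionary below uses an integer key and a tuple value, neither of which JSON can represent directly.

```python
import json

original = {1986: 'Metroid', 'pair': (1, 2)}

# Serialize to a string and immediately parse it back.
restored = json.loads(json.dumps(original))
print(restored)  # {'1986': 'Metroid', 'pair': [1, 2]}
```

The integer key comes back as a string and the tuple comes back as a list, so code that reads JSON should not assume it gets the exact Python types that were written.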

# Directly to DataFrame

As we did in a previous lecture, you can load a CSV into a DataFrame without using the methods discussed in this lecture; pd.read_csv() reads directly into a DataFrame.

In [None]:
import pandas as pd
df = pd.read_csv("fruits.csv", names = ["fruit1", "fruit2","fruit3"])
df.head()

       fruit1  fruit2  fruit3
0       peach    pear    plum
1  strawberry  orange   grape


Writing a DataFrame to CSV is similarly straightforward.

In [None]:
df.to_csv('fruits2.csv') #will write the data frame to the file
!ls #Lists files and directories in the current directory

911.csv  fruits2.csv  fruits.csv  release.json	sample_data  small_csv.csv
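
One detail worth knowing: by default, to_csv also writes the DataFrame's index as an unlabeled first column.  The sketch below rebuilds the fruit DataFrame by hand and writes to in-memory buffers (io.StringIO) instead of files, purely to keep the example self-contained.

```python
import io
import pandas as pd

df = pd.DataFrame({'fruit1': ['peach', 'strawberry'],
                   'fruit2': ['pear', 'orange'],
                   'fruit3': ['plum', 'grape']})

with_index = io.StringIO()
df.to_csv(with_index)                  # default: the index is written as a first column
without_index = io.StringIO()
df.to_csv(without_index, index=False)  # index=False leaves the index out

print(with_index.getvalue())
print(without_index.getvalue())

# Reading the first version back shows the index as an extra, unnamed column.
reloaded = pd.read_csv(io.StringIO(with_index.getvalue()))
print(list(reloaded.columns))
```

Passing index=False is a common choice when the index is just the default row numbering.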


# Intro to Exceptions

Input and output (I/O) is generally a place where things may not work as expected: the file we're looking for isn't there, or we don't have permission to write. When things don't go as planned, an exception is thrown (or, in Python's own terminology, raised).

Exceptions are objects, and there are multiple kinds depending on the error that occurred: FileNotFoundError, ZeroDivisionError, and ValueError are examples (this last occurs if you try to parse a non-integer string as an integer). If an exception occurs and it isn't "caught," the program immediately terminates, reporting where the error occurred in a way you're familiar with from debugging.

If an exception is caused by a bug, you should fix the bug. But if the exception can happen because of bad input or bad circumstances, then your program should catch it and respond gracefully.

The **"try" keyword comes before a block of code that could throw an exception**. If an exception is thrown, it **can be "caught" with an "except" block** after the try block. Execution will jump to the "except" when an exception in the try block occurs.
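
As a minimal sketch of try and except on something simpler than a file, the hypothetical helper below parses a string as an integer.  It also shows that the caught exception is an ordinary object, with a type and a message.

```python
def try_to_parse(text):
    """Return text as an int, or None if it isn't a valid integer."""
    try:
        return int(text)
    except ValueError as error:
        # "error" is the exception object that int() raised
        print('Caught a', type(error).__name__, 'with message:', error)
        return None

print(try_to_parse('12'))      # 12
print(try_to_parse('twelve'))  # prints the caught error, then None
```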

Our JSON reader could have two obvious errors: the file doesn't exist, or the file isn't JSON. We can catch these errors in this way:

In [None]:
def safe_read_json(filename):
    try:  # code that opens a file usually belongs inside a try block
        with open(filename, 'r') as myfile:
            my_dict = json.load(myfile)
            print("File successfully loaded:")
            print(my_dict)
    except FileNotFoundError:  # runs only if the file doesn't exist
        print("File not found: " + filename)
        return None
    except json.decoder.JSONDecodeError:  # runs only if the file isn't valid JSON
        print("JSON error in " + filename)
        return None
    return my_dict

safe_read_json('release.json')  # Try release.json, small_csv.csv, not_found.json

File successfully loaded:
{'Metroid': 1986, 'Persona 3': 2006, "Baldur's Gate 3": 2023}


{'Metroid': 1986, 'Persona 3': 2006, "Baldur's Gate 3": 2023}

The behavior may seem very similar, but the program doesn't crash this way, and it returns a more informative error to the user.

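(Note that "except" without a specific exception following it will catch all exceptions, including ones you probably don't want to swallow, such as the KeyboardInterrupt raised when the user interrupts the program.  Naming the exceptions you expect is usually safer.)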


Two other keywords **associated with exceptions are "else" and "finally."** "Else" can appear after except blocks to say what should happen if there weren't errors.  And "**finally**" can appear after all of that to **give code that should happen regardless** - probably some kind of cleanup.  Both are used somewhat infrequently.

In [None]:
def safe_read_json_last_call(filename):
    try:
        with open(filename, 'r') as myfile:
            my_dict = json.load(myfile)
            print("File successfully loaded:")
            print(my_dict)
    except FileNotFoundError:
        print("File not found: " + filename)
    else:
        print("No errors!")
    finally:
        print("End of demo!")

safe_read_json_last_call('small_csv.json')
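
To see which blocks run in which case without depending on any files, here is a small sketch built around a hypothetical helper that records the blocks it passes through while parsing a string as an integer.

```python
def run_blocks(text):
    """Return the names of the blocks that ran while parsing text as an int."""
    ran = []
    try:
        ran.append('try')
        int(text)
    except ValueError:
        ran.append('except')
    else:
        ran.append('else')
    finally:
        ran.append('finally')
    return ran

print(run_blocks('12'))      # ['try', 'else', 'finally']
print(run_blocks('twelve'))  # ['try', 'except', 'finally']
```

"else" and "except" are mutually exclusive, while "finally" runs on both paths.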

You should not use exceptions for typical situations.  Don't index out of bounds at the end of a loop and then catch it, for example.  They're mostly for use when reality isn't cooperating with the program, like with missing or bad files.  The way they jump around in the code is undesirable, but it's better than the program crashing.

# Exercise (4 min)

Modify the CSV reading function demo_read_csv so that it catches a FileNotFoundError, printing an error message in that case.

In [None]:
def demo_read_csv(filename):
    with open(filename, mode='r') as my_csv:
        reader = csv.reader(my_csv)
        for record in reader:
            out = ''
            for item in record:
                out += item + ';'
            print(out)

demo_read_csv("small_csv.csv")

In [None]:
def demo_read_csv(filename):
    try:
        with open(filename, mode='r') as my_csv:
            reader = csv.reader(my_csv)
            for record in reader:
                out = ''
                for item in record:
                    out += item + ';'
                print(out)
    except FileNotFoundError:
        print("File not found")

demo_read_csv("small_csv.csv") #try small_csv.csv and x_csv.csv

File not found
