<a href="https://colab.research.google.com/github/ksariash/Python_Crash_Course/blob/main/Ch_10_Files_Exceptions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Chapter 10 - Files and Exceptions

This chapter focuses on reading information from a file, creating a new file while a program is running, storing information in a file, and handling errors so that they do not crash the program.

## Reading from a File

A lot of data is stored in text files, and the first step to working with the text information from the file is to read it into [the computer's] memory. You can read in the entire contents of a file, or read in information one line at a time.

### Reading an Entire File

From the [author's website](https://ehmatthes.github.io/pcc_2e/) download the zip file of all the course material. In the folder labeled "chapter 10", there is a file called `pi_digits.txt` that will be used for the following example. It is easier if a copy of the file is in the same directory (folder) as the program file you are running.

To open a file in Python, the compound statement `with open(filename) as file_variable_name:` does several things. First, the `open(filename)` function opens a connection to the file so that its contents can be accessed. By default, `open()` opens the file in ***read mode***. The `with` statement, when used along with `open()`, makes sure that the file is closed once the code block finishes executing. Lastly, `as file_variable_name:` assigns a name to the open file object, which is used to work with the file inside the code block.

The `.read()` method returns an in-memory copy of the file's entire contents as a single string value (it does not remove anything from the file).

In [None]:
# file contains the number pi to 30 decimal places
# 3 lines in the file with 10 decimal places per line

# open the file "pi_digits.txt" with the alias "file_object"
with open("pi_digits.txt") as file_object:
    
    # read the information in the file as a string
    # store it in the variable contents
    contents = file_object.read()
    print(contents)

In [None]:
# verify that "contents" is a string
type(contents)

In the output shown, lines 2 and 3 have extra spaces to format the number. What can't be seen in the `print` output is a newline at the end of each line in the file.

In [None]:
# newline (\n) at end of each line
# there is a 4th line in the file that is blank
contents

In [None]:
# remove the newline from last line

with open("pi_digits.txt") as file_object:
    
    contents = file_object.read()
    # newline is located on the right side of the last decimal sequence
    print(contents.rstrip())

In [None]:
# last newline is removed
contents.rstrip()

### File Paths

In the example above, the `pi_digits.txt` file was in the same directory (folder) as the file that is running the program. So in the `with open(filename)` statement, only the name of the file was passed into the function as a string. However, sometimes the file that will be accessed is located in a different directory, so Python will need the ***file path***, which is the file's location. 

A ***relative file path*** gives a file's location relative to the current working directory, which is usually the folder that contains the program file. For example, there is a file called `filename.txt` located in a folder called `text_files`, and that folder is located in the same directory as the program that needs access to `filename.txt` (let's call the program file `program.py`). So the structure looks like this:

- Parent_Folder
    - program.py
    - text_files (folder)
        - filename.txt
        
To access the file in that folder, the file path would be `text_files/filename.txt`. This file path can be passed as a string to the `with open()` statement to tell it where to look for the file. (Example: `with open("text_files/filename.txt") as file_object:`)
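
As a minimal sketch of the layout above, the following builds the `text_files/filename.txt` structure inside a temporary directory (an assumption made only so that the example runs anywhere) and then opens the file through its relative path. `pathlib.Path` from the standard library joins the parts of a path with the correct separator for the operating system. The variable names `parent_folder`, `relative_path`, and `text` are illustrative.

```python
import tempfile
from pathlib import Path

# build the example layout in a temporary folder (stands in for Parent_Folder)
with tempfile.TemporaryDirectory() as parent_folder:
    text_dir = Path(parent_folder) / "text_files"
    text_dir.mkdir()
    (text_dir / "filename.txt").write_text("sample text\n")

    # the relative path, as seen from Parent_Folder
    relative_path = Path("text_files") / "filename.txt"

    # join the relative path to Parent_Folder, then open and read the file
    with open(Path(parent_folder) / relative_path) as file_object:
        text = file_object.read()

print(relative_path.as_posix())
print(text.rstrip())
```

In a real program the relative path is resolved against the current working directory, so `open("text_files/filename.txt")` works when the program is started from `Parent_Folder`.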

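An ***absolute file path*** is the complete location of a file, starting from the root of the file system (or from the drive letter on Windows). It works no matter where the program is running, which makes it useful when the file is located above the program's folder or on another drive. There are differences in the structure of an absolute file path on Linux-based vs Windows computers but the concept is still the same.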

Using the same `filename.txt` example, let's say that the `text_files` folder is now located two levels above the `Parent_Folder` with the `program.py` file. This is the structure of the relationship:

- Some_Folder
    - text_files (folder)
        - filename.txt
    - Another_Folder
        - Parent_Folder
            - program.py
            
On Linux-based systems, an absolute file path starts from the root of the file system, written as `/`; user folders are usually located under `/home`. To find the file, the file path would be `/home/any_other_folders/Some_Folder/text_files/filename.txt`. On a Windows computer, the local drive is usually the C: drive and so the file path would look like `r"C:\Users\your_username\any_other_folders\Some_Folder\text_files\filename.txt"`. The `r` placed before the opening quotation mark marks the string as ***raw***, meaning Python does not interpret backslash sequences (such as `\n` or `\t`) as special characters.

**NOTE**: Linux-based systems use a forward slash (`/`) and Windows computers use a backslash (`\`).
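
To make the effect of the raw-string prefix concrete, here is a small sketch. The folder names are hypothetical, and `PureWindowsPath` from the standard library is used only because it can take apart a Windows-style path on any operating system.

```python
from pathlib import PureWindowsPath

# the same Windows path written two ways (hypothetical folders)
raw_path = r"C:\Users\your_username\Some_Folder\text_files\filename.txt"
escaped_path = "C:\\Users\\your_username\\Some_Folder\\text_files\\filename.txt"

# both spellings produce the identical string
print(raw_path == escaped_path)

# without the r prefix, \n and \t are read as a newline and a tab
plain = "C:\new_folder\text_files"
print("\n" in plain, "\t" in plain)

# PureWindowsPath splits a Windows-style path into its parts
print(PureWindowsPath(raw_path).name)
```

Either a raw string or doubled backslashes avoids the problem shown by `plain`, where two characters of the intended path were silently replaced.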

### Reading Line by Line

When reading the contents of a file, the whole file can be loaded as a single element, as in the previous example, or each line can be read in as its own individual element. One method of reading information line by line is to iterate over the file object using a `for` loop.

In [None]:
filename = "pi_digits.txt"

with open(filename) as file_object:
    
    # read each line in the "pi_digits.txt" file, then display
    # each line keeps its trailing newline and print() adds another,
    # so blank lines appear between the lines of output
    for line in file_object:
        print(line)

In [None]:
with open(filename) as file_object:
    
    for line in file_object:
        # strip the newline character from each line
        print(line.rstrip())

### Making a List of Lines from a File

Even though each line can be treated as its own element using a `for` loop, the file object is only usable within the `with open()` code block. To use the lines outside of that block, they can be stored in a list with the `.readlines()` method, and the list can then be iterated through using a `for` loop.

In [None]:
# using same "pi_digits.txt" file from previous example

with open(filename) as file_object:
    
    # assign list created by readlines() method to variable
    lines = file_object.readlines()
    
# iterate through list variable
for line in lines:
    
    # display line with right whitespace removed
    print(line.rstrip())

### Working with a File's Contents

After a file has been read into memory as a string, the data can then be used in many ways.

In [None]:
# build a string of the numbers concatenated together with no newlines

with open(filename) as file_object:
    lines = file_object.readlines()
    
# empty string to hold the concatenated numbers
pi_string = ''

for line in lines:
    
    # remove right whitespace, then concatenate line with pi_string and reassign pi_string with the new value
    pi_string += line.rstrip()

print(pi_string)
print(len(pi_string))

In [None]:
# build a string of the numbers concatenated together with no spaces
# use "lines" variable from previous example

pi_string = ''

for line in lines:
    
    # strip all whitespace, then concatenate line with pi_string and reassign pi_string with the new value
    pi_string += line.strip()
    
print(pi_string)
print(len(pi_string))
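
The loop above can also be written without an explicit accumulator by using `str.join()`. This is a minimal sketch, assuming the three sample lines below match the layout of `pi_digits.txt`; they are typed in here so that the example runs without the file, and the names `sample_lines` and `joined` are illustrative.

```python
# sample lines shaped like the contents of pi_digits.txt (assumed layout)
sample_lines = ["3.1415926535\n", "  8979323846\n", "  2643383279\n"]

# strip whitespace from each line, then join the pieces into one string
joined = "".join(line.strip() for line in sample_lines)

print(joined)
print(len(joined))
```

For very long files, joining once at the end is usually preferred over repeated `+=`, although both approaches give the same string.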

### Large Files: One Million Digits

In the previous examples, the file had only three lines of data but the same code will also work on larger files.

In [None]:
# use file "pi_million_digits.txt" on code block from previous example
# will limit output to first 50 decimal places (to save screen space)

filename = "pi_million_digits.txt"

with open(filename) as file_object:
    lines = file_object.readlines()

pi_string = ''

for line in lines:
    pi_string += line.strip()

# display first 52 characters of pi_string
print(pi_string[:52] + "...")
print(len(pi_string))

### Is Your Birthday Contained in Pi?

Let's use the program created in the previous example to check if a person's birthday appears within the first million digits of pi.

In [None]:
# use the value stored in pi_string variable from previous example

# prompt user for their birthday
birthday = input("Enter your birthday, in the form mmddyy: ")

# check if string value of birthday is in pi_string
if birthday in pi_string:
    print("Your birthday appears in the first million digits of pi!")
else:
    print("Your birthday does not appear in the first million digits of pi.")
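
The `in` check only reports whether the digits appear. If the position is also of interest, `str.find()` returns the index of the first match. The sketch below uses a short stand-in string and a hard-coded, hypothetical birthday so that it runs without the data file or any typed input.

```python
# short stand-in for pi_string so the example runs without the data file
pi_sample = "3.141592653589793238462643383279"

# hypothetical values standing in for what input() would return
birthday = "793238"
other_birthday = "999999"

# str.find() returns the index of the first match, or -1 if there is none
position = pi_sample.find(birthday)
missing_position = pi_sample.find(other_birthday)

if position != -1:
    print("Found at index " + str(position) + ".")
print(missing_position)
```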

## Writing to a File

One of the easiest ways to save data is to **write** (store) it to a file. Once data is written to a file, it can be accessed even after the program script that created it has been closed. The file can also be shared with other people and if the data is needed again in the program script, then the file can be read into the program. Text files created by a program behave exactly as any other text file does - you can type new text into it, copy text from it, and paste text into it.

### Write to an Empty File

To write information to a file, the `with open(file, 'w')` statement is used, which includes the argument `'w'` to open the file in write mode. If no mode argument is passed to the `open()` function, the default mode is *read*.

**Caution!** In write mode, `open()` will automatically create a new file if the filename does not exist. However, if the filename does exist, Python will erase the file's existing contents before writing the new ones.

***Note:*** Python can only write strings to a text file. If you want to store numerical data in a text file, the data should first be converted to a string using the `str()` function.

In [None]:
# save a sentence into a text file

# this will be the name of the new file
filename = "programming.txt"

# create the file and open it in 'write' mode
with open(filename, 'w') as file_object:
    
    # write the string message into the file
    file_object.write("I love programming.")
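
As noted above, `.write()` accepts only strings. The following sketch stores a number by converting it with `str()` and converts it back with `int()` after reading. The file name `score.txt`, the variable `score_file`, and the value are hypothetical, and a temporary folder is used so that no file is left behind.

```python
import tempfile
from pathlib import Path

score = 42  # hypothetical numeric value

with tempfile.TemporaryDirectory() as folder:
    score_file = Path(folder) / "score.txt"

    with open(score_file, 'w') as file_object:
        # write() raises a TypeError if given an int, so convert to a string first
        file_object.write(str(score))

    with open(score_file) as file_object:
        stored = file_object.read()

# the value read from the file is a string; convert it back to use it as a number
restored = int(stored)
print(stored, restored + 1)
```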

### Writing Multiple Lines

The `.write()` method does not add any newlines to the text that it writes. If the text is meant to span more than one line but newlines were not explicitly added, then the contents of the file may not look as expected.

In [None]:
# use same 'filename' variable from previous example

with open(filename, 'w') as file_object:
    file_object.write("I love programming.")
    file_object.write("I love creating new games.")

***What happened?***

The contents of the file display `I love programming.I love creating new games.` on the same line, despite using two separate `.write()` calls to add text to the file. Neither string written to the file contained a newline character, so they were saved one directly after the other, without even a space between the sentences.

In [None]:
# add newline characters to strings when writing to file

with open(filename, 'w') as file_object:
    file_object.write("I love programming.\n")
    file_object.write("I love creating new games.\n")

The contents of the file now look like <br>
`I love programming.
I love creating new games.`

### Appending to a File

Because a file's contents are completely erased each time it is opened in write mode, opening a file in **append** mode (`with open(filename, 'a')`) allows a user to add new content to the end of the file without deleting any previously existing text.

In [None]:
# use the current "programming.txt" file that has 2 lines of text in it

with open(filename, 'a') as file_object:
    
    # the following lines will be added to the file, for a total of 4 lines in the file
    file_object.write("I also love finding meaning in large datasets.\n")
    file_object.write("I love creating apps that can run in a browser.\n")

The results of the file look like this: <br>
`I love programming.
I love creating new games.
I also love finding meaning in large datasets.
I love creating apps that can run in a browser.`

## Exceptions

Whenever an error occurs in a Python script, Python creates an ***exception object*** that represents the error. If nothing handles the exception, the program stops and shows a ***traceback*** (a report of the error). Instead of the program always crashing (stopping) whenever there is an error, a `try-except` block can be incorporated into the script to handle errors more gracefully. The `try` statement runs the code block that follows it and watches for an error. If no error is raised, the program moves on to its remaining tasks. However, if the `try` code block raises an error, the matching `except` code block runs to handle the error, and the program then continues running.

### Handling the ZeroDivisionError Exception

In math, it is impossible to divide any number by zero. Python has a `ZeroDivisionError` that is produced when a user attempts to divide by zero. 

In [None]:
# try to divide by zero

print(5/0)

### Using `try-except` Blocks

If the previous example's code were part of a children's math app, showing the traceback would be poor design, because most users would not understand it. A traceback can also reveal details about how the program works, which malicious attackers could use to look for vulnerabilities.

Let's add a `try-except` block to tell the user that they cannot divide by zero.

In [None]:
# try statement will see if the code block raises an error
try:
    print(5/0)
    
# except statement will show message to user instead of traceback
except ZeroDivisionError:
    print("You can't divide by zero!")
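
The exception object mentioned at the start of this section can also be captured with `as`, which is useful when the program should record what went wrong. A minimal sketch follows; the variable names `error`, `error_name`, and `message` are illustrative.

```python
try:
    result = 5 / 0
except ZeroDivisionError as error:
    # "error" is the exception object; str() gives its message text
    error_name = type(error).__name__
    message = str(error)
    print(error_name + ": " + message)
```

Note that the name bound by `as` only exists inside the `except` block, which is why the details are copied into other variables here.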

### Using Exceptions to Prevent Crashes

Handling errors is important when the program has other tasks to execute beyond the code block that caused the crash.

In [None]:
# simple calculator function that only does division
# request user to input two numbers, and they will be divided

print("Give me two numbers, and I'll divide them.")
print("Enter 'q' to quit.")

# continue to prompt the user for info unless they quit
while True:
    
    first_number = input("\nFirst number: ")
    if first_number == 'q':
        break
    
    second_number = input("Second number: ")
    if second_number == 'q':
        break
    
    # divide the first number by the second number
    answer = int(first_number) / int(second_number)
    print(answer)

### The `else` Block

In the previous example, when the second number is zero, the program never executes `print(answer)` because the line before it raises a `ZeroDivisionError`. When there are tasks that should run only if the `try` block does not raise an error, an `else` block is added after `except`.

In [None]:
print("Give me two numbers, and I'll divide them.")
print("Enter 'q' to quit.")

while True:
    
    first_number = input("\nFirst number: ")
    if first_number == 'q':
        break
    
    second_number = input("Second number: ")
    if second_number == 'q':
        break
    
    try:
        answer = int(first_number) / int(second_number)
    except ZeroDivisionError:
        print("You can't divide by zero!")
    
    # will execute if the "try" is okay
    else:
        print(answer)
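
The loop above still stops with a traceback if the user types something that is not a whole number, because `int()` raises a `ValueError`. Here is a sketch of handling both cases, using a hypothetical helper `divide_text()` (not part of the original program) that takes the two inputs as strings so that it can be tried without typing anything.

```python
def divide_text(first, second):
    """Divide two numbers given as strings; return None if that fails."""
    try:
        answer = int(first) / int(second)
    except ZeroDivisionError:
        print("You can't divide by zero!")
        return None
    except ValueError:
        # raised by int() when the text is not a whole number
        print("Please enter whole numbers only.")
        return None
    else:
        return answer

print(divide_text("10", "4"))
print(divide_text("5", "0"))
print(divide_text("five", "2"))
```

A `try` statement may have several `except` blocks; Python runs the first one that matches the exception raised.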

### Handling the `FileNotFoundError` Exception

A common difficulty of using files in a program is when the file is "missing". The file may be in a different location, the file name could be misspelled, or the file does not exist at all. Whenever Python cannot find a file that it is trying to access, it raises a `FileNotFoundError` exception. 

***Note:*** In older versions of Python (before 3.3), a missing file raises an `IOError` instead. In current versions, `FileNotFoundError` is a subclass of `OSError`, and `IOError` is kept as another name for `OSError`.

When accessing a file, sometimes the default encoding of the computer system does not match the encoding of the file. In that case, the `encoding` argument of `open()` can be set to a string that names the file's encoding.

In [None]:
# this is the file that will be used in the program
filename = "alice.txt"

# text files can have different encoding types such as "utf-8"
# encoding is used when the system's default encoding does not match the file's encoding
with open(filename, encoding='utf-8') as f_obj:
    contents = f_obj.read()

In [None]:
filename = 'alice.txt'

# "try" statement will check if opening the file causes an error
try:
    with open(filename, encoding='utf-8') as f_obj:
        contents = f_obj.read()
        
# if there is a FileNotFoundError, then display a message to the user
except FileNotFoundError:
    msg = "Sorry, the file " + filename + " does not exist."
    print(msg)

### Analyzing Text

Although it is more difficult to do, text data can be analyzed like numerical data. One of the simplest ways to analyze text is to count the number of words. The `.split()` method separates a string into individual items based on a delimiter (by default, any run of whitespace) and returns them in a list. Other string values can be used as a delimiter: punctuation, a single alphanumeric character, or even a sequence of characters.

***Setup***: From the files that are used for this book, add a copy of the `alice.txt`, `moby_dick.txt`, and `little_women.txt` files to the same directory as this script.

---
The texts in the following examples are from the Project Gutenberg website, which contains a collection of public domain literary works. More texts can be found at https://www.gutenberg.org/.

In [None]:
title = "Alice in Wonderland"

# will separate string into items in list (whitespace is delimiter)
title.split()
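
Passing an argument to `.split()` uses that string as the delimiter instead of whitespace. The sample values below are hypothetical and serve only to illustrate the behavior.

```python
# hypothetical sample values
date = "2024-01-15"
row = "alice,moby_dick,little_women"

# split on a hyphen and on a comma
date_parts = date.split("-")
names = row.split(",")

# an optional second argument limits the number of splits
first, rest = row.split(",", 1)

print(date_parts)
print(names)
print(first, rest)
```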

In [None]:
# this file should now be in the directory
filename = "alice.txt"

# same code block that will try to read the file
try:
    with open(filename, encoding='utf-8') as f_obj:
        contents = f_obj.read()
except FileNotFoundError:
    msg = "Sorry, the file " + filename + " does not exist."
    print(msg)
else:
    # separate the string value from the file using whitespace as delimiter
    # "words" is a list
    words = contents.split()
    
    # get the length of the "words" list
    num_words = len(words)
    
    # it is an approximation because not all items separated by a whitespace are words
    # could be punctuation or extra publisher information
    print("The file " + filename + " has about " + str(num_words) + " words.")

### Working with Multiple Files

In the previous example, only one book was analyzed for word count. However, if there were multiple works that needed to be analyzed, then a function should be created for reuse.

In [None]:
# create the function
def count_words(filename):
    """Count the approximate number of words in a file."""
    
    try:
        with open(filename, encoding='utf-8') as f_obj:
            contents = f_obj.read()
    except FileNotFoundError:
        msg = "Sorry, the file " + filename + " does not exist."
        print(msg)
    else:
        words = contents.split()
        num_words = len(words)
        
        print("The file " + filename + " has about " + str(num_words) + " words.")

In [None]:
filename = "alice.txt"

# use the function - output is either word count or FileNotFoundError message
count_words(filename)

In [None]:
# try the count_words function on multiple files
filenames = ['alice.txt', 'siddhartha.txt', 'moby_dick.txt', 'little_women.txt']

# iterate through each file name and use count_words on it
for filename in filenames:
    count_words(filename)

The `FileNotFoundError` caused by the `siddhartha.txt` file displayed the user-friendly message set in the program.

### Failing Silently

In the previous examples, the user was shown a message whenever a file was not available. But not every exception needs to be reported. The program can ***fail silently***, which means it does nothing when the error occurs and continues running the rest of its tasks. The `pass` statement tells Python to do nothing in a code block, so placing it in the `except` block skips over the error without any output.

In [None]:
def count_words(filename):
    """Count the approximate number of words in a file."""
    
    try:
        with open(filename, encoding='utf-8') as f_obj:
            contents = f_obj.read()
    except FileNotFoundError:
        # do nothing when the file does not exist
        pass
    else:
        words = contents.split()
        num_words = len(words)
        
        print("The file " + filename + " has about " + str(num_words) + " words.")

In [None]:
filenames = ['alice.txt', 'siddhartha.txt', 'moby_dick.txt', 'little_women.txt']

for filename in filenames:
    count_words(filename)

No message was shown for the `siddhartha.txt` file because although it caused a `FileNotFoundError` exception, the `pass` statement skipped executing any tasks for that file and continued with the remainder of the program.
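
A possible middle ground is to stay silent toward the user while still recording which files were missing. The sketch below is one way to do that, not the chapter's method: the function `count_words_quietly()` and the list `missing_files` are hypothetical names, and the files are created in a temporary folder so that the example runs on its own.

```python
import tempfile
from pathlib import Path

def count_words_quietly(path, missing):
    """Return the approximate word count, or None if the file is missing."""
    try:
        with open(path, encoding='utf-8') as f_obj:
            text = f_obj.read()
    except FileNotFoundError:
        # say nothing to the user, but remember which file was missing
        missing.append(str(path))
        return None
    else:
        return len(text.split())

missing_files = []

with tempfile.TemporaryDirectory() as folder:
    present = Path(folder) / "present.txt"
    present.write_text("one two three", encoding='utf-8')
    absent = Path(folder) / "absent.txt"  # never created

    counts = [count_words_quietly(p, missing_files) for p in (present, absent)]

print(counts)
print(len(missing_files))
```

Returning the count instead of printing it also lets the calling code decide what to show.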

### Deciding Which Errors to Report

How do you know when to report an error to your users and when to fail silently? Giving users information that they don't need can decrease the usability of a program. The error-handling structures in Python allow developers to control how much information should be shared to users when things go wrong. 

Well-written, properly tested code is not very prone to internal errors, such as syntax or logical errors. But if the program depends on something external, like user input, the existence of a file, or the availability of a network connection, there is a possibility of an exception being raised. With more experience it will become easier to decide where to include exception handling blocks in a program.

## Storing Data

As seen in the previous lessons, users are sometimes prompted to input information in a program. Until now, that information has only been used while the program is still running. However, user information may need to be stored so that it is available when the program is run again at a later time. A simple way to do this is by using the **JSON (JavaScript Object Notation)** data format. JSON is a text-based data format that is supported by many programming languages, so it is easy to send and share data.

In Python, the built-in `json` module is used to read and write JSON data.

### Using `json.dump()` and `json.load()`

The `json.dump(python_object, file_object)` function converts a Python object (such as a list or dictionary) to JSON and writes it to a file. `json.load(file_object)` reads the JSON stored in a file and returns it as the matching Python object.

In [None]:
# Example: Store numbers into a file as a JSON object

# import the JSON library
import json

# list of numbers that will be stored
numbers = [2, 3, 5, 7, 11, 13]

filename = "numbers.json"

# create a new file and open it in "write" mode
with open(filename, 'w') as f_obj:
    
    # write the "numbers" list to the file as JSON
    json.dump(numbers, f_obj)

In [None]:
# Example: Load the information stored in file from previous example

with open(filename) as f_obj:
    # read in the contents from file as JSON
    numbers = json.load(f_obj)
    
print(numbers)
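
JSON is not limited to lists of numbers. The sketch below round-trips a dictionary using `json.dumps()` and `json.loads()`, the string-based counterparts of `dump()` and `load()`, so that no file is needed. The `record` dictionary is a made-up example.

```python
import json

# hypothetical record mixing several Python types
record = {"name": "Ada", "scores": [90, 85.5], "active": True, "nickname": None}

# dumps() returns the JSON text as a string instead of writing it to a file
json_text = json.dumps(record)
print(json_text)

# loads() parses JSON text back into Python objects
restored = json.loads(json_text)
print(restored == record)
```

In the JSON text, Python's `True` appears as `true` and `None` as `null`; they are converted back when the text is loaded.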

### Saving and Reading User-Generated Data

If the information requested from a user during a program is not stored, then it is lost once the program finishes. Storing the data means that the user does not have to be prompted for the same information again.

In [None]:
# Example: Prompt user for name on first use, then remember it when they use the program again

username = input("What is your name? ")

filename = "username.json"
with open(filename, 'w') as f_obj:
    
    # store username in file
    json.dump(username, f_obj)
    
    # display message to user
    print("We'll remember you when you come back, " + username + "!")

In [None]:
# Example: Greet user that already has stored username

with open(filename) as f_obj:
    username = json.load(f_obj)
    print("Welcome back, " + username + "!")

***NOTE:*** For the next example, delete the `username.json` file created from the previous example.

In [None]:
# Example: Combine code for storage and recall of username
#          Prompt for new username if there is no username stored

# see if the file can be opened and read
try:
    with open(filename) as f_obj:
        username = json.load(f_obj)

# if there is no file, then create and prompt for name
except FileNotFoundError:
    username = input("What is your name? ")
    with open(filename, 'w') as f_obj:
        json.dump(username, f_obj)
        print("We'll remember you when you come back, " + username + "!")

# if the "try" was successful, greet the user
else:
    print("Welcome back, " + username + "!")

**NOTE:** Run the previous code again to display the greeting message to the user.
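
A stored file can also exist but be empty or damaged, in which case `json.load()` raises a `json.JSONDecodeError` rather than a `FileNotFoundError`. The following sketch handles both cases. The helper `read_username()` and the file names are hypothetical, and the files live in a temporary folder so that the example is self-contained.

```python
import json
import tempfile
from pathlib import Path

def read_username(path):
    """Return the stored username, or None if it is missing or unreadable."""
    try:
        with open(path) as f_obj:
            return json.load(f_obj)
    except FileNotFoundError:
        return None
    except json.JSONDecodeError:
        # the file exists but does not contain valid JSON
        return None

with tempfile.TemporaryDirectory() as folder:
    good = Path(folder) / "good.json"
    good.write_text('"eric"')

    damaged = Path(folder) / "damaged.json"
    damaged.write_text("")  # an empty file is not valid JSON

    results = [
        read_username(good),
        read_username(damaged),
        read_username(Path(folder) / "missing.json"),
    ]

print(results)
```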

### Refactoring

Although a program is created to perform an overall task, it is good practice to break up sections of the code into functions that each have a specific job. Restructuring working code in this way, without changing what it does, is called ***refactoring***. Refactoring makes the code cleaner, easier to understand, and easier to extend.

In [None]:
# Example: Refactor the previous example into a function

def greet_user():
    """Greet the user by name."""
    
    filename = "username.json"
    
    try:
        with open(filename) as f_obj:
            username = json.load(f_obj)

    except FileNotFoundError:
        username = input("What is your name? ")
        with open(filename, 'w') as f_obj:
            json.dump(username, f_obj)
            print("We'll remember you when you come back, " + username + "!")

    else:
        print("Welcome back, " + username + "!")

In [None]:
# use the "greet_user" function created
greet_user()

In [None]:
# Example: Refactor the code to create a function for retrieving a stored username
#          Update "greet_user" to include new function

def get_stored_username():
    """Get stored username if available."""
    
    filename = "username.json"
    
    try:
        with open(filename) as f_obj:
            username = json.load(f_obj)

    except FileNotFoundError:
        return None
    else:
        return username


def greet_user():
    """Greet the user by name."""
    
    # username will store whatever value is returned from "get_stored_username" function
    username = get_stored_username()
    
    # check if there is at least one item/character (not None) in "username"
    if username:
        print("Welcome back, " + username + "!")
    
    # if there was "None", then prompt for new name and store it in file
    else:
        username = input("What is your name? ")
        
        filename = "username.json"
        with open(filename, 'w') as f_obj:
            json.dump(username, f_obj)
            print("We'll remember you when you come back, " + username + "!")

In [None]:
# try program with "get_stored_username()" refactoring
greet_user()

In [None]:
# Example: Refactor section of code that prompts for a new username and stores information in file; update "greet_user"

def get_stored_username():
    """Get stored username if available."""
    
    filename = "username.json"
    
    try:
        with open(filename) as f_obj:
            username = json.load(f_obj)

    except FileNotFoundError:
        return None
    else:
        return username

    
def get_new_username():
    """Prompt for a new username."""
    
    username = input("What is your name? ")
        
    filename = "username.json"
    with open(filename, 'w') as f_obj:
        json.dump(username, f_obj)
        
    # after "username" has been stored, send the value to where the function is called
    return username


def greet_user():
    """Greet the user by name."""
    
    username = get_stored_username()
    
    if username:
        print("Welcome back, " + username + "!")
    
    # username will store value returned from "get_new_username" function; used to display message to user
    else:
        username = get_new_username()
        print("We'll remember you when you come back, " + username + "!")

In [None]:
# try fully refactored program
greet_user()

**NOTE:** Try deleting the `username.json` file to test that the program correctly prompts for a new username when the file does not exist.

Each function in the final version of this program has a single, clear purpose. `greet_user()` welcomes back an existing user or greets a new user. It does this by first calling `get_stored_username()`, which is responsible for retrieving a username if one exists in the file; otherwise it returns `None`. `greet_user()` checks whether a value (other than `None`) was returned from `get_stored_username()` and, if so, welcomes back the user. However, if the value returned was `None`, then `greet_user()` calls `get_new_username()`, which prompts the user for a new username and stores it in a file. Finally, `greet_user()` uses the value returned from `get_new_username()` to let the user know that it will remember them when they return.
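
One further step in the same direction, offered as a sketch rather than as part of the original program: each function above hard-codes `"username.json"`. Passing the file location in as a parameter removes that repetition and lets the functions be tried against a temporary file. The names `load_username()`, `save_username()`, `greeting_for()`, and the parameters `path` and `new_name` are assumptions introduced here; `new_name` stands in for the value that `input()` would return.

```python
import json
import tempfile
from pathlib import Path

def load_username(path):
    """Return the stored username, or None if the file does not exist."""
    try:
        with open(path) as f_obj:
            return json.load(f_obj)
    except FileNotFoundError:
        return None

def save_username(path, username):
    """Store the username as JSON at the given path."""
    with open(path, 'w') as f_obj:
        json.dump(username, f_obj)

def greeting_for(path, new_name):
    """Build the greeting; new_name is used only when nothing is stored yet."""
    username = load_username(path)
    if username:
        return "Welcome back, " + username + "!"
    save_username(path, new_name)
    return "We'll remember you when you come back, " + new_name + "!"

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "username.json"
    first_run = greeting_for(path, "eric")
    second_run = greeting_for(path, "eric")

print(first_run)
print(second_run)
```

Because the functions return their results instead of printing or prompting, each one can be checked on its own.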