In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("filesExceptions.ipynb")

## Lecture Section

As statisticians, data scientists, researchers, academics, etc., we often turn to a programming language to work with data. This lecture will cover:
* How to load in text (.txt) and csv (.csv) files
* How to read/iterate through the files
* How to handle exceptions & errors

In the next lesson, we will learn how to manipulate the data we read/load in!

### Files

When we read in a file, we need to open it first. There are two common methods to do this, but both use the same `open()` function.

The first assigns the result of `open()` to a variable, often called a file `pointer`. This `pointer` *points* at our current position in the file, starting at the beginning. We use `fp` (short for "file pointer") as the variable name.

If we are only reading from our file, we open it with the mode argument `'r'`. If we are writing to the file, we open it with `'w'`. Be careful: `'w'` creates the file if it doesn't exist, and erases its contents if it does.

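Here is a small sketch of how the modes behave, using a throwaway file in a temporary directory (the file name `modes_demo.txt` is hypothetical). It also shows `'a'` (append), which adds to the end of a file instead of erasing it.

```python
import os
import tempfile

# Throwaway file in a temporary directory (the name "modes_demo.txt" is hypothetical)
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

demo = open(path, 'w')   # 'w' creates the file (or erases it if it already exists)
demo.write("first\n")
demo.close()

demo = open(path, 'w')   # opening with 'w' again wipes out "first"
demo.write("second\n")
demo.close()

demo = open(path, 'a')   # 'a' (append) adds to the end instead of erasing
demo.write("third\n")
demo.close()

demo = open(path, 'r')   # 'r' opens the file for reading only
contents = demo.read()
demo.close()

print(contents)
```

Only `"second"` and `"third"` survive: the second `'w'` erased `"first"`, and `'a'` kept what was already there.
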
In [None]:
fp = open("data/couple_sentences.txt", 'r')

`fp` is **iterable**, just like a list or string, but printing it directly is **not** useful: we only see a description of the file object, not its contents.

In [None]:
print(fp) # not useful

There are several ways to read from the pointer. The simplest is `.read()`, which reads the *entire* file at once and returns it as a single string.

In [None]:
fp.read()

Once we've read the file, the `pointer` sits at the end of it. Let's see if there is anything left to read:

In [None]:
fp.read()

Nope! How do we get it back to the beginning, then?

We use the **method** `.seek()` with an argument of `0`!

In [None]:
fp.seek(0)
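
The read/seek cycle above can be sketched on a small made-up file (`seek_demo.txt` is hypothetical), so we can check exactly what each call returns. We use the name `demo` so the notebook's `fp` is left untouched.

```python
import os
import tempfile

# Write a small two-line file to a temporary location (contents are made up)
path = os.path.join(tempfile.mkdtemp(), "seek_demo.txt")
out = open(path, 'w')
out.write("line one\nline two\n")
out.close()

demo = open(path, 'r')
whole = demo.read()    # the entire file as one string
empty = demo.read()    # the pointer is at the end, so nothing is left
demo.seek(0)           # move the pointer back to the start
again = demo.read()    # the full contents again
demo.close()

print(repr(whole))     # 'line one\nline two\n'
print(repr(empty))     # ''
```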

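Now we can iterate through our file in a different way: with a `for`-loop, which gives us one line at a time. Each line still ends with a newline character, so `print()` shows a blank line between them.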

In [None]:
for line in fp:
    print(line)

fp.seek(0)

If you want to skip a line, you can use the `.readline()` method. It reads one line, moves the pointer to the start of the next line, and returns the line it read.

In [None]:
first_line = fp.readline()
first_line
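
A common use of `.readline()` is skipping a header line before looping over the rest of a file. A minimal sketch, using a made-up temporary file (`readline_demo.txt` is hypothetical):

```python
import os
import tempfile

# A made-up three-line file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "readline_demo.txt")
out = open(path, 'w')
out.write("header\nrow 1\nrow 2\n")
out.close()

demo = open(path, 'r')
skipped = demo.readline()   # reads "header\n" and moves the pointer to the next line

rest = []
for line in demo:           # the loop starts where readline() left off
    rest.append(line)

at_end = demo.readline()    # at the end of the file, readline() returns ''
demo.close()

print(repr(skipped))  # 'header\n'
print(rest)           # ['row 1\n', 'row 2\n']
```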

Whenever we read or write files, we need to close them when we're done. This is even more important when we write to one, because Python may hold written text in memory and only save it to the file when the file is closed. After the code below runs, you can find the file we wrote in the `data` folder.

In [None]:
file_obj_out = open('data/write_ex.txt', 'w')
file_obj_out.write("you're doing great!")
file_obj_out.close()

Another way to open a file is with a `with` statement. `with` automatically closes the file when its indented block ends, so it's often preferred. In the next example, we try to loop through the file again after the `with` block, but Python raises `ValueError: I/O operation on closed file.` because the file was already closed!

The `.strip()` method removes extra whitespace (including the newline at the end of each line) from the beginning and end of the string, if there is any.

In [None]:
with open('data/write_ex.txt', 'r') as file:
    for line in file:
        print(line.strip())

# Raises ValueError: the file was closed when the with block ended
for line in file:
    print(line.strip())
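
`with` works for writing, too, and every file object has a `.closed` attribute we can check. A small sketch using a hypothetical temporary file (`with_demo.txt`):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")

# 'with' also works for writing; the file is closed as soon as the block ends
with open(path, 'w') as out:
    out.write("line one\nline two\n")

lines = []
with open(path, 'r') as src:
    print(src.closed)      # False: still open inside the block
    for line in src:
        lines.append(line.strip())

print(src.closed)          # True: closed automatically after the block
print(lines)               # ['line one', 'line two']
```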


Everything above works for most text-based file types, not just `.txt`. Some file types, like `.csv`, have libraries that make reading them easier, though. Before we get there, we'll cover a few `.txt`-specific techniques.

#### `.txt`

We can also loop through each line character by character, or word by word.


In [None]:
fp.seek(0)
for line in fp:
    print(line)
    for character in line:
        print(character)

fp.seek(0)

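If we want to grab each word, we can use the `str` method `.split()` with a space `" "` as the argument! Note that the last word of each line keeps its trailing newline; calling `.split()` with no argument splits on any whitespace and avoids this.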

In [None]:
for line in fp:
    for word in line.split(" "):
        print(word)
        # to also print each character, uncomment the two lines below
        # for character in word:
        #     print(character)

fp.seek(0)
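
To see the difference between splitting on `" "` and splitting with no argument, here is a sketch on a single made-up line containing a double space and a trailing newline:

```python
sample_line = "The quick  brown fox\n"   # made-up line with a double space and a newline

by_space = sample_line.split(" ")        # splits on every single space
by_whitespace = sample_line.split()      # splits on runs of any whitespace

print(by_space)        # ['The', 'quick', '', 'brown', 'fox\n']
print(by_whitespace)   # ['The', 'quick', 'brown', 'fox']

# Looping over each word's characters, e.g. to count them
char_counts = []
for word in by_whitespace:
    char_counts.append(len(word))
print(char_counts)     # [3, 5, 5, 3]
```

Splitting on `" "` produces an empty string for the double space and leaves `\n` attached to the last word, which is why `.split()` with no argument is usually the safer choice for words.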


This method of reading data also works for `.csv` files, but it's clumsy: you have to split each line on the comma `,` yourself, and it gets messy quickly, for example when a value itself contains a comma. If we read the file without splitting, we just get one long string:

In [None]:
fp2 = open("data/random_data.csv", 'r')
contents = fp2.read()   # the whole file as one long string
fp2.close()
contents

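Here is a sketch of why splitting by hand breaks on quoted values, compared with the `csv` module's `csv.reader()` (introduced next). The data line is made up, and `io.StringIO` stands in for an open file, since `csv.reader()` accepts any iterable of lines.

```python
import csv
import io

# A made-up CSV line: the last field is quoted because it contains a comma
raw = 'Alex,7,"red,blue"\n'

naive = raw.strip().split(",")
print(naive)    # ['Alex', '7', '"red', 'blue"']

# io.StringIO behaves like an open text file here
parsed = next(csv.reader(io.StringIO(raw)))
print(parsed)   # ['Alex', '7', 'red,blue']
```
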
#### `.csv`

With `.csv` files, we can use the `csv` library's `csv.reader()` function. We still need to `open()` the file first, though.

In [None]:
import csv
fp = open("data/random_data.csv", 'r', newline='')  # the csv docs recommend newline=''
csv_reader = csv.reader(fp)
csv_reader

Now we can iterate through `csv_reader` as usual. Each row comes back as a list of strings:

In [None]:
for row in csv_reader:
    print(row)

We have two problems here:

**All data is read through as a `string`, regardless of what it *should* be.**

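This is true whenever we read a file as text, but it's easier to forget with `.csv` files because the values *look* like numbers. Always convert numerical data when you read it in.

Our other issue is with the colors. If you look at the original file, you'll see that each color value is quoted but contains a comma, so it should not be split. `csv.reader()` understands quoted values and removes the quotes, but only when the quote is the first character of the field. Because a space follows each comma in this file, the quote isn't recognized and the value gets split. Setting the argument `skipinitialspace` in `csv.reader()` to `True` tells it to ignore whitespace right after each delimiter, which fixes the problem.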

In [None]:
fp.seek(0)
csv_reader = csv.reader(fp, skipinitialspace=True)
data_list = []
for row in csv_reader:
    data_list.append([row[0], int(row[1]), float(row[2]), row[3]])
print(data_list)
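
The effect of `skipinitialspace` can be checked on a single made-up line that has a space after each comma and a quoted value containing a comma (again using `io.StringIO` in place of a real file):

```python
import csv
import io

# Made-up data: a space follows each comma, and the color field is quoted
raw = 'Alex, 7, "red, blue"\n'

default_row = next(csv.reader(io.StringIO(raw)))
print(default_row)   # ['Alex', ' 7', ' "red', ' blue"']

fixed_row = next(csv.reader(io.StringIO(raw), skipinitialspace=True))
print(fixed_row)     # ['Alex', '7', 'red, blue']

# Values are still strings, so convert numbers explicitly
count = int(fixed_row[1])
```

Without the option, the leading space means the quote is treated as an ordinary character, so the color is split in two and the quotes stay in the data.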


If your file is delimited by something other than a comma (the standard for comma-separated values, or CSV, files), you can change the `delimiter` argument in `csv.reader()`.

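A minimal sketch with made-up semicolon-delimited data (for a tab-separated file you would pass `delimiter='\t'` instead):

```python
import csv
import io

# Made-up semicolon-delimited data
raw = "name;count;price\nAlex;3;2.50\n"

rows = []
for row in csv.reader(io.StringIO(raw), delimiter=';'):
    rows.append(row)

print(rows)   # [['name', 'count', 'price'], ['Alex', '3', '2.50']]
```
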
### Exceptions

This is a great time to introduce error-handling. Up to this point, we've only talked about avoiding exceptions/errors. Now, we will learn how to handle them!

When we try to open a file that does not exist, we get `FileNotFoundError`.

In [None]:
fp = open("data/not_real.txt", 'r')

#### Try/Except

Luckily, we can use `try` and `except` to handle our error.

We begin with `try:`, and we put the code we are worried about beneath it.
After it comes `except:`, followed by the code to run if an error or exception occurs in the `try` block.

In [None]:
try:
    fp = open("data/not_real.txt", 'r')
    fp.close()
except:
    fp = open("data/couple_sentences.txt", 'r')
    fp.close()

Your IDE might warn you about the bare `except:` above. It catches *any* error, including ones we didn't anticipate (like a typo in a variable name), which can hide real bugs. We should name the specific exception we expect instead:

In [None]:
try:
    fp = open("data/not_real.txt", 'r')
    fp.close()
except FileNotFoundError:
    fp = open("data/couple_sentences.txt", 'r')
    fp.close()
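
The same pattern often lives inside a small function. Below is a sketch of a hypothetical helper, `open_with_fallback()`, that opens a file or falls back to a second one; it also uses `except FileNotFoundError as err` to see which path was missing. The file names are made up and live in a temporary directory.

```python
import os
import tempfile

def open_with_fallback(path, fallback_path):
    """Hypothetical helper: open path for reading, or fallback_path if path is missing."""
    try:
        return open(path, 'r')
    except FileNotFoundError as err:
        # err.filename holds the path that could not be found
        print("Could not find", err.filename, "so using the fallback instead")
        return open(fallback_path, 'r')

# Made-up files in a temporary directory
tmp_dir = tempfile.mkdtemp()
missing = os.path.join(tmp_dir, "missing.txt")   # never created
backup = os.path.join(tmp_dir, "backup.txt")
out = open(backup, 'w')
out.write("backup contents\n")
out.close()

src = open_with_fallback(missing, backup)
text = src.read()
src.close()
print(text)
```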

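Naming the exception also means other errors are *not* caught. In the next cell, `x` was never defined, so `y = x + 5` raises a `NameError` before we even try to open the file, and `except FileNotFoundError` doesn't handle it.

In [None]: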
try:
    y = x + 5
    fp = open("data/not_real.txt", 'r')
    fp.close()
except FileNotFoundError:
    fp = open("data/couple_sentences.txt", 'r')
    fp.close()


You still need to be careful: once an error occurs, the rest of the code under `try` is skipped. Below, `y = 7` never runs, so `y` is never defined.

In [None]:
try:
    fp = open("data/not_real.txt", 'r')
    fp.close()
    y = 7
except FileNotFoundError:
    fp = open("data/couple_sentences.txt", 'r')
    fp.close()
y  # raises NameError, because the line y = 7 was skipped

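One way to see exactly which lines run is to record each step in a list. In this sketch, `int("not a number")` stands in for any line that fails:

```python
steps = []   # records which lines actually ran

try:
    steps.append("before")
    int("not a number")      # raises ValueError here
    steps.append("after")    # skipped: never runs
except ValueError:
    steps.append("handled")

print(steps)   # ['before', 'handled']
```
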
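Skipped code can't raise errors either, so a bug that comes after the failing line can go unnoticed. Below, `y = x + 5` would raise a `NameError`, but it never runs: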

In [None]:
try:
    fp = open("data/not_real.txt", 'r')
    fp.close()
    y = x + 5 # an error, but doesn't run!
except FileNotFoundError:
    fp = open("data/couple_sentences.txt", 'r')
    fp.close()
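
A common way to avoid this is to keep the `try` block as small as possible, so it contains only the line that can raise the exception you expect. A sketch with a hypothetical helper, `first_line_or_default()`, and made-up temporary files:

```python
import os
import tempfile

def first_line_or_default(path, default="(no file)"):
    """Hypothetical helper: return the first line of path, or default if the file is missing."""
    # Keep only the call that might raise FileNotFoundError inside try
    try:
        src = open(path, 'r')
    except FileNotFoundError:
        return default
    # This runs only if the file opened; a typo here would raise normally instead of being hidden
    first = src.readline().strip()
    src.close()
    return first

# Made-up files in a temporary directory
tmp_dir = tempfile.mkdtemp()
present = os.path.join(tmp_dir, "present.txt")
missing = os.path.join(tmp_dir, "missing.txt")   # never created

out = open(present, 'w')
out.write("hello\nworld\n")
out.close()

print(first_line_or_default(present))   # hello
print(first_line_or_default(missing))   # (no file)
```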

## Assignment Section

**Question 1.**

We will borrow a dataset from Kaggle for this exercise. I have edited it for the purposes of this course.
https://www.kaggle.com/datasets/msjahid/colorado-motor-vehicle-sales-data

For this problem, you will use `csv.reader()` to read the `"data/colorado_motor_vehicle_sales.csv"` file. From each line, extract the year (column 0), quarter (column 1), place (column 2), and sales (column 3), and build a nested dictionary called `col_dic`. The dictionary format will be:

`col_dic[year (str)] = {place (str): sales (int)}`

If the place is equal to `xxx`, skip that line. If the sales value cannot be converted to an `int`, or is 0, skip that line. There will be more than one sales value for most places and years: add the current sales value to the total already stored in the dictionary.

Hints:
1. Start by checking the skip conditions.
2. Check if year exists, then the place. If place exists, sales must exist.
3. If something doesn't exist, you know you need to add it!

Example:

`{ "2009" : {"Adams" : 387302, "Knox" : 823470}, "2010" : {"Adams" : 923740}}`





In [None]:
import csv


fp = open("data/colorado_motor_vehicle_sales.csv", 'r')
csv_reader = csv.reader(fp)
col_dic = {}
...

print(col_dic)


In [None]:
grader.check("q1")

**Question 2.** For this problem, you will create a function called `fix_list()`.
* Parameters: 1. A list.
* Returns: A new list.
* Goal: Iterate through the list, convert each item to an integer, append it to a new list, and return the new list. You are given a sample list to test the function. Only strings that raise a `ValueError` need extra conversion work: if a string has a single decimal point, convert it to a `float` first; if it has two decimal points, remove the second one and then convert. If the value is `None` or raises a `TypeError`, append `0` and move on.

In [None]:
l = ["1", "1.2", "3..2", "4.3.2", "2001", None] # should be [1, 1, 3, 4, 2001, 0] when converted to integers
def fix_list(l):
    ...




In [None]:
grader.check("q2")

**Question 3.** For this problem, you will create a function called `error_dictionary()`
* Parameters: 2. The first is a dictionary, and the second is a value that may or may not be a key in the dictionary.
* Returns: `"Found!"` if the key is in the dictionary, or `"Not Found!"` if it is not.
* Example: `error_dictionary({'name':'Alex', 'color': 'green'}, 'music')` should return `"Not Found!"` because `'music'` is not a key in the given dictionary.

In [None]:
...

In [None]:
grader.check("q3")

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()