# Python Fundamentals - Part 3

### BONUS: More About Modules

### Importing modules
This is a quick bonus lesson about importing modules. Later in this notebook we are going to use the `mean()` function from the `statistics` module again. We learned in Part 1 that we can import a module like this:

In [None]:
import statistics

In [None]:
statistics.mean([6, 4, 2, 7, 6])

This can sometimes make our function calls long, like `statistics.variance()`.

We can also import modules with a shortened **nickname** so that we don't have to type out the full module name every time we use a function:

In [None]:
import statistics as st

In [None]:
st.mean([6, 4, 2, 7, 6])

OR, if we know that we are only going to use one or two functions from a module, we can import only those functions. *When we do this, we do not have to include the module name when calling the function*:

In [None]:
from statistics import mean

In [None]:
mean([6, 4, 2, 7, 6])

In [None]:
from statistics import mean, mode

In [None]:
mean([6, 4, 2, 7, 6])

In [None]:
mode([6, 4, 2, 7, 6])
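The three import styles above all reach the same function. The small sketch below (the variable name `scores` is just for illustration) checks this with `is`, which tests whether two names point to the very same object.

```python
import statistics
import statistics as st
from statistics import mean, mode

scores = [6, 4, 2, 7, 6]

# Three ways to call the same function
print(statistics.mean(scores))  # full module name
print(st.mean(scores))          # nickname
print(mean(scores))             # imported directly, no module name needed

# All three names point to one and the same function object
print(st.mean is statistics.mean)  # True
print(mean is statistics.mean)     # True
print(mode(scores))                # 6 (the most common value)
```

Which style to use is mostly a readability choice: the full name or a nickname makes it obvious where a function comes from, while a direct import keeps calls short.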

### <br>A very quick lesson on installing modules

There are several ways to install and update Python packages, depending on whether you are working on your local computer, on a computing cluster, or on a cloud platform. One option is to do it from inside your Jupyter Notebook.<br><br>If you put `!` directly before a command in a Jupyter Notebook, the notebook sends that line to your computer's command line (the shell) instead of running it as Python. We will practice by:
- installing the `pandas` package, which is a very commonly used package for working with dataframes (it is already installed on your current system, but we will install it for practice here)
- making sure the `pandas` package is upgraded to its latest version (note that `statistics` is part of Python's standard library: it comes with Python itself, so it is not installed or upgraded with `pip`)

We will use the `pip` package manager.

In [None]:
!pip install pandas

In [None]:
!pip install pandas --upgrade

You may need to restart your notebook kernel to use installed or upgraded packages. You do not need to restart your notebook now.
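After installing or upgrading, you may want to check which version of a package is installed. One way, sketched below assuming Python 3.8 or newer, is the standard library function `importlib.metadata.version()`. The helper name `installed_version` is hypothetical, made up for this example.

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(package_name):
    """Return the installed version of a package as a string, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None

print("pandas:", installed_version("pandas"))
print("made-up package:", installed_version("not-a-real-package-xyz-123"))  # None
```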

## 3.0 Part 3: Files
### Objects in Part 3
- Files

### Functions in Part 3
- File methods

### Concepts in Part 3
- reading and writing files
- importing and installing modules

#### First, where are the files we are working with today?

**If you are working on the cloud (for example, Google Colab):** You will first need to run the code cell below. It downloads the files into your workspace.

In [None]:
!wget https://raw.githubusercontent.com/nuitrcs/python_workshops_datarepo/refs/heads/main/alice.txt
!wget https://raw.githubusercontent.com/nuitrcs/python_workshops_datarepo/refs/heads/main/dogs.txt
!wget https://raw.githubusercontent.com/nuitrcs/python_workshops_datarepo/refs/heads/main/gradebook.csv

The files should now appear in your file tree, in your current working directory. Look for `alice.txt`, `dogs.txt`, and `gradebook.csv`.
<br><br>**If you are working locally on your own computer:** The files are in the same repo as this notebook, so you should see them in your file tree. Look for `alice.txt`, `dogs.txt`, and `gradebook.csv`.

### 3.1 Reading files

We will start by working with two files: "alice.txt" and "dogs.txt". First, we store their filenames as strings. I've chosen some basic variable names:

In [None]:
alice_filename = "alice.txt"
dog_filename = "dogs.txt"

It is best practice to define all filenames (both input and output filenames) at the top of your notebook, right under where you import your packages. We are breaking that rule today.

Python has a basic way to open files, but I'm going to teach you the better way, which is how most Python programmers open files. You may someday run into a situation where you need to use the old way, so I'll show you its syntax briefly.

`f = open(filename, "r")`
<br>*`#do something with the file`*
<br>`f.close()`

This keeps the file open until you explicitly close it, tying up operating-system resources the whole time. It also leaves room for mistakes: it is easy to forget `f.close()`, and if an error happens before that line runs, the file never gets closed.

#### with/as statement: The better way to open files 

Here is the syntax to read a file. (This code isn't ready to run; it just shows the syntax.)

`with open(filename, "r") as f:`
<br>`    #save file as some other object`
<br>`    #or save part of a file`

The first line is the **with/as** statement. The `f` is a temporary variable that will store the **file object**. Just like in a for loop, you can use any valid variable name for the temporary variable, but `f` is commonly used.
<br><br>**Inside** the with/as statement, you want to save the file's contents as a different object type, such as a string or a list, that you can keep using after the file is closed.
<br><br>The file will automatically close when we **exit** the with/as statement (meaning when we exit the indentation).
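To see that the file really is closed when Python leaves the with/as block, even if an error happens partway through, here is a small sketch. It writes a throwaway file in a temporary folder (using the standard library `tempfile` module) instead of touching the lesson files, and it also shows the `try`/`finally` pattern you would need to get the same guarantee with the old `open()`/`close()` style. The filename `demo.txt` is just a placeholder.

```python
import os
import tempfile

# Make a small throwaway file so this demo does not touch alice.txt or dogs.txt
demo_filename = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_filename, "w") as out_file:
    out_file.write("line one\nline two\n")

# The old way: a try/finally block makes sure close() runs even if an error happens
old_style = open(demo_filename, "r")
try:
    first_line = old_style.readline()
finally:
    old_style.close()
print(old_style.closed)  # True

# The with/as way does the same cleanup for us, even when an error interrupts the block
try:
    with open(demo_filename, "r") as new_style:
        raise RuntimeError("pretend something went wrong while reading")
except RuntimeError:
    print("caught the error")
print(new_style.closed)  # True
```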

<br>**The open function**
<br>In this lesson we pass the `open()` function two arguments: the filename and the mode. If you leave out the mode, `open()` uses read mode (`"r"`).

Mode options:
- "r"  read
- <span style="color:red">"w"  write (wipes the file clean if it already exists)</span>
- "a"  append (add to the end of whatever is already in the file)
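Here is a minimal sketch of the difference between write mode and append mode. It uses a throwaway file in a temporary folder; the filename `modes_demo.txt` is just a placeholder.

```python
import os
import tempfile

modes_demo = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

with open(modes_demo, "w") as f:   # "w" creates the file (or wipes it if it exists)
    f.write("first\n")
with open(modes_demo, "a") as f:   # "a" adds to the end of what is already there
    f.write("second\n")
with open(modes_demo, "r") as f:
    after_append = f.read()
print(after_append)                # first, then second, on separate lines

with open(modes_demo, "w") as f:   # "w" again: the previous contents are gone
    f.write("third\n")
with open(modes_demo, "r") as f:
    after_rewrite = f.read()
print(after_rewrite)               # only third
```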


<br>**Filenames**
<br>If you are accessing a file in your current working directory, you can just include the filename, but if the file is in a different directory, you must include either the relative or absolute path.
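A quick sketch of relative and absolute paths using the standard library `os.path` module. The folder name `data` is hypothetical, used only for illustration; these functions just build and inspect path strings, so the folder does not need to exist.

```python
import os

print(os.getcwd())  # your current working directory (an absolute path)

# A relative path is interpreted starting from the current working directory
relative_path = os.path.join("data", "alice.txt")
print(relative_path)  # data/alice.txt on macOS and Linux, data\alice.txt on Windows

# An absolute path spells out the full location, starting from the root of the drive
absolute_path = os.path.abspath(relative_path)
print(absolute_path)

print(os.path.isabs(relative_path), os.path.isabs(absolute_path))  # False True
```

`os.path.join()` inserts the right separator for your operating system, which is why it is often preferred over typing slashes by hand.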

<br><br>Let's try opening the file "alice.txt" and printing it to see what it looks like. We will use the read mode:

In [None]:
with open(alice_filename, "r") as f:
    print(f)

<br>**The file object isn't directly readable**, so we need to change it into another object before exiting the with/as statement.

### 3.2 Storing a file as a string

We can use a file object method, `read()`, to read the file's contents into a string:

In [None]:
with open(alice_filename, "r") as f:
    alice_text = f.read()

We have now exited the with/as statement, so the file is closed. Let's try to access `f` again without reopening the file:

In [None]:
f.read()

We can also check the status of `f` using an object **attribute**. Some Python objects have attributes in addition to methods. Methods are functions that belong to a particular class of object and can be called only on instances of that class, while attributes are like metadata stored on each instance. We can check an attribute to view that stored metadata. Attributes are written after the object with a dot, just like methods, but they are not followed by parentheses because they are not functions.

<br>A file object has an attribute called `closed` that holds a boolean telling us whether the file is closed:

In [None]:
f.closed
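A short sketch that puts a method and a few attributes side by side. File objects also have `mode` and `name` attributes; the file here is a throwaway in a temporary folder, and `attributes_demo.txt` is just a placeholder name.

```python
import os
import tempfile

demo_filename = os.path.join(tempfile.mkdtemp(), "attributes_demo.txt")
with open(demo_filename, "w") as demo_file:
    demo_file.write("hello\n")

with open(demo_filename, "r") as demo_file:
    demo_text = demo_file.read()   # read() is a method: note the parentheses
    print(demo_file.mode)          # r      <- attribute: no parentheses
    print(demo_file.closed)        # False  (we are still inside the with block)

print(demo_file.closed)                 # True (the with block has ended)
print(demo_file.name == demo_filename)  # True
```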

<br>The object we created inside our with/as statement, `alice_text`, is now stored in memory as a string.

In [None]:
type(alice_text)

In [None]:
print(alice_text)

<br><br>The `f.read()` method stored `alice_text` as one long string. Sometimes you will want that. Other times it will be more convenient to store your text as a list of individual lines.

### 3.3 Storing a file as a list of strings (AKA list of lines)

To store the text as a list of strings, use the file method `readlines()`. *Note the `s` on the end of the method name.* This breaks the whole text up at each **new line character**.

In [None]:
with open(alice_filename, "r") as f:
    alice_list = f.readlines()

Now we can work with our file text as a list of lines.

In [None]:
type(alice_list)

In [None]:
len(alice_list)

In [None]:
for line in alice_list:
    print(line)

<br>**Question:** The `len()` function told us that the list was 7 lines long, but when we print it, it looks like there are only 4 lines. What do you think is causing that? What code could you run to test your theory?

<br><br>We can now do anything with this list that we could do with any other list:

In [None]:
for line in alice_list:
    if "Alice" in line:
        print(line)

<br><br>As a reminder, the `f` variable I've been using in the with/as statement is a temporary variable that can be anything, just like when writing a for loop. `f` is just a commonly used shorthand in with/as statements. 

In [None]:
with open(alice_filename, "r") as FN_2187:
    alice_list = FN_2187.readlines()
len(alice_list)



### <br>Exercise: Reading a file

We saved another filename as `dog_filename`. Write a with/as statement to open the file in read mode. Inside the with/as statement, save the file as **one long string** called `dog_string`. Then, outside the with/as statement, print the string.

Write a with/as statement to open the `dog_filename` file in read mode. Inside the with/as statement, save the file as **a list of lines** called `dog_list`. Then, outside the with/as statement, print the list.

<br><br>*Note: there is another file method called `readline()` (without the "s" on the end). It does something different. We will go over that function briefly later in the lesson.*

### 3.4 Writing files

*Remember that when you open a file in write mode, it will first create a new empty file. If you already have a file with the same name, it will empty that file.*

Let's work with our `alice_list`:

In [None]:
for line in alice_list:
    print(line)

<br>Let's open a new file and write the Alice text without those extra empty new lines.

First, we'll save the filename we want for our new file as a string. I often find it helpful to follow a naming pattern for filename variables so I can easily tell input files from output files. `_in`, `_out`, `_input`, and `_output` are some common tags, but there is no official Python convention for this.

In [None]:
alice_out = "alice_clean.txt"

Now we will open this output file in **write mode** using a with/as statement. Inside that statement, we will loop through each line of the `alice_list`. **If** the line contains more than just the new line character, we will **write** that line to our file. To write, we use the file method `write()`.

In [None]:
with open(alice_out, "w") as f:
    for line in alice_list:
        if line != "\n":
            f.write(line)

Wait a few seconds and the new file will show up in your file tree.
<br><br>To check the file, we can open it in read mode. We will just print the file inside the with/as statement without even saving it as a string or list:

In [None]:
with open(alice_out, "r") as f:
    print(f.read())
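One detail worth knowing before the writing exercises: `write()` writes exactly the string you give it and does not add a new line character for you. It also returns the number of characters it wrote. A minimal sketch, using a throwaway file with the placeholder name `write_demo.txt`:

```python
import os
import tempfile

out_filename = os.path.join(tempfile.mkdtemp(), "write_demo.txt")

with open(out_filename, "w") as f:
    f.write("no newline here")
    f.write("so this runs on\n")             # write() adds nothing for you
    chars_written = f.write("own line\n")    # returns how many characters were written

with open(out_filename, "r") as f:
    written_text = f.read()
print(written_text)
print(chars_written)  # 9
```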

### <br><br>3.5 Turning a file into a clean list of lines

Let's read in the dog file and see what it looks like:

In [None]:
with open(dog_filename, "r") as f:
    print(f.read())

<br>We learned that we can save this text as a list:

In [None]:
with open(dog_filename, "r") as f:
    dog_list = f.readlines()

In [None]:
print(dog_list)

<br>Each item in the list is a string. Most lines end in a new line character, which we would like to remove.

We can combine three things: what we learned today about opening files, what we learned yesterday about making new lists in a for loop, and what we learned on Day 1 about string functions.

First, make an empty list. Then, inside the with/as statement, loop through the lines in the file and append each one to the empty list. You also need to use a string function to remove the new line characters:

In [None]:
dog_list = []
with open(dog_filename, "r") as f:
    for line in f.readlines():
        dog_list.append(line.rstrip("\n"))
print(dog_list)

<br>A clean list of dogs!
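Another way to get the same clean list is to read the file as one long string and split it with the string method `splitlines()`, which drops the line-ending characters. The sketch below compares both approaches on a tiny stand-in file; its contents are made up and are not the real `dogs.txt`.

```python
import os
import tempfile

# A tiny stand-in for dogs.txt (made-up contents)
sample_filename = os.path.join(tempfile.mkdtemp(), "sample_dogs.txt")
with open(sample_filename, "w") as f:
    f.write("Beagle\nBoston Terrier\nBasset Hound")  # no "\n" after the last line

with open(sample_filename, "r") as f:
    raw_lines = f.readlines()
print(raw_lines)  # ['Beagle\n', 'Boston Terrier\n', 'Basset Hound']

# Option 1: strip the new line character from each line, as in the cell above
stripped = []
for line in raw_lines:
    stripped.append(line.rstrip("\n"))

# Option 2: read one long string, then split it into lines
with open(sample_filename, "r") as f:
    split_lines = f.read().splitlines()

print(stripped == split_lines)  # True
```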

### <br><br>Exercise: Cleaning files as we read them

Make a clean list of dogs from the `dog_filename` file that only includes dogs with the word "terrier" in their names. I've pasted the code we wrote to make a list of all dogs. You need to edit this code to add an if statement inside the for loop so that only the terriers are appended. *Bonus: while you're at it, make the dog names all lowercase.*

In [None]:
dog_list = []
with open(dog_filename, "r") as f:
    for line in f.readlines():
        dog_list.append(line.rstrip("\n"))
print(dog_list)

### <br><br>Exercise: Writing to a file

Let's write our new dog_list (which should only contain terriers) to a file. First, check the list to confirm that it looks okay.

In [None]:
dog_list

Now create the variable `terrier_out` and save the filename you want to use for your list of terriers.

In [None]:
terrier_out = "" #Enter your filename between the quotation marks.

Use a with/as statement to open the `terrier_out` file in **write mode**. Inside the with/as statement, you need to loop through the `dog_list` and write each item to `f`. You will also need to write a new line character after each dog.

### <br><br>3.6 Turning a file into a dictionary

Store the filename that we will be working with in the next few examples:

In [None]:
gradebook_in = "gradebook.csv"

We'll work through this example together. Let's open the gradebook file and see what it looks like:

In [None]:
with open(gradebook_in, "r") as f:
    print(f.read())

<br>**<span style="color:crimson">LOGIC</span>** **Our end goal is to have a dictionary with the student's name as the key and a list of their grades as the value.**

<br>Ok, first let's store it as a list, but we want to leave out the first line of headers. `f.readlines()` returns a list of lines, and we can slice a list, so let's keep all the lines except the first one:

In [None]:
with open(gradebook_in, "r") as f:
    gradebook = f.readlines()[1:]

Now let's view our new list:

In [None]:
for line in gradebook:
    print(line)

We can see that there are new line characters at the end of each line (because it is printing extra empty lines between the lines). Let's make a note of that.

<br>We can apply what we know about lists and strings to make a list of what we need to code:
- make an empty dictionary
- loop through the gradebook list
- remove the new line characters from the end
- split the line on the commas
- separate the first item to be the key
- store the rest of the items as a list
- assign the key:value pairs to our dictionary

I've put **comments** in the code to remind us of what we need to do. Comments start with a `#` symbol and are ignored by the computer.

In [None]:
#make an empty dictionary
#loop through the gradebook list
#remove the new line characters from the end
#split the line on the commas
#separate the first item to be the key
#store the rest of the items as a list
#assign the key:value pairs to our dictionary

In [None]:
print(grade_dict)

<br>**Question:** We wrote out all the steps and wrote the code with one line per step. This code could be condensed into fewer lines or left how it is - as explicit as possible. **Can you think of ways that the code could be condensed to fewer lines?**
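Splitting on commas works for this gradebook, but it would break if a field itself contained a comma, for example a name written as `"Smith, Jo"` in quotes. The standard library `csv` module understands quoted fields. Here is a minimal sketch on a made-up stand-in file; the column names and students are assumptions, not the contents of the real `gradebook.csv`.

```python
import csv
import os
import tempfile

# A made-up stand-in for gradebook.csv
sample_csv = os.path.join(tempfile.mkdtemp(), "sample_gradebook.csv")
with open(sample_csv, "w", newline="") as f:
    f.write("name,hw1,hw2,hw3\n")
    f.write("Mary,10,7,9\n")
    f.write('"Smith, Jo",8,9,10\n')   # a name that contains a comma

# Splitting by hand breaks the quoted name into two pieces
naive = '"Smith, Jo",8,9,10'.split(",")
print(naive)  # ['"Smith', ' Jo"', '8', '9', '10']

sample_grades = {}
with open(sample_csv, "r", newline="") as f:
    reader = csv.reader(f)
    next(reader)                      # skip the header row
    for row in reader:                # each row is already a list of fields
        sample_grades[row[0]] = row[1:]

print(sample_grades)  # {'Mary': ['10', '7', '9'], 'Smith, Jo': ['8', '9', '10']}
```

Like `split()`, `csv.reader` gives back every field as a string, so grades still need `int()` or `float()` if you want to do math with them.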

### <br><br>Exercise: Reformatting data as we write it to a file

**<span style="color:crimson">LOGIC</span>** We just created a dictionary, `grade_dict`. The first three items in each value are homework grades. **Write a new file** that includes one complete sentence for each student in the dictionary. The sentence should include the student's name and their three homework grades. A sample sentence is "Mary's homework grades are 10, 7, and 9."

<br>You will need to:
- store a variable that contains the filename of the new file you want to create (I recommend you use the extension ".txt".)
- write a with/as statement to open the new file in write mode
- loop through both the keys and values in the key:value pairs in the `grade_dict` dictionary
- store a string that composes a sentence that uses the key (student's name) and indexes each of the first three items in the value
- add a new line character to the end of the string so that each student will have their own line in the file
- use `f.write()` to write the string to the `f` file object.

### <br><br>3.7 Reading files line by line - demo

Sometimes you might be working with a very large file, with millions of lines, and you don't want to read it all into memory as a string or list.

There is a file method, `readline()`, that reads in only one line at a time. **I don't expect you to practice this method here, but I will give an example, so that you know it exists if you ever need to look it up.**

Let's imagine that there are millions of types of dogs (if only!) and our dogs.txt file is millions of lines long. We can use `readline()` to go through it and store only the dogs that we need for this notebook or script. This doesn't work the same way as `readlines()`: `readlines()` returns a list of every line in the file, while each call to `readline()` returns just the next line as a string (and an empty string once it reaches the end of the file). To keep reading until the end of the file, we need a while loop, which is something we aren't learning this week:

In [None]:
dog_file = "dogs.txt"

In [None]:
hounds = []
with open(dog_file, "r") as f:
    line = f.readline()
    while line:
        if "hound" in line:
            hounds.append(line.rstrip("\n").lower())
        line = f.readline()

In [None]:
print(hounds)
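File objects can also be looped over directly with a for loop, which hands back one line per iteration without reading the whole file into memory. In practice this is often simpler than `readline()` plus a while loop. The sketch below runs both versions on a tiny made-up stand-in file (not the real `dogs.txt`) and checks that they agree.

```python
import os
import tempfile

# A tiny made-up stand-in for dogs.txt
sample_filename = os.path.join(tempfile.mkdtemp(), "sample_dogs.txt")
with open(sample_filename, "w") as f:
    f.write("Beagle\nBasset hound\nBoston Terrier\nAfghan hound\n")

# Version 1: loop over the file object directly, one line at a time
hounds_for = []
with open(sample_filename, "r") as f:
    for line in f:
        if "hound" in line:
            hounds_for.append(line.rstrip("\n").lower())

# Version 2: readline() with a while loop, as in the cell above
hounds_while = []
with open(sample_filename, "r") as f:
    line = f.readline()
    while line:                       # readline() returns "" at the end of the file
        if "hound" in line:
            hounds_while.append(line.rstrip("\n").lower())
        line = f.readline()

print(hounds_for)                  # ['basset hound', 'afghan hound']
print(hounds_for == hounds_while)  # True
```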