# Managing file paths with pathlib

<br>*Reminder: click in a code cell and type shift+return (shift+enter on a PC) to run a code cell.*

<br>To **read** and **write** files, we have to give the computer a **file path** so the computer knows: 
- where to get or put the file
- what the file is called or what it should be called
- what the file extension is or should be

If the file to read is in our **current working directory** (or if we want to write the file to our **current working directory**), our file path can just be the file name and extension, for example `my_file.txt`. When working in a Jupyter Notebook like this, the current working directory is the directory where this notebook is saved (though you can change the working directory in your code if you need to).
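
As a rough illustration of the paragraph above, the sketch below uses the standard-library `os` module to show the current working directory and how a bare file name is resolved against it. The name `my_file.txt` is only a placeholder; the file does not need to exist for this to run.

```python
import os

# The directory Python treats as "here" when it sees a relative file name.
cwd = os.getcwd()
print(cwd)

# A bare file name is interpreted relative to the current working directory.
file_name = "my_file.txt"  # placeholder name, the file need not exist
full_path = os.path.abspath(file_name)
print(full_path)
```

What gets printed depends on where your notebook is saved, so your output will differ from your neighbor's.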

### <br><br>Saving the file path as a string
You can save a file path as a string.

In [None]:
PC_filepath = "sample_data\\anscombe.json"  # doubled backslash, so Python does not read \a as an escape character
Mac_filepath = "sample_data/anscombe.json"

<br>Run the cell above to save the file path as a string using both PC and Mac formatting. Then run the two chunks below. **Which code chunk works for you? Let us know in the Zoom chat.**

**Code chunk 1: PC file path**

In [None]:
with open(PC_filepath, "r") as f:
    print(f.read())

<br>**Code chunk 2: Mac and Linux file path**

In [None]:
with open(Mac_filepath, "r") as f:
    print(f.read())

<br>You never know when you'll need to run your code on a different OS, or someone who inherits your code may have a different OS.

#### <br><br>Exercise 1.

Run the cell below to store this file path as a string. It uses PC formatting of backslashes instead of forward slashes.    *Note: PC users also often put spaces in their folder and file names; spaces are a real pain when you work on the command line - stop using them now, please!*

In [None]:
PC_example = "~\\data\\experiments\\October 05 22\\first run\\results.txt"  # each \\ is stored as one literal backslash

<br> Using what you know about string formatting, write code to change the backslashes to forward slashes in the `PC_example` file path so that the code will work on a Mac.

<br>Run the cell below to store this Mac file path as a string. Macs allow you to use a backslash in the file name.

In [None]:
Mac_example = "~/data/experiments/October_5\\6_2022/first_run/results.txt"  # \\ is stored as one literal backslash

<br> Write code to make `Mac_example` PC friendly. You'll need to replace the forward slashes with backslashes and you'll need to add the Python escape character in front of the existing backslash so that a PC won't interpret it as a new directory.

### <br><br><br>The Path object
A file path can be saved as a string, but it isn't just a string. It has unique properties that make it special - it's an object on its own. The **pathlib** module contains a Python **object** called **Path** that has its own **attributes** and **method functions** that are unique to the Path object. 

<br>*Note: Other kinds of data can be stored as strings but have earned their own status as unique objects, like dates and times in the datetime module or gene sequences in the Biopython package. If you have special objects in your research, you can even define your own object classes - we'll cover that later this quarter.*

<br>There are other modules, such as `os.path`, that offer some of the features of pathlib, but pathlib is generally the most convenient choice for working with files and file names.

<br><br>**pathlib** is part of Python's standard library, so you don't have to **install** it. This means it is included with every installation of Python, but it isn't automatically loaded into your notebook - you need to **import** it first. To avoid having to type pathlib.Path every time we want to use the Path object, we're going to do what most Python coders do, and import only the Path object:

In [None]:
from pathlib import Path

<br><br>Before we start using the Path object, there's one more Python trick you should learn. Consider this a bonus lesson! 

#### raw string literal

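A raw string literal is a string in which the backslash is *not interpreted as an escape character*, so a sequence like `\n` stays as two ordinary characters instead of becoming a new line.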

**Regular string**

In [None]:
print("This is line 1.\nThis is line 2.")

**Raw string literal**

In [None]:
print(r"This is line 1.\nThis is line 2.")

You tell Python that you're making a raw string literal by putting a lower case `r` directly in front of the opening quotation mark of your string. 
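
One way to see the difference for yourself is to count the characters in each kind of string. The variable names below are made up for this small sketch.

```python
regular = "\n"   # one character: a newline
raw = r"\n"      # two characters: a backslash followed by the letter n

print(len(regular))  # 1
print(len(raw))      # 2

# A raw string is the same as a regular string with the backslash doubled.
print(raw == "\\n")  # True
```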

<br><br>The raw string literal comes in handy when writing PC file paths, so that you don't have to worry about those backslashes, which Python interprets as an escape character.

**Regular string**

In [None]:
print("~\data\newspaper_articles\tribune\02031918.pdf")

**Raw string literal**

In [None]:
print(r"~\data\newspaper_articles\tribune\02031918.pdf")

### <br><br><br> Creating a Path from a string
Let's try to read the anscombe.json file again with a Path object instead of a string. The `anscombe.json` file is in the `sample_data` folder. To create a Path object, we will pass a string to the Path function. **Use forward slashes in your string, even if you are on a PC.** If you use forward slashes with the Path function, your path will work on any OS, including your PC. 
<br><br>Don't think of it as "using Mac formatting". If you get in the habit of always using Path in all your Python code (recommended) and using forward slashes, you can think of it just as "using Python formatting". In reality, it's a format called POSIX. From Wikipedia: "The Portable Operating System Interface (POSIX) is a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems."

In [None]:
anscombe_path = Path("sample_data/anscombe.json")
anscombe_path

Run the cell above. What did it return to the screen? Copy what it returned and paste it into the Zoom chat.

<br>If you're on a PC, run this code below. (Notice we're using the raw string literal.) What does this one return to the screen? Copy what it returns and paste it into the Zoom chat.

In [None]:
anscombe_PC_path = Path(r"sample_data\anscombe.json")
anscombe_PC_path
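
If you only have one operating system available, pathlib's `PurePosixPath` and `PureWindowsPath` classes offer a way to preview how each OS would treat the same string. They only apply the naming rules and never touch the file system, so the sketch below should give the same results on any computer. The variable names are made up for this example.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Windows rules: both kinds of slash count as separators.
win_forward = PureWindowsPath("sample_data/anscombe.json")
win_backward = PureWindowsPath(r"sample_data\anscombe.json")
print(win_forward == win_backward)   # True
print(win_forward.parts)             # ('sample_data', 'anscombe.json')

# Mac/Linux (POSIX) rules: only the forward slash is a separator.
posix_forward = PurePosixPath("sample_data/anscombe.json")
posix_backward = PurePosixPath(r"sample_data\anscombe.json")
print(posix_forward.parts)    # ('sample_data', 'anscombe.json')
print(posix_backward.parts)   # ('sample_data\\anscombe.json',)
```

Under POSIX rules the backslash version is treated as a single odd file name, which is why the forward slash version is the safer habit.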

<br>Run the code cell below to view the json object.

In [None]:
with open(anscombe_path, "r") as f:
    print(f.read())
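
Path objects also carry their own file-reading methods, so the built-in `open()` is not the only option. The sketch below is a minimal illustration that writes a throwaway file in a temporary directory, so it does not depend on `sample_data`; the file name `demo.txt` is made up.

```python
import tempfile
from pathlib import Path

# Work in a throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo.txt"   # hypothetical file name

    # write_text() creates the file and writes the string to it.
    demo_path.write_text("hello from pathlib\n")

    # Path.open() works like the built-in open(), called on the path itself.
    with demo_path.open("r") as f:
        first_line = f.readline()

    # read_text() opens, reads, and closes the file in one step.
    contents = demo_path.read_text()

print(first_line == contents)  # True
```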

#### <br><br>Exercise 2.
There is a csv file in the sample_data folder called `california_housing_test.csv`. Write code to create a Path object for this file. 

In [None]:
cali_path = 

<br>If you created the Path object correctly, the following code should print out the first line of the file. We don't want to print the whole file because it is 3,000 lines long.

In [None]:
with open(cali_path, "r") as f:
    print(f.readline())

### <br><br>Creating a path by joining together multiple parts
We've learned how to solve Problem One from our PowerPoint (Moving code between Operating Systems). Now let's learn how to solve Problem Two (Building file paths to scale up our code). Let's look at the example from the slides and try to handle it in code.
<br><br>pathlib has an easy way to combine different pieces into one path: you **use a forward slash to combine Path objects**.

In [None]:
new_path = Path("/Users/Colby/Documents/Research/Data") / Path("Survey1.txt")
new_path

<br>Now we want to create a path for each of the three Survey files.

In [None]:
survey_files = ["Survey1", "Survey2", "Survey3"]
for i in survey_files:
    new_path = Path("/Users/Colby/Documents/Research/Data") / Path(i + ".txt")
    print(new_path)
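
As a side note, the piece on the right of the `/` does not have to be a Path object: as long as one side is a Path, the other side can be a plain string. The sketch below uses made-up variable names to compare the two spellings.

```python
from pathlib import Path

data_dir = Path("/Users/Colby/Documents/Research/Data")

# The right-hand side of / can be a plain string...
with_string = data_dir / "Survey1.txt"
# ...or another Path object; both give the same result.
with_path = data_dir / Path("Survey1.txt")
print(with_string == with_path)  # True

# Several pieces can be chained in one expression.
chained = Path("/Users") / "Colby" / "Documents" / "Research" / "Data" / "Survey1.txt"
print(chained == with_string)  # True
```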

<br><br>The Path object also has a method function called `joinpath()`. It is a Path method function, so it always has to go after a Path object.

In [None]:
new_path = Path("/Users/Colby/Documents/Research/Data").joinpath("Survey1.txt")
new_path

In [None]:
survey_files = ["Survey1", "Survey2", "Survey3"]
for i in survey_files:
    new_path = Path("/Users/Colby/Documents/Research/Data").joinpath(i + ".txt")
    print(new_path)

<br>`.joinpath()` can also join together multiple pieces of the path. You pass the multiple arguments to the `.joinpath()` method function, separated by commas. The path will be joined in the order given. You still have to start with a Path object because it is a method function.

In [None]:
survey_files = ["Survey1", "Survey2", "Survey3"]
for i in survey_files:
    new_path = Path("/").joinpath("Users", "Colby", "Documents", "Research", "Data", i + ".txt")
    print(new_path)
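
Printing each path is fine for checking your work, but in real code you will usually want to keep the paths so you can open the files later. One possible approach, sketched here with made-up variable names, is to collect them in a list.

```python
from pathlib import Path

data_dir = Path("/Users/Colby/Documents/Research/Data")
survey_files = ["Survey1", "Survey2", "Survey3"]

# Build every path once and keep them in a list for later use.
survey_paths = [data_dir.joinpath(name + ".txt") for name in survey_files]

for path in survey_paths:
    print(path.name)
```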

#### <br><br>Exercise 3.
Let's try to solve the second example. Look at the diagram and fill in the blank on the PowerPoint slide. This time all our files are called Survey.txt, but they are each within a data folder inside a survey folder. There are three Survey folders - I've started the loop for you. You can use `/` or `joinpath`.

In [None]:
survey_folders = ["Survey1", "Survey2", "Survey3"]
for i in survey_folders:
    

### <br><br>Path attributes
Attributes hold data specific to each object. Like methods, they follow the object, but they do not use parentheses because they are not functions. Path attributes can return the different pieces of a Path. The `.name` attribute gives us the file name with the extension:

In [None]:
my_path = Path("/Users/Colby/Documents/Research/Data/Survey1.txt")

In [None]:
my_path.name

<br>Try out these attributes on the `my_path` object to see what they do:
- .stem
- .anchor
- .suffix
- .parent
- .parent.parent
- .parts
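
Once you have tried the attributes yourself, it can help to see how they relate to each other. The sketch below uses a different, made-up path and `PurePosixPath`, which applies Mac/Linux rules on any OS so the results are the same everywhere. It also uses `.suffixes`, an extra attribute not in the list above.

```python
from pathlib import PurePosixPath

report = PurePosixPath("/projects/demo/report.tar.gz")  # made-up path

# name is always stem + suffix (suffix is only the LAST extension).
print(report.stem)     # report.tar
print(report.suffix)   # .gz
print(report.stem + report.suffix == report.name)  # True

# parent / name rebuilds the original path.
print(report.parent / report.name == report)  # True

# suffixes lists every extension when a file has more than one.
print(report.suffixes)  # ['.tar', '.gz']
```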

### <br><br>Path methods
The Path object has many other attributes and methods. From within Python, you can open files and also move, rename, and create files and directories in your file system, without having to leave Python and use the command line. Check out the documentation for pathlib here: https://docs.python.org/3/library/pathlib.html
<br><br>Here are a few Path methods that might be of interest:

<br>`.cwd()` will return the file path to your current working directory.

In [None]:
Path.cwd()

<br>`.home()` will return the file path to your home directory.

In [None]:
Path.home()

<br>We can use these methods when building file paths, either using the forward slash technique or `.joinpath()`. To create an absolute path to the anscombe.json file, I can do:

In [None]:
Path.cwd() / Path("sample_data/anscombe.json")

OR

In [None]:
Path.cwd().joinpath("sample_data", "anscombe.json")
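
To check whether a path is relative or absolute, Path objects have an `.is_absolute()` method. The short sketch below, with made-up variable names, compares the relative path with the one built from `Path.cwd()`. Nothing here needs the file to exist.

```python
from pathlib import Path

relative = Path("sample_data/anscombe.json")
absolute = Path.cwd() / relative

print(relative.is_absolute())  # False
print(absolute.is_absolute())  # True

# Both techniques build the same absolute path.
print(absolute == Path.cwd().joinpath("sample_data", "anscombe.json"))  # True
```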

<br>`.exists()` will tell us if a path exists in our file system or not.
<br><br>We know that `sample_data/anscombe.json` exists for all of us, and `/Users/Colby/Documents/Research/Data/Survey1.txt` is a made-up path that does not exist for any of us. Let's test this.

In [None]:
Path("sample_data/anscombe.json").exists()

In [None]:
Path("/Users/Colby/Documents/Research/Data/Survey1.txt").exists()
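
`.exists()` returns `True` for folders as well as files. If you need to tell them apart, `.is_file()` and `.is_dir()` are related checks. The sketch below is a small self-contained illustration that creates its own temporary folder, so the file names in it are made up.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    folder = Path(tmp_dir)
    real_file = folder / "present.txt"     # hypothetical file names
    missing_file = folder / "absent.txt"
    real_file.write_text("some data")

    # Run every check while the temporary folder still exists.
    checks = {
        "folder exists": folder.exists(),
        "folder is a directory": folder.is_dir(),
        "file exists": real_file.exists(),
        "file is a file": real_file.is_file(),
        "missing file exists": missing_file.exists(),
    }

for label, result in checks.items():
    print(label, result)
```

Only the last check comes back `False`, because `absent.txt` was never created.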

#### <br><br>Exercise 5.
Use Path.cwd() to create an absolute file path to the california_housing_test.csv file. Then check to see if that file exists on your laptop or on Google Colab.

<br><br><br>`.with_stem()`, `.with_name()`, and `.with_suffix()`

These methods return a new Path object with the stem, name, or suffix changed; the original Path object stays the same. (`.with_stem()` requires Python 3.9 or newer.)

In [None]:
base_path = Path("/Users/Colby/Documents/Research/Data/some_file.txt")
survey_files = ["Survey1", "Survey2", "Survey3"]
for i in survey_files:
    print(base_path.with_stem(i))
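
The other two methods work the same way. The sketch below uses `PurePosixPath` so the printed paths look the same on any OS; with a regular `Path` on a PC you would see backslashes instead. The replacement names are made up.

```python
from pathlib import PurePosixPath

base_path = PurePosixPath("/Users/Colby/Documents/Research/Data/some_file.txt")

# with_suffix() swaps only the extension.
print(base_path.with_suffix(".csv"))       # /Users/Colby/Documents/Research/Data/some_file.csv

# with_name() swaps the whole file name, extension included.
print(base_path.with_name("Survey1.csv"))  # /Users/Colby/Documents/Research/Data/Survey1.csv

# The original path is unchanged: these methods return new objects.
print(base_path.name)                      # some_file.txt
```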