In [1]:
from IPython.display import HTML
css_folder = '.folder {color: #3182bd; font-weight: bold;}'
css_md = '.jp-MarkdownOutput {width: 50%; margin-right: 20%; line-height: 1.75rem; font-size: 1rem;}'
HTML('<style>' + css_folder + css_md + '</style>')

In [None]:
# Don't forget to run this cell!

import os
import pandas as pd

# Understanding Relative Filepaths in Jupyter

When you open a Jupyter Notebook, Jupyter creates a kernel running from the folder that contains the notebook. From then on, it will assume that every file you need for your analysis is somewhere inside that folder, which is known as the **current working directory** (cwd).

The `os` library gives us the ability to check where Jupyter believes that folder is:

In [None]:
print(os.getcwd()) # if you run this code, Python will print the folder that contains the notebook you're reading right now!

***

## How does this impact how I load datasets?

Python will assume that any path you give it is **relative** to the current working directory.

To demonstrate, let's create a string representing a **relative path**, pointing to the Titanic training dataset:

In [None]:
filepath = 'data/titanic_train.csv' # this is the location of the file we want to load, relative to the CWD

If we pass this string to `pd.read_csv()`, Python should load the Titanic dataset, as long as it can find a folder called `"data"` and a file called `"titanic_train.csv"` inside it, all contained within the current working directory.

In [None]:
pd.read_csv(filepath).head() # our relative path

## How does this work?

Let's look at where Python assumed our dataset was, as an **absolute path**.

- An absolute path is one that points to the *exact* location of a folder or file on our system, starting from the root of the disk.
- If you're working on Windows, an absolute path will probably begin with "C:\\".


All we have to do is paste the current working directory path and the relative path we used above together, with an extra `/` in the middle:

In [None]:
filepath_with_cwd = os.getcwd() + '/' + filepath
print(filepath_with_cwd)

We can prove that this is what Python is doing with relative paths by loading the dataset using our new absolute path:

In [None]:
pd.read_csv(filepath_with_cwd).head() # an absolute path we created with os.getcwd() and our relative path

Both uses of `pd.read_csv()` in this notebook so far should have exactly the same output.
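The same comparison can be made without gluing strings together by hand. The cell below is a minimal sketch, assuming the standard-library functions `os.path.join()` and `os.path.abspath()`, which are not used elsewhere in this notebook. It builds the absolute path both ways and checks that they point to the same place. It only works with the path text, so it runs even if the file doesn't exist.

```python
import os

filepath = 'data/titanic_train.csv'  # the same relative path used above

# os.path.join inserts the correct separator for your operating system
joined = os.path.join(os.getcwd(), filepath)

# os.path.abspath resolves a relative path against the current working directory
resolved = os.path.abspath(filepath)

print(joined)
print(resolved)

# normpath tidies up mixed separators so the two strings can be compared fairly
same_location = os.path.normpath(joined) == resolved
print(same_location)
```

Using `os.path.join()` instead of `+ '/' +` is a design choice rather than a requirement: it saves you from having to think about which slash your operating system prefers.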

***

## Where might I go wrong?

Beginners will often assume that you need to give a more specific path than is actually necessary. For example:

- The 'Documents' folder has special significance for Windows users and you're probably storing your project notebooks and data somewhere inside it, so you might intuitively expect your path needs to begin with `"Documents/"`. 

- Similarly, if you use the 'Copy Path' feature in Jupyter Lab's file browser, you'll often get a longer path than you need.

### An Example

Let's see what happens when you use a file path that includes too much information. We'll create a path that won't work and store it as `broken_filepath`:

In [None]:
# an example of where you might keep this project's data in your Username's `Documents` folder:
broken_filepath = 'Documents/projects/titanic/data/titanic_train.csv'

- Remember, Jupyter assumes you want to run your code from the directory the notebook is located in, so it sets that folder as the **current working directory**.
- Therefore, Python will assume that any paths you give it are relative to that folder.

#### Why it breaks

Let's use string concatenation again to show what Python is doing:

In [None]:
broken_filepath_with_cwd = os.getcwd() + '/' + broken_filepath

print(broken_filepath_with_cwd)

- Notice that there's some duplication in this output - `titanic/` appears twice - once from the working directory path, and once from the path we gave. This is a good clue we've done something wrong.

- If we try to use this `broken_filepath` in `pandas.read_csv()` we get an error, because it will assume it should look for that path inside the current working directory.

#### Error messages

Let's try running `pd.read_csv()` with our broken file path. 

In [None]:
pd.read_csv(broken_filepath) # this will throw an error

- Python gave us a lot more information in the error message than we actually need to understand what went wrong
- Try not to be intimidated
    - But, if you are, remember that it's **both ok and normal!**
    - Error messages are rarely written for us users
    - Instead, they're for the developers who wrote the program
- Just scroll to the last line of the error message.
- At the last line, it should say: `FileNotFoundError: [Errno 2] No such file or directory:`, followed by the path we gave Python.

This is telling us that `'Documents/projects/titanic/data/titanic_train.csv'` could not be found inside the current working directory!
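If the full traceback is too noisy, you can catch the error and print only the parts that matter. This is a hedged sketch rather than a recommended pattern: it uses Python's built-in `open()` so that it runs without any extra libraries, on the understanding that `pd.read_csv()` raises the same `FileNotFoundError` for a missing local file. The `message` variable is a name made up for this illustration.

```python
import os

broken_filepath = 'Documents/projects/titanic/data/titanic_train.csv'

message = None  # stays None if the file is unexpectedly found

try:
    # open() raises FileNotFoundError for a missing file, like pd.read_csv()
    with open(broken_filepath) as f:
        first_line = f.readline()
except FileNotFoundError:
    # Show the absolute location Python actually looked in
    message = 'Could not find: ' + os.path.abspath(broken_filepath)
    print(message)
    print('Current working directory: ' + os.getcwd())
```

Printing the absolute path alongside the current working directory makes any duplicated folder names easy to spot.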

***

## How to fix errors

Now let's imagine you're working on a project at work, and you got the `FileNotFoundError: [Errno 2] No such file or directory:` error. 

How do we fix this?

Some simple checks will normally help you understand the problem:

1. Check where your data are stored! Are the files somewhere inside your current working directory, or elsewhere?

2. Use `os.getcwd()` to check where your notebook is running from (the **current working directory**)

3. Is the path you gave Python **relative** to your current working directory?

4. Have you spelled all components of the path correctly? Look out for spaces and special characters! Python is also case-sensitive, remember, and file paths can be too, depending on your operating system.

5. Try using string concatenation like we used above - e.g. `os.getcwd() + '/' + your_path`. Look for duplication in the new path, and try loading your dataset with it.
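Several of these checks can be rolled into one small helper. The function below is a sketch under stated assumptions: `diagnose_path` is a hypothetical name invented for this notebook (it is not part of `os` or `pandas`), and it is only meant for relative paths. It walks along the path one component at a time and reports the first place where something is missing.

```python
import os

def diagnose_path(path):
    """Report, for each step along a relative path, whether it exists.

    Hypothetical helper for illustration; intended for relative paths only.
    """
    current = os.getcwd()
    report = []
    # Accept both kinds of slash so Windows-style paths can be checked too
    for part in path.replace('\\', '/').split('/'):
        current = os.path.join(current, part)
        report.append((part, os.path.exists(current)))
    return report

for part, found in diagnose_path('data/titanic_train.csv'):
    print(part + ': ' + ('found' if found else 'MISSING'))
```

The first component marked `MISSING` is usually where the typo, or the unnecessary extra folder, is hiding.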


***

## Avoiding the problem: some good practices for project organisation

### Structure

Keep each project in its own folder, and store your data for each project in either the project's folder, or in a subfolder of that folder - ideally called `data`.

This not only limits your chances of getting this error, but it also makes it much easier to share your work with colleagues who might want to run your code too!

For example, your Documents folder could look like this (folders shown in <span class='folder'>blue</span>):


- <span class='folder'>Documents</span>
    - <span class='folder'>data-projects</span>
        - <span class='folder'>2020-customers</span>
            - 2020-customers-eda.ipynb
            - 2020-customers-model.ipynb
            - <span class='folder'>data</span>
                - customers.csv
        - <span class='folder'>2021-competitors</span>
            - 2021-competitors-eda.ipynb
            - 2021-competitors-report.ipynb
            - <span class='folder'>data</span>
                - competitors.csv
                
### Names

For naming files and folders, you'll need to follow any standards your workplace requires of you first and foremost. General good practice, however, includes:

- Avoid spaces - hyphens `-` and underscores `_` are normally better
- Avoid characters that are not a hyphen, an underscore, a number, or a letter
- The more you keep to the ASCII character set (https://en.wikipedia.org/wiki/ASCII), the fewer problems you'll have
- If you need to include a date, put it at the start of the name and use the YYYY-MM-DD format (ISO 8601 - https://en.wikipedia.org/wiki/ISO_8601)

I also like to stick exclusively to lower case letters because:

- it makes paths easier to type (no extra mental effort to check spelling, and fewer key presses!) 
- it's far easier to use lower case consistently, than to establish an organisation-level or team-level standard for mixing lower and upper case.
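These naming rules are simple enough to check automatically. The cell below is a rough sketch, not an official standard: `SAFE_NAME` and `is_safe_name` are names invented here, and the pattern also bakes in the lower-case-only preference described above, which is a personal choice. It accepts lower-case letters, digits, hyphens and underscores, plus an optional file extension.

```python
import re

# Lower-case letters, digits, hyphens and underscores,
# optionally followed by an extension such as ".csv"
SAFE_NAME = re.compile(r'[a-z0-9_-]+(\.[a-z0-9]+)?')

def is_safe_name(name):
    """Return True if a file or folder name follows the rules above."""
    return SAFE_NAME.fullmatch(name) is not None

for name in ['2021-competitors-eda.ipynb', 'My Data (final).csv', 'customers.csv']:
    print(name, is_safe_name(name))
```

Only the second name is rejected, because it contains capital letters, spaces, and brackets.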

***

## Exercises

### 1. Load the Titanic test dataset

Use a **relative path** to load the *titanic_test.csv* dataset contained within the *data/* folder included with this notebook. You'll need `pd.read_csv()` to do it!

In [None]:
# Write your code here!

<details>
    <summary>Click here for the answer!</summary>
    <pre>pd.read_csv("data/titanic_test.csv")</pre>
</details>

### 2. Quiz

Try to answer the following questions. Click on the arrows to reveal the answers!

<details>
    <summary><code>"C:\Users\MyUsername\Documents\sales.csv"</code> - <strong>Is this a relative or absolute path? Why?</strong></summary>
    <p>Absolute - it gives the exact location of the file, starting with the disk name</p>
</details>

<details>
    <summary><code>"Documents/data-analysis/expenditure.csv"</code> - <strong>Is this a relative or absolute path? Will it work?</strong></summary>
    <p>It depends. If we're trying to load this from a notebook located in the folder <strong>above</strong> Documents, it's a working relative path.</p>
    <p>If, however, that's not the case, then it's broken.</p>
    <p>This path looks a bit like an absolute one, but it's incomplete because it doesn't include <code>"C:\Users\MyUsername\"</code>.</p>
</details>
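Once you've had a go, you can also ask Python for its opinion. This is an optional sketch, assuming the standard-library class `pathlib.PureWindowsPath`, which this notebook has not introduced. It applies Windows path rules on any operating system and never touches the disk, so it can tell you whether a path is absolute, but not whether the file exists.

```python
from pathlib import PureWindowsPath

quiz_paths = [
    r"C:\Users\MyUsername\Documents\sales.csv",
    "Documents/data-analysis/expenditure.csv",
]

for p in quiz_paths:
    # is_absolute() is True only when the path starts from a drive, like C:\
    print(p, PureWindowsPath(p).is_absolute())
```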

### 3. Challenge: create a new project folder and try to load a dataset

1. Open your system file explorer and navigate to your "Documents" folder

2. Anywhere you like inside this folder (either in "Documents" or in a sub-folder of it), create a new folder called "relative-paths-challenge".

3. Find a data file you'd like to practice opening and put it inside this "relative-paths-challenge" folder. It can be any file type you like, as long as you know which Python function you need to open it. 
    - CSV is probably the one you're most familiar with at this point, but feel free to stretch yourself


4. Use Jupyter to create and open a new notebook
    - Pay attention to where you create the notebook! If you go straight to `File > New > Notebook`, it'll probably be created in the same place as the notebook you're reading now.
    - Instead, use Jupyter Lab's file browser to navigate to your "relative-paths-challenge" folder before creating your new notebook to make sure it's in the same place as your data.


5. In your new notebook, use Python to load your dataset:

    1. Using a relative path
    2. Using an absolute path (you can use string concatenation or type it out by hand)


6. Did it work? Use the hints in this notebook to help at first, but ask if you're really stuck!

### 4. Reflection

Take 5 minutes to reflect on these points. Maybe you'll think of something you can put in your portfolio!

- Have you encountered a problem with loading a file before? How did you solve it?

- Have you been using good project structure and file naming practices at work? Does your organisation have a standard for this? If not, could you propose one?