# Pathlib Challenges

For these challenges we will be getting familiar with the `pathlib` library.

This topic was inspired by our own Organizer, Chris May. He wrote an article on [getting started with pathlib](https://everydaysuperpowers.dev/articles/stop-working-so-hard-paths-get-started-pathlib/) which also prompted him to write a [field guide](https://everydaysuperpowers.dev/documents/3/ES-Getting_Started_with_Pathlib.pdf) and [cheat sheet](https://everydaysuperpowers.dev/documents/2/pathlib_cheat_sheet-V1_200703.pdf). You can see his other articles and resources at https://everydaysuperpowers.dev. Feel free to look at the resources and the `pathlib` [documentation](https://docs.python.org/3/library/pathlib.html) as they serve as the basis for the exercises below.

There are two kinds of Paths. The documentation states, "*Path classes are divided between pure paths, which provide purely computational operations without I/O, and concrete paths, which inherit from pure paths but also provide I/O operations.*" In other words,

* **`PurePath`**: Performs path operations without caring about what might actually be on the disk.
* **`Path`**: Does everything a `PurePath` does and also lets you interact with the files and directories on disk.

Both `PurePath` and `Path` can be either Windows or POSIX paths. Chances are that you will not need to worry about the operating-system-specific choice, since `pathlib` picks the right flavor for the system you are running on.
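
To make the distinction concrete, here is a minimal sketch using the pure classes, which never touch the disk. The two paths are made up for illustration and do not need to exist on your machine.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths only manipulate the path text, so these hypothetical
# paths need not exist, and either flavor can be used on any OS.
posix = PurePosixPath('/home/user/project/notes.txt')
windows = PureWindowsPath(r'C:\Users\user\project\notes.txt')

posix_parts = (posix.parent, posix.name, posix.suffix)
windows_parts = (windows.parent, windows.name, windows.suffix)

print(posix_parts)
print(windows_parts)

# Pure paths have no I/O methods such as exists() or read_text().
print(hasattr(posix, 'exists'))
```

The concrete `Path` class is the one to reach for whenever you need to check, read, or write something on disk.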

One common shortcut to get the path of the current file is the `__file__` variable. Due to how Jupyter Notebooks work, `__file__` is not available, so it is mimicked in the imports block. If you're not familiar with `__file__`, I suggest you look at [`__name__`](https://docs.python.org/3/library/__main__.html) and how Python uses [dunder](https://dbader.org/blog/python-dunder-methods) attributes and methods.

In [1]:
import json
import os
import string
from pathlib import Path
from slugify import slugify
from exercise import setup, cleanup

setup()
__file__ = os.path.join(os.getcwd(), 'Challenge.ipynb')
__file__

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:02<00:00,  3.60it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:49<00:00,  4.02it/s]


'C:\\Users\\bcohan\\PycharmProjects\\coding-night\\pathlib\\Challenge.ipynb'

# Where am I?

The Zen of Python states

    There should be one-- and preferably only one --obvious way to do it.
    Although that way may not be obvious at first unless you're Dutch.
    
So naturally it would follow that there are many ways to access the current directory as a place to start. There are 11 different methods shown below for getting the current directory. Take some time and look over how each line works. Some things to note:

* `os.getcwd()` is shown, but only included for comparison purposes.
* In all major operating systems `.` refers to the current directory and `..` refers to the directory above.
* Note the difference between lines with and without `.resolve()`
* Note how `__file__` requires the `parent` attribute.
* Note the difference between `.parent` and `.parents`.

While these are all interchangeable here, they are not always interchangeable. Some refer to the working directory while others refer to the location of the script. Be careful which one you choose.
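
The sketch below illustrates that last point under a simple assumption: `script` is a hypothetical file location standing in for `__file__`. Changing the working directory moves `Path.cwd()`, while anything derived from the script's path stays put.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical script location, used only for illustration.
script = Path('/projects/demo/script.py')

original_cwd = Path.cwd()
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)
    try:
        cwd_inside = Path.cwd()      # follows the working directory
        script_dir = script.parent   # tied to the file, unaffected by chdir
    finally:
        os.chdir(original_cwd)       # restore before the temp dir is removed

print(cwd_inside != original_cwd)
print(script_dir)
```

In a notebook the two usually coincide, which is why all 11 expressions agree here; in a script launched from another folder they would not.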

In [2]:
[
    os.getcwd(),
    Path(os.getcwd()),
    Path.cwd(),
    Path(),
    Path().resolve(),
    Path('.'),
    Path('.').resolve(),
    Path(__file__) / '..',
    (Path(__file__) / '..').resolve(),
    Path(__file__).parent,
    Path(__file__).parents[0],
]

['C:\\Users\\bcohan\\PycharmProjects\\coding-night\\pathlib',
 WindowsPath('C:/Users/bcohan/PycharmProjects/coding-night/pathlib'),
 WindowsPath('C:/Users/bcohan/PycharmProjects/coding-night/pathlib'),
 WindowsPath('.'),
 WindowsPath('C:/Users/bcohan/PycharmProjects/coding-night/pathlib'),
 WindowsPath('.'),
 WindowsPath('C:/Users/bcohan/PycharmProjects/coding-night/pathlib'),
 WindowsPath('C:/Users/bcohan/PycharmProjects/coding-night/pathlib/Challenge.ipynb/..'),
 WindowsPath('C:/Users/bcohan/PycharmProjects/coding-night/pathlib'),
 WindowsPath('C:/Users/bcohan/PycharmProjects/coding-night/pathlib'),
 WindowsPath('C:/Users/bcohan/PycharmProjects/coding-night/pathlib')]

Okay, now that you've looked at the different ways of accessing the current directory, use one of them to set a variable `base_dir` to the base of the repo (one parent up).

In [3]:
base_dir = Path().resolve().parent

How many directories are we away from the file system root? What are they?

In [4]:
[*base_dir.resolve().parents]

[WindowsPath('C:/Users/bcohan/PycharmProjects'),
 WindowsPath('C:/Users/bcohan'),
 WindowsPath('C:/Users'),
 WindowsPath('C:/')]

# Who am I?

Often, you might want to access the home directory of the user. Create a variable `home_dir` that refers to the current user's home directory (on Windows this is `C:\Users\<user>`; on Linux it's usually `/home/<user>`). Does the path look right? If you launched this notebook by clicking the Binder link in the repo, [this](https://github.com/jupyter/docker-stacks/issues/358#issuecomment-288844834) might answer a question you're now asking.

**You should not be hard coding the file path.**

In [5]:
home_dir = Path.home()
home_dir

WindowsPath('C:/Users/bcohan')

# What Have We Got Here?

Now that you know where you are, it's time to see what is there with you.

List all the items in the home directory.

In [6]:
[f for f in home_dir.iterdir()]

[WindowsPath('C:/Users/bcohan/.gitconfig'),
 WindowsPath('C:/Users/bcohan/.ipython'),
 WindowsPath('C:/Users/bcohan/.jupyter'),
 WindowsPath('C:/Users/bcohan/.matplotlib'),
 WindowsPath('C:/Users/bcohan/.PyCharmCE2018.2'),
 WindowsPath('C:/Users/bcohan/.streamlit'),
 WindowsPath('C:/Users/bcohan/AppData'),
 WindowsPath('C:/Users/bcohan/Application Data'),
 WindowsPath('C:/Users/bcohan/Contacts'),
 WindowsPath('C:/Users/bcohan/Cookies'),
 WindowsPath('C:/Users/bcohan/Desktop'),
 WindowsPath('C:/Users/bcohan/Documents'),
 WindowsPath('C:/Users/bcohan/Downloads'),
 WindowsPath('C:/Users/bcohan/Envs'),
 WindowsPath('C:/Users/bcohan/Favorites'),
 WindowsPath('C:/Users/bcohan/Links'),
 WindowsPath('C:/Users/bcohan/Local Settings'),
 WindowsPath('C:/Users/bcohan/Music'),
 WindowsPath('C:/Users/bcohan/My Documents'),
 WindowsPath('C:/Users/bcohan/NetHood'),
 WindowsPath('C:/Users/bcohan/NTUSER.DAT'),
 WindowsPath('C:/Users/bcohan/ntuser.dat.LOG1'),
 WindowsPath('C:/Users/bcohan/ntuser.dat.LOG2

Only list items in the home directory that are files.

In [7]:
[f for f in home_dir.iterdir() if f.is_file()]

[WindowsPath('C:/Users/bcohan/.gitconfig'),
 WindowsPath('C:/Users/bcohan/NTUSER.DAT'),
 WindowsPath('C:/Users/bcohan/ntuser.dat.LOG1'),
 WindowsPath('C:/Users/bcohan/ntuser.dat.LOG2'),
 WindowsPath('C:/Users/bcohan/NTUSER.DAT{b7f33f48-2ee4-11e9-a36b-c1fc09751946}.TM.blf'),
 WindowsPath('C:/Users/bcohan/NTUSER.DAT{b7f33f48-2ee4-11e9-a36b-c1fc09751946}.TMContainer00000000000000000001.regtrans-ms'),
 WindowsPath('C:/Users/bcohan/NTUSER.DAT{b7f33f48-2ee4-11e9-a36b-c1fc09751946}.TMContainer00000000000000000002.regtrans-ms'),
 WindowsPath('C:/Users/bcohan/ntuser.ini'),
 WindowsPath('C:/Users/bcohan/ntuser.pol')]

Only list items in the home directory that are directories.

In [8]:
[d for d in home_dir.iterdir() if d.is_dir()]

[WindowsPath('C:/Users/bcohan/.ipython'),
 WindowsPath('C:/Users/bcohan/.jupyter'),
 WindowsPath('C:/Users/bcohan/.matplotlib'),
 WindowsPath('C:/Users/bcohan/.PyCharmCE2018.2'),
 WindowsPath('C:/Users/bcohan/.streamlit'),
 WindowsPath('C:/Users/bcohan/AppData'),
 WindowsPath('C:/Users/bcohan/Application Data'),
 WindowsPath('C:/Users/bcohan/Contacts'),
 WindowsPath('C:/Users/bcohan/Cookies'),
 WindowsPath('C:/Users/bcohan/Desktop'),
 WindowsPath('C:/Users/bcohan/Documents'),
 WindowsPath('C:/Users/bcohan/Downloads'),
 WindowsPath('C:/Users/bcohan/Envs'),
 WindowsPath('C:/Users/bcohan/Favorites'),
 WindowsPath('C:/Users/bcohan/Links'),
 WindowsPath('C:/Users/bcohan/Local Settings'),
 WindowsPath('C:/Users/bcohan/Music'),
 WindowsPath('C:/Users/bcohan/My Documents'),
 WindowsPath('C:/Users/bcohan/NetHood'),
 WindowsPath('C:/Users/bcohan/OneDrive'),
 WindowsPath('C:/Users/bcohan/OneDrive - GHD'),
 WindowsPath('C:/Users/bcohan/Pictures'),
 WindowsPath('C:/Users/bcohan/PrintHood'),
 Window

Find all the notebooks under `base_dir` (`.ipynb`)

In [9]:
[f for f in base_dir.rglob('*.ipynb')]

[WindowsPath('C:/Users/bcohan/PycharmProjects/coding-night/pathlib/Answers.ipynb'),
 WindowsPath('C:/Users/bcohan/PycharmProjects/coding-night/pathlib/.ipynb_checkpoints/Answers-checkpoint.ipynb'),
 WindowsPath('C:/Users/bcohan/PycharmProjects/coding-night/venv/Lib/site-packages/nbconvert/exporters/tests/files/attachment.ipynb'),
 WindowsPath('C:/Users/bcohan/PycharmProjects/coding-night/venv/Lib/site-packages/nbconvert/exporters/tests/files/notebook2.ipynb'),
 WindowsPath('C:/Users/bcohan/PycharmProjects/coding-night/venv/Lib/site-packages/nbconvert/exporters/tests/files/pngmetadata.ipynb'),
 WindowsPath('C:/Users/bcohan/PycharmProjects/coding-night/venv/Lib/site-packages/nbconvert/exporters/tests/files/prompt_numbers.ipynb'),
 WindowsPath('C:/Users/bcohan/PycharmProjects/coding-night/venv/Lib/site-packages/nbconvert/exporters/tests/files/rawtest.ipynb'),
 WindowsPath('C:/Users/bcohan/PycharmProjects/coding-night/venv/Lib/site-packages/nbconvert/exporters/tests/files/svg.ipynb'),
 Win
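
As the output above shows, `rglob` descends into every subdirectory, including `.ipynb_checkpoints` and the virtual environment. The following is a minimal sketch, on a made-up temporary tree, of how `glob` and `rglob` differ and how one might filter out checkpoint copies by inspecting `parts`. The file names are invented for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Build a small, made-up tree to search.
    (root / 'sub' / '.ipynb_checkpoints').mkdir(parents=True)
    (root / 'top.ipynb').touch()
    (root / 'sub' / 'nested.ipynb').touch()
    (root / 'sub' / '.ipynb_checkpoints' / 'nested-checkpoint.ipynb').touch()

    shallow = sorted(p.name for p in root.glob('*.ipynb'))   # this directory only
    deep = sorted(p.name for p in root.rglob('*.ipynb'))     # all subdirectories too
    # Skip anything that lives inside a checkpoint directory.
    filtered = sorted(
        p.name for p in root.rglob('*.ipynb')
        if '.ipynb_checkpoints' not in p.parts
    )

print(shallow)
print(deep)
print(filtered)
```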

What versions of Python are installed on this machine? (If you're running Linux or macOS, look for files containing `python` under `/usr/bin`. If you're using Windows, skip this and the next challenge.)

In [10]:
[f for f in Path('/usr/bin').glob('python*')]

[]

Which python files are just symbolic links to actual files?

In [11]:
[f for f in Path('/usr/bin').glob('python*') if f.is_symlink()]

[]

Print the contents of `requirements.txt` under `base_dir`. Do this with one line.

In [12]:
print(base_dir.joinpath('requirements.txt').read_text())

beautifulsoup4>=4.9.1
jupyterlab>=2.1.4
nbstripout>=0.3.8
python-slugify>=4.0.0
requests>=2.23.0
scrapy>=2.1.0
selenium>=3.141.0
tabulate>=0.8.7
tqdm>=4.47.0


# File Properties

What is the largest file in `home_dir`? How big is it in MB?

In [13]:
file_props = [
    {
        'name': f,
        'size': f.stat().st_size,
        'atime': f.stat().st_atime,
        'ctime': f.stat().st_ctime,
        'mtime': f.stat().st_mtime,
    }
    for f in home_dir.iterdir()
]

largest = sorted(file_props, key=lambda f: f['size'])[-1]
f"{largest['name']} is {largest['size'] / 1024 ** 2} MB"

'C:\\Users\\bcohan\\NTUSER.DAT is 8.25 MB'

Which file was most recently accessed?

In [14]:
sorted(file_props, key=lambda f: f['atime'])[-1]['name']

WindowsPath('C:/Users/bcohan/Cookies')

Which file in `home_dir` has gone the longest without being modified?

In [16]:
sorted(file_props, key=lambda f: f['mtime'])[0]['name']

WindowsPath('C:/Users/bcohan/NetHood')
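
The `st_atime`, `st_ctime`, and `st_mtime` values used above are timestamps in seconds since the epoch, which are easy to sort but hard to read. As a small sketch, assuming a throwaway file created just for the example, they can be converted with `datetime.fromtimestamp`. It also calls `stat()` once and reuses the result rather than calling it for every attribute.

```python
import tempfile
from datetime import datetime
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / 'sample.txt'   # throwaway file for illustration
    sample.write_text('PyRVA')

    info = sample.stat()                # one stat() call, reused below
    size_bytes = info.st_size
    modified = datetime.fromtimestamp(info.st_mtime)

print(size_bytes)
print(modified.isoformat(timespec='seconds'))
```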

# Rename Files

The `setup()` function at the top of the notebook downloaded images from the first 10 pages of http://books.toscrape.com/ and compiled the data into a JSON file.

The JSON file and images are located in a directory called `books`. List all the images in the directory. (They are all `.jpg` files.)

In [17]:
[f for f in Path('books').rglob('*.jpg')]

[WindowsPath('books/images/01264865c12ed6d987d6f0858cd1d0ba.jpg'),
 WindowsPath('books/images/01726c619a05114dca75bd840095016d.jpg'),
 WindowsPath('books/images/0237b445efc18c5562355a5a2c40889c.jpg'),
 WindowsPath('books/images/0338682e76bad3216cd4c6c28b2b625a.jpg'),
 WindowsPath('books/images/038650c9e7517b4baf2a423cd8eed38f.jpg'),
 WindowsPath('books/images/03886a8502ca54dbce0d91c2568ab69d.jpg'),
 WindowsPath('books/images/061811c5845d0e13bc04b2a755f0830f.jpg'),
 WindowsPath('books/images/06a6cfcf89afd1601cbba1a16cda57fb.jpg'),
 WindowsPath('books/images/08044269fc197645268a6197c57e6173.jpg'),
 WindowsPath('books/images/084da0199a717cb6c1eda30f98d0ea4c.jpg'),
 WindowsPath('books/images/088995e862aac86c88c608d763f6390e.jpg'),
 WindowsPath('books/images/09a3aef48557576e1a85ba7efea8ecb7.jpg'),
 WindowsPath('books/images/0a1567cd04a6582d333db71337b4e2a6.jpg'),
 WindowsPath('books/images/0bbcd0a6f4bcd81ccb1049a52736406e.jpg'),
 WindowsPath('books/images/0d1f3f934460f5a50aaa8c366641234c.jp

Can you tell the titles of the different books? If not, load the JSON file and try to rename each file. You may want to use [slugify](https://github.com/un33k/python-slugify) to avoid bad file names.

In [18]:
data = json.loads(list(Path('books').rglob('*.json'))[0].read_text())
mapping = {Path(book['img']).stem: slugify(book['name']) for book in data}

for img in Path('books').rglob('*.jpg'):
    new_name = img.with_name(mapping[img.stem] + img.suffix)
    img.rename(new_name)    
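
One thing to keep in mind with `rename`: on Unix an existing target file is silently replaced, while on Windows a `FileExistsError` is raised. If two titles ever produced the same slug, one image could be lost or the loop could stop partway. The following is a hedged sketch of a more defensive rename; the hashed file name and the `mapping` entry are invented for illustration and are not taken from the real data set.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    images = Path(tmp) / 'images'
    images.mkdir()
    old = images / '0123abcd.jpg'   # made-up hashed file name
    old.touch()

    # Made-up mapping from file stem to a readable name.
    mapping = {'0123abcd': 'a-light-in-the-attic'}

    # with_name keeps the folder and swaps only the final component.
    new = old.with_name(mapping[old.stem] + old.suffix)
    if not new.exists():            # avoid clobbering an existing file
        old.rename(new)

    names = sorted(p.name for p in images.iterdir())

print(names)
```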

Now check that you can figure out which image is which. (Copy your code from a few cells up.)

In [19]:
[f for f in Path('books').rglob('*.jpg')]

[WindowsPath('books/images/a-court-of-thorns-and-roses-a-court-of-thorns-and-roses-1.jpg'),
 WindowsPath('books/images/a-fierce-and-subtle-poison.jpg'),
 WindowsPath('books/images/a-flight-of-arrows-the-pathfinders-2.jpg'),
 WindowsPath('books/images/a-light-in-the-attic.jpg'),
 WindowsPath('books/images/a-murder-in-time.jpg'),
 WindowsPath('books/images/a-piece-of-sky-a-grain-of-rice-a-memoir-in-four-meditations.jpg'),
 WindowsPath('books/images/a-world-of-flavor-your-gluten-free-passport.jpg'),
 WindowsPath('books/images/aladdin-and-his-wonderful-lamp.jpg'),
 WindowsPath('books/images/algorithms-to-live-by-the-computer-science-of-human-decisions.jpg'),
 WindowsPath('books/images/america-s-cradle-of-quarterbacks-western-pennsylvania-s-football-factory-from-johnny-unitas-to-joe-montana.jpg'),
 WindowsPath('books/images/avatar-the-last-airbender-smoke-and-shadow-part-3-smoke-and-shadow-3.jpg'),
 WindowsPath('books/images/behind-closed-doors.jpg'),
 WindowsPath('books/images/birdsong-a-s

# Let's Make Something

Under `base_dir`, create a path to a file called `example.txt`.

In [20]:
file = base_dir / 'example.txt'

Check to see if the file exists.

In [21]:
file.exists()

False

Write text to the file using one of the `pathlib` utility methods.

In [22]:
file.write_text('PyRVA')

5

See if the file exists now.

In [23]:
file.exists()

True

Get the text from the file using one of the `pathlib` utility methods.

In [24]:
file.read_text()

'PyRVA'

Under `base_dir`:

* Create a new directory called `new`. 
* In `new`, create 26 subdirectories based on the letters of the alphabet (you can use `string.ascii_lowercase` to iterate if you'd like).
* Under each letter directory, create 10 directories numbered `0` -> `9` (you can use `range(10)` to iterate if you'd like).
* In each numbered directory, create a file called `file.txt` and write `PyRVA is Awesome!` to each file.

You should only have one line in the code block that actually creates directories and one line in the code block that creates the file.

What happens if you run the code block twice?

In [25]:
for char in string.ascii_lowercase:
    for i in range(10):
        new = base_dir / 'new' / char / str(i)
        new.mkdir(parents=True)        
        (new / 'file.txt').write_text('PyRVA is Awesome!')
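
As for running the block twice: `mkdir` raises `FileExistsError` when the directory is already there, unless `exist_ok=True` is passed. Here is a minimal sketch in a temporary directory (so it does not touch `base_dir`) that shows both behaviors.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / 'new' / 'a' / '0'

    target.mkdir(parents=True)        # first run: creates every missing level
    try:
        target.mkdir(parents=True)    # second run: the directory already exists
        second_run = 'ok'
    except FileExistsError:
        second_run = 'FileExistsError'

    target.mkdir(parents=True, exist_ok=True)   # safe to repeat
    still_there = target.is_dir()

print(second_run)
print(still_there)
```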

Find all the `file.txt` files under `base_dir` and check that the number matches what you expected (this can be done in one line). If not, you might need to debug something.

In [26]:
len([*base_dir.rglob('file.txt')])

261

In [27]:
[f for f in base_dir.rglob('file.txt') if 'new' not in str(f)]

[WindowsPath('C:/Users/bcohan/PycharmProjects/coding-night/pathlib/file.txt')]

Now delete the `new` folder.

In [28]:
def rmdir(top: Path):
    for file in top.iterdir():
        if file.is_file():
            file.unlink()
        elif file.is_dir():
            rmdir(file)
    top.rmdir()
            
rmdir(base_dir / 'new')
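
`Path.rmdir()` only removes empty directories, which is why the recursive helper above is needed. If staying within `pathlib` is not a requirement, the standard library's `shutil.rmtree` is a common alternative. A small sketch on a temporary tree, not the author's solution:

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    top = Path(tmp) / 'new'
    (top / 'a' / '0').mkdir(parents=True)
    (top / 'a' / '0' / 'file.txt').write_text('PyRVA is Awesome!')

    shutil.rmtree(top)           # removes the directory and everything under it
    removed = not top.exists()

print(removed)
```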

# Which Is Better?

For these exercises, see if you can figure out the `pathlib` replacement for each of the following code blocks. There is a quick reference table at the bottom of the `pathlib` documentation, but I encourage you to read through the documentation without looking at the table. First verify that the output is the same. Don't worry about the `PosixPath()` `__repr__` value as long as the result is otherwise the same; you can wrap it in `str()` if you really care about it. Then use the [`%%timeit`](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit) cell magic to compare the speeds.

Once you have compared the different methods, think about which of the following things are important to you:

* Speed of development (characters typed)
* Speed of maintenance (how easy is it to read)
* Speed of execution (how fast the code runs)
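
Outside a notebook, where `%%timeit` is not available, the standard `timeit` module can run the same kind of comparison. The sketch below times one of the pairs from this section; the `sample` path and the helper function names are placeholders chosen for the example, and the timings will vary from machine to machine.

```python
import os
import timeit
from pathlib import Path

# Hypothetical path; nothing is read from disk, so it need not exist.
sample = os.path.join('projects', 'coding-night', 'pathlib', 'Challenge.ipynb')

def with_os_path():
    return os.path.splitext(os.path.basename(sample))[0]

def with_pathlib():
    return Path(sample).stem

# Confirm both approaches agree before timing them.
same_result = with_os_path() == with_pathlib()

runs = 10_000
os_seconds = timeit.timeit(with_os_path, number=runs)
pathlib_seconds = timeit.timeit(with_pathlib, number=runs)

print(same_result)
print(f'os.path: {os_seconds:.4f}s, pathlib: {pathlib_seconds:.4f}s for {runs} runs')
```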

Getting the parent of the current folder.

In [29]:
os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

'C:\\Users\\bcohan\\PycharmProjects\\coding-night'

In [30]:
Path(__file__).parents[1]

WindowsPath('C:/Users/bcohan/PycharmProjects/coding-night')

Get the basename of the current file.

In [31]:
os.path.basename(__file__)

'Challenge.ipynb'

In [32]:
Path(__file__).name

'Challenge.ipynb'

Get the basename of the current file without the extension.

In [33]:
os.path.splitext(os.path.basename(__file__))[0]

'Challenge'

In [34]:
Path(__file__).stem

'Challenge'

Get the file extension of the current file.

In [35]:
os.path.splitext(__file__)[1]

'.ipynb'

In [36]:
Path(__file__).suffix

'.ipynb'

Create a `PurePath` with a different name in the same directory (file doesn't have to exist).

In [37]:
os.path.join(os.path.dirname(__file__), 'myfile.txt')

'C:\\Users\\bcohan\\PycharmProjects\\coding-night\\pathlib\\myfile.txt'

In [38]:
Path(__file__).with_name('myfile.txt')

WindowsPath('C:/Users/bcohan/PycharmProjects/coding-night/pathlib/myfile.txt')

Create a `PurePath` of the same file with a different extension.

In [39]:
os.path.join(os.path.dirname(__file__), os.path.basename(os.path.splitext(__file__)[0]) + '.py')

'C:\\Users\\bcohan\\PycharmProjects\\coding-night\\pathlib\\Challenge.py'

In [40]:
Path(__file__).with_suffix('.py')

WindowsPath('C:/Users/bcohan/PycharmProjects/coding-night/pathlib/Challenge.py')

Get the home directory of the current user.

In [41]:
os.path.expanduser('~')

'C:\\Users\\bcohan'

In [42]:
Path.home()

WindowsPath('C:/Users/bcohan')

# Cleanup

Run the following line to remove files created by this exercise (assuming you followed the suggested file names).

In [43]:
cleanup()