# Python for SysAdmins – Interaction with the file system (2)

## `os.path` alternatives: the `pathlib` and `shutil` modules

The [`pathlib` module](https://docs.python.org/3/library/pathlib.html) works a bit differently than the `os.path` module. While the `os.path` module is purely functional (i.e. the path is passed as a parameter to each function), `pathlib` turns a given path into an **object** which then offers a number of **methods** that can _act on that object_.
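To make the difference concrete, here is a small sketch that takes the same path apart in both styles. The path string is a made-up example and does not need to exist. `PurePosixPath` is used here (instead of `Path`) only so that the result looks the same on every operating system.

```python
import os.path
import pathlib

# Hypothetical example path; nothing is read from disk here
raw = "walk/dir_a/file_01.txt"

# os.path: the path is a plain string that is passed into functions
name_functional = os.path.basename(raw)
parent_functional = os.path.dirname(raw)

# pathlib: the path is an object that carries attributes and methods
path = pathlib.PurePosixPath(raw)
name_object = path.name
parent_object = path.parent
suffix = path.suffix   # file extension including the dot
stem = path.stem       # file name without the extension

print(name_functional, name_object)
print(parent_functional, parent_object)
print(suffix, stem)
```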

The [`shutil` module](https://docs.python.org/3/library/shutil.html) offers a lot of high-level file operations for copying and archiving files.

### walk the tree with `pathlib`

As we have seen, recursively searching a tree with the `os` module is a bit cumbersome:

In [None]:
import os
for dir_path, dir_names, filenames in os.walk('walk'):
    for filename in filenames:
        print(os.path.join(dir_path, filename))

The `pathlib` module offers a nice alternative to the `os.walk` function. The `rglob()` method takes a pattern string as its filter; here `'*'` matches everything:

In [None]:
import pathlib

for file in pathlib.Path("walk").rglob('*'):
    if file.is_file():
        print(file)

Do a classic `find . -name "*01"`:

In [None]:
!find walk/ -name "*01"

In Python, use the `rglob` method. `rglob` works like `glob` (see below) with `**/` added in front of the given relative pattern:

In [None]:
import pathlib

for file in pathlib.Path("walk").rglob('*01'):
    if file.is_file():
        print(file)

Recursively **delete the tree** `rm -rf walk`, using the `shutil` module

In [None]:
import shutil

shutil.rmtree("walk")

### match file patterns: `glob` and `rglob`

Globs, also known as glob patterns, are wildcard patterns that expand into a list of pathnames matching them. They are related to regular expressions (which are much more powerful) in the sense that a mini-language is used to search for certain strings (in our case: filenames). Typical patterns are:

* `?` match any single character
* `*` match any number of characters (including none)
* `[A-Z]` match any single uppercase letter from A to Z (character class)
* `[0-9]` match any single digit from 0 to 9
* `[abc]` match the characters a, b or c.
* `[!def]` match all characters that are _not_ d, e or f

More information: https://en.wikipedia.org/wiki/Glob_(programming)
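The patterns listed above can be tried out without touching the file system. This sketch assumes the standard `fnmatch` module, which implements the same shell-style wildcards for plain strings. The file names are invented for illustration.

```python
import fnmatch

# Invented file names, only used to try out the patterns
names = ["01_intro.ipynb", "0a_notes.ipynb", "10_extra.ipynb", "README.md"]

# '?' stands for exactly one character, '*' for any number of characters
question_mark = fnmatch.filter(names, "0?_*.ipynb")

# a character class restricts the second character to the digits 1-9
char_class = fnmatch.filter(names, "0[1-9]_*.ipynb")

# '!' negates the class: everything that does not start with a 0
negated = fnmatch.filter(names, "[!0]*")

print(question_mark)
print(char_class)
print(negated)
```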

For **recursive search**, `**/` is added in front of the pattern. `rglob("*")` is just a shorthand for `glob("**/*")`.
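A quick way to convince yourself of this equivalence is to build a small throwaway tree and compare both calls. The directory and file names below are made up, and everything lives in a temporary directory that is removed afterwards.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)

    # build a tiny nested tree with invented names
    (root / "sub" / "deeper").mkdir(parents=True)
    (root / "top_01").touch()
    (root / "sub" / "mid_01").touch()
    (root / "sub" / "deeper" / "low_02").touch()

    # collect the matches relative to the root so they are easy to compare
    via_rglob = sorted(p.relative_to(root).as_posix() for p in root.rglob("*01"))
    via_glob = sorted(p.relative_to(root).as_posix() for p in root.glob("**/*01"))

print(via_rglob)
print(via_rglob == via_glob)
```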

Match all files starting with a `0`, followed by a single character, an underscore `_`, any number of characters, ending in `.ipynb`

In [None]:
current_dir = pathlib.Path('.')
for file in current_dir.glob("0?_*.ipynb"):
    print(file)

Looking for similar files by using a **character class** (this pattern is stricter: the second character must be a digit from 1 to 9):

In [None]:
current_dir = pathlib.Path('.')
for file in current_dir.glob("0[1-9]_*.ipynb"):
    print(file)

Same as above, but look **recursively** in all subdirectories:

In [None]:
for file in current_dir.glob("**/0?_*.ipynb"):
    print(file)

Same, using the `rglob` (recursive glob) shorthand instead:

In [None]:
for file in current_dir.rglob("0?_*.ipynb"):
    print(file)

### test if a file or path exists

In [None]:
import pathlib
import shutil

file = pathlib.Path('/home/an/unknown/file/somewhere.txt') 

In [None]:
file.exists()

In [None]:
directory = pathlib.Path('.')

In [None]:
directory.exists()

In [None]:
directory.is_dir()

### Concatenate paths, using `/`

In [None]:
my_hello_world = directory / "my_modules" / "my_hello_world.py"

In [None]:
my_hello_world.exists()

The `absolute()` method returns, not surprisingly, the absolute path. Well, not exactly a string: it returns a path object (a `PosixPath` on Linux and macOS):

In [None]:
my_hello_world.absolute()

We can get a normal string representation of it:

In [None]:
my_hello_world.absolute().as_posix()

... or a URI representation:

In [None]:
my_hello_world.absolute().as_uri()
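One detail worth knowing: `absolute()` only prepends the current working directory, it does not tidy up `..` components. The `resolve()` method does. The sketch below reuses the file name from above with an artificial `..` detour; the file does not need to exist.

```python
import pathlib

# same target as above, but with an artificial '..' detour in the path
relative = pathlib.Path("my_modules") / ".." / "my_modules" / "my_hello_world.py"

absolute = relative.absolute()   # prepends the current working directory
resolved = relative.resolve()    # additionally collapses '..' and follows symlinks

as_string = str(absolute)        # plain string, using the separators of the OS
uri = absolute.as_uri()          # file:// URI

print(absolute)
print(resolved)
print(uri)
```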

### show and change file access flags: `chmod`

In [None]:
file = pathlib.Path('_stat_info_testfile')
file.touch()

In [None]:
oct(file.stat().st_mode & 0o777)

In [None]:
file.chmod(0o600)

In [None]:
oct(file.stat().st_mode & 0o777)

In [None]:
file.unlink()
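If the octal number is hard to read, the `stat` module can translate the mode into the familiar `ls -l` notation. This is a small sketch for POSIX systems (Linux, macOS); the file is created in a temporary directory and the mode `0o640` is just an example value.

```python
import pathlib
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    file = pathlib.Path(tmp) / "_stat_info_testfile"
    file.touch()
    file.chmod(0o640)                    # example mode: rw for owner, r for group

    mode = file.stat().st_mode
    octal = oct(stat.S_IMODE(mode))      # same as masking with 0o777 for the permission bits
    symbolic = stat.filemode(mode)       # the notation known from `ls -l`

print(octal, symbolic)
```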

### change ownership of a file: `chown`

In [None]:
chown_testfile = pathlib.Path('_pathlib_ownership_testfile')
chown_testfile.touch()

In [None]:
print("owner:", chown_testfile.owner())
print("group:", chown_testfile.group())

In [None]:
# the group 'everyone' exists on macOS; use a group that exists on your system
shutil.chown(path=chown_testfile, group='everyone')

In [None]:
print("owner:", chown_testfile.owner())
print("group:", chown_testfile.group())

In [None]:
chown_testfile.unlink()

### copy files: `cp`

In [None]:
import shutil
import os
source = os.listdir(".")
destination = "backup_folder"

if not os.path.exists(destination):
    os.mkdir(destination)
    
for file in source:
    if file.endswith(".ipynb"):
        shutil.copy(file, destination)

In [None]:
os.listdir(destination)

In [None]:
shutil.rmtree(destination, ignore_errors=True)
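The same backup can also be written with `pathlib` only, combining `glob` from above with `shutil.copy`. This is a sketch under the assumption that a throwaway directory with a few invented files stands in for the current directory.

```python
import pathlib
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    work = pathlib.Path(tmp)

    # invented files standing in for the notebooks in the current directory
    (work / "a.ipynb").touch()
    (work / "b.ipynb").touch()
    (work / "notes.txt").touch()

    destination = work / "backup_folder"
    destination.mkdir(exist_ok=True)      # no error if the folder already exists

    # glob does the filtering that endswith() did before
    for notebook in work.glob("*.ipynb"):
        shutil.copy(notebook, destination)

    copied = sorted(p.name for p in destination.iterdir())

print(copied)
```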

### copy a directory recursively

prepare a nested directory...

In [None]:
import os
source_dir = "start/of/some/deeply/nested/directory"
os.makedirs(source_dir)

create the destination directory

In [None]:
destination_dir = "destination_directory"
os.mkdir(destination_dir)

Try to execute the cell above again. What error do you get? How can we avoid the error?

### catch the `FileExistsError`

In [None]:
import os
try:
    os.mkdir(destination_dir)
except FileExistsError:  # catch this specific error
    pass                 # resolve things. In our case: do nothing
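Catching the exception is one option. Another one is to tell Python up front that an existing directory is fine, using the `exist_ok` parameter of `os.makedirs` or `Path.mkdir`. A small sketch, using a temporary directory so nothing is left behind:

```python
import os
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    destination_dir = os.path.join(tmp, "destination_directory")

    os.makedirs(destination_dir, exist_ok=True)   # creates the directory
    os.makedirs(destination_dir, exist_ok=True)   # second call: no FileExistsError

    # the pathlib equivalent
    pathlib.Path(destination_dir).mkdir(parents=True, exist_ok=True)

    created = os.path.isdir(destination_dir)

print(created)
```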

now, we recursively copy the `source_dir` to the `destination_dir`:

In [None]:
shutil.copytree(source_dir, destination_dir)

**???**

Now you realize that the Python standard library sometimes has **very annoying limitations**: `copytree()` raises a `FileExistsError` because the destination directory already exists. The code below works without annoyances, but only with Python 3.8 and later; older versions complain again with `TypeError: copytree() got an unexpected keyword argument 'dirs_exist_ok'`

In [None]:
shutil.copytree(source_dir, destination_dir, dirs_exist_ok=True)

**Conclusion: Google and StackOverflow are your friends.** Don't hesitate to consult them for the (currently) best solution to your problem :)

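Of course, there exists a workaround which works nicely and according to the **DWIM** principle: **D**o **W**hat **I** **M**ean. Note that `distutils` is deprecated since Python 3.10 and was removed from the standard library in Python 3.12, so this only works on older versions:

In [None]:
# distutils was removed in Python 3.12; on newer versions use
# shutil.copytree(..., dirs_exist_ok=True) instead
from distutils.dir_util import copy_tree

copy_tree("start", destination_dir)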
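Since `distutils` is not available in every Python version (it was removed from the standard library in Python 3.12), a version-independent fallback can be handy. The following is only a minimal sketch: `merge_tree` is a hypothetical helper written for this notebook, not a standard library function, and it ignores special cases such as symlinks. It runs in a temporary directory with invented file names.

```python
import os
import shutil
import tempfile


def merge_tree(source, destination):
    """Copy source into destination, reusing directories that already exist."""
    for dir_path, dir_names, filenames in os.walk(source):
        # rebuild the same relative directory below the destination
        relative = os.path.relpath(dir_path, source)
        target_dir = os.path.normpath(os.path.join(destination, relative))
        os.makedirs(target_dir, exist_ok=True)
        for filename in filenames:
            shutil.copy2(os.path.join(dir_path, filename),
                         os.path.join(target_dir, filename))


with tempfile.TemporaryDirectory() as tmp:
    # prepare a nested source directory containing one file
    source_dir = os.path.join(tmp, "start", "of", "some", "nested", "directory")
    os.makedirs(source_dir)
    with open(os.path.join(source_dir, "data.txt"), "w") as handle:
        handle.write("hello")

    # the destination already exists, which is what made copytree() fail
    destination_dir = os.path.join(tmp, "destination_directory")
    os.mkdir(destination_dir)

    merge_tree(os.path.join(tmp, "start"), destination_dir)

    copied_file = os.path.join(destination_dir, "of", "some", "nested",
                               "directory", "data.txt")
    copied_ok = os.path.isfile(copied_file)

print(copied_ok)
```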

In [None]:
shutil.rmtree(destination_dir, ignore_errors=True)

In [None]:
shutil.rmtree("start", ignore_errors=True)