## This notebook walks through some of the most common methods used with the pathlib.Path object

Import the standard libraries used in this notebook.

In [1]:
from pathlib import Path
import shutil
import datetime as dt

Build a full directory path from scratch.
Notice that each part is passed as a separate string argument, not as a list or tuple.

In [2]:
full_path = Path("/one", "two", "three", "four")
print(full_path)
print(type(full_path))

/one/two/three/four
<class 'pathlib.PosixPath'>
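
For comparison, here is a minimal sketch using PurePosixPath and PureWindowsPath. Pure paths never touch the disk, so the output is the same on any OS. It shows that separate parts and one joined string give equal paths, and that the Windows flavour joins the same parts with backslashes. The "C:/" drive is only an illustrative value.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Separate string arguments and a single joined string give equal paths
from_parts = PurePosixPath("/one", "two", "three", "four")
from_string = PurePosixPath("/one/two/three/four")
print(from_parts == from_string)  # True

# The Windows flavour joins the same kind of parts with its own separator
win_path = PureWindowsPath("C:/", "two", "three", "four")
print(win_path)  # C:\two\three\four
```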


Some libraries accept a pathlib.PosixPath directly; others require strings.
Convert with str() when the library needs a string.

In [3]:
full_path_as_str = str(full_path)

print(type(full_path))
print(type(full_path_as_str))

<class 'pathlib.PosixPath'>
<class 'str'>
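
os.fspath() is an alternative to str(). It converts any path-like object to a string and returns a str unchanged, but raises TypeError for objects that are not paths, whereas str() will turn anything into a string. A minimal sketch:

```python
import os
from pathlib import PurePosixPath

full_path = PurePosixPath("/one", "two", "three", "four")

as_str = str(full_path)             # plain string conversion
as_fspath = os.fspath(full_path)    # os.PathLike protocol (Python 3.6+)
print(as_str == as_fspath)          # True
print(type(as_fspath))              # <class 'str'>
print(os.fspath("/already/a/str"))  # strings pass through unchanged

try:
    os.fspath(123)                  # not a path: rejected instead of silently converted
    rejected = False
except TypeError:
    rejected = True
print(rejected)                     # True
```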


Get the current working directory.

In [4]:
current_dir = Path.cwd()

current_dir

PosixPath('/Users/kehoe/Git_area/AtmosphericPythonCourse/cool_stuff')

Get the user's home directory.

In [5]:
Path.home()

PosixPath('/Users/kehoe')
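
A related pitfall: pathlib does not expand a leading ~ on its own. This short sketch assumes a hypothetical data folder under the home directory; nothing is created on disk.

```python
from pathlib import Path

# "~" stays literal until expanduser() replaces it with the home directory
raw = Path("~", "data")
expanded = raw.expanduser()

print(raw)                               # ~/data on macOS and Linux (still literal)
print(expanded == Path.home() / "data")  # True
```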

Create a new path with an additional directory and a filename. Once we have the Path, we can use
pathlib attributes (properties, not method calls) to return just the parts we want without changing the saved Path.

In [6]:
fl = current_dir.joinpath("new_dir", "my_awesome_file.py")
print("fl:    ", fl)
print("parent:", fl.parent)  # Return the parent directory (the path without the filename)
print("name:  ", fl.name)    # Return just the filename
print("suffix:", fl.suffix)  # Return the filename suffix (extension)
print("stem:  ", fl.stem)    # Return the filename without directory or suffix
print("anchor:", fl.anchor)  # Return the drive plus root: "/" here, e.g. "C:\" on Windows

fl:     /Users/kehoe/Git_area/AtmosphericPythonCourse/cool_stuff/new_dir/my_awesome_file.py
parent: /Users/kehoe/Git_area/AtmosphericPythonCourse/cool_stuff/new_dir
name:   my_awesome_file.py
suffix: .py
stem:   my_awesome_file
anchor: /
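
suffix and stem only look at the last dot in the name, which matters for filenames with several dots, such as the .cdf data files used later in this notebook. In this sketch the /data directory is a made-up location:

```python
from pathlib import PurePosixPath

fl = PurePosixPath("/data/sgpmetE13.b1/sgpmetE13.b1.20191101.000000.cdf")

print(fl.suffix)              # .cdf  (only the last suffix)
print(fl.suffixes)            # ['.b1', '.20191101', '.000000', '.cdf']
print(fl.stem)                # sgpmetE13.b1.20191101.000000  (only the last suffix removed)
print(fl.name.split(".")[2])  # 20191101  (plain string splitting on the name)
```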


## Let's play with creating new directories and the directory parts.

Create a new path with additional directories and a file name

In [7]:
fl = current_dir.joinpath("new_dir", "second_dir", "my_awesome_file.py")

Split a full path into its parts.

In [8]:
fl_pts = fl.parts

print("\nparts:      ", fl_pts)
print("type of parts:", type(fl_pts))
print("parts as list:", list(fl_pts))


parts:       ('/', 'Users', 'kehoe', 'Git_area', 'AtmosphericPythonCourse', 'cool_stuff', 'new_dir', 'second_dir', 'my_awesome_file.py')
type of parts: <class 'tuple'>
parts as list: ['/', 'Users', 'kehoe', 'Git_area', 'AtmosphericPythonCourse', 'cool_stuff', 'new_dir', 'second_dir', 'my_awesome_file.py']


Now trim the path by slicing the tuple of parts.

Put the * character in front of the sliced tuple to unpack it into individual string arguments, which is what Path expects. Without it, Path receives a single tuple and raises a TypeError, which is why the second line is commented out.

In [9]:
print(Path(*fl_pts[:-3]))
#print(Path(fl_pts[:-3]))

/Users/kehoe/Git_area/AtmosphericPythonCourse/cool_stuff
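
The same trimming on a shorter, hypothetical path, with the failing call caught so you can see the TypeError instead of leaving it commented out:

```python
from pathlib import PurePosixPath

fl = PurePosixPath("/Users/example/new_dir/second_dir/my_awesome_file.py")  # hypothetical
fl_pts = fl.parts  # ('/', 'Users', 'example', 'new_dir', 'second_dir', 'my_awesome_file.py')

trimmed = PurePosixPath(*fl_pts[:-3])  # * spreads the tuple into separate arguments
print(trimmed)  # /Users/example

try:
    PurePosixPath(fl_pts[:-3])  # the tuple itself is not a valid path segment
    raised = False
except TypeError as err:
    raised = True
    print("TypeError:", err)
```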


The alternative is to chain multiple .parent attributes. This works when you know ahead of time how many levels up to go, but not when the number of levels is only known while the program is running.

In [10]:
print(fl.parent.parent.parent)

/Users/kehoe/Git_area/AtmosphericPythonCourse/cool_stuff
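
When the number of levels is only known at run time, the parents attribute (a sequence of all ancestors, nearest first) can be indexed with a variable. A sketch with a hypothetical path:

```python
from pathlib import PurePosixPath

fl = PurePosixPath("/Users/example/new_dir/second_dir/my_awesome_file.py")  # hypothetical

levels_up = 3  # could be computed or read from a config file at run time
ancestor = fl.parents[levels_up - 1]  # parents[0] is fl.parent, parents[1] its parent, ...

print(ancestor)                             # /Users/example
print(ancestor == fl.parent.parent.parent)  # True
print(list(fl.parents)[-1])                 # /  (the last entry is the anchor)
```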


## Let's play with getting a list of files from a directory

In [11]:
current_dir = Path.cwd()  # Get current directory
print("current_dir:", current_dir)

current_dir: /Users/kehoe/Git_area/AtmosphericPythonCourse/cool_stuff


Search the directory for all files that end in .py. Notice that all_py_files is returned as a generator, not a list.

In [12]:
all_py_files = current_dir.glob("*.py")
print("\nall_py_files:", all_py_files)


all_py_files: <generator object Path.glob at 0x10662d9e0>
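
One consequence of glob() returning a lazy iterator (a generator here; newer Python versions may return a different iterator type) is that it can only be consumed once. This sketch builds a throwaway directory with the tempfile module; the filenames are made up:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    for name in ("a.py", "b.py", "notes.txt"):
        (tmp_dir / name).touch()

    py_files = tmp_dir.glob("*.py")
    first_pass = [p.name for p in py_files]   # consumes the iterator
    second_pass = [p.name for p in py_files]  # already exhausted, so nothing is left

print(sorted(first_pass))  # ['a.py', 'b.py']
print(second_pass)         # []
```

Converting to a list right away, as the next cell does, avoids this surprise.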


Convert the generator to a list so we can see it. Notice that the individual items are still pathlib objects, not strings.

In [13]:
all_py_files = list(all_py_files)
all_py_files

[PosixPath('/Users/kehoe/Git_area/AtmosphericPythonCourse/cool_stuff/multiprocessing_utility.py'),
 PosixPath('/Users/kehoe/Git_area/AtmosphericPythonCourse/cool_stuff/data_multiprocessing.py')]

To convert all the individual pathlib objects to strings, we can use a for loop or (even better) a list comprehension.

In [14]:
conv_files = [str(x) for x in all_py_files]
print(conv_files)

['/Users/kehoe/Git_area/AtmosphericPythonCourse/cool_stuff/multiprocessing_utility.py', '/Users/kehoe/Git_area/AtmosphericPythonCourse/cool_stuff/data_multiprocessing.py']
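
Because glob() makes no promise about order, it is often worth sorting before converting. The same list-comprehension pattern also pulls out other attributes, such as just the filenames. A sketch with hypothetical paths:

```python
from pathlib import PurePosixPath

# Hypothetical glob() results, in the arbitrary order glob() can return them
found = [
    PurePosixPath("/project/b_utility.py"),
    PurePosixPath("/project/a_processing.py"),
]

ordered = sorted(found)                 # paths compare part by part
as_strings = [str(p) for p in ordered]  # full paths as str
names_only = [p.name for p in ordered]  # just the filenames

print(as_strings)  # ['/project/a_processing.py', '/project/b_utility.py']
print(names_only)  # ['a_processing.py', 'b_utility.py']
```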


Show just the filename of the first file in the list, without its directory path.

In [15]:
all_py_files[0].name

'multiprocessing_utility.py'

## Let's play with making directories and files, then renaming and deleting files and directories.

Make a new path using the current directory as the base.

In [16]:
current_dir = Path.cwd()
new_dirs = current_dir.joinpath("one", "two")

Actually create the new directories. The parents=True keyword makes any missing intermediate directories in the same call, and exist_ok=True tells mkdir to do nothing, instead of raising FileExistsError, if the directory already exists.

In [17]:
new_dirs.mkdir(parents=True, exist_ok=True)

Add a third-level directory. Again, exist_ok=True does nothing if the directory already exists; without it, mkdir raises FileExistsError.

In [18]:
new_dirs = new_dirs.joinpath("three")
new_dirs.mkdir(exist_ok=True)
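
To see what parents and exist_ok actually change, this sketch repeats the calls inside a temporary directory (so nothing is left in the notebook folder) and records which errors are raised:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    new_dirs = Path(tmp) / "one" / "two"
    new_dirs.mkdir(parents=True, exist_ok=True)  # creates one/ and one/two/
    new_dirs.mkdir(parents=True, exist_ok=True)  # already there: silently does nothing
    created = new_dirs.is_dir()

    try:
        new_dirs.mkdir()                         # no exist_ok on an existing directory
        exists_error = False
    except FileExistsError:
        exists_error = True

    try:
        (Path(tmp) / "x" / "y").mkdir()          # no parents=True, and x/ is missing
        missing_parent_error = False
    except FileNotFoundError:
        missing_parent_error = True

print(created, exists_error, missing_parent_error)  # True True True
```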

Create a new pathlib path with a filename.

In [19]:
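new_file = new_dirs.joinpath("file.txt")

Write a string to the file. write_text() creates the file (or overwrites an existing one) and returns the number of characters written.

In [20]: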
new_file.write_text("Hello world")

11

Read the file contents back as a string.

In [21]:
new_file.read_text()

'Hello world'
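
write_text() always replaces the whole file; to add to an existing file, open it in append mode with Path.open(). Passing encoding explicitly avoids depending on the platform default. A sketch in a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    new_file = Path(tmp) / "file.txt"
    n1 = new_file.write_text("Hello world", encoding="utf-8")  # 11 characters written
    n2 = new_file.write_text("Hi", encoding="utf-8")           # overwrites: now just "Hi"

    with new_file.open("a", encoding="utf-8") as fh:           # "a" appends instead
        fh.write(" there")

    contents = new_file.read_text(encoding="utf-8")

print(n1, n2, repr(contents))  # 11 2 'Hi there'
```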

Use with_suffix() to change the suffix from .txt to .text_name. This returns a new Path object; it does not modify new_file or rename the file on disk.

In [22]:
new_file.with_suffix(".text_name")

PosixPath('/Users/kehoe/Git_area/AtmosphericPythonCourse/cool_stuff/one/two/three/file.text_name')

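Use with_name() to replace the whole filename. Because with_suffix() did not change new_file, this starts from file.txt (not file.text_name), and again nothing on disk is renamed.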

In [23]:
new_file.with_name("a_new_file.txt")

PosixPath('/Users/kehoe/Git_area/AtmosphericPythonCourse/cool_stuff/one/two/three/a_new_file.txt')
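
Each with_*() call returns a new path and leaves the original untouched, so the result has to be stored, or passed to rename()/replace() to change anything on disk. A sketch with a hypothetical path that also shows with_stem() and removing a suffix:

```python
from pathlib import PurePosixPath

new_file = PurePosixPath("/tmp/example/file.txt")  # hypothetical location

print(new_file.with_suffix(".text_name"))    # /tmp/example/file.text_name
print(new_file.with_name("a_new_file.txt"))  # /tmp/example/a_new_file.txt
print(new_file.with_stem("file_v2"))         # /tmp/example/file_v2.txt  (Python 3.9+)
print(new_file.with_suffix(""))              # /tmp/example/file  (suffix removed)
print(new_file)                              # /tmp/example/file.txt  (unchanged)
```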

Create a new pathlib path one directory up from the previously made directory (its parent), then add a filename to it.

In [24]:
new_dirs2 = new_dirs.parent
print('new_dirs2:', new_dirs2)
new_file2 = new_dirs2.joinpath("greatest_file_ever.csv")

new_dirs2: /Users/kehoe/Git_area/AtmosphericPythonCourse/cool_stuff/one/two


Create the file on disk with touch(). It will be empty.

In [25]:
new_file2.touch()

Take the directory part of new_file2 and join a different filename onto it to build the new full path.

In [26]:
new_file3 = new_file2.parent.joinpath("good_not_great_file.csv")

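Rename the file on disk to the new name. replace() overwrites the target if it already exists and returns the new path. (rename() behaves the same way on macOS and Linux, but raises an error on Windows if the target exists.)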

In [27]:
new_file2.replace(new_file3)

PosixPath('/Users/kehoe/Git_area/AtmosphericPythonCourse/cool_stuff/one/two/good_not_great_file.csv')
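
with_name() is a shorter way to build the target in the same folder. After replace(), the Path object for the old name still exists in Python but no longer points at a file. A sketch in a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    new_file2 = Path(tmp) / "greatest_file_ever.csv"
    new_file2.touch()
    new_file3 = new_file2.with_name("good_not_great_file.csv")  # same folder, new name

    result = new_file2.replace(new_file3)  # returns the target path (Python 3.8+)
    old_exists = new_file2.exists()        # False: the old name is gone
    new_exists = new_file3.exists()        # True

print(result.name, old_exists, new_exists)  # good_not_great_file.csv False True
```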

Delete a file with unlink(). It will not ask if you are sure, so be careful.

In [28]:
new_file.unlink()

Delete a directory with rmdir(). The directory must be empty, which it is now that file.txt is gone. It will not ask if you are deleting the correct directory, so be careful.

In [29]:
new_dirs.rmdir()
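
The order matters: rmdir() raises OSError while anything is left inside, and unlink() raises FileNotFoundError for a missing file unless missing_ok=True is passed. A sketch in a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "three"
    folder.mkdir()
    target = folder / "file.txt"
    target.write_text("Hello world")

    try:
        folder.rmdir()              # refused: folder still contains file.txt
        refused = False
    except OSError:
        refused = True

    target.unlink()
    target.unlink(missing_ok=True)  # already deleted: no error (Python 3.8+)
    folder.rmdir()                  # empty now, so this works
    folder_gone = not folder.exists()

print(refused, folder_gone)  # True True
```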

Use the current directory to get the path to the base of our new directory tree.

In [30]:
rm_dir = Path.cwd().joinpath("one")
rm_dir

PosixPath('/Users/kehoe/Git_area/AtmosphericPythonCourse/cool_stuff/one')

Use shutil.rmtree(), from a different standard library module, to delete a directory that still has something in it. Be VERY CAREFUL with this! It removes the directory along with all the files and subdirectories inside it, and it will not ask. You had better know what you are doing to use this.

In [31]:
shutil.rmtree(rm_dir)
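
One way to add a safety net is to check the resolved path before calling shutil.rmtree(). The safe_rmtree() helper below is hypothetical, not part of shutil, and is only a sketch: it refuses anything outside of, or equal to, a base folder you choose. It needs Path.is_relative_to(), available from Python 3.9.

```python
import shutil
import tempfile
from pathlib import Path


def safe_rmtree(target: Path, allowed_base: Path) -> None:
    """Hypothetical helper: delete target only if it lies strictly inside allowed_base."""
    target = target.resolve()
    allowed_base = allowed_base.resolve()
    if target == allowed_base or not target.is_relative_to(allowed_base):  # Python 3.9+
        raise ValueError(f"Refusing to delete {target}")
    shutil.rmtree(target)


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    tree = base / "one" / "two" / "three"
    tree.mkdir(parents=True)
    (tree / "file.txt").write_text("Hello world")

    safe_rmtree(base / "one", allowed_base=base)  # removes one/ and everything below it
    removed = not (base / "one").exists()

    try:
        safe_rmtree(base / "..", allowed_base=base)  # resolves outside base: refused
        refused = False
    except ValueError:
        refused = True

print(removed, refused)  # True True
```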

## Let's play with metadata about the file.

Create a pathlib path to this notebook file in the current directory.

In [32]:
new_file = Path("path_stuff.ipynb")

Get information about the file. stat() returns an os.stat_result object.

In [33]:
stats = new_file.stat()

stats

os.stat_result(st_mode=33188, st_ino=3463036, st_dev=16777233, st_nlink=1, st_uid=503, st_gid=20, st_size=26872, st_atime=1696351701, st_mtime=1696351821, st_ctime=1696351821)

Print the size of the file in bytes.

In [34]:
stats.st_size

26872

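Print the three different times associated with the file, converted from seconds since the epoch to UTC. st_mtime is the last time the contents were modified and st_atime the last access. On macOS and Linux st_ctime is the last metadata change, not the creation time (on Windows it has historically been the creation time). datetime.utcfromtimestamp() is deprecated as of Python 3.12, so this uses fromtimestamp() with an explicit UTC timezone.

In [35]:
print("st_ctime:", dt.datetime.fromtimestamp(stats.st_ctime, tz=dt.timezone.utc))
print("st_mtime:", dt.datetime.fromtimestamp(stats.st_mtime, tz=dt.timezone.utc))
print("st_atime:", dt.datetime.fromtimestamp(stats.st_atime, tz=dt.timezone.utc))

st_ctime: 2023-10-03 16:50:21.533723+00:00
st_mtime: 2023-10-03 16:50:21.533723+00:00
st_atime: 2023-10-03 16:48:21.492500+00:00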
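
stat() values are handy for picking files, for example newest first. In this sketch the filenames and timestamps are made up and set with os.utime() so the result is predictable:

```python
import datetime as dt
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # Made-up files with made-up modification times (seconds since the epoch)
    stamps = {"old.cdf": 1_000_000_000, "new.cdf": 1_700_000_000, "mid.cdf": 1_500_000_000}
    for name, ts in stamps.items():
        fl = base / name
        fl.touch()
        os.utime(fl, (ts, ts))  # (access time, modification time)

    newest_first = sorted(base.glob("*.cdf"), key=lambda p: p.stat().st_mtime, reverse=True)
    order = [p.name for p in newest_first]
    latest = dt.datetime.fromtimestamp(newest_first[0].stat().st_mtime, tz=dt.timezone.utc)

print(order)   # ['new.cdf', 'mid.cdf', 'old.cdf']
print(latest)  # 2023-11-14 22:13:20+00:00
```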


## Play with some more advanced Pathlib stuff to find specific files.

Create a path the normal way.

In [36]:
data_path = Path("..", "data", "sgpmetE13.b1")
data_path

PosixPath('../data/sgpmetE13.b1')

Use the special "/" operator to add directories or files onto an existing path. It is OS-independent: it also works on Windows, where the native path separator is a backslash rather than "/".

In [37]:
data_path = Path.cwd() / ".." / "data" / "sgpmetE13.b1"
data_path

PosixPath('/Users/kehoe/Git_area/AtmosphericPythonCourse/cool_stuff/../data/sgpmetE13.b1')
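
The ".." stays in the path until it is resolved. resolve() makes the path absolute and collapses "..", following any symlinks along the way, which gives a cleaner path to print or store. This sketch mirrors the cool_stuff/../data layout in a temporary directory; the folder names are only stand-ins:

```python
import tempfile
from pathlib import Path, PurePosixPath, PureWindowsPath

# "/" joins with each flavour's own separator, and a str may sit on either side
print(PureWindowsPath("C:/") / "data" / "sgpmetE13.b1")  # C:\data\sgpmetE13.b1
print("relative" / PurePosixPath("data"))                # relative/data

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "cool_stuff").mkdir()
    (Path(tmp) / "data").mkdir()

    messy = Path(tmp) / "cool_stuff" / ".." / "data"  # like Path.cwd() / ".." / "data"
    clean = messy.resolve()                            # absolute, with ".." collapsed

    has_dotdot = ".." in clean.parts
    same_dir = clean == (Path(tmp) / "data").resolve()

print(has_dotdot, same_dir)  # False True
```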

Use the glob() method to search the directory for files ending in '.cdf'.

In [38]:
files = data_path.glob("*.cdf")

Loop over all the returned files and check whether the date 20191101 is in each filename. match() tests one path against a glob-style pattern, so it has to be called on each entry, hence the loop. Note that glob() returns the files in no particular order.

In [39]:
for fl in files:
    print(fl, fl.match("*.20191101.*"))

/Users/kehoe/Git_area/AtmosphericPythonCourse/cool_stuff/../data/sgpmetE13.b1/sgpmetE13.b1.20191101.000000.cdf True
/Users/kehoe/Git_area/AtmosphericPythonCourse/cool_stuff/../data/sgpmetE13.b1/sgpmetE13.b1.20191104.000000.cdf False
/Users/kehoe/Git_area/AtmosphericPythonCourse/cool_stuff/../data/sgpmetE13.b1/sgpmetE13.b1.20191103.000000.cdf False
/Users/kehoe/Git_area/AtmosphericPythonCourse/cool_stuff/../data/sgpmetE13.b1/sgpmetE13.b1.20191102.000000.cdf False
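
A sketch of a possible next step, reusing three of the filenames above as pure paths (no files needed): keep only the matches, and pull the date out of each name. It assumes the date is always the third dot-separated field, as it is in these filenames.

```python
import datetime as dt
from pathlib import PurePosixPath

files = [
    PurePosixPath("../data/sgpmetE13.b1/sgpmetE13.b1.20191101.000000.cdf"),
    PurePosixPath("../data/sgpmetE13.b1/sgpmetE13.b1.20191104.000000.cdf"),
    PurePosixPath("../data/sgpmetE13.b1/sgpmetE13.b1.20191102.000000.cdf"),
]

# A relative pattern is matched from the right, so a bare filename pattern works
wanted = [fl for fl in files if fl.match("*.20191101.*")]

# Assumes the date is the third dot-separated field of the filename
dates = sorted(dt.datetime.strptime(fl.name.split(".")[2], "%Y%m%d").date() for fl in files)

print([fl.name for fl in wanted])  # ['sgpmetE13.b1.20191101.000000.cdf']
print(dates[0], dates[-1])         # 2019-11-01 2019-11-04
```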


## Use the recursive file glob to look for files in a data directory tree.

Make a relative pathlib path to the base of the data directory.

In [40]:
data_path = Path("..", "data")

data_path

PosixPath('../data')

Initialize an empty list.

In [41]:
files = []

Recursively search for files ending in the older netCDF extension (.cdf). rglob() returns a generator, and extend() appends each matched path from it to the existing list.

In [42]:
files_found = data_path.rglob("*.cdf")
files.extend(files_found)

Recursively search for files ending in the newer netCDF extension (.nc) and extend the list the same way.

In [43]:
files.extend(data_path.rglob("*.nc"))

Print the number of files found.

In [44]:
print("\nlen(files):", len(files))


len(files): 9
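
The two rglob() calls walk the directory tree twice. An alternative, sketched here in a temporary directory with made-up filenames, walks it once with rglob("*") and filters on the suffix:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp) / "data"
    for rel in ("met/a.cdf", "met/b.cdf", "ceil/c.nc", "ceil/readme.txt"):
        fl = data_path / rel
        fl.parent.mkdir(parents=True, exist_ok=True)
        fl.touch()

    netcdf_suffixes = {".cdf", ".nc"}
    files = sorted(
        p for p in data_path.rglob("*") if p.is_file() and p.suffix in netcdf_suffixes
    )
    names = [p.relative_to(data_path).as_posix() for p in files]  # same display on any OS

print(len(files), names)  # 3 ['ceil/c.nc', 'met/a.cdf', 'met/b.cdf']
```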


Loop over the list, printing each path. Notice that each one prints like a normal string, but remember that it's actually a pathlib object.

In [45]:
for fl in files:
    print(fl)

../data/sgpmetE13.b1/sgpmetE13.b1.20191101.000000.cdf
../data/sgpmetE13.b1/sgpmetE13.b1.20191104.000000.cdf
../data/sgpmetE13.b1/sgpmetE13.b1.20191103.000000.cdf
../data/sgpmetE13.b1/sgpmetE13.b1.20191102.000000.cdf
../data/sgpecorsfE14.b1/sgpecorsfE14.b1.20191101.000000.nc
../data/sgpceilC1.b1/sgpceilC1.b1.20191103.000012.nc
../data/sgpmfrsrC1.b1/sgpmfrsrC1.b1.20190802.000000.nc
../data/sgpmfrsrC1.b1/sgpmfrsrC1.b1.20190803.000000.nc
../data/sgpmfrsrC1.b1/sgpmfrsrC1.b1.20190801.000000.nc
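
print() shows str(fl), while a bare expression at the end of a notebook cell shows repr(fl), which is where the PosixPath(...) wrapper seen earlier comes from. A sketch with one of the paths above, using PurePosixPath so the output does not depend on the OS (a real Path shows PosixPath(...) or WindowsPath(...) in its repr):

```python
from pathlib import PurePosixPath

fl = PurePosixPath("../data/sgpmetE13.b1/sgpmetE13.b1.20191101.000000.cdf")

print(fl)                   # ../data/sgpmetE13.b1/sgpmetE13.b1.20191101.000000.cdf
print(repr(fl))             # PurePosixPath('../data/sgpmetE13.b1/sgpmetE13.b1.20191101.000000.cdf')
print(isinstance(fl, str))  # False
print(fl.parent.name)       # sgpmetE13.b1  (path attributes still work)
```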
