# Handling files and paths with pathlib

Some parts of this tutorial are inspired by [this tutorial](https://realpython.com/python-pathlib/), [this one](https://towardsdatascience.com/10-examples-to-master-python-pathlib-1249cc77de0b) and [this article](https://treyhunner.com/2019/01/no-really-pathlib-is-great/). You can also have a look at [pathlib's documentation](https://docs.python.org/3/library/pathlib.html).


When dealing with experimental data, you will inevitably need to handle input and output files. These are often stored in several hierarchically organized directories, which can quickly become a real headache. Part of the difficulty is that operating systems use different path separators. Linux and macOS separate folders with `/`, while Windows uses `\`, which has to be written `\\` inside a regular Python string:

```python
# Linux / macOS separator (usually accepted by Windows as well)
open("Documents/data/my_data.txt")
# Windows separator: each backslash has to be escaped inside the string
open("Documents\\data\\my_data.txt")
```

Paths were traditionally represented as strings, and even simple file operations often required combining several modules. For instance, the following example needs three imports just to move all `txt` files to an archive directory:

```python
import glob
import os
import shutil

for file_name in glob.glob('*.txt'):
    new_path = os.path.join('archive', file_name)
    shutil.move(file_name, new_path)
```
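For comparison, here is a minimal sketch of the same task written with `pathlib` only. It assumes the archive folder is a sub-folder called `archive` of the processed directory. The function name `archive_txt_files` is hypothetical. `Path.glob` replaces `glob.glob`, the `/` operator replaces `os.path.join`, and `Path.rename` replaces `shutil.move`. Note that `rename` only moves files within a single file system. `shutil.move` remains an option otherwise, and it also accepts `Path` objects.

```python
from pathlib import Path


def archive_txt_files(directory):
    """Move every .txt file of `directory` into `directory / "archive"` (hypothetical helper)."""
    directory = Path(directory)
    archive_dir = directory / "archive"
    archive_dir.mkdir(exist_ok=True)  # no error if the folder already exists

    # sorted() takes a snapshot of the matches before any file is moved
    for file_path in sorted(directory.glob("*.txt")):
        # the / operator builds the destination path, rename() moves the file
        file_path.rename(archive_dir / file_path.name)


# Usage: archive_txt_files(Path.cwd())
```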


To solve these issues and handle paths in a much more straightforward and "pythonic" way, the `pathlib` module was added to the standard library (the modules installed by default with Python) in Python 3.4. Since Python 3.6, most functions that accept a path as a string, such as `open`, also accept `Path` objects.


```python
# Import the Path class from pathlib
from pathlib import Path
```

## Basic path operations

- `Path.home()`
- `Path.cwd()`
- `Path("path/to/something")`
- `Path(...) / "subfolder"`
- `.joinpath(subfolder1, subfolder2...)`
- `.parent`
- `.name`
- `.exists()`
- `.is_file()`
- `.is_dir()`
- `.iterdir()`

### Creating Path objects

```python
# Path objects can be created from a string using the following syntax
Path("/home/fayat/Documents")
```
```
PosixPath('/home/fayat/Documents')
```

```python
# Create a Path object corresponding to the home directory
home_path = Path.home()

home_path
```
```
PosixPath('/home/fayat')
```

```python
# Create a Path object corresponding to the current working directory
cwd_path = Path.cwd()

cwd_path
```
```
PosixPath('/home/fayat/Documents/python_course/PSL_Graduate/Python_Data_Analysis/07-FileHandling')
```

```python
# Create the path to a subfolder of the home directory
# by "dividing" the Path object by a string (/ operator)
home_path / "Documents"
```
```
PosixPath('/home/fayat/Documents')
```

```python
# Join the path of multiple folders / files
metadata_path = Path.cwd().joinpath("data", "metadata.csv")
metadata_path
```
```
PosixPath('/home/fayat/Documents/python_course/PSL_Graduate/Python_Data_Analysis/07-FileHandling/data/metadata.csv')
```

```python
# Get the parent folder
metadata_path.parent
```
```
PosixPath('/home/fayat/Documents/python_course/PSL_Graduate/Python_Data_Analysis/07-FileHandling/data')
```
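`.parent` only goes up one level. The `.parents` sequence lists every ancestor, starting with the closest one. The short sketch below also checks that `.joinpath` and the `/` operator build the same path. The relative path `data/raw/metadata.csv` is hypothetical and does not need to exist, because these operations only manipulate the path itself.

```python
from pathlib import Path

# Hypothetical relative path, used only for illustration
metadata_path = Path("data") / "raw" / "metadata.csv"

# .joinpath builds exactly the same path as chaining the / operator
assert metadata_path == Path("data").joinpath("raw", "metadata.csv")

# .parent goes up one level, .parents lists every ancestor (closest first)
print(metadata_path.parent)          # data/raw (on Linux / macOS)
print(metadata_path.parent.parent)   # data
print(list(metadata_path.parents))   # [PosixPath('data/raw'), PosixPath('data'), PosixPath('.')] on Linux / macOS
```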

### Common path shortcuts

```python
# The home folder is often symbolised using "~"
Path("~")
```
```
PosixPath('~')
```

```python
# The full path can be obtained as follows:
Path("~").expanduser()
```
```
PosixPath('/home/fayat')
```

```python
# The current working directory is often symbolised using "."
Path(".")
```
```
PosixPath('.')
```

```python
# The absolute path can be obtained as follows:
Path(".").absolute()
```
```
PosixPath('/home/fayat/Documents/python_course/PSL_Graduate/Python_Data_Analysis/07-FileHandling')
```
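Similarly, `..` stands for the parent folder. `.absolute()` only prepends the current working directory and leaves `..` untouched. `.resolve()` returns a canonical absolute path, which means it also collapses `..` and follows symbolic links. A short comparison on the hypothetical relative path `../data` follows. With the default `strict=False`, the path doesn't need to exist.

```python
from pathlib import Path

# ".." refers to the parent of the current working directory
relative_path = Path("..") / "data"

# .absolute() simply prepends the current working directory: ".." is kept as-is
print(relative_path.absolute())

# .resolve() also removes ".." (and follows symbolic links)
print(relative_path.resolve())
```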

### Extracting information from a path object
The file name and suffix:

```python
metadata_path.name  # Also works for a directory's name
```
```
'metadata.csv'
```

```python
metadata_path.suffix
```
```
'.csv'
```
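A few related attributes and methods help when building output file names from input ones. `.stem` gives the name without its last suffix, and `.suffixes` lists all suffixes. `.with_suffix()` and `.with_name()` return a new path with the suffix or the whole name replaced. The file names below are hypothetical and don't need to exist.

```python
from pathlib import Path

# Hypothetical file names, used only for illustration
csv_path = Path("data") / "metadata.csv"
archive_path = Path("data") / "metadata.csv.gz"

print(csv_path.stem)                         # metadata  (name without its suffix)
print(archive_path.suffix)                   # .gz       (only the last suffix)
print(archive_path.suffixes)                 # ['.csv', '.gz']
print(csv_path.with_suffix(".xlsx"))         # data/metadata.xlsx (on Linux / macOS)
print(csv_path.with_name("plate_map.csv"))   # data/plate_map.csv (on Linux / macOS)
```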

Checking that a Path object corresponds to a file or folder that exists:

```python
metadata_path = Path.cwd().joinpath("data", "plate_map_OR.csv")
```

```python
metadata_path.exists()
```
```
True
```

```python
metadata_path.is_dir()  # True if the Path is a directory
```
```
False
```

```python
metadata_path.is_file()  # True if the Path is a file
```
```
True
```

#### Exercise:
Make sure that `test.txt` exists in the current working directory and is a file:

```python
path_to_test = Path.cwd() / "test.txt"
path_to_test.exists()
```
```
True
```

```python
path_to_test.is_file()
```
```
True
```

## Getting an iterator over the contents of a directory
Reminder:
An *iterable* is a Python object representing a series of elements that can be accessed one by one using this syntax:
```python
for element in iterable:
    ...  # do something with element
```
For instance, lists are iterable:
```python
for element in ["a", 1, [1, 2, 3]]:
    print(element)
```


```python
# Iterator of the content of the current working directory
for my_path in Path.cwd().iterdir():
    print(my_path)
```
```
/home/fayat/Documents/python_course/PSL_Graduate/Python_Data_Analysis/07-FileHandling/.ipynb_checkpoints
/home/fayat/Documents/python_course/PSL_Graduate/Python_Data_Analysis/07-FileHandling/00-Pathlib-slides.ipynb
/home/fayat/Documents/python_course/PSL_Graduate/Python_Data_Analysis/07-FileHandling/00-Pathlib.ipynb
/home/fayat/Documents/python_course/PSL_Graduate/Python_Data_Analysis/07-FileHandling/.virtual_documents
/home/fayat/Documents/python_course/PSL_Graduate/Python_Data_Analysis/07-FileHandling/01-FluorescenceDataCaseStudy_corrected.ipynb
/home/fayat/Documents/python_course/PSL_Graduate/Python_Data_Analysis/07-FileHandling/01-FluorescenceDataCaseStudy.ipynb
/home/fayat/Documents/python_course/PSL_Graduate/Python_Data_Analysis/07-FileHandling/test.txt
/home/fayat/Documents/python_course/PSL_Graduate/Python_Data_Analysis/07-FileHandling/ressources
/home/fayat/Documents/python_course/PSL_Graduate/Python_Data_Analysis/07-FileHandling/data
```
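As the listing above shows, `.iterdir()` yields entries in an arbitrary order and mixes files with folders. A minimal sketch that sorts the entries and separates sub-folders from files with `.is_dir()` and `.is_file()` follows. The helper name `list_directory` is hypothetical.

```python
from pathlib import Path


def list_directory(directory):
    """Return (sub-folder names, file names) of `directory`, sorted alphabetically."""
    # Path objects can be sorted, which gives a reproducible order
    entries = sorted(Path(directory).iterdir())
    subfolders = [entry.name for entry in entries if entry.is_dir()]
    files = [entry.name for entry in entries if entry.is_file()]
    return subfolders, files


# Usage on the current working directory
subfolders, files = list_directory(Path.cwd())
print(subfolders)
print(files)
```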


#### Exercise:
Grab the names of all csv files in the `data` folder as strings and, after making sure that they exist, add them to the `csv_file_all` list:

```python
csv_file_all = []

data_path = Path.cwd() / "data"

for subpath in data_path.iterdir():
    if subpath.exists():
        if subpath.suffix == ".csv":
            csv_file_all.append(subpath.name)
```

```python
csv_file_all
```
```
['fluo_data_B3.csv',
 'fluo_data_B4.csv',
 'fluo_data_A12.csv',
 'fluo_data_D12.csv',
 'fluo_data_B2.csv',
 'fluo_data_C9.csv',
 'fluo_data_D2.csv',
 'fluo_data_D10.csv',
 'fluo_data_C1.csv',
 'fluo_data_E6.csv',
 'fluo_data_H5.csv',
 'fluo_data_E12.csv',
 'fluo_data_G5.csv',
 'fluo_data_F9.csv',
 'fluo_data_H7.csv',
 'fluo_data_A4.csv',
 'fluo_data_F8.csv',
 'fluo_data_D7.csv',
 'fluo_data_E9.csv',
 'fluo_data_A11.csv',
 'fluo_data_G9.csv',
 'fluo_data_D8.csv',
 'fluo_data_E4.csv',
 'fluo_data_G7.csv',
 'plate_map_concentration.csv',
 'fluo_data_E11.csv',
 'fluo_data_A9.csv',
 'fluo_data_A6.csv',
 'fluo_data_C2.csv',
 'fluo_data_G6.csv',
 'fluo_data_G8.csv',
 'fluo_data_F11.csv',
 'fluo_data_H4.csv',
 'fluo_data_G10.csv',
 'fluo_data_B7.csv',
 'fluo_data_D3.csv',
 'fluo_data_C4.csv',
 'fluo_data_H10.csv',
 'fluo_data_B12.csv',
 'fluo_data_B9.csv',
 'fluo_data_F12.csv',
 'fluo_data_H3.csv',
 'fluo_data_H6.csv',
 'fluo_data_C11.csv',
 'fluo_data_B8.csv',
 'fluo_data_E8.csv',
 'fluo_data
```
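An alternative way to solve this exercise uses `Path.glob`. It only yields existing entries whose name matches a pattern, so the separate `.exists()` and `.suffix` checks are no longer needed. The sketch below assumes that only the files directly inside the folder are wanted. `.rglob()` would also search sub-folders. The helper name `csv_file_names` is hypothetical, and sorting the names is an extra choice that makes the result reproducible.

```python
from pathlib import Path


def csv_file_names(directory):
    """Return the sorted names of the .csv files located directly in `directory`."""
    # glob("*.csv") only matches existing entries ending with ".csv";
    # is_file() skips a folder that would happen to be named "something.csv"
    return sorted(path.name for path in Path(directory).glob("*.csv") if path.is_file())


# Usage with the data folder of this tutorial:
# csv_file_all = csv_file_names(Path.cwd() / "data")
```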

## File / folder handling

### Creating a directory

```python
# Use this method to create the directory from the Path object called path
path.mkdir()
```

```python
new_dir = Path.cwd() / "new_folder"
new_dir.mkdir()
```

```python
# Reminder: You can access the function's documentation as follows (IPython / Jupyter syntax):
?Path.mkdir
# Don't hesitate to consult pathlib's documentation to get more information about its arguments!
```
```
Signature: Path.mkdir(self, mode=511, parents=False, exist_ok=False)
Docstring: Create a new directory at this given path.
File:      ~/miniconda3/lib/python3.7/pathlib.py
Type:      function
```

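The two keyword arguments in this signature are often useful. `parents=True` also creates any missing intermediate folders. `exist_ok=True` makes `mkdir` do nothing, instead of raising `FileExistsError`, when the folder already exists. A minimal sketch run inside a temporary directory, with hypothetical folder names:

```python
import tempfile
from pathlib import Path

# Work inside a temporary directory so that the example leaves no trace on disk
with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical nested folders: none of them exists yet
    nested_dir = Path(tmp) / "results" / "experiment_1" / "plots"

    # With the default parents=False, the missing "results" folder makes mkdir fail
    try:
        nested_dir.mkdir()
    except FileNotFoundError:
        print("mkdir() failed: the parent folders don't exist")

    # parents=True also creates the missing intermediate folders
    nested_dir.mkdir(parents=True)

    # exist_ok=True turns a second call into a no-op instead of raising FileExistsError
    nested_dir.mkdir(parents=True, exist_ok=True)

    created = nested_dir.is_dir()

print(created)  # True
```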

### File operations

```python
# File operations (e.g. copying a file) can also be performed
# by combining pathlib's Path object with shutil
import shutil
?shutil.copy
```


#### Exercise:
Create a `test` directory in the current working directory after making sure that it doesn't already exist. Then create a copy of `test.txt` called `test_copy.txt` in this newly created directory.

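```python
import shutil

file_path = Path.cwd() / "test.txt"
new_dir = Path.cwd() / "test"
# Create the directory only if it doesn't exist yet
if not new_dir.exists():
    new_dir.mkdir()
new_file_path = new_dir / "test_copy.txt"
# shutil.copy accepts Path objects and returns the destination path
shutil.copy(file_path, new_file_path)
```
```
PosixPath('/home/fayat/Documents/python_course/PSL_Graduate/Python_Data_Analysis/07-FileHandling/test/test_copy.txt')
```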

## File reading / writing
Files can be created / edited and their content accessed using the `.open()` method.

The most important argument is `mode`, which indicates *how* the file should be accessed. Common values are (see [here](https://stackabuse.com/file-handling-in-python/) for more details):

- `mode="r"`: read-only mode (the default)
- `mode="w+"`: writing and reading; the content of an existing file is erased first
- `mode="a"`: append new content at the end of the file; the file is created if it doesn't exist yet

N.B.: You could also simply use `open("path/to/file/as/string")` (since Python 3.6, `open` also accepts a `Path` object), but as mentioned before, pathlib's Path objects come in handy when dealing with file paths.
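The sketch below shows the practical difference between the modes. It uses plain `"w"`, which writes only and, like `"w+"`, empties an existing file first. It runs inside a temporary directory so that `test.txt` is left untouched, and the file name `notes.txt` is hypothetical.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    notes_path = Path(tmp) / "notes.txt"  # hypothetical file name

    # "w" creates the file, or erases its content first if it already exists
    with notes_path.open(mode="w") as f:
        f.write("first line\n")
    with notes_path.open(mode="w") as f:
        f.write("second line\n")  # "first line" is lost

    # "a" keeps the existing content and writes at the end of the file
    with notes_path.open(mode="a") as f:
        f.write("third line\n")

    # "r" only allows reading
    with notes_path.open(mode="r") as f:
        lines = f.readlines()

print(lines)  # ['second line\n', 'third line\n']
```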

```python
# You can for instance open test.txt and read its content as follows
file_path = Path.cwd() / "test.txt"

with file_path.open(mode="r") as f:
    # Inside of the with statement, f can be used to interact
    # with the file
    content = f.readlines()

content
```
```
['Hi everyone !\n', 'This is a small example text file.']
```

**Note:** If you are unfamiliar with the `with ... as ...` syntax: it guarantees that Python closes the file as soon as the `with` block ends, even if an error occurs inside it. Alternatively, you can open and close the file explicitly, as in the following code. Note that **bad things can happen if you forget to close your file after interacting with its content** (for instance, data written to it may not be flushed to disk), which is why I recommend using `with`:

```python
# possible but not recommended
file_path = Path.cwd() / "test.txt"
f = file_path.open(mode="r")  # Open the file
content = f.readlines()
f.close()  # WARNING don't forget to close the file
content
```
```
['Hi everyone !\n', 'This is a small example text file.']
```
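When the file is opened explicitly, the safe equivalent of `with` is roughly a `try` / `finally` block. The `finally` clause closes the file even if reading raises an exception. A sketch on a hypothetical `example.txt` created in a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    file_path = Path(tmp) / "example.txt"  # hypothetical file
    file_path.write_text("Hi everyone !\n")

    # Roughly what "with file_path.open(mode='r') as f:" does for us
    f = file_path.open(mode="r")
    try:
        content = f.readlines()
    finally:
        f.close()  # runs even if readlines() raised an exception

print(content)   # ['Hi everyone !\n']
print(f.closed)  # True
```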

#### Exercise:
Add a new line with the content of your choice to `test.txt`.

As always, don't hesitate to look online for code snippets that could help you (for instance [here](https://stackabuse.com/file-handling-in-python)).

```python
file_path = Path.cwd() / "test.txt"

with file_path.open(mode="a") as f:
    f.write("\nCoucou")
```

```python
# Read test.txt again to check that the new line was added
file_path = Path.cwd() / "test.txt"

with file_path.open(mode="r") as f:
    # Inside of the with statement, f can be used to interact
    # with the file
    content = f.readlines()

content
```
```
['Hi everyone !\n', 'This is a small example text file.\n', 'Coucou']
```
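For small files, `Path.write_text()` and `Path.read_text()` open, write or read, and close the file in a single call. `write_text` overwrites the file, so appending still requires `.open(mode="a")`. A sketch in a temporary directory, passing `encoding="utf-8"` explicitly rather than relying on the system default:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    file_path = Path(tmp) / "test.txt"

    # write_text opens the file for writing, writes the string and closes the file
    file_path.write_text("Hi everyone !\nThis is a small example text file.", encoding="utf-8")

    # read_text returns the whole content as a single string
    text = file_path.read_text(encoding="utf-8")

# str.splitlines() splits on line breaks without keeping the "\n" characters
lines = text.splitlines()
print(lines)  # ['Hi everyone !', 'This is a small example text file.']
```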