<img align="left" src="https://ithaka-labs.s3.amazonaws.com/static-files/images/tdm/tdmdocs/CC_BY.png"><br />

Created by [Nathan Kelber](http://nkelber.com) and Zhuo Chen for [JSTOR Labs](https://labs.jstor.org/) under [Creative Commons CC BY License](https://creativecommons.org/licenses/by/4.0/)<br />
For questions/comments/improvements, email nathan.kelber@ithaka.org.<br />
___

# Python Intermediate 3

**Description:** This notebook describes how to:
* Use the Python `pathlib` library to manipulate files

**Use Case:** For Learners (Detailed explanation, not ideal for researchers)

**Difficulty:** Intermediate

**Completion Time:** 90 minutes

**Knowledge Required:** 
* Python Basics ([Start Python Basics I](./python-basics-1.ipynb))

**Knowledge Recommended:**
* [Python Intermediate 1](./python-intermediate-1.ipynb)
* [Python Intermediate 2](./python-intermediate-2.ipynb)

**Data Format:** Text (.txt)

**Libraries Used:** `pathlib`

**Research Pipeline:** None
___

In [None]:
# Import pathlib library
from pathlib import Path

In [None]:
### Download Sample Files for this Lesson
import urllib.request
download_urls = [
    'https://ithaka-labs.s3.amazonaws.com/static-files/images/tdm/tdmdocs/sample.txt'
]

# Make sure the data folder exists before downloading
Path('data').mkdir(exist_ok=True)

for url in download_urls:
    urllib.request.urlretrieve(url, './data/' + url.rsplit('/', 1)[-1])
    
print('Sample files ready.')

## An Introduction to `pathlib`

Python Intermediate 2 describes how to open, read, and write files. The built-in `pathlib` module is a convenient way to work with the file system so that your code handles files and directories consistently, even across different operating systems. For example, `pathlib` can help you accomplish tasks like:

* Find whether a particular file or directory already exists
* Simplify the code for working with files
* Find information about a file, including its extension

## `pathlib` vs. `os`

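Why use one or the other? Both modules can work with the file system. The `os` module treats paths as plain strings, while `pathlib` wraps them in path objects. This lesson uses `pathlib` because path objects are more readable and easier to adapt across operating systems.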
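One rough way to see the difference is to build the same path with each module. This is only a sketch using the standard library; the folder and file names are illustrative and nothing is read from disk.

```python
import os
from pathlib import Path

# os.path works with plain strings
os_style = os.path.join('data', 'sample.txt')

# pathlib works with path objects joined by the / operator
pathlib_style = Path('data') / 'sample.txt'

print(type(os_style))       # a plain string
print(type(pathlib_style))  # a path object (PosixPath or WindowsPath)

# Both describe the same location
print(Path(os_style) == pathlib_style)
```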

## Finding and Defining Paths

We can find the current working directory by using the `.cwd()` method. (This is similar to the Unix command `pwd` or the Windows command `cd` with no arguments.)

In [None]:
# Get the current working directory
Path.cwd()

Since Constellate runs on a Linux server, we get a path object that is a `PosixPath`. Its directories are separated by forward slashes `/`, whereas on a Windows machine a path uses backslashes, e.g. `C:\Windows\`.

We can create a path object at any time by using an assignment statement and passing a string into the `Path()` function. The path can be absolute, starting at the root of the operating system filesystem, or it can be relative, starting from the current working directory.

In [None]:
# Create a path object based on a string from a relative path
file_path = Path('data/sample.txt')

We have created a path object, not simply a string. The path object has a lot more flexibility than a traditional string (like what is used in the `os` module). It allows us to create code that is easier to adapt for different operating systems since we do not have to be concerned about formatting the string with slashes in the correct direction and other technical issues that diverge from one operating system to another.

In [None]:
# We have created a Path object, not simply a string
type(file_path)
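To see how the slash direction differs between operating systems without switching machines, one option is the "pure" path classes in `pathlib`, which only format paths and never touch the disk. This is a small sketch; the variable names are illustrative.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The same path parts, formatted for two operating systems
posix_path = PurePosixPath('data', 'sample.txt')
windows_path = PureWindowsPath('data', 'sample.txt')

print(posix_path)    # data/sample.txt
print(windows_path)  # data\sample.txt
```

In everyday code, `Path()` picks the right flavor for the machine it runs on, so the pure classes are rarely needed directly.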

We can also use the `print()` function on the path object.

In [None]:
# Print out the path object
print(file_path)

We can also build a path by joining path objects and strings with the `/` operator. (This is a much more readable way to join paths than the `os` module.)

In [None]:
# Building another path off the current working directory
# Using the slash notation

file_path = Path.cwd() / 'data' / 'sample.txt'
print(file_path)

When we create a path object, there is no check to make sure it points to an actual file or directory. We can check if the path exists with the `.exists()` method.

In [None]:
# Check if the path exists
# Works for files and directories
file_path.exists()

<h3 style="color:red; display:inline">Coding Challenge! &lt; / &gt; </h3>

Create a new path object that points to a file called `sample.html` in the `data` folder. Confirm the file does not exist using the `.exists()` method.

In [None]:
# Checking if data/sample.html exists


## Checking if a Path Points to an Existing File or Directory

| Method | Effect |
|---|---|
| `.is_file()` | Return `True` if the path points at an existing file, otherwise `False` |
| `.is_dir()` | Return `True` if the path points at an existing directory, otherwise `False` |

In [None]:
# Check whether path is a file
# Returns a Boolean

file_path.is_file()

In [None]:
# Check whether path is a directory
# Returns a Boolean

file_path.is_dir()
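Both methods return `False` for a path that does not exist, rather than raising an error. The sketch below illustrates this in a temporary folder so it does not depend on the lesson's `data` directory; the file names are hypothetical.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    text_file = folder / 'example.txt'  # hypothetical file name
    text_file.touch()                   # create an empty file
    missing = folder / 'missing.txt'    # never created

    file_checks = (text_file.is_file(), text_file.is_dir())
    dir_checks = (folder.is_file(), folder.is_dir())
    missing_checks = (missing.is_file(), missing.is_dir())

print(file_checks)     # (True, False)
print(dir_checks)      # (False, True)
print(missing_checks)  # (False, False)
```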

## Find the Absolute Path with `.resolve()`

The `.resolve()` method takes a relative path object and returns an absolute path object. The absolute path is the full path from the root of the filesystem. On macOS or Linux, the root is simply `/`. On a Windows computer, it is usually `C:\`.

In [None]:
# Getting the full path using .resolve()
# Returns a path object

file_path.resolve()
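Besides making a path absolute, `.resolve()` also tidies up `..` segments. The following sketch shows both effects; it works in a temporary folder, and the folder and file names are illustrative.

```python
import tempfile
from pathlib import Path

# A relative path becomes absolute after .resolve()
here = Path('.')
print(here.is_absolute())            # False
print(here.resolve().is_absolute())  # True

with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / 'data'
    data_dir.mkdir()

    # A roundabout path that steps out of 'data' and back in
    indirect = data_dir / '..' / 'data' / 'sample.txt'
    resolved = indirect.resolve()

print('..' in indirect.parts)  # True
print('..' in resolved.parts)  # False
```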

A path object also has useful attributes. Unlike the methods above which end in parentheses `()`, the attributes do not use parentheses.

| Attribute | Effect |
|---|---|
| `.parent` | Return a path object for the parent directory |
| `.parents[x]` | Return a path object for the ancestor at index x (`.parents[0]` is the parent, `.parents[1]` the grandparent) |
| `.name` | Return a string containing the file name with extension |
| `.stem` | Return a string containing the file name without extension |
| `.suffix` | Return a string containing the file extension |


In [None]:
# Get the parent of the path
# Returns a path object

file_path.parent

In [None]:
# Getting even deeper into the path
# Finding the grandparent of the full path using .parent twice
# Returns a path object

full_path = file_path.resolve()
full_path.parent.parent

In [None]:
# Getting even deeper into the path
# Finding the grandparent of the full path using parents with index
# Returns a path object

full_path = file_path.resolve()
full_path.parents[1]

In [None]:
# Return just the name of the file or folder
# Returns a string
file_path.name

In [None]:
# Return just the name of the file without extension
# Returns a string

file_path.stem

In [None]:
# Return just the extension/suffix
# Returns a string

file_path.suffix
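The attributes above can be compared side by side on a fixed path. This sketch uses `PurePosixPath` so the output is the same on any operating system; the path itself is hypothetical and does not need to exist.

```python
from pathlib import PurePosixPath

example = PurePosixPath('/home/user/data/sample.txt')

print(example.parent)      # /home/user/data
print(example.parents[0])  # /home/user/data (same as .parent)
print(example.parents[1])  # /home/user
print(example.name)        # sample.txt
print(example.stem)        # sample
print(example.suffix)      # .txt
```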

## Creating Files and Directories

To create a new file or directory, first create the desired path object then use the appropriate method:

* `.touch()` will create a new file
* `.mkdir()` will create a new directory

In [None]:
# Create a new file

new_file_path = Path.cwd() / 'data' / 'new_file.txt'
new_file_path.touch()

In [None]:
# Create a new directory

new_dir = Path.cwd() / 'data' / 'examples'

# Create the directory
new_dir.mkdir(exist_ok = True) # The exist_ok = True parameter does not raise errors if directory exists
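By default, `.mkdir()` raises an error if the parent directory is missing. Passing `parents=True` asks it to create any missing folders along the way. Here is a minimal sketch in a temporary folder, with hypothetical folder and file names:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Neither 'examples' nor 'drafts' exists yet
    nested_dir = Path(tmp) / 'examples' / 'drafts'
    nested_dir.mkdir(parents=True, exist_ok=True)

    # Create an empty file inside the new folder
    notes_file = nested_dir / 'notes.txt'
    notes_file.touch()

    created = (nested_dir.is_dir(), notes_file.is_file())

print(created)  # (True, True)
```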

## Removing Files and Directories

To remove a file or directory, first create the path object then use the appropriate method:

* `.unlink()` will delete a file
* `.rmdir()` will delete a directory


In [None]:
# Remove a file

new_file_path.unlink()

In [None]:
# Remove a directory

new_dir.rmdir()

**Note: There is no method in `pathlib` to recursively delete directories, and `.rmdir()` only removes a directory that is empty. To delete a directory along with its contents, you may want to import the `shutil` module and use its `rmtree()` function: `shutil.rmtree(path)`.**
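As a rough illustration of the note above, the sketch below builds a small folder tree in a temporary location, shows that `.rmdir()` refuses to remove a non-empty directory, and then removes the tree with `shutil.rmtree()`. All folder and file names are hypothetical. Since `rmtree()` deletes everything beneath the path, it is worth double-checking the path before running it on real data.

```python
import shutil
import tempfile
from pathlib import Path

# Build a small tree: base/examples/nested/notes.txt
base = Path(tempfile.mkdtemp())
tree = base / 'examples'
(tree / 'nested').mkdir(parents=True)
(tree / 'nested' / 'notes.txt').write_text('temporary')

# .rmdir() raises an error because the directory is not empty
try:
    tree.rmdir()
    rmdir_failed = False
except OSError:
    rmdir_failed = True

# shutil.rmtree() removes the directory and everything inside it
shutil.rmtree(tree)
tree_exists = tree.exists()

base.rmdir()  # clean up the now-empty temporary folder

print(rmdir_failed)  # True
print(tree_exists)   # False
```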

## Rename a File or Directory

To rename a file, you will need two path objects: the original path object and a new path object with the new name. The syntax looks like:

`old_path.rename(new_path)`

In [None]:
# Create an original file for this example

old_path = Path.cwd() / 'data' / 'original_file.txt'
old_path.touch()

In [None]:
# Rename the original file with `.rename()`
# On Windows, if the renamed file already exists an error will occur
# On Unix, if the renamed file already exists the file will be overwritten silently

new_path = Path.cwd() / 'data' / 'renamed_file.txt'
old_path.rename(new_path)
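After a rename, the original path object still exists in Python, but it no longer points to a file on disk. The sketch below checks this in a temporary folder; the names `draft_path` and `final_path` are illustrative.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    draft_path = Path(tmp) / 'draft.txt'
    final_path = Path(tmp) / 'final.txt'

    draft_path.touch()
    before = (draft_path.exists(), final_path.exists())

    draft_path.rename(final_path)
    after = (draft_path.exists(), final_path.exists())

print(before)  # (True, False)
print(after)   # (False, True)
```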

## Open, Read, and Write to Text Files

Path objects work with the `with` statement (a context manager). Instead of passing a string into the `open()` function, we can call the `.open()` method directly on the path object.

In [None]:
# Opening the file with a context manager
# and the `.open()` method
# The 'r', read only mode, argument is optional with `.open()`

with file_path.open() as f:
    print(f.read())
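The `.open()` method accepts the same mode and encoding arguments as the built-in `open()` function. As a sketch, the example below writes, appends to, and reads a file in a temporary folder; the file name is hypothetical.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / 'demo.txt'

    # 'w' creates the file (or overwrites an existing one)
    with demo_path.open('w', encoding='utf-8') as f:
        f.write('first line\n')

    # 'a' appends to the end of the file
    with demo_path.open('a', encoding='utf-8') as f:
        f.write('second line\n')

    # 'r' reads the file
    with demo_path.open('r', encoding='utf-8') as f:
        lines = f.readlines()

print(lines)  # ['first line\n', 'second line\n']
```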

<h3 style="color:red; display:inline">Coding Challenge! &lt; / &gt; </h3>

The `sleep()` function in the `time` module makes Python wait a specific number of seconds before processing the next task. Use it for this coding challenge.

Create a path object that points to the file `test.txt`. Open the file using the path object and print each line one at a time. Between printing each line, wait one second.

In [None]:
# Sleep Function example
from time import sleep

print('Ready?')
sleep(3)
print('Go!')

In [None]:
# Print each line out from file
# Waiting 1 second between lines
from time import sleep

path = Path.cwd() / 'data' / 'test.txt'

with path.open() as f:
    for line in f:
        print(line, end='') # Print the line with no line break
        sleep(1) # Wait one second
    

If you are reading a small text file, then there is an even shorter way to read the file using a path object: `.read_text()`. This method opens the file, creates a string from the file object contents, and then closes the file object automatically.

In [None]:
# Using the read_text method
# Returns a string
print(file_path.read_text())

There is also a fast method for writing to a file using a path object: `.write_text()`. This method opens the file object in write mode, writes a string to the file, and then closes it automatically. *Be careful with this method since it will overwrite any existing file at that path!*

In [None]:
# Create a new file

new_file_path = Path.cwd() / 'data' / 'new_file.txt'

# Write to a file
# This overwrites the file if it already exists

new_file_path.write_text('Hello World!')
print(new_file_path.read_text())
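To make the overwriting behavior concrete, here is a short sketch that calls `.write_text()` twice on the same path in a temporary folder. The file name is hypothetical.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    note = Path(tmp) / 'note.txt'

    note.write_text('first draft')
    note.write_text('second draft')  # replaces the earlier contents

    contents = note.read_text()

print(contents)  # second draft
```

If the goal is to add to a file rather than replace it, one option is to use `.open('a')` instead.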

<h3 style="color:red; display:inline">Coding Challenge! &lt; / &gt; </h3>

Create a new file and add Shakespeare's Sonnet VI.

```
Then let not winter's ragged hand deface,
In thee thy summer, ere thou be distilled:
Make sweet some vial; treasure thou some place
With beauty's treasure ere it be self-killed.
That use is not forbidden usury,
Which happies those that pay the willing loan;
That's for thy self to breed another thee,
Or ten times happier, be it ten for one;
Ten times thy self were happier than thou art,
If ten of thine ten times refigured thee:
Then what could death do if thou shouldst depart,
Leaving thee living in posterity?
   Be not self-willed, for thou art much too fair
   To be death's conquest and make worms thine heir.
```
   
Then write a program that will read the file and print it line by line. At the beginning of each line, print the appropriate line number.

In [None]:
# Create a file that contains the sonnet string below
# Open the file and print it out line by line
# At the beginning of each line, print the appropriate line number

sonnet_string = """Then let not winter's ragged hand deface,
In thee thy summer, ere thou be distilled:
Make sweet some vial; treasure thou some place
With beauty's treasure ere it be self-killed.
That use is not forbidden usury,
Which happies those that pay the willing loan;
That's for thy self to breed another thee,
Or ten times happier, be it ten for one;
Ten times thy self were happier than thou art,
If ten of thine ten times refigured thee:
Then what could death do if thou shouldst depart,
Leaving thee living in posterity?
   Be not self-willed, for thou art much too fair
   To be death's conquest and make worms thine heir."""


## Gathering a List of Files with Glob
It is common to gather a list of files in a directory (or set of directories) in order to execute code on each one, one at a time.

In [None]:
# Use .iterdir() to iterate over files in a directory

input_dir = Path.cwd() / 'input'
for file in input_dir.iterdir():
    print(file)

In [None]:
# Use .iterdir() to iterate over files in a directory
# Checking for a particular extension
# Only works for a single directory!

for file in input_dir.iterdir():
    if file.suffix == '.txt':
        print(file)

The `.iterdir()` method will work on a single directory, but if you have multiple nested directories then you need to use the `.rglob()` method. 

In [None]:
# Use .rglob() to iterate over all matching files, including those in subfolders

for file in input_dir.rglob("*.txt"):
    print(file)
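The difference between a single-directory search and a recursive one can be sketched with `.glob()` and `.rglob()` on a small temporary folder tree. The file and folder names below are hypothetical.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'sub').mkdir()
    (root / 'top.txt').touch()
    (root / 'notes.md').touch()
    (root / 'sub' / 'nested.txt').touch()

    # .glob() only looks in the directory itself
    top_level = sorted(p.name for p in root.glob('*.txt'))

    # .rglob() also searches every subfolder
    recursive = sorted(p.name for p in root.rglob('*.txt'))

print(top_level)  # ['top.txt']
print(recursive)  # ['nested.txt', 'top.txt']
```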

___
## Lesson Complete

Congratulations! You have completed *Python Intermediate 3*.


### Exercise Solutions
Here are a few solutions for exercises in this lesson.

In [None]:
# Create a file that contains the sonnet string below
# Open the file and print it out line by line
# At the beginning of each line, print the appropriate line number

sonnet_string = """Then let not winter's ragged hand deface,
In thee thy summer, ere thou be distilled:
Make sweet some vial; treasure thou some place
With beauty's treasure ere it be self-killed.
That use is not forbidden usury,
Which happies those that pay the willing loan;
That's for thy self to breed another thee,
Or ten times happier, be it ten for one;
Ten times thy self were happier than thou art,
If ten of thine ten times refigured thee:
Then what could death do if thou shouldst depart,
Leaving thee living in posterity?
   Be not self-willed, for thou art much too fair
   To be death's conquest and make worms thine heir."""

# Create the file
new_file = Path.cwd() / 'data' / 'sonnet.txt'
new_file.write_text(sonnet_string)

# Read the sonnet line by line
i = 1
with new_file.open() as f:
    for line in f:
        print(i, line, end='')
        i += 1
