# Practical Python – Learning useful python skills

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Files" data-toc-modified-id="Files-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Files</a></span><ul class="toc-item"><li><span><a href="#Paths" data-toc-modified-id="Paths-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Paths</a></span><ul class="toc-item"><li><span><a href="#Absolute-paths" data-toc-modified-id="Absolute-paths-1.1.1"><span class="toc-item-num">1.1.1&nbsp;&nbsp;</span>Absolute paths</a></span></li><li><span><a href="#The-pathlib-module" data-toc-modified-id="The-pathlib-module-1.1.2"><span class="toc-item-num">1.1.2&nbsp;&nbsp;</span>The <code>pathlib</code> module</a></span><ul class="toc-item"><li><span><a href="#Path-objects'-special-syntax" data-toc-modified-id="Path-objects'-special-syntax-1.1.2.1"><span class="toc-item-num">1.1.2.1&nbsp;&nbsp;</span>Path objects' special syntax</a></span></li></ul></li><li><span><a href="#Relative-paths" data-toc-modified-id="Relative-paths-1.1.3"><span class="toc-item-num">1.1.3&nbsp;&nbsp;</span>Relative paths</a></span></li></ul></li><li><span><a href="#Managing-files-and-folders" data-toc-modified-id="Managing-files-and-folders-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Managing files and folders</a></span><ul class="toc-item"><li><span><a href="#Reading-files" data-toc-modified-id="Reading-files-1.2.1"><span class="toc-item-num">1.2.1&nbsp;&nbsp;</span>Reading files</a></span><ul class="toc-item"><li><span><a href="#Reading-binary-files" data-toc-modified-id="Reading-binary-files-1.2.1.1"><span class="toc-item-num">1.2.1.1&nbsp;&nbsp;</span>Reading binary files</a></span></li></ul></li><li><span><a href="#Writing-files-and-creating-directories" data-toc-modified-id="Writing-files-and-creating-directories-1.2.2"><span class="toc-item-num">1.2.2&nbsp;&nbsp;</span>Writing files and creating directories</a></span><ul class="toc-item"><li><span><a href="#Creating-files-with-the-pathlib-module" 
data-toc-modified-id="Creating-files-with-the-pathlib-module-1.2.2.1"><span class="toc-item-num">1.2.2.1&nbsp;&nbsp;</span>Creating files with the <code>pathlib</code> module</a></span></li></ul></li><li><span><a href="#Exercise-–-file-creator-function" data-toc-modified-id="Exercise-–-file-creator-function-1.2.3"><span class="toc-item-num">1.2.3&nbsp;&nbsp;</span>Exercise – file creator function</a></span></li><li><span><a href="#Deleting-files-and-folders" data-toc-modified-id="Deleting-files-and-folders-1.2.4"><span class="toc-item-num">1.2.4&nbsp;&nbsp;</span>Deleting files and folders</a></span></li><li><span><a href="#Copying-and-moving-files" data-toc-modified-id="Copying-and-moving-files-1.2.5"><span class="toc-item-num">1.2.5&nbsp;&nbsp;</span>Copying and moving files</a></span></li><li><span><a href="#Looping-over-files" data-toc-modified-id="Looping-over-files-1.2.6"><span class="toc-item-num">1.2.6&nbsp;&nbsp;</span>Looping over files</a></span></li></ul></li></ul></li></ul></div>

**Introduction**

If you've made it this far, congratulations! You have learned most of the basics of the Python programming language! The first two notebooks in this course focus mainly on understanding the foundations of Python. This notebook is where we start learning how to use Python to do useful things.

So what's useful? Consider the following: you work at some company. You have just been handed a USB stick with ten thousand PDF files. These files have a date in their names, such as `rec20200414.pdf` (i.e. April 14, 2020). The files are photocopies of receipts. On each receipt there is the name of the employee at your company who created it.

Your task is to find out:

- Which employees created the receipts?
- How many receipts did each employee create?
- How many receipts are there per employee, per month?

Now, we _could_ do this by hand: reading all the files ourselves and typing the information into an Excel spreadsheet. But this would, firstly, take weeks. Secondly, it would be incredibly boring. And thirdly, the result would probably (because the assignment is so tedious) be riddled with errors.

However, this is actually a perfect example of a task that is easily solved with Python! Here's a task list of what we could do: 
- Create a list of all the files
- Create a script that extracts the name from a pdf receipt
- Loop over all files 

And for each file in our loop:
1. Run our script on the pdf file
2. Extract the date from the file name
3. Save the name and the date into a data structure

In this course, you will learn to do every task in the list above! But let's take it step by step and start with file paths. Then we'll learn how to create, remove and alter files on our computer using Python. We will also learn the basics of regular expressions, a way of recognising patterns in text. And finally, some web scraping basics.
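
As a small taste of step 2 in the list, here is a minimal sketch of how the date could be pulled out of a file name like `rec20200414.pdf`. The function name `parse_receipt_date` is made up for this illustration, and the sketch assumes that every file name is `rec` followed by eight digits (year, month, day).

```python
from datetime import datetime


def parse_receipt_date(filename):
    """Hypothetical helper: return the date in a name such as 'rec20200414.pdf'."""
    digits = filename[3:11]  # skip 'rec' and keep the eight date characters
    return datetime.strptime(digits, "%Y%m%d").date()


receipt_date = parse_receipt_date("rec20200414.pdf")
print(receipt_date)        # 2020-04-14
print(receipt_date.month)  # 4
```

With the month available as a number, counting receipts per month becomes a matter of looping over the file names.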

## Files

All files on your computer have these three components: a name (this notebook, for instance, is called `continuation_course`), a file extension (Jupyter Notebooks have the extension `.ipynb`, text files `.txt`, Microsoft Word `.doc`, and so on), and a path – the file's location on your hard drive.

### Paths

As mentioned above, all files have a path. This lets us know where it is located on the computer's hard drive. All folders – also called directories – also have paths. There are two kinds of paths: **relative** and **absolute**. We'll start with the absolute, and continue with the relative further down. 

#### Absolute paths

An absolute path is a file's location in relation to your hard drive's root folder. The root folder – or **root directory** –  is the top level of your hard drive. All files and folders are inside this directory. 

If you use Windows, you probably recognise your root directory when you see it. It's the `C:\` when you check your hard drive folders. See this picture:

![image](course_material/windows_root.png)

On Mac/Linux, the root directory is just named `/`. Pretty lame, but there you go.

Another important difference is the **directory separator**, the slash. On Windows, this is a backslash `\`, while on Mac/Linux, it's just a regular slash `/`.

This notebook you're running is located somewhere on your computer's hard drive. We can find this location by checking the notebook's absolute path. Or to be more precise: we can check where this notebook "lives" within your root directory! We can use the `os` module to get the current working directory, which is the folder this notebook runs in:

In [52]:
import os

In [53]:
os.getcwd()

'/Users/johekm/Documents/lectures/learning_python'

So for me, this notebook has this path: `'/Users/johekm/Documents/lectures/learning_python'`. It lives in the folder `learning_python`, which is in the folder `lectures`, which is in `Documents`, which is in my user profile `johekm`, and so forth all the way to the root directory. 

This notebook's name is `continuation_course.ipynb`, which means that its absolute path is:

`'/Users/johekm/Documents/lectures/learning_python/continuation_course.ipynb'`

If this was a Windows laptop, it would look something like:

`'C:\Users\johekm\Documents\lectures\learning_python\continuation_course.ipynb'`



All absolute paths have the root directory to the left. That is, **absolute paths always start with the root directory**. The directory or the file we're looking for is furthest to the right in the path. So in the examples above, we point towards the file `continuation_course.ipynb`, since it is furthest to the right in the path. 

We can also use the `.isabs()` method of `os.path`. It takes a string value and checks whether it is written as an absolute path on our computer (it does not check whether the file actually exists):

In [54]:
os.path.isabs('/Users/johekm/Documents/lectures/learning_python/continuation_course.ipynb')

True

As you can see, the `os` module takes string values as arguments. It also returns paths as string values:

In [55]:
type(os.getcwd())

str

This means that we can use the `.join()` string method on our directory separator character (`/` on Mac/Linux, `\` on Windows) to build path strings! So for the absolute path of this notebook, I could do this:

In [58]:
my_path = ['','Users','johekm','Documents','lectures','learning_python','continuation_course.ipynb']

In [59]:
my_path

['',
 'Users',
 'johekm',
 'Documents',
 'lectures',
 'learning_python',
 'continuation_course.ipynb']

In [60]:
my_path = '/'.join(my_path) # see section 9.9.2 if you want a refresher!

In [61]:
my_path

'/Users/johekm/Documents/lectures/learning_python/continuation_course.ipynb'

In [62]:
os.path.isabs(my_path)

True
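
Typing the separator by hand ties the code to one operating system. As a minimal sketch of an alternative, the `os` module also offers `os.sep`, which holds the separator of the computer the code runs on, and `os.path.join()`, which glues the parts together with that separator. The folder names are just the ones from my example path, and the `*` in front of `parts` simply hands over the list items one by one.

```python
import os

parts = ["Users", "johekm", "Documents", "lectures", "learning_python"]

# os.sep is '/' on Mac/Linux and '\\' on Windows
joined = os.path.join(os.sep, *parts)
print(joined)
```

On Mac/Linux this prints `/Users/johekm/Documents/lectures/learning_python`; on Windows the same code produces backslashes instead.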

We can use the `.listdir()` method to get a list of all files in a directory. We just pass a path to the method as an argument, and it returns a list. Let's try it on the `course_material` folder:

In [90]:
my_path = '/Users/johekm/Documents/lectures/learning_python/course_material'

In [91]:
os.listdir(my_path)

['mutable_scope.png',
 '.DS_Store',
 'while_loop.png',
 'scopes.png',
 'readme',
 'windows_root.png',
 'immutable_1.png',
 'interrupt.png',
 'immutable_3.png',
 'speach.txt',
 'immutable_2.png',
 'if_statement.png',
 'mutable.png',
 'mutable_2.png']

#### The `pathlib` module

**CAUTION!** It is common practice to use string values when working with file paths. But as you can see from this example above, this code wouldn't work on Windows, since that path syntax requires the backslash `\`.

So instead, we're gonna use the `pathlib` module. This works on all operating systems since `pathlib` writes every path in whatever syntax your computer uses! Let's import the `Path` class from the `pathlib` module:

In [63]:
from pathlib import Path

The `Path` class has a method called `.cwd()` ("current working directory") that returns the absolute path of your current "position" on your hard drive. Let's have a look at the current working directory using this method:

In [64]:
Path.cwd()

PosixPath('/Users/johekm/Documents/lectures/learning_python')

As you can see, the value returned isn't a string. It is a path object:

In [41]:
type(Path.cwd())

pathlib.PosixPath

This path object actually differs depending on what operating system you're using. For me, using mac, it is a `PosixPath` object. If you're using Windows, it should be a `WindowsPath` object. But the name is not important. Just know a path object is a way to help us construct paths in a very convenient way! 
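
To see the two flavours side by side on a single computer, here is a minimal sketch using `PurePosixPath` and `PureWindowsPath`, two other classes in `pathlib`. "Pure" paths only handle the text of a path and never touch the hard drive, so both can be created on any operating system. The example paths are the ones from earlier in this notebook.

```python
from pathlib import PurePosixPath, PureWindowsPath

posix_path = PurePosixPath("/Users/johekm/Documents")
windows_path = PureWindowsPath(r"C:\Users\johekm\Documents")

# .parts splits a path into its components, root directory first
print(posix_path.parts)    # ('/', 'Users', 'johekm', 'Documents')
print(windows_path.parts)  # ('C:\\', 'Users', 'johekm', 'Documents')
```

In everyday code we rarely need these classes; `Path` picks the right flavour for us.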

We can pass any string to the Path class to convert it into a path object:

In [68]:
Path("Johan")

PosixPath('Johan')

The `os` module can read path objects, so we can pass a path object to `os.path.isabs()` to check whether it is an absolute path: 

In [50]:
os.path.isabs(Path("Johan"))

False

"Johan" isn't an absolute path, but our current working directory is:

In [73]:
os.path.isabs(Path.cwd())

True
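
Path objects can also answer this question themselves, without the `os` module. A minimal sketch using the `.is_absolute()` method of path objects (the variable names are only for illustration):

```python
from pathlib import Path

relative_path = Path("Johan")
absolute_path = Path.cwd()

print(relative_path.is_absolute())  # False
print(absolute_path.is_absolute())  # True
```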

The `.home()` method returns the home directory on the computer. For me, this is `/Users/johekm`:

In [69]:
Path.home()

PosixPath('/Users/johekm')

Finally, we can always use the `.exists()` method to see if a path exists:

In [263]:
Path("/Users/johekm/Documents/lectures/learning_python/course_material").exists()

True

In [265]:
Path("/Users/johekm/Documents/lectures/learning_python/BANANAS").exists()

False

##### Path objects' special syntax

Path objects give some operators their own meaning. This means that I can use the `.home()` method and then construct a path that I know will work on whatever operating system you're running:

In [181]:
path = Path.home() / "Documents" / "lectures" / "learning_python"
path

PosixPath('/Users/johekm/Documents/lectures/learning_python')

Hang on, what the hell happened?? Why did we just use the division operator together with strings and somehow just magically created a path??

Path objects define their own meaning for the `/` operator. When one side of `/` is a path object, Python doesn't divide anything: it joins the two sides with a path separator and returns a new path object. The code above is the same as typing:

In [178]:
path = Path.home() / Path("Documents") / Path("lectures") / Path("learning_python")
path

PosixPath('/Users/johekm/Documents/lectures/learning_python')

...or:

In [179]:
path = Path.home() / Path("Documents/lectures/learning_python")
path

PosixPath('/Users/johekm/Documents/lectures/learning_python')

...or just:

In [180]:
path = Path.home() / "Documents/lectures/learning_python"
path

PosixPath('/Users/johekm/Documents/lectures/learning_python')
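
The `/` trick only works when at least one side is a path object. A minimal sketch of the difference, with folder names that are only examples (the `try`/`except` part just catches the error so the rest of the cell can run):

```python
from pathlib import Path

left = "Documents"
right = "lectures"

# Two plain strings: Python does not know how to "divide" them
try:
    left / right
    string_division_worked = True
except TypeError:
    string_division_worked = False

# As soon as one side is a path object, "/" joins the parts instead
joined = Path(left) / right

print(string_division_worked)                   # False
print(joined == Path("Documents", "lectures"))  # True
```

This is also why every example above starts with a path object such as `Path.home()`.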

#### Relative paths

A relative path always starts in the current working directory. We use relative paths to find files and directories in relation to where we are currently situated – where our program currently runs – on our hard drive. 

Above, we listed all files and directories in the folder `course_material`, using the `.listdir()` method. Let's do so again, but with a relative path instead of an absolute one:

In [101]:
os.listdir('course_material')

['mutable_scope.png',
 '.DS_Store',
 'while_loop.png',
 'scopes.png',
 'readme',
 'windows_root.png',
 'immutable_1.png',
 'interrupt.png',
 'immutable_3.png',
 'speach.txt',
 'immutable_2.png',
 'if_statement.png',
 'mutable.png',
 'mutable_2.png']

This works because this notebook lives in the same directory as `course_material`, which means the relative path is simply `'course_material'`. Let's go a bit deeper: there is a folder within `course_material` named `readme`. Let's list its content using a relative path:

In [102]:
os.listdir('course_material/readme/')

['material.png',
 '.DS_Store',
 'navigator.png',
 'duplicate.png',
 'searchbar.png',
 'course_start.png',
 'jupyter.png',
 'create_nb.png',
 'documents.png',
 'launchpad.png']

If we check our absolute path once more, using the `.cwd()` method:

In [103]:
Path.cwd()

PosixPath('/Users/johekm/Documents/lectures/learning_python')

What if we wanted to use a relative path to see what is within the "Documents" folder? This is (on my computer) two "levels" above our working directory. We can type the `..` folder! This isn't a real folder, just a specially named folder to indicate "check one directory level above" – **the parent directory**. 

Uncomment the following code cell and run it, then check to see if the listed files are what you expected. If you placed this course folder in the "Documents" folder on your computer, you should see the contents of your "Documents" folder:

In [113]:
#os.listdir('..')

We can continue using `..` with directory separators to go even further up the directory tree:

In [114]:
#os.listdir("../..")

We can also type a single dot `.`, which indicates _this_ folder. The one we're in. Uncomment to check if it's what you expect it to be on your computer:

In [116]:
#os.listdir(".")
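
Relative and absolute paths can be converted into each other with path objects. The following is a minimal sketch: it glues a relative path onto the current working directory, and uses the `.resolve()` method to find out which absolute path `..` stands for. The folder `course_material` is only used as a name here, so the code runs even if the folder is missing.

```python
from pathlib import Path

relative_path = Path("course_material")

# Current working directory + relative path = absolute path
absolute_path = Path.cwd() / relative_path
print(absolute_path.is_absolute())  # True

# .resolve() replaces ".." with the real parent directory
parent_dir = Path("..").resolve()
print(parent_dir)
```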

### Managing files and folders

Now that we've had a look at paths, we can use them to create, open, append and erase files! 

Files can be binary files or plaintext files. Binary files consist of a complicated soup of byte patterns that is unreadable for humans. Most files you use at your office are probably binary files, such as Excel files, PDF documents, and so on.

Here, we're going to start with plaintext files. Plaintext means that there is nothing but raw text in the file: no information other than the actual text characters. Text files (with the extension `.txt`) are plaintext.

Let's look for a plaintext file! If we check the file contents in the `course_material` folder, we can see that there are two plaintext files therein. Let's use the `.listdir()` method of the `os` module:

In [201]:
os.listdir('course_material/')

['mutable_scope.png',
 '.DS_Store',
 'while_loop.png',
 'scopes.png',
 'readme',
 'windows_root.png',
 'immutable_1.png',
 'interrupt.png',
 'immutable_3.png',
 'speach.txt',
 'immutable_2.png',
 'if_statement.png',
 'hello.txt',
 'mutable.png',
 'mutable_2.png']

Here we see two text files! "speach.txt" and "hello.txt". Let's start with the latter and read its content.

#### Reading files

We can open files with the built-in `open()` function. It has two crucial arguments (it has way more that we will ignore at the moment). First, a _filepath_ that points to the file we want to open (including the filename). 

Second, we pass a string that determines _how_ to open the file. Default is to open in "read" mode, which opens the file, but hinders us from changing its content. Let's open the file "hello.txt" in the `course_material` directory:

In [159]:
file = open("course_material/hello.txt","r")

The `open()` function returns a file object, so we save that to a `file` variable! Let's have a look at our file object:

In [160]:
file

<_io.TextIOWrapper name='course_material/hello.txt' mode='r' encoding='UTF-8'>

Here, we can see that the object is opened in read mode, and that it's encoded in unicode, UTF-8 (=not important at the moment). We can use the `.read()` method to have a look at the file content:

In [161]:
file.read()

'Hello world!\n\nSo happy to see that you guys made it to the continuation course.\nThis is where we start having fun!'

The `.read()` method returns all the file's text as one string. As you can see, the file includes newline characters `\n`. The method `.readlines()` also reads the file's contents, but returns them as a list with one item per line. We open the file again first, since the earlier `.read()` call already took us to the end of the file:

In [165]:
file = open("course_material/hello.txt","r")
file.readlines()

['Hello world!\n',
 '\n',
 'So happy to see that you guys made it to the continuation course.\n',
 'This is where we start having fun!']

When we're done with the file and want to close it, we use the `.close()` method:

In [167]:
file.close()

This means we can't access the file object any longer:

In [168]:
file.read()

ValueError: I/O operation on closed file.
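
It is easy to forget `.close()`. A common alternative is the `with` statement, which closes the file automatically as soon as the indented block ends. The following is a minimal sketch; so that it runs on any computer it first creates a small throwaway file in a temporary folder, and the file name `hello_example.txt` is made up for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    example_path = Path(folder) / "hello_example.txt"
    example_path.write_text("Hello world!\nSecond line")  # create the throwaway file

    # The file is open inside the indented block and closed right after it
    with open(example_path, "r") as file:
        lines = file.readlines()

print(lines)        # ['Hello world!\n', 'Second line']
print(file.closed)  # True
```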

##### Reading binary files

We can also read binary files, but binary content will look like gibberish to a human eye. To read a binary file, we need to pass the argument `"rb"` ("read binary") instead of `"r"` as the second argument of the open function. Let's have a look at an excel file:

In [233]:
excelFile = open('course_material/excelfile.xlsx',"rb")

Let's not look at the entire file, just the first 200 bytes:

In [234]:
excelFile.read()[:200]

b'PK\x03\x04\x14\x00\x06\x00\x08\x00\x00\x00!\x00\x0c\xeb\xe3\xff[\x01\x00\x00\x88\x04\x00\x00\x13\x00\x08\x02[Content_Types].xml \xa2\x04\x02(\xa0\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
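
Slicing works, but `.read()` still loads the whole file first. As a minimal sketch of an alternative, `.read()` also accepts a number, and in binary mode it then reads at most that many bytes. The file below is a made-up stand-in (four marker bytes followed by a thousand zero bytes), created in a temporary folder so that the example runs anywhere.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    example_path = Path(folder) / "example.bin"
    example_path.write_bytes(b"PK\x03\x04" + bytes(1000))  # made-up binary content

    with open(example_path, "rb") as binary_file:
        first_bytes = binary_file.read(4)  # read only the first 4 bytes

print(first_bytes)        # b'PK\x03\x04'
print(type(first_bytes))  # <class 'bytes'>
```

Note that binary mode gives us a `bytes` value rather than a string.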

#### Writing files and creating directories

We can pass the string `"w"` as an argument to open in write mode, which lets us change the file's content. If the file our path points to doesn't exist, and we open in write mode, _we will create a file_. Let's try it!

In [191]:
file = open("test.txt","w")

Since we didn't give the `open()` function an absolute path, it took the path and looked for a file named `test.txt` in the current working directory. Since no such file existed, it created one. Have a look in the course folder, there should now be a new text file named "test"!

So, if we open a file in write mode, and no such file exists, we will create a new file. But what happens if we try to open a file that doesn't exist in read mode?

In [192]:
open("xyz.txt","r")

FileNotFoundError: [Errno 2] No such file or directory: 'xyz.txt'

We get an error!

Ok, so we have created a file, and simultaneously opened this file in write mode. Let's check the file object:

In [193]:
file

<_io.TextIOWrapper name='test.txt' mode='w' encoding='UTF-8'>

Since it is in write mode, we can add content to the file! Let's start by creating a string that we'd like to add to our new file:

In [195]:
text = "This is some very exciting and new content going on here!"

Our file object has a method called `.write()` that takes whatever content we want to add to the file as an argument. This will then be written to the file object:

In [196]:
file.write(text)

57

The method returns an integer, in this case 57. It just returns the length of the content we just added:

In [197]:
len(text)

57

We've now added our content, let's close the file:

In [200]:
file.close()

If you now check the text file, you'll see that our text string was added! Yay!

**CAUTION!** If you now open the file in write mode again, you'll see that its content has been erased.

In [216]:
file = open("test.txt","w")
file.close()

In [217]:
file = open("test.txt","r")
file.read()

''

If we want to add content to our file, we can open it in "append mode", using the argument `"a"`. Let's add our text again, and then open the file in append mode:

In [222]:
file = open("test.txt","w")

In [223]:
file.write(text)
file.close()

In [224]:
file = open("test.txt","a") # append mode

In [225]:
new_text = "\nSome new exciting text that we've added!"

In [226]:
file.write(new_text)

41

In [227]:
file.close()

Let's check the file's content to see if our new text was added:

In [231]:
file = open('test.txt', "r")

In [232]:
print(file.read())

This is some very exciting and new content going on here!
Some new exciting text that we've added!


It worked!
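
The difference between write mode and append mode can be summed up in one short sketch. It uses the `with` statement, which closes each file automatically when the indented block ends, and a temporary folder so that nothing in the course folder is touched. The file name `modes_example.txt` is made up for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    example_path = Path(folder) / "modes_example.txt"

    with open(example_path, "w") as file:  # "w": start from an empty file
        file.write("first")
    with open(example_path, "w") as file:  # "w" again: the old content is erased
        file.write("second")
    with open(example_path, "a") as file:  # "a": add to the end of the file
        file.write("\nthird")

    with open(example_path, "r") as file:
        content = file.read()

print(content)
# second
# third
```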

Using the `os` module, we can also create new directories. The `.mkdir()` method takes a path as an argument and creates a new folder:

In [235]:
os.mkdir("fruits")

This _relative path_ created a new directory in the current working directory. You can check to see by yourself in the course folder, or we can use the `pathlib` module to check if this directory exists:

In [240]:
Path("fruits").is_dir()

True

Please note that this method will raise an error if the folder we try to create already exists:

In [241]:
os.mkdir("fruits")

FileExistsError: [Errno 17] File exists: 'fruits'
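
If an existing folder should not count as an error, both modules have a way to say so. A minimal sketch, run inside a temporary folder; the folder names `fruits` and `citrus` are only examples:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    nested = Path(folder) / "fruits" / "citrus"

    # parents=True also creates the missing "fruits" folder on the way;
    # exist_ok=True means no error if the folder is already there
    nested.mkdir(parents=True, exist_ok=True)
    nested.mkdir(parents=True, exist_ok=True)  # second call: still no error

    # The os module offers the same behaviour through os.makedirs()
    os.makedirs(nested, exist_ok=True)

    created = nested.is_dir()

print(created)  # True
```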

##### Creating files with the `pathlib` module

The standard procedure to create and alter files is with the `open()` function. Interestingly, the `pathlib` module has this function built in, which makes creating files a bit smoother.

Let's start by creating a path object. We want this path object to include the name of the file we're gonna create. So the path will be the path to where the new file should live, _and_ the name of this new file:

In [246]:
path = Path.cwd() / "fruits" / "new_file.txt"

In [247]:
path

PosixPath('/Users/johekm/Documents/lectures/learning_python/fruits/new_file.txt')

The path object actually has the `open()` function as a method! Let's create a new textfile in the `fruits` folder:

In [248]:
file = path.open("w")

In [249]:
text = "This is a new file created using the pathlib module!"

In [250]:
file.write(text)

52

In [251]:
file.close()

Let's use the same path object to again open the file. But in read mode this time to see if it worked:

In [252]:
file = path.open("r")

In [253]:
file.read()

'This is a new file created using the pathlib module!'

In [254]:
file.close()

Yay! It worked! There actually is a faster way of reading the file. Our path object has the `.read_text()` method:

In [255]:
path.read_text()

'This is a new file created using the pathlib module!'

This way, we got the file's content opened and returned using only 1 line of code! Let's try it on another file, this time without saving the path to a variable:

In [256]:
Path("test.txt").read_text()

"This is some very exciting and new content going on here!\nSome new exciting text that we've added!"

Pretty neat, right?!
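
Path objects have a matching shortcut for writing: `.write_text()`. It opens the file in write mode, writes the string and closes the file again, so just like `"w"` it replaces any old content. A minimal sketch in a temporary folder (the variable names are only for illustration):

```python
import tempfile
from pathlib import Path

text = "Written without calling open()!"

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "new_file.txt"
    characters_written = path.write_text(text)  # returns the number of characters written
    content = path.read_text()

print(content)                          # Written without calling open()!
print(characters_written == len(text))  # True
```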

#### Exercise – file creator function

In this exercise I want you to create a function that creates a text file (with the `.txt` file extension). It should take two arguments: First an absolute path (including the file name, _with_ file extension of course). Second, a string that should be the contents of the text file. 


Your function should check to see if the passed path exists, and if not, it should warn the user and return nothing more. Bonus points if your function also checks if a text file with the passed name already exists at the path location, and in that case just adds the passed string to the file on a new line.

Good luck!

#### Deleting files and folders

A quick warning before we learn how to delete files and folders. When doing so in Python code, the files we erase won't be moved to the trash bin of your computer. They will be permanently erased. So be careful with what you type so you don't accidentally remove something you want to keep :)

There are a number of ways of erasing files and folders. First, we have the `.remove()` method of the `os` module. Let's first create a file that we can delete with this method:

In [270]:
file = open("test.txt","w")
file.close()

Let's see if the file was created as expected:

In [271]:
Path("test.txt").is_file()

True

Great! Let's delete it!

In [272]:
os.remove("test.txt")

In [273]:
Path("test.txt").is_file()

False

Gone!

We can also use the `pathlib` module to remove files. The `Path` class has the method `.unlink()` that does the same work that `.remove()` did above:

In [274]:
file = open("test.txt","w")
file.close()

In [275]:
Path("test.txt").is_file()

True

In [276]:
Path("test.txt").unlink()

In [277]:
Path("test.txt").is_file()

False

Gone!

If we want to remove directories, we can do so with both the `os` and the `pathlib` modules. Here, I'll just show you the pathlib method. Let's first create a directory (but only if this directory doesn't exist):

In [284]:
if not Path('apples').is_dir():
    Path('apples').mkdir()

Above, we checked to see if there is a directory called "apples" in this current working directory. If not, create it! Let's see if it worked as expected:

In [285]:
Path('apples').is_dir()

True

Now, let's remove it using the `.rmdir()` method:

In [286]:
Path('apples').rmdir()

In [287]:
Path('apples').is_dir()

False

Gone! Success!

Now, in section 1.2.2.1, we created a text file and put it in the folder "fruits". Let's check to see if it's still there. If not, let's create a new directory and file with this following code!

(Try to go through this code, do you understand what is going on? I've added comments to help you out!)

In [289]:
path = Path("fruits") # relative path to the directory fruits

if not path.is_dir():
    # if the folder "fruits" doesn't exist, this code block executes
    path.mkdir() # create "fruits"
    path = path / "new_file.txt" # append filename to path
    path.open("w").close() # create the file, then close it again
    
else:
    # if "fruits" exists, check to see if there is a text file within called "new_file.txt"
    path = path / "new_file.txt"
    
    if not path.is_file():
        # if no file, create it (and close it again):
        path.open("w").close()

In [291]:
path = Path("fruits/new_file.txt")

In [292]:
path.exists()

True

Now, let's remove the "fruits" directory:

In [293]:
Path("fruits").rmdir()

OSError: [Errno 66] Directory not empty: 'fruits'

Whoopsie! This method doesn't work when there is content within the directory. It _only_ works on empty directories. Let's try the `os` module's method instead:

In [294]:
os.rmdir("fruits")

OSError: [Errno 66] Directory not empty: 'fruits'

Huh? Same problem there >:(

This means that to remove this "fruits" folder, we have to remove the file (or files) within it first. Which is annoying. Fortunately, there is a way to just remove an entire tree of folders and files. We'll just have to import and use the `shutil` module. There we find the method `.rmtree()`.

**CAUTION!!!** Since this method **removes all directories and all files in a passed path** be VERY careful that you don't pass a path to something important. Use with care! 

Let's import it!

In [295]:
import shutil

In [296]:
shutil.rmtree("fruits")

In [297]:
path = Path("fruits/new_file.txt")
path.exists()

False

Gone!

As you can see, the entire directory and its contents have been deleted. Again, be careful with this method!
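
A safe way to practise is to do all deleting inside a throwaway folder. The following minimal sketch repeats the steps from this section in a temporary folder created with `tempfile.mkdtemp()`, so nothing outside that folder can be removed by mistake. The names `fruits` and `new_file.txt` are reused from above.

```python
import shutil
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())  # throwaway folder, safe to delete
target = base / "fruits"
target.mkdir()
(target / "new_file.txt").write_text("delete me")

try:
    target.rmdir()               # fails: the folder is not empty
    rmdir_worked = True
except OSError:
    rmdir_worked = False

shutil.rmtree(target)            # removes the folder and everything in it
still_there = target.exists()

shutil.rmtree(base)              # clean up the throwaway folder itself

print(rmdir_worked)  # False
print(still_there)   # False
```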

#### Copying and moving files

We can use the `shutil` module for copying and moving files. Just use the `.copy()` method!

In [304]:
import shutil

Let's create a new file and then a new directory. We'll try to copy the file into the directory, using Python code!

In [306]:
# again, this if-statement is just to be sure there isn't such a folder already!
if not os.path.isdir("fruits"):
    os.mkdir('fruits') # creating a new folder called "fruits"

In [307]:
file = open("new_file.txt","w") # creating a new file
file.close()

Let's see if we can see our new file and folder in the course directory. Uncomment and check for yourself:

In [309]:
#os.listdir() # no path means that it will list the content in the current working directory

Now, let's copy our file into the "fruits" folder, using the `.copy()` method of the `shutil` module. The copy method takes two arguments: one path to the file we want to copy, and one path pointing to where we want the copy to end up. Note that the original file stays where it is. Let's create two such path objects:

In [311]:
origin_path = Path("new_file.txt")
destination_path = "fruits" / origin_path # path object syntax, remember? Check section 1.1.2.1 if not :)

In [312]:
shutil.copy(origin_path, destination_path)

PosixPath('fruits/new_file.txt')

The `.copy()` method returns the destination path, so don't let that confuse you! Now let's check to see if it worked. Either check for yourself in the course folder on your computer, or run this code:

In [313]:
os.listdir("fruits")

['new_file.txt']

While copying our file, we can also rename the copy by writing a new name in the destination path argument. Let me show you what I mean: 

In [314]:
origin_path = Path("new_file.txt")
destination_path = "fruits" / Path("new_file_new_name.txt")

In [316]:
shutil.copy(origin_path, destination_path)

PosixPath('fruits/new_file_new_name.txt')

In [317]:
os.listdir("fruits")

['new_file.txt', 'new_file_new_name.txt']

See! We took the file "new_file.txt" and copied it to the folder "fruits", renaming the copy to "new_file_new_name.txt" in the process. Pretty handy!

Now let's delete the fruits directory, with its content, and our test file "new_file.txt":

In [318]:
shutil.rmtree("fruits") # removes the "fruits" folder and all its content

In [319]:
origin_path.unlink() # removes the "new_file.txt" file

In [320]:
origin_path.is_file()

False

In [322]:
Path("fruits").is_dir()

False

Gone! Well done!
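
So far we have only copied, which leaves the original file in place. To actually move a file, the `shutil` module has `.move()`, which takes the same two arguments. A minimal sketch in a temporary folder; the file name `moved_file.txt` is made up, and wrapping the paths in `str()` is just a cautious choice so the call also works on older Python versions.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    base = Path(folder)
    (base / "fruits").mkdir()

    origin_path = base / "new_file.txt"
    origin_path.write_text("I am about to move")
    destination_path = base / "fruits" / "moved_file.txt"

    # Move the file and rename it in the same step
    shutil.move(str(origin_path), str(destination_path))

    origin_exists = origin_path.exists()
    destination_exists = destination_path.exists()

print(origin_exists)       # False
print(destination_exists)  # True
```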

#### Looping over files

Now we know how to manage files – creating, altering, deleting and moving them. And you actually now know enough to loop over files in directories to find what you're looking for! But let's do it together.

We'll start by using the `.listdir()` method of the `os` module to see the contents in the `course_material` folder:

In [325]:
import os

os.listdir("course_material/")

['mutable_scope.png',
 'excelfile.xlsx',
 '.DS_Store',
 'while_loop.png',
 'scopes.png',
 'readme',
 'windows_root.png',
 'immutable_1.png',
 'interrupt.png',
 'immutable_3.png',
 'speach.txt',
 'immutable_2.png',
 'if_statement.png',
 'hello.txt',
 'mutable.png',
 'mutable_2.png']

As you can see, we get _all_ files' and directories' names when using this method. But what if we only want the plaintext files, that is, the files with the file extension `.txt`? We can use a loop!

As you may have noticed, all names that are returned from the `.listdir()` method are string values. This means that we can use string methods on each name in a for-loop. 

Since we're looking for the file extension `.txt`, and since file extensions always are at the end of filenames, this is a perfect situation to use the string method `.endswith()`. It does exactly what you think it does!

So let's create a for loop! But first, we're gonna need the path to where the files are located:

In [328]:
path = Path("course_material/")
file_list = os.listdir(path)

In [329]:
file_list

['mutable_scope.png',
 'excelfile.xlsx',
 '.DS_Store',
 'while_loop.png',
 'scopes.png',
 'readme',
 'windows_root.png',
 'immutable_1.png',
 'interrupt.png',
 'immutable_3.png',
 'speach.txt',
 'immutable_2.png',
 'if_statement.png',
 'hello.txt',
 'mutable.png',
 'mutable_2.png']

Now the for-loop:

In [330]:
for file in file_list:
    if file.endswith(".txt"):
        print(file)
    else:
        continue

speach.txt
hello.txt


Hang on, slowly now! What happened here exactly?

First, we created a path, pointing to the "course_material" directory. Then we saved the list of files into a variable named `file_list`. We created a for-loop that looped over all filenames in the `file_list` list. In each iteration of the loop, the condition `file.endswith(".txt")` is `True` if the file extension is `.txt`. If so, the filename is printed; otherwise, the else clause is executed, which only contains a continue statement.
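
The same filtering can also be done with path objects. As a minimal sketch, the `.glob()` method of a path object yields only the entries whose names match a pattern, where `*` stands for "anything". To run on any computer, the example creates three empty files in a temporary folder; their names are borrowed from the listing above.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    base = Path(folder)
    for name in ["hello.txt", "speach.txt", "scopes.png"]:
        (base / name).write_text("")  # create empty example files

    text_files = []
    # .glob("*.txt") yields a path object for every name ending in .txt
    for path in base.glob("*.txt"):
        text_files.append(path.name)  # .name is the file name as a string
    text_files.sort()

print(text_files)                # ['hello.txt', 'speach.txt']
print(Path("hello.txt").suffix)  # .txt
```

The `.suffix` attribute on the last line holds the file extension of a path object, which is another way to test for `.txt` inside a loop.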