# Welcome to the Dark Art of Coding:
## Introduction to Python
Interacting with the file system

<img src='../images/dark_art_logo.600px.png' width='300' style="float:right">

# Objectives
---

In this session, students should expect to:

* Understand how to use shutil to copy, move, and delete files & folders
* Understand how to walk a directory
* Explore file compression and decompression (zip/unzip) 

# Tool installation
---

1. In your command prompt/terminal, navigate to the folder with this class material.
1. Follow the directions below, appropriate to your operating system:
    * **WINDOWS**:<br> 
    `pip install Send2Trash-1.4.1-py3-none-any.whl`
    * **LINUX/MAC**:<br>
    `sudo pip install Send2Trash-1.4.1-py3-none-any.whl`


# Handling Files and Folders: `os` module
---

The `os` module provides a portable way to interact with the operating system: working directories, directory listings, file deletion, and more.

From the documentation:

`Programs that import and use 'os' stand a better chance of being
portable between different platforms.`

In [1]:
import os

When using a new module, it is useful to examine the help documentation and the methods associated with the module using...

* `os?`
* `os.<tab complete>` OR `dir(os)`

In [2]:
os?

In [None]:
dir(os)

In [3]:
# let's do something fairly straightforward:
#     ask for the current (or present) working directory

os.getcwd()

'/Users/chalmerlowe/gdrive/darkart/class_material_data_analysis/21'

In [4]:
# let's list the files/directories in the current folder

os.listdir()

['.DS_Store',
 '.ipynb_checkpoints',
 'attacks',
 'blackwidow.txt',
 'captain_america.txt',
 'deletable.dlt',
 'dirwalk.png',
 'heroes',
 'Icon\r',
 'ironman.txt',
 'logs.zip',
 'os_and_filesystem.ipynb',
 'Send2Trash-1.4.1-py3-none-any.whl']

In [5]:
# let's now change directories to one of the folders listed above...

os.chdir('heroes')

In [6]:
# and confirm that we really changed directories under the hood...
# But this time, let's save the path, so we can use it later...

path = os.getcwd()
path

'/Users/chalmerlowe/gdrive/darkart/class_material_data_analysis/21/heroes'
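
Because `os.chdir()` changes the working directory for the whole session, later relative paths can quietly point somewhere new. The sketch below shows one possible way to visit a folder and reliably come back. The helper name `list_folder` is hypothetical, and a temporary folder stands in for `heroes` so the example does not depend on the class files.

```python
import os
import tempfile


def list_folder(folder):
    """Return the sorted contents of folder, restoring the original cwd afterwards."""
    start = os.getcwd()          # remember where we started
    os.chdir(folder)
    try:
        return sorted(os.listdir())
    finally:
        os.chdir(start)          # always go back, even if listdir() fails


# Demo on a throwaway folder (an assumption made for illustration only)
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, 'thor.txt'), 'w').close()
    before = os.getcwd()
    names = list_folder(tmp)
    after = os.getcwd()

print(names)
print(before == after)
```

If you only need the listing, `os.listdir(folder)` accepts a path directly and avoids changing directories at all.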

# Experience Points!
---

In your **text editor** create a simple script called:

```bash
os_01.py
```

Execute your script in the **IPython interpreter** using the command:

```bash
run os_01.py
```

In your script, do the following:

* Create a function, called: `txtfilter()`
* In the function, create an empty list labeled `files`
   * Parse each file listed by `os.listdir()`
   * IF the filename ends in `txt`, add that file name to `files`
   * Return the list `files`
* Call the function so that it runs when your script runs
* Print the results to the screen

When you complete this exercise, please put your green post-it on your monitor. 

If you want to continue on at your own pace, please feel free to do so.

<img src='../images/green_sticky.300px.png' width='200' style='float:left'>

# Copying files: `shutil` module
---

The `shutil` module offers a number of high-level operations on files and collections of files. In particular, functions are provided which support file copying and removal.

https://docs.python.org/3/library/shutil.html

In [None]:
import shutil

Since we are using a new module, let's examine the help documentation and the methods associated with the module using...

* `shutil?`
* `shutil.<tab complete>` OR `dir(shutil)`

In [None]:
shutil?

In [None]:
dir(shutil)

In [None]:
# We should be in the heroes folder, let's take a look at what is stored in that folder.

os.listdir()

In [None]:
# To copy a file, we use the shutil.copy() method.

shutil.copy('avenger.txt', 'ultron.txt')

In [None]:
# When copying files, you have full control over file paths, etc...
# Here we store a copy of the file in a separate directory.

shutil.copy('ultron.txt', 'directory/ultron_copy.txt')
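
Two details of `shutil.copy()` are easy to miss: it returns the path of the new file, and when the destination is an existing folder the copy keeps the original filename. A minimal sketch, run inside a temporary folder so it does not touch the class files (the file contents are made up for illustration):

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'avenger.txt')
    with open(src, 'w') as fout:
        fout.write('Avengers assemble')

    backup_dir = os.path.join(tmp, 'directory')
    os.mkdir(backup_dir)          # the destination folder must already exist

    # Destination is a filename: the copy gets that name
    copy_a = shutil.copy(src, os.path.join(tmp, 'ultron.txt'))

    # Destination is a folder: the copy keeps the original filename
    copy_b = shutil.copy(src, backup_dir)

    copied_names = (os.path.basename(copy_a), os.path.basename(copy_b))
    with open(copy_b) as fin:
        copied_text = fin.read()

print(copied_names)
print(copied_text)
```

If file timestamps matter, `shutil.copy2()` works the same way but also attempts to preserve file metadata.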

# Copying directory trees: `shutil` module
---

The `shutil` module allows you to copy individual files AND whole file structures.

In [None]:
# let's move up a directory...
# And examine the files

os.chdir('..')
os.listdir()

In [None]:
# To copy a folder and contents, use shutil.copytree()

shutil.copytree('heroes', 'villains')

os.listdir()
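
One thing to watch for: by default `shutil.copytree()` refuses to copy into a folder that already exists, so running the cell above twice raises an error. The sketch below reproduces that in a temporary folder (the folder layout is an assumption for illustration).

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    heroes = os.path.join(tmp, 'heroes')
    villains = os.path.join(tmp, 'villains')
    os.mkdir(heroes)
    open(os.path.join(heroes, 'avenger.txt'), 'w').close()

    shutil.copytree(heroes, villains)      # first copy creates 'villains'

    try:
        shutil.copytree(heroes, villains)  # second copy: target already exists
        second_copy_failed = False
    except FileExistsError:
        second_copy_failed = True

    villain_files = os.listdir(villains)

print('Second copy failed:', second_copy_failed)
print(villain_files)
```

On Python 3.8 and later, passing `dirs_exist_ok=True` to `copytree()` allows copying into an existing folder.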

# Moving & renaming files & folders: `shutil`
---

In [None]:
# let's rename a file (by moving it from one filename
#                      to a new filename)

import shutil

shutil.move('blackwidow.txt', 'hulk.txt')
os.listdir()

In [None]:
# next, let's move a file to a new directory
#     and rename it at the same time...

shutil.move('ironman.txt', 'heroes/tonystark.txt')
os.listdir()

In [None]:
# what happens if we try to move a file to a 
#     directory that doesn't exist

shutil.move('captain_america.txt', 'non_existent_folder/capt_a.txt')
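
The move above fails because `shutil.move()` does not create missing destination folders. One possible fix, sketched here in a temporary folder, is to create the folder with `os.makedirs()` first. The variable names are chosen for illustration.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'captain_america.txt')
    open(src, 'w').close()
    dst = os.path.join(tmp, 'non_existent_folder', 'capt_a.txt')

    try:
        shutil.move(src, dst)
        first_attempt = 'moved'
    except FileNotFoundError:
        first_attempt = 'failed'    # the source file is left where it was

    # Create the missing folder(s), then try again
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.move(src, dst)
    moved_ok = os.path.exists(dst) and not os.path.exists(src)

print('First attempt:', first_attempt)
print('Second attempt succeeded:', moved_ok)
```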

# Experience Points!
---

In your **text editor** create a simple script called:

```bash
os_02.py
```

Execute your script in the **IPython interpreter** using the command:

```bash
run os_02.py
```

In your script:

* copy `blackwidow.txt` to a file called `bw_copy.txt`
* copy the folder `heroes` to a folder called `heroes_copy`
* move the file `bw_copy.txt` to a file called `blackwidow_2.txt`


When you complete this exercise, please put your green post-it on your monitor. 

If you want to continue on at your own pace, please feel free to do so.

<img src='../images/green_sticky.300px.png' width='200' style='float:left'>

# Permanently deleting files and folders: `os` & `shutil`
---

There are several ways to delete files, folders, and directory trees:

* `os.unlink(path)       #` delete a file
* `os.rmdir(path)        #` delete an **empty** folder
* `shutil.rmtree(path)   #` remove a folder, all subdirectories & files
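
The three functions listed above can be compared side by side. This is a minimal sketch that works only inside a temporary folder, with file and folder names made up for illustration.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    empty_dir = os.path.join(tmp, 'empty')
    full_dir = os.path.join(tmp, 'full', 'nested')
    os.mkdir(empty_dir)
    os.makedirs(full_dir)
    single_file = os.path.join(tmp, 'deletable.dlt')
    open(single_file, 'w').close()
    open(os.path.join(full_dir, 'notes.txt'), 'w').close()

    os.unlink(single_file)                       # one file
    os.rmdir(empty_dir)                          # one empty folder

    try:
        os.rmdir(os.path.join(tmp, 'full'))      # not empty, so this fails
        rmdir_refused = False
    except OSError:
        rmdir_refused = True

    shutil.rmtree(os.path.join(tmp, 'full'))     # folder plus everything inside
    remaining = os.listdir(tmp)

print('rmdir refused the non-empty folder:', rmdir_refused)
print('Left over:', remaining)
```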

**NOTE**: these deletions are not secure deletions (i.e. they will not withstand forensic examination). As the function name `unlink` implies, these commands typically remove the filesystem's links to the file's storage location on disk, which frees that space to be overwritten later. Until the disk space is actually overwritten, the data can often be retrieved by forensic software.

**WARNING**: Test any script that moves or deletes files selected by a pattern rather than by name. Python's delete functions do not send files to the Recycle Bin or Trash, so a mistake cannot be undone from there.

You can do this by crafting your script and putting `print()` functions in place to show the files that would be moved OR would be deleted.

In [None]:
# TEST script

import os
for filename in os.listdir():
    if filename.endswith('.dlt'):
        print('Deleting:', filename)
        # os.unlink(filename)

In [None]:
# REAL script
# Once you have confirmed the correct behavior, add in the os.unlink() method.
# NOTE: deletions occur silently.


import os
for filename in os.listdir():
    if filename.endswith('.dlt'):
        # print('Deleting:', filename)
        os.unlink(filename)
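
Instead of commenting lines in and out, the test and real versions can share one function with a switch. The sketch below is one possible design, not part of the class material: the function name `delete_by_suffix` and its `dry_run` parameter are hypothetical.

```python
import os
import tempfile


def delete_by_suffix(folder, suffix, dry_run=True):
    """Delete files in folder ending with suffix; return the names selected."""
    selected = []
    for filename in sorted(os.listdir(folder)):
        full_path = os.path.join(folder, filename)
        if filename.endswith(suffix) and os.path.isfile(full_path):
            selected.append(filename)
            if dry_run:
                print('Would delete:', filename)
            else:
                os.unlink(full_path)
    return selected


# Demo in a throwaway folder so no class files are at risk
with tempfile.TemporaryDirectory() as tmp:
    for name in ['deletable.dlt', 'ironman.txt']:
        open(os.path.join(tmp, name), 'w').close()

    preview = delete_by_suffix(tmp, '.dlt')              # nothing removed yet
    after_preview = sorted(os.listdir(tmp))
    delete_by_suffix(tmp, '.dlt', dry_run=False)         # now really delete
    after_delete = sorted(os.listdir(tmp))

print(after_preview)
print(after_delete)
```

Defaulting `dry_run` to `True` means a forgotten argument results in a harmless preview rather than a deletion.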


# Safer deletes:  `send2trash`
---

In [None]:
# A number of operating systems implement a Recycle Bin OR 
#     a Trash directory:
#     Files deleted by the operating system get sent to 
#     the Recycle Bin/Trash, where they will eventually be 
#     marked for deletion (Either by 'emptying' the Recycle
#     Bin manually OR by the natural deletion algorithm 
#     used by the operating system.)
# A third party library, send2trash allows Python to
#     send deleted files to the operating system's version
#     of Recycle Bin OR Trash.


import send2trash

# create a small file that we can safely send to the trash
with open('thor.txt', 'a') as fout:
    fout.write('Thor, the son of Odin')

In [None]:
send2trash.send2trash('thor.txt')

# Walking through directories
---

Many directories are deeply nested and it is often convenient to walk through the directories to find particular files/folders.

<img src='dirwalk.png'>

In [None]:
import os

# os.walk() yields one tuple per folder; each tuple holds three items:
#     * the folder_name 
#     * a list of subfolders
#     * a list of files in the folder

# os.walk() visits every subfolder recursively.


for item in os.walk('attacks'):
    print(item)


A number of tools return multiple values as tuples.
Often we can't effectively use the whole tuple. We really 
need the individual values broken out.

There are several ways to extract these contents from each tuple.

In [None]:
# METHOD zero

import os

for item in os.walk('./attacks'):
    
    # unpack the values here... using tuple unpacking
    folder, subfolders, filenames = item
    
    # Now we can easily reference each item
    print('Folder:', folder)
    print('  Subdirectories:', subfolders)
    print('       Filenames:', filenames)

In [None]:
# METHOD one >>> Pythonic

import os

# unpack the values in the for loop statement

for folder, subfolders, filenames in os.walk('./attacks'):
    print('Folder:', folder)
    print('  Subdirectories:', subfolders)
    print('       Filenames:', filenames)
    
# As long as the number of items to unpack is not TOO large,
#     this method is very Pythonic and reduces
#     the number of lines of code.
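
A common use of `os.walk()` is finding particular files anywhere below a starting folder. The sketch below joins each folder name to its filenames to build full paths. The helper name `find_files` and the sample folder layout are assumptions for illustration.

```python
import os
import tempfile


def find_files(top, suffix):
    """Return paths (relative to top) of files under top ending with suffix."""
    matches = []
    for folder, subfolders, filenames in os.walk(top):
        for filename in filenames:
            if filename.endswith(suffix):
                full_path = os.path.join(folder, filename)
                matches.append(os.path.relpath(full_path, top))
    return sorted(matches)


# Build a small nested tree to search (hypothetical names)
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'attacks', 'nyc'))
    open(os.path.join(tmp, 'attacks', 'summary.txt'), 'w').close()
    open(os.path.join(tmp, 'attacks', 'nyc', 'chitauri.txt'), 'w').close()
    open(os.path.join(tmp, 'attacks', 'nyc', 'map.png'), 'w').close()

    found = find_files(os.path.join(tmp, 'attacks'), '.txt')

print(found)
```

Note that the filenames from `os.walk()` are bare names, so `os.path.join(folder, filename)` is needed before opening or deleting them.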

# Compressing/Extracting files: `zipfile` module
---

In [None]:
import zipfile

Since we are using a new module, let's examine the help documentation and the methods associated with the module...

In [None]:
zipfile?

In [None]:
dir(zipfile)

In [None]:
# let's open a zip file that we have in the local
#     directory

records = zipfile.ZipFile('logs.zip')

In [None]:
dir(records)

In [None]:
# we can now examine the names of all the folders and files
#     in the compressed file
# NOTE: the listing shows the folder structure, etc.
# Also... be aware that this process has NOT unzipped the
#     file... we are simply looking at metadata associated
#     with the zip file.

records.namelist()

In [None]:
# We may request a subsection of this metadata as a bundle 
#     via the .getinfo() method.

info = records.getinfo('logs/activities/results/funding.txt')

In [None]:
# To see what types of metadata are available, we can 
#     use the dir() function or <tab complete> to 
#     see the methods and attributes available.

dir(info)

In [None]:
# Looking at two examples... we can pull out
#     the file_size (the uncompressed size) of the file

fsize = info.file_size
fsize

In [None]:
# And we can pull out the compressed size of
#     the same file

csize = info.compress_size
csize

In [None]:
# For fun, we can calculate the compression ratio
#     (rounded to one decimal place, so that small
#     ratios do not get rounded down to zero):

ratio = round(fsize / csize, 1)

print('Compressed size is {}x smaller!'.format(ratio))
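
The same calculation can be applied to every entry in an archive with `infolist()`. The sketch below builds a small archive in a temporary folder, since it cannot assume `logs.zip` is available. The member name, the helper `compression_ratio`, and the choice of rounding to one decimal place are assumptions for illustration. Entries with a compressed size of zero (typically folders) are skipped to avoid dividing by zero.

```python
import os
import tempfile
import zipfile


def compression_ratio(file_size, compress_size):
    """Return file_size / compress_size rounded to one decimal, or None if undefined."""
    if compress_size == 0:
        return None          # nothing stored, so no meaningful ratio
    return round(file_size / compress_size, 1)


with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, 'sample.zip')

    # Build a small archive to inspect (hypothetical member name)
    with zipfile.ZipFile(zip_path, 'w', compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr('logs/repetitive.txt', 'funding ' * 1000)

    ratios = {}
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            ratios[info.filename] = compression_ratio(info.file_size,
                                                      info.compress_size)

for name, ratio in ratios.items():
    print(name, ratio)
```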


In [None]:
# FUN FACT: the round function can take positive and negative
#     rounding values

sample = 123.4567
print(round(sample, -2))    # rounds to the hundreds place
print(round(sample, -1))    # rounds to the tens
print(round(sample, 0))     # rounds to the units
print(round(sample, 1))     # rounds to the tenths
print(round(sample, 2))     # rounds to the hundredths
print(round(sample, 3))     # rounds to the thousandths


In [None]:
# As with many filehandling protocols, when you are
#     done with your toys, put them away.
#     Use the .close() method to close the zipfile.

records.close()
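
As with regular files, a `with` statement can do the closing for you. The sketch below creates a tiny archive in a temporary folder (the contents are made up) and shows that the archive is no longer usable once the `with` block ends.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, 'logs.zip')

    # Create a small archive to work with
    with zipfile.ZipFile(zip_path, 'w') as records:
        records.writestr('logs/venues/tokyo.txt', 'Tokyo')

    with zipfile.ZipFile(zip_path) as records:
        names = records.namelist()
    # leaving the with-block closed the archive for us

    try:
        records.read('logs/venues/tokyo.txt')
        still_open = True
    except (ValueError, RuntimeError):   # raised when the archive is closed
        still_open = False

print(names)
print('Still open:', still_open)
```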

In [None]:
# If desired, you can extract all files from a zip:

In [None]:
os.listdir()

In [None]:
import zipfile

records = zipfile.ZipFile('logs.zip')
records.extractall()
records.close()

In [None]:
os.listdir()

In [None]:
import shutil
shutil.rmtree('logs')
os.listdir()

In [None]:
# If only a single file is needed, you can extract 
#     individual files, as well:

import zipfile
records = zipfile.ZipFile('logs.zip')
records.extract('logs/activities/results/funding.txt')
records.close()

os.listdir()

In [None]:
# If we want files extracted and then placed in 
#     a specific location, we can easily 
#     assign a destination directory

import zipfile
records = zipfile.ZipFile('logs.zip')
records.extract('logs/venues/tokyo.txt',
                'temp/folder/folder')

# NOTE: extract() creates the destination folders if they
#     do not already exist

os.listdir()

In [None]:
records.close()
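
The cells above only read from an existing archive. Going the other direction, `zipfile.ZipFile` opened in `'w'` mode can compress files into a new archive. The following is a minimal sketch, not part of the class material: the helper name `zip_folder`, the output name `logs_backup.zip`, and the sample folder are all assumptions for illustration.

```python
import os
import tempfile
import zipfile


def zip_folder(folder, zip_path):
    """Compress every file under folder into zip_path; return the stored names."""
    parent = os.path.dirname(os.path.abspath(folder))
    with zipfile.ZipFile(zip_path, 'w', compression=zipfile.ZIP_DEFLATED) as archive:
        for current, subfolders, filenames in os.walk(folder):
            for filename in filenames:
                full_path = os.path.join(current, filename)
                # store paths relative to the folder's parent, e.g. 'logs/...'
                archive.write(full_path, arcname=os.path.relpath(full_path, parent))
        return sorted(archive.namelist())


# Demo: build a small folder, then zip it next to (not inside) that folder
with tempfile.TemporaryDirectory() as tmp:
    logs = os.path.join(tmp, 'logs')
    os.makedirs(os.path.join(logs, 'venues'))
    with open(os.path.join(logs, 'venues', 'tokyo.txt'), 'w') as fout:
        fout.write('Tokyo\n' * 100)

    stored = zip_folder(logs, os.path.join(tmp, 'logs_backup.zip'))

print(stored)
```

Writing the archive outside the folder being compressed keeps the zip file from trying to include itself.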

# Cleanup script
---

In [None]:
shutil.move('heroes/tonystark.txt', 'ironman.txt')
shutil.move('hulk.txt', 'blackwidow.txt')
open('deletable.dlt', 'w').close()


for filename in ['heroes/ultron.txt', 'heroes/directory/ultron_copy.txt']:
    os.unlink(filename)

for folder in ['temp', 'logs', 'villains']:
    shutil.rmtree(folder)

