# Python 104 - Writing Files, Inventorying Files

This notebook goes through the basics of writing files. We look through one basic example and one that extracts specific information from one file then writes it to a new file. After that, we look at a few modules that will help us to build an inventory of basic system information including filenames, locations (paths), and sizes. Once we identify this information we can use it to create an inventory manifest. 

First, let's look at the basics of writing files. 

## Writing Files

The basic function for writing files is the `write()` function. This can be used to write contents from the argument or 
to write multi-line content. Unlike in other environments like the GUI or shell, where the open command is often assumed, 
you may need to `open()` and then `close()` files when working in python. You cannot write to a file that is not known and opened, and a file that is not closed may be corrupted. 

Fortunately, we can usually use the contexual opener:

```python
with open(file, 'w') as f:
    ```

This will automatically close the file when the loop completes. The `w` argument indicates that the file is opened in "write" mode. If the file doesn't exist, the file will be written. 

In [1]:
# Basic use of open() and write()

line = 'Believe that life is worth living, and your belief will help create the fact.'
# Credit William James https://en.wikiquote.org/wiki/William_James

fout = open('quote-output.txt', 'w')

fout.write(line)

fout.close()

In [5]:
# use the with open() syntax to check if the file is there

with open('quote-output.txt', 'r') as f:
    print(f.read())

Believe that life is worth living, and your belief will help create the fact.


We can also extract information from a file then reuse that in another file. 
For example, we could extract the email addresses from `mbox-short.txt` and create
an address book file:

In [1]:
# create a path to the file
file = '../assets/mbox-short.txt'

# set up a file name for a file to create
fout = 'email-list.txt'

#establish a list to record emails as they are identified
emails = []

# open the source file to extract emails
with open(file, 'r') as f:
    for line in f:
        if line.startswith('From:'):
            email = line[6:]
            if email not in emails:
                emails.append(email)
print(emails, '\n\n')

# open another file in write mode to write the emails.
with open(fout, 'w') as f:
    for email in emails:
        f.write(email)

print(open(fout).read())

['stephen.marquard@uct.ac.za\n', 'louis@media.berkeley.edu\n', 'zqian@umich.edu\n', 'rjlowe@iupui.edu\n', 'cwen@iupui.edu\n', 'gsilver@umich.edu\n', 'wagnermr@iupui.edu\n', 'antranig@caret.cam.ac.uk\n', 'gopal.ramasammycook@gmail.com\n', 'david.horwitz@uct.ac.za\n', 'ray@media.berkeley.edu\n'] 


stephen.marquard@uct.ac.za
louis@media.berkeley.edu
zqian@umich.edu
rjlowe@iupui.edu
cwen@iupui.edu
gsilver@umich.edu
wagnermr@iupui.edu
antranig@caret.cam.ac.uk
gopal.ramasammycook@gmail.com
david.horwitz@uct.ac.za
ray@media.berkeley.edu



## Inventorying Files

For this activity, we are going to use a few modules that allow us to interact with the file system. These should be somewhat familiar after we have already looked into basic shell commands.

* `os` assists in using aspects of the operating system, in this case particularly file information and paths. See https://docs.python.org/3/library/os.html; 
* `os.path` is often called by itself and allows us to interact with file path and directory information. See https://docs.python.org/3/library/os.path.html#module-os.path. 
* `shutil` allows to access some shell utilities, like move, copy, rename, delete. See https://docs.python.org/3/library/shutil.html?highlight=shutils.

We will also use the `csv` module since it will help us to write the information that we gather to a structured data file that can later be opened in Excel or other spreadsheet applications. 

In [3]:
import os
from os.path import join, getsize
import csv

# set up the csv
headers = [
    'path',
    'filename',
    'file_extension',
    'size'
]

Once we know what we want in the csv, how do we get that information? We can use the `os` module to get file information. We will use the `os.walk` function to "walk" over the file tree, identify folder lists, paths, and filenames.  

In [4]:
walk_this_directory = os.path.join('..','assets','Bundle-web-files-small')

print(walk_this_directory)

../assets/Bundle-web-files-small


In [8]:
for FolderPaths, SubfolderNames, filenames in os.walk(walk_this_directory):
    print(FolderPaths)
    print(SubfolderNames)
    print(files)

../assets/Bundle-web-files-small
['audio', 'image', 'pdf', 'presentation', 'video']
['04-04-21full.asf', 'glmp_cig.EQ.wm.p20.t12z', 'oct17cc.asx', 'vlwhcssc.asx']
../assets/Bundle-web-files-small/audio
[]
['04-04-21full.asf', 'glmp_cig.EQ.wm.p20.t12z', 'oct17cc.asx', 'vlwhcssc.asx']
../assets/Bundle-web-files-small/image
[]
['04-04-21full.asf', 'glmp_cig.EQ.wm.p20.t12z', 'oct17cc.asx', 'vlwhcssc.asx']
../assets/Bundle-web-files-small/pdf
[]
['04-04-21full.asf', 'glmp_cig.EQ.wm.p20.t12z', 'oct17cc.asx', 'vlwhcssc.asx']
../assets/Bundle-web-files-small/presentation
[]
['04-04-21full.asf', 'glmp_cig.EQ.wm.p20.t12z', 'oct17cc.asx', 'vlwhcssc.asx']
../assets/Bundle-web-files-small/video
[]
['04-04-21full.asf', 'glmp_cig.EQ.wm.p20.t12z', 'oct17cc.asx', 'vlwhcssc.asx']


In [10]:
# get information about how many files are in each directory and how much space they take up
for FolderPaths, SubfolderNames, filenames in os.walk(walk_this_directory):
    print(FolderPaths, "consumes", end=" ")
    print(sum(getsize(join(FolderPaths, name)) for name in filenames), end=" ")
    print("bytes in", len(filenames), "non-directory files")


../assets/Bundle-web-files-small consumes 9069 bytes in 1 non-directory files
../assets/Bundle-web-files-small/audio consumes 25856261 bytes in 4 non-directory files
../assets/Bundle-web-files-small/image consumes 497284 bytes in 5 non-directory files
../assets/Bundle-web-files-small/pdf consumes 149427 bytes in 5 non-directory files
../assets/Bundle-web-files-small/presentation consumes 289792 bytes in 3 non-directory files
../assets/Bundle-web-files-small/video consumes 115706 bytes in 4 non-directory files


In [None]:
# get filepaths and names and sizes

In [None]:
# write to CSV with above-noted headers

## Reflection Activities

1. Write a script that can walk through a series of directories nad identify files based on their file extension. For example, perhaps you want to count the number of .pdf files or .jpg. Create file that can look for this information and then tally the files. Then, have the program output the list of filenames and filepaths in a CSV file. Call this file `extension_detector.py`. 
1. Write a script that creates a `master` and `derivative` directory within a subdirectory that has the file's name as its name. For example, if there are two files, one named `001.jpg` and `audition.wav`, there should be a directory named `001` and another named `audition`. Within these, there should be master and derivative folders. The original files should be in the `master` folder. Call this file `master_and_derivatives.py`.
1. Create a script that will create an inventory of all the files in the assets folder `Bundle-web-files-small`. The inventory should be a CSV file, and it should include the filename of the file, the extension, the path to the file, and the file size. You may include any other information that you think is important. Call this file `inventory_script.py`.