# File Operations

Python's view of files and directories derives from the Unix/Linux family of operating systems.  See [Overview of the Unix File System](https://web.archive.org/web/20210419161551/https://homepages.uc.edu/~thomam/Intro_Unix_Text/File_System.html)<br>
(You should know the material on that page.)

Python's [os](https://docs.python.org/3/library/os.html) module provides support for file operations and interacting with the operating system.

This functionality largely mirrors that of the standard command-line programs and of the underlying C library functions on which Python is built.


## Existence
To see whether or not a given file or directory exists, call `os.path.exists()` with the name as the argument.

In [2]:
import os
print("test_binary.dat",os.path.exists("test_binary.dat"))
print("binary.dat",os.path.exists("binary.dat"))
print(".",os.path.exists("."))        # current directory
print("..",os.path.exists(".."))      # parent directory

test_binary.dat False
binary.dat False
. True
.. True


## Checking Filetype
Use `os.path.isfile()` to return a Boolean indicating whether the argument is an existing regular file.

Use `os.path.isdir()` to return a Boolean indicating whether the argument is an existing directory.

In [None]:
print("isfile: test_binary.dat", os.path.isfile("test_binary.dat"))
print("isdir: test_binary.dat", os.path.isdir("test_binary.dat"))
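
The two checks can be tried on paths whose type is known in advance. The following is a minimal sketch that assumes a throwaway directory created with `tempfile`; the names `example.txt` and `missing.txt` are hypothetical and not part of this notebook.

```python
import os
import tempfile

# Work inside a throwaway directory so nothing in the notebook's folder is touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "example.txt")   # hypothetical file name
    with open(file_path, "w") as f:
        f.write("hello\n")

    file_is_file = os.path.isfile(file_path)    # True: it is a regular file
    file_is_dir = os.path.isdir(file_path)      # False: it is not a directory
    dir_is_file = os.path.isfile(tmp_dir)       # False: it is a directory
    dir_is_dir = os.path.isdir(tmp_dir)         # True
    # A path that does not exist is neither a file nor a directory.
    missing_is_file = os.path.isfile(os.path.join(tmp_dir, "missing.txt"))  # False

print(file_is_file, file_is_dir, dir_is_file, dir_is_dir, missing_is_file)
```

Both functions return `False` for a name that does not exist, so neither one raises an error for a missing path.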

## Deleting Files
To delete a file, use `os.remove()`.

In [1]:
import os
if os.path.exists("test_binary.dat"):   # os.remove() raises FileNotFoundError for a missing file
    os.remove("test_binary.dat")
os.path.exists("test_binary.dat")   # verify that the file was removed

## File Information: stat
To get details about a file (its "status" in Unix/Linux terminology), call `os.stat()`.  This returns an object whose fields represent the file's type and permissions, size, owner, group, and various timestamps.

[stat documentation](https://docs.python.org/3/library/os.html#os.stat)

In [3]:
stat_obj = os.stat('.')
print(stat_obj)

os.stat_result(st_mode=16895, st_ino=112871465660973341, st_dev=44, st_nlink=1, st_uid=1000, st_gid=1000, st_size=4096, st_atime=1657207179, st_mtime=1657207179, st_ctime=1657207179)


Initially, that result looks very esoteric, but once we break down a few of the fields, it makes more sense.

The `st_mode` field contains the file type and the permissions associated with the file.  Using `ls -l`, we see this data represented as a string that looks like '-rwxr-xr-x'.  The first character specifies the type: '-' for regular files and 'd' for directories.  The next nine characters represent the user (owner), group, and world permissions in terms of read, write, and execute.  Typically, `st_mode` makes more sense in its octal representation.

In [4]:
print(oct(stat_obj.st_mode))

0o40777


The leading digits represent the file type: you will see 40 for a directory and 100 for a regular file.  The digit that follows (0 here) holds the special setuid, setgid, and sticky bits.  The last three digits correspond to the owner, group, and world permissions, with each digit encoding read, write, and execute as three bits.  For example, 111 in binary equals 7 in octal, so read, write, and execute permissions are all set for that class of user.  101 in binary equals 5 in octal, so only read and execute permissions are set.  100 in binary equals 4 in octal, so only read permission is set.

For more explanation, see the "Understanding and Modifying File Permissions" section of [Overview of the Unix File System](https://web.archive.org/web/20210419161551/https://homepages.uc.edu/~thomam/Intro_Unix_Text/File_System.html).
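
Rather than decoding the octal digits by hand, the standard library's `stat` module can do it. The sketch below assumes the current directory as the target, as in the cell above, and then decodes the literal value `0o40777` from the output shown earlier. The variable names are illustrative only.

```python
import os
import stat

mode = os.stat(".").st_mode          # the current directory, as in the cell above

print(stat.S_ISDIR(mode))            # True, because "." is a directory
print(stat.S_ISREG(mode))            # False, it is not a regular file
print(oct(stat.S_IMODE(mode)))       # permission bits only; the value depends on your system
print(stat.filemode(mode))           # the same information as an ls -l style string

# Decoding the exact value shown in the output above (0o40777):
example_mode = 0o40777
example_string = stat.filemode(example_mode)
print(example_string)                # drwxrwxrwx
```

`stat.S_IMODE()` strips the file-type bits and leaves only the permission bits, which is the part you would pass to a command such as `chmod`.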

`st_size` is the size of the file's contents in bytes.

`st_atime` and `st_mtime` record when the file was last accessed and last modified.  `st_ctime` records the last change to the file's metadata on Unix/Linux and the creation time on Windows.  Each value is the number of seconds since the Unix epoch, which is midnight UTC on January 1st, 1970.  While this fact seems esoteric, it is a ubiquitous representation of dates and times.  Fortunately, as with other languages, Python provides APIs to convert such a value into a datetime object.

In [8]:
import datetime
accessed_dt = datetime.datetime.fromtimestamp(stat_obj.st_atime, tz=datetime.timezone.utc)  # timezone-aware; utcfromtimestamp() is deprecated
print(accessed_dt.isoformat())

2022-07-07T15:19:39.897603+00:00
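
The size and timestamp fields are easier to trust when checked against a file whose contents we control. This is a minimal sketch that assumes a temporary directory and a hypothetical file name, `sample.bin`; it writes exactly ten bytes and reads the result back through `os.stat()`.

```python
import datetime
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.bin")    # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"0123456789")                     # exactly 10 bytes

    info = os.stat(path)
    size_in_bytes = info.st_size
    # Convert seconds since the epoch into a timezone-aware datetime in UTC.
    modified_dt = datetime.datetime.fromtimestamp(info.st_mtime, tz=datetime.timezone.utc)

print(size_in_bytes)                 # 10
print(modified_dt.isoformat())       # roughly the current date and time, in UTC
```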


## Directory Operations
As with files, Python supports various directory operations.

### Create Directory
Use `os.mkdir()` to create a new directory.

In [None]:
os.mkdir('newDir')

### List Directory Contents
Use `os.listdir()` to list the contents of a directory.  This function returns a list of the names (strings) of the files and subdirectories within that directory, in arbitrary order.

In [None]:
os.listdir('newDir')

In [None]:
os.listdir('.')

In [None]:
# now, make a subdirectory in newDir
os.mkdir('newDir/newSubDir')
os.listdir('newDir')

In [None]:
with open("newDir/newSubDir/dickens.txt", 'w') as f:
    f.write('It was the best of times,\nit was the worst of times.\n')

In [None]:
os.listdir('newDir/newSubDir')
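
One detail worth seeing in isolation: `os.listdir()` returns bare names, not full paths, so a name must be joined with its directory before being passed to functions such as `os.path.isdir()`. The sketch below assumes a temporary directory with hypothetical entries `notes.txt` and `sub`.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    os.mkdir(os.path.join(tmp_dir, "sub"))                   # hypothetical names
    with open(os.path.join(tmp_dir, "notes.txt"), "w") as f:
        f.write("some text\n")

    names = sorted(os.listdir(tmp_dir))                      # bare names; sort for a stable order
    print(names)                                             # ['notes.txt', 'sub']

    # Join each name with its directory before asking questions about it.
    kinds = {name: os.path.isdir(os.path.join(tmp_dir, name)) for name in names}
    print(kinds)                                             # {'notes.txt': False, 'sub': True}
```

Without the join, `os.path.isdir("sub")` would look for `sub` in the current working directory rather than in the directory that was listed.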

### Delete Directory
To delete a directory, use `os.rmdir()`.  However, the directory must be empty to be deleted – it cannot contain any other files or directories.  You cannot use `os.remove()` to delete a directory, only a file.

In [None]:
# this will cause an error as remove can't be used on a directory
os.remove('newDir/newSubDir')

In [None]:
# this will cause an error as the directory is not empty
os.rmdir('newDir/newSubDir')
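
Both failures can be observed without stopping the notebook by catching the exception. This is a sketch under stated assumptions: it uses a temporary directory with hypothetical names, and it catches the general `OSError` because the exact subclass raised by `os.remove()` on a directory differs between operating systems.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    sub_dir = os.path.join(tmp_dir, "sub")                   # hypothetical names
    os.mkdir(sub_dir)
    with open(os.path.join(sub_dir, "file.txt"), "w") as f:
        f.write("contents\n")

    try:
        os.remove(sub_dir)           # remove() only works on files
        remove_failed = False
    except OSError as err:           # the exact subclass depends on the operating system
        remove_failed = True
        print("os.remove failed:", type(err).__name__)

    try:
        os.rmdir(sub_dir)            # rmdir() only works on empty directories
        rmdir_failed = False
    except OSError as err:
        rmdir_failed = True
        print("os.rmdir failed:", type(err).__name__)

    still_there = os.path.isdir(sub_dir)   # the directory survived both attempts
# TemporaryDirectory removes the whole directory tree when the with block ends.
```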

Fix the following code block to delete the text file created above first.

In [None]:
# add a method call here

# the following two lines of code are correct
os.rmdir('newDir/newSubDir')
os.path.exists('newDir/newSubDir')

### Change the Current Working Directory
Use `os.chdir()` to change the current working directory.

In [None]:
os.chdir('newDir')

Now, enter the function call to list the contents of the current directory.

For other file and directory operations, look at the [os](https://docs.python.org/3/library/os.html#module-os) module.

In [None]:
os.chdir('..')  # move the current directory back to our starting point

## Pathnames
Most computers use a hierarchical file system. As such, every running program has a current working directory. In a shell session, it is the directory the shell is currently in; in other cases, a setting applied when an executable starts establishes the working directory. At the command line (within a shell session), you can print the working directory with `pwd`.  With Python, we get the current working directory with `os.getcwd()`:

In [None]:
os.getcwd()

Within Jupyter Notebooks, we can also call out to the operating system:

In [None]:
!pwd

Throughout this notebook (and in most file/directory operation commands), we pass a directory name or file name as an argument into the various function calls. When we specify those names, we can use either *absolute* or *relative* pathnames.  *Absolute* pathnames start from the root (top) directory; on Unix/Linux, these pathnames start with a `/`. *Relative* pathnames start from the current directory.  As demonstrated in this notebook's first code block, `.` refers to the current directory, and `..` refers to its parent.
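
The distinction can be checked with `os.path.isabs()`. The following is a minimal sketch; the relative name built from `newDir` and `newSubDir` is only a string here and does not need to exist.

```python
import os

cwd = os.getcwd()

print(os.path.isabs(cwd))            # True: getcwd() returns an absolute pathname
print(os.path.isabs("."))            # False: "." is relative to the current directory
print(os.path.isabs(os.path.join("newDir", "newSubDir")))   # False: a relative pathname

parent = os.path.abspath("..")       # absolute pathname of the parent directory
parent_matches = parent == os.path.dirname(cwd)
print(parent_matches)                # ".." resolves to the directory that contains cwd
```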

To separate directories, most systems use a forward slash `/`. The exception is Windows, which uses a backslash `\`.  The reasoning dates back to the early days of MS-DOS in the 1980s, when `/` was already used to specify command-line arguments, whereas Unix typically uses a dash `-`.  Windows is slowly migrating away from the `\`.  Within PowerShell, you can specify names with a `/`, and PowerShell converts it automatically to `\`. PowerShell also uses `-` to specify arguments. This migration demonstrates how difficult it is to overcome an implemented decision.

### Finding Absolute Pathnames
From a relative pathname, we can determine the absolute pathname with `os.path.abspath()`.

In [None]:
os.path.abspath('.')

### Creating Pathnames
We can build a pathname from several parts (i.e., strings) by using `os.path.join()`.  This function combines names with the proper path separation character for the current operating system.

In [None]:
os.path.join('stuff','foo','bar.txt')
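
To see which separator was chosen, compare the result with `os.sep`. This short sketch also uses `os.path.split()`, which is not covered above, to take the pathname apart again.

```python
import os

path = os.path.join("stuff", "foo", "bar.txt")
print(path)        # stuff/foo/bar.txt on Unix-like systems, stuff\foo\bar.txt on Windows
print(os.sep)      # the separator that join() used

# Joining by hand with os.sep gives the same string for these simple parts.
same = path == os.sep.join(["stuff", "foo", "bar.txt"])

# split() separates the last component from everything before it.
directory, filename = os.path.split(path)
print(same, directory, filename)
```

Preferring `os.path.join()` over string concatenation keeps the code portable, because no separator is hard-coded.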

## Pathlib
In Python 3.4, the language developers added the `pathlib` module.  This module provides an alternative to the `os` module presented in this notebook.

The `pathlib` module introduced a `Path` class that treats files and directories as objects. We call methods on a `Path` object rather than passing strings to functions under `os`.

[Further details](https://docs.python.org/3/library/pathlib.html)  The very bottom of that page shows the correspondence between the two approaches.
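
As a rough illustration of that correspondence, the sketch below repeats several of this notebook's operations with `Path` methods. It assumes a temporary directory so that it does not interfere with `newDir` in the working directory; the comments name the `os` function each call stands in for.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    sub_dir = base / "newDir"                    # the / operator plays the role of os.path.join()
    sub_dir.mkdir()                              # os.mkdir()
    text_file = sub_dir / "dickens.txt"
    text_file.write_text("It was the best of times,\n")

    print(text_file.exists(), text_file.is_file(), sub_dir.is_dir())   # True True True
    names = sorted(p.name for p in sub_dir.iterdir())                  # os.listdir()
    size = text_file.stat().st_size                                    # os.stat()
    print(names, size)

    text_file.unlink()                           # os.remove()
    sub_dir.rmdir()                              # os.rmdir()
    gone = not sub_dir.exists()                  # os.path.exists()
    print(gone)                                  # True
```

Note that `iterdir()` yields `Path` objects that already include the directory, so no separate join step is needed before calling methods on them.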

## Exercise
For the current working directory, print each of the files on a separate line.  Each line should start with the file size in bytes, followed by a tab character, and then the file's name. Do not display subdirectories. Sort this output by the file name. After all of the lines have been printed, print a blank line and then this line:
<pre>
Directory size: XXXX
</pre>
where XXXX is the total of all the file sizes (excluding subdirectories).
