# Chapter 10. File and Resource Management

## 10.1 Introduction

### 10.1.1 `open()`

Open a file:

(1) Arguments:

* file: path to file (required)
* mode: read/write/append, binary/text
* encoding: text encoding

(2) Mode:

* Binary: write and read the data as binary objects.
* Text: strings are encoded before writing and decoded after reading, with platform-specific newline translation (e.g., '\n' ==> '\r\n' on Windows)
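
The difference between the two modes can be seen by writing in text mode and reading the raw bytes back in binary mode. This is only a sketch: `newline='\r\n'` is passed explicitly to force Windows-style translation so the result is the same on every platform, and the temporary file name is a placeholder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'newline_demo.txt')  # placeholder file name

    # Text mode: each '\n' in the string is translated to '\r\n' on the way out.
    with open(path, mode='wt', encoding='utf-8', newline='\r\n') as f:
        f.write('one\ntwo\n')

    # Binary mode: the bytes come back exactly as stored.
    with open(path, mode='rb') as f:
        raw = f.read()

    # Text mode with the default newline handling: '\r\n' is translated back to '\n'.
    with open(path, mode='rt', encoding='utf-8') as f:
        text = f.read()

print(raw)         # b'one\r\ntwo\r\n'
print(repr(text))  # 'one\ntwo\n'
```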

(3) Encoding

In [1]:
import sys
sys.getdefaultencoding()

'utf-8'
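
`sys.getdefaultencoding()` reports the default string encoding of the interpreter. The `TextIOWrapper` help shown in 10.2 says that a text file's `encoding` instead defaults to `locale.getpreferredencoding(False)`, so the two values are not guaranteed to match. A minimal sketch for comparing them (the variable names are illustrative, and the second value depends on the machine):

```python
import locale
import sys

# Default string encoding of the interpreter.
str_default = sys.getdefaultencoding()

# Encoding that, per the TextIOWrapper help, a text file falls back to
# when no `encoding` argument is given.
file_default = locale.getpreferredencoding(False)

print('str default: ', str_default)
print('file default:', file_default)
```

Passing `encoding='utf-8'` explicitly, as the examples in this chapter do, avoids depending on either default.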

## 10.2 Writing text files

Note that in text mode `write()` returns the number of code points written, not the number of bytes.
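
The two counts only differ once the text contains non-ASCII characters. A small sketch, assuming UTF-8 output and a throwaway file (the sample string and file name are made up for the illustration):

```python
import os
import tempfile

text = 'na\u00efve caf\u00e9\n'  # 11 code points, two of them non-ASCII

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'codepoints.txt')  # placeholder file name
    # newline='\n' switches off newline translation so the byte count
    # is the same on every platform.
    with open(path, mode='wt', encoding='utf-8', newline='\n') as f:
        code_points_written = f.write(text)
    bytes_on_disk = os.path.getsize(path)

print(code_points_written)  # 11
print(bytes_on_disk)        # 13, because each accented letter takes two bytes in UTF-8
```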

In [2]:
help(open)

Help on built-in function open in module io:

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
    Open file and return a stream.  Raise IOError upon failure.
    
    file is either a text or byte string giving the name (and the path
    if the file isn't in the current working directory) of the file to
    be opened or an integer file descriptor of the file to be
    wrapped. (If a file descriptor is given, it is closed when the
    returned I/O object is closed, unless closefd is set to False.)
    
    mode is an optional string that specifies the mode in which the file
    is opened. It defaults to 'r' which means open for reading in text
    mode.  Other common values are 'w' for writing (truncating the file if
    it already exists), 'x' for creating and writing to a new file, and
    'a' for appending (which on some Unix systems, means that all writes
    append to the end of the file regardless of the current seek position

In [3]:
f = open('wasteland.txt', mode='wt', encoding='utf-8')
help(f)

Help on TextIOWrapper object:

class TextIOWrapper(_TextIOBase)
 |  Character and line based layer over a BufferedIOBase object, buffer.
 |  
 |  encoding gives the name of the encoding that the stream will be
 |  decoded or encoded with. It defaults to locale.getpreferredencoding(False).
 |  
 |  errors determines the strictness of encoding and decoding (see
 |  help(codecs.Codec) or the documentation for codecs.register) and
 |  defaults to "strict".
 |  
 |  newline controls how line endings are handled. It can be None, '',
 |  '\n', '\r', and '\r\n'.  It works as follows:
 |  
 |  * On input, if newline is None, universal newlines mode is
 |    enabled. Lines in the input can end in '\n', '\r', or '\r\n', and
 |    these are translated into '\n' before being returned to the
 |    caller. If it is '', universal newline mode is enabled, but line
 |    endings are returned to the caller untranslated. If it has any of
 |    the other legal values, input lines are only terminated by the

In [4]:
f.write('What are the roots that clutch, ')

32

In [5]:
f.write('what branches grow\n')

19

In [6]:
f.write('Out of this stony rubbish? ')

27

In [7]:
f.close()

## 10.3 Reading text files

In [8]:
g = open("wasteland.txt", mode="rt", encoding="utf-8")
g.read(32)

'What are the roots that clutch, '

In [9]:
g.read()

'what branches grow\nOut of this stony rubbish? '

In [10]:
g.read()

''

In [11]:
# The input argument is the offset from the start of the file.
# The return value is the new file pointer position.
g.seek(0)

0

In [12]:
g.readline()

'What are the roots that clutch, what branches grow\n'

In [13]:
g.readline()

'Out of this stony rubbish? '

In [14]:
g.readline()

''

In [15]:
g.seek(0)

0

In [16]:
g.readlines()

['What are the roots that clutch, what branches grow\n',
 'Out of this stony rubbish? ']
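
The `seek(0)` calls above rewind to the start of the file. In text mode, the other offsets that are safe to pass to `seek()` are values previously returned by `tell()`. A minimal sketch of bookmarking a position that way, using a throwaway file whose name and contents are made up for the illustration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'seek_demo.txt')  # placeholder file name
    with open(path, mode='wt', encoding='utf-8') as f:
        f.write('first line\nsecond line\n')

    with open(path, mode='rt', encoding='utf-8') as g:
        first = g.readline()
        bookmark = g.tell()    # opaque position just after the first line
        rest = g.read()        # read to the end of the file
        g.seek(bookmark)       # jump back to the bookmark
        rest_again = g.read()
        g.seek(0)              # rewind to the start
        whole = g.read()

print(repr(first))         # 'first line\n'
print(rest == rest_again)  # True
```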

## 10.4 Appending to text files

In [17]:
h = open('wasteland.txt', mode='at', encoding='utf-8')
h.writelines(
    ['Son of man,\n',
     'You cannot say, or guess, ',
     'for you know only,\n',
     'A heap of broken images, ',
     'where the sun beats\n'])
h.close()

## 10.5 Files as iterators

A text file object is iterable: each iteration yields the next line, including its trailing newline.

In [None]:
# SAVE AS files.py

#!/usr/bin/env python3

import sys

def main(filename):
    f = open(filename, mode='rt', encoding='utf-8')
    for line in f:
        # Avoid print(line): it would add a second newline after each line.
        sys.stdout.write(line)
    f.close()
    
if __name__ == '__main__':
    main(sys.argv[1])

```bash
$ chmod +x files.py
$ ./files.py wasteland.txt
What are the roots that clutch, what branches grow
Out of this stony rubbish? Son of man,
You cannot say, or guess, for you know only,
A heap of broken images, where the sun beats

```
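
The reason `files.py` avoids `print()` is that each line produced by iterating a file still ends with its own newline. A small sketch using `io.StringIO` as an in-memory stand-in for a text file (the contents are invented for the example):

```python
import io
import sys

text_file = io.StringIO('first\nsecond\nlast')  # in-memory stand-in for a text file

# Iterating yields one line per step; the newline stays attached,
# except on a final line that has none.
lines = list(text_file)
print(lines)  # ['first\n', 'second\n', 'last']

for line in lines:
    sys.stdout.write(line)  # writes the line as-is, adding nothing
sys.stdout.write('\n')
```

`print(line, end='')` would be an equivalent way to avoid the doubled newline.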

## 10.6 Managing files with try...finally



In [None]:
# SAVE AS recaman.py

#!/usr/bin/env python3
"""Generate Recaman's sequence and write it to a text file."""

import sys
from itertools import count, islice

def sequence():
    """Generate Recaman's sequence."""
    seen = set()
    a = 0
    for n in count(1):  # Start at 1; starting at 0 would yield 0 twice.
        yield a
        seen.add(a)
        c = a - n
        if c < 0 or c in seen:
            c = a + n
        a = c
        
def write_sequence(filename, num):
    """Write Recaman's sequence to a text file."""
    f = open(filename, mode='wt', encoding='utf-8')
    # The tutorial's code slices with `num + 1`; slicing with `num`
    # writes exactly `num` terms to the file.
    f.writelines("{}\n".format(r)
                 for r in islice(sequence(), num))
    f.close()
    
if __name__ == '__main__':
    write_sequence(filename=sys.argv[1],
                   num=int(sys.argv[2]))

In [None]:
# SAVE AS series.py

#!/usr/bin/env python3
"""Read and print an integer series."""

import sys

def read_series(filename):
    f = open(filename, mode='rt', encoding='utf-8')
    series = []
    for line in f:
        a = int(line.strip()) # strip() removes the newline character.
        series.append(a)
    f.close()
    return series

def main(filename):
    series = read_series(filename)
    print(series)
    
if __name__ == '__main__':
    main(sys.argv[1])

```bash
$ chmod +x recaman.py
$ ./recaman.py recaman.dat 1000

$ chmod +x series.py
$ ./series.py recaman.dat
```
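
As a quick check of what `recaman.dat` should begin with, the generator can be sliced in the interpreter. This sketch repeats `sequence()` so that it is self-contained and starts the step counter at 1, which is an assumption based on the usual definition of Recaman's sequence.

```python
from itertools import count, islice

def sequence():
    """Generate Recaman's sequence."""
    seen = set()
    a = 0
    for n in count(1):  # assumed starting step of 1
        yield a
        seen.add(a)
        c = a - n               # try stepping backwards first
        if c < 0 or c in seen:  # not allowed: negative or already visited
            c = a + n           # so step forwards instead
        a = c

first_ten = list(islice(sequence(), 10))
print(first_ten)  # [0, 1, 3, 6, 2, 7, 13, 20, 12, 21]
```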

* If we manually replace one integer in recaman.dat with a string such as "oops!", `./series.py recaman.dat` terminates with an unhandled `ValueError`. Worse, `f.close()` is never reached, so the file object `f` is never explicitly closed.

```bash
$ ./series.py recaman.dat
Traceback (most recent call last):
  File "./series.py", line 20, in <module>
    main(sys.argv[1])
  File "./series.py", line 16, in main
    series = read_series(filename)
  File "./series.py", line 10, in read_series
    a = int(line.strip()) # strip() removes the newline character.
ValueError: invalid literal for int() with base 10: 'oops!'
```

In [None]:
# SAVE AS series.py

#!/usr/bin/env python3
"""Read and print an integer series."""

import sys

def read_series(filename):
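    # Open the file before entering try: if open() itself fails,
    # there is no file object for the finally clause to close.
    f = open(filename, mode='rt', encoding='utf-8')
    try:
        return [int(line.strip()) for line in f]
    finally:
        f.close()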

def main(filename):
    series = read_series(filename)
    print(series)
    
if __name__ == '__main__':
    main(sys.argv[1])
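
The effect of `finally` can be observed by keeping a reference to each file object and checking its `closed` attribute after the error. This is a sketch only: the `opened` list, the two function names, and the sample file are hypothetical and exist just to make the state visible.

```python
import os
import tempfile

opened = []  # remembers the file objects so their state can be inspected afterwards

def read_series_unsafe(filename):
    f = open(filename, mode='rt', encoding='utf-8')
    opened.append(f)
    series = [int(line.strip()) for line in f]
    f.close()  # skipped if int() raises
    return series

def read_series_safe(filename):
    f = open(filename, mode='rt', encoding='utf-8')
    opened.append(f)
    try:
        return [int(line.strip()) for line in f]
    finally:
        f.close()  # runs whether or not int() raises

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'bad_series.dat')  # placeholder file name
    with open(path, mode='wt', encoding='utf-8') as out:
        out.write('0\n1\noops!\n')

    for reader in (read_series_unsafe, read_series_safe):
        try:
            reader(path)
        except ValueError:
            pass  # expected: 'oops!' is not an integer

    closed_flags = [f.closed for f in opened]
    opened[0].close()  # tidy up the handle the unsafe version left open

print(closed_flags)  # [False, True]
```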

## 10.7 Context manager and with-blocks

### 10.7.1 Typical file use

```python
f = open()
# work work work
f.close()
```

Note that writes are buffered: `close()` flushes the buffer, so the data may not actually reach the file until it is called!
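
The buffering can be made visible by checking the file size before and after `close()`. A minimal sketch with a throwaway file; the size before closing is typically 0 for a short write like this one, but that depends on the buffering in use, so treat it as an observation and not a guarantee.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'buffered.txt')  # placeholder file name
    f = open(path, mode='wt', encoding='utf-8')
    f.write('hello')
    size_before_close = os.path.getsize(path)  # typically 0: data still in the buffer
    f.close()
    size_after_close = os.path.getsize(path)   # 5: buffer flushed to the file

print(size_before_close, size_after_close)
```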

### 10.7.2 With-block

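* A `with`-block hands resource cleanup over to a context manager.

* The file is closed automatically when the block exits, even if an exception is raised, so the explicit call of `close()` can be removed.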

"Beautiful is better  
than ugly"

"Sugary syntax  
fewer defects attained through  
sweet fidelity"
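
A short sketch of the guarantee a with-block gives for files: the file object is closed on normal exit and also when the body raises. The file name and contents are placeholders.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'series_demo.dat')  # placeholder file name

    with open(path, mode='wt', encoding='utf-8') as out:
        out.write('1\n2\noops!\n')
    closed_after_write = out.closed  # True: closed on normal exit

    try:
        with open(path, mode='rt', encoding='utf-8') as f:
            series = [int(line.strip()) for line in f]
    except ValueError:
        closed_after_error = f.closed  # True: closed although int() raised

print(closed_after_write, closed_after_error)  # True True
```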

In [None]:
# SAVE AS recaman.py

#!/usr/bin/env python3
"""Generate Recaman's sequence and write it to a text file."""

import sys
from itertools import count, islice

def sequence():
    """Generate Recaman's sequence."""
    seen = set()
    a = 0
    for n in count(1):  # Start at 1; starting at 0 would yield 0 twice.
        yield a
        seen.add(a)
        c = a - n
        if c < 0 or c in seen:
            c = a + n
        a = c
        
def write_sequence(filename, num):
    """Write Recaman's sequence to a text file."""
    with open(filename, mode='wt', encoding='utf-8') as f:
        # The tutorial's code slices with `num + 1`; slicing with `num`
        # writes exactly `num` terms to the file.
        f.writelines("{}\n".format(r)
                     for r in islice(sequence(), num))
        
if __name__ == '__main__':
    write_sequence(filename=sys.argv[1],
                   num=int(sys.argv[2]))

In [18]:
# SAVE AS series.py

#!/usr/bin/env python3
"""Read and print an integer series."""

import sys

def read_series(filename):
    with open(filename, mode='rt', encoding='utf-8') as f:
        return [ int(line.strip()) for line in f ]

def main(filename):
    series = read_series(filename)
    print(series)
    
if __name__ == '__main__':
    main(sys.argv[1])


## 10.8 Writing binary files

(1) Example: device-independent bitmaps

(2) Bitwise operations

In [None]:
# SAVE AS bmp.py

"""A module for dealing with BMP bitmap image files."""

def write_grayscale(filename, pixels):
    """Creates and writes a grayscale BMP file.
    
    Args:
        filename: The name of the BMP file to be created.
        
        pixels: A rectangular image stored as a sequence of rows.
            Each row must be an iterable series of integers in the 
            range 0-255.
            
    Raises:
        OSError: If the file couldn't be written.
    """
    # BUGBUG: Need to check if each row has the same length.
    height = len(pixels)
    width = len(pixels[0])
    
    with open(filename, 'wb') as bmp:
        # BMP header
        bmp.write(b'BM')
        
        size_bookmark = bmp.tell()      # The next four bytes hold the filesize as a 32-bit integer.
        bmp.write(b'\x00\x00\x00\x00')  # Little-endian integer. Zero placeholder for now.
        
        bmp.write(b'\x00\x00')  # Unused 16-bit integer - should be zero
        bmp.write(b'\x00\x00')  # Unused 16-bit integer - should be zero
        
        pixel_offset_bookmark = bmp.tell()  # The next four bytes hold the integer offset 
        bmp.write(b'\x00\x00\x00\x00')      # to the pixel data. Zero placeholder for now.
        
        # Image Header
        bmp.write(b'\x28\x00\x00\x00')      # Image header size in bytes - 40 decimal
        bmp.write(_int32_to_bytes(width))   # Image width in pixels
        bmp.write(_int32_to_bytes(height))  # Image height in pixels
        bmp.write(b'\x01\x00')              # Number of image planes
        bmp.write(b'\x08\x00')              # Bits per pixel 8 for grayscale
        bmp.write(b'\x00\x00\x00\x00')      # No compression
        bmp.write(b'\x00\x00\x00\x00')      # Zero for uncompressed images
        bmp.write(b'\x00\x00\x00\x00')      # Unused pixels per meter
        bmp.write(b'\x00\x00\x00\x00')      # Unused pixels per meter
        bmp.write(b'\x00\x00\x00\x00')      # Use whole color table
        bmp.write(b'\x00\x00\x00\x00')      # All colors are important
        
        # Color palette - a linear grayscale
        for c in range(256):
            bmp.write(bytes((c, c, c, 0)))  # Blue, Green, Red, Zero
            
        # Pixel data
        pixel_data_bookmark = bmp.tell()
        for row in reversed(pixels):  # BMP files are bottom to top
            row_data = bytes(row)
            bmp.write(row_data)
            padding = b'\x00' * ((4 - (len(row) % 4)) % 4)  # Pad row to multiple of four bytes
            bmp.write(padding)
            
        # End of file
        eof_bookmark = bmp.tell()
        
        # Fill in file size placeholder.
        bmp.seek(size_bookmark)
        bmp.write(_int32_to_bytes(eof_bookmark))
        
        # Fill in pixel offset placeholder.
        bmp.seek(pixel_offset_bookmark)
        bmp.write(_int32_to_bytes(pixel_data_bookmark))
        
def _int32_to_bytes(i):
    """Convert an integer to four bytes in little-endian format."""
    return bytes((i & 0xff,
                  i >> 8 & 0xff,
                  i >> 16 & 0xff,
                  i >> 24 & 0xff))

In [None]:
# SAVE AS fractal.py

import math

def mandel(real, imag):
    """Compute a point in the Mandelbrot set.
    
    The logarithm of the number of iterations needed to 
    determine whether a complex point is in the 
    Mandelbrot set.
    
    Args:
        real: The real coordinate
        imag: The imaginary coordinate
        
    Returns:
        An integer in the range 1-255.
    """
    
    x = 0
    y = 0
    for i in range(1, 257):
        if x * x + y * y > 4.0:
            break
        xt = real + x * x - y * y
        y = imag + 2.0 * x * y
        x = xt
    return int(math.log(i) * 256 / math.log(256)) - 1

def mandelbrot(size_x, size_y):
    """Make a Mandelbrot set image.
    
    It generates the best image when the aspect ratio is close to 7:4.
    
    Args:
        size_x: Image width
        size_y: Image height
        
    Returns:
        A list of lists of integers in the range 0-255.
    """
    
    return [[mandel((3.5 * x / size_x) - 2.5, (2.0 * y / size_y) - 1.0)
             for x in range(size_x)]
            for y in range(size_y)]

In [1]:
import fractal
pixels = fractal.mandelbrot(448, 256)
print(pixels)

[[31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, 73

In [2]:
import bmp
bmp.write_grayscale('mandel.bmp', pixels)
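
The row padding expression in `write_grayscale` is easier to follow with a few concrete widths. In this sketch `row_padding` is a hypothetical helper that isolates the same arithmetic; it is not part of `bmp.py`.

```python
def row_padding(width):
    """Number of zero bytes needed to pad a row of `width` bytes to a multiple of four."""
    return (4 - (width % 4)) % 4

# The outer `% 4` turns a padding of 4 into 0 for rows that are already aligned.
paddings = {width: row_padding(width) for width in (5, 6, 7, 8, 448)}
print(paddings)  # {5: 3, 6: 2, 7: 1, 8: 0, 448: 0}
```

With a width of 448, as in the Mandelbrot image above, no padding bytes are written.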

## 10.9 Reading binary files

In [None]:
# ADDED to bmp.py

def dimension(filename):
    """Determine the dimensions in pixels of a BMP image.
    
    Args:
        filename: The filename of a BMP file.
        
    Returns:
        A tuple containing two integers with the width
        and height in pixels.
        
    Raises:
        ValueError: If the file was not a BMP file.
        OSError: If there was a problem reading the file.
    """
    with open(filename, 'rb') as f:
        magic = f.read(2)
        if magic != b'BM':
            raise ValueError("{} is not a BMP file".format(filename))
            
        f.seek(18)
        width_bytes = f.read(4)
        height_bytes = f.read(4)
        
        return (_bytes_to_int32(width_bytes),
                _bytes_to_int32(height_bytes))
        
def _bytes_to_int32(b):
    """Convert a bytes object containing four bytes into an integer."""
    return b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24)

In [1]:
import bmp
bmp.dimension('mandel.bmp')

(448, 256)
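
The manual shifts in `_bytes_to_int32` can be cross-checked against the standard library. The sketch below repeats the helper so that it runs on its own, and builds the four input bytes with `struct.pack` instead of reading them from a real BMP file.

```python
import struct

def _bytes_to_int32(b):
    """Convert a bytes object containing four bytes into an integer."""
    return b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24)

field = struct.pack('<I', 448)  # four little-endian bytes, like a BMP width field

manual = _bytes_to_int32(field)
builtin = int.from_bytes(field, byteorder='little')
(unpacked,) = struct.unpack('<I', field)

print(manual, builtin, unpacked)  # 448 448 448
```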

## 10.10 File-like objects

"If it looks like a file and reads like a file, then it is a file."

In [2]:
def words_per_line(flo):
    return [ len(line.split()) for line in flo.readlines() ]

with open('wasteland.txt', mode = 'rt', encoding = 'utf-8') as real_file:
    wpl = words_per_line(real_file)

wpl

[9, 8, 9, 9]

In [3]:
type(real_file)

_io.TextIOWrapper

In [4]:
from urllib.request import urlopen

with urlopen('http://sixty-north.com/c/t.txt') as web_file:
    wpl = words_per_line(web_file)
    
wpl

[6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 5, 5, 7, 8, 14, 12, 8]

In [5]:
type(web_file)

http.client.HTTPResponse
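
The same function also accepts in-memory objects, since all it needs is a `readlines()` method whose lines can be split. A minimal sketch using `io.StringIO` and `io.BytesIO` with invented contents; `words_per_line` is repeated so that the block is self-contained.

```python
import io

def words_per_line(flo):
    return [len(line.split()) for line in flo.readlines()]

text_like = io.StringIO('one two three\nfour five\n')  # yields str lines
bytes_like = io.BytesIO(b'six seven\neight\n')         # yields bytes lines

wpl_text = words_per_line(text_like)
wpl_bytes = words_per_line(bytes_like)

print(wpl_text)   # [3, 2]
print(wpl_bytes)  # [2, 1]
```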

## 10.11 Closing with context managers

The `with`-statement can be used with any type of object which implements the context-manager protocol.
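
The protocol itself consists of two methods, `__enter__()` and `__exit__()`. A minimal sketch, using a hypothetical `Door` class that only records what happens to it:

```python
class Door:
    """Hypothetical resource used only for this illustration."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append('open')
        return self  # this value is bound to the name after `as`

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append('close')
        return False  # a false value lets any exception propagate

door = Door()
try:
    with door as d:
        d.events.append('take')
        raise RuntimeError('Health warning!')
except RuntimeError:
    pass

events = door.events
print(events)  # ['open', 'take', 'close']
```

`__exit__()` runs even though the body raised, which is what makes the cleanup reliable.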

In [None]:
# SAVE AS fridge.py

"""Demonstrate raiding a refrigerator."""

class RefrigeratorRaider:
    """Raid a refrigerator"""
    
    def open(self):
        print("Open fridge door.")
        
    def take(self, food):
        print("Finding {}...".format(food))
        if food == 'deep fried pizza':
            raise RuntimeError("Health warning!")
        print("Taking {}".format(food))
        
    def close(self):
        print("Close fridge door.")
        
def raid(food):
    r = RefrigeratorRaider()
    r.open()
    r.take(food)
    r.close()

In [6]:
from fridge import raid
raid('bacon')

Open fridge door.
Finding bacon...
Taking bacon
Close fridge door.


In [7]:
# Door is not closed.
raid('deep fried pizza')

Open fridge door.
Finding deep fried pizza...


RuntimeError: Health warning!

In [None]:
# MODIFIED in fridge.py: raid() now uses closing()

from contextlib import closing

def raid(food):
    with closing(RefrigeratorRaider()) as r:
        r.open()
        r.take(food)

In [1]:
from fridge import raid
# Door is closed.
raid('deep fried pizza')

Open fridge door.
Finding deep fried pizza...
Close fridge door.


RuntimeError: Health warning!
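
What `closing()` does can be approximated in a few lines with `contextlib.contextmanager`. In this sketch `closing_sketch` and `Recorder` are hypothetical names used for illustration; the real `closing()` should be preferred in practice.

```python
from contextlib import closing, contextmanager

@contextmanager
def closing_sketch(thing):
    """Yield `thing`, then call its close() method however the block exits."""
    try:
        yield thing
    finally:
        thing.close()

class Recorder:
    """Hypothetical object that only remembers whether close() was called."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

results = []
for manager in (closing, closing_sketch):
    r = Recorder()
    try:
        with manager(r):
            raise RuntimeError('Health warning!')
    except RuntimeError:
        pass
    results.append(r.closed)

print(results)  # [True, True]
```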