In [1]:
%reset -f

from array import array
import numpy as np
import pickle

This notebook covers file I/O. The built-in function [open](https://docs.python.org/library/functions.html#open) opens a file and returns a corresponding [file object](https://docs.python.org/glossary.html#term-file-object). Python distinguishes between text and binary I/O. If the `mode` argument of `open` contains `t`, the file is opened in text mode; if it contains `b`, the file is opened in binary mode. Text mode is the default, so specifying e.g. `w` is equivalent to `wt`.

The type of the file object returned by `open` depends on the mode. In text mode, it is a subclass of [io.TextIOBase](https://docs.python.org/library/io.html#io.TextIOBase) (specifically [io.TextIOWrapper](https://docs.python.org/library/io.html#io.TextIOWrapper)). In binary mode, it is a subclass of [io.BufferedIOBase](https://docs.python.org/library/io.html#io.BufferedIOBase), and the exact class varies: read binary mode returns an [io.BufferedReader](https://docs.python.org/library/io.html#io.BufferedReader), while write binary and append binary modes return an [io.BufferedWriter](https://docs.python.org/library/io.html#io.BufferedWriter). [IOBase](https://docs.python.org/library/io.html#io.IOBase) is the abstract base class for all I/O classes.
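As a quick check of the classes named above, the following sketch opens a scratch file in several modes and prints the type of each file object. The temporary directory and the file name `demo.txt` are illustrative assumptions, not part of the notebook.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')  # hypothetical scratch file
    types = {}
    # 'w' comes first so that the file exists before it is opened for reading
    for mode in ('w', 'r', 'wb', 'rb', 'ab'):
        with open(path, mode) as f:
            types[mode] = type(f)
            print(f'{mode!r:>5}: {type(f).__name__}')
```

Both text modes should report `TextIOWrapper`, whereas the binary modes report a reader or a writer depending on the direction of the transfer.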

# Text files

The [write](https://docs.python.org/library/io.html#io.TextIOBase.write) method writes a `str` to the stream and returns the number of characters written.

```python
write(s, /)
```

The built-in function [print](https://docs.python.org/library/functions.html#print) can also be used to write to text files. It is good practice to use the [with](https://docs.python.org/reference/compound_stmts.html#with) statement when dealing with file objects. The advantage is that the file is properly closed even if an exception is raised at some point. Using `with` is also much shorter than writing the equivalent `try`/`finally` block.

In [2]:
with open('file.txt', 'w') as f:
    print('line 1', file=f) # print adds the newline character
    print(f.write('\n'.join(['line 2', 'line 3'])))

13
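To make the comparison with manual cleanup concrete, here is a minimal sketch of the longer form that a `with` block replaces. The scratch path and the variable names are assumptions made for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')  # hypothetical scratch file

    # Manual version: close() has to be called in a finally clause
    f = open(path, 'w')
    try:
        f.write('line 1\n')
    finally:
        f.close()
    closed_manual = f.closed

    # Same guarantee with a with statement
    with open(path, 'a') as g:
        g.write('line 2\n')
    closed_with = g.closed

print(closed_manual, closed_with)
```

In both cases the file object reports `closed` as `True` afterwards; the `with` form simply removes the bookkeeping.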


The [read](https://docs.python.org/library/io.html#io.TextIOBase.read) method reads and returns at most *size* characters from the stream as a single `str`. If *size* is negative or `None`, reads until EOF. A slash in the argument list of a function denotes that the parameters prior to it are positional-only ([Python FAQ](https://docs.python.org/faq/programming.html#faq-positional-only-arguments)).

```python
read(size=-1, /)
```

In [3]:
with open('file.txt', 'r') as f:
    chars = f.read()
print(chars)

line 1
line 2
line 3
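The *size* argument is useful for processing a stream in fixed-size pieces. The sketch below uses an in-memory `io.StringIO` as a stand-in for a text file (an assumption made to keep the example self-contained) and reads it eight characters at a time.

```python
import io

text = 'line 1\nline 2\nline 3'
stream = io.StringIO(text)  # in-memory stand-in for a text file

chunks = []
while chunk := stream.read(8):  # an empty string signals EOF
    chunks.append(chunk)
print(chunks)
```

Every chunk holds at most eight characters, and joining the chunks restores the original text.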


The [readline](https://docs.python.org/library/io.html#io.TextIOBase.readline) method reads until newline or EOF and returns a single `str`. If the stream is already at EOF, an empty string is returned. If *size* is specified, at most *size* characters will be read. If a newline is read, it is included in the string.

```python
readline(size=-1, /)
```

In [4]:
with open('file.txt', 'r') as f:
    while line := f.readline():
        print(line, end='') # line includes the newline

line 1
line 2
line 3

IOBase (and its subclasses) supports the iterator protocol, meaning that an IOBase object can be iterated over yielding the lines in a stream.

In [5]:
with open('file.txt', 'r') as f:
    for line in f:
        print(line, end='')

line 1
line 2
line 3

The [readlines](https://docs.python.org/library/io.html#io.IOBase.readlines) method reads and returns a list of lines from the stream. *hint* can be specified to control the number of lines read: no more lines will be read if the total size (in characters) of all lines so far exceeds *hint*.

```python
readlines(hint=-1, /)
```

In [6]:
with open('file.txt', 'r') as f:
    print(f.readlines())

['line 1\n', 'line 2\n', 'line 3']
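The effect of *hint* is easiest to see with a small value. In this sketch (again using `io.StringIO` as an assumed stand-in for a file), a hint of one character is already exceeded by the first line, so the first call stops there and the second call picks up the rest.

```python
import io

stream = io.StringIO('line 1\nline 2\nline 3')

first = stream.readlines(1)  # stops once the lines read so far exceed the hint
rest = stream.readlines()    # no hint: read everything that remains
print(first)
print(rest)
```

Note that *hint* never splits a line: whole lines are returned even when they are longer than the hint.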


The [seek](https://docs.python.org/library/io.html#io.TextIOBase.seek) method changes the stream position to the given *offset*, interpreted relative to *whence*. The default, `SEEK_SET`, refers to the start of the stream.

```python
seek(offset, whence=SEEK_SET, /)
```

The following code reads all lines, changes the stream position to its beginning and reads the lines again.

In [7]:
with open('file.txt', 'r') as f:
    print(f.readlines())
    f.seek(0)
    print(f.readlines())

['line 1\n', 'line 2\n', 'line 3']
['line 1\n', 'line 2\n', 'line 3']
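Besides rewinding to the start, `seek` can return to a position previously obtained from `tell`. The following sketch saves the position after the first line and jumps back to it. The scratch file is an assumption, and for text streams the value returned by `tell` should be treated as an opaque marker rather than a character count.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')  # hypothetical scratch file
    with open(path, 'w') as f:
        f.write('line 1\nline 2\nline 3')

    with open(path, 'r') as f:
        f.readline()           # consume 'line 1'
        pos = f.tell()         # remember where 'line 2' starts
        second = f.readline()
        f.seek(pos)            # jump back to the saved position
        again = f.readline()

print(repr(second), repr(again))
```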


# Binary files

The [write](https://docs.python.org/library/io.html#io.BufferedWriter.write) method writes the [bytes-like object](https://docs.python.org/glossary.html#term-bytes-like-object) `b` and returns the number of bytes written.

```python
write(b, /)
```

The [read](https://docs.python.org/library/io.html#io.BufferedReader.read) method reads and returns up to *size* bytes. If *size* is omitted or negative, it reads until EOF.

```python
read(size=-1, /)
```

The following example writes an `int` to a binary file and reads it back. The `int` is converted to a [bytes](https://docs.python.org/library/stdtypes.html#bytes) object using the [to_bytes](https://docs.python.org/library/stdtypes.html#int.to_bytes) method of `int`, and `int.from_bytes` performs the reverse conversion.

In [8]:
with open('file.bin', 'wb') as f:
    # byteorder is passed explicitly; it only defaults to 'big' from Python 3.11 on
    print(f.write((100).to_bytes(4, 'big')))

with open('file.bin', 'rb') as f:
    print(int.from_bytes(f.read(4), 'big'))

4
100
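The byte order used for writing must match the one used for reading. As a small illustration (the variable names are assumptions), the sketch below encodes the same value in both byte orders and decodes it again.

```python
value = 100

big = value.to_bytes(4, byteorder='big')        # most significant byte first
little = value.to_bytes(4, byteorder='little')  # least significant byte first
print(big, little)

# Decoding works only with the byte order that was used for encoding
roundtrip = int.from_bytes(little, byteorder='little')
mismatch = int.from_bytes(little, byteorder='big')
print(roundtrip, mismatch)
```

Decoding with the wrong byte order raises no error; it silently yields a different number, which is why stating the order explicitly is a reasonable habit.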


The [array](https://docs.python.org/library/array.html#array.array) class can be used to read/write an array of basic values (characters, integers, floating point numbers).

In [9]:
# Write a list of int and a list of float to file using the array class
l_i = [1, 2]
l_f = [1.5, 2.5]

def write_array(f, arr, typecode):
    # Convert arr to an array instance
    arr = array(typecode, arr)
    # Write the size of the array using 4 bytes (big-endian)
    f.write(len(arr).to_bytes(4, 'big'))
    # Write the array to file
    arr.tofile(f)

def read_array(f, typecode):
    # Read the size of the array (4 bytes, big-endian)
    n = int.from_bytes(f.read(4), 'big')
    # Read the array from file
    arr = array(typecode)
    arr.fromfile(f, n)
    return arr

with open('file.bin', 'wb') as f:
    write_array(f, l_i, 'l') # C type: signed long
    write_array(f, l_f, 'd') # C type: double

with open('file.bin', 'rb') as f:
    print(read_array(f, 'l').tolist())
    print(read_array(f, 'd').tolist())

[1, 2]
[1.5, 2.5]
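The `array` class stores raw machine values, so the number of bytes per element depends on the typecode (and, for typecodes such as `'l'`, on the platform). The sketch below inspects this with `itemsize` and `tobytes`, which returns the same byte sequence that `tofile` would write; the variable names are illustrative.

```python
from array import array

arr = array('d', [1.5, 2.5])
raw = arr.tobytes()  # same bytes that tofile() would write
print(arr.itemsize, len(raw))

# Rebuild the array from the raw bytes
restored = array('d')
restored.frombytes(raw)
print(restored.tolist())
```

Because the representation is machine-dependent, a file written this way is only guaranteed to read back correctly on a platform with the same item sizes and byte order.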


The [numpy.savez](https://numpy.org/doc/stable/reference/generated/numpy.savez.html) function saves several arrays into a single file in uncompressed .npz format, appending the `.npz` extension to the file name if it is missing. The [numpy.load](https://numpy.org/doc/stable/reference/generated/numpy.load.html) function loads arrays or pickled objects from .npy, .npz or pickled files.

In [10]:
np.savez('file', l_i=l_i, l_f=l_f)
arrays = np.load('file.npz')
print(arrays['l_i'])
print(arrays['l_f'])

[1 2]
[1.5 2.5]
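For an .npz file, `numpy.load` returns an archive object that can be used as a context manager and that lists its contents in the `files` attribute. A minimal sketch, assuming a scratch file named `demo.npz` in a temporary directory:

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.npz')  # hypothetical scratch file
    np.savez(path, l_i=[1, 2], l_f=[1.5, 2.5])

    with np.load(path) as data:     # the archive is closed on exit
        names = sorted(data.files)  # keys under which the arrays were saved
        l_i = data['l_i']
        l_f = data['l_f']

print(names, l_i, l_f)
```

The arrays are read from the archive when they are accessed by key, so they remain usable after the archive has been closed.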


The [pickle](https://docs.python.org/library/pickle.html#module-pickle) module implements binary protocols for serializing and de-serializing a Python object structure and can be used with different data types (`list`, `tuple`, `dict`, pandas `DataFrame`, etc.). Unpickling can execute arbitrary code, so only load pickle files from sources you trust.

In [11]:
with open('file.pkl', 'wb') as f:
    pickle.dump((l_i, l_f), f)

with open('file.pkl', 'rb') as f:
    l_i, l_f = pickle.load(f)
print(l_i)
print(l_f)

[1, 2]
[1.5, 2.5]
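When no file is involved, `pickle.dumps` and `pickle.loads` perform the same conversion to and from a `bytes` object. The sample record below is a hypothetical nested structure chosen to show that container types survive the round trip.

```python
import pickle

record = {'ints': [1, 2], 'floats': (1.5, 2.5)}  # hypothetical sample data

blob = pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(blob)
print(type(blob).__name__, restored == record)
```

The tuple comes back as a tuple and the list as a list, which plain text formats do not generally guarantee.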


# See also

- [io — Core tools for working with streams](https://docs.python.org/library/io.html)