In [None]:
CHAPTER 5 File IO
‘File’ is an indispensable word in the vocabulary of even an ordinary
computer (or mobile) user. Every day, users deal with files, which may be
documents, spreadsheets, presentations, images, and so on. A slightly more
advanced user, whom we may call a developer, prepares scripts and builds
executables, which are also files.

In [None]:
When a user starts an application, they enter certain data, either through
the keyboard or through another device such as a mouse, camera, or scanner.
The data goes into the computer’s main memory and is processed as defined
in the application. If this data (input or resulting from processing) is
needed for subsequent use, it is saved in a computer file, because data left
in main memory is erased when the computer is turned off. In this chapter,
we shall discuss how data from a Python program is stored in persistent
disk files.


In [None]:
A Python console application interacts with peripheral devices through the
built-in input() and print() functions. Channels of interaction between the
processor and peripheral devices are called streams. A stream is any object
that sends or receives a continuous flow of data. Python’s input() function
reads data from the standard input stream, i.e. the keyboard, which is
represented by the sys.stdin object defined in the built-in sys module.
Similarly, the print() function sends data to the standard output device,
the computer display screen (monitor), represented by the sys.stdout object.

In [None]:
The stdin object has read() and readline() methods to accept user input
through the keyboard. The read() method accepts data until the stream is
terminated by Ctrl+D (Ctrl+Z followed by Enter on Windows). The readline()
method, on the other hand, accepts all keystrokes until the ‘Enter’ key is
pressed. Both methods leave ‘\n’ at the end of the input.
Example 5.1
>>> import sys
>>> data=sys.stdin.read()
Hello
How are you?
>>> data
'Hello\nHow are you?\n'
>>> data=sys.stdin.readline()
Hello How are you?
>>> data
'Hello How are you?\n'
>>>

In [None]:
In fact, the input() function performs stdin.readline() and returns the
result after stripping the trailing ‘\n’ character.
The write() method of the stdout object sends its string argument to the
default output device, the computer monitor, much as the print() function
does, except that it does not append a newline. It also returns the number
of characters written, which the interactive shell displays.
Example 5.2
>>> sys.stdout.write(data)
Hello How are you?
19
>>>
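The relationship between readline(), input() and write() can be sketched as
follows. This is only an illustration: an io.StringIO object (named
fake_stdin here, a name not used elsewhere in this chapter) stands in for the
keyboard so that the snippet runs without waiting for input.

```python
import io
import sys

# io.StringIO stands in for the keyboard (an assumption made so that the
# sketch runs unattended); sys.stdin offers the same readline() method.
fake_stdin = io.StringIO("Hello How are you?\n")

line = fake_stdin.readline()      # readline() keeps the trailing newline
stripped = line.rstrip("\n")      # roughly what input() hands back

# write() does not add a newline by itself and returns the character count
count = sys.stdout.write(stripped + "\n")
print(repr(line), repr(stripped), count)
```

The value echoed by the interactive shell after sys.stdout.write() is simply
this return value.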


In [None]:
Any object that can send or receive a stream of bytes is called a ‘file-like
object’ in Python. A file (or file-like) object can invoke the read() or
write() method depending upon the stream to which it is connected. Hence,
the stdin and stdout objects are file-like objects too. Python can perform
IO operations with objects representing disk files, network sockets, memory
arrays, and so on. In this chapter we shall deal with disk files. These files
store data persistently, which is often needed because the same collection
of data may be required repeatedly. Instead of laboriously keying in the
same data again and again, reading it from a file is more efficient and less
error-prone. Similarly, screen output, being temporary and limited, can
instead be stored in files. (Figure 5.1)

In [None]:
The standard IO streams, represented by the stdin and stdout objects, are
always available. To read from or write to a disk file, a file object must
first be created with the built-in open() function.

In [None]:
5.1 Opening File
The open() function takes a string containing the disk file’s name, along
with its path, as its first argument. The second argument indicates the mode
in which the file is to be opened. The default mode is ‘r’, which stands for
‘read’ and means that data in the file is read into program variables. To
use the file as an output destination, pass ‘w’ as the mode parameter. The
function returns a file object.
Example 5.3
>>> obj=open('test.txt','r')
Traceback (most recent call last):
  File "<pyshell#16>", line 1, in <module>
    obj=open('test.txt','r')
FileNotFoundError: [Errno 2] No such file or directory: 'test.txt'
>>> obj=open('test.txt','w')
>>> obj.close()
>>>
Note that in ‘r’ mode the open() function can open only an existing file;
otherwise it raises FileNotFoundError. Always ensure that an opened file
object is closed, so that any data still in the buffer is flushed to disk.
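Python also offers the with statement, which the text above does not use; it
closes the file automatically when the block ends, even if an error occurs
inside it. A minimal sketch, assuming a throwaway directory created with the
tempfile module (the names workdir and path are illustrative only):

```python
import os
import tempfile

# A temporary directory keeps the sketch away from real files.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "test.txt")

with open(path, "w") as obj:
    obj.write("sample text")
    open_inside = not obj.closed   # the file is still open inside the block

closed_after = obj.closed          # closed automatically on leaving the block
print(open_inside, closed_after)
```

The examples in this chapter call close() explicitly; the with form is an
alternative, not a replacement for understanding what close() does.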

In [None]:
5.2 Writing to File
The file object needs write permission to be able to save data to a file,
which is granted by setting the mode parameter to ‘w’. Let us store a famous
quote by the computer scientist Alan Kay in the ‘top-quotes.txt’ file.
To begin with, create a file object referring to the desired file, opened in
‘w’ mode.
>>> file=open('top-quotes.txt','w')
The write() method sends a string to the file object and stores it in the
underlying disk file.
Example 5.4
>>> quote="'The best way to predict the future is to invent it.' - Alan Kay"
>>> file.write(quote)
64
>>> file.close()
Note that the interactive mode shows the number of characters written. Be
sure to close the file, then view the created file in your favourite text
editor (such as Notepad) to confirm that the above quote is stored in it.

In [None]:
Now that we have successfully created a file, let us try to add a few more
quotes to it as follows:
Example 5.5
>>> file=open('top-quotes.txt','w')
>>> quote=''''There are only two kinds of programming languages: those people always bitch about and those nobody uses.' - Bjarne Stroustrup
'The only way to learn a new programming language is by writing programs in it.' -Dennis Ritchie
'A computer would deserve to be called intelligent if it could deceive a human into believing that it was human.' – Alan Turing'''
>>> file.write(quote)
352
>>> file.close()
Note the use of triple quotes to form a multi-line string. Open the file
again in an editor. To your surprise, the earlier string is no longer there.
Why? The reason is the behaviour of ‘w’ mode: it opens a file for writing
and erases its earlier contents if the file already exists. Here we want to
add a few more lines to the existing file. For that we need to use ‘a’ as
the mode parameter, so that new data is added to the end of the existing
file. Table 5.1 lists the valid mode parameters and their purpose:

In [None]:
Table 5.1: Valid mode parameters
mode parameter   purpose
r                opens the file for reading (default)
w                opens the file for writing only; erases existing contents
a                appends new data to the existing file
t                opens the file in text mode (default)
b                opens the file in binary mode
+                allows simultaneous reading and writing
x                opens the file for exclusive creation; fails if it already exists
Going back to our attempt to add new quotes in top-quotes.txt, change the
mode parameter to ‘a’.


In [None]:
Example 5.6
>>> file=open('top-quotes.txt','a')
>>> quote=''''There are only two kinds of programming languages: those people always bitch about and those nobody uses.' - Bjarne Stroustrup
'The only way to learn a new programming language is by writing programs in it.' -Dennis Ritchie
'A computer would deserve to be called intelligent if it could deceive a human into believing that it was human.' – Alan Turing'''
>>> file.write(quote)
352
>>> file.close()
The file should now have the earlier text intact, with the new quotes added
after it. A file object also has a writelines() method, which writes the
strings held in a list object. Each item in the list is treated as one line.
Note that the method doesn’t insert ‘\n’ by itself, so it must be provided
explicitly as a part of each string.

In [None]:
Example 5.7
>>> file=open('top-quotes.txt','a')
>>> quotes=[
'programming languages have a devious influence. They shape our thinking habits - Edsger W. Dijkstra\n',
'programmers do programming not because they expect to get paid or get adulation by the public, but because it is fun to program - Linus Torvalds\n',
'A computer would deserve to be called intelligent if it could deceive a human into believing that it was human - Alan Turing\n']
>>> file.writelines(quotes)
>>> file.close()
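The difference between the ‘w’, ‘a’ and ‘x’ modes listed in Table 5.1 can be
seen in a short sketch. It assumes a scratch file named modes-demo.txt in a
temporary directory; both names are made up for this illustration.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "modes-demo.txt")

with open(path, "w") as f:      # creates the file
    f.write("first\n")
with open(path, "w") as f:      # 'w' again: earlier contents are erased
    f.write("second\n")
with open(path, "a") as f:      # 'a': new data goes after existing data
    f.write("third\n")

with open(path, "r") as f:
    content = f.read()

try:
    open(path, "x")             # 'x' refuses to touch an existing file
    exclusive_failed = False
except FileExistsError:
    exclusive_failed = True

print(repr(content), exclusive_failed)
```

Only "second" and "third" survive, because the second open() in ‘w’ mode
discarded "first".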

In [None]:
5.3 Reading a File
Let us now read ‘top-quotes.txt’ programmatically by opening it in ‘r’
mode. In ‘r’ mode, the file object can call the read(), readline(), and
readlines() methods.
Of these, read() reads a specified number of characters from a text file
(bytes from a binary file); by default it reads the whole file. However, if
the file is very big, available memory may not allow the entire file to be
read at once, so you may have to pass a size argument and read the file a
piece at a time.
Example 5.8
>>> file=open('top-quotes.txt','r')
>>> text=file.read()
>>> text
"'The best way to predict the future is to invent it.' - Alan
Kay\n'There are only two kinds of programming languages: those
people always bitch about and those nobody uses.' - Bjarne
Stroustrup\n'The only way to learn a new programming language
is by writing programs in it.' -Dennis Ritchie\n'A computer
would deserve to be called intelligent if it could deceive a
human into believing that it was human.' – Alan
Turing\nprogramming languages have a devious influence. They
shape our thinking habits - Edsger W. Dijkstra\nprogrammers do
programming not because they expect to get paid or get
adulation by the public, but because it is fun to program -
Linus Torvalds\nA computer would deserve to be called
intelligent if it could deceive a human into believing that it
was human - Alan Turing\n"
>>> file.close()

In [None]:
To read a specified number of characters from the beginning of the file:
Example 5.9
>>> file=open('top-quotes.txt','r')
>>> text=file.read(65)
>>> text
"'The best way to predict the future is to invent it.' - Alan Kay\n"
>>> file.close()
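For a file too large to read in one go, the size argument can be used in a
loop. The sketch below is one possible way to do it; read_in_chunks is a
hypothetical helper written for this illustration, and the chunk size of 16
is arbitrary.

```python
import os
import tempfile

def read_in_chunks(path, size=16):
    """Yield successive pieces of at most `size` characters from a text file."""
    with open(path, "r") as f:
        while True:
            chunk = f.read(size)
            if chunk == "":        # an empty string signals end of file
                break
            yield chunk

# Scratch file so the sketch is self-contained.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "chunk-demo.txt")
sample = "'The best way to predict the future is to invent it.' - Alan Kay\n"
with open(path, "w") as f:
    f.write(sample)

chunks = list(read_in_chunks(path, size=16))
print(len(chunks), [len(c) for c in chunks])
```

Only one chunk is held in memory at a time while the generator runs; the
list() call here is just to display the result.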

In [None]:
Reading a File Line-by-Line
The readline() method reads all characters up to and including the next
newline character ‘\n’. It returns an empty string when no more lines are
left to be read. Hence, we can call readline() in a loop until an empty
string is returned. (Figure 5.2)
>>> file=open('top-quotes.txt','r')
>>> while True:
        line=file.readline()
        if line=='':break
        print (line, end='')
'The best way to predict the future is to invent it.' - Alan
Kay
'There are only two kinds of programming languages: those
people always bitch about and those nobody uses.' - Bjarne
Stroustrup
'The only way to learn a new programming language is by writing
programs in it.' -Dennis Ritchie
'A computer would deserve to be called intelligent if it could
deceive a human into believing that it was human.' – Alan
Turing
programming languages have a devious influence. They shape our
thinking habits - Edsger W. Dijkstra
programmers do programming not because they expect to get paid
or get adulation by the public, but because it is fun to
program - Linus Torvalds
A computer would deserve to be called intelligent if it could
deceive a human into believing that it was human - Alan Turing

In [None]:
The file object is a data stream that acts as an iterator. An iterator
serves the next object every time the next() function is called on it, until
the stream is exhausted and a StopIteration exception is raised. We can use
the next() function on a file object to read a file line by line.
Example 5.10
f=open("top-quotes.txt","r")
while True:
    try:
        line=next(f)
        print (line, end="")
    except StopIteration:
        break
f.close()
Or simply use a for loop over each line in the file iterator:
Example 5.11
file=open("top-quotes.txt","r")
for line in file:
    print (line, end="")
file.close()
The readlines() method returns a list of all the lines in the file.
>>> file=open('top-quotes.txt','r')
>>> lines=file.readlines()
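What readlines() returns can be seen with a small scratch file. The file
name and its two-line content below are assumptions made for this sketch.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "lines-demo.txt")
with open(path, "w") as f:
    f.writelines(["alpha\n", "beta\n"])

with open(path, "r") as f:
    lines = f.readlines()          # each item keeps its trailing '\n'

cleaned = [line.rstrip("\n") for line in lines]
print(lines)
print(cleaned)
```

Because the whole file is loaded into a list, readlines() suits small files;
for large ones, iterating over the file object avoids holding every line in
memory.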

In [None]:
5.4 Write/Read Binary File
When the mode parameter of the open() function is set to ‘w’ (or ‘a’), the
file is prepared for writing data in text format. Hence the write() and
writelines() methods send string data to the output file stream, and the
file is easily readable in any text editor. However, other computer files
that represent pictures (.jpg), multimedia (.mp3, .mp4), executables (.exe,
.com), or databases (.db, .sqlite) are not readable when opened in a text
editor such as Notepad, because they contain data in binary format.


In [None]:
Python’s open() function also lets you write binary data to a file. You need
to add ‘b’ to the mode parameter: set the mode to ‘wb’ to open a file for
writing binary data. Naturally, such a file needs to be read by specifying
‘rb’ as the file opening mode.
In ‘wb’ mode, the write() method doesn’t accept a string as its argument.
Instead, it requires a bytes object. So, even if you are writing a string to
a binary file, it needs to be converted to bytes first. The encode() method
of the string class converts a string to bytes using one of the defined
encoding schemes, such as ‘utf-8’ or ‘utf-16’.
Example 5.12
>>> f=open('binary.dat','wb')
>>> data='Hello Python'
>>> encoded=data.encode(encoding='utf-16')
>>> encoded
b'\xff\xfeH\x00e\x00l\x00l\x00o\x00 \x00P\x00y\x00t\x00h\x00o\x00n\x00'
>>> f.write(encoded)
26
>>> f.close()

In [None]:
To read back data from binary file, encoded string will have to be decoded
for it to be used normally, like printing it.
Example 5.13
>>> f=open('binary.dat','rb')
>>> encoded=f.read()
>>> data=encoded.decode(encoding='utf-16')
>>> print (data)
Hello Python
>>> f.close()


In [None]:
If you have to write numeric data to a binary file, the number object is
first converted to bytes and then passed as the argument to the write()
method.
Example 5.14
>>> f=open('binary.dat','wb')
>>> num=1234
>>> bin1=num.to_bytes(16, 'big')
>>> f.write(bin1)
>>> f.close()
To read integer data from a binary file, it has to be extracted from the
bytes in the file.
Example 5.15
>>> f=open('binary.dat','rb')
>>> data=f.read()
>>> num=int.from_bytes(data,'big')
>>> num
1234
>>> f.close()

In [None]:
In the to_bytes() and from_bytes() functions, ‘big’ is the value of the
byteorder parameter: the most significant byte is stored first.
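The effect of the byteorder parameter is easiest to see on a short byte
string. The sketch below uses a length of 2 bytes rather than the 16 used
above, purely to keep the output readable.

```python
num = 1234                               # 0x04D2 in hexadecimal

big = num.to_bytes(2, "big")             # most significant byte first
little = num.to_bytes(2, "little")       # least significant byte first
print(big, little)

# Reading with the same byte order restores the number ...
same_order = int.from_bytes(big, "big")
# ... while the wrong order silently yields a different one (0xD204).
wrong_order = int.from_bytes(big, "little")
print(same_order, wrong_order)
```

The byte order used for reading must therefore match the one used for
writing.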
Writing float data to a binary file is a little more complicated. You’ll
need the pack() function from the built-in struct module to convert a float
to bytes. A full explanation of the struct module is beyond the scope of
this book.
Example 5.16
>>> import struct
>>> num1=1234.56
>>> f=open('binary.dat','wb')
>>> binfloat=struct.pack('f',num1)
>>> f.write(binfloat)
4
>>> f.close()


In [None]:
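To read back binary float data from the file, use the unpack() function. It
always returns a tuple, and the value differs slightly from 1234.56 because
the ‘f’ format stores a 4-byte single-precision float.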
Example 5.17
>>> f=open('binary.dat','rb')
>>> data=f.read()
>>> num1=struct.unpack('f', data)
>>> print (num1)
(1234.56005859375,)
>>> f.close()
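The loss of precision comes from the ‘f’ format, not from the file. A small
sketch comparing it with the 8-byte ‘d’ format follows; the ‘<’ prefix
(little-endian, standard sizes) is an addition made here so that the sizes
are the same on every platform.

```python
import struct

num1 = 1234.56

single = struct.pack("<f", num1)   # 4-byte single precision
double = struct.pack("<d", num1)   # 8-byte double precision

# unpack() returns a tuple, so take element 0
from_single = struct.unpack("<f", single)[0]
from_double = struct.unpack("<d", double)[0]

print(len(single), from_single)
print(len(double), from_double)
```

Python floats are double precision internally, so the ‘d’ format round-trips
the value exactly at the cost of four extra bytes.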

In [None]:
5.5 Simultaneous Read/Write
Modes ‘w’ and ‘a’ allow a file to be written to, but not read from.
Similarly, ‘r’ mode allows reading a file but prohibits writing to it.
To be able to perform both read and write operations on a file without
closing it, add a ‘+’ sign to these mode characters. As a result, ‘w+’,
‘r+’ or ‘a+’ mode opens the file for simultaneous read/write. Similarly,
simultaneous binary read/write operations are enabled if the file is opened
with ‘wb+’, ‘rb+’ or ‘ab+’ mode.
It is also possible to perform a read or write operation at any byte
position in the file. As you go on writing data to a file, the end of file
(EOF) position keeps moving away from its beginning. By default, new data is
written at the current EOF position. When a file is opened in ‘r’ mode,
reading starts from the 0th byte, i.e. from the beginning of the file.


In [None]:
The seek() method of a file object lets you set the current reading or
writing position to a desired byte position in the file. It locates the
position by counting the offset from the beginning (0), the current position
(1), or EOF (2). In text mode, only a zero offset is allowed with reference
points 1 and 2; arbitrary offsets from them require binary mode. The
following example illustrates this point:
Example 5.18
>>> file=open("testfile.txt","w+")
>>> file.write("This is a rat race")
>>> file.seek(10,0) #seek 10th byte from beginning
>>> txt=file.read(3) #read next 3 bytes
>>> txt
'rat'
>>> file.seek(10,0) #seek 10th byte position
>>> file.write('cat') #overwrite next 3 bytes
>>> file.seek(0)
>>> text=file.read() #read entire file
>>> text
'This is a cat race'
>>> file.close()
Of course, this does not insert new data: writing at a position overwrites
whatever is already there. One solution is to read the entire content into a
variable, modify it, and rewrite it after truncating the existing file.
Other built-in modules like fileinput and mmap allow modifying a file
in-place. However, later in this book, we are going to discuss a more
sophisticated tool for manipulating databases and updating data on a random
basis.
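The read, modify and rewrite approach mentioned above might look like the
following sketch. It assumes the file is small enough to hold in memory; the
scratch file and the inserted word "big " are chosen only for illustration.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "testfile.txt")
with open(path, "w") as f:
    f.write("This is a rat race")

with open(path, "r+") as f:
    text = f.read()                        # 1. read everything
    text = text[:10] + "big " + text[10:]  # 2. insert at position 10 in memory
    f.seek(0)                              # 3. go back to the beginning
    f.write(text)                          # 4. write the modified content
    f.truncate()                           # 5. drop anything left beyond it

with open(path, "r") as f:
    result = f.read()
print(result)
```

The truncate() call matters when the new content is shorter than the old
one; without it, the tail of the old content would remain in the file.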

In [None]:
5.6 File Handling using os Module
Python’s standard library has the os module, which provides useful
operating system dependent functions. It also provides functions for
performing low-level read/write operations on a file. Let us briefly get
acquainted with them.
The open() function from the os module (which needs to be referred to as
os.open()) is similar to the built-in open() function in that it also opens
a file for read/write operations. However, it doesn’t return a file or
file-like object but a file descriptor, an integer corresponding to the
opened file. The file descriptor values 0, 1 and 2 are reserved for the
stdin, stdout, and stderr streams. Other files are given incremental file
descriptors. Also, os.write() needs a bytes object as its argument, and
os.read() returns one.


In [None]:
The os.open() function needs to be given one of the following constants, or
a combination of them joined with the | operator: (Table 5.2)
Table 5.2 File handling constants
os.O_RDONLY   open for reading only
os.O_WRONLY   open for writing only
os.O_RDWR     open for reading and writing
os.O_APPEND   append on each write
os.O_CREAT    create file if it does not exist
os.O_TRUNC    truncate size to 0
os.O_EXCL     error if create and file exists
As in the case of a file object, the os module defines the lseek() function
to set the file r/w position at a desired place counted from the beginning,
the current position, or the end, indicated by the integers 0, 1, and 2
respectively.


In [None]:
Example 5.19
>>> import os
>>> fd=os.open("testfile.txt",os.O_RDWR|os.O_CREAT)
>>> text="Hello Python"
>>> encoded=text.encode(encoding='utf-16')
>>> os.write(fd, encoded)
>>> os.lseek(fd,0,0)
>>> os.path.getsize("testfile.txt") #calculate file size
>>> encoded=os.read(fd,26)
>>> text=encoded.decode('utf-16')
>>> text
'Hello Python'
>>> os.close(fd)
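Combining the constants of Table 5.2 changes how os.open() behaves. As one
illustration, the sketch below pairs os.O_CREAT with os.O_EXCL so that a file
is created only if it does not already exist; the scratch path is an
assumption of this example.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "exclusive.dat")
flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL

fd = os.open(path, flags)          # first call creates the file
written = os.write(fd, b"created once")
os.close(fd)                       # descriptors must be closed explicitly

try:
    os.open(path, flags)           # second call finds the file and fails
    second_attempt = "allowed"
except FileExistsError as e:       # FileExistsError is a kind of OSError
    second_attempt = "refused"

print(written, second_attempt)
```

This is the descriptor-level counterpart of the ‘x’ mode of the built-in
open() function.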

In [None]:
5.7 File/Directory Management Functions
We normally use the operating system’s GUI utilities or DOS commands to
manage directories and to copy or move files. The os module provides useful
functions to perform these tasks programmatically.
The os.mkdir() function creates a new directory at the given path. The path
argument may be absolute or relative to the current directory. Use the
chdir() function to set the current working directory to a desired path. The
getcwd() function returns the current working directory path.


In [None]:
Example 5.20
>>> os.mkdir('mydir')
>>> os.path.abspath('mydir')
'E:\\python37\\mydir'
>>> os.chdir('mydir')
>>> os.getcwd()
'E:\\python37\\mydir'
You can remove a directory with the rmdir() function only if it is empty and
is not the current working directory.
Example 5.21
>>> os.chdir('..') #parent directory becomes current working directory
>>> os.getcwd()
'E:\\python37'
>>> os.rmdir('mydir')
The rename() and remove() functions respectively change the name of a file
and delete a file. Another utility function is listdir(), which returns a
list object comprising the file and subdirectory names in the given path.
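These three functions can be tried safely inside a temporary directory. The
file names draft.txt and final.txt below are invented for this sketch.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
old_name = os.path.join(workdir, "draft.txt")
new_name = os.path.join(workdir, "final.txt")

with open(old_name, "w") as f:
    f.write("some text")

os.rename(old_name, new_name)          # change the file's name
after_rename = os.listdir(workdir)     # names only, without the path

os.remove(new_name)                    # delete the file
after_remove = os.listdir(workdir)

os.rmdir(workdir)                      # the directory is empty, so this works
print(after_rename, after_remove)
```

Note that listdir() returns bare names; join them with the directory path
(for example with os.path.join()) before passing them to other functions.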

In [None]:
5.8 Exceptions
Even an experienced programmer’s code contains errors. If the errors violate
the language syntax, they are, more often than not, detected by the
interpreter (the compiler in the case of C++/Java), and the code doesn’t
execute until they are corrected.
There are times, though, when the code shows no syntax errors but errors
crop up after running it. What is more, the code might sometimes execute
without errors and at other times terminate abruptly. Clearly, some
situation that arises in the running code is not tolerable to the
interpreter. Such a runtime situation causing an error is called an
exception.
Take the simple example of displaying the result of dividing two numbers
input by the user. The following snippet is free of syntax errors.
Example 5.22
num1=int(input('enter a number..'))
num2=int(input('enter another number..'))
result=num1/num2
print ('result: ', result)
When executed, above code gives satisfactory output on most occasions, but
when num2 happens to be 0, it breaks. (Figure 5.3)

In [None]:
You can see that the program terminates as soon as it encounters the error,
without completing the rest of the statements. Such abnormal termination may
prove harmful in some cases.
Imagine a situation involving a file object. If such a runtime error occurs
after a file is opened, the abrupt end of the program gives the file object
no chance to close properly, which may result in corruption of data in the
file. Hence exceptions need to be handled properly so that the program ends
safely.
If we look at the class structure of builtins in Python, there is an
Exception class from which a number of built-in exceptions are derived.
Depending upon the cause of the exception in the running program, an object
of the corresponding exception class is created. In this section, we
restrict ourselves to exceptions related to file operations.
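The class relationships can be inspected with the built-in issubclass()
function. The sketch below looks at a few of the exception classes that
appear in this chapter; the variable names and the missing.txt file name are
illustrative.

```python
import io
import os
import tempfile

# Each entry checks that the first class derives from the second.
checks = {
    "FileNotFoundError -> OSError": issubclass(FileNotFoundError, OSError),
    "OSError -> Exception": issubclass(OSError, Exception),
    "io.UnsupportedOperation -> OSError": issubclass(io.UnsupportedOperation, OSError),
    "ZeroDivisionError -> Exception": issubclass(ZeroDivisionError, Exception),
}
for label, ok in checks.items():
    print(label, ok)

# Because of this hierarchy, a handler for the parent class also
# catches objects of the child class.
missing = os.path.join(tempfile.mkdtemp(), "missing.txt")
try:
    open(missing, "r")
except OSError as e:
    caught = type(e).__name__
print(caught)
```

A handler written for OSError therefore also catches FileNotFoundError,
which is useful when several file-related failures should be treated alike.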

In [None]:
Python’s exception handling mechanism is implemented with two keywords, try
and except. Both keywords are followed by a block of statements. The try:
block contains a piece of code that is likely to encounter an exception. The
except: block follows the try: block and contains statements meant to handle
the exception. The above snippet for division of two numbers is rewritten
here to use the try – except mechanism.
Example 5.23
try:
    num1=int(input('enter a number..'))
    num2=int(input('enter another number..'))
    result=num1/num2
    print ('result: ', result)
except:
    print ("error in division")
print ("end of program")
Now there are two possibilities. As said earlier, an exception is a runtime
situation that largely depends upon reasons outside the program. In the code
involving division of two numbers, there is no exception if the denominator
is non-zero. In that case, the try: block is executed completely, the
except: block is bypassed, and the program proceeds to the subsequent
statements.
If, however, the denominator happens to be zero, the statement involving
division produces an exception. The Python interpreter abandons the rest of
the statements in the try: block and sends the program flow to the except:
block, where the exception handling statements are given. After the except:
block, the remaining unconditional statements are executed. (Figure 5.4)
Figure 5.4 Exception Handling

In [None]:
Here, the except block without any expression acts as a generic exception
handler. To catch an exception of a specific type, the corresponding
exception class is named after the except keyword. In this case
ZeroDivisionError is raised, so it is named in the except statement. You can
also use the ‘as’ keyword to receive the exception object in a variable and
fetch more information about the exception.


In [None]:
Example 5.24
try:
    num1=int(input('enter a number..'))
    num2=int(input('enter another number..'))
    result=num1/num2
    print ('result: ', result)
except ZeroDivisionError as e:
    print ("error message",e)
print ("end of program")
File operations are always prone to raising exceptions. What if the file you
are trying to open doesn’t exist at all? What if you opened a file in ‘r’
mode but are trying to write data to it? These situations raise runtime
errors (exceptions), which must be handled using the try – except mechanism
to avoid damage to data in files.
FileNotFoundError is a commonly encountered exception. It appears when you
attempt to read a non-existing file. The following code handles the error.

In [None]:
Example 5.25
fn=input('enter filename..')
try:
    f=open(fn,'r')
    data=f.read()
    print (data)
    f.close()
except FileNotFoundError as e:
    print ("error message",e)
print ("end of program")
Output (Figure 5.5):

In [None]:
Another exception occurs frequently when you try to write data to a file
opened in ‘r’ mode. The type of this exception is UnsupportedOperation,
defined in the io module.
Example 5.26
import io
try:
    f=open('testfile.txt','r')
    f.write('Hello')
except io.UnsupportedOperation as e:
    print ("error message",e)
print ("end of program")

In [None]:
As we know, the write() method of a file object opened in text mode needs a
string argument. Hence an argument of any other type results in a TypeError.
Example 5.27
try:
    f=open('testfile.txt','w')
    f.write(1234)
except TypeError as e:
    print ("error message",e)
print ("end of program")
Output (Figure 5.7):
Figure 5.7 Output
Conversely, for file in binary mode, write() method needs bytes object as
argument. If not, same TypeError is raised with different error message.

In [None]:
Example 5.28
try:
    f=open('testfile.txt','wb')
    f.write('Hello')
except TypeError as e:
    print ("error message",e)
print ("end of program")
Output (Figure 5.8):
Figure 5.8 Output
All file related functions in os module raise OSError in the case of invalid
or inaccessible file names and paths, or other arguments that have the correct
type, but are not accepted by the operating system.

In [None]:
Example 5.29
import os
try:
    fd=os.open('testfile.txt',os.O_RDONLY|os.O_CREAT)
    os.write(fd,'Hello'.encode())
except OSError as e:
    print ("error message",e)
print ("end of program")
Output (Figure 5.9):
Figure 5.9 Output
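The examples above report the error but leave the file or descriptor open.
Python’s finally clause, which has not been used in this chapter so far,
runs whether or not an exception occurs and is a natural place to release
the resource. A minimal sketch along the lines of Example 5.29, using a
scratch file in a temporary directory:

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "testfile.txt")

fd = os.open(path, os.O_RDONLY | os.O_CREAT)
error_seen = False
try:
    os.write(fd, "Hello".encode())   # fails: descriptor is read-only
except OSError as e:
    error_seen = True
    print("error message", e)
finally:
    os.close(fd)                     # runs with or without an exception
print("end of program")
```

The same pattern applies to file objects returned by the built-in open()
function, with f.close() placed in the finally block.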
In this chapter we learnt the basic file handling techniques. In the next
chapter we deal with advanced data serialization techniques and special
purpose file storage formats using Python’s built-in modules.