# Python File I/O

A file is a named location on disk used to store related information. It stores data permanently in non-volatile memory (e.g. a hard disk).

Since random access memory (RAM) is volatile and loses its data when the computer is turned off, we use files to keep data for future use.

When we want to read from or write to a file we need to open it first. When we are done, it needs to be closed, so that resources that are tied with the file are freed.

Hence, in Python, a file operation takes place in the following order.

1. Open a file
2. Read or write (perform an operation)
3. Close the file
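
The three steps above can be sketched as follows. This is only an illustration: the file name demo.txt is hypothetical, and a temporary directory is used so that nothing is left behind on disk.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")    # hypothetical file name

    f = open(path, "w", encoding="utf-8")   # 1. open the file for writing
    f.write("hello\n")                      # 2. perform an operation (write)
    f.close()                               # 3. close the file

    f = open(path, encoding="utf-8")        # 1. open again, this time for reading
    content = f.read()                      # 2. perform an operation (read)
    f.close()                               # 3. close the file

print(repr(content))
```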

# How to open a file?

Python has a built-in function open() to open a file. This function returns a file object, also called a handle, as it is used to read or modify the file accordingly.

In [None]:
f = open("test.txt")    # open file in current directory
f = open("C:/Python33/README.txt")  # specifying full path

We can specify the mode while opening a file. In mode, we specify whether we want to read 'r', write 'w' or append 'a' to the file. We also specify if we want to open the file in text mode or binary mode.

The default is reading in text mode. In this mode, we get strings when reading from the file.

On the other hand, binary mode returns bytes and this is the mode to be used when dealing with non-text files like image or exe files.

1. 'r' => Open a file for reading. (default)
2. 'w' => Open a file for writing. Creates a new file if it does not exist or truncates the file if it exists.
3. 'x' => Open a file for exclusive creation. If the file already exists, the operation fails.
4. 'a' => Open for appending at the end of the file without truncating it. Creates a new file if it does not exist.
5. 't' => Open in text mode. (default)
6. 'b' => Open in binary mode.
7. '+' => Open a file for updating (reading and writing)

In [None]:
f = open("test.txt")      # equivalent to 'r' or 'rt'
f = open("test.txt",'w')  # write in text mode
f = open("img.bmp",'r+b') # read and write in binary mode
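
To make the difference between the 'x', 'a' and 'w' modes concrete, here is a small sketch. The file name modes.txt is an assumption made for illustration, and the with statement used here closes each file automatically.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "modes.txt")          # hypothetical file name

    with open(path, "x", encoding="utf-8") as f:   # 'x' creates a new file
        f.write("one\n")

    try:
        open(path, "x", encoding="utf-8")          # the file exists now, so this fails
        x_failed = False
    except FileExistsError:
        x_failed = True

    with open(path, "a", encoding="utf-8") as f:   # 'a' keeps the old content
        f.write("two\n")
    with open(path, encoding="utf-8") as f:
        after_append = f.read()

    with open(path, "w", encoding="utf-8") as f:   # 'w' truncates the file first
        f.write("three\n")
    with open(path, encoding="utf-8") as f:
        after_write = f.read()

print(x_failed, repr(after_append), repr(after_write))
```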

Unlike other languages, the character 'a' does not imply the number 97 until it is encoded using ASCII (or other equivalent encodings).

Moreover, the default encoding is platform dependent. On Windows it is commonly 'cp1252', while on Linux it is usually 'utf-8'.

So, we must not rely on the default encoding, or else our code will behave differently on different platforms.

Hence, when working with files in text mode, it is highly recommended to specify the encoding type.



In [None]:
f = open("test.txt",mode = 'r',encoding = 'utf-8')
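
The following sketch shows why the encoding matters. It writes a short string with 'utf-8' and then reads the same file back with two different encodings. The file name and the sample text are assumptions chosen for illustration.

```python
import os
import tempfile

text = "caf\u00e9"   # 'cafe' with an accented e: four characters, one non-ASCII

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "encoding_demo.txt")   # hypothetical file name

    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    with open(path, "rb") as f:                     # binary mode returns raw bytes
        raw = f.read()

    with open(path, encoding="utf-8") as f:         # same encoding: text round-trips
        same = f.read()

    with open(path, encoding="cp1252") as f:        # different encoding: garbled text
        wrong = f.read()

# ascii() is used so the output prints safely on any console
print(len(text), len(raw), ascii(same), ascii(wrong))
```

The accented character takes two bytes in UTF-8, so decoding those bytes as 'cp1252' yields two unrelated characters instead of one.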

# How to close a file using Python?

When we are done with operations to the file, we need to properly close the file.

Closing a file frees up the resources that were tied to the file, and is done using the close() method.

Python has a garbage collector to clean up unreferenced objects, but we must not rely on it to close the file.

In [None]:
f = open("test.txt",encoding = 'utf-8')
# perform file operations
f.close()

This method is not entirely safe. If an exception occurs when we are performing some operation with the file, the code exits without closing the file.

A safer way is to use a try...finally block.

In [None]:
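# open the file before the try block: if open() itself fails,
# there is no file object to close
f = open("test.txt",encoding = 'utf-8')
try:
   # perform file operations
   pass
finally:
   f.close()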

This way, we are guaranteed that the file is properly closed even if an exception is raised, causing program flow to stop.

The best way to do this is using the with statement. This ensures that the file is closed when the block inside with is exited.

We don't need to explicitly call the close() method. It is done internally.

In [None]:
with open("test.txt",encoding = 'utf-8') as f:
   # perform file operations
   pass
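
As a quick check of this behaviour, the sketch below inspects the closed attribute of the file object after a with block, both when the block finishes normally and when it raises an exception. The file name and the simulated error are assumptions for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "with_demo.txt")   # hypothetical file name

    with open(path, "w", encoding="utf-8") as f:
        f.write("data\n")
    closed_after_block = f.closed               # the block ended normally

    try:
        with open(path, encoding="utf-8") as g:
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass
    closed_after_error = g.closed               # the block ended with an exception

print(closed_after_block, closed_after_error)
```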

# How to write to a file using Python?

In order to write into a file in Python, we need to open it in write 'w', append 'a' or exclusive creation 'x' mode.

We need to be careful with the 'w' mode, as it overwrites the file if it already exists. All previous data is erased.

Writing a string (or a sequence of bytes for binary files) is done using the write() method. This method returns the number of characters written to the file (or the number of bytes, in binary mode).

In [None]:
with open("test.txt",'w',encoding = 'utf-8') as f:
   f.write("my first file\n")
   f.write("This file\n\n")
   f.write("contains three lines\n")

This program will create a new file named 'test.txt' if it does not exist. If it does exist, it is overwritten.

We must include the newline characters ourselves to distinguish different lines.
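
A short sketch of both points: the value returned by write(), and the fact that consecutive write() calls stay on the same line unless we add a newline ourselves. The file name is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "write_demo.txt")   # hypothetical file name

    with open(path, "w", encoding="utf-8") as f:
        count = f.write("first line\n")          # number of characters written
        f.write("no newline here")               # no "\n", so the line continues
        f.write(" - still the same line\n")

    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()

print(count, lines)
```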

# How to read files in Python?

To read a file in Python, we must open the file in reading mode.

There are various methods available for this purpose. We can use the read(size) method to read at most size characters (or bytes, in binary mode). If the size parameter is not specified, it reads and returns everything up to the end of the file.
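
Here is a minimal sketch of read(size). It first creates a small file so that the example is self-contained. The file name and its contents are assumptions for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "read_demo.txt")    # hypothetical file name

    with open(path, "w", encoding="utf-8") as f:
        f.write("This is my first file\nThis file\ncontains three lines\n")

    with open(path, encoding="utf-8") as f:
        first = f.read(4)     # the first 4 characters
        second = f.read(4)    # the next 4 characters
        rest = f.read()       # everything up to the end of the file
        at_end = f.read()     # nothing left: an empty string

print(repr(first), repr(second), repr(at_end))
```

Each call continues from where the previous one stopped, and read() returns an empty string once the end of the file has been reached.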

# Python File Methods

There are various methods available with the file object. Some of them have been used in above examples.

Here is a list of commonly used methods in text mode with a brief description.

1. close() => Close an open file. It has no effect if the file is already closed.
2. detach() => Separate the underlying binary buffer from the TextIOBase and return it.
3. fileno() => Return an integer number (file descriptor) of the file.
4. flush() => Flush the write buffer of the file stream.
5. isatty() => Return True if the file stream is interactive.
6. read(n) => Read at most n characters from the file. Reads till end of file if n is negative or None.
7. readable() => Return True if the file stream can be read from.
8. readline(n=-1) => Read and return one line from the file. Reads at most n characters if specified.
9. readlines(n=-1) => Read and return a list of lines from the file. If n is specified, stops reading lines once the total size read so far exceeds n.
10. write(s) => Write string s to the file and return the number of characters written.
11. writelines(lines) => Write a list of lines to the file.
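
A few of the methods listed above can be tried out as follows. This is a sketch with a hypothetical file name. Note that writelines() does not add newline characters, so they are part of each string.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "methods_demo.txt")   # hypothetical file name

    with open(path, "w", encoding="utf-8") as f:
        f.writelines(["alpha\n", "beta\n", "gamma\n"])

    with open(path, encoding="utf-8") as f:
        can_read = f.readable()      # the file was opened for reading
        first = f.readline()         # one line, including its newline
        remaining = f.readlines()    # the rest of the lines as a list
        nothing = f.readline()       # empty string at end of file

    with open(path, encoding="utf-8") as f:
        # a file object can also be iterated over line by line
        stripped = [line.rstrip("\n") for line in f]

print(can_read, repr(first), remaining, stripped)
```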

# Python Directory and Files Management

If there are a large number of files to handle in your Python program, you can arrange your code within different directories to make things more manageable.

A directory or folder is a collection of files and subdirectories. Python has the os module, which provides us with many useful methods to work with directories (and files as well).

# Get Current Directory

We can get the present working directory using the getcwd() method.

This method returns the current working directory in the form of a string. We can also use the getcwdb() method to get it as a bytes object.

In [1]:
import os

os.getcwd()

'/home/dipanjan'

In [2]:
os.getcwdb()

b'/home/dipanjan'

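On Windows, a path displayed this way contains doubled backslashes (for example 'C:\\Python33'), because the notebook shows the string with each backslash escaped. The print() function renders the path as plain text.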

In [3]:
print(os.getcwd())

/home/dipanjan


# Changing Directory

We can change the current working directory using the chdir() method.

The new path that we want to change to must be supplied as a string to this method. On Windows, we can use either the forward slash (/) or the backward slash (\) to separate path elements; on Linux and macOS, only the forward slash is a separator.

When using the backward slash, it is safer to escape it (\\) or to use a raw string.

In [None]:
os.chdir('C:\\Python33')
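
Because chdir() affects the whole program, it is often useful to remember the original directory and switch back afterwards. The sketch below does this with a temporary directory, so it does not depend on any particular path existing on your machine.

```python
import os
import tempfile

original = os.getcwd()                  # remember where we started

with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)
    try:
        # samefile() compares the locations even if one path is a symlink
        moved = os.path.samefile(os.getcwd(), tmp)
    finally:
        os.chdir(original)              # always switch back

back = os.getcwd()
print(moved, back == original)
```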

# List Directories and Files

All files and subdirectories inside a directory can be listed using the listdir() method.

This method takes in a path and returns a list of the subdirectories and files in that path. If no path is specified, it lists the current working directory.

In [4]:
print(os.getcwd())

/home/dipanjan


In [5]:
os.listdir()

['Dataset_numbering.csv',
 'Untitled.ipynb',
 'Python3_Note_V.ipynb',
 '.thunderbird',
 'abmlInLaw.pdf',
 'cv_dl_resource_guide.pdf',
 'v2_Annotations_Train_mscoco',
 'Pattern_Classification_by_Richard_O._Dud.pdf',
 'Combined Machine Learning Algorithms.ipynb',
 'Public',
 'sample.jpeg',
 'transfer-learning-keras',
 '.adobe',
 '.zoom',
 'Music',
 'MachineLearning_FullCourse_Codes.ipynb',
 'Text Clustering with KMeans and TF-IDF.ipynb',
 'sample_image.png',
 'examples.desktop',
 '.cache',
 'Datasets',
 '.python_history',
 'v2_Annotations_Train_mscoco.zip',
 'bbc',
 '.profile',
 '.ipynb_checkpoints',
 'anaconda3',
 'NLP and Text Mining.ipynb',
 'Python3_Note_II.ipynb',
 'index.png',
 'v2_Questions_Train_mscoco.zip',
 'ImageClef-2019-VQA-Med-Training',
 '5_6086862149068521656.pdf',
 '.conda',
 'nltk_data',
 'TF-IDF Calculation.ipynb',
 'ACM',
 'fruits',
 'Untitled1.ipynb',
 '.sudo_as_admin_successful',
 'v2_Questions_Train_mscoco',
 'mozilla.pdf',
 '.dropbox-dist',
 'Dataset_number.csv',


# Making a New Directory

We can make a new directory using the mkdir() method.

This method takes in the path of the new directory. If the full path is not specified, the new directory is created in the current working directory.



In [6]:
os.mkdir('test')
os.listdir()

['Dataset_numbering.csv',
 'Untitled.ipynb',
 'Python3_Note_V.ipynb',
 '.thunderbird',
 'abmlInLaw.pdf',
 'cv_dl_resource_guide.pdf',
 'v2_Annotations_Train_mscoco',
 'Pattern_Classification_by_Richard_O._Dud.pdf',
 'Combined Machine Learning Algorithms.ipynb',
 'Public',
 'sample.jpeg',
 'transfer-learning-keras',
 '.adobe',
 '.zoom',
 'Music',
 'MachineLearning_FullCourse_Codes.ipynb',
 'Text Clustering with KMeans and TF-IDF.ipynb',
 'sample_image.png',
 'examples.desktop',
 '.cache',
 'Datasets',
 '.python_history',
 'v2_Annotations_Train_mscoco.zip',
 'bbc',
 '.profile',
 '.ipynb_checkpoints',
 'anaconda3',
 'NLP and Text Mining.ipynb',
 'Python3_Note_II.ipynb',
 'index.png',
 'v2_Questions_Train_mscoco.zip',
 'ImageClef-2019-VQA-Med-Training',
 '5_6086862149068521656.pdf',
 '.conda',
 'nltk_data',
 'TF-IDF Calculation.ipynb',
 'ACM',
 'fruits',
 'Untitled1.ipynb',
 '.sudo_as_admin_successful',
 'v2_Questions_Train_mscoco',
 'mozilla.pdf',
 '.dropbox-dist',
 'Dataset_number.csv',


# Renaming a Directory or a File

The rename() method can rename a directory or a file.

The first argument is the old name, and the new name must be supplied as the second argument.

In [None]:
os.rename('test','new_one')

# Removing Directory or File

A file can be removed (deleted) using the remove() method.

Similarly, the rmdir() method removes an empty directory.

In [None]:
os.remove('old.txt')

In [None]:
os.rmdir('new_one')

However, note that the rmdir() method can only remove empty directories.

In order to remove a non-empty directory we can use the rmtree() method inside the shutil module.
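
The directory operations described above can be combined into one self-contained sketch. Everything happens inside a throwaway directory created with tempfile.mkdtemp(). The names test, new_one and old.txt simply mirror the earlier cells.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                    # throwaway parent directory
old_dir = os.path.join(base, "test")
new_dir = os.path.join(base, "new_one")

os.mkdir(old_dir)                            # make a new directory
os.rename(old_dir, new_dir)                  # rename it
after_rename = os.listdir(base)              # only 'new_one' is left

file_path = os.path.join(new_dir, "old.txt")
with open(file_path, "w", encoding="utf-8") as f:
    f.write("temporary\n")

try:
    os.rmdir(new_dir)                        # refuses: the directory is not empty
    rmdir_failed = False
except OSError:
    rmdir_failed = True

os.remove(file_path)                         # delete the file first
os.rmdir(new_dir)                            # now the empty directory can go

os.mkdir(new_dir)                            # build a non-empty directory again
with open(file_path, "w", encoding="utf-8") as f:
    f.write("temporary\n")
shutil.rmtree(base)                          # removes base and everything inside it

print(after_rename, rmdir_failed, os.path.exists(base))
```

Since rmtree() deletes a whole directory tree without asking, it is worth double-checking the path that is passed to it.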

# Python Errors and Built-in Exceptions

When writing a program, we, more often than not, will encounter errors.

An error caused by not following the proper structure (syntax) of the language is called a syntax error or parsing error.

In [7]:
if a < 3

SyntaxError: invalid syntax (<ipython-input-7-3e28e520013d>, line 1)

We can notice here that a colon is missing in the if statement.

Errors can also occur at runtime, and these are called exceptions. They occur, for example, when a file we try to open does not exist (FileNotFoundError), when we divide a number by zero (ZeroDivisionError), or when a module we try to import is not found (ImportError).

Whenever this type of runtime error occurs, Python creates an exception object. If it is not handled properly, Python prints a traceback to that error along with some details about why the error occurred.

In [8]:
1 / 0

ZeroDivisionError: division by zero

# Python Built-in Exceptions

Illegal operations can raise exceptions. There are plenty of built-in exceptions in Python that are raised when corresponding errors occur. We can view all the built-in exceptions using the locals() built-in function as follows.

In [9]:
locals()['__builtins__']

<module 'builtins' (built-in)>

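At the top level of a program or notebook, this expression evaluates to the builtins module. Calling dir() on that module lists the names of the built-in exceptions, functions and attributes.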
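
As an alternative sketch, the builtins module can be imported directly and filtered so that only the exception classes remain. The variable name exception_names is chosen here for illustration.

```python
import builtins

# keep only the names that refer to classes derived from BaseException
exception_names = sorted(
    name
    for name in dir(builtins)
    if isinstance(getattr(builtins, name), type)
    and issubclass(getattr(builtins, name), BaseException)
)

print(len(exception_names))
print(exception_names[:5])
```

The exact number of names depends on the Python version, since new built-in exceptions are added from time to time.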

Some of the common built-in exceptions in Python programming, along with the errors that cause them, are listed below.

1. AssertionError => Raised when assert statement fails.
2. AttributeError => Raised when attribute assignment or reference fails.
3. EOFError => Raised when the input() function hits an end-of-file condition.
4. FloatingPointError => Raised when a floating point operation fails (not currently used by Python itself).
5. GeneratorExit => Raised when a generator's close() method is called.
6. ImportError => Raised when the imported module is not found.
7. IndexError => Raised when index of a sequence is out of range.
8. KeyError => Raised when a key is not found in a dictionary.
9. KeyboardInterrupt => Raised when the user hits interrupt key (Ctrl+c or delete).
10. MemoryError => Raised when an operation runs out of memory.
11. NameError => Raised when a variable is not found in local or global scope.
12. RuntimeError => Raised when an error does not fall under any other category.
13. SyntaxError => Raised by parser when syntax error is encountered.
14. IndentationError => Raised when there is incorrect indentation.
15. SystemError => Raised when interpreter detects internal error.
16. TypeError => Raised when a function or operation is applied to an object of incorrect type.
17. UnicodeError => Raised when a Unicode-related encoding or decoding error occurs.
18. ValueError => Raised when a function gets argument of correct type but improper value.
19. ZeroDivisionError => Raised when second operand of division or modulo operation is zero.
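
Several of the exceptions in the list can be triggered deliberately to see which operation raises which exception. The sketch below wraps each faulty operation in a lambda so that it only runs inside the try block. The dictionary names are chosen for illustration.

```python
# each entry maps the expected exception name to an operation that causes it
samples = {
    "IndexError": lambda: [1, 2, 3][5],
    "KeyError": lambda: {"a": 1}["b"],
    "TypeError": lambda: "2" + 2,
    "ValueError": lambda: int("abc"),
    "ZeroDivisionError": lambda: 1 % 0,
    "AttributeError": lambda: (5).no_such_attribute,
    "NameError": lambda: undefined_name,
}

raised = {}
for expected, action in samples.items():
    try:
        action()
    except Exception as exc:
        raised[expected] = type(exc).__name__   # record what was actually raised

for expected, actual in raised.items():
    print(expected, "->", actual)
```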

# Python Exception Handling - Try, Except and Finally

Python has many built-in exceptions that force your program to output an error when something in the program goes wrong.

When an exception occurs, the current function stops and the exception is passed to its caller, and so on up the call chain, until it is handled. If it is not handled, our program will crash.

For example, suppose function A calls function B, which in turn calls function C, and an exception occurs in function C. If it is not handled in C, the exception passes to B and then to A.

If never handled, an error message is displayed and our program comes to a sudden unexpected halt.
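
The A, B, C scenario can be sketched like this. The function names and the calls list are hypothetical and exist only to record the order of events.

```python
calls = []

def function_c():
    calls.append("C started")
    return 1 / 0                      # raises ZeroDivisionError, not handled here

def function_b():
    calls.append("B started")
    function_c()                      # the exception passes straight through B
    calls.append("B finished")        # never reached

def function_a():
    calls.append("A started")
    try:
        function_b()
    except ZeroDivisionError as exc:  # finally handled in A
        calls.append("A handled " + type(exc).__name__)

function_a()
print(calls)
```

The entry "B finished" never appears, because B stops at the point where the exception passes through it.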

# Catching Exceptions in Python

In Python, exceptions can be handled using a try statement.

A critical operation which can raise an exception is placed inside the try clause and the code that handles exceptions is written in the except clause.

We can thus choose what operations to perform once we have caught the exception. Here is a simple example.

In [1]:
# import module sys to get the type of exception
import sys

randomList = ['a', 0, 2]

for entry in randomList:
    try:
        print("The entry is", entry)
        r = 1/int(entry)
        break
    except:
        print("Oops!", sys.exc_info()[0], "occurred.")
        print("Next entry.")
        print()
print("The reciprocal of", entry, "is", r)

The entry is a
Oops! <class 'ValueError'> occurred.
Next entry.

The entry is 0
Oops! <class 'ZeroDivisionError'> occurred.
Next entry.

The entry is 2
The reciprocal of 2 is 0.5


In this program, we loop through the entries of randomList until we find one that has a valid reciprocal. The portion that can cause an exception is placed inside the try block.

If no exception occurs, the except block is skipped and normal flow continues. But if any exception occurs, it is caught by the except block.

Here, we print the name of the exception using the exc_info() function inside the sys module and move on to the next entry. We can see that the value 'a' causes ValueError and 0 causes ZeroDivisionError.

# Catching Specific Exceptions in Python

In the above example, we did not mention any exception in the except clause.

This is not a good programming practice as it will catch all exceptions and handle every case in the same way. We can specify which exceptions an except clause will catch.

A try clause can have any number of except clauses to handle them differently, but only one will be executed in case an exception occurs.

We can use a tuple of values to specify multiple exceptions in an except clause. Here is an example pseudo code.

In [None]:
try:
   # do something
   pass

except ValueError:
   # handle ValueError exception
   pass

except (TypeError, ZeroDivisionError):
   # handle multiple exceptions
   # TypeError and ZeroDivisionError
   pass

except:
   # handle all other exceptions
   pass
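
A runnable variant of the pattern above might look like this. The reciprocal() helper is a hypothetical function written for illustration. It handles ValueError in one clause, and TypeError and ZeroDivisionError together in another.

```python
def reciprocal(value):
    """Return 1/int(value), or a short description of what went wrong."""
    try:
        return 1 / int(value)
    except ValueError:
        # int() could not parse the value, e.g. int("a")
        return "not an integer"
    except (TypeError, ZeroDivisionError) as exc:
        # one clause for two exception types; report which one occurred
        return type(exc).__name__

results = [reciprocal(v) for v in ["4", "a", 0, None]]
print(results)
```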

# Raising Exceptions

In Python programming, exceptions are raised when corresponding errors occur at runtime, but we can also raise them deliberately using the raise keyword.

We can also optionally pass values to the exception to clarify why that exception was raised.

In [2]:
raise KeyboardInterrupt

KeyboardInterrupt: 

In [3]:
raise MemoryError("This is an argument")

MemoryError: This is an argument

In [5]:
try:
    a = int(input("Enter a positive integer: "))
    if a <= 0:
        raise ValueError("That is not a positive number!")
except ValueError as ve:
    print(ve)

Enter a positive integer: -1
That is not a positive number!
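
The same idea can be tried without keyboard input. In this sketch, parse_positive() is a hypothetical helper that raises ValueError with its own message, and the loop collects either the parsed number or the error message.

```python
def parse_positive(text):
    number = int(text)                # int() raises ValueError for non-numeric text
    if number <= 0:
        # raise our own ValueError with an explanatory argument
        raise ValueError(f"{number} is not a positive number")
    return number

messages = []
for candidate in ["7", "-1", "abc"]:
    try:
        messages.append(parse_positive(candidate))
    except ValueError as ve:
        messages.append(str(ve))      # the argument passed to the exception

for message in messages:
    print(message)
```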


# try...finally

The try statement in Python can have an optional finally clause. This clause is executed no matter what, and is generally used to release external resources.

For example, we may be connected to a remote data center through the network or working with a file or working with a Graphical User Interface (GUI).

In all these circumstances, we must clean up the resource once used, whether it was successful or not. These actions (closing a file, GUI or disconnecting from network) are performed in the finally clause to guarantee the execution.

Here is an example of file operations to illustrate this.

In [None]:
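# open the file before the try block: if open() itself fails,
# there is no file object to close
f = open("test.txt",encoding = 'utf-8')
try:
   # perform file operations
   pass
finally:
   f.close()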

This type of construct makes sure the file is closed even if an exception occurs.
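
To see that the finally clause runs on both the successful and the failing path, here is a small sketch. The divide() function and the events list are hypothetical names used only to record the order of execution.

```python
events = []

def divide(a, b):
    try:
        events.append("try")
        return a / b
    except ZeroDivisionError:
        events.append("except")
        return None
    finally:
        events.append("finally")      # runs on both paths, even after a return

ok = divide(6, 3)
failed = divide(1, 0)
print(ok, failed, events)
```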