<div style="color:red;background-color:black">
Diamond Light Source

<h1 style="color:red;background-color:antiquewhite"> Python Language: FileIO</h1>  

©2000-20 Chris Seddon 
</div>

## 1
Execute the following cell to activate the styling for this tutorial:

In [None]:
from IPython.display import HTML
HTML(f"<style>{open('my.css').read()}</style>")

## 2
Q. How do we read from a file in Python?  Well, Python builds a thin wrapper around the file support of the C programming language.  This wrapper is fairly easy to use.

To begin with, we need to differentiate between text and binary files.  Technically, on Unix-like systems such as Red Hat Linux, there is no real distinction between these types, whereas the distinction is much more important on Windows.  Either way, Python handles text and binary files differently.
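As a minimal sketch of that difference (the temporary file and its contents are made up for illustration), the same bytes can be read back in both modes. Text mode returns a string and, by default, translates Windows-style `\r\n` line endings to `\n`; binary mode returns the raw bytes unchanged. Exception handling is left out to keep the sketch short.

```python
import os
import tempfile

# Hypothetical sample data with Windows-style line endings.
raw = b"alpha\r\nbeta\r\n"

# Create a temporary file to read back (the path is chosen by the system).
fd, path = tempfile.mkstemp(suffix=".dat")
os.write(fd, raw)
os.close(fd)

f = open(path, "r")        # text mode (the default)
try:
    as_text = f.read()
finally:
    f.close()

f = open(path, "rb")       # binary mode
try:
    as_bytes = f.read()
finally:
    f.close()

os.remove(path)

print(type(as_text), repr(as_text))
print(type(as_bytes), repr(as_bytes))
```

The text-mode result contains plain `\n` line endings, while the binary result still contains every `\r\n` pair exactly as stored on disk.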

Let's begin with reading a text file:

In [None]:
try: 
    f = open("data/14763.dat", "r")
    try:
        for line in f:
            print(line, end="")  # each line already ends with a newline
    finally:
        f.close()
except IOError as e:
    print(e)

## 3
It is important to use exception handling with the above code.  There are many reasons why reading a file might fail: we may have used the wrong filename, we may not have permission to open the file, or the file may be on a remote file system and a network error occurs.  
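A minimal sketch of one of these failure modes follows; the path is hypothetical and is built so that it is assumed not to exist. In Python 3, `IOError` is an alias of `OSError`, and specific failures such as a missing file are reported through subclasses like `FileNotFoundError`, so a single `except IOError` clause catches them all.

```python
import os
import tempfile

# IOError is kept for backwards compatibility; it is the same class as OSError.
print(IOError is OSError)

# A made-up file name inside a fresh, empty directory, so it cannot exist.
empty_dir = tempfile.mkdtemp()
missing = os.path.join(empty_dir, "no-such-file.dat")

caught = None
try:
    f = open(missing, "r")
    f.close()
except IOError as e:
    caught = e
    print(type(e).__name__)      # the specific subclass that was raised
    print(e.errno, e.strerror)   # the error code and message from the OS

os.rmdir(empty_dir)
```

Catching the general class keeps the handler short; a program that needs to react differently to a missing file and to a permission problem could catch the subclasses separately instead.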

We start by opening the file in "read" mode: <pre>f = open("data/14763.dat", "r")</pre>

If all goes well, the "open" call returns a reference to a "file" object "f".  This object wraps the operating system's handle to the open file (on Unix, a file descriptor that refers to the file's <a href="http://www.linfo.org/inode.html">inode</a>).  

"f" is an iterator, and any iterator can be used inside a "for" loop.  This iterator returns one line of text on each iteration.  When all the lines have been read, the loop terminates, and we should make sure the file is closed so that the operating system resources it holds are released.  The close should be performed in the finally block, because this block is always executed whether or not an exception is thrown.
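The iteration described above can be sketched by driving the iterator by hand with the built-in `next`, which is roughly what a "for" loop does behind the scenes. The temporary file and its three lines are invented for the example.

```python
import os
import tempfile

# Create a small temporary text file (hypothetical content).
fd, path = tempfile.mkstemp(suffix=".dat")
os.write(fd, b"first\nsecond\nthird\n")
os.close(fd)

lines = []
f = open(path, "r")
try:
    print(iter(f) is f)          # a file object is its own iterator
    while True:
        try:
            line = next(f)       # one line of text per call
        except StopIteration:    # raised when there are no more lines
            break
        lines.append(line)
finally:
    f.close()                    # runs whether or not an exception occurred
    os.remove(path)

print(lines)
print(f.closed)
```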

Here is the same example, this time trying to read from a non-existent file:

In [None]:
try: 
    f = open("data/non-existing-file.dat", "r")
    try:
        for line in f:
            print(line, end="")  # each line already ends with a newline
    finally:
        f.close()
except IOError as e:
    print(e)

## 4 
One problem you often see with FileIO code is that the programmer forgets to use a finally block and the file doesn't get closed.  This isn't normally a serious problem, because the operating system closes all open files automatically when the program finishes.  Nevertheless, it is poor practice.  (I once saw an application that opened several hundred files and forgot to close them; it hit the limit on the number of files that can be open simultaneously and crashed!)  
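Whether a file is still open can be checked through the file object's `closed` attribute. The sketch below, which uses an invented temporary file, shows that the file stays open until `close` is called, and that the finally block guarantees the call.

```python
import os
import tempfile

# Temporary file used only for this illustration.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

f = open(path, "w")
try:
    f.write("some data\n")
    closed_inside = f.closed     # False: the file is still open here
finally:
    f.close()                    # always executed
closed_after = f.closed          # True: the file has now been closed

os.remove(path)
print(closed_inside, closed_after)
```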

Python provides a shorthand for the above that includes the finally block: the "with" statement.  Here is the example rewritten using a "with" statement; this is the recommended way to read a file (it's shorter and less error-prone).  The "with" statement automatically closes the file:

In [None]:
try:
    with open("data/14763.dat", "r") as f:
        for line in f:
            print(line, end="")  # each line already ends with a newline
except IOError as e:
    print(e)

## 5
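The file object "f" has a set of methods.  The loop above fetches one line at a time implicitly, much as if it called the "readline" method for us, but you can also call methods explicitly.  The above example can be rewritten with an explicit call to "readline", although the code is somewhat less elegant ("readline" returns an empty string at the end of the file, which ends the loop):  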

In [None]:
try:
    with open("data/14763.dat", "r") as f:
        line = True
        while line:
            line = f.readline()
            print(line, end="")  # prints nothing for the final empty string
except IOError as e:
    print(e)

## 6
The file object "f" doesn't have to be used inside a loop. "f" can be used to read the entire file in one go into a string using the "read" method:

In [None]:
try: 
    with open("data/14763.dat", "r") as f:
        allLines = f.read()
        print(type(allLines))
        print(allLines)
except IOError as e:
    print(e)

## 7
The "readlines" method is similar to "read" except that it reads the file into a list of lines instead of a single string:

In [None]:
try: 
    with open("data/14763.dat", "r") as f:
        allLines = f.readlines()
        print(type(allLines))
        print(allLines)
except IOError as e:
    print(e)

## 8
Note "\n" and "\t" in the above display.  These are the newline and tab characters.  
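The escapes appear because displaying a list shows the `repr` of each element, whereas printing a string directly writes the characters themselves. A minimal sketch with a made-up line of text:

```python
line = "x\ty\n"          # hypothetical line: two fields separated by a tab

print([line])            # the list display shows the escapes: ['x\ty\n']
print(repr(line))        # the same escaped form for a single string: 'x\ty\n'
print(line, end="")      # printed directly, the tab and newline are expanded
print(len(line))         # 4 characters: x, tab, y, newline
```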

Displaying a large list, as in the above, makes things difficult to read, especially since the newline and tab characters are not expanded.  The following version of the program converts the list to a string, making it much easier to read (but obviously it would have been easier to use the "read" method as discussed earlier).  

Note the line: <pre>allLinesAsString = "".join(allLines)</pre>
This concatenates all the elements of the list, using the empty string "" as the separator. 

In [None]:
try: 
    with open("data/14763.dat", "r") as f:
        allLines = f.readlines()
        print(type(allLines))
        allLinesAsString = "".join(allLines)
        print(allLinesAsString)
except IOError as e:
    print(e)

## 9
We can also create and write to files.  The code is analogous to the above examples.  

In all the examples observe that "open" has a second parameter after the filename.  This parameter is defined as follows: <pre>r: read
w: write and truncate
r+: read and write
w+: read, write and truncate
a: append
t: text mode
b: binary mode</pre>

Truncating a file means deleting its previous contents on opening (truncate to zero length).  Text mode is the default; text mode works with strings, but binary mode works with bytes.  Note that you can open a file both for reading and writing at the same time.  You can also open the same file several times simultaneously.
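A short sketch of how the modes behave, using a temporary file whose name and contents are invented for the example: "w" truncates, "a" appends, "r+" allows reading and writing without truncating, and adding "b" switches from strings to bytes.

```python
import os
import tempfile

# Temporary file used only for this illustration.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

with open(path, "w") as f:       # write: creates or truncates the file
    f.write("one\n")

with open(path, "w") as f:       # opening with "w" again discards "one"
    f.write("two\n")

with open(path, "a") as f:       # append: previous contents are kept
    f.write("three\n")

with open(path, "r+") as f:      # read and write, no truncation
    before = f.read()            # reading moves the position to the end
    f.write("four\n")            # so this write lands after "three"

with open(path, "r") as f:       # text mode (the default) returns str
    text = f.read()

with open(path, "rb") as f:      # binary mode returns bytes
    raw = f.read()

os.remove(path)
print(repr(before))
print(repr(text))
print(type(text), type(raw))
```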

Let's start by writing a sequence of strings to a text file; the "w+" mode will delete the previous contents of the file:

In [None]:
data = ("line 1\n", "line 2\n", "line 3\n", "line 4\n", "line 5\n")
try:
    with open("data/example.txt", "w+") as f: 
        f.writelines(data)
except IOError as e:
    print(e)

## 10
We can check the file has been written correctly:

In [None]:
%%bash
cat data/example.txt

## 11
Reading from and writing to binary files is also possible using methods of the file object.  Examples of binary files are PDF files, Nexus files, images, audio and video files.  However, you will usually use a library to work with these files rather than resort to low-level file object methods.  

But, just for the record, here is an example of writing a series of bytes to a file.  Note that when working with text files we use strings, but for binary files, Python insists on using bytes:

In [None]:
# use bytes
data = b"\x5F\x9D\x3E\x5F\x00\x00\x00\x00\x9D\x3E\x5F\x9D\x3E\x5F\x9D\x3E\x5F\x9D\x3E"

try:
    with open("data/myfile.bin", "wb") as f:
        f.write(data)
except IOError as e:
    print(e)

## 12
To read the binary file, we use <pre>hexdump</pre>

In [None]:
%%bash
hexdump data/myfile.bin

## 13
All the examples discussed so far use sequential IO.  When we read or write from/to a file the "file position indicator" moves sequentially through the file.  However, you can jump around in the file using random access.  This is normally done when working with binary files where you read/write records from/to the file at a known offset (number of bytes) from the start of the file.  
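The idea of records at known offsets can be sketched as follows. The record layout (a 4-byte integer followed by an 8-byte float, little-endian, packed with the standard `struct` module) and the file itself are assumptions made for illustration. The point is only that record `i` starts at byte `i * RECORD_SIZE`, so it can be read without scanning the records before it.

```python
import os
import struct
import tempfile

# Hypothetical fixed-size record: a 4-byte int followed by an 8-byte float.
RECORD = struct.Struct("<id")
RECORD_SIZE = RECORD.size            # 12 bytes

records = [(1, 1.5), (2, 2.5), (3, 3.5), (4, 4.5)]

fd, path = tempfile.mkstemp(suffix=".bin")
os.close(fd)

# Sequential write: the position indicator advances after each record.
with open(path, "wb") as f:
    for rec in records:
        f.write(RECORD.pack(*rec))

# Random access: jump straight to record 2 (counting from 0) and read it.
with open(path, "rb") as f:
    f.seek(2 * RECORD_SIZE)          # offset from the start of the file
    position = f.tell()              # current value of the position indicator
    third = RECORD.unpack(f.read(RECORD_SIZE))

os.remove(path)
print(position, third)
```

Fixed-size records are what make the offset arithmetic possible; with variable-length records, a separate index of offsets would be needed.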

The following example shows how to write bytes to a file at offsets of 40, 140 and 240 bytes from the start of the file, and then much further on in the file (4096*25 bytes into the file), using:  

`seek(offset, whence)`
* whence = 0: offset relative to start of file
* whence = 1: offset relative to current position in file
* whence = 2: offset relative to end of file

The gaps between the writes will be filled with zeros:

In [None]:
b = bytes([0x33, 0x33, 0x33, 0x33, 0x33, 0x33, 0x33, 0x33])
try:
    with open('data/myfile2.bin', 'wb') as myFile:
        myFile.seek(40, 0)
        myFile.write(b)

        myFile.seek(140, 0)
        myFile.write(b)

        myFile.seek(240, 0)
        myFile.write(b)

        myFile.seek(4096*25, 0)
        myFile.write(b)
except IOError as e:
    print(e)

## 14
Let's check it worked:

In [None]:
%%bash
hexdump data/myfile2.bin