# <center> STATS 607 - LECTURE 5
## <center> 09/19/2018

# <center> Reading and writing files

You'll certainly need to read data from files into data structures. For structured data, say, a table in an Excel or CSV file, you can use the pandas input/output functions to read the data. But it is very important to know how to work with files in plain Python, especially if you need to deal with unstructured data.

In [1]:
f = open('workfile') # Opens the file named in the first argument. A mode can be given as a second argument; here it is omitted, so the default mode 'r' (reading, text) is used.

In [2]:
f.close() # Always close the file object to free up system resources.

A more convenient way of opening files is the 'with' statement, which guarantees that the file object will be closed when the block ends, even if an exception is raised inside it.

In [3]:
with open('workfile') as f:
    read_data = f.read()

In [4]:
f.closed # Tells whether the file is open or closed.

True

In [5]:
read_data # This file is actually completely empty.

''
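To see why the 'with' form is preferred, below is a minimal sketch in which an exception is raised inside the block and the file is still closed afterwards. It creates a temporary file with the standard tempfile module, an assumption made only so the example does not depend on 'workfile' existing.

```python
import os
import tempfile

# Create an empty temporary file (stands in for 'workfile').
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

try:
    with open(path) as f:
        # Simulate an error while the file is open.
        raise ValueError('something went wrong while reading')
except ValueError as err:
    print('Caught:', err)

print(f.closed)  # True: 'with' closed the file despite the exception

os.remove(path)  # clean up the temporary file
```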

I created a file named 'textFile.txt' with the content below.

first
second

third
fourth
fifth

In [6]:
f = open('textFile.txt', 'r') # If you don't specify the second argument, the file will be opened in reading mode.
read_data = f.read() # Notice that the 'read' method reads the entire content of the file into a single string.
read_data

'first\nsecond\n\nthird\nfourth\nfifth\n'

In [7]:
print(read_data, end='') # The print function (as seen before) will interpret the end of line characters.

first
second

third
fourth
fifth


In [8]:
f.read() # You have reached the end of the file. There is nothing more to be read.

''

In [9]:
f.seek(0) # Moves the file position back to the beginning of the file (and returns the new position).

0

In [10]:
print(f.readline(), end='')

first


In [11]:
print(f.readline(), end='')

second


In [12]:
for line in f: # Iterating over the file object is a convenient, fast, and memory-efficient way of accessing the lines of a file.
    print(line, end='')


third
fourth
fifth
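Note that readline returns '\n' for a blank line but '' (an empty string) at the end of the file, which is what lets a loop detect when to stop. A minimal sketch, assuming a temporary file written with the same content as 'textFile.txt':

```python
import os
import tempfile

# Write a temporary file with the same content as 'textFile.txt'.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as out:
    out.write('first\nsecond\n\nthird\nfourth\nfifth\n')

lines_read = []
with open(path) as f:
    while True:
        line = f.readline()
        if not line:             # '' means the end of the file was reached
            break
        lines_read.append(line)  # a blank line comes back as '\n', not ''

os.remove(path)
print(lines_read)  # ['first\n', 'second\n', '\n', 'third\n', 'fourth\n', 'fifth\n']
```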


In [13]:
f.seek(0)
list(f) # You can also create a list out of lines in a file.

['first\n', 'second\n', '\n', 'third\n', 'fourth\n', 'fifth\n']

In [14]:
f.seek(0)
f.readlines() # Returns the same as above.

['first\n', 'second\n', '\n', 'third\n', 'fourth\n', 'fifth\n']

In [15]:
f.seek(0)
lines = [x.rstrip() for x in f] # Now cleaning up the list while reading lines from the file.
lines

['first', 'second', '', 'third', 'fourth', 'fifth']
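One detail about rstrip(): with no argument it removes all trailing whitespace (spaces and tabs included), not only the newline. If trailing spaces matter, pass '\n' explicitly. The sketch below uses io.StringIO, an in-memory text file from the standard library, with made-up content:

```python
import io

# io.StringIO behaves like a text file held in memory.
f = io.StringIO('keep this  \nand\tthis\t\n')

f.seek(0)
all_whitespace = [x.rstrip() for x in f]    # strips spaces and tabs too
f.seek(0)
newline_only = [x.rstrip('\n') for x in f]  # strips only the newline

print(all_whitespace)  # ['keep this', 'and\tthis']
print(newline_only)    # ['keep this  ', 'and\tthis\t']
```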

You can also pass an argument to the read method. Its meaning depends on the mode the file was opened in. In text mode, the argument is the number of characters to read, and how the file's bytes are decoded into characters is determined by the file's encoding (the default is platform dependent; on most Linux and macOS systems it is 'utf-8'). In binary mode (add the character 'b' to the mode, e.g. 'rb'), the argument is the number of raw bytes to read.

In [16]:
f.seek(0)
f.read(3) # In text mode, reads (at most) 3 characters.

'fir'

In [17]:
f.read(7) # Again, notice that every call to read advances the file position.

'st\nseco'
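The difference between characters and bytes only shows up with non-ASCII text. A minimal sketch, assuming a temporary file that contains the string 'café' encoded as UTF-8 (the letter 'é' is one character but two bytes):

```python
import os
import tempfile

# Write 'café' to a temporary file using an explicit UTF-8 encoding.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w', encoding='utf-8') as out:
    out.write('café')

with open(path, 'r', encoding='utf-8') as f:
    text_chunk = f.read(4)   # 4 characters

with open(path, 'rb') as f:
    byte_chunk = f.read(4)   # 4 raw bytes: 'é' is split in half

os.remove(path)
print(text_chunk)  # café
print(byte_chunk)  # b'caf\xc3'
```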

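Be careful when checking the default encoding with the sys module, as below: sys.getdefaultencoding() reports the interpreter's default string encoding, which is always 'utf-8' in Python 3. It is not necessarily the encoding that open() uses for text files; that default comes from locale.getpreferredencoding(False) and depends on the platform.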

In [18]:
import sys
sys.getdefaultencoding() 

'utf-8'
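A minimal sketch comparing the interpreter's string encoding with the encoding open() uses by default. The second value depends on your platform and locale, so no output is shown for it; passing encoding='utf-8' to open() explicitly avoids depending on it.

```python
import locale
import sys

# Default string encoding of the interpreter (always 'utf-8' in Python 3).
print(sys.getdefaultencoding())  # utf-8

# Encoding open() uses for text files when no encoding argument is given.
# This varies with the platform and locale settings.
print(locale.getpreferredencoding(False))
```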

Now, if you want to write to a file...

In [19]:
f = open('newtextfile.txt', 'w') # Open the file in writing mode; 'w' creates the file, or truncates it if it already exists.
f.write('This is a new line!\n') # write returns the number of characters written.

20

What if we would like to write a tuple to a file? The write method only accepts strings, so the tuple must be converted first.

In [20]:
value = ('A string', 10)
s = str(value)  # convert the tuple to string
print(s, end='')
f.write(s)

('A string', 10)

16

In [21]:
f.close()

In [22]:
with open('newtextfile.txt') as f:
    lines = [x.rstrip() for x in f]
lines

['This is a new line!', "('A string', 10)"]
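Since mode 'w' truncates an existing file, use mode 'a' to add to the end of a file instead. The print function can also write to a file through its file argument. A minimal sketch, using a temporary file as an assumption:

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

with open(path, 'w') as f:   # 'w' truncates: the file starts empty
    f.write('line 1\n')

with open(path, 'a') as f:   # 'a' appends to the end instead of truncating
    f.write('line 2\n')
    print('line 3', file=f)  # print writes to the file and adds the '\n'

with open(path) as f:
    contents = f.read()

os.remove(path)
print(repr(contents))  # 'line 1\nline 2\nline 3\n'
```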

By default, Python opens files (whether for reading or writing) in text mode, where read returns and write expects Python strings (str). This contrasts with binary mode, which works with bytes objects. To write and read more complex objects, though, you'll need serialization and deserialization functions (to be seen in a later class).
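A minimal sketch of binary mode, assuming a temporary file: write only accepts bytes objects there, so strings must be encoded first (and decoded after reading).

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix='.bin')
os.close(fd)

with open(path, 'wb') as f:
    try:
        f.write('text')              # a str is rejected in binary mode
    except TypeError as err:
        print('TypeError:', err)
    f.write('text'.encode('utf-8'))  # bytes are accepted

with open(path, 'rb') as f:
    data = f.read()                  # reading also returns bytes

os.remove(path)
print(data, data.decode('utf-8'))    # b'text' text
```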

# <center> Errors and Exceptions

Handling Python errors or exceptions is an important part of building robust programs. There are (at least) two distinguishable kinds of errors: syntax errors and exceptions.

What is missing in the code below? This is an example of a syntax error.

In [23]:
for i in range(3)
    print('Here!')

SyntaxError: invalid syntax (<ipython-input-23-7faf7ecb466b>, line 1)
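Syntax errors are detected when the code is parsed, before any of it runs. One way to see this, sketched below, is to pass the broken source from the cell above to the built-in compile(), which parses code without executing it:

```python
source = "for i in range(3)\n    print('Here!')\n"

try:
    # compile() only parses the code, so nothing is printed before the
    # missing colon is reported.
    compile(source, '<example>', 'exec')
except SyntaxError as err:
    caught = err
    print(type(err).__name__, 'on line', err.lineno)  # SyntaxError on line 1
```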

Below are examples of errors detected during execution, called exceptions. Each of the three examples raises a different type of exception.

In [24]:
1 + '1' 

TypeError: unsupported operand type(s) for +: 'int' and 'str'

In [25]:
1/0

ZeroDivisionError: division by zero

In [26]:
int('sf')

ValueError: invalid literal for int() with base 10: 'sf'
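Every exception is an object whose type is a class, and all three classes above derive from Exception. The hypothetical helper exception_name below catches whatever a function raises and reports its type:

```python
def exception_name(func):
    """Hypothetical helper: run func() and return the name of the
    exception type it raised, or None if it ran without error."""
    try:
        func()
    except Exception as err:  # Exception is a base class of all three types
        return type(err).__name__
    return None

names = [exception_name(lambda: 1 + '1'),
         exception_name(lambda: 1 / 0),
         exception_name(lambda: int('sf'))]
print(names)  # ['TypeError', 'ZeroDivisionError', 'ValueError']
```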

Let's test the following function:

In [27]:
def test_exception_handling(x):
    """Return (10 + x, 10 / x), printing a message after each step."""
    output1 = 10+x
    print('First statement was executed properly!')
    output2 = 10/x
    print('Second statement was executed properly!')
    return output1,output2

In [28]:
test_exception_handling(2)

First statement was executed properly!
Second statement was executed properly!


(12, 5.0)

In [29]:
test_exception_handling('2')

TypeError: unsupported operand type(s) for +: 'int' and 'str'

In [30]:
test_exception_handling(0)

First statement was executed properly!


ZeroDivisionError: division by zero

Let's say you would like to handle these execution errors yourself. A 'try' statement lets you catch a specific exception type with an 'except' clause:

In [31]:
def test_exception_handling(x):
    """Return (10 + x, 10 / x); print a message instead if x is not a number."""
    try:
        output1 = 10+x
        print('First statement was executed properly!')
        output2 = 10/x
        print('Second statement was executed properly!')
        return output1,output2
    except TypeError:
        print(x, "of type", type(x), "can't be added to the integer 10. Please call this function again!")

In [32]:
test_exception_handling(2)

First statement was executed properly!
Second statement was executed properly!


(12, 5.0)

In [33]:
test_exception_handling('-') # The TypeError was caught and the body of the except clause was executed.

- of type <class 'str'> can't be added to the integer 10. Please call this function again!
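Writing 'except TypeError as err' binds the exception object to the name err, so the handler can report the original message. A minimal sketch with a hypothetical helper add_ten:

```python
def add_ten(x):
    """Hypothetical helper: return 10 + x, or None if x is not a number."""
    try:
        return 10 + x
    except TypeError as err:
        # 'err' is the exception object; printing it shows its message.
        print('Could not add 10 to', repr(x), '->', err)
        return None

print(add_ten(5))    # 15
print(add_ten('-'))  # prints the error message, then None
```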


You can also handle more than one exception type, and there are different ways to do that. One is to add a separate 'except' clause for each type:

In [34]:
def test_exception_handling(x):
    """Return (10 + x, 10 / x), handling TypeError and ZeroDivisionError separately."""
    try:
        output1 = 10+x
        print('First statement was executed properly!')
        output2 = 10/x
        print('Second statement was executed properly!')
        return output1,output2
    except TypeError:
        print(x, "of type", type(x), "can't be added to the integer 10. Please call this function again!")
    except ZeroDivisionError:
        print("You are trying to divide by ", x)
        output2 = 'NA'
        return output1,output2

In [35]:
test_exception_handling(0)

First statement was executed properly!
You are trying to divide by  0


(10, 'NA')
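When several 'except' clauses are present, Python uses the first one whose type matches, so a more general class listed first hides the more specific ones after it. A hypothetical example:

```python
def classify(x):
    """Hypothetical example: the first matching except clause wins."""
    try:
        return 10 / x
    except ArithmeticError:    # ZeroDivisionError is a subclass of this...
        return 'ArithmeticError handler'
    except ZeroDivisionError:  # ...so this clause is never reached
        return 'ZeroDivisionError handler'

print(classify(0))  # ArithmeticError handler
```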

You can also handle several exception types with a single 'except' clause by listing them in a tuple. In the case below, the handler does absolutely nothing ('pass'), which silently hides the error and can lead to much worse behavior:

In [36]:
def test_exception_handling(x):
    """Return (10 + x, 10 / x), silently ignoring TypeError and ZeroDivisionError."""
    try:
        output1 = 10+x
        print('First statement was executed properly!')
        output2 = 10/x
        print('Second statement was executed properly!')
        return output1,output2
    except (TypeError, ZeroDivisionError):
        pass

In [37]:
test_exception_handling('2') # The function failed at the first statement and returned None, but you have no idea what happened.

In [38]:
test_exception_handling(0)

First statement was executed properly!


In [39]:
test_exception_handling(2)

First statement was executed properly!
Second statement was executed properly!


(12, 5.0)

You can also re-raise the exception being handled with a bare 'raise' statement. The 'else' clause contains code that is executed only if no exception occurred in the 'try' body:

In [40]:
def test_exception_handling(x):
    """Return (10 + x, 10 / x); ignore TypeError, re-raise any other exception."""
    try:
        output1 = 10+x
        print('First statement was executed properly!')
        output2 = 10/x
        print('Second statement was executed properly!')
    except TypeError:
        pass
    except Exception:
        print('What just happened?')
        raise
    else:
        print('Both statements were executed properly!')
        return output1,output2

In [41]:
test_exception_handling(2)

First statement was executed properly!
Second statement was executed properly!
Both statements were executed properly!


(12, 5.0)

In [42]:
test_exception_handling(0)

First statement was executed properly!
What just happened?


ZeroDivisionError: division by zero
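Besides re-raising the same exception, you can raise a new one. Writing 'raise NewError(...) from err' keeps the original exception attached as __cause__, so the traceback shows both. A minimal sketch with a hypothetical function safe_ratio:

```python
def safe_ratio(x):
    """Hypothetical example: turn a low-level error into a clearer one."""
    try:
        return 10 / x
    except ZeroDivisionError as err:
        # Raise a new exception, recording the original one as its cause.
        raise ValueError('x must be nonzero') from err

try:
    safe_ratio(0)
except ValueError as err:
    cause_name = type(err.__cause__).__name__
    print(err)         # x must be nonzero
    print(cause_name)  # ZeroDivisionError
```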

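Finally, use the 'finally' clause for code that must be executed regardless of whether an exception occurred. It runs even when the 'try', 'except', or 'else' block executes a return statement: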

In [43]:
def test_exception_handling(x):
    """Return (10 + x, 10 / x), handling errors and always printing a final message."""
    try:
        output1 = 10+x
        print('First statement was executed properly!')
        output2 = 10/x
        print('Second statement was executed properly!')
    except TypeError:
        print(x, "of type", type(x), "can't be added to the integer 10. Please call this function again!")
    except ZeroDivisionError:
        print("You are trying to divide by ",x)
        output2 = 'NA'
        return output1,output2
    else:
        print('Both statements were executed properly!')
        return output1,output2
    finally:
        print("I do this under all circumstances!")

In [44]:
test_exception_handling(-2)

First statement was executed properly!
Second statement was executed properly!
Both statements were executed properly!
I do this under all circumstances!


(8, -5.0)

In [45]:
test_exception_handling(0)

First statement was executed properly!
You are trying to divide by  0
I do this under all circumstances!


(10, 'NA')
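The 'finally' clause is also how the file-closing guarantee of the 'with' statement from the first part of this lecture can be written by hand. A rough sketch, assuming a temporary file, of a try/finally that closes the file even though an exception is raised after writing:

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

f = open(path, 'w')
try:
    f.write('some data\n')
    raise RuntimeError('simulated failure after writing')
except RuntimeError as err:
    print('Caught:', err)
finally:
    f.close()  # runs whether or not an exception occurred
    print('closed?', f.closed)  # closed? True

os.remove(path)
```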