# MFE 9815 - Software Engineering in Finance

## Fall 2016 - Python Class - Part 3

### Alain Ledon

# Today We'll Cover

* Exceptions
* Reading and Writing Files
* Serialization in Python
* Reading Excel (if you have to)

# Exceptions

* Exceptions indicate errors and break out of the normal control flow of a program. 
* An exception is raised using the **raise** statement. 
* If the raise statement is used by itself, the last exception generated is raised again (although this works only while handling a previously raised exception). 
* Exceptions are caught using **try** and **except** statements.
* [Errors and Exceptions](http://docs.python.org/tutorial/errors.html)

In [None]:
# raising an exception
raise RuntimeError("The s#!t hit the fan")

In [None]:
f = open('foo')

In [None]:
pass

In [None]:
try:
    f = open('foo')
except IOError as e:
    #pass
    print "There was an I/O Error"
    print "file not found: {0}".format(e.filename)

# Exceptions

* When an exception occurs, the interpreter stops executing statements in the try block and looks for an except clause that matches the exception
* If one is found, control is passed to the first statement in the except clause. 
* After the except clause is executed, control continues after the try-except block. 
* If no matching except clause is found, the exception is propagated up to the block of code in which the try statement appeared. 
* If an exception works its way up to the top level of a program without being caught, the interpreter aborts with an error message. 
* The optional **as var** modifier to the **except** statement supplies the name of a variable in which an instance of the exception is placed when the exception occurs.
* Exception handlers can examine this value to find out more about the cause of the exception.
* Multiple exception handling blocks are specified using multiple **except** clauses.
* A single handler can catch multiple exceptions by listing them in a tuple.
* Use **pass** to ignore an exception.
* To catch all exceptions except those related to program exit, use **Exception**. Make sure to report accurate information so you can debug the issue later.

In [None]:
# Multiple exception handling blocks are specified using multiple clauses

TEST_EXCEPTIONS = [IOError, TypeError, NameError, ZeroDivisionError]

def test_exceptions(i):
    try:
        raise TEST_EXCEPTIONS[i]        
    except IOError as e:
        # handle IOError
        print "handling IOError"
    except TypeError as e:
        # handle TypeError
        print "handling TypeError"
    except NameError as e:
        # handle NameError
        print "handling NameError"
    except ZeroDivisionError:
        # use pass to ignore exceptions
        pass

In [None]:
enumerate?

In [None]:
for i, x in enumerate(TEST_EXCEPTIONS):
    test_exceptions(i)

In [None]:
def test_exceptions_one_handler(i):
    try:
        raise TEST_EXCEPTIONS[i]("Exception Message")        
    except (IOError, TypeError, NameError) as e:
        # one handler for IOError, TypeError and NameError
        print "handling more than one exception here"
        print e.message
    except ZeroDivisionError:
        # ZeroDivisionError gets its own handler
        print "handling zero div error"

In [None]:
test_exceptions_one_handler(3)

In [None]:
# To catch all exceptions except those related to program exit, use Exception.
# Make sure to report accurate information so you can debug the issue later.
def test_catch_all(i):
    try:
        test_exceptions(i)
    except Exception as e:
        print "catch anything: {0}: {1}".format(type(e).__name__, e)

In [None]:
test_catch_all(4)
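
The advice above to report accurate information from a catch-all handler can be sketched as follows. This is a minimal illustration, not a prescribed pattern: `risky` and `safe_call` are hypothetical helpers, and the code uses the `print()` form so it runs on both Python 2.7 and Python 3.

```python
import traceback

def risky(i):
    # hypothetical helper: raises IndexError for i >= 3
    return [10, 20, 30][i]

def safe_call(i):
    try:
        return risky(i)
    except Exception as e:
        # keep the exception type, its message and the full traceback text
        report = "{0}: {1}".format(type(e).__name__, e)
        details = traceback.format_exc()
        return report, details

report, details = safe_call(5)
print(report)    # IndexError: list index out of range
print(details)   # the formatted traceback, useful for debugging later
```

Recording the exception type and traceback, rather than a fixed message, is what makes the failure debuggable afterwards.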

# Exceptions 

* The **try** statement supports an **else** clause after the last **except**. 
* The **else** clause is executed only if no exception is raised in the **try** block.
* Inside an **except** clause you can re-raise the current exception by using **raise** without arguments.
* The **try** statement also supports a **finally** clause. Its code always runs, whether or not an exception occurred (useful for managing resources, e.g. files).
* [Python Exception Handling Techniques](http://www.doughellmann.com/articles/how-tos/python-exception-handling/index.html)

In [None]:
def test_reraise(i):
    try:
        raise TEST_EXCEPTIONS[i]        
    except IOError as e:
        # handle IOError
        print "handling IOError"
    except TypeError as e:
        # handle TypeError
        print "handling TypeError"
    except NameError as e:
        # handle NameError
        print "handling NameError"
        raise
    except ZeroDivisionError:
        # use pass to ignore exceptions
        pass

In [None]:
test_reraise(0)

In [None]:
test_reraise(2)

In [None]:
def try_file(file_path):
    try:
        f = open(file_path, 'r')
        #raise NameException("bla")
    except IOError as e:
        print 'Unable to open myfile: {0}'.format(e.filename)
    else:
        data = f.read()
        print "closing the file"
        f.close()
    finally:
        print "we are running this regardless..."

In [None]:
try_file("apapapap")
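
To make the order of the **try**, **except**, **else** and **finally** clauses easier to see, here is a small sketch that records which clauses run. The function name `trace` is only illustrative, and the code is written to run on both Python 2.7 and Python 3.

```python
def trace(divisor):
    """Return the names of the clauses that executed, in order."""
    steps = []
    try:
        steps.append("try")
        10 / divisor
    except ZeroDivisionError:
        steps.append("except")
    else:
        steps.append("else")      # only when the try block did not raise
    finally:
        steps.append("finally")   # runs in both cases
    return steps

print(trace(2))   # ['try', 'else', 'finally']
print(trace(0))   # ['try', 'except', 'finally']
```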

# Built-in Exceptions

[Built-in Exceptions](http://docs.python.org/library/exceptions.html)      

In [None]:
from IPython.display import Image
Image("builtinexceptions.JPG")

# Defining New Exceptions

* All exceptions are defined in terms of classes
* Create a new exception as a new class that inherits from Exception
* Values supplied with the **raise** statement are used as arguments to the exception's constructor
* [User-defined Exceptions](http://docs.python.org/tutorial/errors.html#user-defined-exceptions)

In [None]:
class DeviceError(Exception):
    def __init__(self, errno, msg):
        self.args = (errno, msg)
        self.errno = errno
        self.msg = msg
        
try:
    raise DeviceError(22, 'GPIB Error')
except DeviceError as e:
    print "There was a device error => {0}".format(e.msg)
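
Because exceptions are classes, they can also form a hierarchy, and a handler for a base class catches its subclasses too. The sketch below assumes a hypothetical pricing library; `PricingError`, `MissingQuoteError`, `lookup` and the quote data are all made up for illustration.

```python
class PricingError(Exception):
    """Base class for the errors of a hypothetical pricing library."""

class MissingQuoteError(PricingError):
    def __init__(self, ticker):
        # let Exception store the message, keep the ticker as an attribute
        super(MissingQuoteError, self).__init__("no quote for {0}".format(ticker))
        self.ticker = ticker

def lookup(quotes, ticker):
    if ticker not in quotes:
        raise MissingQuoteError(ticker)
    return quotes[ticker]

try:
    lookup({"AAA": 10.5}, "XYZ")
except PricingError as e:   # the base class also catches its subclasses
    caught = e
    print("{0}: {1}".format(type(e).__name__, e))   # MissingQuoteError: no quote for XYZ
```

Calling the parent constructor, instead of assigning `self.args` by hand, keeps `str(e)` and the traceback message consistent.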

# Reading and Writing Files

* Reading and writing files is done using the function **open** 
* It follows the same semantics as old C **fopen**

In [None]:
file?

In [None]:
f = open('myfile', 'w') # open file for writing
f.write("this is a test\n") # write something to the file
f.write("something else\n") # write something else
f.close() # close the file

In [None]:
f = open('myfile', 'r') # open file for reading
a = f.read(5) # read 5 bytes from a file
print a

In [None]:
a = f.readline()  # read one line (until \n)
print a

In [None]:
l = f.readlines()  # read all the lines and return a list
print l
f.seek(0)
# read all the lines in a loop
for line in f:
    print line
f.close()

# Reading and Writing Files

* It is a good practice to use the **with** keyword when dealing with files. 
* The file will be properly closed after its suite finishes, even if an exception is raised on the way.

In [None]:
with open('myfile', 'r') as f:
    # process the file here 
    read_data = f.read()
    print read_data
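
The claim that the file is closed even when an exception is raised can be checked directly. This is a small sketch that uses a scratch file in a temporary directory (the file name `with_demo.txt` is arbitrary) and runs on both Python 2.7 and Python 3.

```python
import os
import shutil
import tempfile

# create a small scratch file so the example is self-contained
work_dir = tempfile.mkdtemp()
path = os.path.join(work_dir, "with_demo.txt")
with open(path, "w") as f:
    f.write("this is a test\n")

try:
    with open(path, "r") as f:
        first = f.readline()
        raise ValueError("something went wrong while processing")
except ValueError:
    pass

print(f.closed)   # True: the with block closed the file despite the error
shutil.rmtree(work_dir)
```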

# More about reading files

* If you need to read specific lines from text files efficiently, use the [linecache module](http://www.doughellmann.com/PyMOTW/linecache/index.html)
* If you need to create a temporary file, use the [tempfile module](http://www.doughellmann.com/PyMOTW/tempfile/index.html#module-tempfile)
* To work with compressed files (.tar, .bz2, .gz, .zip), use any of the [data compression modules](http://www.doughellmann.com/PyMOTW/compression.html)
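
As a rough illustration of the three modules listed above, the following sketch creates a temporary directory with **tempfile**, fetches a single line with **linecache**, and round-trips some bytes through **gzip**. The file names are arbitrary, and the code is meant to run on both Python 2.7 and Python 3.

```python
import gzip
import linecache
import os
import shutil
import tempfile

work_dir = tempfile.mkdtemp()   # scratch directory, removed at the end
try:
    # linecache: fetch one line by number (line numbers start at 1)
    txt_path = os.path.join(work_dir, "lines.txt")
    with open(txt_path, "w") as f:
        f.write("first\nsecond\nthird\n")
    second = linecache.getline(txt_path, 2)

    # gzip: same read/write calls as a regular file, compressed on disk
    gz_path = os.path.join(work_dir, "data.gz")
    with gzip.open(gz_path, "wb") as gz:
        gz.write(b"compressed payload\n")
    with gzip.open(gz_path, "rb") as gz:
        payload = gz.read()
finally:
    shutil.rmtree(work_dir)

print(repr(second))
print(repr(payload))
```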

# Reading CSV files

* To read CSV files use the [csv module](http://docs.python.org/library/csv.html). Here are some examples: [PyMOTW csv](http://www.doughellmann.com/PyMOTW/csv/index.html#module-csv)
* The **csv** module gets a bad rap for being inefficient, so several libraries implement their own CSV parsing, such as **numpy.loadtxt**.
* If you need to read a CSV file in the context of a library, check first whether the library includes a CSV reader.
* **pandas** has a very efficient csv parser: [A new high performance, memory-efficient file parser engine for pandas](http://wesmckinney.com/blog/?p=543)
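
For small files the standard **csv** module is often enough. Here is a minimal sketch using `csv.DictReader` on made-up sample rows. It reads from a list of text lines, which works because the reader accepts any iterable of lines; with a real file you would pass the open file object instead.

```python
import csv

# made-up sample data; the quoted field contains a comma
lines = [
    "ticker,price,quantity",
    "AAA,10.5,100",
    '"BBB, Inc.",20.25,50',
]

rows = list(csv.DictReader(lines))   # the first line supplies the field names

# every field comes back as a string, so convert explicitly
total = sum(float(r["price"]) * int(r["quantity"]) for r in rows)

print(rows[1]["ticker"])   # BBB, Inc.
print(total)               # 2062.5
```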

# Serialization in Python

* **Serialization** is the process of converting a data structure or object into a format that can be stored and materialized at a later time
* **Deserialization** is the opposite process (materializing)
* In theory, serialization allows you to create an identical clone of the original object.
* For objects that hold references to other objects, the process of serialization is not straightforward. 

# Serialization in Python

* Python uses the **pickle** module to serialize Python objects into a stream of bytes suitable for storing in a file, transferring across a network, or placing in a database. 
* The process is called pickling, serializing, marshalling, or flattening. 
* The resulting byte stream can also be converted back into a series of Python objects using an unpickling process.
* **cPickle** is another module with the same interface that is much faster than **pickle**, but its **Pickler** and **Unpickler** cannot be subclassed.
* If subclassing is not important, use **cPickle**.
* Use **pickle.dump**, **pickle.dumps**, or **cPickle.Pickler** to serialize objects.
* Use **pickle.load**, **pickle.loads**, or **cPickle.Unpickler** to materialize objects.
* [http://www.doughellmann.com/PyMOTW/pickle/index.html](http://www.doughellmann.com/PyMOTW/pickle/index.html)

In [None]:
# Here is an example
try:
    import cPickle as pickle
except ImportError:
    import pickle
import pprint

In [None]:
print 'DATA:'
data = [ { 'a':'A', 'b':2, 'c':3.0 } ]
pprint.pprint(data)

In [None]:
print 'PICKLE:'
data_string = pickle.dumps(data)
print data_string

In [None]:
print 'AFTER:'
data2 = pickle.loads(data_string)
pprint.pprint(data2)
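
Within a single `dumps` call, pickle keeps track of objects it has already written, so shared and even circular references survive the round trip. The sketch below illustrates this with plain built-in containers (the variable names are arbitrary) and should run on both Python 2.7 and Python 3.

```python
import pickle

shared = [1, 2, 3]
container = {"first": shared, "second": shared}   # two references, one list
container["self"] = container                     # circular reference

clone = pickle.loads(pickle.dumps(container))

print(clone["first"] is clone["second"])   # True: still one shared list
print(clone["self"] is clone)              # True: the cycle is preserved
print(clone["first"] is shared)            # False: the clone is a new object
```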

In [None]:
# Better implementation
from functools import total_ordering

@total_ordering
class BetterBBPlayer(object):
    def __init__(self, name, knick, rings):
        self.name = name
        self.knick = knick
        self.rings = rings
            
    def __str__(self):
        return "I'm {0}".format(self.knick)

    def __repr__(self):
        return "I'm a baseball player named {0}, nicknamed {1}".format(self.name, self.knick)
        
    def __lt__(self, other):
        return self.rings < other.rings
        
    def __eq__(self, other):
        return self.rings == other.rings

    def __len__(self):
        return self.rings 

In [None]:
dj = BetterBBPlayer("DJ", "The Captain", 5)
yogi = BetterBBPlayer("Berra", "Yogi", 13)

In [None]:
print dj, yogi

In [None]:
PICKLE_FILE = "pickle_bbplayers.dat"
out_s = open(PICKLE_FILE, 'wb')
try:
    # Write to the stream
    print 'writing to a file...'
    pickle.dump(dj, out_s)
    pickle.dump(yogi, out_s)
finally:
    out_s.close()

In [None]:
in_s = open(PICKLE_FILE, 'rb')
try:
    print 'reading from file...'
    djcopy = pickle.load(in_s)
    ycopy = pickle.load(in_s)
except EOFError:
    print 'Oops, error reading from file.'
else:
    # only use the copies if both loads succeeded
    print djcopy
    print 'SAME?:', (dj is djcopy)
    print 'EQUAL?:', (dj == djcopy)
    print ycopy
finally:
    in_s.close()
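
When the number of pickled objects in a file is not known in advance, one common approach is to keep calling `pickle.load` until it raises `EOFError`. The sketch below assumes a hypothetical helper named `load_all` and uses an in-memory buffer in place of the file used above. It runs on both Python 2.7 and Python 3.

```python
import io
import pickle

def load_all(stream):
    """Yield every pickled object in the stream until it is exhausted."""
    while True:
        try:
            yield pickle.load(stream)
        except EOFError:
            return

# an in-memory buffer stands in for a file opened in binary mode
buf = io.BytesIO()
for record in ["first", {"second": 2}, [3.0]]:
    pickle.dump(record, buf)
buf.seek(0)

records = list(load_all(buf))
print(records)   # ['first', {'second': 2}, [3.0]]
```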

# Reading Excel (if you have to)

* There are currently two popular modules to read Excel files.
* [xlrd](https://secure.simplistix.co.uk/svn/xlrd/trunk/xlrd/doc/xlrd.html) lets you read Excel XLS files (97-2003 formats).
* [openpyxl](http://packages.python.org/openpyxl/) lets you read Excel XLSX files (2007 and later formats).
* **pandas** has direct support for reading Excel files into dataframes using the modules specified above.
* A new free tool is [xlwings](http://xlwings.org/). It works on Windows and on OS X.
* Another option for interacting with Excel files is to use [DataNitro](http://datanitro.com/)

In [None]:
import numpy as np
import pandas as pd
from scipy.stats import norm
from xlwings import Workbook, Range, Chart

In [None]:
wb = Workbook()  # Creates a connection with a new workbook

In [None]:
Range('A1').value = ['Foo 1', 'Foo 2', 'Foo 3', 'Foo 4']

In [None]:
Range('A2').value = [10, 20, 30, 40]

In [None]:
Range('A1').table.value  # Read the whole table back

In [None]:
chart = Chart().add(source_data=Range('A1').table)

In [None]:
Range("A8").value = norm.rvs(size=100).reshape(100, 1)

In [None]:
norm_df = pd.DataFrame({'x': norm.rvs(size=1000000)})

In [None]:
Range("C8").value = norm_df

In [None]:
bins = np.linspace(min(norm_df.x), max(norm_df.x), 15)

In [None]:
x_bins = norm_df.x.groupby(pd.cut(norm_df.x, bins))

In [None]:
Range("F8").value = x_bins.count()

In [None]:
hist_chart = Chart().add(source_data=Range('G8').table)