# Overview

In this session we will build upon the fundamentals of Python coding and highlight several high-level, more abstract concepts that will help improve your overall software development technique. The key concepts include:

* reusable code
* debugging
* documentation and best practices

## Dataset download

To illustrate the use of these techniques, we will be building on the MNIST handwritten digits dataset classifier you started in the first assignment. Let us begin by downloading the full dataset now:

In [0]:
!git clone https://github.com/CAIDMRes/lecture_02
!unzip lecture_02/data.zip
!rm -r lecture_02
!ls

Let's take a quick look at loading the data:

In [0]:
# Loading a pickle (*.pkl) file
import pickle
x = pickle.load(open('x.pkl', 'rb'))

# x is a NumPy array with (flattened) image data
import numpy as np
print(type(x))
print(x.shape)

# y is a NumPy array with labels
y = pickle.load(open('y.pkl', 'rb'))
print(y.shape)
print(np.unique(y))
print(y[:10])
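
The loading pattern above can also be written with a `with` block, which closes the file automatically once loading is done. The following is a minimal sketch rather than part of the assignment: the array `arr` and the file name `example.pkl` are hypothetical stand-ins so that the cell runs without the downloaded data.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical stand-in for the real data: 5 flattened 28 x 28 images
arr = np.zeros((5, 784), dtype='uint8')

# Hypothetical file name inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'example.pkl')

# Write the array to disk; the file is closed when the block exits
with open(path, 'wb') as f:
    pickle.dump(arr, f)

# Read it back the same way
with open(path, 'rb') as f:
    loaded = pickle.load(f)

print(type(loaded))
print(loaded.shape)
```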

## NumPy

Let's review some basic NumPy functionality:

In [0]:
# Indexing arrays
print(x[0].shape)
print(x[:, 0].shape)

# Reshaping
im = x[0].reshape(28, 28)
print(im.shape)
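
To make the effect of indexing and reshaping easier to see without the real data, here is a small sketch on a toy array. The array `toy` is a hypothetical stand-in with the same layout we assume for `x` (one flattened 28 x 28 image per row).

```python
import numpy as np

# Hypothetical stand-in: 3 "images", each flattened to 784 values
toy = np.arange(3 * 784).reshape(3, 784)

print(toy[0].shape)     # first row: one flattened image
print(toy[:, 0].shape)  # first column: the first pixel of every image

# -1 asks NumPy to infer that dimension from the total size
stack = toy.reshape(-1, 28, 28)
print(stack.shape)

# Reshaping does not change the values, only how they are indexed
print(np.array_equal(stack[1].ravel(), toy[1]))
```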

## Pylab

Let's review some basic pylab drawing functionality:

In [0]:
import pylab

# Draw single image
pylab.imshow(im)
pylab.axis('off')
pylab.show()

# Draw many images
n = 9 
rows = 3
columns = 3

fig = pylab.figure(figsize=(3, 3))
for i in range(n):
    fig.add_subplot(rows, columns, i + 1)
    pylab.axis('off')
    pylab.imshow(x[i].reshape(28, 28))
pylab.show()

# Reusable Code

A fundamental principle common to all programming languages and styles is "don't repeat yourself" (**DRY**). Following it keeps your code easy to read and, just as importantly, easy to maintain over time. Several useful programming constructs described below will help you accomplish this in your own projects.

## Methods

Here we build on the idea of methods introduced in the first session. 

### Return value

The return value is the result handed back to the caller once a method finishes executing. Without a return statement, a method returns the special Python value `None` by default. Let us see a simple example:

In [0]:
import pickle, numpy as np

def load_data(filename):
    """
    This method loads a pickled Python file
    
    """
    arr = pickle.load(open(filename, 'rb'))
    
    return arr

In order to run this method do the following:

In [0]:
filename = 'x.pkl'

x = load_data(filename)
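
To see the default `None` return value in action, compare two small methods that differ only in whether they include a return statement. Both method names are hypothetical and exist only for this illustration.

```python
def add_with_return(a, b):
    """
    This method adds two numbers and returns the result

    """
    return a + b

def add_without_return(a, b):
    """
    This method adds two numbers but has no return statement

    """
    total = a + b

print(add_with_return(1, 2))
print(add_without_return(1, 2))
```

The second call computes the sum internally, but the caller only ever receives `None`.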

### Parameters

By default, every parameter in a method signature must be given a value when the method is called. If you wish to set a default value for a parameter (thus making it optional in the method call), you can do so by setting its value in the method signature:

In [0]:
def load_data(filename, verbose=True):
    """
    This method loads a pickled Python file
    
    """
    if verbose:
        print('Loading ' + filename)
    
    arr = pickle.load(open(filename, 'rb'))
    
    return arr

Running the method:

In [0]:
filename = 'x.pkl'

x = load_data(filename)

Another useful feature is the ability to explicitly name parameters during the method call. Not only does this make your code more readable, it also allows you to pass arguments in any order, regardless of the method signature.

In [0]:
x = load_data(verbose=False, filename=filename)
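
The different calling styles can be compared side by side with a small sketch. The helper `describe()` is hypothetical; it simply reports the arguments it received, so no data file is needed to run it.

```python
def describe(filename, verbose=True):
    """
    Hypothetical helper that formats its arguments as a string

    """
    return 'filename={}, verbose={}'.format(filename, verbose)

print(describe('x.pkl'))                          # default used for verbose
print(describe('x.pkl', False))                   # both passed by position
print(describe(verbose=False, filename='x.pkl'))  # both named, any order

# A parameter without a default value cannot be omitted
try:
    describe()
except TypeError as err:
    print('TypeError:', err)
```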

## Classes (Object-Oriented Programming)

Sometimes it is useful to group a collection of methods (and variables) into one set. In programming, such a collection of related code is defined using a construct known as a **class**. Let us define a class as follows:

In [0]:
class Digits():
    
    def load_data(self, filename, verbose=True):
        """
        This method loads a pickled Python file

        """
        if verbose:
            print('Loading ' + filename)

        arr = pickle.load(open(filename, 'rb'))

        return arr

In this very simple class definition, we have simply put our previous `load_data` method underneath the new `Digits` class. The one important change is that we need to add an additional parameter called `self` at the start of our method definition (more on why later). Of course we can add more methods as needed, but for now let's see how to create a new **object** from this class:

In [0]:
digits = Digits()

That's it! Now we can call our method (which now "lives" inside our class) by performing the following:

In [0]:
x = digits.load_data(filename, verbose=True)

### self

By convention, the name `self` refers to the object (the specific instance of the class) on which a method is called. By attaching a variable to `self`, you put that variable inside the object. For example, if we set `self.arr = 1` inside a method and call that method on `digits`, then the variable `arr` is attached to `digits` (literally addressed as `digits.arr`, **not** plain `arr` anymore). Let us see an example here:

In [0]:
class Digits():
    
    def load_data(self, filename, verbose=True):
        """
        This method loads a pickled Python file

        """
        if verbose:
            print('Loading ' + filename)

        self.arr = pickle.load(open(filename, 'rb'))

With this new definition, the opened array is now attached to the `digits` object:

In [0]:
digits = Digits()
digits.load_data(filename, verbose=True)
print(digits.arr)
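
Because `self` refers to one particular object, two objects created from the same class keep their own separate copies of any attached variables. Here is a minimal sketch using a hypothetical toy class, `Counter`, that is not part of the assignment.

```python
class Counter():
    """
    Hypothetical toy class, used only to illustrate self

    """
    def set_value(self, value):
        # Attach the value to whichever object the method was called on
        self.value = value

a = Counter()
b = Counter()

a.set_value(1)
b.set_value(2)

# Each object holds its own value
print(a.value, b.value)
```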

### __init__() method

The `__init__()` method is a special method that all Python classes have. It contains the instructions that run immediately when a new object is created (instantiated) from the class. Let's try an example here:

In [0]:
class Digits():
    
    def __init__(self):
        """
        This method initializes the class
        
        """
        self.x = None
        self.y = None
        
    def load_data(self, filename, verbose=True):
        """
        This method loads a pickled Python file

        """
        if verbose:
            print('Loading ' + filename)

        self.x = pickle.load(open(filename, 'rb'))


Here we initialize two variables, `x` and `y`, on every newly instantiated `Digits` object. They are set to `None` (a special Python placeholder representing an empty value).

In [0]:
digits = Digits()
print(digits.x, digits.y)
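
Like any other method, `__init__()` can also accept parameters, which are supplied when the object is created. The sketch below uses a hypothetical toy class, `Dataset`, with made-up attribute names purely for illustration.

```python
class Dataset():
    """
    Hypothetical toy class, used only to illustrate __init__()

    """
    def __init__(self, name, n_classes=10):
        # Runs once, at the moment Dataset(...) is called
        self.name = name
        self.n_classes = n_classes

mnist = Dataset('mnist')
binary = Dataset('zeros-and-ones', n_classes=2)

print(mnist.name, mnist.n_classes)
print(binary.name, binary.n_classes)
```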

## Exercises

### Showing images

#### 1. Complete the following method to show an image


In [0]:
import pylab
import numpy as np

def show(im):
    """
    This method uses pylab to show a given digit image
    
    :params
    
      (np.array) im : a 2D input representing MNIST digit image
    
    """
    # Resize im to 28 x 28
    im = np.reshape(im, (28, 28))
    
    # Pylab commands
    pylab.imshow(im)
    pylab.axis('off')
    pylab.show()
    
# Test your method
im = x[0]
show(im)

#### 2. Modify the method to show an arbitrary number of images

Use the `pylab.figure(figsize)` method to draw as many images as you pass into the method.

In [0]:
import pylab

def show(im):
    """
    This method uses pylab to show one or more digit images
    
    :params
    
      (np.array) im : one or more (flattened) MNIST digit images
    
    """
    # Resize im to 28 x 28
    im = np.reshape(im, (-1, 28, 28))
    
    # Pylab commands
    n = im.shape[0]
    
    rows = np.ceil(np.sqrt(n)).astype('int')
    columns = rows

    fig = pylab.figure(figsize=(rows, columns))
    
    for i in range(n):
        fig.add_subplot(rows, columns, i + 1)
        pylab.axis('off')
        pylab.imshow(im[i])
        
    pylab.show()
    
# Test your method
im = x[:9]
show(im)
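
The grid size used above is the smallest square that fits all `n` images. As a quick check of that arithmetic, the following sketch computes the grid for a few arbitrarily chosen values of `n`, along with the number of subplot slots left empty.

```python
import numpy as np

grids = {}

for n in [1, 4, 5, 9, 10, 100]:
    # Smallest square grid with at least n slots
    rows = int(np.ceil(np.sqrt(n)))
    grids[n] = rows

    # Number of images, grid side length, unused slots
    print(n, rows, rows * rows - n)
```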


#### 3. Add the method to the existing `Digits` class

In [0]:
class Digits():
    
    def __init__(self):
        """
        This method initializes the class
        
        """
        self.x = None
        self.y = None
        
    def load_data(self, filename, verbose=True):
        """
        This method loads a pickled Python file

        """
        if verbose:
            print('Loading ' + filename)

        self.x = pickle.load(open(filename, 'rb'))
    
    def show(self, im):
        """
        This method uses pylab to show one or more digit images

        :params

          (np.array) im : one or more (flattened) MNIST digit images

        """
        # Resize im to 28 x 28
        im = np.reshape(im, (-1, 28, 28))

        # Pylab commands
        n = im.shape[0]

        rows = np.ceil(np.sqrt(n)).astype('int')
        columns = rows

        fig = pylab.figure(figsize=(rows, columns))

        for i in range(n):
            fig.add_subplot(rows, columns, i + 1)
            pylab.axis('off')
            pylab.imshow(im[i])

        pylab.show()

# Test your class
digits = Digits()
digits.load_data(filename, verbose=True)
digits.show(digits.x[100:200])

### Classification Algorithm

#### 1. Random classifier

Write a method that randomly guesses whether an image represents the digit 0 or 1.

Hint: use the `np.random.randint()` method. Inspect the docstring associated with this function (for example via tab-completion) to understand its parameters.

In [0]:
def classify(im):
    """
    This method classifies an image digit randomly
    
    :params

      (np.array) im : a 2D input representing MNIST digit image
      
    """
    # Return random digit either 0 or 1
    return np.random.randint(2)

#### 2. Add this method to your `Digits` class from above


In [0]:
classify(im=x[0])

### Accuracy Assessment

#### 1. Write an algorithm to assess algorithm accuracy

In the `load_data()` method:

* load the `y.pkl` file

In the `test_algorithm()` method:

* loop through each image and label in dataset
* test if the image is either a 0 or 1 via ground-truth label
* if True, pass each `image` into the `classify()` method
* accumulate total number of correct guesses
* accumulate total number of guesses
* print accuracy

In [0]:
class Digits():
    
    def __init__(self):
        """
        This method initializes the class
        
        """
        self.x = None
        self.y = None
        
    def load_data(self, x_file, y_file, verbose=True):
        """
        This method loads a pickled Python file

        """
        if verbose:
            print('Loading ' + x_file + ' and ' + y_file)

        self.x = pickle.load(open(x_file, 'rb'))
        self.y = pickle.load(open(y_file, 'rb'))
    
    def show(self, im):
        """
        This method uses pylab to show a given digit image

        :params

          (np.array) im : a 2D input representing MNIST digit image

        """
        pass
    
    def classify(self, im):
        """
        This method classifies an image digit randomly

        :params

          (np.array) im : a 2D input representing MNIST digit image

        """
        # Return random digit either 0 or 1
        return np.random.randint(2)
    
    def test_algorithm(self):
        """
        This method tests the accuracy of classification
        
        """
        right = 0
        wrong = 0
        
        for n in range(self.x.shape[0]):
            
            x = self.x[n] # image
            y = self.y[n] # label
            
            if y in [0, 1]:
                
                y_pred = self.classify(x)
                if y == y_pred:
                    right = right + 1
                
                else:
                    wrong = wrong + 1
        
        print( right / (right + wrong) )
        

# Test your class
digits = Digits()
digits.load_data('x.pkl', 'y.pkl', verbose=True)
digits.test_algorithm()
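
As a sanity check, a random guess between two digits should be right about half the time. The sketch below reproduces that baseline with NumPy array operations instead of a loop. It runs on hypothetical, randomly generated labels rather than the real `y.pkl`, so the exact number will differ from what your class prints.

```python
import numpy as np

# Fixed seed so the sketch gives the same result on every run
rng = np.random.default_rng(0)

# Hypothetical labels: 10,000 digits drawn uniformly from 0 to 9
y = rng.integers(0, 10, size=10000)

# Keep only the zeros and ones, as described in the exercise
y = y[(y == 0) | (y == 1)]

# One random guess (0 or 1) per remaining label
y_pred = rng.integers(0, 2, size=y.shape[0])

# Fraction of guesses that match the label
accuracy = np.mean(y == y_pred)
print(accuracy)
```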

## Inheritance

One of the most powerful features of classes in object-oriented programming is the concept of inheritance. The idea is that one first makes a generic base class containing code that may be reused in a number of similar cases. Then, for every specific project instance, you just need to define and/or replace a few methods to accomplish a task. For example, in our `Digits` class, we have already created a common collection of methods to load data, display images, classify digits and test algorithm accuracy. As we continue to iterate and test out new ideas, the `classify()` method can simply be updated with a new type of algorithm, while the rest of the code can be reused to do all the other routine housekeeping tasks.

Let us look at an example of overriding (i.e. replacing) the `classify()` method:

In [0]:
class BetterDigits(Digits):
    
    def classify(self, im):
        """
        This method replaces our previous method with a better one
        
        """
        pass

See how compact and easy that was! Once you've written a solid base class, the rest of your work is now much easier. Go ahead and add your `classify()` method from the first assignment to this new (better) class:

In [0]:
class BetterDigits(Digits):
    
    def classify(self, im):
        """
        This method replaces our previous method with a better one
        
        """
        pass

Now let's run the new class to test your algorithm accuracy:

In [0]:
# Test your class
digits = BetterDigits()
digits.load_data('x.pkl', 'y.pkl', verbose=True)
digits.test_algorithm()
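
To see the mechanics of inheritance in isolation, here is a self-contained sketch that needs no data files. The classes `Base` and `Better`, the threshold of 0.5 and the toy values are all hypothetical; the point is only that the inherited `test_algorithm()` automatically calls whichever `classify()` the object provides.

```python
class Base():
    """
    Hypothetical base class with one replaceable step

    """
    def classify(self, value):
        # Default behaviour: always guess 0
        return 0

    def test_algorithm(self, values, labels):
        # Shared housekeeping: calls whichever classify() the object has
        right = sum(self.classify(v) == y for v, y in zip(values, labels))
        return right / len(labels)

class Better(Base):

    def classify(self, value):
        # Replacement: guess 1 when the value is above a threshold
        return int(value > 0.5)

values = [0.1, 0.9, 0.8, 0.2]
labels = [0, 1, 1, 0]

print(Base().test_algorithm(values, labels))
print(Better().test_algorithm(values, labels))
```

Note that `Better` never defines `test_algorithm()` itself; it reuses the one written in `Base`.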