In [1]:
import numpy as np
from PIL import Image, ImageEnhance
import matplotlib.pyplot as plt
import random
from IPython.display import HTML, display
from assign2_utils import load_train_data, load_test_data, flatten
from assign2_utils1 import Layers, Linear, NonLinear, Dropout, args

%matplotlib inline
%load_ext autoreload
%autoreload 2

#np.random.seed(1)


# Data Augmentation
Deep learning models generally thrive on data: the more data they see (on the order of hundreds of millions of examples), the better they tend to perform. Large unlabeled datasets are easy to acquire, for example by scraping the web. In the supervised setting, however, acquiring a large amount of data together with labels is often infeasible. Labels, or ground truth, are usually marked by human experts, and it is not always possible to have them commit to labeling a huge dataset. In such scenarios, data augmentation is very helpful. Data augmentation is a set of techniques that artificially increase the size of the dataset. For example, horizontally flipping a cat image still gives a valid cat image. We can also scale the cat image or rotate it, say by an angle between -15 and 15 degrees, to obtain more valid cat images. Some examples of artificially obtained valid cat images are shown below.

In [2]:
display(HTML("<table><tr><td><img src='orig_img1.png'></td><td><img src='rot_img.png'></td></tr></table>"))
display(HTML("<table><tr><td><img src='orig_img2.png'></td><td><img src='hf_img.png'></td></tr></table>"))

So, if we want to horizontally flip the images to enlarge the training set, do we flip all the training images offline? This will of course double the size of the set. However, there is another idea: during every epoch, we can randomly flip or not flip each image. Suppose we have 10 images in the training set. Offline flipping doubles the set to 20 fixed images that are reused across all epochs. If instead each image is flipped independently with Bernoulli probability p during an epoch, then the set of images seen during that epoch is one of 2^10 = 1024 possible sets of 10 images (the number of flipped images follows a binomial distribution), which is a far larger pool than the 20 images obtained by offline flipping. This is the strategy we will follow wherever feasible. I say wherever feasible because random, on-the-fly augmentation is not always possible. For example, let's say the input image must be of size 128 x 128 but the available images are of size 144 x 144. We can augment the data by cropping 128 x 128 regions from each 144 x 144 image. If we fix the crops to be the top left, top right, centre, bottom left and bottom right, we get 5 crops of size 128 x 128 from every 144 x 144 image. There is no randomness here, and the data size increases by a factor of 5.
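The five fixed crops described above can be sketched with PIL as follows. This is only an illustration: the helper name `five_crops` and its `size` parameter are hypothetical and not part of the assignment utilities, and a blank image stands in for a real 144 x 144 input.

```python
from PIL import Image

def five_crops(img, size=128):
    """Return the top-left, top-right, centre, bottom-left and bottom-right crops."""
    w, h = img.size
    if w < size or h < size:
        raise ValueError("image is smaller than the crop size")
    # (left, upper) corner of each crop window
    corners = [
        (0, 0),                              # top left
        (w - size, 0),                       # top right
        ((w - size) // 2, (h - size) // 2),  # centre
        (0, h - size),                       # bottom left
        (w - size, h - size),                # bottom right
    ]
    # PIL's crop takes a (left, upper, right, lower) box
    return [img.crop((l, u, l + size, u + size)) for l, u in corners]

img = Image.new("RGB", (144, 144))  # stand-in for a real 144 x 144 image
crops = five_crops(img)
print(len(crops), crops[0].size)
```

Because the crop positions are fixed, the crops can be computed once, offline, and stored alongside the original data.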

Below is the class definition for performing a random horizontal flip. It is a callable class. For dealing with images, we will use the PIL package, so the code below assumes that the input 'img' of the '__call__' method is a PIL image object (i.e. an image opened with the PIL package).

In [3]:
class RandomHorizontalFlip(object): #inherits from python's object class
    def __init__(self, prob = 0.5):
        self.prob = prob
        
    def __call__(self, img): # a callable class
        if np.random.rand() < self.prob:
            return img.transpose(Image.FLIP_LEFT_RIGHT)
        else:
            return img
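To see what the flip does at the pixel level, here is a small check on a hand-made array rather than a real photograph. The array values are arbitrary and chosen only so that the mirrored columns are easy to spot.

```python
import numpy as np
from PIL import Image

# A tiny 2 x 3 grayscale image whose pixel values make the flip easy to see
arr = np.array([[1, 2, 3],
                [4, 5, 6]], dtype=np.uint8)
img = Image.fromarray(arr)

# Same PIL call that RandomHorizontalFlip uses when it decides to flip
flipped = np.asarray(img.transpose(Image.FLIP_LEFT_RIGHT))
print(flipped)

# The flip mirrors the columns, so it matches NumPy's column reversal
print(np.array_equal(flipped, arr[:, ::-1]))
```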

## Exercise
You will write a class named __RandomRotate__ that randomly rotates the given image. To initialize an object of type __RandomRotate__, the angle by which to rotate the image (in degrees) and the probability of rotating the image need to be specified. It should be a callable class. The angle can be specified either as a single number like 5 or as a list or tuple of length 2 like [-3, 5] or (-3, 5). In the former case, an angle in degrees is chosen randomly between -5 and 5. In the latter case, an angle in degrees is chosen randomly between -3 and 5. Note that there are two levels of randomness involved here: first, whether to rotate the image at all; second, if the image is to be rotated, which angle to use.

In [4]:
# Complete the class RandomRotate here
print("class RandomRotate")

class RandomRotate(object):
    def __init__(self, prob=0.6, angle=(-5, 5)):
        super().__init__()
        self.prob = prob
        if isinstance(angle, (list, tuple)):
            # a pair [low, high] gives the range of angles directly
            self.low, self.high = angle
        else:
            # a single number a means the symmetric range [-|a|, |a|]
            self.low, self.high = -abs(angle), abs(angle)
        
    def __call__(self, img):
        # first level of randomness: rotate or not
        if np.random.rand() < self.prob:
            # second level: pick an integer angle in [low, high];
            # randint excludes its upper bound, hence the +1 (bounds are assumed to be integers)
            return img.rotate(np.random.randint(low=self.low, high=self.high + 1))
        else:
            return img

class RandomRotate
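One detail worth keeping in mind when rotating: with its default arguments, PIL's `rotate` keeps the output the same size as the input and fills the uncovered corners with black. The sketch below checks this on a synthetic all-white image; the 64 x 64 size and the 45 degree angle are illustrative choices only.

```python
from PIL import Image

img = Image.new("L", (64, 64), color=255)  # all-white grayscale image
rotated = img.rotate(45)                   # default: expand=False

# The size is unchanged, so the rotated image still fits a fixed-size network input
print(rotated.size)

# A corner pixel now lies outside the rotated content and is filled with 0,
# while a pixel near the centre keeps its original value
print(rotated.getpixel((0, 0)), rotated.getpixel((32, 32)))
```

Small angle ranges such as [-5, 5] keep these black corner regions small.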


Since multiple transforms can be applied to an image one after another, we will write a callable class called __Compose__ that applies a sequence of transforms to the input in order. The definition of the class is given below.

In [5]:
class Compose(object):
    def __init__(self, transforms):
        # transforms is a sequence of transforms, e.g.
        # transforms = [RandomHorizontalFlip(0.6), RandomRotate(0.5, [-5, 6])]
        self.transforms = transforms
        
    def __call__(self, img): # img is a PIL object; i.e image opened by PIL package
        for t in self.transforms:
            img = t(img)
        return img
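The order of the transforms in the sequence matters, because each one receives the output of the previous one. The following sketch shows the same left-to-right pattern on plain numbers instead of images; the names `compose`, `add_one` and `double` are hypothetical and used only for illustration.

```python
from functools import reduce

def compose(transforms):
    """Return a callable that applies each transform in order, left to right."""
    return lambda x: reduce(lambda acc, t: t(acc), transforms, x)

add_one = lambda v: v + 1
double = lambda v: v * 2

print(compose([add_one, double])(3))  # (3 + 1) * 2
print(compose([double, add_one])(3))  # 3 * 2 + 1
```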

Let's consider the following model along with the train, test and main functions given below and run it.

In [6]:
class Model(Layers):
    def __init__(self, in_features):
        super().__init__()        
        self.fc1 = Linear(in_features, 20)
        self.relu1 = NonLinear('ReLU')
        self.fc2 = Linear(20, 7)
        self.relu2 = NonLinear('ReLU')
        self.fc3 = Linear(7, 5)
        self.relu3 = NonLinear('ReLU')  
        self.fc4 = Linear(5, 1)        
        self.sigmoid = NonLinear('Sigmoid')
        
    def forward(self, x):
        x = self.fc1(x) # Note that we made classes callable which automatically calls forward method
                        # That's why we could call fc1(x) instead of fc1.forward(x). Calls below 
                        # are in similar line.
                        # we could call fc1.forward(x) also.
        x = self.relu1(x)
        x = self.fc2(x)
        x = self.relu2(x)
        x = self.fc3(x)
        x = self.relu3(x)
        x = self.fc4(x)          
        x = self.sigmoid(x)
        return x
    
    def loss(self, output, y):
        m = output.shape[1]
        L = -(1./m) * np.sum(y*np.log(output) + (1-y)*np.log(1-output)) # compute loss
        # iterate over the attribute values (the layers), not the attribute names
        for att in self.__dict__.values():
            if hasattr(att, 'reg_penalty'):
                L += att.reg_penalty
        return L
    
    def backward(self, output, y):
        epsilon = 1e-6
        m = output.shape[1]
        d_output = (1./m) * (output-y) * (1./((output*(1-output))+epsilon)) # compute da        
        dz = self.sigmoid.backward(d_output)        
        dx = self.fc4.backward(dz)          
        dz = self.relu3.backward(dx)
        dx = self.fc3.backward(dz)
        dz = self.relu2.backward(dx)
        dx = self.fc2.backward(dz)
        dz = self.relu1.backward(dx)
        dx = self.fc1.backward(dz)  
    
    def update_params(self, learning_rate = 0.005):
        self.fc1.update_params(learning_rate)
        self.fc2.update_params(learning_rate)
        self.fc3.update_params(learning_rate)
        self.fc4.update_params(learning_rate)        
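The expression for `d_output` in `backward` follows from differentiating the cross-entropy term of `loss` with respect to the sigmoid output. Writing $a_i$ for the entries of `output` (a symbol introduced here for convenience) and ignoring the small `epsilon` that guards against division by zero:

$$L = -\frac{1}{m}\sum_{i=1}^{m}\left[\,y_i \log a_i + (1-y_i)\log(1-a_i)\,\right]$$

$$\frac{\partial L}{\partial a_i} = -\frac{1}{m}\left[\frac{y_i}{a_i} - \frac{1-y_i}{1-a_i}\right] = \frac{1}{m}\,\frac{a_i - y_i}{a_i\,(1-a_i)}$$

The second equality comes from putting both fractions over the common denominator $a_i(1-a_i)$. Assuming that `self.sigmoid.backward` multiplies by the sigmoid derivative $a_i(1-a_i)$ (its implementation is not shown here), the gradient passed on to `fc4` reduces to $(a_i - y_i)/m$.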

In [7]:
def train(model, x, y, transforms = None):   
    for i in range(args.miter):
        x_transformed = np.empty_like(x, dtype = np.uint8)
        if transforms:
            for j in range(x.shape[0]):
                if y[j]:                    
                    img = Image.fromarray(x[j].astype(np.uint8))
                    x_transformed[j] = np.asarray(transforms(img))
                else:
                    x_transformed[j] = x[j]
        else:
            x_transformed = x
        
        x_transformed = flatten(x_transformed)
        x_transformed = x_transformed / 255.
        output = model(x_transformed) # model is a callable object with call to its forward method.
                          # we could also have written the rhs as model.forward(x_transformed)
        del x_transformed
        L = model.loss(output, y)
        model.backward(output, y)
        model.update_params(args.alpha)
        if not i%args.print_freq: # print loss every args.print_freq iterations
            # np.asscalar has been removed from recent NumPy releases; float() does the same job here
            print(f'Loss at iteration {i}:\t{float(L):.4f}')
                
def test_model(model, x, y):
    predictions = model(x)
    predictions[predictions > 0.5] = 1
    predictions[predictions <= 0.5] = 0
    acc = np.mean(predictions == y)
    acc = float(acc) # np.asscalar has been removed from recent NumPy releases
    return acc
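Because `train` re-applies the transforms at every iteration, the network sees a differently augmented training set each time. The sketch below simulates only the flip decisions for the 10-image example discussed earlier, without any real images; the seed, the number of epochs and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
n_images, n_epochs, p = 10, 50, 0.5

# One row per epoch; entry j is True when image j is flipped in that epoch
flip_masks = rng.random((n_epochs, n_images)) < p

# Each distinct row corresponds to a different version of the training set
distinct = {tuple(row) for row in flip_masks}
print(f"{len(distinct)} distinct training sets seen in {n_epochs} epochs")
print(f"possible training sets: {2 ** n_images}; offline flipping: {2 * n_images} fixed images")
```

Note that `train` as written applies the transforms only to samples whose label `y[j]` is nonzero, so in this notebook the variety comes from the positive examples.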

In [8]:
def main(): # main function to train and test the model    
    
    global args
    # load train data
    x, y = load_train_data()
    print(x.shape)    
    
    transforms = Compose([RandomHorizontalFlip(), RandomRotate(angle = [-5, 5])])
    
    #instantiate model
    my_model = Model(64*64*3)
    # train the model
    train(my_model, x, y, transforms)
    
    # test the model
    x, y = load_train_data()
    x = flatten(x)
    x = x/255. # normalize the data to [0, 1]
    #print(f'train accuracy: {test_model(my_model, x, y) * 100:.2f}%')

    x, y = load_test_data()
    x = flatten(x)
    x = x/255. # normalize the data to [0, 1]
    print(f'test accuracy: {test_model(my_model, x, y) * 100:.2f}%')
    
    return my_model
    
if __name__ == '__main__':
    my_model = main()

(209, 64, 64, 3)
Loss at iteration 0:	0.6912
Loss at iteration 100:	0.6302
Loss at iteration 200:	0.5829
Loss at iteration 300:	0.5422
Loss at iteration 400:	0.5213
Loss at iteration 500:	0.4627
Loss at iteration 600:	0.5156
Loss at iteration 700:	0.2571
Loss at iteration 800:	0.2610
Loss at iteration 900:	0.3994
Loss at iteration 1000:	0.1316
Loss at iteration 1100:	0.0970
Loss at iteration 1200:	0.1901
Loss at iteration 1300:	0.0597
Loss at iteration 1400:	0.6062
Loss at iteration 1500:	0.1154
Loss at iteration 1600:	0.0320
Loss at iteration 1700:	0.0201
Loss at iteration 1800:	0.0180
Loss at iteration 1900:	0.0144
test accuracy: 84.00%
