# Week 8 Homework: Classification of Google Street View Images

As Timnit talked about in her guest lecture, "socioeconomic attributes such as income, race, education, and voting patterns can be inferred from cars detected in Google Street View images using deep learning." This week, we'll be working on classifying car makes that appear in Google Street View images to produce this kind of visual census data.

Check out Timnit's paper [here](https://www.pnas.org/content/114/50/13108); you can find our data [here](https://ai.stanford.edu/~tgebru/car_data.html).

Before you run the cell below, make sure that you are using a GPU (Edit->Notebook settings), because we will be training larger networks than before.

In [0]:
import requests, zipfile, io
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
from PIL import Image
from io import BytesIO

import keras
from keras.preprocessing.image import ImageDataGenerator
from keras.applications.resnet50 import ResNet50
from keras.layers import GlobalAveragePooling2D, Dense
from keras import optimizers
from keras.models import Model

In [0]:
# download annotations
r = requests.get("http://web.stanford.edu/class/cs21si/resources/timnit_resources.zip")
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall()

In [0]:
# download images
r = requests.get("https://www.dropbox.com/s/uxmng8gu1bg1dcz/unit4_week8_resources.zip?dl=1")
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall()

## Part 1: Clean the Data

Here we'll be taking a subset of our dataset and cleaning it by removing unused columns and unlabeled examples. This is for the purposes of visualization in Part 2. Behind the scenes, the teaching team has used similar code to clean the entire dataset. Read through the code to understand what it is doing!

In [0]:
# read in training data
train_data = pd.read_csv('annotations/gsv_annos/gsv_train.txt', 
                         names = ['fname', 'bbox', 'group', 'class', 'big_enough', 'image_id'])

# drop unused columns
filtered_train_data = train_data.drop('class', axis = 1)
filtered_train_data = filtered_train_data.drop('big_enough', axis = 1)

# filter training data without class labels
filtered_train_data = filtered_train_data[filtered_train_data['group'] != -1]

filtered_train_data.head()

In [0]:
# read in annotations about car model and make
class_data = pd.read_csv('annotations/attribute_annos/group_id_car_atts.txt')
class_data.head()

In [0]:
# map group ID to car make
filtered_train_data['make'] = ''

for index, row in filtered_train_data.iterrows():
    group = row['group']
    make = class_data[class_data['group_id'] == int(group)]['make'].values
    filtered_train_data.at[index, 'make'] = make[0] 
    
filtered_train_data.head()
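
The row-by-row loop above can get slow on large tables. As a rough sketch of a vectorized alternative, the lookup can be built once and applied to the whole column. The toy frames below are made up for illustration, and the sketch assumes (as the loop does) that the first match per `group_id` is the one we want.

```python
import pandas as pd

# Toy stand-ins for filtered_train_data and class_data (illustrative values only).
toy_train = pd.DataFrame({"fname": ["a.jpg", "b.jpg", "c.jpg"],
                          "group": [2, 1, 2]})
toy_classes = pd.DataFrame({"group_id": [1, 2],
                            "make": ["honda", "toyota"]})

# Build a group_id -> make lookup once, keeping the first match per group_id
# (this mirrors make[0] in the loop), then map every row in one pass.
group_to_make = (toy_classes.drop_duplicates("group_id")
                            .set_index("group_id")["make"])
toy_train["make"] = toy_train["group"].astype(int).map(group_to_make)

print(toy_train)
```

Unlike the loop, a group with no match ends up as `NaN` here instead of raising an error, so it is worth checking for missing values afterwards.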

In [0]:
# remove entries with an uncommon make (fewer than 100)
uncommon_makes = filtered_train_data['make'].value_counts()[-19:].index
filtered_train_data = filtered_train_data.loc[~filtered_train_data['make'].isin(uncommon_makes)]
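
The cell above picks the uncommon makes by position (the last 19 entries of `value_counts()`). A minimal sketch of the same idea with an explicit count threshold is shown here; the toy data and the small `min_count` are placeholders standing in for the real table and the threshold of 100.

```python
import pandas as pd

# Toy data: three makes with 5, 3, and 1 examples (illustrative only).
toy = pd.DataFrame({"make": ["honda"] * 5 + ["toyota"] * 3 + ["saab"] * 1})

min_count = 3  # placeholder; the notebook's comment suggests 100 for the real data

# Count examples per make and keep only makes that reach the threshold.
counts = toy["make"].value_counts()
common_makes = counts[counts >= min_count].index
filtered = toy[toy["make"].isin(common_makes)]

print(filtered["make"].value_counts())
```

Selecting by count instead of position keeps working if the dataset changes and the number of rare makes is no longer 19.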

## Part 2: Visualize the Data

Just as in previous assignments, we begin by looking to better understand our data. Here, your job is to use the function *visualize_train_image* below to display images. You can find image file names in `filtered_train_data` above.

In [0]:
def make_url_from_fname(fname):
    base_url = 'http://imagenet.stanford.edu/geo/gsv_100k_unwarp'
    # join with '/' explicitly: os.path.join uses the OS path separator, which is wrong for URLs on Windows
    return base_url + '/' + fname

def get_image_from_url(url):
    response = requests.get(url)
    return np.array(Image.open(BytesIO(response.content)), dtype=np.uint8)

def visualize_train_image(fname, bbox = True):
    url = make_url_from_fname(fname)
    im = get_image_from_url(url)
    plt.imshow(im)
    
    if bbox:
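        entry = filtered_train_data.loc[filtered_train_data['fname'] == fname].iloc[0]
        # bbox is stored as the string 'x1 y1 x2 y2'; use a new name so the bbox flag is not overwritten
        coords = entry['bbox'].split()
        x1, y1, x2, y2 = [int(i) for i in coords]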
        rect = Rectangle((x1, y1), (x2 - x1), (y2 - y1), linewidth = 2, 
                         edgecolor = 'r', facecolor = 'none')
        plt.gca().add_patch(rect)
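
Judging from the code above, the `bbox` column stores the two corners of the box as a whitespace-separated string, while `Rectangle` expects an anchor point plus a width and a height. The conversion can be sketched on its own; `bbox_to_xywh` is a hypothetical helper and the sample string is made up.

```python
def bbox_to_xywh(bbox_str):
    """Convert an 'x1 y1 x2 y2' string into (x, y, width, height)."""
    x1, y1, x2, y2 = (int(v) for v in bbox_str.split())
    return x1, y1, x2 - x1, y2 - y1

# Made-up box with corners (10, 20) and (110, 80).
print(bbox_to_xywh("10 20 110 80"))
```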

In [0]:
### BEGIN YOUR CODE ###

### END YOUR CODE ###

## Part 3: Exploring CNN Architectures 

At this point in the course, you've had the opportunity to construct and train your own convolutional neural networks with extra bells and whistles like pooling or batch normalization layers. Your CNNs were likely only a few layers deep, which constrains the representational power of the models.

Thankfully, researchers around the world have been working on stacking more and more layers to create deeper architectures. However, early on, these researchers found that deeper networks actually performed worse. Why could this be?

The authors of an architecture called [ResNet](https://arxiv.org/abs/1512.03385) observed something simple: direct mappings are hard to learn. Instead of trying to learn an underlying mapping from $x$ to $f(x)$ directly, we can learn the difference between the two, or the “residual.” Say the residual is $r(x) = f(x) - x$. Then, to calculate $f(x)$, we just add the residual to the input: $f(x) = r(x) + x$. Now our networks only need to learn $r(x)$, and a skip connection adds $x$ back. You can check out this [blog post](https://towardsdatascience.com/an-intuitive-guide-to-deep-network-architectures-65fdc477db41) to learn more.
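
To make the residual idea concrete, here is a toy sketch in plain Python (not how Keras implements it). `residual_block` and `zero_residual` are hypothetical names; the point is only that the block outputs $r(x) + x$, so when the ideal mapping is close to the identity the learned part can stay near zero.

```python
def residual_block(x, residual_fn):
    """Return x + r(x): the block only has to learn the residual r."""
    r = residual_fn(x)
    return [xi + ri for xi, ri in zip(x, r)]

# If the best mapping is the identity, the residual branch only needs
# to output zeros, and the input passes through unchanged.
def zero_residual(x):
    return [0.0 for _ in x]

print(residual_block([1.0, -2.0, 3.0], zero_residual))
```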

We'll be exploring a 50-layer ResNet CNN architecture in Keras in Part 3.

Before we jump into training our ResNet though, we need to load our data in with Keras data generators. Data generators are cool because they quickly iterate through your sets in batches. The batch size is a hyperparameter you can tune. We've put together that code for you. You can go ahead and play around with the batch size later when submitting your final model (note that batch sizes above 64 might cause memory errors, because very large batches will not fit on GPU).
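
As a quick illustration of what the batch size changes, the number of batches per epoch is the dataset size divided by the batch size, rounded up. The sample count of 1000 below is made up, and `steps_per_epoch` here is a hypothetical helper, not the Keras argument of the same name.

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)

# Made-up dataset size; larger batches mean fewer (but heavier) steps.
for bs in (16, 32, 64):
    print(bs, steps_per_epoch(1000, bs))
```

Doubling the batch size roughly halves the number of steps per epoch, but each step needs about twice the GPU memory for activations.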

In [0]:
### BEGIN YOUR CODE ###
batch_size = 32 
### END YOUR CODE ###

def get_data_generator(set_name, batch_size):
    datagen = ImageDataGenerator(rescale = 1./255, horizontal_flip = set_name == 'train')
    generator = datagen.flow_from_directory('data/' + set_name,
                                            target_size = (224, 224),
                                            batch_size = batch_size,
                                            class_mode = 'categorical')
    return generator
    
train_generator = get_data_generator('train', batch_size)
val_generator = get_data_generator('val', batch_size)
test_generator = get_data_generator('test', batch_size)

num_classes = 31

We can see that there are 31 classes and input images are 224x224 with 3 color channels (RGB).

Keras has an easy API for loading the ResNet-50. You can play around with the learning rate.

In [0]:
### BEGIN YOUR CODE ###
lr = 1e-3
### END YOUR CODE ###

def get_base_resnet_model(lr):
    model = keras.applications.ResNet50(include_top = True, weights = None, 
                                        classes = num_classes, input_shape=(224, 224, 3), 
                                        input_tensor = None)

    optim = optimizers.Adam(lr = lr)
    model.compile(optimizer = optim, loss = 'categorical_crossentropy', metrics=['accuracy'])

    return model
    
base_resnet_model = get_base_resnet_model(lr)
base_resnet_model.summary()

Now that we have our 50-layer architecture set up, let's train it for 1 epoch (to avoid long training times)! Refer to the [documentation](https://keras.io/models/model/#fit_generator) to fit the model to the data from your generators. This will take around 10 minutes (make sure to run on GPU)! The training script will print out loss and accuracy as it trains. Note that accuracies will be lower than you've seen before because we have many more output classes and we are only training for one epoch.

In [0]:
base_resnet_model.fit_generator(train_generator,
                                steps_per_epoch = len(train_generator),
                                epochs = 1,
                                validation_data = val_generator,
                                validation_steps = len(val_generator))
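
To put the printed numbers in context, it can help to know what pure guessing would score. The sketch below assumes the 31 classes are balanced, which the real dataset may well not be, so treat the values as a rough reference point only.

```python
import math

num_classes = 31

# Accuracy of guessing uniformly at random (assumes balanced classes).
chance_accuracy = 1.0 / num_classes
# Categorical cross-entropy of a model that predicts 1/31 for every class.
uniform_loss = math.log(num_classes)

print(f"chance accuracy: {chance_accuracy:.3f}")
print(f"loss of a uniform prediction: {uniform_loss:.3f}")
```

A model whose loss sits near the uniform value has not learned much yet; anything clearly below it is progress.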

## Part 4: Transfer Learning

Good job training your first super-deep network! For some historical context, ResNet was created for the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). From the ImageNet [website](http://image-net.org/challenges/LSVRC/), "ILSVRC evaluates algorithms for object detection and image classification at large scale. One high level motivation is to allow researchers to compare progress... across a wider variety of objects -- taking advantage of the quite expensive labeling effort." In other words, ImageNet is a huge labeled dataset that researchers can use to benchmark their architectures. 

After ResNet-50 is trained on ImageNet, we can save the weights and use them to bootstrap another task with the same architecture. This technique is broadly called **transfer learning**. As discussed in this [blog post](https://machinelearningmastery.com/transfer-learning-for-deep-learning/), "In transfer learning, we first train a base network on a base dataset and task, and then we repurpose the learned features, or transfer them, to a second target network to be trained on a target dataset and task. This process will tend to work if the features are general, meaning suitable to both base and target tasks, instead of specific to the base task." In this case, the task is image classification on a different dataset.

In Part 4, you'll have the opportunity to use a ResNet initialized with **pretrained weights** and see how it does on your task. Note that the ImageNet dataset has 1000 classes but our dataset only has 31 classes, so we'll have to replace the last layer of our ResNet to reflect that difference. Again, you can play around with the learning rate.
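
To see what replacing the last layer means in terms of size, we can count the parameters of a fully connected layer on top of ResNet-50's 2048-dimensional pooled features. `dense_params` is a hypothetical helper for this arithmetic.

```python
def dense_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

feature_dim = 2048  # width of ResNet-50's pooled feature vector

print("ImageNet head (1000 classes):", dense_params(feature_dim, 1000))
print("Our head (31 classes):", dense_params(feature_dim, 31))
```

Only this small new head starts from random weights; everything before it starts from the ImageNet weights.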



In [0]:
### BEGIN YOUR CODE ###
lr = 1e-3
### END YOUR CODE ###

def get_pretrained_resnet_model(lr):
    # here, we pass in weights = 'imagenet' instead of weights = None
    pretrained_resnet = keras.applications.ResNet50(include_top = False, weights = 'imagenet', 
                                                    classes = num_classes, input_shape = (224, 224, 3), 
                                                    input_tensor = None)
    # replace last layer (including the pooling)
    h = GlobalAveragePooling2D()(pretrained_resnet.output)
    y_hat = Dense(num_classes, activation = 'softmax', name = 'fc1000')(h)
    
    model = Model(inputs = pretrained_resnet.input, outputs = y_hat)

    optim = optimizers.Adam(lr = lr)
    model.compile(optimizer = optim, loss = 'categorical_crossentropy', metrics=['accuracy'])

    return model
    
pretrained_resnet_model = get_pretrained_resnet_model(lr)
pretrained_resnet_model.summary()

Train your new pretrained model for 1 epoch. Refer to the code for *base_resnet_model* above to see how to use *fit_generator*!

In [0]:
### BEGIN YOUR CODE ###
pretrained_resnet_model.fit_generator(None)
### END YOUR CODE ###

## Part 5: Transfer Learning with Frozen Layers

As you might have noticed, training these 50-layer ResNets end-to-end takes quite a bit more time than our previous smaller models. One of the big benefits of transfer learning is that it can oftentimes speed up training. 

In CNNs, features are more generic in early layers and more specific to the original dataset in later layers. This means that the early layers of the ImageNet-trained ResNet are still useful for this new task. So, we can keep our ImageNet weights fixed in earlier layers and only fine-tune the later layers. We do this by freezing earlier layers, that is, making them untrainable. We thus introduce an additional hyperparameter: the number of frozen layers. Note that due to different definitions for what a "layer" means, `model.layers` contains 177 layers.
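
The freezing rule itself is simple, and can be sketched without Keras. The objects below are stand-ins that only carry a `trainable` attribute, and `freeze_first_layers` is a hypothetical helper.

```python
from types import SimpleNamespace

def freeze_first_layers(layers, num_freeze_layers):
    """Mark the first num_freeze_layers layers untrainable, the rest trainable."""
    for i, layer in enumerate(layers):
        layer.trainable = i >= num_freeze_layers
    return layers

# Stand-in objects; a real Keras layer also exposes a `trainable` attribute.
toy_layers = [SimpleNamespace(name=f"layer_{i}", trainable=True) for i in range(5)]
freeze_first_layers(toy_layers, 2)
print([layer.trainable for layer in toy_layers])
```

With a real Keras model, the `trainable` flags should be set before calling `compile` (or the model recompiled afterwards) for the freeze to take effect during training.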

In [0]:
### BEGIN YOUR CODE ###
lr = 1e-3
num_freeze_layers = 30
### END YOUR CODE ###

def get_pretrained_frozen_resnet_model(lr, num_freeze_layers):
    pretrained_resnet = keras.applications.ResNet50(include_top = False, weights = 'imagenet', 
                                                    classes = num_classes, input_shape = (224, 224, 3), 
                                                    input_tensor = None)
    # replace last layer (including the pooling)
    h = GlobalAveragePooling2D()(pretrained_resnet.output)
    y_hat = Dense(num_classes, activation = 'softmax', name = 'fc1000')(h)
    
    model = Model(inputs = pretrained_resnet.input, outputs = y_hat)

    print('Freezing %d of %d model layers...' % (num_freeze_layers, len(model.layers)))

    # freeze the first num_freeze_layers layers and leave the rest trainable
    for layer in model.layers[:num_freeze_layers]:
        layer.trainable = False
    for layer in model.layers[num_freeze_layers:]:
        layer.trainable = True

    # compile after setting trainable so that the freeze takes effect
    optim = optimizers.Adam(lr = lr)
    model.compile(optimizer = optim, loss = 'categorical_crossentropy', metrics=['accuracy'])

    return model
    
pretrained_frozen_resnet_model = get_pretrained_frozen_resnet_model(lr, num_freeze_layers)
pretrained_frozen_resnet_model.summary()

Train your new pretrained model with frozen layers for 1 epoch. Notice that training is a bit faster now, depending on how many layers you are freezing, since we are training fewer layers.

In [0]:
### BEGIN YOUR CODE ###
pretrained_frozen_resnet_model.fit_generator(None)
### END YOUR CODE ###

## Part 6: Evaluation

At this point, we've trained a base 50-layer ResNet and a pretrained ResNet initialized with ImageNet weights. We've also fine-tuned a pretrained ResNet with frozen layers in Part 5. After completing hyperparameter tuning above (and also training for more epochs, if you like), pick the best of your three models according to validation performance and run evaluation on your test set. Note that you do not have to wait for training to complete to get an idea of which hyperparameters are doing well: you can keep track of how quickly the loss decreases.

Remember: to avoid bias, only run evaluation on the test set once! Use validation accuracy (reported at the end of an epoch) to tune hyperparameters.
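
One simple way to keep the selection honest is to write down the validation accuracy of each model and pick the maximum before touching the test set. The numbers below are placeholders, not real results; replace them with the values from your own runs.

```python
# Placeholder validation accuracies (NOT real results); fill in your own.
val_accuracy = {
    "base_resnet_model": 0.10,
    "pretrained_resnet_model": 0.30,
    "pretrained_frozen_resnet_model": 0.25,
}

# Pick the model name with the highest validation accuracy.
best_name = max(val_accuracy, key=val_accuracy.get)
print("Best model by validation accuracy:", best_name)
```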




In [0]:
### BEGIN YOUR CODE ###

# Save your best model to best_model here

loss, acc = best_model.evaluate_generator(test_generator, len(test_generator))

print('Your best test loss was', loss)
print('Your best test accuracy was', acc)
### END YOUR CODE ###

## And that's a wrap!

The skills covered in this notebook are critical for deep learning in practice, because machine learning engineers don't want to keep reinventing the wheel. Good work finishing up this assignment!