We extract the outputs of each layer from the CaffeNet model, which is based on the network architecture of Krizhevsky et al. for ImageNet. The goal is to see how the CNN behaves when given:
    a) conflicting textual and visual information
    b) a digital logo of a brand
    c) the same logo but in a "real-world" setting

In [4]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

plt.rcParams['figure.figsize'] = (10,10)
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

caffe_root = '/home/rips_tc/caffe/'
import os
import caffe

caffe.set_mode_cpu()
net = caffe.Net(caffe_root + 'models/google_logonet/deploy.prototxt',
                caffe_root + 'models/google_logonet/logonet.caffemodel',
                caffe.TEST)

transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1))  # H x W x C -> C x H x W
transformer.set_mean('data', np.load(caffe_root + 'data/logos/logos_mean_deploy.npy').mean(1).mean(1))  # per-channel mean pixel: average over axes 1 and 2 (the spatial axes of a 3 x H x W mean image)
transformer.set_raw_scale('data', 255)  # the reference model operates on images in [0,255] range instead of [0,1]
transformer.set_channel_swap('data', (2,1,0))  # RGB -> BGR; load_image returns RGB, and Caffe models are usually trained on BGR input (not verified for this model)

logo_labels = caffe_root + 'data/logos/index-brand.txt'
labels = np.loadtxt(logo_labels, str, delimiter='\t')
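
To make the Transformer settings above concrete, here is a minimal NumPy sketch of the steps they configure. It assumes pycaffe applies them in the order transpose, channel swap, raw scale, mean subtraction, and it omits the resize to the network input size. The name `preprocess_sketch` and the toy values are illustrative and not part of Caffe or of this notebook.

```python
import numpy as np

def preprocess_sketch(image_hwc, mean_per_channel, raw_scale=255.0):
    """Approximate the configured preprocessing on an H x W x 3 RGB image in [0, 1]."""
    x = image_hwc.astype(np.float32)
    x = x.transpose(2, 0, 1)          # H x W x C -> C x H x W
    x = x[[2, 1, 0], :, :]            # RGB -> BGR
    x = x * raw_scale                 # [0, 1] -> [0, 255]
    # Subtract one mean value per channel, broadcast over height and width
    mean = np.asarray(mean_per_channel, dtype=np.float32)
    x = x - mean[:, np.newaxis, np.newaxis]
    return x

# Toy stand-in for a mean file of shape (3, H, W): channel c is filled with the value c
mean_image = np.arange(3, dtype=np.float32).reshape(3, 1, 1) * np.ones((3, 4, 4), dtype=np.float32)
mean_pixel = mean_image.mean(1).mean(1)   # one value per channel, shape (3,)

# Pure red 2 x 2 image with values in [0, 1]
red = np.zeros((2, 2, 3), dtype=np.float32)
red[..., 0] = 1.0
blob = preprocess_sketch(red, mean_pixel)
```

After the swap, the red channel of the toy image ends up in the last channel of `blob`, which is the layout a BGR-trained model would expect.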

Now for the images. We want to process all three images at once, so we set the batch size to 3.
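
As a sketch of what filling a batch of three involves, the NumPy-only example below uses random arrays as stand-ins for preprocessed images (the names `images`, `batch`, and `single` are illustrative). Assigning one 3 x 224 x 224 array to the whole buffer broadcasts it into every slot, whereas stacking three arrays gives each image its own slot.

```python
import numpy as np

rng = np.random.RandomState(0)
# Stand-ins for three preprocessed images, each C x H x W
images = [rng.rand(3, 224, 224).astype(np.float32) for _ in range(3)]

# Stand-in for the input blob after reshape(3, 3, 224, 224)
batch = np.zeros((3, 3, 224, 224), dtype=np.float32)

# Assigning a single image broadcasts it across the batch axis
single = images[0]
batch[...] = single
all_slots_identical = all(np.array_equal(batch[i], single) for i in range(3))

# Stacking along a new first axis puts one distinct image in each slot
batch[...] = np.stack(images, axis=0)
slots_match_inputs = all(np.array_equal(batch[i], images[i]) for i in range(3))
```

With a real network, the stacked array would be assigned to `net.blobs['data'].data`, and row `i` of the `prob` blob would then hold the scores for image `i`.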

In [5]:
net.blobs['data'].reshape(3, 3, 224, 224)
conflict = caffe_root + 'data/logos/images/netflix/image_2.jpg'
digital = caffe_root + 'data/logos/images/lenovo/image_33.jpg'
real = caffe_root + 'data/logos/images/lenovo/image_35.jpg'

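def top_results(image_src, num_results=3):
    # Load and preprocess one image; the assignment broadcasts it into every batch slot
    net.blobs['data'].data[...] = transformer.preprocess('data', caffe.io.load_image(image_src))
    out = net.forward()
    # Indices of the num_results highest probabilities for the first batch item, best first
    top_k = net.blobs['prob'].data[0].flatten().argsort()[-1:-(num_results+1):-1]
    print(labels[top_k])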

%timeit top_results(conflict)

['105 netflix' '166 youtube' '7 android']
['105 netflix' '166 youtube' '7 android']
['105 netflix' '166 youtube' '7 android']
['105 netflix' '166 youtube' '7 android']
1 loops, best of 3: 1.9 s per loop
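
The slice used in top_results to pick the highest-scoring labels can be checked on a small array. This sketch uses made-up probabilities and label strings; `top_k_indices` is an illustrative helper name, not part of the notebook.

```python
import numpy as np

def top_k_indices(probs, k=3):
    """Return the indices of the k largest entries, largest first."""
    order = np.asarray(probs).flatten().argsort()   # indices in ascending order of score
    return order[-1:-(k + 1):-1]                    # walk backwards from the end for k steps

probs = np.array([0.10, 0.50, 0.05, 0.35])          # made-up scores
toy_labels = np.array(['a', 'b', 'c', 'd'])         # made-up labels
top = top_k_indices(probs, k=3)
print(toy_labels[top])
```

An equivalent and arguably more readable form is `argsort()[::-1][:k]`, which reverses the sorted indices and keeps the first `k`.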
