## Using the Places neural network (implemented in Caffe) to extract new features from each listing's images

Doc: http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/00-classification.ipynb


In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# set display defaults
plt.rcParams['figure.figsize'] = (10,10) # large images
plt.rcParams['image.interpolation'] = 'nearest'  # don't interpolate: show square pixels
plt.rcParams['image.cmap'] = 'gray'  # use grayscale output rather than a (potentially misleading) color heatmap

The following code collects the file paths of all Airbnb images by iterating over the parsed contents of the metadata-json.txt file.

In [2]:
import json

metafile = '/Users/Pere/Desktop/scraper/metadata-json.txt'
with open(metafile, 'r') as f:
    data = json.load(f)  # parse the whole file as a single JSON document

In [3]:
def getPaths(data):
    """
    Collect every image path from the parsed metadata into a single flat list.

    Params:
        data: dict parsed from the metadata-json.txt file; each value must
              contain an 'img_paths' list.
    """
    tmp = []
    for k in data.keys():
        for path in data[k]['img_paths']:
            tmp.append(path)
    return tmp

paths = getPaths(data)

paths[1]

u'/Users/Pere/Desktop/scraper/airbnb_imgs/8000000_to_8999999/8000000_to_8099999/8010000_to_8019999/8013000_to_8013999/8013000_to_8013099/8013090_to_8013099/b34a5b36_original.jpg\n'
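Note that the path shown above ends with a newline character (`\n`), which is likely to make the file lookup fail if the string is passed unchanged to an image loader. The following is a minimal sketch of how the paths could be flattened and cleaned in one pass, assuming each metadata entry carries an `img_paths` list as in `getPaths`. The helper name `get_clean_paths` and the sample dictionary are hypothetical and only serve as an illustration.

```python
def get_clean_paths(data):
    """Flatten all 'img_paths' lists and strip surrounding whitespace (hypothetical helper)."""
    return [path.strip() for entry in data.values() for path in entry['img_paths']]

# Toy stand-in for the parsed metadata-json.txt contents (illustrative values only).
sample = {
    'a': {'img_paths': ['/tmp/imgs/one.jpg\n', '/tmp/imgs/two.jpg\n']},
    'b': {'img_paths': ['/tmp/imgs/three.jpg\n']},
}

clean = get_clean_paths(sample)
print(clean)
```

Stripping at collection time keeps every later step (path rewriting, image loading) free of whitespace handling.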

In [4]:
# changeDirs rewrites every image path so that it points to your local copy of the images folder.
# The original paths look like
# /Users/Pere/Desktop/scraper/airbnb_imgs/8000000_to_8999999/8000000_to_.../b34a5b36_original.jpg
# so you only need to pass the directory that contains airbnb_imgs. For example:
# newdir = "/new/user/system/path/"
# The function then completes the rest of the path.

# The paths parameter is the list returned by the getPaths function.

def changeDirs(newdir, paths):
    old_path = "/Users/Pere/Desktop/scraper/"
    new_list = []
    for path in paths:
        new_list.append(path.replace(old_path, newdir))
    return new_list

newdir = "/Users/Hola/"
n = changeDirs(newdir,paths)

print(paths[1])
print(n[1])

/Users/Pere/Desktop/scraper/airbnb_imgs/8000000_to_8999999/8000000_to_8099999/8010000_to_8019999/8013000_to_8013999/8013000_to_8013099/8013090_to_8013099/b34a5b36_original.jpg

/Users/Hola/airbnb_imgs/8000000_to_8999999/8000000_to_8099999/8010000_to_8019999/8013000_to_8013999/8013000_to_8013099/8013090_to_8013099/b34a5b36_original.jpg
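`str.replace` substitutes the old root wherever it occurs in the string, not only at the start. A slightly stricter sketch that rewrites a path only when it actually begins with the old root could look like the following. The helper name `rebase_path` and the example file name are hypothetical and not part of the original notebook.

```python
def rebase_path(path, old_root, new_root):
    """Swap old_root for new_root only when path starts with old_root (hypothetical helper)."""
    if path.startswith(old_root):
        return new_root + path[len(old_root):]
    return path  # leave unrelated paths untouched

old_root = "/Users/Pere/Desktop/scraper/"
example = old_root + "airbnb_imgs/example.jpg"  # illustrative path, not a real file

rebased = rebase_path(example, old_root, "/Users/Hola/")
print(rebased)  # /Users/Hola/airbnb_imgs/example.jpg
```

Paths that do not start with the old root are returned unchanged, which makes it easier to spot entries that came from a different location.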



### Caffe wrapper using the Places neural network

The code below is adapted from the documentation linked at the top of this notebook.

- Load Caffe

In [None]:
# The caffe module needs to be on the Python path;
#  we'll add it here explicitly.
import sys
caffe_root = '../'  # this file should be run from {caffe_root}/examples (otherwise change this line)
sys.path.insert(0, caffe_root + 'python')

import caffe
# If you get "No module named _caffe", either you have not built pycaffe or you have the wrong path.

- If needed, download the reference model ("CaffeNet", a variant of AlexNet).

In [None]:
import os
# print() with a single argument works under both Python 2 and Python 3
if os.path.isfile(caffe_root + 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel'):
    print('CaffeNet found.')
else:
    print('Downloading pre-trained CaffeNet model...')
    !../scripts/download_model_binary.py ../models/bvlc_reference_caffenet

#### 2. Load net and set up input preprocessing

Set Caffe to CPU mode and load the net from disk.

In [None]:
caffe.set_mode_cpu()

model_def = caffe_root + 'models/bvlc_reference_caffenet/deploy.prototxt'
model_weights = caffe_root + 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel'

net = caffe.Net(model_def,      # defines the structure of the model
                model_weights,  # contains the trained weights
                caffe.TEST)     # use test mode (e.g., don't perform dropout)

- Set up input preprocessing. (We'll use Caffe's caffe.io.Transformer to do this, but this step is independent of other parts of Caffe, so any custom preprocessing code may be used).

    Our default CaffeNet is configured to take images in BGR format. Values are expected to start in the range [0, 255] and then have the mean ImageNet pixel value subtracted from them. In addition, the channel dimension is expected as the first (outermost) dimension.

    As matplotlib will load images with values in the range [0, 1] in RGB format with the channel as the innermost dimension, we are arranging for the needed transformations here.
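To make the transformations described above concrete, here is a NumPy-only sketch of the same four steps (transpose, channel swap, rescale, mean subtraction). It is an illustration under stated assumptions, not Caffe's own implementation: the function name `preprocess_sketch` is hypothetical, the mean values are rough placeholders, the input is a random array, and the resizing to the network's input size is left out.

```python
import numpy as np

def preprocess_sketch(image_rgb, mean_bgr):
    """Mimic the preprocessing steps on an H x W x 3 RGB image with values in [0, 1]."""
    chw = image_rgb.transpose(2, 0, 1)          # H x W x C  ->  C x H x W
    bgr = chw[[2, 1, 0], :, :]                  # reorder channels from RGB to BGR
    scaled = bgr * 255.0                        # rescale from [0, 1] to [0, 255]
    return scaled - mean_bgr.reshape(3, 1, 1)   # subtract the per-channel mean

# Illustrative inputs (assumptions): a random image and approximate BGR mean values.
rng = np.random.RandomState(0)
image = rng.rand(227, 227, 3)
mean_bgr = np.array([104.0, 117.0, 123.0])

out = preprocess_sketch(image, mean_bgr)
print(out.shape)  # (3, 227, 227)
```

After these steps the first output channel holds the blue values of the input, scaled to [0, 255] and shifted by the blue mean.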

In [None]:
# load the mean ImageNet image (as distributed with Caffe) for subtraction
mu = np.load(caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy')
mu = mu.mean(1).mean(1)  # average over pixels to obtain the mean (BGR) pixel values
print('mean-subtracted values: {}'.format(list(zip('BGR', mu))))  # works under Python 2 and 3

# create transformer for the input called 'data'
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})

transformer.set_transpose('data', (2,0,1))  # move image channels to outermost dimension
transformer.set_mean('data', mu)            # subtract the dataset-mean value in each channel
transformer.set_raw_scale('data', 255)      # rescale from [0, 1] to [0, 255]
transformer.set_channel_swap('data', (2,1,0))  # swap channels from RGB to BGR

#### 3. CPU classification

Now we're ready to perform classification. Even though we'll only classify one image, we'll set a batch size of 50 to demonstrate batching.

In [None]:
# set the size of the input (we can skip this if we're happy
#  with the default; we can also change it later, e.g., for different batch sizes)
net.blobs['data'].reshape(50,        # batch size
                          3,         # 3-channel (BGR) images
                          227, 227)  # image size is 227x227

In [None]:
image = caffe.io.load_image(caffe_root + 'examples/images/cat.jpg')
transformed_image = transformer.preprocess('data', image)
plt.imshow(image)

- Classification

In [None]:
# copy the image data into the memory allocated for the net
net.blobs['data'].data[...] = transformed_image

# perform classification
output = net.forward()

output_prob = output['prob'][0]  # the output probability vector for the first image in the batch

print('predicted class is: {}'.format(output_prob.argmax()))
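Because the input blob holds a batch of 50 while only one image is assigned, the assignment `net.blobs['data'].data[...] = transformed_image` relies on NumPy broadcasting: the same image is copied into every slot of the batch, which is why reading the first entry of `output['prob']` is enough. A small NumPy-only sketch of that behaviour, using random stand-in arrays rather than real Caffe blobs:

```python
import numpy as np

# Same shape as the reshaped 'data' blob: batch of 50, 3 channels, 227 x 227 pixels.
batch = np.zeros((50, 3, 227, 227), dtype=np.float32)

# Stand-in for transformed_image (random values, for illustration only).
single = np.random.RandomState(0).rand(3, 227, 227).astype(np.float32)

batch[...] = single  # broadcasting copies the image into every batch slot

print(np.array_equal(batch[0], batch[49]))  # True
```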

- The net gives us a vector of probabilities; the most probable class was the 281st one. But is that correct? Let's check the ImageNet labels...

In [None]:
# load ImageNet labels
labels_file = caffe_root + 'data/ilsvrc12/synset_words.txt'
if not os.path.exists(labels_file):
    !../data/ilsvrc12/get_ilsvrc_aux.sh
    
labels = np.loadtxt(labels_file, str, delimiter='\t')

print('output label: {}'.format(labels[output_prob.argmax()]))
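Each label string combines a synset identifier with a human-readable description. The following sketch separates the two, assuming every line has the form `<synset id> <description>` with a single space after the identifier. The helper name `split_label` is hypothetical and the sample line is only an example of that assumed format.

```python
def split_label(line):
    """Split a label line into (synset_id, description) at the first space."""
    synset_id, description = line.split(' ', 1)
    return synset_id, description

sample_line = 'n02123045 tabby, tabby cat'  # example line in the assumed format
synset_id, description = split_label(sample_line)
print(synset_id)    # n02123045
print(description)  # tabby, tabby cat
```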

In [None]:
# sort top five predictions from softmax output
top_inds = output_prob.argsort()[::-1][:5]  # reverse sort and take five largest items

print('probabilities and labels:')
list(zip(output_prob[top_inds], labels[top_inds]))  # list() so the pairs also display under Python 3
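The top-k selection can be tried in isolation on a small vector. In this sketch the probabilities and label names are toy values chosen only for illustration: `argsort` returns indices in ascending order, so reversing the result and slicing gives the indices of the largest entries.

```python
import numpy as np

probs = np.array([0.05, 0.40, 0.10, 0.25, 0.02, 0.18])  # toy probability vector
names = np.array(['a', 'b', 'c', 'd', 'e', 'f'])         # toy labels

top_inds = probs.argsort()[::-1][:3]  # indices of the three largest probabilities
print(top_inds.tolist())              # [1, 3, 5]
print(list(zip(probs[top_inds], names[top_inds])))
```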

#### 4. Switching to GPU Mode

In [None]:
%timeit net.forward()