<a href="https://colab.research.google.com/github/rkhanna19/amenity-detection/blob/master/Amenity_Detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Amenity Detection with Detectron2 🛋📺☕️

This notebook documents Group 6's machine learning project for GATECH CS 4641.


![kitchen](https://user-images.githubusercontent.com/31427851/88501290-0b9ce900-cf99-11ea-9195-6768db11a29d.png)

Sample image taken from [here](https://medium.com/airbnb-engineering/amenity-detection-and-beyond-new-frontiers-of-computer-vision-at-airbnb-144a4441b72e).

The image above shows the output of a custom computer vision model trained by Airbnb engineers. Computer vision is a field of artificial intelligence that trains computers to understand digital images and videos. The bounding boxes come from object detection. This sub-field of computer vision is concerned with locating instances of semantic objects of certain classes in an image.

### First off, what is an amenity?
An amenity is a useful feature of a building or place. Examples include a couch, a dining table, or a refrigerator.

### Why would we want to detect amenities?

When Airbnb lists a home or apartment, or Trivago lists a hotel room, the service must label the amenities in each listing to match the actual furnishings of the property. Labeling amenities by hand is error-prone and hard to sustain at scale. Computer vision models can identify amenities automatically from listing photos, which reduces the time and friction involved in creating a listing. Moreover, tagging photos and listings with specific amenities lets users search for listings at a finer granularity.

Suppose a user needs a walk-in shower because they use a wheelchair. The host may not think it's important to list this amenity. A computer vision algorithm, however, could pick up on it from the photos and help the user find listings that match their needs.

Now, let's take a look at the dataset.

### Open Images Dataset

The [Open Images Dataset](https://storage.googleapis.com/openimages/web/index.html) is a massive machine learning dataset released by Google. It contains image-level labels, bounding-box annotations, visual relationships, and localized narratives for over 3 million images.

The histogram below shows how many images exist for each class in Open Images.

![data](https://user-images.githubusercontent.com/31427851/88501309-16577e00-cf99-11ea-97f3-b0a7cbbdf86c.png)

Open Images also provides 15,851,536 bounding boxes across 600 categories. Because our use case is object detection, this part of the dataset is the most relevant to us. Our computing resources and time are limited. To make the most of them, we will build a customized model that focuses on two classes: fireplace and coffeemaker.


## Related Work

This work is inspired in large part by the work of Airbnb engineers to build an amenity detection model, as documented in [this Medium article](https://medium.com/airbnb-engineering/amenity-detection-and-beyond-new-frontiers-of-computer-vision-at-airbnb-144a4441b72e).

Airbnb engineers developed a taxonomy of 30 image classes, selected from the 600 in the Open Images dataset, and fused this data with their internal network data. They built models on Google's AutoML. They also chose two pre-trained models for fine-tuning: ssd_mobilenet_v2 and faster_rcnn_inception_resnet_v2. They substantially improved on the accuracy of generic third-party models, as the per-class mean average precision (mAP) chart below shows.

![map-airbnb](https://user-images.githubusercontent.com/31427851/88501295-0dff4300-cf99-11ea-95e2-a26babf89b35.png)
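Mean average precision builds on intersection over union (IoU). A predicted box usually counts as a correct detection only when its IoU with a ground-truth box of the same class meets a threshold. 0.5 is a common choice, though benchmarks differ. Precision and recall are then computed over the ranked detections. Average precision (AP) summarizes the precision-recall curve for one class, and mAP averages AP over all classes. The minimal sketch below covers only the IoU step. The box coordinates are made up for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (may be empty)
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    # A detection is typically counted as correct when its IoU meets the threshold
    return iou(pred_box, gt_box) >= threshold


ground_truth = (10, 10, 50, 50)  # hypothetical fireplace box
prediction = (20, 20, 60, 60)    # hypothetical predicted box
print(round(iou(prediction, ground_truth), 3))       # 0.391
print(is_true_positive(prediction, ground_truth))    # False
```

Here the boxes overlap on a 30×30 region (900 pixels). Their union is 1600 + 1600 - 900 = 2300 pixels, so the IoU is about 0.39. That falls below 0.5, so this prediction would not count as a match.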

## Machine Specifications

Let's take a look at what kind of a machine Google Colab has given us.

Using the [TensorFlow API](https://www.tensorflow.org/lite/examples), we can list each device's name, type, and memory limit. We should also find out how much RAM is available, so that others can reproduce our findings.

In [1]:
import tensorflow as tf
tf.test.gpu_device_name()

'/device:GPU:0'

In [2]:
from tensorflow.python.client import device_lib
device_lib.list_local_devices()

[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 17137567336604863346, name: "/device:XLA_CPU:0"
 device_type: "XLA_CPU"
 memory_limit: 17179869184
 locality {
 }
 incarnation: 14151441755751622264
 physical_device_desc: "device: XLA_CPU device", name: "/device:XLA_GPU:0"
 device_type: "XLA_GPU"
 memory_limit: 17179869184
 locality {
 }
 incarnation: 2981426434565437507
 physical_device_desc: "device: XLA_GPU device", name: "/device:GPU:0"
 device_type: "GPU"
 memory_limit: 15701463552
 locality {
   bus_id: 1
   links {
   }
 }
 incarnation: 9901328481589976786
 physical_device_desc: "device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0"]

XLA stands for Accelerated Linear Algebra. It's a domain-specific compiler for linear algebra that can accelerate TensorFlow models.

According to the TensorFlow website, XLA can improve both speed and memory usage: "most internal benchmarks run ~1.15x faster after XLA is enabled."

Notice that, besides the regular CPU and GPU, the device list includes XLA versions of both (XLA_CPU and XLA_GPU). Neural networks rely heavily on highly parallel work such as matrix multiplication. For these workloads, GPUs can be many times faster than CPUs because they run thousands of threads in parallel. This application needs to process a lot of image data, so we will rely on the GPU's computational power.

In [4]:
!cat /proc/meminfo

MemTotal:       13333556 kB
MemFree:         9246592 kB
MemAvailable:   11889900 kB
Buffers:           75632 kB
Cached:          2680064 kB
SwapCached:            0 kB
Active:          1187980 kB
Inactive:        2484612 kB
Active(anon):     844276 kB
Inactive(anon):     8528 kB
Active(file):     343704 kB
Inactive(file):  2476084 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:               284 kB
Writeback:             0 kB
AnonPages:        916996 kB
Mapped:           663900 kB
Shmem:              9156 kB
Slab:             175616 kB
SReclaimable:     128824 kB
SUnreclaim:        46792 kB
KernelStack:        3696 kB
PageTables:         8636 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     6666776 kB
Committed_AS:    3123580 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
Percpu:              920 kB
AnonHugePages:   

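The MemAvailable line shows about 11.9 million kB (roughly 11.3 GiB) of RAM available on this Colab virtual machine. Note that /proc/meminfo describes system RAM, not GPU memory. The GPU's memory is reported by nvidia-smi.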
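You may want these numbers programmatically rather than reading them off the output. A minimal sketch is to parse the text yourself. It assumes the standard /proc/meminfo layout of `Field: value kB`, where Linux's "kB" means 1024 bytes. The helper names `parse_meminfo` and `kib_to_gib` are our own and do not come from any library.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip lines without a numeric value, e.g. a truncated "AnonHugePages:" line
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def kib_to_gib(kib):
    # Linux reports meminfo values in units of 1024 bytes
    return kib / 1024 ** 2


# On Colab you could read the real file instead:
# with open("/proc/meminfo") as f:
#     info = parse_meminfo(f.read())
sample = """MemTotal:       13333556 kB
MemFree:         9246592 kB
MemAvailable:   11889900 kB
AnonHugePages:   """
info = parse_meminfo(sample)
print(round(kib_to_gib(info["MemAvailable"]), 2))  # 11.34
```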

In [1]:
!nvidia-smi

Mon Jul 27 15:24:22 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.05    Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   73C    P8    33W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                 ERR! |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

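This session was assigned an NVIDIA Tesla K80 GPU. Colab allocates GPUs per session. The device list above reports a Tesla P100, so it was most likely captured in a different session.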

## Getting into the Weeds

We will use Detectron2, Facebook AI Research's object detection library, to train a model on our amenity-specific data. Detectron2 is built on [Facebook's PyTorch Library](https://pytorch.org/).

First, let's install PyTorch and torchvision. Next, we will install a few other packages we need for data wrangling. Finally, we will import the common Python libraries used for machine learning.

In [2]:
# Python's torch library supports tensor computation with GPU acceleration
# Detectron2 is built with PyTorch
!pip install -U torch==1.4+cu100 torchvision==0.5+cu100 -f https://download.pytorch.org/whl/torch_stable.html

# Let's make sure that worked
import torch, torchvision

Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting torch==1.4+cu100
  Downloading https://download.pytorch.org/whl/cu100/torch-1.4.0%2Bcu100-cp36-cp36m-linux_x86_64.whl (723.9MB)
     |████████████████████████████████| 723.9MB 26kB/s 
Collecting torchvision==0.5+cu100
  Downloading https://download.pytorch.org/whl/cu100/torchvision-0.5.0%2Bcu100-cp36-cp36m-linux_x86_64.whl (4.0MB)
     |████████████████████████████████| 4.1MB 44.5MB/s 
Installing collected packages: torch, torchvision
  Found existing installation: torch 1.5.1+cu101
    Uninstalling torch-1.5.1+cu101:
      Successfully uninstalled torch-1.5.1+cu101
  Found existing installation: torchvision 0.6.1+cu101
    Uninstalling torchvision-0.6.1+cu101:
      Successfully uninstalled torchvision-0.6.1+cu101
Successfully installed torch-1.4.0+cu100 torchvision-0.5.0+cu100


In [None]:
# PyYAML lets us parse the YAML files Detectron2 uses for model configs; Cython is needed to build pycocotools
!pip install cython pyyaml==5.1

# COCO (Common Objects in COntext) is a large-scale object detection, segmentation, and captioning dataset
# Many of Detectron2's model zoo models are pre-trained on COCO, and pycocotools handles COCO-format annotations
!pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

!pip install awscli # lets us download images from Open Images using the downloadOI.py script

Now let's install the Detectron2 library itself.

In [None]:
!pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu100/index.html

Looking in links: https://dl.fbaipublicfiles.com/detectron2/wheels/cu100/index.html
Processing /root/.cache/pip/wheels/86/19/08/49b25f258ead1f861c9ab2fc41f73636f2928859adbb0e9797/pycocotools-2.0.1-cp36-cp36m-linux_x86_64.whl
Installing collected packages: pycocotools
  Found existing installation: pycocotools 2.0
    Uninstalling pycocotools-2.0:
      Successfully uninstalled pycocotools-2.0
Successfully installed pycocotools-2.0.1


We'll import some common libraries first.

In [None]:
import numpy as np
import pandas as pd
from tqdm import tqdm # we'll need this for downloading the data
import cv2
import random
from google.colab.patches import cv2_imshow

Now let's make sure we have Detectron2 set up on our machine.

In [None]:
import detectron2

# setup_logger configures Detectron2's logger so that we can see
# what's going on with the model during training and
# debug any issues that arise
from detectron2.utils.logger import setup_logger
setup_logger()

In [None]:
from detectron2 import model_zoo # a series of pre-trained Detectron2 models: https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md
from detectron2.engine import DefaultPredictor # a default predictor class to make predictions on an image using a trained model
from detectron2.config import get_cfg # a config or "cfg" in Detectron2 is a set of instructions for building a model
from detectron2.utils.visualizer import Visualizer # a class to help visualize Detectron2 predictions on an image
from detectron2.data import MetadataCatalog # stores metadata about datasets, such as the class names of the training/test data

## Run a Pre-Trained Model
Let's start having some fun! We'll download the cover image of the Airbnb article and visualize it using OpenCV.

Because the Open Images dataset is public, a large model was trained on as many of its images as possible.

Airbnb selected 30 target classes from Open Images that were relevant to amenity detection, and trained a model to detect them in images.


Let's download the full trained model and make a prediction on the article cover.

In [None]:
# Download and display example image and save it as demo.jpeg
!wget https://raw.githubusercontent.com/mrdbourke/airbnb-amenity-detection/master/custom_images/airbnb-article-cover.jpeg -O demo.jpeg
img = cv2.imread("./demo.jpeg")
cv2_imshow(img)
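OpenCV's `cv2.imread` returns pixel channels in BGR order. `cv2_imshow` also expects BGR, so the image above displays correctly. Detectron2's `Visualizer`, however, takes an RGB image, which is why the visualization cell later passes `img[:, :, ::-1]`. The toy NumPy array below is a made-up 1×2 image. It shows that reversing the last axis swaps the blue and red channels.

```python
import numpy as np

# A made-up 1x2 image in OpenCV's BGR channel order:
# the first pixel is pure blue, the second pure red
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last (channel) axis turns BGR into RGB
rgb = bgr[:, :, ::-1]

print(rgb[0, 0].tolist())  # [0, 0, 255] -> blue in RGB order
print(rgb[0, 1].tolist())  # [255, 0, 0] -> red in RGB order
```

`cv2.cvtColor(img, cv2.COLOR_BGR2RGB)` gives the same pixel values. The difference is that it returns a new contiguous array, while the slice returns a view of the original.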

In [None]:
# Download the trained model
!wget https://storage.googleapis.com/airbnb-amenity-detection-storage/airbnb-amenity-detection/open-images-data/retinanet_model_final/retinanet_model_final.pth 

# Download the trained model's config (instructions on how the model was built)
!wget https://storage.googleapis.com/airbnb-amenity-detection-storage/airbnb-amenity-detection/open-images-data/retinanet_model_final/retinanet_model_final_config.yaml

--2020-07-26 15:08:02--  https://storage.googleapis.com/airbnb-amenity-detection-storage/airbnb-amenity-detection/open-images-data/retinanet_model_final/retinanet_model_final.pth
Resolving storage.googleapis.com (storage.googleapis.com)... 108.177.111.128, 108.177.121.128, 74.125.124.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|108.177.111.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 446656531 (426M) [application/octet-stream]
Saving to: ‘retinanet_model_final.pth.1’


2020-07-26 15:09:02 (7.12 MB/s) - ‘retinanet_model_final.pth.1’ saved [446656531/446656531]

--2020-07-26 15:09:02--  https://storage.googleapis.com/airbnb-amenity-detection-storage/airbnb-amenity-detection/open-images-data/retinanet_model_final/retinanet_model_final_config.yaml
Resolving storage.googleapis.com (storage.googleapis.com)... 172.217.212.128, 172.217.214.128, 108.177.111.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|172.217.2

### Airbnb's 30 Target Classes

We'll define Airbnb's 30 target classes so that the model's predicted class indices can be mapped to readable names.

In [1]:
# Airbnb's target classes
target_classes = ['Bathtub',
 'Bed',
 'Billiard table',
 'Ceiling fan',
 'Coffeemaker',
 'Couch',
 'Countertop',
 'Dishwasher',
 'Fireplace',
 'Fountain',
 'Gas stove',
 'Jacuzzi',
 'Kitchen & dining room table',
 'Microwave oven',
 'Mirror',
 'Oven',
 'Pillow',
 'Porch',
 'Refrigerator',
 'Shower',
 'Sink',
 'Sofa bed',
 'Stairs',
 'Swimming pool',
 'Television',
 'Toilet',
 'Towel',
 'Tree house',
 'Washing machine',
 'Wine rack']
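Detectron2 detectors output integer class indices rather than names. The `Visualizer` turns an index into a label by looking it up in the `thing_classes` list stored in the metadata. A minimal plain-Python sketch of that lookup follows. It assumes the list order matches the class IDs the model was trained with. We assume the downloaded model used the alphabetical order above. The function `label_predictions` and the predictions themselves are hypothetical.

```python
def label_predictions(class_ids, scores, class_names):
    """Pair each predicted class index with its name and rounded confidence score.

    The i-th entry of class_names (like Detectron2's thing_classes) names class index i.
    """
    return [(class_names[i], round(s, 2)) for i, s in zip(class_ids, scores)]


# A short excerpt of the target class list above, in the same order
names = ["Bathtub", "Bed", "Billiard table", "Ceiling fan", "Coffeemaker"]

# Hypothetical model outputs: class indices and their confidence scores
pred_classes = [4, 1]
pred_scores = [0.91, 0.78]

print(label_predictions(pred_classes, pred_scores, names))
# [('Coffeemaker', 0.91), ('Bed', 0.78)]
```

If the list order differed from the training order, every label would silently be wrong. It is therefore worth keeping a single source of truth for the class list.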

Let's make a prediction and visualize it.

In [None]:
# Setup a model config file (set of instructions for the model)
cfg = get_cfg() # setup a default config, see: https://detectron2.readthedocs.io/modules/config.html
cfg.merge_from_file("./retinanet_model_final_config.yaml") # merge the config YAML file (a set of instructions on how to build a model)
cfg.MODEL.WEIGHTS = "./retinanet_model_final.pth" # setup the model weights from the fully trained model

# Create a default Detectron2 predictor for making inference
predictor = DefaultPredictor(cfg)

# Make a prediction on the example image from above
outputs = predictor(img)

In [None]:
# Number of predicted amenities to draw on the target image
num_amenities = 7

# Set up a visualizer instance: https://detectron2.readthedocs.io/modules/utils.html#detectron2.utils.visualizer.Visualizer
visualizer = Visualizer(img_rgb=img[:, :, ::-1], # we have to reverse the color order otherwise we'll get blue images (BGR -> RGB)
                        metadata=MetadataCatalog.get(cfg.DATASETS.TEST[0]).set(thing_classes=target_classes), # tell the visualizer the names of the classes we're drawing (the target classes)
                        scale=0.7)

# Draw the model's predictions on the target image (moved to the CPU for drawing)
out = visualizer.draw_instance_predictions(outputs["instances"][:num_amenities].to("cpu"))

# Display the image (converting RGB back to BGR for cv2_imshow)
cv2_imshow(out.get_image()[:, :, ::-1])
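The slice `outputs["instances"][:num_amenities]` keeps the first seven predictions regardless of how confident they are. This relies on the predictions being ordered by score. A confidence threshold is another option. The plain-Python sketch below combines both ideas on a list of made-up scores. The 0.5 threshold is an arbitrary illustrative value, and `select_detections` is a hypothetical helper. Detectron2's `Instances` also supports boolean indexing, for example `instances[instances.scores > 0.5]`. For RetinaNet, the inference-time threshold is set with `cfg.MODEL.RETINANET.SCORE_THRESH_TEST`.

```python
def select_detections(scores, top_k=7, min_score=0.5):
    """Return indices of at most top_k detections scoring >= min_score, most confident first."""
    # Rank every detection by score, highest first, instead of trusting the input order
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked if scores[i] >= min_score][:top_k]


# Made-up confidence scores for eight detections
scores = [0.35, 0.92, 0.61, 0.88, 0.15, 0.74, 0.55, 0.49]
print(select_detections(scores, top_k=3))  # [1, 3, 5]
print(select_detections(scores))           # [1, 3, 5, 2, 6]
```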

### Download the Image Labels

These files will allow us to preprocess the data. The annotation CSVs contain the bounding-box coordinates and class labels for each image. The class descriptions file maps label IDs to human-readable class names.

In [None]:
# Training bounding boxes (1.11G)
!wget https://storage.googleapis.com/openimages/2018_04/train/train-annotations-bbox.csv

# Validation bounding boxes (23.94M)
!wget https://storage.googleapis.com/openimages/v5/validation-annotations-bbox.csv

# Test bounding boxes (73.89M)
!wget https://storage.googleapis.com/openimages/v5/test-annotations-bbox.csv

# Class names of images (11.73K)
!wget https://storage.googleapis.com/openimages/v5/class-descriptions-boxable.csv
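The training CSV is over a gigabyte, so a reasonable first preprocessing step is to stream it and keep only the rows for our two classes. The sketch below uses Python's `csv` module on tiny in-memory stand-ins. It makes three assumptions:

- `class-descriptions-boxable.csv` has no header and two columns (label ID, display name).
- The bounding-box CSV has a header row.
- That header includes a `LabelName` column.

The label IDs in the sample data are hypothetical, not the real Open Images IDs. The helper names are our own.

```python
import csv
import io


def load_class_ids(descriptions_file, wanted_names):
    """Map Open Images label IDs to display names for the classes we want."""
    ids = {}
    # Each row is assumed to be (label ID, display name) with no header
    for label_id, display_name in csv.reader(descriptions_file):
        if display_name in wanted_names:
            ids[label_id] = display_name
    return ids


def filter_boxes(bbox_file, class_ids):
    """Yield bounding-box rows whose LabelName is one of the wanted label IDs."""
    for row in csv.DictReader(bbox_file):
        if row["LabelName"] in class_ids:
            yield row


# Tiny in-memory stand-ins for the real files; the label IDs are hypothetical
descriptions = io.StringIO("/m/fake_fp,Fireplace\n/m/fake_cm,Coffeemaker\n/m/fake_bed,Bed\n")
boxes = io.StringIO(
    "ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax\n"
    "img1,xclick,/m/fake_fp,1,0.1,0.5,0.2,0.9\n"
    "img2,xclick,/m/fake_bed,1,0.0,1.0,0.3,0.8\n"
    "img3,xclick,/m/fake_cm,1,0.4,0.6,0.4,0.7\n"
)

class_ids = load_class_ids(descriptions, {"Fireplace", "Coffeemaker"})
kept = list(filter_boxes(boxes, class_ids))
print([(r["ImageID"], class_ids[r["LabelName"]]) for r in kept])
# [('img1', 'Fireplace'), ('img3', 'Coffeemaker')]
```

With the real files, pass `open("class-descriptions-boxable.csv", newline="")` and `open("train-annotations-bbox.csv", newline="")` in place of the `StringIO` objects. Reading row by row keeps memory usage low. `pandas.read_csv` with `chunksize` would be an alternative.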

## Training our Customized Detectron2 Model

First, we'll download the training data from a Google Storage bucket to save download time. Next, we'll visualize the training data to make sure it matches our expectations. Then we'll preprocess the data by combining the images and the labels, so that Detectron2 knows where the bounding boxes are. Finally, we'll use [DefaultTrainer](https://detectron2.readthedocs.io/modules/engine.html?highlight=defaulttrainer#detectron2.engine.defaults.DefaultTrainer) to start training.
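For the preprocessing step, Detectron2 expects each image as a dictionary. Its keys include `file_name`, `image_id`, `height`, `width`, and `annotations`. Each annotation carries a `bbox`, a `bbox_mode`, and a `category_id`. Open Images stores box coordinates normalized to the range 0 to 1, so they need to be scaled by the image size.

The sketch below assumes box rows with `XMin`, `XMax`, `YMin`, and `YMax` columns, plus a hypothetical `Class` field holding the display name. The helper `to_detectron2_record` is also hypothetical. To keep the sketch runnable without Detectron2, `bbox_mode` is a placeholder string. Real code would use `BoxMode.XYXY_ABS` from `detectron2.structures`.

```python
def to_detectron2_record(image_id, file_name, width, height, boxes, class_names):
    """Build one Detectron2-style dataset dict from Open Images box rows.

    boxes: dicts with normalized XMin/XMax/YMin/YMax (0-1) and a hypothetical "Class" name.
    class_names: fixes the order of category_id values.
    """
    annotations = []
    for box in boxes:
        annotations.append({
            # Scale normalized coordinates to absolute pixel (x0, y0, x1, y1)
            "bbox": [
                float(box["XMin"]) * width,
                float(box["YMin"]) * height,
                float(box["XMax"]) * width,
                float(box["YMax"]) * height,
            ],
            # PLACEHOLDER: real code should use BoxMode.XYXY_ABS from detectron2.structures
            "bbox_mode": "XYXY_ABS",
            "category_id": class_names.index(box["Class"]),
        })
    return {
        "file_name": file_name,
        "image_id": image_id,
        "width": width,
        "height": height,
        "annotations": annotations,
    }


record = to_detectron2_record(
    "img1", "cmaker-fireplace-train/img1.jpg", 640, 480,
    [{"XMin": "0.1", "XMax": "0.5", "YMin": "0.2", "YMax": "0.9", "Class": "Fireplace"}],
    ["Coffeemaker", "Fireplace"],
)
print(record["annotations"])
```

Such records would typically be returned as a list from a function registered with `DatasetCatalog.register`. The class names would be set through `MetadataCatalog.get(name).set(thing_classes=...)`. You can read the image's height and width from `img.shape[:2]` after calling `cv2.imread`.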

In [None]:
# Get coffeemaker and fireplace training images from Open Images and unzip them
!wget https://storage.googleapis.com/airbnb-amenity-detection-storage/airbnb-amenity-detection/open-images-data/cmaker-fireplace-train.zip
!unzip -q cmaker-fireplace-train

# Get coffeemaker and fireplace validation images from Open Images and unzip them
!wget https://storage.googleapis.com/airbnb-amenity-detection-storage/airbnb-amenity-detection/open-images-data/cmaker-fireplace-valid.zip
!unzip -q cmaker-fireplace-valid

We'll store the training and validation paths so that we can point Detectron2 at them later.

In [None]:
train_path = "./cmaker-fireplace-train/"
valid_path = "./cmaker-fireplace-valid/"
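Before training, a quick sanity check can help. It confirms that both directories contain images and that no file appears in both splits. The sketch below works on lists of file names, for example from `os.listdir`. It assumes image files end in `.jpg`, `.jpeg`, or `.png`. The helper names are our own.

```python
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def image_names(file_names):
    """Keep only image files, ignoring anything else the zip may have contained."""
    return {f for f in file_names if f.lower().endswith(IMAGE_EXTENSIONS)}


def check_splits(train_files, valid_files):
    """Return (train image count, valid image count, file names shared by both splits)."""
    train_images = image_names(train_files)
    valid_images = image_names(valid_files)
    return len(train_images), len(valid_images), sorted(train_images & valid_images)


# On Colab: import os; check_splits(os.listdir(train_path), os.listdir(valid_path))
print(check_splits(["a.jpg", "b.jpg", "notes.txt"], ["b.jpg", "c.jpg"]))
# (2, 2, ['b.jpg'])
```

A non-empty third element would be worth investigating before training. Images shared between splits can inflate validation scores.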

We'll sample an image from the training set to make sure nothing has gone amiss.

In [None]:
import os
# Read in a random image from the training directory
train_img = cv2.imread(os.path.join(train_path, random.choice(os.listdir(train_path))))
cv2_imshow(train_img)