# Weakly Supervised Room Classification with YOLOv3 and Snorkel

This notebook starts from the idea that one difficulty in classifying hotels is that the photos show very different rooms. A single photo rarely shows both the bedroom and the bathroom, and some hotels have other distinct spaces (kitchens, etc.). The features of a hotel's bathroom are unlikely to map in any obvious way onto the features of its bedroom.

The approach here is to detect objects associated with particular rooms and views, such as toilets and sinks for bathrooms, and beds for bedrooms. We use Snorkel as a framework for handling and combining the resulting noisy labels.

In [None]:
!pip install -q snorkel pytorchyolo opencv-python torch==1.10.2 torchvision==0.11.3

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import cv2
import os
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

In [None]:
import snorkel
from snorkel.labeling import labeling_function, LFAnalysis, PandasLFApplier
from snorkel.preprocess import preprocessor
from snorkel.labeling.model import LabelModel

Note that the YOLO directory below is in a private dataset. The repository cloned into it is GPLv3 licensed, so I can't republish it under the Apache license.

You can find the script to download the data at https://github.com/eriklindernoren/PyTorch-YOLOv3 . The code itself is installed through pip, because getting the dependencies to work any other way was painful.

In [None]:
YOLO_DIR = '/kaggle/input/pytorch-yolov3/PyTorch-YOLOv3/'
DATA_DIR = '/kaggle/input/hotel-id-to-combat-human-trafficking-2022-fgvc9/'

Snorkel uses integer class labels, starting at zero. The label -1 means the labeling function abstains (no decision).

In [None]:
# Class labels to apply
ABSTAIN = -1
BEDROOM = 0
BATHROOM = 1
OTHER = 2

We find all the training images, and collate them by Hotel ID (the directory name).

In [None]:
data = {"full_path": [], "image": [], "hotel_id": []}
train_dir = os.path.join(DATA_DIR, 'train_images')
for subdir in os.listdir(train_dir):
    # The directory name is the hotel ID
    hotel_id = int(subdir)
    # Build paths from the name as found on disk, not the parsed integer
    for image in os.listdir(os.path.join(train_dir, subdir)):
        path = os.path.join(train_dir, subdir, image)
        data['image'].append(image)
        data['full_path'].append(path)
        data['hotel_id'].append(hotel_id)

df = pd.DataFrame(data)
df.head()

Uncommenting the line below restricts the run to a random sample of 1,000 images, which speeds up testing.

In [None]:
#df = df.sample(1000, random_state=42)

We wrap YOLOv3 in a Snorkel "preprocessor", which processes each sample and extracts machine-readable information from it. In this case, that information is the set of objects detected by YOLO.

In [None]:
from pytorchyolo import models, detect
yolo_model = models.load_model(YOLO_DIR + 'config/yolov3.cfg', YOLO_DIR + 'weights/yolov3.weights')

labels = []
with open(YOLO_DIR + '/data/coco.names', 'r') as f:
    for line in f:
        labels.append(line.strip())

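# memoize=True caches the result for each row, so YOLO should run once per
# image even though every labeling function depends on this preprocessor.
@preprocessor(memoize=True)
def object_detection(x):
    # Load the image as a numpy array (OpenCV reads it in BGR order)
    img = cv2.imread(x.full_path)
    if img is None:
        # Unreadable file: report no detections instead of crashing
        x.object_boxes = []
        return x
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # Run the YOLO model on the image
    boxes = detect.detect_image(yolo_model, img)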
    
    # Re-map the integer class label in to a string
    objects = []
    for x1, y1, x2, y2, confidence, c in boxes:
        label = labels[int(c)]
        objects.append((x1, y1, x2, y2, confidence, label))
    x.object_boxes = objects
    return x

Each of the following is a Snorkel "labeling function": a heuristic, somewhat noisy classifier that is allowed to abstain ("don't know"). Each one looks for one or more objects detected with reasonable confidence and, if it finds them, declares the image to be a particular type of room. The last function, `default_other`, votes for "other" when none of the bathroom or bedroom detectors fire. Snorkel later combines these noisy labels.

In [None]:
@labeling_function(pre=[object_detection])
def toilet_in_bathroom(x):
    for x1, y1, x2, y2, confidence, label in x.object_boxes:
        if label == 'toilet' and confidence > 0.5:
            return BATHROOM
    return ABSTAIN

@labeling_function(pre=[object_detection])
def sink_in_bathroom(x):
    for x1, y1, x2, y2, confidence, label in x.object_boxes:
        if label == 'sink' and confidence > 0.5:
            return BATHROOM
    return ABSTAIN

@labeling_function(pre=[object_detection])
def bed_in_bedroom(x):
    for x1, y1, x2, y2, confidence, label in x.object_boxes:
        if label == 'bed' and confidence > 0.5:
            return BEDROOM
    return ABSTAIN

@labeling_function(pre=[object_detection])
def oven_in_kitchen(x):
    for x1, y1, x2, y2, confidence, label in x.object_boxes:
        if label == 'oven' and confidence > 0.5:
            return OTHER
    return ABSTAIN

@labeling_function(pre=[object_detection])
def microwave_in_kitchen(x):
    for x1, y1, x2, y2, confidence, label in x.object_boxes:
        if label == 'microwave' and confidence > 0.5:
            return OTHER
    return ABSTAIN

@labeling_function(pre=[object_detection])
def fridge_in_kitchen(x):
    for x1, y1, x2, y2, confidence, label in x.object_boxes:
        if label == 'refrigerator' and confidence > 0.5:
            return OTHER
    return ABSTAIN

@labeling_function(pre=[object_detection])
def default_other(x):
    reliable_classifiers = [toilet_in_bathroom, sink_in_bathroom, bed_in_bedroom]
    if all([c(x) == ABSTAIN for c in reliable_classifiers]):
        return OTHER
    return ABSTAIN
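The six detector-based functions above differ only in the object name and the room they return, so the shared logic could be generated by a small factory. The following is a minimal sketch in plain Python, not Snorkel code: `make_object_rule` and `RULES` are hypothetical names, and each rule takes the list of boxes directly rather than a data point. To use them with Snorkel, each rule would still need to be wrapped as a labeling function with the `object_detection` preprocessor.

```python
ABSTAIN, BEDROOM, BATHROOM, OTHER = -1, 0, 1, 2

def make_object_rule(target_label, room, threshold=0.5):
    """Return a rule that votes `room` when `target_label` is detected confidently."""
    def rule(object_boxes):
        for _x1, _y1, _x2, _y2, confidence, label in object_boxes:
            if label == target_label and confidence > threshold:
                return room
        return ABSTAIN
    rule.__name__ = f"{target_label}_rule"
    return rule

# Hypothetical table mirroring the object-to-room pairs used above
RULES = {
    "toilet": BATHROOM,
    "sink": BATHROOM,
    "bed": BEDROOM,
    "oven": OTHER,
    "microwave": OTHER,
    "refrigerator": OTHER,
}
rules = [make_object_rule(label, room) for label, room in RULES.items()]

# Toy detections in the same (x1, y1, x2, y2, confidence, label) layout
boxes = [(10, 20, 200, 300, 0.91, "bed"), (5, 5, 50, 50, 0.30, "sink")]
votes = [rule(boxes) for rule in rules]
print(votes)
```

Keeping the pairs in one table makes it easy to add a new object or to tune the threshold per object.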

We apply our labeling functions to the data. The result is a 2D array with one row per image and one column per function, holding the label each function produced for each image.

In [None]:
lfs = [toilet_in_bathroom, sink_in_bathroom, bed_in_bedroom, oven_in_kitchen, microwave_in_kitchen, fridge_in_kitchen, default_other]

applier = PandasLFApplier(lfs=lfs)
L = applier.apply(df=df)
L

We can view coverage, overlap, and conflict statistics for each function, showing how valuable they are for labelling, or whether they're redundant.

In [None]:
LFAnalysis(L=L, lfs=lfs).lf_summary()
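To make the three statistics concrete, here is a minimal sketch that computes them by hand for one column of a toy label matrix. It follows my understanding of the definitions (coverage: the function does not abstain; overlap: at least one other function also labels the row; conflict: at least one other function labels the row differently). `lf_stats` and `L_toy` are hypothetical names, and the toy matrix is made up for illustration.

```python
ABSTAIN = -1

def lf_stats(L, j):
    """Coverage, overlap and conflict fractions for column j of label matrix L."""
    n = len(L)
    covered = overlaps = conflicts = 0
    for row in L:
        if row[j] == ABSTAIN:
            continue
        covered += 1
        # Labels from the other functions that did not abstain on this row
        others = [v for k, v in enumerate(row) if k != j and v != ABSTAIN]
        if others:
            overlaps += 1
        if any(v != row[j] for v in others):
            conflicts += 1
    return covered / n, overlaps / n, conflicts / n

# Toy matrix: 4 images (rows) x 3 labeling functions (columns)
L_toy = [
    [1, 1, -1],
    [1, -1, 2],
    [-1, -1, 2],
    [-1, -1, -1],
]
stats_first = lf_stats(L_toy, 0)
stats_last = lf_stats(L_toy, 2)
print(stats_first, stats_last)
```

A function with high coverage but high conflict deserves a closer look, while one whose coverage is almost entirely overlap with an agreeing function may be redundant.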

We show a sample of the images each rule labels, to check whether its decisions look reasonable. It's interesting to note that some of the "confused" images (e.g. a bed that is detected as a fridge) are incorrectly rotated; correcting the rotation may be a path to improvement.

In [None]:
NUM_IMAGES = 5

fig, ax = plt.subplots(len(lfs), NUM_IMAGES, figsize=(22,8 * len(lfs)))
for i, lf in enumerate(lfs):
    ax[i][0].set_ylabel(lf.name)
    
    matched = df.iloc[np.not_equal(L[:, i], ABSTAIN)]
    sample = matched.sample(min(len(matched), NUM_IMAGES), random_state=1000+i)
    for j, path in enumerate(sample['full_path']):
        ax[i][j].imshow(mpimg.imread(path))
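If rotated images do turn out to matter, one crude heuristic is to run the detector on all four rotations and keep the one with the highest total confidence. The sketch below is untested against this dataset and makes several assumptions: `detect_fn` is a hypothetical stand-in for the YOLO call that returns `(confidence, label)` pairs, and images are nested lists so the example runs without dependencies (with numpy arrays, `np.rot90` would do the rotation).

```python
def rotate90(img):
    """Rotate a row-major nested-list image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def best_rotation(img, detect_fn):
    """Return (quarter_turns, detections) for the rotation with the highest total confidence."""
    best = None
    current = img
    for turns in range(4):
        detections = detect_fn(current)
        score = sum(conf for conf, _label in detections)
        # Strict comparison keeps the smallest rotation when scores tie
        if best is None or score > best[0]:
            best = (score, turns, detections)
        current = rotate90(current)
    return best[1], best[2]

# Fake detector for illustration: "sees" a bed only when the marker
# pixel is in the top-left corner
def fake_detect(img):
    return [(0.9, "bed")] if img[0][0] == 1 else []

toy_image = [[0, 0],
             [1, 0]]
turns, detections = best_rotation(toy_image, fake_detect)
print(turns, detections)
```

Summed confidence can favour rotations with many weak detections, so a real version would probably want a confidence floor first.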

Snorkel can train a label model on the outputs of the labeling functions. My understanding of this step is limited, but I believe it estimates how reliable each function is from the pattern of agreements and conflicts between them, and weights the more reliable functions higher.

In [None]:
label_model = LabelModel(cardinality=3, verbose=True)
label_model.fit(L_train=L, n_epochs=500, log_freq=100, seed=123)
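For comparison, the simplest way to combine the votes is an unweighted majority vote, which treats every function as equally reliable. This is a baseline sketch, not what `LabelModel` does; `majority_vote` is a hypothetical helper, and ties are resolved by abstaining.

```python
from collections import Counter

ABSTAIN = -1

def majority_vote(row):
    """Return the most common non-abstain label in a row, or ABSTAIN on a tie or no votes."""
    votes = [v for v in row if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    # A tie between the top two labels gives no clear winner
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]

# Each row holds one image's votes from the labeling functions
toy_rows = [[1, 1, -1, 2], [-1, -1, -1, -1], [0, 1, -1, -1], [-1, 2, -1, -1]]
combined = [majority_vote(row) for row in toy_rows]
print(combined)
```

Comparing these labels with the label model's output is a quick way to see where the learned weights actually change the decision.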

We can now produce our noisy labels. Note that this is always a best-effort process, so the model returns per-class probabilities alongside the most likely prediction.

In [None]:
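# P holds the predicted class per image (-1 where the model abstains);
# C holds the per-class probabilities, one column per class.
P, C = label_model.predict(L, return_probs=True)
# 'unknown' comes first so that ABSTAIN (-1) maps to index 0 via p + 1
LABELS = ['unknown', 'bedroom', 'bathroom', 'other']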

ldf = df.copy()[['hotel_id', 'image']]
ldf['room_type'] = [LABELS[p+1] for p in P]
for i, l in enumerate(LABELS[1:]):
    ldf['p_' + l] = C[:, i]
ldf.head()

Summarise how many rooms ended up labeled as each type:

In [None]:
for l in LABELS:
    c = len(ldf[ldf['room_type'] == l])
    print(f'{l}: {c}')

And save the results.

In [None]:
ldf.to_csv('room-types.csv')

Snorkel label models produce a "noisy" label set, don't generalise, and are rarely used directly. The usual next step is to train a classifier on the noisy labels, which gains the ability to generalise to other examples. A trained classifier is also likely to fill in the "unknown" elements and, provided it doesn't overfit too much, may have a smoothing effect that lets it express reduced confidence in anything these heuristics labelled wrongly.

For this competition, these heuristics may well be "good enough" to use directly for some purposes.
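If you do go on to train a classifier, a plausible first step is to keep only the images with a decided room type and a reasonably confident label. The following is a minimal sketch under stated assumptions: `select_training_rows` and the 0.8 threshold are hypothetical, and rows are plain dicts using the same column names as `room-types.csv`.

```python
def select_training_rows(rows, min_confidence=0.8):
    """Keep rows with a decided room type whose top probability is high enough."""
    selected = []
    for row in rows:
        probs = [row["p_bedroom"], row["p_bathroom"], row["p_other"]]
        if row["room_type"] != "unknown" and max(probs) >= min_confidence:
            selected.append(row)
    return selected

# Made-up rows for illustration only
toy_rows = [
    {"image": "a.jpg", "room_type": "bedroom", "p_bedroom": 0.95, "p_bathroom": 0.03, "p_other": 0.02},
    {"image": "b.jpg", "room_type": "unknown", "p_bedroom": 0.40, "p_bathroom": 0.40, "p_other": 0.20},
    {"image": "c.jpg", "room_type": "other", "p_bedroom": 0.30, "p_bathroom": 0.10, "p_other": 0.60},
]
kept = [row["image"] for row in select_training_rows(toy_rows)]
print(kept)
```

An alternative is to keep every decided row and train against the probabilities themselves, so that uncertain labels carry less weight rather than being discarded.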