## Explorer Notebook

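This notebook is a scratchpad for small experiments, mostly a place to run Python code. Here it is used to compare Airbnb's amenity classes with the Open Images class names and to try downloading a few classes.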

In [None]:
from google.colab import drive
drive.mount('/content/drive/')

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


In [None]:
!pwd
import os
os.chdir('/content/drive/MyDrive/airbnb-amenity-detection/')
!pwd

/content/drive/MyDrive/airbnb-amenity-detection/custom_images
/content/drive/MyDrive/airbnb-amenity-detection


In [None]:
import pandas as pd

In [None]:
# The subset of classes we care about (Airbnb's amenity classes)
subset = ["Toilet",
          "Swimming_pool",
          "Bed",
          "Billiard_table",
          "Sink",
          "Fountain",
          "Oven",
          "Ceiling_fan",
          "Television",
          "Microwave_oven",
          "Gas_stove",
          "Refrigerator",
          "Kitchen_&_dining_room_table",
          "Washing_machine",
          "Bathtub",
          "Stairs",
          "Fireplace",
          "Pillow",
          "Mirror",
          "Shower",
          "Couch",
          "Countertop",
          "Coffeemaker",
          "Dishwasher",
          "Sofa_bed",
          "Tree_house",
          "Towel",
          "Porch",
          "Wine_rack",
          "Jacuzzi"]

len(subset)

30

## Start exploring the class names in Open Images
The class descriptions were downloaded from Open Images with: !wget https://storage.googleapis.com/openimages/2018_04/class-descriptions-boxable.csv

This file maps the codename (e.g. /m/011k07) of every class that has bounding box labels in Open Images to its human-readable name (e.g. Tortoise).
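
As a rough sketch of how this file can be used, the snippet below builds a name-to-codename lookup with the standard `csv` module. The sample rows are copied from tables shown in this notebook, the file is assumed to have no header row, and `load_class_lookup` is a hypothetical helper name. With the real file you would pass an open file handle instead of a string buffer.

```python
import csv
import io

# Sample rows copied from class-descriptions-boxable.csv (assumed: no header row).
SAMPLE_CSV = """/m/011k07,Tortoise
/m/011q46kg,Container
/m/012074,Magpie
/m/0b_rs,Swimming pool
/m/02wv84t,Gas stove
"""

def load_class_lookup(file_obj):
    """Return a dict mapping class name -> Open Images codename (hypothetical helper)."""
    return {name: code for code, name in csv.reader(file_obj)}

lookup = load_class_lookup(io.StringIO(SAMPLE_CSV))
print(lookup["Swimming pool"])
```

A plain dict is enough here because each class name is only looked up by exact match.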

In [None]:
#!wget https://storage.googleapis.com/openimages/2018_04/class-descriptions-boxable.csv
#!wget https://raw.githubusercontent.com/spmallick/learnopencv/master/downloadOpenImages/downloadOI.py
#!wget https://storage.googleapis.com/openimages/2018_04/train/train-annotations-bbox.csv
#!wget https://storage.googleapis.com/openimages/2018_04/validation/validation-annotations-bbox.csv
#!wget https://storage.googleapis.com/openimages/2018_04/test/test-annotations-bbox.csv
#!wget https://raw.githubusercontent.com/spmallick/learnopencv/master/downloadOpenImages/README.md

In [None]:
# All the classes in Open Images
classes = pd.read_csv("class-descriptions-boxable.csv", names=["ID", "Name"])
classes

| | ID | Name |
|---|---|---|
| 0 | /m/011k07 | Tortoise |
| 1 | /m/011q46kg | Container |
| 2 | /m/012074 | Magpie |
| 3 | /m/0120dh | Sea turtle |
| 4 | /m/01226z | Football |
| ... | ... | ... |
| 596 | /m/0qmmr | Wheelchair |
| 597 | /m/0wdt60w | Rugby ball |
| 598 | /m/0xfy | Armadillo |
| 599 | /m/0xzly | Maracas |


In [None]:
# Flag every row whose class name exactly matches one in our subset
classes["match"] = classes["Name"].isin(subset)
classes

| | ID | Name | match |
|---|---|---|---|
| 0 | /m/011k07 | Tortoise | False |
| 1 | /m/011q46kg | Container | False |
| 2 | /m/012074 | Magpie | False |
| 3 | /m/0120dh | Sea turtle | False |
| 4 | /m/01226z | Football | False |
| ... | ... | ... | ... |
| 596 | /m/0qmmr | Wheelchair | False |
| 597 | /m/0wdt60w | Rugby ball | False |
| 598 | /m/0xfy | Armadillo | False |
| 599 | /m/0xzly | Maracas | False |


In [None]:
classes.match.value_counts()

False    581
True      20
Name: match, dtype: int64

In [None]:
# Where do they match up?
matches = classes[classes["match"]]["Name"].tolist()
matches

['Sink',
 'Towel',
 'Stairs',
 'Fountain',
 'Oven',
 'Couch',
 'Shower',
 'Pillow',
 'Bathtub',
 'Bed',
 'Fireplace',
 'Refrigerator',
 'Porch',
 'Mirror',
 'Jacuzzi',
 'Television',
 'Coffeemaker',
 'Toilet',
 'Countertop',
 'Dishwasher']

In [None]:
# Where are they different?
missing_classes = list(set(subset)-set(matches))
missing_classes # classes in Airbnb's subset that have no exact name match in Open Images

['Gas_stove',
 'Ceiling_fan',
 'Microwave_oven',
 'Wine_rack',
 'Swimming_pool',
 'Sofa_bed',
 'Tree_house',
 'Kitchen_&_dining_room_table',
 'Washing_machine',
 'Billiard_table']

In [None]:
# Are there similar versions of these classes in the descriptions I could use?
classes[classes["Name"].str.contains("pool")]

| | ID | Name | match |
|---|---|---|---|
| 444 | /m/0b_rs | Swimming pool | False |


In [None]:
classes[classes["Name"].str.contains("stove")]

| | ID | Name | match |
|---|---|---|---|
| 197 | /m/02wv84t | Gas stove | False |
| 270 | /m/04169hn | Wood-burning stove | False |


In [None]:
classes[classes["Name"].str.contains("stove")]["Name"].tolist()

['Gas stove', 'Wood-burning stove']

In [None]:
# Get the individual words from each string of missing classes
strings = [i.split('_') for i in missing_classes]
strings = [item for sublist in strings for item in sublist]
strings

['Gas',
 'stove',
 'Ceiling',
 'fan',
 'Microwave',
 'oven',
 'Wine',
 'rack',
 'Swimming',
 'pool',
 'Sofa',
 'bed',
 'Tree',
 'house',
 'Kitchen',
 '&',
 'dining',
 'room',
 'table',
 'Washing',
 'machine',
 'Billiard',
 'table']

In [None]:
# Find every Open Images class name that contains one of the words.
# This is a plain substring match, so "room" also hits "Mushroom" and "table" hits "Vegetable".
more_matches = []
for string in strings:
  more_matches.extend(classes[classes["Name"].str.contains(string, regex=False)]["Name"].tolist())
more_matches = list(set(more_matches))
more_matches

['Microwave oven',
 'Billiard table',
 'Washing machine',
 'Kitchen utensil',
 'Wine',
 'Kitchen appliance',
 'Tree house',
 'Infant bed',
 'Bathroom accessory',
 'Tennis racket',
 'Coffee table',
 'Tree',
 'Mushroom',
 'Vegetable',
 'Kitchen knife',
 'Wine rack',
 'Swimming pool',
 'Sofa bed',
 'Bathroom cabinet',
 'Sewing machine',
 'Kitchenware',
 'Dog bed',
 'Gas stove',
 'Kitchen & dining room table',
 'Wood-burning stove',
 'Spice rack',
 'Wine glass',
 'Lighthouse',
 'Ceiling fan',
 'Table tennis racket',
 'Mechanical fan']

In [None]:
# Replace the underscores with spaces to match the Open Images naming
missing_classes_no_space = [i.replace("_", " ") for i in missing_classes]
missing_classes_no_space

['Gas stove',
 'Ceiling fan',
 'Microwave oven',
 'Wine rack',
 'Swimming pool',
 'Sofa bed',
 'Tree house',
 'Kitchen & dining room table',
 'Washing machine',
 'Billiard table']

In [None]:
# Find the actual missing classes
actual_missing_classes = list(set(missing_classes_no_space) - set(more_matches))
actual_missing_classes

[]

Turns out none of our classes are actually missing from the Open Images set! The only difference is the naming convention: Airbnb's class names use underscores ("_") where Open Images uses spaces. This is a simple fix we can implement later.
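
A minimal sketch of that fix, assuming the only difference really is underscores versus spaces: normalize each Airbnb name, then check it by exact membership. `to_open_images_name` is a hypothetical helper, and the names are a small sample taken from the lists above.

```python
def to_open_images_name(airbnb_name):
    """Convert an Airbnb-style class name to the Open Images style (hypothetical helper)."""
    return airbnb_name.replace("_", " ")

# A few names taken from the lists above.
airbnb_names = ["Swimming_pool", "Kitchen_&_dining_room_table", "Jacuzzi"]
open_images_names = {"Swimming pool", "Kitchen & dining room table", "Jacuzzi", "Mushroom"}

normalized = [to_open_images_name(name) for name in airbnb_names]
# Exact membership avoids substring false positives such as "room" matching "Mushroom".
still_missing = [name for name in normalized if name not in open_images_names]
print(normalized)
print(still_missing)
```

Compared with the word-by-word search, an exact check on the normalized names answers the "is anything missing?" question directly.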

Let's replace the underscores in our subset list with spaces and use the result to start downloading classes.

In [None]:
subset_no_underscore = [i.replace("_", " ") for i in subset]
subset_no_underscore

['Toilet',
 'Swimming pool',
 'Bed',
 'Billiard table',
 'Sink',
 'Fountain',
 'Oven',
 'Ceiling fan',
 'Television',
 'Microwave oven',
 'Gas stove',
 'Refrigerator',
 'Kitchen & dining room table',
 'Washing machine',
 'Bathtub',
 'Stairs',
 'Fireplace',
 'Pillow',
 'Mirror',
 'Shower',
 'Couch',
 'Countertop',
 'Coffeemaker',
 'Dishwasher',
 'Sofa bed',
 'Tree house',
 'Towel',
 'Porch',
 'Wine rack',
 'Jacuzzi']

Okay, we'll start with a small class (small as in, one that likely has few examples). Let's use Jacuzzi first.
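
One way to check which class is actually small, rather than guessing, is to count annotations per class. The sketch below is illustrative only: the rows and image IDs are hypothetical, and the layout is simplified to the `ImageID` and `LabelName` columns (the real bounding box annotation files are assumed to contain these plus further columns). The two codenames come from the class descriptions shown earlier.

```python
import csv
import io
from collections import Counter

# Hypothetical, simplified annotation rows: the real bounding box file has more columns.
SAMPLE_ANNOTATIONS = """ImageID,LabelName
img_001,/m/0b_rs
img_001,/m/0b_rs
img_002,/m/0b_rs
img_003,/m/02wv84t
"""

# Codenames taken from the class descriptions shown earlier in this notebook.
code_to_name = {"/m/0b_rs": "Swimming pool", "/m/02wv84t": "Gas stove"}

counts = Counter(
    code_to_name[row["LabelName"]]
    for row in csv.DictReader(io.StringIO(SAMPLE_ANNOTATIONS))
)
# Sort ascending so the class with the fewest annotations comes first.
smallest_first = sorted(counts.items(), key=lambda item: item[1])
print(smallest_first)
```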

In [None]:
#!git clone https://github.com/EscVM/OIDv4_ToolKit.git
#!git clone https://github.com/spmallick/learnopencv.git

In [None]:
#!curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
#!unzip awscliv2.zip
#!/content/drive/MyDrive/airbnb-amenity-detection/aws/install --update
#!/usr/local/bin/aws --version
#!rm -rf ./aws/
#!rm -rf ./train/
#!rm -rf ./train\ \(1\)
#!mkdir data
#!cp *.csv* data/

In [None]:
!python3 downloadOI.py --classes 'Jacuzzi' --mode train

Class 0 : Jacuzzi
Annotation Count : 103
Number of images to be downloaded : 102
100% 102/102 [06:07<00:00,  3.60s/it]
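
Note that the annotation count (103) is one higher than the number of images (102). A likely explanation, though the script output does not confirm it, is that one image contains two Jacuzzi boxes. The toy example below uses hypothetical image IDs to show how the two counts can differ.

```python
# Hypothetical (image ID, class name) annotation pairs, for illustration only.
annotations = [
    ("img_001", "Jacuzzi"),
    ("img_001", "Jacuzzi"),  # second box in the same image
    ("img_002", "Jacuzzi"),
]

annotation_count = len(annotations)
# Each image is downloaded once, however many boxes it carries.
image_count = len({image_id for image_id, _ in annotations})
print(annotation_count, image_count)
```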


In [None]:
!python3 downloadOI.py --classes 'Toilet,Bathtub' --mode validation

Class 0 : Toilet
Class 1 : Bathtub
Annotation Count : 43
Number of images to be downloaded : 39
100% 39/39 [02:28<00:00,  3.82s/it]
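
To avoid typing the class list by hand for every run, the command line can be assembled from a Python list. This is only a sketch: `build_download_command` is a hypothetical helper, it uses just the `--classes` and `--mode` flags seen above, and whether downloadOI.py handles class names containing spaces has not been verified here.

```python
import shlex

def build_download_command(class_names, mode):
    """Build a downloadOI.py command line (hypothetical helper, flags as used above)."""
    classes_arg = ",".join(class_names)
    # shlex.join quotes any argument that contains spaces or shell metacharacters.
    return shlex.join(["python3", "downloadOI.py", "--classes", classes_arg, "--mode", mode])

print(build_download_command(["Toilet", "Bathtub"], "validation"))
print(build_download_command(["Swimming pool"], "train"))
```

Quoting matters for names such as "Kitchen & dining room table", where an unquoted "&" would be interpreted by the shell.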
