# ADE20K image segmentation dataset
## Loading the index files

### Portland Data Science Group Applied Data Science Meetup
### John Burt

This notebook demonstrates how to read the ADE20K image index, which has been converted into CSV files from the original index file 'index_ade20k.mat'.

See the [description on this page](http://groups.csail.mit.edu/vision/datasets/ADE20K/) for more about the index file. Note that the description doesn't quite match the content of this file, so you will have to infer for yourself what some of the fields mean! 

Original Matlab file: index_ade20k_2015.mat

Converted files:

CSV file: ADE20K_index_image.csv
- filename: cell array of length N=22210 with the image file names.
- folder: cell array of length N with the image folder names.
- scene: cell array of length N providing the scene name (same classes as the Places database) for each image.


CSV file: ADE20K_index_objectPresence.csv
- objectPresence: array of size [length C, N] with the object counts per image. objectPresence(c,i)=n if in image i there are n instances of object class c.



CSV file: ADE20K_index_objectIsPart.csv
- objectIsPart: array of size [length C, N] counting how many times an object is a part in each image. objectIsPart(c,i)=m if in image i object class c is a part of another object m times. For objects, objectIsPart(c,i)=0, and for parts we will find: objectIsPart(c,i) ≈ objectPresence(c,i).



CSV file: ADE20K_index_object.csv
- objectnames: cell array of length C with the object class names.
- wordnet_found: array of length C. It indicates if the objectname was found in Wordnet.
- wordnet_hypernym: cell array of length C. WordNet hypernyms for each object name.
- wordnet_gloss: cell array of length C. WordNet definition.
- objectcounts: array of length C with the number of instances for each object class.
- proportionClassIsPart: array of length C with the proportion of times that class c behaves as a part. If proportionClassIsPart(c)=0, class c is a main object (e.g., car, chair, ...). See below for a discussion on the utility of this variable.
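
The relationship between objectPresence, objectIsPart and proportionClassIsPart can be illustrated with a small sketch. This is an assumption about how the proportion could be derived (total part occurrences divided by total instances per class); the source does not state the exact formula, and the toy matrices below are made up for illustration.

```python
import pandas as pd

# Toy matrices (made-up values): rows are object classes, columns are images.
presence = pd.DataFrame([[2, 1, 0],
                         [4, 0, 4],
                         [0, 0, 0]])
is_part = pd.DataFrame([[0, 0, 0],
                        [4, 0, 2],
                        [0, 0, 0]])

# Sum over images (axis=1) to get per-class totals.
total_instances = presence.sum(axis=1)
total_as_part = is_part.sum(axis=1)

# Assumed definition: share of a class's instances that appear as a part.
# Classes with no instances would divide by zero, so mask them and report 0.
proportion = (total_as_part / total_instances.where(total_instances > 0)).fillna(0.0)

print(proportion.tolist())
```

Under this assumed definition, class 0 never appears as a part (a main object), while class 1 is a part in 6 of its 8 instances.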


In [2]:
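# Suppress library warnings to keep the notebook output readable
import warnings
warnings.filterwarnings('ignore')

# Render plots inline and use the ggplot style
%matplotlib inline
from matplotlib import pyplot as plt
import matplotlib
matplotlib.style.use('ggplot')

# Show up to 100 columns when displaying a DataFrame
import pandas as pd
pd.options.display.max_columns = 100

import numpy as np
import datetime
import time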


In [3]:
df = pd.read_csv('ADE20K_index_image.csv')
df.head()

| Unnamed: 0 | filename | folder | scene | typeset |
|---|---|---|---|---|
| 0 | ADE_train_00000001.jpg | ADE20K_2016_07_26/images/training/a/airport_te... | airport_terminal | 1 |
| 1 | ADE_train_00000002.jpg | ADE20K_2016_07_26/images/training/a/airport_te... | airport_terminal | 1 |
| 2 | ADE_train_00000003.jpg | ADE20K_2016_07_26/images/training/a/art_gallery | art_gallery | 1 |
| 3 | ADE_train_00000004.jpg | ADE20K_2016_07_26/images/training/b/badlands | badlands | 1 |
| 4 | ADE_train_00000005.jpg | ADE20K_2016_07_26/images/training/b/ball_pit | ball_pit | 1 |
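
As a possible next step, the folder and filename columns can be combined into a relative image path, and the scene column can be tallied. The sketch below uses a small hand-built DataFrame with three of the rows shown above rather than the real CSV; the `path` column name is an illustrative choice, not part of the index.

```python
import pandas as pd

# Three rows copied from the table above (only rows whose folder is not truncated).
images = pd.DataFrame({
    "filename": ["ADE_train_00000003.jpg",
                 "ADE_train_00000004.jpg",
                 "ADE_train_00000005.jpg"],
    "folder": ["ADE20K_2016_07_26/images/training/a/art_gallery",
               "ADE20K_2016_07_26/images/training/b/badlands",
               "ADE20K_2016_07_26/images/training/b/ball_pit"],
    "scene": ["art_gallery", "badlands", "ball_pit"],
})

# Join folder and filename with a forward slash, as used in the folder column.
images["path"] = images["folder"].str.rstrip("/") + "/" + images["filename"]

# Count how many images belong to each scene class.
scene_counts = images["scene"].value_counts()

print(images["path"].iloc[0])
```

With the full index loaded, the same two lines would give one path per image and the number of images per scene.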


In [4]:
df = pd.read_csv('ADE20K_index_objectIsPart.csv')
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,...,22160,22161,22162,22163,22164,22165,22166,22167,22168,22169,22170,22171,22172,22173,22174,22175,22176,22177,22178,22179,22180,22181,22182,22183,22184,22185,22186,22187,22188,22189,22190,22191,22192,22193,22194,22195,22196,22197,22198,22199,22200,22201,22202,22203,22204,22205,22206,22207,22208,22209
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [5]:
df = pd.read_csv('ADE20K_index_objectPresence.csv')
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,...,22160,22161,22162,22163,22164,22165,22166,22167,22168,22169,22170,22171,22172,22173,22174,22175,22176,22177,22178,22179,22180,22181,22182,22183,22184,22185,22186,22187,22188,22189,22190,22191,22192,22193,22194,22195,22196,22197,22198,22199,22200,22201,22202,22203,22204,22205,22206,22207,22208,22209
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
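
The class-by-image layout can be queried by position. The sketch below is a minimal illustration on a made-up 3 x 4 matrix written in the same CSV layout (saved index in the first column, image numbers as the header); it is not the real data. Note that the Matlab notation objectPresence(c,i) is 1-based, while the pandas positions used here are 0-based.

```python
import io

import numpy as np
import pandas as pd

# Toy stand-in for ADE20K_index_objectPresence.csv (made-up counts):
# 3 object classes (rows) x 4 images (columns), saved index in column 0.
toy_csv = io.StringIO(
    ",0,1,2,3\n"
    "0,0,2,0,1\n"
    "1,1,0,0,0\n"
    "2,0,0,3,0\n"
)

# index_col=0 uses the first CSV column as the row index instead of
# loading it as a data column named 'Unnamed: 0'.
presence = pd.read_csv(toy_csv, index_col=0)

# Number of instances of class c in image i (0-based positions).
c, i = 2, 2
n_instances = int(presence.iloc[c, i])

# Positions of all images that contain at least one instance of class c.
images_with_class = np.flatnonzero(presence.iloc[c].to_numpy() > 0).tolist()

print(n_instances, images_with_class)
```

Reading with `index_col=0` is a design choice that keeps the matrix purely numeric, so row and column positions map directly to class and image numbers.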


In [6]:
df = pd.read_csv('ADE20K_index_object.csv')
df.head()

| Unnamed: 0 | objectcounts | objectnames | proportionClassIsPart | wordnet_found | wordnet_frequency | wordnet_gloss | wordnet_hypernym | wordnet_level1 | wordnet_synonyms | wordnet_synset |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | - | 0.0 | 0 | 0 | | | | | |
| 1 | 0.0 | aarm panel | 0.0 | 1 | 1 | the part of an armchair or sofa that supports ... | arm; armrest; rest; support; device; instrumen... | arm | arm | arm. armrest. rest. support. device. instrumen... |
| 2 | 1.0 | abacus | 0.0 | 1 | 0 | a calculator that performs arithmetic function... | abacus; calculator, calculating machine; machi... | abacus | abacus | abacus. calculator, calculating machine. machi... |
| 3 | 1.0 | accordion, piano accordion, squeeze box | 0.0 | 1 | 0 | a portable box-shaped free-reed instrument; th... | accordion, piano accordion, squeeze box; free-... | accordion, piano accordion, squeeze box | accordion, piano accordion, squeeze box | accordion, piano accordion, squeeze box. free-... |
| 4 | 11.0 | acropolis | 0.0 | 1 | 0 | the citadel in ancient Greek towns | acropolis; bastion, citadel; stronghold, fastn... | acropolis | acropolis | acropolis. bastion, citadel. stronghold, fastn... |
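
One way to use the object table together with the presence matrix is to list the named objects found in a given image. The sketch below assumes that row c of the object table describes the same class as row c of the presence matrix; that alignment is an assumption, not something the index description confirms. The presence counts are made up, and `objects_in_image` is a hypothetical helper written for this illustration.

```python
import pandas as pd

# Three class names taken from the table above.
objects = pd.DataFrame({
    "objectnames": ["abacus",
                    "accordion, piano accordion, squeeze box",
                    "acropolis"],
})

# Toy presence matrix (made-up counts): rows are classes, columns are images.
presence = pd.DataFrame([[0, 1, 0],
                         [0, 0, 0],
                         [2, 1, 0]])

def objects_in_image(image_idx):
    """Return {object name: instance count} for one image (hypothetical helper)."""
    counts = presence.iloc[:, image_idx]
    present = counts[counts > 0]
    # Assumes class rows are aligned between the two tables.
    names = objects.loc[present.index, "objectnames"]
    return {name: int(n) for name, n in zip(names, present)}

print(objects_in_image(0))
print(objects_in_image(1))
```

If the row alignment assumption does not hold for the converted files, the lookup would need an explicit class key instead of positional matching.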
