# MPII Cooking 2: Attributes Annotation 

This file describes the structure of the [attributesAnnotations_MPII-Cooking-2.mat](http://datasets.d2.mpi-inf.mpg.de/MPII-Cooking-2/annos/attributesAnnotations_MPII-Cooking-2.mat) file, *i.e.*, what attributes are stored and where. It gives a brief description of the file's content to make it easy to extract. Below, we describe each attribute and its content. All attributes listed here are stored under the `annos` key. The content is extracted with Python (h5py and NumPy) as follows:

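```python
import h5py
import numpy as np

FILEINPUT = 'attributesAnnotations_MPII-Cooking-2.mat'
# MATLAB v7.3 .mat files are HDF5 files, so h5py can open them directly
data = h5py.File(FILEINPUT, 'r')
print(data.keys())


def convert_hd52srt(data, key):
    """
    Convert the cell array of strings stored in `key` to a list of Python strings.

    data: h5py.File instance
    key: path of a key inside data['annos'], e.g. '/annos/activity'
    """
    content = []
    # Each cell holds an HDF5 object reference to a MATLAB char array.
    # `[()]` reads the whole dataset (`.value` was removed in h5py 3.0).
    for ref in data[key][()].flatten():
        chars = data[ref][()]
        # MATLAB stores characters as 16-bit codes; casting to uint8 assumes ASCII text
        content.append(chars.astype('uint8').tobytes().decode('utf8'))
    return content
```

```
[u'#refs#', u'#subsystem#', u'annos']
```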


### Content of Annotation

Below we present all the content stored in `annos`, which contains the following keys:

```python
anno = data['annos']
for key in anno.keys():
    print('- {}'.format(key))
```

- activity
- annoFileMap
- attrFields
- attributeMap
- bgClassName
- bgLabel
- bgLabelAttr
- classMap
- className
- containers
- containersDestination
- containersProperties
- containersSource
- dataset
- dish
- endFrame
- endTimeSeconds
- fields
- fileId
- fileName
- fileNameId
- filters
- frameRate
- iaMat
- idxInFile
- ingredients
- ingredientsProperties
- labels
- minBgWindow
- nImgsPerFile
- nullAttr
- nullAttrLabel
- startFrame
- startTimeSeconds
- subject
- tool
- toolProperties


According to the [MPII site](https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/research/human-activity-recognition/mpii-cooking-2-dataset/), the most important information is stored in:
    
- **startFrame**: start frame, with respect to the video
- **endFrame**: end frame, with respect to the video
- **iaMat**: label matrix of attributes
- **attributeMap**: Name of attributes
- **subject**: id of human subject, determines training, val, test split
- **dish**: dish/topic/composite id
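To illustrate how these keys fit together, the sketch below combines them into one record per annotation. It is a minimal sketch, not part of the dataset tools: `Annotation` and `build_annotations` are hypothetical names, and it assumes the arrays have been read as in the examples below (`subject`, `dish`, `startFrame`, `endFrame`, `iaMat` with shape (attributes, annotations), and `attributeMap`).

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Annotation:
    """One annotated segment, assembled from the parallel `annos` arrays (hypothetical helper)."""
    subject: int
    dish: int
    start_frame: int
    end_frame: int
    attributes: List[str]


def build_annotations(subject: Sequence[float], dish: Sequence[float],
                      start_frame: Sequence[float], end_frame: Sequence[float],
                      ia_mat: Sequence[Sequence[float]],
                      attribute_map: Sequence[str]) -> List[Annotation]:
    """Combine the per-annotation arrays into a list of records.

    Assumes all per-annotation arrays have the same length n and that
    ia_mat is indexed as ia_mat[attribute][annotation].
    """
    records = []
    for j in range(len(subject)):
        # Names of the attributes with a non-zero entry in column j
        attrs = [attribute_map[i] for i in range(len(attribute_map)) if ia_mat[i][j]]
        records.append(Annotation(int(subject[j]), int(dish[j]),
                                  int(start_frame[j]), int(end_frame[j]), attrs))
    return records
```

Usage: `records = build_annotations(subject, dish, startFrame, endFrame, iaMat, attributeMap)`. The plain Python loop is easy to read but slow on the full (222, 14105) matrix; a vectorised NumPy version would be faster.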

Although these are considered the most important keys, below we describe every key.

## Key: dataset

This key stores the name of the dataset.

```python
anno = data['/annos/dataset'][()].flatten()
# The name is a MATLAB char array: convert the character codes to a string
dataset = anno.astype('uint8').tobytes().decode('utf8')
print('Name of the dataset: {}'.format(dataset))
```

```
Name of the dataset: cooking12eccv
```


## Key: bgClassName

`bgClassName` holds the name of the background class used in detection.

```python
anno = data['/annos/bgClassName'][()].flatten()
bgClassName = anno.astype('uint8').tobytes().decode('utf8')
print('Background class name: {}'.format(bgClassName))
```

```
Background class name:  Background activity
```


## Key: bgLabel

`bgLabel` is the class label id of the background class used in detection.

```python
anno = data['/annos/bgLabel'][()].flatten()
# The key holds a single value; take the first (and only) element
bgLabel = int(anno[0])
print('Label for background: {}'.format(bgLabel))
```

```
Label for background: 1
```


## Key: bgLabelAttr

`bgLabelAttr` is the attribute label id of the background class used in detection.

```python
anno = data['/annos/bgLabelAttr'][()].flatten()
bgLabelAttr = int(anno[0])
print('Label attribute for background: {}'.format(bgLabelAttr))
```

```
Label attribute for background: 0
```


## Key: minBgWindow

`minBgWindow` (minimum background window) is only set for detection; in this file its value is 0.

```python
minBgWindow = data['/annos/minBgWindow'][()].flatten()
print('Minimum background window: {}'.format(int(minBgWindow[0])))
```

```
Minimum background window: 0
```


## Key: frameRate

The `frameRate` key stores the frame rate (frames per second) shared by all videos.

```python
frameRate = data['/annos/frameRate'][()].flatten()
print('Frame rate of all videos: {}'.format(frameRate[0]))
```

```
Frame rate of all videos: 29.4
```


## Key: attrFields

`attrFields` lists the attribute fields (categories) used in the dataset.

```python
attrFields = convert_hd52srt(data, '/annos/attrFields')
print('Attribute fields key contains {} elements.'.format(len(attrFields)))
print(np.array(attrFields))
```

```
Attribute fields key contains 4 elements.
[u'activity' u'tool' u'ingredients' u'containers']
```


## Key: fields

`fields` lists all the fields that are used. In this file it has the same content as the `attrFields` key.

```python
fields = convert_hd52srt(data, '/annos/fields')
print('Fields key contains {} elements.'.format(len(fields)))
print(np.array(fields))
```

```
Fields key contains 4 elements.
[u'activity' u'tool' u'ingredients' u'containers']
```


## Key: annoFileMap

`annoFileMap` contains the full path of the annotation file (`.tsv`) of each video in the dataset, 273 paths in total. In each file name, `s` is followed by the id of the subject who performs the task and `d` by the id of the dish being prepared. Thus, `s07-d72` denotes subject 7 preparing dish 72.

```python
annoFileMap = convert_hd52srt(data, '/annos/annoFileMap')
print('File map contains {} elements.'.format(len(annoFileMap)))
print(np.array(annoFileMap[:10]))
```

```
File map contains 273 elements.
[u'/BS/MPIICookingCompositeActivities/work/MPII-Cooking-2//annotations/tsv/s07-d72-cam-002.tsv'
 u'/BS/MPIICookingCompositeActivities/work/MPII-Cooking-2//annotations/tsv/s08-d02-cam-002.tsv'
 u'/BS/MPIICookingCompositeActivities/work/MPII-Cooking-2//annotations/tsv/s08-d04-cam-002.tsv'
 u'/BS/MPIICookingCompositeActivities/work/MPII-Cooking-2//annotations/tsv/s08-d11-cam-002.tsv'
 u'/BS/MPIICookingCompositeActivities/work/MPII-Cooking-2//annotations/tsv/s08-d14-cam-002.tsv'
 u'/BS/MPIICookingCompositeActivities/work/MPII-Cooking-2//annotations/tsv/s10-d02-cam-002.tsv'
 u'/BS/MPIICookingCompositeActivities/work/MPII-Cooking-2//annotations/tsv/s10-d10-cam-002.tsv'
 u'/BS/MPIICookingCompositeActivities/work/MPII-Cooking-2//annotations/tsv/s10-d11-cam-002.tsv'
 u'/BS/MPIICookingCompositeActivities/work/MPII-Cooking-2//annotations/tsv/s11-d01-cam-002.tsv'
 u'/BS/MPIICookingCompositeActivities/work/MPII-Cooking-2//annotations/tsv/s11-d06-cam-002.tsv']
```
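Since subject and dish ids are encoded in the file names, they can be recovered with a regular expression. This is a minimal sketch: `parse_video_name` is a hypothetical helper, and the pattern `sSS-dDD-cam-CCC` is inferred from the names listed in this document rather than from an official specification.

```python
import os
import re

# Matches names such as "s07-d72-cam-002" (pattern inferred from the listed paths)
_NAME_RE = re.compile(r"s(\d+)-d(\d+)-cam-(\d+)")


def parse_video_name(path):
    """Return (subject, dish, camera) parsed from an annotation path or file name."""
    # Drop the directories and the ".tsv" extension, if present
    stem = os.path.splitext(os.path.basename(path))[0]
    m = _NAME_RE.fullmatch(stem)
    if m is None:
        raise ValueError("unexpected file name: {}".format(path))
    subject, dish, camera = (int(g) for g in m.groups())
    return subject, dish, camera
```

For example, `parse_video_name(annoFileMap[0])` should return `(7, 72, 2)` for the first path shown above. The same function accepts the entries of `fileName`, which have no directory or extension.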


## Key: classMap

Similar to `annoFileMap`, `classMap` contains the full path of the annotation file of each video, 273 paths in total, with the same `sSS-dDD` naming scheme (`s07-d72` denotes subject 7 preparing dish 72). The first ten entries printed below are identical to those of `annoFileMap`.

```python
classMap = convert_hd52srt(data, '/annos/classMap')
print('Class map contains {} elements.'.format(len(classMap)))
print(np.array(classMap[:10]))
```

```
Class map contains 273 elements.
[u'/BS/MPIICookingCompositeActivities/work/MPII-Cooking-2//annotations/tsv/s07-d72-cam-002.tsv'
 u'/BS/MPIICookingCompositeActivities/work/MPII-Cooking-2//annotations/tsv/s08-d02-cam-002.tsv'
 u'/BS/MPIICookingCompositeActivities/work/MPII-Cooking-2//annotations/tsv/s08-d04-cam-002.tsv'
 u'/BS/MPIICookingCompositeActivities/work/MPII-Cooking-2//annotations/tsv/s08-d11-cam-002.tsv'
 u'/BS/MPIICookingCompositeActivities/work/MPII-Cooking-2//annotations/tsv/s08-d14-cam-002.tsv'
 u'/BS/MPIICookingCompositeActivities/work/MPII-Cooking-2//annotations/tsv/s10-d02-cam-002.tsv'
 u'/BS/MPIICookingCompositeActivities/work/MPII-Cooking-2//annotations/tsv/s10-d10-cam-002.tsv'
 u'/BS/MPIICookingCompositeActivities/work/MPII-Cooking-2//annotations/tsv/s10-d11-cam-002.tsv'
 u'/BS/MPIICookingCompositeActivities/work/MPII-Cooking-2//annotations/tsv/s11-d01-cam-002.tsv'
 u'/BS/MPIICookingCompositeActivities/work/MPII-Cooking-2//annotations/tsv/s11-d06-cam-002.tsv']
```


## Key: subject

`subject` gives the id of the human subject for each annotation (14,105 annotations in total). It determines the training, validation, and test splits.

```python
subject = data['/annos/subject'][()].flatten()
print('Videos contain {} subjects.'.format(len(set(subject))))
print('List of subjects for each of the {} annotations'.format(subject.shape[0]))
print(subject)
print('List of subjects for the first 30 annotations')
print(subject[:30])
```

```
Videos contain 30 subjects.
List of subjects for each of the 14105 annotations
[ 7.  7.  7. ... 37. 37. 37.]
List of subjects for the first 30 annotations
[7. 7. 7. 7. 7. 7. 8. 8. 8. 8. 8. 8. 8. 8. 8. 8. 8. 8. 8. 8. 8. 8. 8. 8.
 8. 8. 8. 8. 8. 8.]
```
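Because the splits are defined by subject, selecting the annotations of a split reduces to filtering on `subject`. This is a sketch under an explicit assumption: which subjects belong to training, validation, and test is not described in this document, so `split_subjects` must be supplied by the caller, and `indices_for_subjects` is a hypothetical helper.

```python
def indices_for_subjects(subject, split_subjects):
    """Return the annotation indices whose subject id is in split_subjects.

    The subject lists of the official splits are NOT given here; pass them in.
    """
    wanted = {int(s) for s in split_subjects}
    # subject values are stored as floats (e.g. 7.0), so compare as ints
    return [i for i, s in enumerate(subject) if int(s) in wanted]
```

For example, `indices_for_subjects(subject, [7, 8])` returns the positions of all annotations by subjects 7 and 8 (the subject ids here are purely illustrative). The returned indices can then be used to index `iaMat` columns or any other per-annotation array.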


## Key: dish

`dish` gives the dish id of each annotation. The 14,105 annotations cover 59 unique dishes.

```python
anno = data['/annos/dish'][()].flatten()
dish = [int(d) for d in anno]
print('Dish contains {} elements.'.format(len(dish)))
print(np.array(dish))
print('')
setdish = set(dish)
print('Dish contains {} unique elements.'.format(len(setdish)))
print(sorted(setdish))
```

```
Dish contains 14105 elements.
[72 72 72 ... 74 74 74]

Dish contains 59 unique elements.
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 21, 23, 24, 25, 26, 27, 28, 29, 31, 32, 34, 35, 36, 39, 40, 41, 42, 43, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 57, 58, 59, 60, 61, 62, 63, 65, 67, 68, 69, 70, 71, 72, 73, 74]
```


## Key: fileName

`fileName` gives, for each annotation, the name of the video file it belongs to. Names follow the pattern `sSS-dDD-cam-002`, where `SS` is the subject and `DD` is the dish.

```python
fileName = convert_hd52srt(data, '/annos/fileName')
print('File name vector contains {} elements.'.format(len(fileName)))
print(np.array(fileName))
print('')
setName = set(fileName)
print('File name vector contains {} unique elements.'.format(len(setName)))
```

```
File name vector contains 14105 elements.
[u's07-d72-cam-002' u's07-d72-cam-002' u's07-d72-cam-002' ...
 u's37-d74-cam-002' u's37-d74-cam-002' u's37-d74-cam-002']

File name vector contains 273 unique elements.
```


## Key: fileId

`fileId` gives, for each annotation, the numeric id of its video file (see `fileName`); the ids run from 1 to 273.

```python
anno = data['/annos/fileId'][()].flatten()
fileId = [int(i) for i in anno]
print('File ID vector contains {} elements.'.format(len(fileId)))
print(np.array(fileId))
```

```
File ID vector contains 14105 elements.
[  1   1   1 ... 273 273 273]
```


## Key: fileNameId

`fileNameId` is a string identifying the video file without the camera information, with the structure `sSS-dDD`, where `SS` is the subject and `DD` the dish.

```python
fileNameId = convert_hd52srt(data, '/annos/fileNameId')
print('File name ID vector contains {} elements.'.format(len(fileNameId)))
print(np.array(fileNameId))
print('')
setNameId = set(fileNameId)
print('File name ID vector contains {} unique elements.'.format(len(setNameId)))
print(sorted(setNameId))
```

```
File name ID vector contains 14105 elements.
[u's07-d72' u's07-d72' u's07-d72' ... u's37-d74' u's37-d74' u's37-d74']

File name ID vector contains 273 unique elements.
[u's07-d72', u's08-d02', u's08-d04', u's08-d11', u's08-d14', u's10-d02', u's10-d10', u's10-d11', u's11-d01', u's11-d06', u's11-d11', u's11-d12', u's11-d13', u's11-d14', u's12-d05', u's12-d07', u's12-d09', u's12-d10', u's12-d14', u's13-d08', u's13-d09', u's13-d11', u's13-d12', u's13-d13', u's13-d21', u's13-d23', u's13-d25', u's13-d27', u's13-d28', u's13-d31', u's13-d40', u's13-d45', u's13-d48', u's13-d49', u's13-d52', u's13-d54', u's13-d63', u's14-d08', u's14-d09', u's14-d11', u's14-d26', u's14-d27', u's14-d29', u's14-d32', u's14-d35', u's14-d36', u's14-d39', u's14-d43', u's14-d46', u's14-d50', u's14-d51', u's14-d61', u's14-d65', u's15-d03', u's15-d07', u's15-d14', u's15-d23', u's15-d24', u's15-d26', u's15-d29', u's15-d35', u's15-d46', u's15-d61', u's15-d70', u's15-d73', u's16-d01', u's16-d06', u's16-d09', u's16-d11', u's17-d02
```

## Key: idxInFile

`idxInFile` contains the index of each annotation within its video. As the output shows, the index is 1-based and restarts at 1 for each new video.

```python
idxInFile = data['/annos/idxInFile'][()].flatten()
print('IDs vector contains {} elements.'.format(idxInFile.shape[0]))
print('The first 30 elements of the vector:')
print(idxInFile[:30])
```

```
IDs vector contains 14105 elements.
The first 30 elements of the vector:
[ 1.  2.  3.  4.  5.  6.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12.
 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24.]
```
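The interpretation above (a running 1-based index that restarts for each video) can be checked by recomputing the index from `fileId`. This is a minimal sketch with a hypothetical helper name; it assumes the annotations of each file appear in temporal order, which matches the first 30 values shown but has not been verified for the whole file.

```python
from collections import defaultdict


def index_within_file(file_id):
    """Recompute a 1-based running index of each annotation inside its file."""
    seen = defaultdict(int)  # number of annotations seen so far per file id
    result = []
    for f in file_id:
        seen[f] += 1
        result.append(seen[f])
    return result
```

A possible check on the real data is `np.array_equal(index_within_file(fileId), idxInFile)`; if it returns `False`, the ordering assumption does not hold.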


## Key: activity

`activity` gives the action (verb) of each annotation; each annotation has a single activity. The 14,105 annotations cover 87 unique activities.

```python
att = convert_hd52srt(data, '/annos/activity')
print('Activity contains {} elements.'.format(len(att)))
print(np.array(att))
print('')
setatt = set(att)
print('Activity contains {} unique elements.'.format(len(setatt)))
print(sorted(setatt))
```

```
Activity contains 14105 elements.
[u'enterV' u'take outV' u'take outV' ... u'assembleV' u'unplugV'
 u'push downV']

Activity contains 87 unique elements.
[u'addV', u'apply plasterV', u'arrangeV', u'assembleV', u'change temperatureV', u'chopV', u'cleanV', u'closeV', u'cut apartV', u'cut diceV', u'cut off endsV', u'cut out insideV', u'cut stripesV', u'cutV', u'dryV', u'enterV', u'fillV', u'flipV', u'foldV', u'gatherV', u'grateV', u'grindV', u'hangV', u'lockV', u'mixV', u'moveV', u'open capV', u'open closeV', u'open eggV', u'open tinV', u'openV', u'packageV', u'peelV', u'plugV', u'pokeV', u'pourV', u'pressV', u'pull apartV', u'pull upV', u'pullV', u'pureeV', u'purgeV', u'push downV', u'put inV', u'put lidV', u'put onV', u'put rubber bandV', u'readV', u'remove from packageV', u'remove labelV', u'remove rubber bandV', u'rip offV', u'rip openV', u'rip-offV', u'rollV', u'scratch offV', u'screw closeV', u'screw openV', u'shakeV', u'shapeV', u'sharpenV', u'sliceV', u'smellV', u'spiceV', u'sprea
```

## Key: attributeMap

`attributeMap` contains the names of all attributes used in the dataset: verbs (actions/activities) and object names. Verbs are marked with the letter `V` at the end of the word, e.g. `addV`.

```python
attributeMap = convert_hd52srt(data, '/annos/attributeMap')
print('Attribute map contains {} elements.'.format(len(attributeMap)))
print(attributeMap)
```

```
Attribute map contains 222 elements.
[u'addV', u'apple', u'arils', u'arrangeV', u'asparagus', u'avocado', u'bag', u'baking-paper', u'baking-tray', u'blender', u'bottle', u'bowl', u'box-grater', u'bread', u'bread-knife', u'broccoli', u'bun', u'bundle', u'butter', u'carafe', u'carrot', u'cauliflower', u'change temperatureV', u'cheese', u'chefs-knife', u'chilli', u'chive', u'chocolate', u'chopV', u'cleanV', u'closeV', u'coffee', u'coffee-container', u'coffee-machine', u'coffee-powder', u'colander', u'cooking-spoon', u'corn', u'counter', u'cream', u'cucumber', u'cup', u'cupboard', u'cut apartV', u'cut diceV', u'cut off endsV', u'cut out insideV', u'cut stripesV', u'cutV', u'cutting-board', u'dough', u'drawer', u'dryV', u'egg', u'eggshell', u'electricity-column', u'electricity-plug', u'enterV', u'fig', u'fillV', u'filter-basket', u'finger', u'flat-grater', u'flower-pot', u'food', u'fork', u'fridge', u'front-peeler', u'frying-pan', u'garbage', u'garlic-bulb', u'garlic-clove', u'gatherV', u'g
```
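Given the trailing-`V` convention described above, the attribute names can be separated into verbs and objects. This is a minimal sketch (`split_verbs_objects` is a hypothetical helper); it relies only on that naming convention, so any object name that happened to end in an upper-case `V` would be misclassified.

```python
def split_verbs_objects(attribute_map):
    """Split attribute names into verbs (trailing upper-case 'V') and objects."""
    verbs = [a for a in attribute_map if a.endswith('V')]
    objects = [a for a in attribute_map if not a.endswith('V')]
    return verbs, objects
```

Usage: `verbs, objects = split_verbs_objects(attributeMap)`. The order of `attributeMap` is preserved in both lists, so it matches the row order of `iaMat`.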

## Key: className

`className` gives one class name per annotation (14,105 in total). Each class name concatenates, without separators, the activity and the objects involved in the annotation. For example, the annotation at index 1 (0-based) has the class `take outVhandbottlefridge`, which represents the verb `take outV` and the objects `hand`, `bottle` and `fridge`.

```python
className = convert_hd52srt(data, '/annos/className')
print('Class name contains {} elements.'.format(len(className)))
print(np.array(className))
print('')
setclassName = set(className)
print('Class name contains {} unique elements.'.format(len(setclassName)))
print('The 20 first class names are:')
print(sorted(setclassName)[:20])
```

```
Class name contains 14105 elements.
[u'enterV' u'take outVhandbottlefridge'
 u'take outVhandglass mugcupboardcounter' ...
 u'assembleVhandcarafecoffee machine'
 u'unplugVhandelectricity plugelectricity column'
 u'push downVhandelectricity column']

Class name contains 3268 unique elements.
The 20 first class names are:
[u'addVbowleggbowlfrying pan', u'addVbowlpotatobowlpot', u'addVbowlspatulacarrotbowlfrying pan', u'addVbowlspatulacreambowlfrying pan', u'addVcappuccino powder bagcappuccino powdercappuccino powder bagcup', u'addVchefs knifebutterfrying pan', u'addVchefs knifebutterplastic boxfrying pan', u'addVchefs knifecauliflowercutting boardcolander', u'addVchefs knifecauliflowercutting boardpot', u'addVchefs knifechiveoreganoparsleycutting boardbowl', u'addVchefs knifecutting boardbroccolicutting boardpot', u'addVchefs knifecutting boardleekcutting boardfrying pan', u'addVchefs knifehandchivecutting boardbowl', u'addVchefs knifehandhammushroomcutting boardbowl', u'addVchefs knifeha
```

## Key: iaMat

`iaMat` is the attribute matrix. It has one row per entry of `attributeMap` and one column per annotation, i.e. shape (222, 14105). It is binary: entry $(i, j)$ is 1 if attribute $i$ occurs in annotation $j$ and 0 otherwise.

```python
iaMat = data['/annos/iaMat'][()]
print('Matrix of attributes has the shape: {}'.format(iaMat.shape))
print(iaMat)
```

```
Matrix of attributes has the shape: (222, 14105)
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
```


```python
# Column 1 of iaMat (0-based) describes the annotation at index 1
annotation1 = iaMat.T[1]
attr_ids = np.argwhere(annotation1).flatten()
print('Annotation 1 has the following attributes:')
for i in attr_ids:
    print('- {}'.format(attributeMap[i]))

# To verify the result, compare with the className entry at the same index
print('Class name of annotation 1: {}'.format(className[1]))
```

```
Annotation 1 has the following attributes:
- bottle
- fridge
- hand
- take outV
Class name of annotation 1: take outVhandbottlefridge
```
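The lookup can also be run in the opposite direction: find every annotation that contains a given set of attributes. This is a minimal sketch with a hypothetical helper name, assuming `iaMat` has shape (`len(attributeMap)`, number of annotations) as described above and `attributeMap` is the list returned by `convert_hd52srt`.

```python
import numpy as np


def annotations_with(ia_mat, attribute_map, names):
    """Return the indices of the annotations (columns of ia_mat) that
    contain every attribute listed in `names`."""
    # Row of each requested attribute; raises ValueError for an unknown name
    rows = [attribute_map.index(name) for name in names]
    # An annotation matches only if all selected rows are non-zero in its column
    mask = np.all(np.asarray(ia_mat)[rows, :] != 0, axis=0)
    return np.flatnonzero(mask)
```

Usage: `annotations_with(iaMat, attributeMap, ['take outV', 'fridge'])` returns the column indices of all annotations whose attributes include both names; index 1 should be among them, given the output above.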


## Key: startFrame

`startFrame` gives the first frame of each annotated action, counted from the beginning of its video.

```python
anno = data['/annos/startFrame'][()].flatten()
startFrame = [int(f) for f in anno]
print('Start frame vector contains {} elements.'.format(len(startFrame)))
print(np.array(startFrame))
```

```
Start frame vector contains 14105 elements.
[  194   270   454 ... 10953 11019 11064]
```


## Key: startTimeSeconds

`startTimeSeconds` gives the start time of each annotated action in seconds, counted from the beginning of its video.

```python
anno = data['/annos/startTimeSeconds'][()].flatten()
# int() drops any fractional part of the stored times
startTimeSeconds = [int(t) for t in anno]
print('Start time vector contains {} elements.'.format(len(startTimeSeconds)))
print(np.array(startTimeSeconds))
```

```
Start time vector contains 14105 elements.
[  6   9  15 ... 372 374 376]
```


## Key: endFrame

`endFrame` gives the last frame of each annotated action, counted from the beginning of its video.

```python
anno = data['/annos/endFrame'][()].flatten()
endFrame = [int(f) for f in anno]
print('End frame vector contains {} elements.'.format(len(endFrame)))
print(np.array(endFrame))
```

```
End frame vector contains 14105 elements.
[  243   414   725 ... 10994 11045 11094]
```


## Key: endTimeSeconds

`endTimeSeconds` gives the end time of each annotated action in seconds, counted from the beginning of its video.

```python
anno = data['/annos/endTimeSeconds'][()].flatten()
# int() drops any fractional part of the stored times
endTimeSeconds = [int(t) for t in anno]
print('End time vector contains {} elements.'.format(len(endTimeSeconds)))
print(np.array(endTimeSeconds))
```

```
End time vector contains 14105 elements.
[  8  14  24 ... 373 375 377]
```
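For every value printed above, the truncated times match $t = \text{frame} / \text{frameRate}$ with a frame rate of 29.4 (e.g. 194 / 29.4 ≈ 6.6, shown as 6). This relation is inferred from these printed values, not documented in the file, so treat the sketch below (hypothetical helper names) as an assumption to verify on the full arrays.

```python
FRAME_RATE = 29.4  # value of /annos/frameRate shown above


def frame_to_seconds(frame, frame_rate=FRAME_RATE):
    """Convert a frame index to seconds from the start of the video (inferred relation)."""
    return frame / frame_rate


def seconds_to_frame(seconds, frame_rate=FRAME_RATE):
    """Convert a time in seconds to the nearest frame index (inferred relation)."""
    return int(round(seconds * frame_rate))
```

A possible check on the full data: `all(int(frame_to_seconds(f)) == t for f, t in zip(startFrame, startTimeSeconds))`.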


## Key: labels

`labels` contains a numeric id for each annotation; its meaning is not documented. The number of unique labels (3268) equals the number of unique `className` values, which suggests, but does not confirm, that `labels` is a class id corresponding to `className`.

```python
labels = data['/annos/labels'][()].flatten()
print('Labels vector contains {} elements.'.format(labels.shape[0]))
print(labels)
print('Labels vector contains {} unique elements.'.format(len(set(labels))))
```

```
Labels vector contains 14105 elements.
[ 706. 2559. 2684. ...  294. 3111. 1714.]
Labels vector contains 3268 unique elements.
```
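Equal counts of unique values do not prove that `labels` and `className` correspond. A stronger check is that each label always appears with the same class name and vice versa. This is a minimal sketch with a hypothetical helper name; we have not run it on the full file.

```python
def is_one_to_one(labels, class_names):
    """Check whether every label maps to exactly one class name and vice versa."""
    label_to_name = {}
    name_to_label = {}
    for label, name in zip(labels, class_names):
        # setdefault returns the value already stored for the key, if any
        if label_to_name.setdefault(label, name) != name:
            return False
        if name_to_label.setdefault(name, label) != label:
            return False
    return True
```

Usage: `is_one_to_one(labels, className)`. A `True` result would support reading `labels` as a class id; `False` would rule out a one-to-one correspondence.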


## Key: nImgsPerFile

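Number of frames per video. Reading it with h5py, however, **returns an unexpected array**. Its first value, 3707764736 (`0xDD000000`), appears to be the marker MATLAB writes for opaque class objects (such as `containers.Map`) in v7.3 files; the data of such objects is kept in the `#subsystem#` group and cannot be read as a plain array with h5py.

```python
nImgsPerFile = data['/annos/nImgsPerFile'][()].flatten()
print(nImgsPerFile)
```

```
[3707764736          2          1          1          1          1]
```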
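Since `nImgsPerFile` cannot be decoded directly, a related quantity, the number of annotations per video, can be derived from `fileId`. Note that this counts annotations, not frames, so it is a substitute rather than a recovery of `nImgsPerFile`; `annotations_per_file` is a hypothetical helper.

```python
from collections import Counter


def annotations_per_file(file_id):
    """Count annotations per video file, keyed by integer file id."""
    # file ids may be stored as floats (e.g. 1.0), so normalise to int
    return Counter(int(f) for f in file_id)
```

Usage: `counts = annotations_per_file(fileId)`; `counts[1]` should be 6 for the first video, consistent with the `idxInFile` output above.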


## Key: nullAttr

According to the [README](https://datasets.d2.mpi-inf.mpg.de/MPII-Cooking-2/annos/annos.mat-README.txt): **IGNORE**

```python
nullAttr = data['/annos/nullAttr'][()].flatten()
print(nullAttr)
```

```
[0]
```


## Key: nullAttrLabel

According to the [README](https://datasets.d2.mpi-inf.mpg.de/MPII-Cooking-2/annos/annos.mat-README.txt): **IGNORE**. The stored values are the ASCII codes of the string `null`.

```python
nullAttrLabel = data['/annos/nullAttrLabel'][()].flatten()
print(nullAttrLabel)
```

```
[110 117 108 108]
```


## Key: filters

**IGNORE**, string replacements to clean up annotations (already applied to this data). 

```diff
- HDF5 object reference!
```

## Key: tool

More annotation details; the relevant information is in **iaMat**.

```diff
- HDF5 object reference!
```

## Key: toolProperties

More annotation details; the relevant information is in **iaMat**.

```diff
- HDF5 object reference!
```

## Key: containers

More annotation details; the relevant information is in **iaMat**.

```diff
- HDF5 object reference!
```

## Key: containersDestination

More annotation details; the relevant information is in **iaMat**.

```diff
- HDF5 object reference!
```

## Key: containersProperties

More annotation details; the relevant information is in **iaMat**.

```diff
- HDF5 object reference!
```

## Key: containersSource

More annotation details; the relevant information is in **iaMat**.

```diff
- HDF5 object reference!
```

## Key: ingredients

More annotation details; the relevant information is in **iaMat**.

```diff
- HDF5 object reference!
```

## Key: ingredientsProperties

More annotation details; the relevant information is in **iaMat**.

```diff
- HDF5 object reference!
```