# Resources

This notebook demonstrates how to encapsulate related resources into a single Elsa object.

The repository already contains the ground truth annotations, label metadata, and image metadata for the 924 images and 
4.3K annotated bounding boxes included in ELSA. Although these are loaded by default, you may also pass your own resources
to the constructor to use ELSA on your own data.

## Parameters:
### Images
A path to the images metadata DataFrame.

It must contain the following columns:
- ifile: string identifier for each file e.g. GSV_1
- file: name of each file e.g. 103331102130110202_x4_cropped

| ifile     | file                             |
|-----------|----------------------------------|
| GSV_190   | 7AEzt7nUCjgfLmZ5PJXUkg_right_cropped |
| GSV_177   | 6uDfXfuXpWk4KnnsrDup8w_left_cropped  |

<br><br><br>

### Labels
A path to the labels metadata DataFrame.

It must contain the following columns:
- label: string the label represents e.g. 'alone' or 'group'
- ilabel: integer identifier of the label 

| label           | ilabel |
|-----------------|--------|
| alone           | 0   |
| group           | 1   |
| couple/2people  | 2   |
| sitting         | 3   |
| standing        | 4   |
| walking         | 5   |

<br><br><br>
  
### Truth
A path to the ground truth annotations DataFrame.

It must contain the following columns:
- ilabel: integer identifier of the label

It must contain one of the following columns:
- ifile: string identifier for each file e.g. GSV_1
- file: name of each file

It must contain some combination of the following columns. Depending on what has been provided, the rest are dynamically computed:
- Absolute coordinates (in pixels):
    - x: x-coordinate of the bounding box in pixels
    - y: y-coordinate of the bounding box in pixels
    - height: height of the bounding box in pixels
    - width: width of the bounding box in pixels
    - xmin: minimum x-coordinate of the bounding box in pixels
        - w: may be used synonymously as the 'western' bound
    - ymin: minimum y-coordinate of the bounding box in pixels
        - s: may be used synonymously as the 'southern' bound
    - xmax: maximum x-coordinate of the bounding box in pixels
        - e: may be used synonymously as the 'eastern' bound
    - ymax: maximum y-coordinate of the bounding box in pixels
        - n: may be used synonymously as the 'northern' bound
- Normalized coordinates (as ratios of the image width and height):
    - normx: normalized x-coordinate of the bounding box, as a ratio of the image width
    - normy: normalized y-coordinate of the bounding box, as a ratio of the image height
    - normheight: normalized height of the bounding box, as a ratio of the image height
    - normwidth: normalized width of the bounding box, as a ratio of the image width
    - normxmin: normalized minimum x-coordinate of the bounding box, as a ratio of the image width
        - normw: may be used synonymously as the normalized 'western' bound
    - normymin: normalized minimum y-coordinate of the bounding box, as a ratio of the image height
        - norms: may be used synonymously as the normalized 'southern' bound
    - normxmax: normalized maximum x-coordinate of the bounding box, as a ratio of the image width
        - norme: may be used synonymously as the normalized 'eastern' bound
    - normymax: normalized maximum y-coordinate of the bounding box, as a ratio of the image height
        - normn: may be used synonymously as the normalized 'northern' bound
                        
| ifile | normx | normy | normwidth | normheight | ilabel |
|-------|-------|-------|-----------|------------|--------|
| BSV_60 | 0.51  | 0.50  | 0.63      | 0.54       | 13     |
| BSV_60 | 0.51  | 0.50  | 0.63      | 0.54       | 22     |
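
The relationship between the center-based and the bound-based columns can be sketched as follows. This is a minimal illustration, assuming that `normx` and `normy` denote the center of the box, which is consistent with the sample `elsa.truth` output shown in this notebook. The function name `bounds_from_center` is hypothetical and not part of ELSA.

```python
def bounds_from_center(normx, normy, normwidth, normheight):
    """Return (normxmin, normymin, normxmax, normymax) for a center-based box."""
    half_w = normwidth / 2
    half_h = normheight / 2
    return (normx - half_w, normy - half_h, normx + half_w, normy + half_h)


# Values of the first annotation in the elsa.truth output of this notebook.
xmin, ymin, xmax, ymax = bounds_from_center(0.015832, 0.339149, 0.029275, 0.226827)
print(xmin, ymin, xmax, ymax)
```

The printed values should agree, up to rounding, with the `normxmin`, `normymin`, `normxmax`, and `normymax` columns of that annotation.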

<br><br><br>

### Files
A path, or list of paths, to the directories containing the images, with filenames matching the 'file' column in the images metadata.

The images themselves are not included in the repository and must be downloaded separately. 
The image directories must then be passed as a parameter if you are to do any visualization or prediction tasks. The 
[README](https://github.com/redacted/SIRiUS/blob/main/README.md) contains instructions on how to download the images. 
ELSA includes annotations for images provided by both Google Street View and Bing Street Viewer; to include both directories,
pass them as a tuple, for example `(bing_dir, google_dir)`.

<br><br><br>
### Quiet
Silence warnings.

<br><br><br>
 

## Using the Provided Resources

We have provided constructors for the Bing, Google, and unified (both Bing and Google) datasets. You may use the `Elsa.from_bing`, 
`Elsa.from_google`, and `Elsa.from_unified` constructors respectively to load these datasets. As mentioned for the files parameter,
you must pass the path or paths to the directories containing the images. 

In [1]:
from elsa import Elsa
# Modify with your file paths
bing = '/Archive/bing'
google = '/Archive/google'
files = bing, google
elsa = Elsa.from_unified(files=files, quiet=True)
elsa

BSV_0
BSV_1
BSV_10
BSV_100
BSV_101
...
GSV_95
GSV_96
GSV_97
GSV_98
GSV_99


It might get annoying passing the image directories every time! In [src/elsa/local.py](https://github.com/redacted/SIRiUS/blob/HEAD/src/elsa/local.py) 
you can map your username to a directory path to configure the default directories for your particular machine. In my case, I have:

In [None]:
config = {
    "files": {
        "bing": {
            "redacted": "/home/redacted/Downloads/Archive/bing",
        },
        "google": {
            "redacted": "/home/redacted/Downloads/Archive/google",
        }
    }
}
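
To illustrate how a per-user mapping of this shape could be resolved, here is a minimal sketch. It is an assumption for illustration only: the helper `resolve_dirs` and the username `alice` are hypothetical, and the actual lookup in `src/elsa/local.py` may work differently.

```python
import getpass

# Hypothetical config with the same shape as the one above;
# "alice" stands in for a real username.
config = {
    "files": {
        "bing": {"alice": "/home/alice/Downloads/Archive/bing"},
        "google": {"alice": "/home/alice/Downloads/Archive/google"},
    }
}


def resolve_dirs(config, user=None):
    """Return the image directories configured for `user` (default: current user)."""
    user = user or getpass.getuser()
    return tuple(
        by_user[user]
        for by_user in config["files"].values()
        if user in by_user
    )


print(resolve_dirs(config, user="alice"))
```

A user with no entry simply gets an empty tuple, in which case the directories would still have to be passed explicitly.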



Instantiating Elsa wraps these resources so that you can interact with them easily. Elsa will warn you about files that are missing from the metadata, the image directories, or the annotations.
With the default directories configured, I can instantiate Elsa without any parameters:

In [2]:
from elsa import Elsa

elsa = Elsa.from_unified(quiet=True)
elsa

BSV_0
BSV_1
BSV_10
BSV_100
BSV_101
...
GSV_95
GSV_96
GSV_97
GSV_98
GSV_99



## Using Your Own Resources

For your own purposes you may pass your own ground truth annotations, image metadata, label metadata, and image files to the base `Elsa.from_resources`
constructor. Please follow the data structure of the resources included in the repository. For including a single dataset, refer to the 
[truth](https://github.com/redacted/SIRiUS/blob/main/gt_data/triple_inspected_May23rd/bing/bing_gt.csv), 
[images](https://github.com/redacted/SIRiUS/blob/main/gt_data/triple_inspected_May23rd/bing/images.csv), 
and [labels](https://github.com/redacted/SIRiUS/blob/main/gt_data/triple_inspected_May23rd/bing/label_idx.csv) 
available for the Bing dataset in the repository. For including a unified dataset, refer to the 
[truth](https://github.com/redacted/SIRiUS/blob/main/gt_data/triple_inspected_May23rd/merged/label_per_box_sanity_checked_removed_unwanted_labels_unified_labels_after_distr_thresholding.csv),
[images](https://github.com/redacted/SIRiUS/blob/main/gt_data/triple_inspected_May23rd/merged/images.csv),
and [labels](https://github.com/redacted/SIRiUS/blob/main/gt_data/triple_inspected_May23rd/merged/label_id_dict_after_distr_thresholding.csv) 
available for the unified dataset in the repository.


In [5]:
from elsa import Elsa

elsa = Elsa.from_resources(
    truth=...,
    images=...,
    labels=...,
    files=...,
)
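
As a rough sketch of what minimal resource files might look like, the snippet below writes three small CSVs using only the standard library. The column names follow the Parameters section of this notebook. The identifiers `MY_0` and `street_0001`, the helper `write_csv`, and the image directory in the final comment are made up for illustration. Check the CSVs linked above for the exact headers used in the repository.

```python
import csv
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())


def write_csv(path, header, rows):
    """Write a header and rows to `path` and return the path."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path


# Image metadata: one row per image.
images = write_csv(workdir / "images.csv", ["ifile", "file"], [["MY_0", "street_0001"]])

# Label metadata: one row per label.
labels = write_csv(workdir / "labels.csv", ["label", "ilabel"], [["alone", 0], ["walking", 5]])

# Ground truth: one row per (box, label) pair, here one box with two labels.
truth = write_csv(
    workdir / "truth.csv",
    ["ifile", "normx", "normy", "normwidth", "normheight", "ilabel"],
    [["MY_0", 0.51, 0.50, 0.63, 0.54, 0], ["MY_0", 0.51, 0.50, 0.63, 0.54, 5]],
)

# These paths would then be passed to the constructor, e.g.:
# elsa = Elsa.from_resources(truth=truth, images=images, labels=labels, files="/path/to/images")
```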

## Truth
`elsa.truth` contains the ground truth annotations and the relevant data. Any additional columns included in the CSV will be included as well.

The following columns are dynamically computed using the 'nfile' column, which assigns an integer to each file. They are necessary for performing geometric operations across the whole dataset while ensuring that boxes from different files never overlap. 
- fx: "file x" or the x-coordinate of the bounding box, including the nfile offset
- fy: "file y" or the y-coordinate of the bounding box, including the nfile offset
- fw: "file western" or the 'western' bound, or minimum x-coordinate of the bounding box, including the nfile offset
- fs: "file southern" or the 'southern' bound, or minimum y-coordinate of the bounding box, including the nfile offset
- fe: "file eastern" or the 'eastern' bound, or maximum x-coordinate of the bounding box, including the nfile offset
- fn: "file northern" or the 'northern' bound, or maximum y-coordinate of the bounding box, including the nfile offset
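
The idea behind a per-file offset can be sketched as follows. This is only one way such an offset could work and is an assumption, not necessarily how ELSA computes these columns: normalized bounds lie in [0, 1], so shifting them by an integer file number places each file in its own disjoint region. The helpers `offset_box` and `overlaps` are hypothetical.

```python
def offset_box(box, nfile):
    """Shift normalized bounds (w, s, e, n) by an integer file offset (assumed scheme)."""
    w, s, e, n = box
    return (w + nfile, s + nfile, e + nfile, n + nfile)


def overlaps(a, b):
    """Return True if two (w, s, e, n) boxes intersect with positive area."""
    a_w, a_s, a_e, a_n = a
    b_w, b_s, b_e, b_n = b
    return a_w < b_e and b_w < a_e and a_s < b_n and b_s < a_n


box = (0.2, 0.2, 0.6, 0.6)

# Identical normalized bounds in two different files overlap without an offset...
print(overlaps(box, box))
# ...but not once each box is shifted by its own file number.
print(overlaps(offset_box(box, 0), offset_box(box, 1)))
```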


The following columns are dynamically computed:
- ibox: integer identifier for each unique bounding box in the ground truth annotations. Annotations with the same bounds and file will have the same ibox.
- ilabels: ordered tuple of the label IDs representing a given combo. For example, if the labels metadata contains the mapping 'alone': 0, 'walking': 5, a bounding box representing 'alone walking' has the ilabels (0, 5). These are our "classes" in this open set classification problem.
- iclass: integer identifier of each ilabels, or "class" in the open set classification problem; for example, 'alone walking' or (0, 5) may represent class 0.
- iann: unique integer identifier of each annotation, or row in the annotation DataFrame
- cat: category of the label: 'condition', 'state', 'activity', or 'others'
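
The relationship between `ibox`, `ilabels`, and `iclass` can be sketched in plain Python. This is a minimal illustration on toy data, assuming identifiers are assigned in order of first appearance. The actual numbering in ELSA may differ, and the variable names are hypothetical.

```python
# Toy annotations: (ifile, normw, norms, norme, normn, ilabel); values are illustrative.
annotations = [
    ("BSV_0", 0.00, 0.22, 0.03, 0.45, 0),
    ("BSV_0", 0.00, 0.22, 0.03, 0.45, 5),
    ("BSV_0", 0.01, 0.24, 0.06, 0.46, 0),
    ("BSV_0", 0.01, 0.24, 0.06, 0.46, 5),
    ("GSV_1", 0.10, 0.20, 0.30, 0.40, 2),
]

# ibox: annotations sharing the same file and bounds share one identifier.
ibox_of = {}
labels_of = {}
for ifile, w, s, e, n, ilabel in annotations:
    key = (ifile, w, s, e, n)
    ibox = ibox_of.setdefault(key, len(ibox_of))
    labels_of.setdefault(ibox, set()).add(ilabel)

# ilabels: ordered tuple of the label IDs attached to each box.
ilabels_of = {ibox: tuple(sorted(labels)) for ibox, labels in labels_of.items()}

# iclass: one integer per distinct ilabels combination.
iclass_of = {}
for combo in ilabels_of.values():
    iclass_of.setdefault(combo, len(iclass_of))

print(ilabels_of)
print(iclass_of)
```

Here the first two boxes carry the same combination of labels and therefore fall into the same class, while the third box forms a class of its own.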


The following methods are of interest:
- include: add labels to the annotations
- exclude: remove labels from the annotations



In [3]:
elsa.truth

iann,ifile,normx,normy,normwidth,normheight,ilabel,is_challenging,data_source,unique_ifile,num_labels,...,label,normxmin,normw,normymin,norms,normxmax,norme,normymax,normn,ibox
0,BSV_0,0.015832,0.339149,0.029275,0.226827,0,False,BSV,BSV_0,3,...,alone,0.001195,0.001195,0.225736,0.225736,0.030469,0.030469,0.452563,0.452563,0
1,BSV_0,0.015832,0.339149,0.029275,0.226827,21,False,BSV,BSV_0,3,...,phone interaction,0.001195,0.001195,0.225736,0.225736,0.030469,0.030469,0.452563,0.452563,0
2,BSV_0,0.015832,0.339149,0.029275,0.226827,5,False,BSV,BSV_0,3,...,walking,0.001195,0.001195,0.225736,0.225736,0.030469,0.030469,0.452563,0.452563,0
3,BSV_0,0.036444,0.352781,0.054964,0.223555,0,False,BSV,BSV_0,2,...,alone,0.008962,0.008962,0.241003,0.241003,0.063926,0.063926,0.464558,0.464558,1
4,BSV_0,0.036444,0.352781,0.054964,0.223555,5,False,BSV,BSV_0,2,...,walking,0.008962,0.008962,0.241003,0.241003,0.063926,0.063926,0.464558,0.464558,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10439,GSV_365,0.860038,0.305085,0.042595,0.170039,5,False,GSV,GSV_365,3,...,walking,0.838741,0.838741,0.220066,0.220066,0.881335,0.881335,0.390105,0.390105,4333
10440,GSV_365,0.860038,0.305085,0.042595,0.170039,32,False,GSV,GSV_365,3,...,elderly,0.838741,0.838741,0.220066,0.220066,0.881335,0.881335,0.390105,0.390105,4333
10441,GSV_365,0.919853,0.347953,0.059707,0.195907,2,False,GSV,GSV_365,3,...,couple/2people,0.890000,0.890000,0.250000,0.250000,0.949707,0.949707,0.445907,0.445907,4334
10442,GSV_365,0.919853,0.347953,0.059707,0.195907,5,False,GSV,GSV_365,3,...,walking,0.890000,0.890000,0.250000,0.250000,0.949707,0.949707,0.445907,0.445907,4334


## Images
`elsa.images` contains the images metadata.

The following columns are dynamically computed:
- width: width of the image in pixels
- height: height of the image in pixels
- path: path to the image file
- nfile: integer identifier for each unique file



In [4]:
elsa.images

ifile,file,download
BSV_265,023301210220101210_x4_cropped,bing
BSV_323,020310023302030102_x4_cropped,bing
BSV_272,103331120131131310_x4_cropped,bing
BSV_258,103330233121001302_x4_cropped,bing
BSV_580,103330322201212202_x4_cropped,bing
...,...,...
GSV_182,5K16xk-KhmDX0eAWpeOl2g_right_cropped,google
GSV_154,7jNcFc30HLtiMBYXaMEjWw_right_cropped,google
GSV_284,EgyZMQF7WifMzJQt29dGhg_right_cropped,google
GSV_67,0H2a1nFUT8lpS1Ujs8N0AQ_right_cropped,google


## Files

`elsa.files` contains the filenames and their paths according to which directories were passed.

The following columns are dynamically computed:
- nboxes: number of bounding boxes in the image
    

In [4]:
#  elsa.files contains the paths and filenames from the directories included. 
elsa.files

ifile,path,file
BSV_545,/home/redacted/Downloads/Archive/bing/10333110...,103331102130110202_x4_cropped
BSV_7,/home/redacted/Downloads/Archive/bing/10333210...,103332101110110302_x4_cropped
BSV_584,/home/redacted/Downloads/Archive/bing/10333032...,103330322130221310_x4_cropped
BSV_615,/home/redacted/Downloads/Archive/bing/10333032...,103330322123201010_x4_cropped
BSV_327,/home/redacted/Downloads/Archive/bing/02100033...,021000331300322010_x4_cropped
...,...,...
GSV_72,/home/redacted/Downloads/Archive/google/mKQxoW...,mKQxoWgznYIaFhQTkRIatA_right_cropped
GSV_53,/home/redacted/Downloads/Archive/google/mlItm_...,mlItm_4v1nITVQPzLSMAYw_right_cropped
GSV_336,/home/redacted/Downloads/Archive/google/ArgKub...,ArgKubZ99aQHTeJO_4PpOQ_left_cropped
GSV_109,/home/redacted/Downloads/Archive/google/0LnYT_...,0LnYT_xHjfVQqocbdTlJPw_right_cropped


## Labels

`elsa.labels` contains the label metadata.

The following columns are dynamically computed:
- cat: category of the label: 'condition', 'state', 'activity', or 'others'


In [5]:
# elsa.labels contains the label metadata
elsa.labels

label,ilabel
alone,0
group,1
couple/2people,2
sitting,3
standing,4
walking,5
running,6
biking,7
mobility aids,8
riding carriage,9
