# Predict

This notebook demonstrates how to leverage the model to make object detection predictions.


In the previous segment we showed how to view the ground truth annotation boxes. Now we will use an OVD (open-vocabulary detection) model to make predictions
on the same images so that we can evaluate them next. Let's initialize Elsa once more. Unless you have already set the configuration, you must pass the image directories.

In [1]:
from elsa import Elsa
google = "/home/redacted/Downloads/yolo/images/"
bing = '/home/redacted/Downloads/images/'
files = google, bing
elsa = Elsa.from_unified(files=files, quiet=True)
elsa

BSV_0
BSV_1
BSV_10
BSV_100
BSV_101
...
GSV_95
GSV_96
GSV_97
GSV_98
GSV_99


In [2]:
# Alternatively, edit the elsa.local.config files in your clone: add entries whose key is your username and whose value is the image path
from elsa.local import config
config['files']['bing'] 
config['files']['google']

{'redacted': '/home/redacted/Downloads/yolo/images/',
 'redacted': '/scratch/datasets/redacted/label_1k/old/google/images'}
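The per-user lookup described above can be sketched as a nested dictionary keyed first by image source and then by username. This is only an illustration of the idea: `files_config`, `path_for`, the username `alice`, and the paths are hypothetical and are not taken from `elsa.local.config`.

```python
import getpass

# Hypothetical stand-in for config['files']: source -> {username: image directory}
files_config = {
    "google": {"alice": "/data/alice/google/images"},
    "bing": {"alice": "/data/alice/bing/images"},
}


def path_for(source, user=None):
    """Return the image directory configured for `user` (default: the current user)."""
    user = user or getpass.getuser()
    try:
        return files_config[source][user]
    except KeyError:
        raise KeyError(f"no {source!r} path configured for user {user!r}") from None


print(path_for("bing", user="alice"))
```

If no entry exists for your username, a lookup like this fails, which is why the image directories must otherwise be passed explicitly.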

By default, we are predicting with the [Open-GroundingDINO model](https://github.com/longzw1997/Open-GroundingDino).  
No parameters are required: you can start inference just by calling `elsa.predict()`! However, here are some 
parameters that you can set:

- outdir: Directory where the predictions will be saved. Defaults to `predict/` under the current working directory.
- batch_size: Batch size to use for inference. The default is 1, which will be slow; increase it based on your system's capacity.
- synonyms: Number of synonymous prompts to use for each class.
- config: Model configuration file. We use `config/cfg_odvg.py` by default. [See Open-GroundingDino.](https://github.com/longzw1997/Open-GroundingDino?tab=readme-ov-file#config)
- checkpoint: Model weights. We use `GroundingDINO-T (fine-tune)` by default. [See Open-GroundingDino.](https://github.com/longzw1997/Open-GroundingDino?tab=readme-ov-file#results-and-models)
- force: If True, overwrites existing predictions.
- prompts:
    - list/Series/ndarray: boolean mask, aligned with `elsa.prompts`, selecting which prompts to use.
    - int: Number of prompts to use: 5 means the first 5 prompts will be used.
- files:
    - list/Series/ndarray: boolean mask, aligned with `elsa.files`, selecting which files to predict.
    - int: Number of files to predict: 5 means the first 5 files will be used.
    - Note: restricting `files` makes each prediction file a subset of the full prediction. To produce a full prediction later, you will need to pass `force=True` to overwrite it.
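To make the two selector forms concrete, here is a minimal sketch of how an int and a boolean mask could each pick a subset of items. `resolve_selector` is a hypothetical helper written for illustration only; it is not part of Elsa, and the library's actual handling of `prompts` and `files` may differ.

```python
import numpy as np


def resolve_selector(items, selector=None):
    """Hypothetical helper: return the subset of `items` chosen by `selector`."""
    items = np.asarray(items, dtype=object)
    if selector is None:
        return items  # no restriction: use everything
    if isinstance(selector, (int, np.integer)) and not isinstance(selector, bool):
        return items[:selector]  # int N: the first N entries
    mask = np.asarray(selector, dtype=bool)
    if mask.shape != items.shape:
        raise ValueError("boolean mask must be aligned with the items")
    return items[mask]  # mask: entries where the mask is True


toy_prompts = ["a person", "an individual", "a human", "a person walking"]
first_two = resolve_selector(toy_prompts, 2)
masked = resolve_selector(toy_prompts, [True, False, False, True])
print(first_two.tolist())
print(masked.tolist())
```

The key point is alignment: a mask must have exactly one boolean per prompt (or per file), in the same order as the collection it selects from.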


### Quick Test Run
Before we start predicting, let's do a quick test run with the first 2 prompts and the first 2 files. Passing `force=True` makes this prediction run regardless of what is already in your output directory.

In [3]:
batch_size = 4
prompts = 2 
elsa.predict(batch_size=batch_size, prompts=prompts, files=2, force=True)

2024-11-21 18:32:35.088707: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-21 18:32:35.097329: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-21 18:32:35.099824: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
  0%|          | 0/2 [00:00<?, ?it/s]INFO     truth.unique.stacked.consumed.prompts.ilabels_string
INFO         classes.ilabels_string
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]


final text_encoder_type: bert-base-uncased
load tokenizer done.


  checkpoint = torch.load(checkpoint, map_location="cpu")
  with torch.cuda.amp.autocast(enabled=False):
INFO     labels.char2label
INFO     cat2char
  with torch.cuda.amp.autocast(enabled=False):
100%|██████████| 2/2 [00:02<00:00,  1.36s/it]


| natural | outpath | iprompt |
| --- | --- | --- |
| a person | /home/redacted/PycharmProjects/sirius/notebook... | 120 |
| an individual | /home/redacted/PycharmProjects/sirius/notebook... | 121 |


### Full Run
By default, the output was saved to the `predict` folder under the current working directory. We checked the folder and everything looks good.
Now we can run inference on the full dataset. This will take a very long time, however, even with a large batch size, so the rest of the notebook
discusses how you can limit the number of prompts for which inference is run. If compute is not an issue, you may skip the rest of the notebook.

In [None]:
elsa.predict(outdir='~/Downloads/predictions', batch_size=8)
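Whether `elsa.predict` expands a leading `~` in `outdir` is not stated in this notebook. A cautious sketch, assuming it might not, is to expand the path yourself with the standard library before passing it in:

```python
from pathlib import Path

# Expand "~" to the current user's home directory before handing the path over.
outdir = Path("~/Downloads/predictions").expanduser()
print(outdir)

# elsa.predict(outdir=str(outdir), batch_size=8)
```

Note that `~/Downloads` refers to a folder in your home directory, whereas `~Downloads` (without the slash) would be read as the home directory of a user named `Downloads`.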


## Limiting Synonymous Prompts
Each class has many synonymous prompts, which you can see in `elsa.prompts`. By default, all synonymous prompts are used for a class. However, this may be overkill
for the average user and can result in a run that takes days on a personal computer. You can use the `synonyms` parameter to limit the number
of synonymous prompts used for each class. If you want to run inference on a personal computer for a quick analysis that is not intended for publication, consider
limiting the number of synonymous prompts per class to fewer than 5. This parameter prioritizes the most diverse prompts. For example, a class
might have the following prompts:
- a person walking
- a person strolling
- an individual walking
- an individual strolling

There may be many more prompts available by default than the user
requires, so there is a need to limit the number of synonymous prompts.
However, it is not sufficient to simply take the first N prompts.
For example, selecting the first two prompts would result in:
- a person walking
- a person strolling

These prompts are not sufficiently varied. The `synonyms` parameter ensures that the most varied synonymous prompts are used.
In this case, the two selected prompts would be:
- a person walking
- an individual strolling
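
One way to picture "most varied first" is a greedy selection that repeatedly picks the prompt sharing the fewest words with those already chosen. The sketch below uses Jaccard distance over word sets; this is an assumption made for illustration and is not necessarily how the `synonyms` parameter is implemented. `jaccard_distance` and `pick_diverse` are hypothetical names.

```python
def jaccard_distance(a, b):
    """1 - (shared words / all words) between two prompts."""
    words_a, words_b = set(a.split()), set(b.split())
    return 1 - len(words_a & words_b) / len(words_a | words_b)


def pick_diverse(prompts, n):
    """Greedily pick up to n prompts, each as far as possible from those already chosen."""
    if n <= 0 or not prompts:
        return []
    chosen = [prompts[0]]
    remaining = list(prompts[1:])
    while remaining and len(chosen) < n:
        # Score each candidate by its distance to the closest already-chosen prompt.
        best = max(remaining, key=lambda p: min(jaccard_distance(p, c) for c in chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen


candidates = [
    "a person walking",
    "a person strolling",
    "an individual walking",
    "an individual strolling",
]
print(pick_diverse(candidates, 2))
# ['a person walking', 'an individual strolling']
```

With these four prompts, "an individual strolling" shares no words with "a person walking", so it is picked second, matching the example above.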


In [None]:
elsa.predict(outdir='~/Downloads/predictions', batch_size=8, synonyms=4)

## Run with Specific Prompts

In the previous chapter, 02-combos.ipynb, we went over how to visualize specific ground truth annotation boxes.
Similarly, we can specify which prompts to run inference on, so that you don't have to run the full inference, which would likely take days.
The features for selecting prompts mostly overlap with those of the `Combos` class because both represent a class.

The following columns are of interest:
- natural: "natural" language prompt: the realistic and "human" prompt that is fed to the model
- ilabels: ordered tuple of the label IDs representing a given prompt. For example, if the labels metadata contains the mapping {'person': 0, 'walking': 5}, a bounding box representing 'person walking' has the ilabels (0, 5). These are our "classes" in this open set classification problem.
- level: level of the prompt, i.e. its category characters, e.g. cs, csa, csao


The following methods are of interest:
- includes: True where prompt has a label or category
- excludes: False where prompt has a label or category
- contains_substring: True where prompt label contains a substring
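
The semantics of these three masks can be mimicked with plain pandas on a toy table. The frame below, its `labels` column, and its index values are made up for illustration; they do not reproduce the real structure of `elsa.prompts`.

```python
import pandas as pd

# Toy stand-in for a prompts table: one row per prompt, with its labels as a tuple.
toy = pd.DataFrame(
    {
        "natural": ["a person", "a person walking", "two people riding a wheelchair"],
        "labels": [("person",), ("person", "walking"), ("two people", "riding wheelchair")],
    },
    index=pd.Index([0, 1, 2], name="iprompt"),
)

# "includes": True where the prompt has the label
includes_person = toy["labels"].apply(lambda labels: "person" in labels)
# "excludes": False where the prompt has the label (the negation of includes)
excludes_person = ~includes_person
# "contains_substring": True where the prompt text contains the substring
has_wheelchair = toy["natural"].str.contains("wheelchair", regex=False)

print(toy.loc[includes_person, "natural"].tolist())
print(toy.loc[has_wheelchair, "natural"].tolist())
```

Because each result is a boolean Series aligned with the table, masks can be combined with `&` and `|` before being used for selection.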


In [10]:
prompts = elsa.prompts
prompts[['natural', 'level']]

| iprompt | natural | level |
| --- | --- | --- |
| 120 | a person | c |
| 121 | an individual | c |
| 122 | a human | c |
| 81 | a person sitting on a chair or a bench | cs |
| 82 | an individual sitting on a chair or a bench | cs |
| ... | ... | ... |
| 1021 | two people riding a wheelchair | cs |
| 1022 | a pair on a wheelchair | cs |
| 1023 | a pair riding a wheelchair | cs |
| 1024 | two humans on a wheelchair | cs |


#### Run Inference where a specific label is included
Here we create a boolean mask, selecting all the prompts that contain the 'person' label.

In [11]:
prompts = elsa.prompts.includes('person').values
elsa.prompts.natural.loc[prompts]
# elsa.predict(outdir=..., prompts=prompts)

truth.unique.stacked.consumed.prompts.natural
iprompt
120                                       a person
121                                  an individual
122                                        a human
81          a person sitting on a chair or a bench
82     an individual sitting on a chair or a bench
                          ...                     
823                   a person riding a wheelchair
824                  an individual on a wheelchair
825              an individual riding a wheelchair
826                        a human on a wheelchair
827                    a human riding a wheelchair
Name: natural, Length: 291, dtype: category
Categories (830, object): ['a child pushing a stroller and strolling to c..., 'a child pushing a stroller and walking to cro..., 'a child standing', 'a child strolling', ..., 'two people walking and riding a wheelchair', 'two people walking to cross a crosswalk', 'two people walking to cross a crosswalk and o..., 'two people walking to cro

#### Run inference for only CSAO-level prompts
We can also select only the prompts at a given level.
For a class or prompt to be CSAO-level, it must have one label from each of the following categories:
- condition
- state
- activity
- others

An example would be "A person walking to cross a crosswalk with a pet."
- person: condition
- walking: state
- crossing crosswalk: activity
- pet: others
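
The level string can be thought of as one character per category present in the prompt, in a fixed order. The sketch below assumes the characters `c`, `s`, `a`, `o` for condition, state, activity, and others; the `a` and `o` characters are inferred from the level names (`csa`, `csao`) rather than shown in the notebook output. `CAT_CHAR` and `level_of` are hypothetical names.

```python
# Assumed category -> character mapping, in level order.
CAT_CHAR = {"condition": "c", "state": "s", "activity": "a", "others": "o"}


def level_of(categories):
    """Build a level string such as 'cs' or 'csao' from the categories of a prompt's labels."""
    present = set(categories)
    return "".join(char for cat, char in CAT_CHAR.items() if cat in present)


# "A person walking to cross a crosswalk with a pet."
example = {
    "person": "condition",
    "walking": "state",
    "crossing crosswalk": "activity",
    "pet": "others",
}
print(level_of(example.values()))
# csao
```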

In [12]:
prompts = elsa.prompts.level == 'csao'
elsa.prompts.natural.loc[prompts]
# elsa.predict(outdir=..., prompts=prompts)

truth.unique.stacked.consumed.prompts.natural
iprompt
996       a person with a pet walking with a mobility aid
997       a person with a dog walking with a mobility aid
998     a person with a pet strolling with a mobility aid
999     a person with a dog strolling with a mobility aid
1000    an individual with a pet walking with a mobili...
                              ...                        
973     a pair including a child strolling to cross a ...
974     two humans including a kid walking to cross a ...
975     two humans including a child walking to cross ...
976     two humans including a kid strolling to cross ...
977     two humans including a child strolling to cros...
Name: natural, Length: 178, dtype: category
Categories (830, object): ['a child pushing a stroller and strolling to c..., 'a child pushing a stroller and walking to cro..., 'a child standing', 'a child strolling', ..., 'two people walking and riding a wheelchair', 'two people walking to cross a crosswalk', 

#### Run Inference where there are two activities
Perhaps we don't want to include or exclude any particular labels, but rather select the prompts that have two activities.
For this we need to interact with the individual labels of the prompts, rather than the aggregate. We can view the individual
labels of the prompts in "stacked" form by accessing `elsa.prompts.stacked`.

## Stacked

The following columns are of interest:
- ilabels: ordered tuple of the label IDs representing a given prompt. For example, if the labels metadata contains the mapping {'person': 0, 'walking': 5}, a bounding box representing 'person walking' has the ilabels (0, 5). These are our "classes" in this open set classification problem.
- label: name of the label that comprises the prompt
- natural: "natural" version of the label. For example, "person" becomes "a person". These are concatenated to create the natural prompt.

The following methods are of interest:
- includes: True where a stacked synonym is equivalent to a label, or belongs to a category
- excludes: False where a stacked synonym is equivalent to a label, or belongs to a category
- contains_substring: True where stacked synonym's natural label contains a substring
- get_nunique_labels: Pass a boolean mask aligned with the stacked labels from the prompts, and this will return the number of unique labels for each prompt.
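
To illustrate what counting unique labels per prompt means, here is a pandas sketch on a toy stacked table with one row per (prompt, label) pair. The data and the groupby approach are assumptions for illustration; they are not the actual implementation of `get_nunique_labels`.

```python
import pandas as pd

# Toy stacked table: the index repeats the prompt ID once per label in that prompt.
toy_stacked = pd.DataFrame(
    {
        "label": ["person", "walking",
                  "person", "talking", "crossing crosswalk",
                  "person", "talking"],
        "cat": ["condition", "state",
                "condition", "activity", "activity",
                "condition", "activity"],
    },
    index=pd.Index([0, 0, 1, 1, 1, 2, 2], name="iprompt"),
)

# Boolean mask aligned with the stacked rows: True where the label is an activity.
is_activity = toy_stacked["cat"] == "activity"

# Count unique masked labels per prompt; prompts with no match get 0.
n_activities = (
    toy_stacked.loc[is_activity]
    .groupby(level="iprompt")["label"]
    .nunique()
    .reindex(toy_stacked.index.unique(), fill_value=0)
)
two_activities = n_activities == 2
print(n_activities.to_dict())
# {0: 0, 1: 2, 2: 1}
```

The result has one entry per prompt rather than per stacked row, so it can be used directly as a mask over the prompts.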



In [13]:
elsa.prompts.stacked

| iprompt | ilabels | label | isyn | iorder | ilabel | isyns | natural | cat | cat_char | catchars | prompt | labelchar | labelchars |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 120 | (0,) | person | 0 | subject | 0 | (0,) | a person | condition | c | c cccccc | a person | A | A AAAAAA |
| 121 | (0,) | individual | 1 | subject | 0 | (1,) | an individual | condition | c | cccccccccc | an individual | A | AAAAAAAAAA |
| 122 | (0,) | human | 4 | subject | 0 | (4,) | a human | condition | c | ccccc | a human | A | AAAAA |
| 81 | (0, 3) | person | 0 | subject | 0 | (0, 75) | a person | condition | c | cccccc | a person | A | AAAAAA |
| 81 | (0, 3) | sitting on chair or bench | 75 | verb | 3 | (0, 75) | sitting on a chair or a bench | state | s | sssssss sssss sssss | sitting on a chair or a bench | D | DDDDDDD DDDDD DDDDD |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1023 | (2, 34) | riding wheelchair | 43 | verb | 34 | (22, 43) | riding a wheelchair | state | s | ssssss ssssssssss | riding a wheelchair | c | cccccc cccccccccc |
| 1024 | (2, 34) | two humans | 24 | subject | 2 | (24, 40) | two humans | condition | c | ccc cccccc | two humans | C | CCC CCCCCC |
| 1024 | (2, 34) | on wheelchair | 40 | verb | 34 | (24, 40) | on a wheelchair | state | s | ssssssssss | on a wheelchair | c | cccccccccc |
| 1025 | (2, 34) | two humans | 24 | subject | 2 | (24, 43) | two humans | condition | c | ccc cccccc | two humans | C | CCC CCCCCC |


In [4]:
# Generate a boolean mask of which stacked prompt labels belong to the 'activity' category
activity = elsa.prompts.stacked.includes(cat='activity').values
# Select the prompts that have exactly two unique activity labels
activities = elsa.prompts.stacked.get_nunique_labels(activity) == 2
elsa.prompts.loc[activities, 'natural']
# elsa.predict(outdir=..., prompts=activities)

iprompt
813    an individual talking and standing to cross a ...
815    an individual chatting and standing to cross a...
816    a person talking and standing to cross a cross...
818    a person chatting and standing to cross a cros...
819    a human talking and standing to cross a crosswalk
                             ...                        
230    two people chatting and walking to cross a cro...
231    two humans talking and strolling to cross a cr...
232    two humans talking and walking to cross a cros...
235    two humans chatting and strolling to cross a c...
236    two humans chatting and walking to cross a cro...
Name: natural, Length: 121, dtype: category
Categories (830, object): ['a child pushing a stroller and strolling to c..., 'a child pushing a stroller and walking to cro..., 'a child standing', 'a child strolling', ..., 'two people walking and riding a wheelchair', 'two people walking to cross a crosswalk', 'two people walking to cross a crosswalk and o..., 'two p