Merged

30 commits
92db07f
Added get_detections_from_video_capture function.
ZachCafego Jun 9, 2023
5dedba1
Added create_tracks, type hints.
ZachCafego Jun 12, 2023
40cab2f
Adapt component to support video, added unittest
ZachCafego Jun 14, 2023
c6013a3
Updated get_classifications to prevent repeats of class names
ZachCafego Jun 15, 2023
fbae5a0
Added rollup unittest, modified video unittest.
ZachCafego Jun 21, 2023
d6205ac
Reverting back testing code.
ZachCafego Jun 29, 2023
dda1b54
New changes
ZachCafego Jul 7, 2023
861f234
Updated component to support batching.
ZachCafego Jul 11, 2023
a2336b2
Merge branch 'develop' into feat/clip-video
ZachCafego Jul 11, 2023
361795b
Fixed wonky git merge.
ZachCafego Jul 11, 2023
ad2e162
Fixed errors regarding cropped images and video files
ZachCafego Jul 13, 2023
cd1b1ca
Made changes to README
ZachCafego Jul 20, 2023
8f7861f
Added support for multiple CLIP models.
ZachCafego Sep 29, 2023
c6f83f5
Updated README file.
ZachCafego Sep 29, 2023
819fa8f
Fixed job property descriptions.
ZachCafego Sep 29, 2023
585c83d
Added tag to openmpf_clip_detection_triton_models image
ZachCafego Sep 29, 2023
f3e246e
Addressing PR changes.
ZachCafego Jan 5, 2024
7a271f2
More updated changes for PR
ZachCafego Jan 22, 2024
52408a4
Merge branch 'develop' into feat/clip-video
ZachCafego Jan 22, 2024
2227fcc
Update to PR.
ZachCafego Feb 9, 2024
a8ff906
Merge branch 'develop' into feat/clip-video
jrobble Feb 21, 2024
7ac1cd5
Changes for PR
ZachCafego Feb 29, 2024
9de71d4
Merge branch 'feat/clip-video' of https://github.com/openmpf/openmpf-…
ZachCafego Feb 29, 2024
aa1b5fe
Added comment
ZachCafego Mar 1, 2024
ce5ab60
Merge branch 'develop' into feat/clip-video
jrobble Mar 4, 2024
e31e379
Remove debug. Fix test.
jrobble Mar 5, 2024
2f5596c
Replace "detections" with "tracks" in log message.
jrobble Mar 28, 2024
66899e1
Check Triton model.
jrobble Mar 28, 2024
5adaf2b
Merge branch 'develop' into jrobble/clip-video-2
jrobble Mar 28, 2024
d6adb9f
Improve error handling.
jrobble Mar 29, 2024
4 changes: 4 additions & 0 deletions .gitignore
@@ -14,6 +14,7 @@ hs_err_pid*
*.devcontainer*

target/
venv/

# CMake Files
CMakeCache.txt
@@ -60,3 +61,6 @@ cmake-build-release/
# Python
*.egg-info
*.pyc

*.private
venv
3 changes: 2 additions & 1 deletion python/ClipDetection/Dockerfile
@@ -29,10 +29,11 @@
ARG MODELS_REGISTRY=openmpf/
ARG BUILD_REGISTRY
ARG BUILD_TAG=latest
FROM ${MODELS_REGISTRY}openmpf_clip_detection_models:7.2.0 as models
FROM ${MODELS_REGISTRY}openmpf_clip_detection_models:8.0.0 as models
FROM ${BUILD_REGISTRY}openmpf_python_executor_ssb:${BUILD_TAG}

COPY --from=models /models/ViT-B-32.pt /models/ViT-B-32.pt
COPY --from=models /models/ViT-L-14.pt /models/ViT-L-14.pt

RUN --mount=type=tmpfs,target=/var/cache/apt \
--mount=type=tmpfs,target=/var/lib/apt/lists \
44 changes: 42 additions & 2 deletions python/ClipDetection/README.md
@@ -6,6 +6,8 @@ This repository contains source code for the OpenMPF CLIP detection component. C

The following are the properties that can be specified for the component. Each property has a default value, so none of them need to be specified for processing jobs. Illustrative sketches for some of these properties appear after the list.

- `MODEL_NAME`: Specifies the CLIP model that is loaded and used by the component. The only supported models are 'ViT-L/14' (the default model) and 'ViT-B/32'.

- `NUMBER_OF_CLASSIFICATIONS`: Specifies how many of the top classifications you want to return. The default value is set to 1, and so you'll only see the classification with the greatest confidence.

- `CLASSIFICATION_PATH`: If specified, this allows the user to give the component a file path to their own list of classifications in a CSV file, if the COCO or ImageNet class lists aren't of interest. See below for the formatting that's required for that file.
@@ -14,16 +14,18 @@ The following are the properties that can be specified for the component. Each p

- `TEMPLATE_PATH`: If specified, this allows the user to give the component a file path to their own list of templates. See below for the formatting that's required for that file. The OpenAI developers admitted that the process of developing templates was a lot of trial and error, so feel free to come up with your own!

- `NUMBER_OF_TEMPLATES`: There are three template files that are included in the component, with the number of templates in each being 1, 7, and 80. The one template is a basic template, while the 7 and 80 come from the OpenAI team when trying to [improve performance](https://github.com/openai/CLIP/blob/main/notebooks/Prompt_Engineering_for_ImageNet.ipynb) on the ImageNet dataset. The default value is 80, while 1 and 7 are the only other valid inputs. Also this property is overridden if a `TEMPLATE_PATH` is specified.
- `TEMPLATE_TYPE`: There are three template files included in the component, containing 1, 7, and 80 templates, respectively. The single template is a basic one, while the sets of 7 and 80 come from the OpenAI team's work to [improve performance](https://github.com/openai/CLIP/blob/main/notebooks/Prompt_Engineering_for_ImageNet.ipynb) on the ImageNet dataset. The default value is 'openai_80', while 'openai_1' and 'openai_7' are the only other valid inputs. Also, this property is overridden if a `TEMPLATE_PATH` is specified.

- `ENABLE_CROPPING`: A boolean toggle to specify if the image is to be cropped into 144 images of size 224x224 which cover all areas of the original. By default, this is set to true. This technique is described Section 7 of the paper "[Going deeper with convolutions](https://arxiv.org/abs/1409.4842)" from Szegedy, et al.
- `ENABLE_CROPPING`: A boolean toggle to specify if the image is to be cropped into 144 images of size 224x224 which cover all areas of the original. By default, this is set to true. This technique is described in Section 7 of the paper "[Going deeper with convolutions](https://arxiv.org/abs/1409.4842)" by Szegedy et al.

- `ENABLE_TRITON`: A boolean toggle to specify whether the component should use a Triton inference server to process the image job. By default this is set to false.

- `INCLUDE_FEATURES`: A boolean toggle to specify whether the `FEATURE` detection property is included with each detection. By default, this is set to false.

- `TRITON_SERVER`: Specifies the Triton server `<host>:<port>` to use for inferencing. By default, this is set to 'clip-detection-server:8001'.

- `DETECTION_FRAME_BATCH_SIZE`: Specifies the batch size when processing video files. By default, this is set to 64.

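For illustration only, the following is a minimal sketch of how these properties might be supplied to the component in a local test. The `mpf_component_api.VideoJob` field order and the `clip_component`/`ClipComponent` names are assumptions, not something this README specifies:

```
# Hypothetical local-test sketch; the VideoJob field order and the
# clip_component / ClipComponent names are assumed, not confirmed here.
import mpf_component_api as mpf
from clip_component import ClipComponent  # assumed module and class name

job_properties = {
    'MODEL_NAME': 'ViT-B/32',                  # default is 'ViT-L/14'
    'NUMBER_OF_CLASSIFICATIONS': '3',          # return the top-3 class names
    'TEMPLATE_TYPE': 'openai_7',               # 'openai_1', 'openai_7', or 'openai_80'
    'ENABLE_CROPPING': 'false',
    'ENABLE_TRITON': 'true',
    'TRITON_SERVER': 'clip-detection-server:8001',
    'DETECTION_FRAME_BATCH_SIZE': '64',
}

# Process frames 0-150 of the video with the properties above.
job = mpf.VideoJob('clip-video-test', '/path/to/video.mp4', 0, 150,
                   job_properties, {}, None)
tracks = ClipComponent().get_detections_from_video(job)
```
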
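The 144-crop behavior behind `ENABLE_CROPPING` corresponds to the multi-crop recipe in Section 7 of the Szegedy et al. paper: 4 scales x 3 squares x 6 views x 2 mirrors = 144 crops. Below is a rough sketch of that recipe, assuming the component follows the paper directly; it is not the component's actual code:

```
# Sketch of the 144-crop scheme from "Going deeper with convolutions", Section 7.
# Assumes the component mirrors the paper; not taken from the component itself.
import cv2
import numpy as np

def make_144_crops(image: np.ndarray, crop_size: int = 224) -> list:
    crops = []
    h, w = image.shape[:2]
    for short_side in (256, 288, 320, 352):
        # Resize so the shorter dimension equals short_side.
        scale = short_side / min(h, w)
        resized = cv2.resize(image, (round(w * scale), round(h * scale)))
        rh, rw = resized.shape[:2]
        side = min(rh, rw)
        # Take three squares along the longer dimension
        # (left/center/right, or top/middle/bottom for portrait images).
        if rw >= rh:
            squares = [(0, x) for x in (0, (rw - side) // 2, rw - side)]
        else:
            squares = [(y, 0) for y in (0, (rh - side) // 2, rh - side)]
        for y0, x0 in squares:
            square = resized[y0:y0 + side, x0:x0 + side]
            # 4 corner crops + 1 center crop of crop_size, plus the square resized.
            m = side - crop_size
            positions = [(0, 0), (0, m), (m, 0), (m, m), (m // 2, m // 2)]
            views = [square[y:y + crop_size, x:x + crop_size] for y, x in positions]
            views.append(cv2.resize(square, (crop_size, crop_size)))
            # Each view plus its horizontal mirror: 4 * 3 * 6 * 2 = 144 crops total.
            for view in views:
                crops.append(view)
                crops.append(cv2.flip(view, 1))
    return crops
```
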
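When `ENABLE_TRITON` is true, the component sends inference requests to the server named by `TRITON_SERVER`. As a minimal sketch of checking that the server and a model are reachable beforehand (the model name 'vit_b_32' is a placeholder, not necessarily the name the component deploys):

```
# Readiness check against the Triton server; 'vit_b_32' is a placeholder
# model name and may not match what the component actually deploys.
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url='clip-detection-server:8001')

if not client.is_server_live():
    raise RuntimeError('Triton server is not live')
if not client.is_model_ready('vit_b_32'):
    raise RuntimeError('CLIP model is not loaded on the Triton server')
print('Triton server and model are ready')
```
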
## Detection Properties

Returned `ImageLocation` objects have the following members in their `detection_properties`:
@@ -54,6 +58,42 @@ tench,"tench, Tinca tinca"
kite (bird of prey),kite
magpie,magpie
```

# Non-Triton Performance
The table below shows the performance of this component on an NVIDIA Tesla V100 32GB GPU, for varying batch sizes with both models:
| Model Name | Batch Size | Total Time (seconds) | Average Time per Batch (seconds) | Average Images per Second |
|------------|------------|----------------------|----------------------------------|---------------------------|
| ViT-B/32 | 16 | 38.5732 | 0.04311 | 371.1126 |
| ViT-B/32 | 32 | 37.3478 | 0.08349 | 383.289 |
| ViT-B/32 | 64 | 34.6141 | 0.1548 | 413.5598 |
| ViT-B/32 | 128 | 35.897 | 0.321 | 398.7798 |
| ViT-B/32 | 256 | 33.5689 | 0.6003 | 426.4364 |
| ViT-B/32 | 512 | 36.3621 | 1.3006 | 393.6791 |
| ViT-L/14 | 16 | 108.6101 | 0.1214 | 131.8017 |
| ViT-L/14 | 32 | 103.8613 | 0.2322 | 137.828 |
| ViT-L/14 | 64 | 101.1478 | 0.4522 | 141.5256 |
| ViT-L/14 | 128 | 102.0473 | 0.9125 | 140.2781 |
| ViT-L/14 | 256 | 99.6637 | 1.7823 | 143.633 |
| ViT-L/14 | 512 | 105.8889 | 3.7873 | 135.1889 |

# Triton Performance
The table below shows the performance of this component with Triton on an NVIDIA Tesla V100 32GB GPU, for varying batch sizes:
| Model Name | Batch Size | VRAM Usage (MiB) | Total Time (seconds) | Average Time per Batch (seconds) | Average Images per Second |
|------------|------------|------------------|----------------------|----------------------------------|---------------------------|
| ViT-B/32 | 16 | 1249 | 23.9591 | 0.02678 | 597.4765 |
| ViT-B/32 | 32 | 1675 | 20.1931 | 0.04514 | 708.9055 |
| ViT-B/32 | 64 | 1715 | 33.08468 | 0.1479 | 432.6776 |
| ViT-B/32 | 128 | 1753 | 35.3511 | 0.3161 | 404.9379 |
| ViT-B/32 | 256 | 1827 | 33.7730 | 0.6040 | 423.8593 |
| ViT-L/14 | 16 | 1786 | 126.2017 | 0.1411 | 113.4295 |
| ViT-L/14 | 32 | 2414 | 114.7415 | 0.2565 | 124.7587 |
| ViT-L/14 | 64 | 2662 | 132.1087 | 0.5906 | 108.3577 |
| ViT-L/14 | 128 | 3150 | 140.7985 | 1.2590 | 101.6701 |
| ViT-L/14 | 256 | 3940 | 131.6293 | 2.3540 | 108.7524 |

# Future Research
* Investigate using the CLIP interrogator for determining text prompts for classification.
* Investigate methods to automate the generation of text prompts.
* [Context Optimization (CoOp)](http://arxiv.org/abs/2109.01134) and [Conditional Context Optimization (CoCoOp)](http://arxiv.org/abs/2203.05557) model a prompt's context as a set of learnable vectors that can be optimized for the classes of interest, with CoCoOp improving on CoOp's ability to generalize to classes unseen during training.

# Known Issues
