Update CLIP component to support video #338
Conversation
Please state which model was used. The numbers for batches of 16 and 32 are very high. I was under the impression that Triton didn't offer much, but those numbers are significantly better than when using non-Triton for the same batch sizes (for ViT-B/32). I remember we talked about potentially using shared memory between the Triton client and server to alleviate performance loss when using Triton, but these numbers tell a different story. Are these numbers the same ones we talked about before?
Reviewed 8 of 8 files at r1, all commit messages.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @ZachCafego)
python/ClipDetection/README.md
line 64 at r1 (raw file):
Non-Triton Performance
The table below shows the performance of this component on a Nvidia Tesla V100 32GB GPU, for varying batch sizes with both models:
Use all caps for NVIDIA. Search for other usages in this file and update them.
python/ClipDetection/README.md
line 80 at r1 (raw file):
Please address this:
Please state which model was used.
python/ClipDetection/clip_component/clip_component.py
line 59 at r1 (raw file):
@staticmethod
def _get_prop(job_properties, key, default_value, accep_values=[]):
Nitpick: Let's call this accept_values.
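A minimal sketch of the renamed helper, assuming the component's usual `import mpf_component_api as mpf`; the validation body and the INVALID_PROPERTY error code are assumptions, not the component's actual code:

```python
@staticmethod
def _get_prop(job_properties, key, default_value, accept_values=()):
    # Fall back to the default when the property is absent.
    value = job_properties.get(key, default_value)
    # Reject values outside the accepted set, if one was given.
    if accept_values and value not in accept_values:
        raise mpf.DetectionException(
            f'Property {key} must be one of {list(accept_values)}, got {value}.',
            mpf.DetectionError.INVALID_PROPERTY)
    return value
```

Using a tuple default also avoids the mutable-default-argument pitfall of `accep_values=[]`.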
python/ClipDetection/clip_component/clip_component.py
line 110 at r1 (raw file):
yield detection
num_detections += 1
logger.info(f"Job complete. Found {num_detections} detection{'' if num_detections == 1 else 's'}.")
In general, we don't worry about this and just use the plural.
python/ClipDetection/clip_component/clip_component.py
line 150 at r1 (raw file):
for n, batch in batch_gen:
    try:
        detection = list(islice(wrapper.get_classifications(batch, **kwargs), n))
get_detections would be a more accurate name for this function. Also, it can return more than one detection. Let's use this:
detections += list(islice(wrapper.get_detections(batch, **kwargs), n))
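In context, that might look like the following sketch; batch_gen, wrapper, and kwargs are carried over from the quoted snippet, and treating n as the number of real frames in a possibly partial batch is an assumption:

```python
from itertools import islice

detections = []
for n, batch in batch_gen:
    # Accumulate with += since one call can yield more than one detection,
    # instead of overwriting a single `detection` each iteration; islice
    # limits the results to n, per the suggested line above.
    detections += list(islice(wrapper.get_detections(batch, **kwargs), n))
```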
python/ClipDetection/clip_component/clip_component.py
line 450 at r1 (raw file):
new_w, new_h = (round(width * resize_ratio), round(height * resize_ratio))
# if width < self.image_size and height < self.image_size:
Please remove dead code, unless this is important enough to be turned into a comment for future reference.
python/ClipDetection/plugin-files/descriptor/descriptor.json
line 26 at r1 (raw file):
{ "name": "MODEL_NAME", "description": "Specifies which CLIP model to load for inferencing. The only available models are 'ViT-L/14' and 'ViT-B/32'.",
Please remove "only".
python/ClipDetection/plugin-files/descriptor/descriptor.json
line 91 at r1 (raw file):
}, { "name": "DETECTION_FRAME_BATCH_SIZE_TRITON",
The OcvDNN component has one property called DETECTION_FRAME_BATCH_SIZE
that it uses for both non-Triton and Triton runs (although in that case it uses a value of 16 for everything). For consistency, I think we should use one property here too. A user may forget that there is a Triton and non-Triton version of the property.
Please add "DETECTION_FRAME_BATCH_SIZE" : "32"
to the Triton actions you defined below.
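A rough sketch of reading the single shared property in the component (the helper call and default value are illustrative):

```python
# One property for both non-Triton and Triton runs; the Triton actions in
# descriptor.json would then just set "DETECTION_FRAME_BATCH_SIZE": "32".
batch_size = int(self._get_prop(job_properties, 'DETECTION_FRAME_BATCH_SIZE', '16'))
```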
python/ClipDetection/tests/data/test_video.mp4
line 0 at r1 (raw file):
Add an entry to the NOTICE
file in this directory for this piece of media. Explain what images were used to create it and their respective licenses.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @ZachCafego)
python/ClipDetection/triton_server/Dockerfile
line 31 at r1 (raw file):
ARG MODELS_REGISTRY=openmpf/
FROM ${MODELS_REGISTRY}openmpf_clip_detection_triton_models:7.2.0 as models
I don't see this renamed version on Docker Hub, so I created it. We will need to delete https://hub.docker.com/r/openmpf/openmpf_clip_detection_models from Docker Hub once this PR lands to develop.
Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @ZachCafego)
a discussion (no related file):
From the Dockerfile:
COPY --from=models /models/ViT-B-32.pt /models/ViT-B-32.pt
Please pre-install ViT-L-14 as well.
Reviewable status: all files reviewed, 15 unresolved discussions (waiting on @ZachCafego)
python/ClipDetection/clip_component/clip_component.py
line 348 at r1 (raw file):
'''
def __init__(self, triton_server: int):
    self._model_name = 'ip_clip_512'
As mentioned in a separate comment, this needs to be configurable to ViT-B/32 or ViT-L/14.
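One possible sketch of making the wrapper configurable via MODEL_NAME; the Triton model-repository names in the mapping and the error code are hypothetical:

```python
# Hypothetical mapping from the MODEL_NAME job property to the name of the
# corresponding model deployed in the Triton model repository.
TRITON_MODEL_NAMES = {
    'ViT-B/32': 'vit_b_32_clip',
    'ViT-L/14': 'vit_l_14_clip',
}

def __init__(self, triton_server, model_name):
    # Fail fast if the requested model has no Triton deployment.
    if model_name not in TRITON_MODEL_NAMES:
        raise mpf.DetectionException(
            f'Unsupported MODEL_NAME for Triton: {model_name}',
            mpf.DetectionError.INVALID_PROPERTY)
    self._model_name = TRITON_MODEL_NAMES[model_name]
```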
python/ClipDetection/clip_component/clip_component.py
line 396 at r1 (raw file):
except Exception:
    raise mpf.DetectionException(
        f"Inference failed.",
If possible, the exception should contain the original exception message.
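A minimal sketch of chaining the original error; the infer() call and the NETWORK_ERROR code are taken from the traceback below, and the message format is illustrative:

```python
try:
    responses.append(self._triton_client.infer(
        model_name=self._model_name, inputs=inputs, outputs=outputs))
except Exception as e:
    # Include the underlying error text (e.g. the protobuf size limit below)
    # so it reaches the JSON output, and chain the exception for the log.
    raise mpf.DetectionException(
        f'Inference failed: {e}',
        mpf.DetectionError.NETWORK_ERROR) from e
```

With `from e`, the component log keeps the full gRPC traceback while the JSON output message is no longer generic.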
For example, when I ran the CLIP TRITON IMAGENET CLASSIFICATION TASK on this video, these log messages mention protobuf serialization:
2023-11-28 01:49:50,289 ERROR [clip_component.py:152] - [Job 6:new_face_video.avi(0-509)] Job failed due to: Inference failed. (DetectionError.NETWORK_ERROR)
Traceback (most recent call last):
File "/opt/mpf/plugin-venv/lib/python3.8/site-packages/clip_component/clip_component.py", line 392, in get_responses
responses.append(self._triton_client.infer(model_name=self._model_name, inputs=inputs, outputs=outputs))
File "/opt/mpf/plugin-venv/lib/python3.8/site-packages/tritonclient/grpc/_client.py", line 1380, in infer
raise_error_grpc(rpc_error)
File "/opt/mpf/plugin-venv/lib/python3.8/site-packages/tritonclient/grpc/_utils.py", line 77, in raise_error_grpc
raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.INTERNAL] Exception serializing request!
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/mpf/plugin-venv/lib/python3.8/site-packages/clip_component/clip_component.py", line 149, in get_detections_from_video_capture
detection = list(islice(wrapper.get_classifications(batch, **kwargs), n))
File "/opt/mpf/plugin-venv/lib/python3.8/site-packages/clip_component/clip_component.py", line 202, in get_classifications
results = self._inferencing_server.get_responses(torch_imgs)
File "/opt/mpf/plugin-venv/lib/python3.8/site-packages/clip_component/clip_component.py", line 394, in get_responses
raise mpf.DetectionException(
mpf_component_api.mpf_component_api.DetectionException: Inference failed. (DetectionError.NETWORK_ERROR)
2023-11-28 01:49:50,290 ERROR [org.mitre.mpf.detection:0] - [Job 6:new_face_video.avi(0-509)] An error occurred while invoking the "get_detections_from_video" method on the Python component: DetectionException: Inference failed. (DetectionError.NETWORK_ERROR)
At:
/opt/mpf/plugin-venv/lib/python3.8/site-packages/clip_component/clip_component.py(394): get_responses
/opt/mpf/plugin-venv/lib/python3.8/site-packages/clip_component/clip_component.py(202): get_classifications
2023-11-28 01:49:50,292 ERROR [org.mitre.mpf.detection:0] - [Job 6:new_face_video.avi(0-509)] Video detection method returned an error for /opt/mpf/share/remote-media/new_face_video.avi
Exception serializing message!
2023-11-28 01:49:50,289 ERROR [_common.py:91] - [Job 6:new_face_video.avi(0-509)] Exception serializing message!
Traceback (most recent call last):
File "/opt/mpf/plugin-venv/lib/python3.8/site-packages/grpc/_common.py", line 89, in _transform
return transformer(message)
ValueError: Message inference.ModelInferRequest exceeds maximum protobuf size of 2GB: 2774532167
However, what I see in the JSON output is pretty generic:
{
"source": "CLIP",
"code": "NETWORK_ERROR",
"message": "An error occurred while invoking the \"get_detections_from_video\" method on the Python component: Inference failed. (DetectionError.NETWORK_ERROR) (Frames: 0 - 509)"
}
Here's the REST request that resulted in the above error:
{
"jobProperties": {
"MODEL_NAME": "ViT-B/32",
"DETECTION_FRAME_BATCH_SIZE_TRITON": "32",
},
"media": [
{
"mediaUri": "file:///opt/mpf/share/remote-media/new_face_video.avi"
}
],
"pipelineName": "CLIP TRITON IMAGENET CLASSIFICATION PIPELINE",
"priority": 0
}
Adding "ENABLE_CROPPING" : "false"
as a job property allows the job to complete successfully. The issue seems to be that you're not considering the number of crops (in this case 144) against the batch size.
To be clear, a single image with 144 crops is equivalent to 144 frames with a single crop for each. In each case that's a batch of 144 images. If using a batch size of 32 that would be 4 full batches with a partial batch of 16.
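A quick check of that arithmetic; the 224x224x3 float32 tensor size is an assumption used only to show the numbers roughly match the error above:

```python
crops_per_frame = 144
batch_size = 32                        # DETECTION_FRAME_BATCH_SIZE_TRITON

# One cropped frame is equivalent to 144 single-crop frames:
full, partial = divmod(1 * crops_per_frame, batch_size)
print(full, partial)                   # 4 full batches, partial batch of 16

# A 32-frame batch with cropping enabled becomes 4608 images in one request.
images = batch_size * crops_per_frame  # 4608
# Assuming 224x224x3 float32 crops, that is roughly the 2.77 GB request
# reported in the protobuf error above.
print(images * 224 * 224 * 3 * 4)      # 2774532096 bytes
```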
python/ClipDetection/plugin-files/descriptor/descriptor.json
line 62 at r1 (raw file):
{ "name": "ENABLE_CROPPING", "description": "If true, the image will be cropped into 144 images of size 224x224. The results from each of these images is averaged to get the results. Not available for use on CPU.",
Do you mean "Not recommended for use on CPU" instead of "Not available ..." ? Does something prevent this from running on the CPU?
python/ClipDetection/triton_server/Dockerfile
line 35 at r1 (raw file):
FROM nvcr.io/nvidia/tritonserver:22.04-py3 as openmpf_triton_server
COPY --from=models /models/model.pt /models/ip_clip_512/1/model.pt
This is the ViT-B/32 model, right? We need to support the ViT-L/14 model on Triton too. The user should be able to select that model using the existing MODEL_NAME
property.
Previously, jrobble (Jeff Robble) wrote…
In command: [tritonserver,
--model-repository=/models,
--strict-model-config=false,
--model-control-mode=explicit,
--load-model=ip_clip_512,
# --log-verbose=1, # (optional)
--grpc-infer-allocation-pool-size=16 ]
Set … Also, please test that when a user specifies …
Previously, jrobble (Jeff Robble) wrote…
In the long run, if and when we seriously want to consider enabling cropping for videos, we should first look at using a 5-crop approach instead of a 144-crop approach. For now, if the user sets … Update the description of …
Reviewable status: all files reviewed, 15 unresolved discussions (waiting on @ZachCafego)
python/ClipDetection/README.md
line 80 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Please address this:
Please state which model was used.
We discussed that these numbers were generated for ViT-B/32. Please generate them for ViT-L/14 as well.
python/ClipDetection/clip_component/clip_component.py
line 396 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
In the long run, if and when we seriously want to consider enabling cropping for videos, we should first look at using a 5-crop approach instead of a 144-crop approach.
For now, if the user sets ENABLE_CROPPING to true, let's ignore the value of DETECTION_FRAME_BATCH_SIZE and just use a batch size of 1 to ensure that we process all crops in one request. Update the description of DETECTION_FRAME_BATCH_SIZE in descriptor.json to mention that it will be ignored when ENABLE_CROPPING is true.
Also, please test the 144 crop on Triton on an image that is 1080p, just as a sanity check.
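A sketch of the batch-size override described above (the variable names are illustrative; the values would come from the ENABLE_CROPPING and DETECTION_FRAME_BATCH_SIZE job properties):

```python
# batch_size normally comes from DETECTION_FRAME_BATCH_SIZE; when cropping is
# enabled, every frame expands to 144 crops, so frames go one per request.
if enable_cropping:
    batch_size = 1
```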
Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @ZachCafego)
a discussion (no related file):
Please update the CLIP classification list to be:
crane bird,crane bird
construction crane,construction crane
python/ClipDetection/plugin-files/descriptor/descriptor.json
line 37 at r1 (raw file):
}, { "name": "NUMBER_OF_TEMPLATES",
Please change this to TEMPLATE_TYPE with the acceptable values of openai-1, openai-7, and openai-80. This will lay the groundwork for coop templates in the future.
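A sketch of how the component might validate the new property and map it to a template count; the mapping, helper call, and default value are illustrative:

```python
# Illustrative mapping from TEMPLATE_TYPE to the number of prompt templates
# used when generating the text embeddings.
TEMPLATE_COUNTS = {
    'openai-1': 1,
    'openai-7': 7,
    'openai-80': 80,
}

template_type = self._get_prop(
    job_properties, 'TEMPLATE_TYPE', 'openai-80',
    accept_values=list(TEMPLATE_COUNTS))
num_templates = TEMPLATE_COUNTS[template_type]
```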
Reviewable status: 3 of 13 files reviewed, 15 unresolved discussions (waiting on @jrobble)
a discussion (no related file):
Previously, jrobble (Jeff Robble) wrote…
From the Dockerfile:
COPY --from=models /models/ViT-B-32.pt /models/ViT-B-32.pt
Please pre-install ViT-L-14 as well.
Done.
a discussion (no related file):
Previously, jrobble (Jeff Robble) wrote…
Please update the CLIP classification list to be:
crane bird,crane bird
construction crane,construction crane
Done.
python/ClipDetection/clip_component/clip_component.py
line 59 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Nitpick: Let's call this accept_values.
Done.
python/ClipDetection/clip_component/clip_component.py
line 150 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
get_detections would be a more accurate name for this function. Also, it can return more than one detection. Let's use this: detections += list(islice(wrapper.get_detections(batch, **kwargs), n))
Done.
python/ClipDetection/clip_component/clip_component.py
line 348 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
As mentioned in a separate comment, this needs to be configurable to ViT-B/32 or ViT-L/14.
Done.
python/ClipDetection/clip_component/clip_component.py
line 396 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Also, please test the 144 crop on Triton on an image that is 1080p, just as a sanity check.
Need to test 144 crop on Triton, 1080p.
python/ClipDetection/clip_component/clip_component.py
line 450 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Please remove dead code, unless this is important enough to be turned into a comment for future reference.
Done.
python/ClipDetection/plugin-files/descriptor/descriptor.json
line 26 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Please remove "only".
Done.
python/ClipDetection/plugin-files/descriptor/descriptor.json
line 37 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Please change this to TEMPLATE_TYPE with the acceptable values of openai-1, openai-7, and openai-80. This will lay the groundwork for coop templates in the future.
Done.
python/ClipDetection/plugin-files/descriptor/descriptor.json
line 62 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Do you mean "Not recommended for use on CPU" instead of "Not available ..." ? Does something prevent this from running on the CPU?
Done.
python/ClipDetection/plugin-files/descriptor/descriptor.json
line 91 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
The OcvDNN component has one property called DETECTION_FRAME_BATCH_SIZE that it uses for both non-Triton and Triton runs (although in that case it uses a value of 16 for everything). For consistency, I think we should use one property here too. A user may forget that there is a Triton and non-Triton version of the property. Please add "DETECTION_FRAME_BATCH_SIZE" : "32" to the Triton actions you defined below.
Done.
python/ClipDetection/tests/data/test_video.mp4
line 0 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Add an entry to the NOTICE file in this directory for this piece of media. Explain what images were used to create it and their respective licenses.
Done.
Leave a code comment explaining why we do this.
Couldn't you just check to see if …
This test alone took 1 hour and 50 min to generate the ImageNet text embeddings as part of the Jenkins build. Please update this test to use the COCO classes. ImageNet simply takes too long to be part of the build. For comparison, generating the COCO embeddings takes about 9 minutes, which is nearing the limit of the amount of time we want to spend on unit tests (10 min). Another option is to pre-generate the embeddings.
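A sketch of the pre-generation option using the openai/clip package; the class names, prompt template, and output path are illustrative:

```python
import clip
import torch

# Offline, one-time step: encode the class prompts and save the text
# embeddings so tests and jobs can load them instead of recomputing.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model, _ = clip.load('ViT-B/32', device=device)

class_names = ['person', 'bicycle', 'car']  # e.g. the COCO classes
tokens = clip.tokenize([f'a photo of a {name}' for name in class_names]).to(device)

with torch.no_grad():
    text_features = model.encode_text(tokens)
    text_features /= text_features.norm(dim=-1, keepdim=True)

torch.save(text_features.cpu(), 'coco_text_embeddings.pt')
# Later: text_features = torch.load('coco_text_embeddings.pt')
```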
Did you intend to add these info lines? They seem like debug. If not, please remove them.
The Jenkins build failed with: …
Reviewed 2 of 10 files at r2, 1 of 4 files at r3, 7 of 7 files at r4, 2 of 4 files at r5, 2 of 2 files at r7, 1 of 1 files at r8, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @ZachCafego)