
[Feature] Support Webcam Demo for Spatio-temporal Action Detection Models #795

Merged: 24 commits merged into open-mmlab:master from webcam-stdet on Apr 20, 2021

Conversation

@irvingzhang0512 (Contributor) commented Apr 9, 2021

Description

This implementation is based on SlowFast Spatio-temporal Action Detection Webcam Demo.

TODO

  • Multi-threading for read/display/inference.
  • Human detector
    • easy-to-use abstract class
    • mmdet
    • [ ] YOLOv4 human detector: it seems the human detector is not the bottleneck for this demo.
  • MMAction2 stdet models.
  • Output result
    • cv2.imshow
    • write to local video file.
  • decouple display frame shape and model frame shape.
  • logging
  • remove global variables
  • BUG: Unexpected exit when read thread is dead and display thread is alive.
  • BUG: sampling strategy is ignored
  • fix known issues.
  • Improvement: in the SlowFast webcam demo, predict_stepsize must be in the range [clip_len * frame_interval // 2, clip_len * frame_interval]. Find a way to support predict_stepsize in the range [0, clip_len * frame_interval] (see the sketch after this list).
  • Docs
    • Annotations in script
    • demo/README.md
    • docs_zh_CN/demo.md
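
To make the predict_stepsize constraint above concrete, here is a minimal sketch (names are taken from the demo's CLI arguments; the check is an illustration of the current behavior, not code from this PR):

window = clip_len * frame_interval  # frames covered by one model input
# Current demo: a new prediction is made every predict_stepsize frames,
# and window // 2 <= predict_stepsize <= window must hold.
assert window // 2 <= predict_stepsize <= window, (
    'predict_stepsize must be in '
    '[clip_len * frame_interval // 2, clip_len * frame_interval]')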

Known issues

  • In the config, model -> test_cfg -> rcnn -> action_thr should be 0.0 instead of the current default value 0.002. Otherwise, different actions may keep different numbers of bboxes:
result = stdet_model(...)[0]

# With action_thr > 0, the per-class bbox arrays may have different
# shapes, so this assertion can fail.
previous_shape = None
for class_id in range(len(result)):
    if previous_shape is None:
        previous_shape = result[class_id].shape
    else:
        assert previous_shape == result[class_id].shape, \
            'This assertion error may be raised.'
  • This may cause an index-out-of-range error:

with torch.no_grad():
    result = model(
        return_loss=False,
        img=[input_tensor],
        img_metas=[[dict(img_shape=(new_h, new_w))]],
        proposals=[[proposal]])
    result = result[0]
prediction = []
# N proposals
for i in range(proposal.shape[0]):
    prediction.append([])
# Perform action score thr
for i in range(len(result)):
    if i + 1 not in label_map:
        continue
    for j in range(proposal.shape[0]):
        if result[i][j, 4] > args.action_score_thr:
            prediction[j].append((label_map[i + 1], result[i][j, 4]))
predictions.append(prediction)

j in result[i][j, 4] may be out of range: the for j in range(proposal.shape[0]) loop assumes that every result[i] has the same shape, i.e. the same bbox count for every action.
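
A minimal defensive sketch of one possible fix (illustrative only; the cleaner fix, as noted in the first known issue, is to set test_cfg -> rcnn -> action_thr to 0.0 so every result[i] keeps one row per proposal):

# Bound the inner loop by the bbox count actually returned for action i,
# instead of assuming every result[i] has proposal.shape[0] rows.
for i in range(len(result)):
    if i + 1 not in label_map:
        continue
    num_bboxes = min(proposal.shape[0], result[i].shape[0])
    for j in range(num_bboxes):
        if result[i][j, 4] > args.action_score_thr:
            prediction[j].append((label_map[i + 1], result[i][j, 4]))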

Usage

  • Set --output-fps according to the printed log DEBUG:__main__:Read Thread: {duration} ms, {fps} fps.
  • Set --predict-stepsize so that the read and inference durations (both printed by the logger) are roughly equal.
python demo/webcam_demo_spatiotemporal_det.py --show \
  --output-fps 15 \
  --predict-stepsize 8

@irvingzhang0512 mentioned this pull request on Apr 9, 2021.
@codecov bot commented Apr 9, 2021

Codecov Report

Merging #795 (38658e5) into master (db1fa97) will increase coverage by 0.33%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master     #795      +/-   ##
==========================================
+ Coverage   84.89%   85.22%   +0.33%     
==========================================
  Files         131      131              
  Lines        9394     9415      +21     
  Branches     1605     1612       +7     
==========================================
+ Hits         7975     8024      +49     
+ Misses       1012      985      -27     
+ Partials      407      406       -1     
Flag Coverage Δ
unittests 85.22% <ø> (+0.33%) ⬆️


Impacted Files Coverage Δ
mmaction/datasets/base.py 59.71% <0.00%> (-0.15%) ⬇️
mmaction/datasets/samplers/__init__.py 100.00% <0.00%> (ø)
mmaction/datasets/pipelines/augmentations.py 93.11% <0.00%> (ø)
mmaction/core/evaluation/accuracy.py 93.18% <0.00%> (+0.90%) ⬆️
mmaction/datasets/builder.py 44.18% <0.00%> (+1.00%) ⬆️
mmaction/datasets/rawframe_dataset.py 96.66% <0.00%> (+8.60%) ⬆️
mmaction/datasets/samplers/distributed_sampler.py 87.30% <0.00%> (+65.07%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update db1fa97...38658e5.

@kennymckormick (Member)

For the known issue section, I do not quite understand: why should we have the same bbox number for different actions? If they differ, which error does it trigger? (I cannot find the attached piece of code in the master branch.)

@irvingzhang0512 (Contributor, Author)

@kennymckormick This may cause an index-out-of-range error:

with torch.no_grad():
    result = model(
        return_loss=False,
        img=[input_tensor],
        img_metas=[[dict(img_shape=(new_h, new_w))]],
        proposals=[[proposal]])
    result = result[0]
prediction = []
# N proposals
for i in range(proposal.shape[0]):
    prediction.append([])
# Perform action score thr
for i in range(len(result)):
    if i + 1 not in label_map:
        continue
    for j in range(proposal.shape[0]):
        if result[i][j, 4] > args.action_score_thr:
            prediction[j].append((label_map[i + 1], result[i][j, 4]))
predictions.append(prediction)

j in result[i][j, 4] may be out of range: the for j in range(proposal.shape[0]) loop assumes that every result[i] has the same shape, i.e. the same bbox count for every action.
I'll fix this in this PR.

@irvingzhang0512 (Contributor, Author)

@kennymckormick @congee524 This PR is ready for review.

@kennymckormick (Member) left a comment

Besides, I have run the demo with a video as input, writing the output to a file. First, it runs very slowly: it takes tens of minutes for a one-minute video. Second, it doesn't write the output to a video file.

@irvingzhang0512 (Contributor, Author) commented Apr 13, 2021

> First, it runs very slowly: it takes tens of minutes for a one-minute video.

It takes 1:37 to run this demo on a 1:16-long video on a V100. I have also tested the real-time webcam-input demo on a 1080 Ti with --predict-stepsize 8.
Could you please share the logs?

> Besides, it doesn't write the output to a video file.

That was a bug in creating the cv2.VideoWriter; fixed and tested.
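
For reference, a minimal sketch of the cv2.VideoWriter setup this kind of bug usually involves (the exact fix in this PR is not shown here; names like out_filename, output_fps, and frames are illustrative):

import cv2

fourcc = cv2.VideoWriter_fourcc(*'mp4v')
# The frame size must be (width, height) and must match every frame
# written later, or the output file ends up empty/unreadable.
writer = cv2.VideoWriter(out_filename, fourcc, output_fps,
                         (frame_width, frame_height))
for frame in frames:  # frames: BGR uint8 arrays of shape (h, w, 3)
    writer.write(frame)
writer.release()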

@irvingzhang0512 (Contributor, Author) commented Apr 13, 2021

@kennymckormick I could reproduce your bug: reading frames too fast leads to unexpected performance degradation. Adding time.sleep() after cap.read() fixes it.
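
A minimal sketch of the throttled read loop described above (illustrative; handle() is a hypothetical stand-in for pushing frames to the shared queue):

import time
import cv2

cap = cv2.VideoCapture(0)
fps = cap.get(cv2.CAP_PROP_FPS) or 30  # fall back if the source reports 0
frame_period = 1.0 / fps
while cap.isOpened():
    start = time.time()
    ret, frame = cap.read()
    if not ret:
        break
    handle(frame)  # hypothetical: hand the frame to the display/inference side
    # Sleep off the remainder of the frame period so we don't read
    # (and buffer) frames faster than the source fps.
    time.sleep(max(0, frame_period - (time.time() - start)))
cap.release()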

@kennymckormick (Member) commented Apr 14, 2021

> @kennymckormick I could reproduce your bug: reading frames too fast leads to unexpected performance degradation. Adding time.sleep() after cap.read() fixes it.

Hi, Irving. I have used the latest code, and it seems the problem still occurs. Can you run this command (which I used) on your machine to check how long it takes? That might help figure out the problem.

python demo/webcam_demo_spatiotemporal_det.py \
    --input-video demo/ava_1min.mp4 \
    --config configs/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb.py \
    --checkpoint https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201217-16378594.pth \
    --det-config demo/faster_rcnn_r50_fpn_2x_coco.py \
    --det-checkpoint http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \
    --det-score-thr 0.9 \
    --action-score-thr 0.5 \
    --label-map demo/label_map_ava.txt \
    --predict-stepsize 40 \
    --output-fps 20 \
    --out-filename tmp.mp4

Besides, here is output.profile; I don't know how to analyze it...

output.zip

@irvingzhang0512 (Contributor, Author) commented Apr 14, 2021

Actually, I already used your config yesterday... Could you please share some logs and the one-minute video? I'll have a try tonight.

output.profile shows that the bottleneck is not the GPU. I guess it is RAM: reading frames too fast leads to excessive RAM consumption. Maybe you could check the memory consumption while running this demo.
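
For example, a small sketch of such a memory check (assumes psutil is installed; run it alongside the demo, or inline in a monitoring thread):

import os
import time
import psutil

proc = psutil.Process(os.getpid())
while True:
    rss_mb = proc.memory_info().rss / 1024 ** 2  # resident set size in MB
    print(f'RSS: {rss_mb:.0f} MB')
    time.sleep(5)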

@kennymckormick (Member) commented Apr 14, 2021

> Actually, I already used your config yesterday... Could you please share some logs and the one-minute video? I'll have a try tonight.
>
> output.profile shows that the bottleneck is not the GPU. I guess it is RAM: reading frames too fast leads to excessive RAM consumption. Maybe you could check the memory consumption while running this demo.

Oops, I'm sorry. I just found that ava_1min.mp4 is not in the codebase; here is the testing video.

ava_1min.mp4

@irvingzhang0512 (Contributor, Author)

I couldn't reproduce your bug on an i7-8700K + 1080 Ti + 32 GB RAM.
Can you share your terminal logs from the latest code?

@irvingzhang0512 (Contributor, Author)

Kindly ping @kennymckormick: please share part of the terminal logs.
Besides, I'll test this demo on an AGX (which has fewer computational resources) next week.

@kennymckormick (Member)

> Kindly ping @kennymckormick: please share part of the terminal logs.
> Besides, I'll test this demo on an AGX (which has fewer computational resources) next week.

Maybe it's a problem with my server. I think it's OK to go on, since there seems to be no problem on your side. Besides, I will try to test it on another platform.

@irvingzhang0512 (Contributor, Author)

Successfully ran this demo on an AGX.

@innerlee (Contributor) commented Apr 20, 2021

Thanks for the demo code! Demos are always great things to have. Two comments:

  • Need to make sure that there are no licence issues
  • That file is too long (not a blocking issue though); we may refactor it to be simpler in the future

edit:
Is it possible to remove the font file (can keep interface to specify fonts)?

@irvingzhang0512 (Contributor, Author) commented Apr 20, 2021

> Need to make sure that there are no licence issues

There is NO LICENCE in AlphAction; I'll create an issue to ask for permission.

> That file is too long (not a blocking issue though); we may refactor it to be simpler in the future

I have thought about how to refactor this demo. Most of this code (mmdet wrapper, stdet wrapper, multi-threading code) shouldn't be moved into the mmaction directory. Maybe the visualization tools could be moved into the main repo.

> edit:
> Is it possible to remove the font file (can keep interface to specify fonts)?

One reason I like the AlphAction visualization tool is that its font is much better than the fonts OpenCV supports. How about:

  1. removing this font file,
  2. choosing an existing font as the default, and
  3. adding a warning recommending the Roboto-Bold font, with a link to download it? (A rough sketch follows.)
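
A rough sketch of this fallback (names here are illustrative, not the merged implementation):

import warnings
from PIL import ImageFont

def load_font(font_path=None, size=20):
    # Load a user-specified TTF font; otherwise warn and fall back.
    if font_path:
        return ImageFont.truetype(font_path, size)
    warnings.warn(
        'No font file specified; falling back to the PIL default font. '
        'We recommend Roboto-Bold, e.g. from '
        'https://fonts.google.com/specimen/Roboto')
    return ImageFont.load_default()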

@innerlee (Contributor)

> adding a warning recommending the Roboto-Bold font ...

You can add a note telling users that we support customizing the font, and how to do it. There's no need to recommend a specific font, since everyone has their own favorite.

@irvingzhang0512 (Contributor, Author) commented Apr 20, 2021

OK, I'll ask for permission first.

@irvingzhang0512 (Contributor, Author)

It's free for non-commercial use only. I will remove the related code later.
@innerlee @kennymckormick

@kennymckormick (Member)

> It's free for non-commercial use only. I will remove the related code later.
> @innerlee @kennymckormick

I guess we can use them in our project. After all, OpenMMLab is also a non-commercial project, right? @innerlee

@innerlee (Contributor) commented Apr 20, 2021

@kennymckormick Not really. Licence issues are more complicated. Our project is open-source of course.

@kennymckormick (Member)

> @kennymckormick Not really. Licence issues are more complicated. Our project is open-source of course.

Sorry, I misunderstood.

@innerlee merged commit 8fb39c3 into open-mmlab:master on Apr 20, 2021
@Deep-learning999

When I run webcam_demo_spatiotemporal_det.py on a cloud GPU, it prints the DEBUG output below:

python demo/webcam_demo_spatiotemporal_det.py \
    --input-video demo/2.mp4 \
    --config configs/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb.py \
    --checkpoint https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201217-16378594.pth \
    --det-config demo/faster_rcnn_r50_fpn_2x_coco.py \
    --det-checkpoint http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \
    --det-score-thr 0.9 \
    --action-score-thr 0.5 \
    --label-map demo/label_map_ava.txt \
    --predict-stepsize 20 \
    --output-fps 20 \
    --show

Use load_from_http loader
Use load_from_http loader
DEBUG:main:Read thread: 3690 ms, 17 fps
/usr/local/lib/python3.8/dist-packages/mmdet/datasets/utils.py:64: UserWarning: "ImageToTensor" pipeline is replaced by "DefaultFormatBundle" for batch inference. It is recommended to manually replace it in the test data pipeline in your config file.
warnings.warn(
/usr/local/lib/python3.8/dist-packages/mmdet/models/dense_heads/rpn_head.py:191: UserWarning: In rpn_proposal or test_cfg, nms_thr has been moved to a dict named nms as iou_threshold, max_num has been renamed as max_per_img, name of original arguments and the way to specify iou_threshold of NMS will be deprecated.
warnings.warn(
INFO:main:Stdet Results: None
DEBUG:main:Main thread inference time 484 ms
DEBUG:main:Display thread: 4201 ms, read id 0, display id 0
DEBUG:main:Read thread: 1178 ms, 17 fps
INFO:main:Stdet Results: None
DEBUG:main:Read thread: 1176 ms, 17 fps
DEBUG:main:Main thread inference time 1416 ms
DEBUG:main:Display thread: 2112 ms, read id 2, display id 1
INFO:main:Stdet Results: [[('sit', 0.9589796), ('talk to (e.g., self, a person, a group)', 0.7111095)]]
DEBUG:main:Read thread: 1111 ms, 18 fps
DEBUG:main:Main thread inference time 1609 ms
DEBUG:main:Display thread: 1610 ms, read id 3, display id 2
INFO:main:Stdet Results: [[('sit', 0.98160917), ('talk to (e.g., self, a person, a group)', 0.9181804)]]
DEBUG:main:Read thread: 1104 ms, 18 fps
DEBUG:main:Read thread: 1178 ms, 17 fps
DEBUG:main:Read thread: 1160 ms, 17 fps
DEBUG:main:Main thread inference time 2901 ms
DEBUG:main:Display thread: 2902 ms, read id 6, display id 3
INFO:main:Stdet Results: [[('sit', 0.97351325), ('talk to (e.g., self, a person, a group)', 0.81250376)]]
DEBUG:main:Main thread inference time 384 ms
DEBUG:main:Display thread: 384 ms, read id 6, display id 4
INFO:main:Stdet Results: [[('sit', 0.98715776), ('talk to (e.g., self, a person, a group)', 0.92745554)]]
DEBUG:main:Read thread: 1094 ms, 18 fps
DEBUG:main:Main thread inference time 926 ms
DEBUG:main:Display thread: 926 ms, read id 7, display id 5
INFO:main:Stdet Results: [[('sit', 0.9661723), ('talk to (e.g., self, a person, a group)', 0.9143741)]]
DEBUG:main:Read thread: 1096 ms, 18 fps
DEBUG:main:Main thread inference time 1514 ms
DEBUG:main:Display thread: 1514 ms, read id 8, display id 6
INFO:main:Stdet Results: [[('sit', 0.9726977), ('talk to (e.g., self, a person, a group)', 0.90133816)]]
DEBUG:main:Read thread: 1092 ms, 18 fps
DEBUG:main:Read thread: 1079 ms, 19 fps
DEBUG:main:Read thread: 1159 ms, 17 fps
DEBUG:main:Main thread inference time 2578 ms
DEBUG:main:Display thread: 2579 ms, read id 11, display id 7
INFO:main:Stdet Results: [[('sit', 0.9682999), ('talk to (e.g., self, a person, a group)', 0.9515228)]]
DEBUG:main:Main thread inference time 282 ms
DEBUG:main:Display thread: 283 ms, read id 11, display id 8
INFO:main:Stdet Results: [[('sit', 0.98002636), ('talk to (e.g., self, a person, a group)', 0.8736231)]]
DEBUG:main:Main thread inference time 443 ms
DEBUG:main:Display thread: 444 ms, read id 11, display id 9
INFO:main:Stdet Results: [[('sit', 0.9718218), ('carry/hold (an object)', 0.5342022), ('talk to (e.g., self, a person, a group)', 0.9233674)]]
DEBUG:main:Main thread inference time 243 ms
DEBUG:main:Display thread: 245 ms, read id 12, display id 10
DEBUG:main:Read thread: 1176 ms, 17 fps
INFO:main:Stdet Results: [[('sit', 0.9556866), ('carry/hold (an object)', 0.6416971), ('talk to (e.g., self, a person, a group)', 0.92124444)]]
DEBUG:main:Main thread inference time 303 ms
DEBUG:main:Display thread: 302 ms, read id 12, display id 11
INFO:main:Stdet Results: [[('sit', 0.95329607), ('carry/hold (an object)', 0.64250106), ('talk to (e.g., self, a person, a group)', 0.8144327)]]
DEBUG:main:Main thread inference time 263 ms
DEBUG:main:Display thread: 275 ms, read id 12, display id 12
DEBUG:main:Read thread: 1136 ms, 18 fps
INFO:main:Stdet Results: [[('sit', 0.96635747), ('carry/hold (an object)', 0.685908), ('talk to (e.g., self, a person, a group)', 0.76616365)]]
DEBUG:main:Main thread inference time 225 ms
DEBUG:main:Display thread: 885 ms, read id 13, display id 13
DEBUG:main:Read thread: 1132 ms, 18 fps
INFO:main:Stdet Results: [[('sit', 0.97268265), ('carry/hold (an object)', 0.71732974), ('talk to (e.g., self, a person, a group)', 0.73268604)]]
DEBUG:main:Main thread inference time 400 ms
DEBUG:main:Display thread: 1307 ms, read id 14, display id 14
DEBUG:main:Read thread: 1118 ms, 18 fps
INFO:main:Stdet Results: [[('sit', 0.98331535), ('talk to (e.g., self, a person, a group)', 0.8445648)]]
DEBUG:main:Main thread inference time 419 ms
DEBUG:main:Display thread: 1146 ms, read id 15, display id 15
DEBUG:main:Read thread: 1092 ms, 18 fps
INFO:main:Stdet Results: [[('sit', 0.7194687)]]
DEBUG:main:Main thread inference time 402 ms
DEBUG:main:Display thread: 1066 ms, read id 16, display id 16
DEBUG:main:Read thread: 1091 ms, 18
Why does it print these DEBUG messages when running on a cloud GPU?

@irvingzhang0512 deleted the webcam-stdet branch on May 21, 2021.