
A question about person_box and action_predictor #38

Closed
pxssw opened this issue Nov 19, 2020 · 6 comments

Labels: enhancement (New feature or request), question (Further information is requested)

pxssw commented Nov 19, 2020

Wonderful job! As a researcher in the same field, I would like to express my appreciation to the author.
I have one question about the compute_prediction part of action_predictor.py. The inputs to the computation are the most recent frames (kept via self.frame_stack = self.frame_stack[-self.frame_buffer_numbers:]) together with only the box of the center frame, so is it assumed that the pedestrian has little displacement across the input frames? I would like to know whether adding the exact box of each frame (which could be extracted from the tracking results) would make the final result better, especially for motions with a large range of movement, such as hitting or fighting. Or have I simply misunderstood the process?
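To make sure I am describing the right thing, here is a hypothetical minimal sketch of the pattern I mean. The names follow the line quoted above, not necessarily the real action_predictor.py, and the model call is just a placeholder:

```python
# Hypothetical sketch of the pattern described above, not the project's code.
class FramePredictorSketch:
    def __init__(self, frame_buffer_numbers=32):
        self.frame_buffer_numbers = frame_buffer_numbers
        self.frame_stack = []

    def push_frame(self, frame):
        self.frame_stack.append(frame)
        # keep only the most recent frames (the line quoted above)
        self.frame_stack = self.frame_stack[-self.frame_buffer_numbers:]

    def compute_prediction(self, center_frame_boxes, model):
        # only boxes from the center frame are passed in; the same boxes are
        # implicitly reused for every frame of the buffered clip
        return model(self.frame_stack, center_frame_boxes)
```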

yelantf (Collaborator) commented Nov 20, 2020

Thank you for your attention! As shown in the paper, our model takes the bounding box on the center frame to do RoIAlign on all frames of the input video clip. This mainly follows previous works, but is also because the AVA dataset only provides boxes annotated on the center frame. Of course, we could use a tracking algorithm to generate more accurate bounding boxes on every single frame, and then use them to get more robust results; there are actually some previous works [link] that tried this. However, we did not find a sufficiently robust tracker (especially for scenes with fast motion), so we chose the current design for our method.
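For illustration only, here is a minimal PyTorch sketch of what "using the center-frame box for RoIAlign on all frames" means. The feature shapes and the spatial_scale and sampling_ratio values below are made-up assumptions, not our actual configuration:

```python
import torch
from torchvision.ops import roi_align

# Per-frame 2D features for one clip: (T, C, H, W); sizes are illustrative.
T, C, H, W = 8, 256, 14, 14
clip_features = torch.randn(T, C, H, W)

# One person box detected on the center frame, in feature-map coordinates.
center_box = torch.tensor([[2.0, 3.0, 10.0, 12.0]])  # (x1, y1, x2, y2)

# Reuse the same center-frame box for every frame of the clip.
boxes_per_frame = [center_box] * T

person_features = roi_align(
    clip_features, boxes_per_frame,
    output_size=(7, 7), spatial_scale=1.0, sampling_ratio=2, aligned=True,
)
print(person_features.shape)  # torch.Size([8, 256, 7, 7]): one crop per frame
```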

pxssw (Author) commented Nov 20, 2020

Understood. I wish you even greater success as the field progresses!
In the meantime, there are a couple of small problems I ran into with the project. (They may just be my own misunderstanding or not real bugs; if so, please ignore them.)

  1. The update_action_dictionary part of visualizer.py: the final result self.action_dictionary keeps the results for all IDs (starting from the very first person). If the project runs for a long time, or on crowded scenes, could this lead to a large demand for computing and memory resources? Perhaps the results of IDs that have not been seen for a long time should be cleaned up.
  2. The cur_millis = stream.get(cv2.CAP_PROP_POS_MSEC) line in video_detection_loader.py: I find that, in my webcam mode, the initial value of cur_millis is very large (around 4×10^8 or more). I really don't know why it is not 0 ms, and the value keeps increasing across new runs of my project (4.x×10^8, 5.x×10^8, ...). Is this a common problem? I really don't know.

yelantf (Collaborator) commented Nov 23, 2020

Thanks for pointing out these problems! First, I have to admit that our current demo program is not well designed; it may have some small bugs and is also hard to read. As for the two problems you mentioned:

  1. Yes, you are right. This is indeed a problem for long-running sessions. We will try to improve it along the lines of your suggestion when we have time (a rough sketch follows below). Of course, pull requests are also welcome.

  2. We did not notice this issue before, and in fact we have not fully tested the demo script in webcam mode, because that requires a server with a graphical interface and a camera, which is not always available to us. According to the OpenCV documentation, this flag gives the current position of the video file in milliseconds, or the video capture timestamp. I am inclined to think that in webcam mode it returns the raw capture timestamp, which depends on the specific camera and driver (a possible workaround is sketched below).
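For the first point, here is a rough sketch of the kind of clean-up we have in mind. The function and variable names below are hypothetical and do not match the current visualizer.py exactly; it only illustrates dropping the results of IDs that have not been seen for a while:

```python
import time

# Hypothetical sketch: bound memory use by pruning IDs that have not been
# updated recently. Not the current visualizer.py code.
action_dictionary = {}   # person ID -> list of (timestamp, action label)
last_seen = {}           # person ID -> last update time

def update_action_dictionary(person_id, action_label, now=None):
    now = time.time() if now is None else now
    action_dictionary.setdefault(person_id, []).append((now, action_label))
    last_seen[person_id] = now

def prune_stale_ids(max_age_seconds=60.0, now=None):
    """Drop IDs whose last update is older than max_age_seconds."""
    now = time.time() if now is None else now
    stale = [pid for pid, t in last_seen.items() if now - t > max_age_seconds]
    for pid in stale:
        action_dictionary.pop(pid, None)
        last_seen.pop(pid, None)
```

Calling prune_stale_ids() periodically (for example once per processed clip) would keep the dictionary size roughly proportional to the number of currently visible people.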

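For the second point, a possible workaround sketch (again hypothetical, not part of the current demo): rebase the reported timestamps so they start from zero, and fall back to a wall-clock timer if the driver reports nothing useful.

```python
import time
import cv2

stream = cv2.VideoCapture(0)        # webcam mode
first_millis = None                 # first timestamp reported by the driver
start_wall = time.time()            # wall-clock fallback

while True:
    ok, frame = stream.read()
    if not ok:
        break
    raw_millis = stream.get(cv2.CAP_PROP_POS_MSEC)  # may be a huge driver timestamp
    if raw_millis > 0:
        if first_millis is None:
            first_millis = raw_millis
        cur_millis = raw_millis - first_millis      # relative to the first frame
    else:
        cur_millis = (time.time() - start_wall) * 1000.0
    # ... pass (frame, cur_millis) on to the rest of the pipeline ...

stream.release()
```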
pxssw (Author) commented Nov 24, 2020

Good job! The flaws do not obscure the merits.

yelantf added the enhancement (New feature or request) and question (Further information is requested) labels on Dec 2, 2020
yelantf closed this as completed on Dec 2, 2020
jun0wanan commented

Hi,
Sorry to disturb you. I want to ask how the person bbox in the 1st clip is linked to the person bbox in the 2nd clip (for the same person)?

Best,
jun

