This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

VideoPose3D on Android #160

Closed
aloyspx opened this issue Oct 14, 2020 · 6 comments

Comments


aloyspx commented Oct 14, 2020

How would one go about running this model in the wild on an Android device?
I imagine the process would be similar to the one described here: https://pytorch.org/mobile/android/ .
I just wanted to know how feasible this would be, or whether it has already been done. I'd imagine I would have to use a more lightweight 2D pose estimator than Detectron or Detectron2 to keep the runtime reasonable.

Thanks.
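For reference, the pytorch.org/mobile workflow boils down to: trace (or script) the model in Python, save the TorchScript file, and load it with org.pytorch.Module.load() in the app. A minimal sketch of the export side, using a tiny stand-in network (the real TemporalModel in model.py takes (batch, frames, 17, 2) keypoints; the class and file names here are illustrative, not from the repo):

```python
import torch
import torch.nn as nn

# Stand-in for VideoPose3D's temporal convolutional model; the real
# TemporalModel in model.py consumes (batch, frames, 17, 2) 2D keypoints
# and emits (batch, frames_out, 17, 3) 3D joint positions.
class TinyTemporalNet(nn.Module):
    def __init__(self, joints=17):
        super().__init__()
        self.conv1 = nn.Conv1d(joints * 2, 64, kernel_size=3)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv1d(64, joints * 3, kernel_size=1)

    def forward(self, x):
        # (batch, frames, joints, 2) -> (batch, joints*2, frames)
        b, f, j, c = x.shape
        x = x.view(b, f, j * c).permute(0, 2, 1)
        y = self.conv2(self.relu(self.conv1(x)))
        # -> (batch, frames_out, joints, 3)
        return y.permute(0, 2, 1).view(b, -1, j, 3)

model = TinyTemporalNet().eval()
example = torch.randn(1, 27, 17, 2)
traced = torch.jit.trace(model, example)
traced.save("videopose3d_mobile.pt")  # load with Module.load() on Android
```

The saved .pt file is what gets bundled into the app's assets, per the mobile docs.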

@aloyspx aloyspx closed this as completed Oct 24, 2020
@RWOverdijk

I was quietly following this issue. Did you find a solution?


aloyspx commented Oct 26, 2020

I haven't had much time to look into it. I was playing around with TFLite Pose Estimation (https://www.tensorflow.org/lite/models/pose_estimation/overview); it runs faster than Detectron/Detectron2, which is better for mobile, and you can tune its accuracy/speed trade-off. I believe it will require some post-processing to get the keypoints into the right format for input into VideoPose3D (this repo might provide some insight: https://github.com/darkAlert/VideoPose3d_with_Detectron2, or maybe the inference scripts in this repo). As a next step I was thinking of following the process described in the PyTorch mobile documentation with the TemporalModelBase in model.py, but I haven't gotten around to it.
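On the post-processing point: VideoPose3D's inference pipeline works on 17 COCO-format keypoints in normalized image coordinates, so the main steps are reshaping the TFLite/PoseNet output and normalizing pixel coordinates. A sketch, assuming the keypoints already come out in COCO joint order (the normalize function mirrors my reading of common/camera.py in the VideoPose3D repo):

```python
import numpy as np

def normalize_screen_coordinates(X, w, h):
    # Map pixel coords to roughly [-1, 1], preserving aspect ratio
    # (mirrors normalize_screen_coordinates in VideoPose3D's common/camera.py).
    return X / w * 2 - np.array([1, h / w])

def posenet_to_vp3d(keypoints, w, h):
    """keypoints: (frames, 17, 2) array of (x, y) pixel coordinates."""
    kp = np.asarray(keypoints, dtype=np.float32)
    assert kp.ndim == 3 and kp.shape[1:] == (17, 2)
    return normalize_screen_coordinates(kp, w, h).astype(np.float32)

# Fake detections for a 640x480 video, just to show the shapes involved.
frames = np.random.uniform(0, 480, size=(10, 17, 2))
out = posenet_to_vp3d(frames, w=640, h=480)
```

The function name posenet_to_vp3d is mine; the repo's own scripts do the equivalent reshaping when preparing a custom dataset.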


serviceberry3 commented Nov 3, 2020

I've been playing with the TFLite 2D model for a while and am going to try plugging it into VideoPose3D. I'll post updates here.

UPDATE: I've loaded VideoPose3D into Android as a Torch Script, along with the Posenet library. I'll be drawing some visualizations and trying to improve runtime.

UPDATE: so far I've swapped out Detectron for Posenet and gotten the visualization working on my computer. See my repository here. You need to download the Posenet lite model. From the root directory of my repo, you can run

python3 infer_video_posenet.py --output-dir data --image-ext mp4 (relative path to videos folder)

Then run

python3 run_3d_vis.py -d custom -k myvideos -arc 3,3,3,3,3 -c checkpoint --evaluate pretrained_h36m_detectron_coco.bin --render --viz-subject (relative path to video) --viz-action custom --viz-camera 0 --viz-video (relative path to video) --viz-output (relative path to desired output location) --viz-size 6

to get the visualization.

You can also now do live real-time inference (still with Posenet as the 2D tracker) using a webcam on a PC (see instructions in the repo). I think Posenet's accuracy can be tuned up; right now it's a little glitchy.
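For what it's worth, the -arc 3,3,3,3,3 model has a 243-frame receptive field (3^5), so live inference needs a sliding window of 2D keypoints. A sketch of such a buffer, padding by repeating the oldest frame until enough frames have arrived (the class is hypothetical; the repo itself uses edge padding at sequence boundaries, which this approximates):

```python
import collections
import numpy as np

RECEPTIVE_FIELD = 243  # 3^5 frames for the -arc 3,3,3,3,3 architecture

class KeypointBuffer:
    """Sliding window of per-frame 2D keypoints for live 3D inference."""
    def __init__(self, size=RECEPTIVE_FIELD):
        self.size = size
        self.frames = collections.deque(maxlen=size)

    def push(self, kp):
        # kp: (17, 2) normalized 2D keypoints for the newest frame
        self.frames.append(np.asarray(kp, dtype=np.float32))

    def window(self):
        # Pad by repeating the oldest frame until the buffer is full,
        # so the model always sees a full receptive field.
        frames = list(self.frames)
        pad = [frames[0]] * (self.size - len(frames))
        return np.stack(pad + frames)[None]  # (1, 243, 17, 2)

buf = KeypointBuffer()
buf.push(np.zeros((17, 2)))
window = buf.window()
```

Each new camera frame pushes one keypoint set and reads back a full window to feed the 3D model; only the center (or last) frame of the output is kept.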

@serviceberry3

I made the app, but the 3D inference still runs too slowly. Does anyone have ideas to speed it up? I will try (a) only running it every couple of frames, (b) figuring out how to run it on the GPU, and (c) fusing some of the convolutional/batchnorm layers of the model using torch.quantization. Also, is Torch on Android guaranteed to use the maximum number of threads? I saw something online about org.pytorch.Module.setNumThreads, but I can't find that method.
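On option (c): conv+batchnorm fusion is done on the Python side before export, with torch.quantization.fuse_modules. A sketch on a stand-in block shaped like the layers in VideoPose3D's model.py (34 = 17 joints x 2 coords; the class is mine, not the repo's). On the thread question, torch.set_num_threads controls CPU threads in Python; I'm not sure of the Android-side equivalent, so I won't guess at an API name.

```python
import torch
import torch.nn as nn

# A Conv1d + BatchNorm1d + ReLU stack like the blocks in VideoPose3D's
# model.py. Fusing folds the batchnorm statistics into the conv weights,
# removing a whole layer's worth of work at inference time.
class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(34, 64, kernel_size=3)
        self.bn = nn.BatchNorm1d(64)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

m = Block().eval()  # conv+bn fusion is only valid in eval mode
x = torch.randn(2, 34, 50)
ref = m(x)

# Fold bn (and relu) into the conv; bn and relu become nn.Identity.
fused = torch.quantization.fuse_modules(m, [["conv", "bn", "relu"]])
assert torch.allclose(fused(x), ref, atol=1e-4)
```

The fused model produces (numerically) the same outputs, so it can be traced and exported exactly as before.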


aloyspx commented Nov 18, 2020

Great job. I'm testing it out right now; it looks very promising. Performance-wise, it looks like the documented approach for better performance on mobile is close to option (c): https://pytorch.org/tutorials/recipes/mobile_perf.html?highlight=mobile.
That being said, with newer phones getting access to "neural engine" type hardware, performance will get better over time.
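The linked recipe boils down to a few lines on the export side. A minimal sketch, with a stand-in Sequential model in place of the real VideoPose3D network and an illustrative file name:

```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

# Stand-in for the real VideoPose3D network (34 = 17 joints x 2 coords).
model = torch.nn.Sequential(
    torch.nn.Conv1d(34, 64, kernel_size=3),
    torch.nn.BatchNorm1d(64),
    torch.nn.ReLU(),
).eval()

scripted = torch.jit.script(model)
# The mobile optimizer fuses conv+bn, prepacks weights for XNNPACK,
# removes dropout, and applies other inference-only passes.
mobile = optimize_for_mobile(scripted)
mobile.save("videopose3d_android.pt")  # bundle into app assets
```

Newer PyTorch versions also offer saving for the lite interpreter, which the recipe covers; the optimized module behaves the same as the original at inference time.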


serviceberry3 commented Jan 13, 2021

Does anyone know whether this model would generally run faster or slower than the model outlined in the 3d-pose-baseline paper by Martinez et al.? I'm wondering if I should try 3d-pose-baseline to speed things up, but I don't have enough background knowledge to know if it's lighter. All I know is that in the paper they claim 3d-pose-baseline runs inference in 2 ms on a Titan Xp.
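One quick (and rough) proxy for "lighter" is parameter count, though it ignores the per-frame vs. temporal-window difference, so take it loosely. A sketch with a hypothetical stand-in for the 3d-pose-baseline MLP (Martinez et al. use linear layers of width 1024 in residual blocks; the joint counts here are illustrative, not the paper's exact architecture):

```python
import torch.nn as nn

def param_count(model):
    # Total trainable parameters in the model.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Very rough stand-in for the 3d-pose-baseline per-frame MLP:
# 34 inputs (17 joints x 2), hidden width 1024, 51 outputs (17 x 3).
mlp = nn.Sequential(
    nn.Linear(34, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 51),
)
print(param_count(mlp))  # a couple of million parameters
```

Running the same counter on the loaded VideoPose3D checkpoint would give an apples-to-apples size comparison, even if runtime ultimately depends on the kernels each model hits.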
