The demo will only be up for a couple of months.
This repository contains a simple API for performing exercise recognition on still images using the OpenAI CLIP model. The demo is built with Docker.
With the release of CLIP, OpenAI provides one of the first zero-shot models that actually works relatively well across many datasets. CLIP is trained on ~400 million (image, text) pairs covering a vast ontology, so it extends easily to other domains without supervision, although it is also possible to add a classifier on top of the extracted features.
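For readers new to CLIP, zero-shot classification amounts to scoring an image embedding against a set of text embeddings. Here is a minimal sketch using OpenAI's official `clip` package; the image path and labels are illustrative:

```python
# Minimal zero-shot classification with the official `clip` package
# (pip install git+https://github.com/openai/CLIP.git).
# The image path and labels are illustrative.
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("squat.jpg")).unsqueeze(0).to(device)
texts = clip.tokenize(["a person standing", "a person repeating a squat"]).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, texts)
    probs = logits_per_image.softmax(dim=-1)

print(probs)  # one probability per text label
```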
- Exploration
- First, explore what predictions look like for exercise classification on still images.
- Explore how exercise classification performs on video, frame by frame, as a time series.
- Build a simple, deployable API for others to explore and expand upon.
This workflow is tested on GCP; however, it should extend to other cloud providers as it is Docker-based. It currently runs in CPU mode, but slight modifications would allow it to run on a GPU.
- Install dependencies onto the machine, e.g.
sudo apt-get install git docker.io build-essential
- Clone the repository.
- Build and start the demo server.
make serve-cpu
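Once the server is running, you can send it an image over HTTP. The exact route and payload depend on the code in src, so treat the endpoint below as hypothetical:

```python
# Send a test image to the running server. The /predict route is hypothetical;
# check the code in src for the actual endpoint and payload format.
import requests

with open("squat.jpg", "rb") as f:  # illustrative image path
    resp = requests.post("http://localhost:8080/predict", files={"image": f})
print(resp.json())
```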
- "a person standing"
- "a person repeating a squat"
- "a person repeating a jumping jack"
- "a person performing a plank"
Edit `src/labels.txt` if you want to try other labels in the demo API, or edit the `texts` variable in the notebook to experiment. Have fun, "prompt engineering" is the new rage :-D.
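To compare prompt phrasings quantitatively, you can score one image against several candidate texts and look at the cosine similarities. A minimal sketch, assuming the official `clip` package; the image path and prompts are illustrative:

```python
# Score one image against several phrasings of the same label; a higher
# cosine similarity suggests a better prompt. The image path and prompts
# are illustrative.
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

prompts = [
    "a squat",
    "a person doing a squat",
    "a photo of a person repeating a squat",
]
image = preprocess(Image.open("squat.jpg")).unsqueeze(0).to(device)
tokens = clip.tokenize(prompts).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(tokens)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    similarity = (image_features @ text_features.T).squeeze(0)

for prompt, score in zip(prompts, similarity.tolist()):
    print(f"{score:.3f}  {prompt}")
```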
In the notebook, I have an example working on video, where I run the CLIP model frame by frame with a mean smoothing window of size 5.
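The notebook is the reference, but the core of that loop looks roughly like the sketch below. It assumes OpenCV for frame decoding; the video path is illustrative, and the labels are the defaults listed above:

```python
# Run CLIP on each video frame and smooth the per-class probabilities with a
# running mean over the last 5 frames. A sketch, not the notebook verbatim;
# the video path is illustrative.
from collections import deque

import clip
import cv2
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

labels = ["a person standing", "a person repeating a squat",
          "a person repeating a jumping jack", "a person performing a plank"]
tokens = clip.tokenize(labels).to(device)

window = deque(maxlen=5)  # mean smoothing window of size 5
cap = cv2.VideoCapture("workout.mp4")
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # OpenCV decodes to BGR; CLIP's preprocess expects an RGB PIL image.
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    image = preprocess(Image.fromarray(rgb)).unsqueeze(0).to(device)
    with torch.no_grad():
        logits, _ = model(image, tokens)
        probs = logits.softmax(dim=-1).squeeze(0)
    window.append(probs)
    smoothed = torch.stack(list(window)).mean(dim=0)
    print(labels[int(smoothed.argmax())])
cap.release()
```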
- Use the CLIP image encoder as an embedding for image retrieval: collect a dataset of still images to which new images are matched (see the sketch after this list).
- Use the CLIP image encoder as an additional loss term when learning new tasks, acting as a regularizer that helps a network learn more generic features.
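For the retrieval idea, matching new images reduces to a nearest-neighbor search over L2-normalized CLIP embeddings. A minimal sketch; the gallery and query paths are illustrative:

```python
# Build an index of CLIP embeddings for a gallery of still images, then match
# a query image by cosine similarity. All file paths are illustrative.
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def embed(path: str) -> torch.Tensor:
    image = preprocess(Image.open(path)).unsqueeze(0).to(device)
    with torch.no_grad():
        features = model.encode_image(image)
    return features / features.norm(dim=-1, keepdim=True)  # L2-normalize

gallery_paths = ["squat_01.jpg", "plank_01.jpg", "jumping_jack_01.jpg"]
gallery = torch.cat([embed(p) for p in gallery_paths])

query = embed("new_image.jpg")
scores = (query @ gallery.T).squeeze(0)  # cosine similarities
best = int(scores.argmax())
print(gallery_paths[best], float(scores[best]))
```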