
CLIP-Count: Towards Text-Guided Zero-Shot Object Counting

(teaser figure)

Official Implementation for ACM Multimedia 2023 paper CLIP-Count: Towards Text-Guided Zero-Shot Object Counting.

TL;DR: We count anything with text descriptions.

Update

🔥🔥🔥 [Jul 26] Our paper has been accepted by ACM Multimedia 2023.

🔥🔥🔥 [Aug 12] The code and pretrained model have been released.

Preparation

Environment: Create a virtual environment using Anaconda, and install all dependencies.

conda create -n clipcount python=3.8 -y;
conda activate clipcount;
conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=10.2 -c pytorch
# For GPU with CUDA version 11.x, please use:
# conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -r requirements.txt
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
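
After installation, a quick sanity check such as the one below (an illustrative snippet, not part of the repo; the CLIP variant named here is only an example, see run.py for the exact backbone) confirms that PyTorch sees the GPU and that CLIP weights can be downloaded:

import torch
import clip

# confirm that PyTorch can see the GPU
print("CUDA available:", torch.cuda.is_available())

# download and load a frozen CLIP backbone (ViT-B/16 used as an example)
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)
print("Loaded CLIP image encoder:", model.visual.__class__.__name__)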

Data: We conduct experiments on three datasets; you can download and use whichever you would like to test. The three datasets can be downloaded from: FSC-147 | CARPK | ShanghaiTech. Note that the annotations of FSC-147 must be downloaded separately from their repo.

Extract and put the downloaded data in the data/ dir. The complete file structure should look like this. You don't have to download all the datasets for evaluation, but you must have FSC-147 if you want to train the model. (A small layout check is sketched after the tree below.)

data
├─CARPK/
│  ├─Annotations/
│  ├─Images/
│  ├─ImageSets/
│
├─FSC/    
│  ├─gt_density_map_adaptive_384_VarV2/
│  ├─images_384_VarV2/
│  ├─FSC_147/
│  │  ├─ImageClasses_FSC147.txt
│  │  ├─Train_Test_Val_FSC_147.json
│  │  ├─annotation_FSC147_384.json
│  
├─ShanghaiTech/
│  ├─part_A/
│  ├─part_B/
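
A short check like the following (an illustrative script, not part of the repo) verifies that the expected directories and files are in place before you train or evaluate:

import os

# paths relative to the repository root; CARPK and ShanghaiTech are optional
expected = [
    "data/FSC/images_384_VarV2",
    "data/FSC/gt_density_map_adaptive_384_VarV2",
    "data/FSC/FSC_147/ImageClasses_FSC147.txt",
    "data/FSC/FSC_147/Train_Test_Val_FSC_147.json",
    "data/FSC/FSC_147/annotation_FSC147_384.json",
    "data/CARPK/Images",
    "data/ShanghaiTech/part_A",
    "data/ShanghaiTech/part_B",
]

for path in expected:
    print("ok     " if os.path.exists(path) else "MISSING", path)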

Run the Code

📘 Train. You can train the model using the following command. The first time you run the code, the frozen CLIP weights will be downloaded.

CUDA_VISIBLE_DEVICES=0 python run.py --mode train --exp_name exp --batch_size 32

Notice that the MAE logged during training is lower than at test time. This is because training images are crops of the whole image, while at test time a sliding-window algorithm is applied over the full image.
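
For intuition, the test-time sliding-window evaluation works roughly as sketched below (a simplified illustration under assumed window size and stride; the repo's actual inference additionally handles overlaps and image borders):

import torch

def sliding_window_count(model, image, prompt, win=384, stride=384):
    # image: (C, H, W) tensor; model(crop, prompt) is assumed to return a
    # density map for the crop, whose sum is the predicted count in that window
    _, H, W = image.shape
    total = 0.0
    for top in range(0, max(H - win, 0) + 1, stride):
        for left in range(0, max(W - win, 0) + 1, stride):
            crop = image[:, top:top + win, left:left + win].unsqueeze(0)
            with torch.no_grad():
                density = model(crop, prompt)
            total += density.sum().item()
    return total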


🧰 Evaluation. Test the performance of a trained checkpoint with the following command. You can change the --dataset_type parameter to test on different datasets.

CUDA_VISIBLE_DEVICES=0 python run.py --mode test --exp_name exp --batch_size 32 --dataset_type FSC --ckpt path/to/model.ckpt

We provide a pre-trained checkpoint of our full model, which achieves quantitative results similar to those presented in the paper.

FSC val MAE | FSC val RMSE | FSC test MAE | FSC test RMSE | CARPK MAE | CARPK RMSE
18.93       | 66.10        | 17.45        | 104.88        | 11.70     | 13.94

Part-A MAE | Part-A RMSE | Part-B MAE | Part-B RMSE
197.47     | 319.75      | 47.92      | 80.48
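
For reference, the MAE and RMSE above are the standard counting metrics, computed over per-image predicted and ground-truth counts:

import numpy as np

def counting_metrics(pred_counts, gt_counts):
    pred = np.asarray(pred_counts, dtype=float)
    gt = np.asarray(gt_counts, dtype=float)
    mae = np.abs(pred - gt).mean()             # mean absolute error
    rmse = np.sqrt(((pred - gt) ** 2).mean())  # root mean squared error
    return mae, rmse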

💻 Interactive Web App. We provide a user-friendly interface to let you interact with CLIP-Count. You can start the web app using the following command:

CUDA_VISIBLE_DEVICES=0 python run.py --mode app --exp_name exp --batch_size 32 --ckpt path/to/model.ckpt

Follow the instructions printed in the terminal to visit the interface:

$ Running on local URL:  http://127.0.0.1:xxxx
$ Running on public URL: https://xxxxxxxx-xxxx-xxxx.gradio.live

$ This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
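
For reference, an interface like this is typically wired up with Gradio roughly as follows (a hypothetical minimal sketch with a stubbed-out model call, not the repository's actual app code):

import gradio as gr

def count_objects(image, prompt):
    # stub: the real app would run CLIP-Count on (image, prompt) and return
    # a density-map visualization together with the predicted count
    return image, 0.0

demo = gr.Interface(
    fn=count_objects,
    inputs=[gr.Image(type="pil", label="Image"), gr.Textbox(label="Text prompt")],
    outputs=[gr.Image(label="Density map"), gr.Number(label="Predicted count")],
)
demo.launch(share=True)  # share=True creates the public gradio.live URL shown above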

The interface looks like this. You can upload images and enter prompts to test the model.

(screenshot of the web interface)

Gallery

(gallery of qualitative results)

Citation

Please consider citing us if you find our paper useful in your research :).

@article{jiang2023clip,
  title={CLIP-Count: Towards Text-Guided Zero-Shot Object Counting},
  author={Jiang, Ruixiang and Liu, Lingbo and Chen, Changwen},
  journal={arXiv preprint arXiv:2305.07304},
  year={2023}
}

Acknowledgement

This project is based on the implementation of CounTR.
