For all of the datasets, the captions are already included in this repo; only the images need to be downloaded.
The RSICD images can be found at https://github.com/201528014227051/RSICD_optimal. The expected folder structure is:
```
data
├── rsicd
│   ├── RSICD_images
│   │   ├── 00001.jpg
│   │   ├── 00002.jpg
│   │   └── ...
│   └── dataset_rsicd.json
```
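If the JSON follows the usual Karpathy-style layout (an `images` list whose entries carry `filename`, `split`, and `sentences`), the captions can be loaded with something like the sketch below; verify the keys against your copy of the file.

```python
# Minimal sketch for reading the captions, assuming dataset_rsicd.json
# follows the Karpathy-style layout ("images" entries with "filename",
# "split", and "sentences") -- check your copy of the file.
import json
from pathlib import Path

root = Path("data/rsicd")
with open(root / "dataset_rsicd.json") as f:
    dataset = json.load(f)

splits = {}
for img in dataset["images"]:
    splits.setdefault(img["split"], []).append({
        "path": root / "RSICD_images" / img["filename"],
        "captions": [s["raw"] for s in img["sentences"]],
    })

for name, items in splits.items():
    print(name, len(items))
```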
The UCM images can also be found at https://github.com/201528014227051/RSICD_optimal. The expected folder structure is:
```
data
├── ucm
│   ├── images
│   │   ├── 1.tif
│   │   ├── 2.tif
│   │   └── ...
│   └── dataset.json
```
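`dataset.json` presumably shares the same layout, so a quick sanity check that the downloaded images match the captions file could look like this (the `images`/`filename` keys are the same assumption as above):

```python
# Sanity check: every image referenced in dataset.json should exist on
# disk. The "images"/"filename" keys are an assumed Karpathy-style schema.
import json
from pathlib import Path

root = Path("data/ucm")
with open(root / "dataset.json") as f:
    dataset = json.load(f)

missing = [img["filename"] for img in dataset["images"]
           if not (root / "images" / img["filename"]).exists()]
print(f"{len(missing)} referenced images are missing")
```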
The Sydney images are also available at https://github.com/201528014227051/RSICD_optimal. The expected folder structure is:
```
data
├── sydney
│   ├── images
│   │   ├── 1.tif
│   │   ├── 2.tif
│   │   └── ...
│   └── filenames
│       ├── descriptions_SYDNEY.txt
│       ├── filenames_test.txt
│       ├── filenames_train.txt
│       └── filenames_val.txt
```
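The Sydney annotations come as plain text rather than JSON. Below is a sketch for reading the splits, assuming one filename per line in each `filenames_*.txt`; the layout of `descriptions_SYDNEY.txt` varies between distributions, so inspect it before writing a parser.

```python
# Read the Sydney split lists (assumed: one image filename per line).
from pathlib import Path

root = Path("data/sydney/filenames")

def read_split(name):
    with open(root / f"filenames_{name}.txt") as f:
        return [line.strip() for line in f if line.strip()]

train, val, test = (read_split(s) for s in ("train", "val", "test"))
print(len(train), len(val), len(test))

# Peek at the descriptions file to confirm its format before parsing;
# its exact layout is not documented here.
with open(root / "descriptions_SYDNEY.txt") as f:
    for _ in range(3):
        print(f.readline().rstrip())
```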
The NWPU-Captions data can be found at https://github.com/HaiyanHuang98/NWPU-Captions. The expected folder structure is:
```
data
├── nwpu
│   ├── images
│   │   ├── airplane
│   │   │   ├── airplane_001.jpg
│   │   │   └── ...
│   │   ├── bridge
│   │   │   ├── bridge_001.jpg
│   │   │   └── ...
│   │   └── ...
│   └── dataset_nwpu.json
```
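Since the NWPU images are grouped by class, indexing them means walking the subfolders. The schema of `dataset_nwpu.json` isn't documented here, so the sketch below just prints its top-level shape so you can adapt a loader:

```python
# Index the class-structured NWPU images and peek at the captions file.
import json
from pathlib import Path

root = Path("data/nwpu")
images_by_class = {d.name: sorted(d.glob("*.jpg"))
                   for d in (root / "images").iterdir() if d.is_dir()}
print({c: len(v) for c, v in sorted(images_by_class.items())[:3]})

with open(root / "dataset_nwpu.json") as f:
    captions = json.load(f)
# The top-level type/keys tell you how to iterate (schema assumed unknown).
print(type(captions).__name__,
      list(captions)[:3] if isinstance(captions, dict) else "list")
```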
To train using VGG16 as the backbone encoder, run:

```bash
python3 train_decoder.py --encoder=vgg
```
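For reference, a VGG16 captioning encoder typically keeps only the convolutional stack and drops the classifier head; here is a torchvision sketch of that common setup (not necessarily the exact architecture `train_decoder.py` builds):

```python
# Sketch of a VGG16 feature extractor for captioning: keep the conv
# stack, drop the classifier head. train_decoder.py may differ.
import torch
import torchvision

weights = torchvision.models.VGG16_Weights.IMAGENET1K_V1
encoder = torchvision.models.vgg16(weights=weights).features.eval()

with torch.no_grad():
    feats = encoder(torch.randn(1, 3, 224, 224))
print(feats.shape)  # (1, 512, 7, 7) spatial features for the decoder to attend over
```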
To train using the RemoteCLIP backbone, run:

```bash
python3 train_decoder.py --encoder=remote_clip
```
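RemoteCLIP weights are distributed as open_clip state dicts, so loading the backbone usually looks like the sketch below; the checkpoint path is a placeholder for wherever you downloaded the weights:

```python
# Sketch of loading a RemoteCLIP backbone through open_clip. The
# checkpoint path is a placeholder; download the weights separately.
import torch
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32")
ckpt = torch.load("checkpoints/RemoteCLIP-ViT-B-32.pt", map_location="cpu")
model.load_state_dict(ckpt)
model.eval()

with torch.no_grad():
    feats = model.encode_image(torch.randn(1, 3, 224, 224))
print(feats.shape)  # (1, 512) image embedding
```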
If you want to fine-tune CLIP on the SEG-4 dataset, run:

```bash
python3 train_clip.py
```
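`train_clip.py`'s internals aren't shown here, but CLIP fine-tuning generally means the symmetric contrastive loss over an image-caption batch. Below is a generic open_clip sketch with stand-in tensors in place of a real SEG-4 batch:

```python
# One generic CLIP contrastive fine-tuning step (stand-in batch, not
# the actual SEG-4 data pipeline).
import torch
import torch.nn.functional as F
import open_clip

model, _, _ = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

images = torch.randn(4, 3, 224, 224)        # stand-in image batch
texts = tokenizer(["an aerial photo"] * 4)  # stand-in caption batch

img_f = F.normalize(model.encode_image(images), dim=-1)
txt_f = F.normalize(model.encode_text(texts), dim=-1)
logits = model.logit_scale.exp() * img_f @ txt_f.t()

# Symmetric cross-entropy: matching image/caption pairs sit on the diagonal.
labels = torch.arange(len(images))
loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2
loss.backward()
optimizer.step()
print(float(loss))
```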
Then you can train the decoder model:

```bash
python3 train_decoder.py --encoder=clip
```
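The three training commands differ only in the `--encoder` flag; here is a hypothetical sketch of how such a flag might be parsed and dispatched (`train_decoder.py`'s actual argument handling is not shown in this README):

```python
# Hypothetical --encoder dispatch; the choices mirror the CLI values above.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--encoder", choices=["vgg", "remote_clip", "clip"],
                    default="vgg")
args = parser.parse_args()
print(f"building the {args.encoder} encoder")
```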
To evaluate the model on each dataset and print the metrics, run:

```bash
python3 evaluate.py
```
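BLEU is the standard metric on these benchmarks; here is a minimal sketch with NLTK, which may or may not be the package `evaluate.py` uses:

```python
# BLEU-4 on a toy example; evaluate.py may rely on a different metrics package.
from nltk.translate.bleu_score import corpus_bleu

# One list of reference captions per image, tokenized.
references = [[["a", "plane", "is", "parked", "on", "the", "runway"]]]
hypotheses = [["a", "plane", "is", "parked", "on", "a", "runway"]]
print("BLEU-4:", corpus_bleu(references, hypotheses))
```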
To generate captions for a specific image, run:

```bash
python3 caption.py --image_path path/to/image.jpg
```
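Whatever encoder is selected, the image first has to become the tensor the model expects. A typical pipeline for the 224×224 encoders above, assuming ImageNet normalization (not necessarily what `caption.py` applies), looks like this:

```python
# Typical preprocessing before captioning an image (assumed pipeline).
from PIL import Image
import torchvision.transforms as T

transform = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
image = transform(Image.open("path/to/image.jpg").convert("RGB")).unsqueeze(0)
print(image.shape)  # (1, 3, 224, 224), ready for the encoder
```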