
Captioning satellite images with CLIP + GPT2

Downloading datasets

For all datasets, the captions are already included in this repo; the images must be downloaded separately.

RSICD Captions

The images can be downloaded from https://github.com/201528014227051/RSICD_optimal. The expected folder structure is:

data
├── rsicd
│   ├── RSICD_images
│   │   ├── 00001.jpg
│   │   ├── 00002.jpg
│   │   ├── ...
│   └── dataset_rsicd.json
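
After downloading, you can sanity-check the layout with a short script like the one below. It is only a sketch: it assumes dataset_rsicd.json follows the common Karpathy-split format (an "images" list whose entries have a "filename" key); adjust the keys if the file differs.

import json
from pathlib import Path

# Verify that every image referenced in dataset_rsicd.json is on disk.
# Assumes a Karpathy-style file with an "images" list of {"filename": ...} entries.
data_root = Path("data/rsicd")
dataset = json.loads((data_root / "dataset_rsicd.json").read_text())

missing = [e["filename"] for e in dataset["images"]
           if not (data_root / "RSICD_images" / e["filename"]).exists()]
print(f"{len(dataset['images'])} entries, {len(missing)} images missing")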

UCM Captions

The images can be downloaded from https://github.com/201528014227051/RSICD_optimal. The expected folder structure is:

data
├── ucm
│   ├── images
│   │   ├── 1.tif
│   │   ├── 2.tif
│   │   ├── ...
│   └── dataset.json

Sydney Captions

The images can be downloaded from https://github.com/201528014227051/RSICD_optimal. The expected folder structure is:

data
├── sydney
│   ├── images
│   │   ├── 1.tif
│   │   ├── 2.tif
│   │   ├── ...
│   └── filenames
│       ├── descriptions_SYDNEY.txt
│       ├── filenames_test.txt
│       ├── filenames_train.txt
│       └── filenames_val.txt
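
A quick way to check the split files is to count the entries in each; the snippet below assumes filenames_train.txt, filenames_val.txt and filenames_test.txt list one image file per line.

from pathlib import Path

# Count how many images each Sydney split references.
# Assumes the filenames_*.txt files contain one image name per line.
splits_dir = Path("data/sydney/filenames")
for split in ("train", "val", "test"):
    names = (splits_dir / f"filenames_{split}.txt").read_text().splitlines()
    print(f"{split}: {len(names)} images")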

NWPU Captions

The images can be downloaded from https://github.com/HaiyanHuang98/NWPU-Captions. The expected folder structure is:

data
├── nwpu
│   ├── images
│   │   ├── airplane
│   │   │   ├── airplane_001.jpg
│   │   │   ├── ...
│   │   ├── bridge
│   │   │   ├── bridge_001.jpg
│   │   │   ├── ...
│   │   ├── ...
│   └── dataset_nwpu.json
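
Since NWPU images are grouped into one folder per class, a quick count per folder confirms the download is complete; the snippet below only assumes the class directories sit under data/nwpu/images.

from pathlib import Path

# Count the images in each NWPU class folder (airplane, bridge, ...).
images_root = Path("data/nwpu/images")
for class_dir in sorted(p for p in images_root.iterdir() if p.is_dir()):
    n_images = len(list(class_dir.glob("*.jpg")))
    print(f"{class_dir.name}: {n_images} images")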

Training

VGG16

To train using VGG16 as the backbone encoder, run:

python3 train_decoder.py --encoder=vgg

RemoteCLIP

To train using the RemoteCLIP backbone, run:

python3 train_decoder.py --encoder=remote_clip

Finetuning CLIP

To finetune CLIP on the SEG-4 dataset, run:

python train_clip.py
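
For reference, a single fine-tuning step with Hugging Face's CLIP implementation looks roughly like the sketch below; it is not taken from train_clip.py, and the model name, learning rate and data loading are assumptions.

import torch
from transformers import CLIPModel, CLIPProcessor

# Illustrative CLIP fine-tuning step on (image, caption) pairs.
# Hypothetical checkpoint and learning rate; train_clip.py may differ.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def train_step(images, captions):
    # images: list of PIL images, captions: list of matching strings
    batch = processor(text=captions, images=images, return_tensors="pt", padding=True)
    out = model(**batch, return_loss=True)  # symmetric image-text contrastive loss
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()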

Then you can train the decoder model:

python train_decoder.py --encoder=clip
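
As background on how a CLIP encoder feeds a GPT-2 decoder, one common design (ClipCap-style) projects the CLIP image embedding into a short prefix of GPT-2 token embeddings and lets GPT-2 generate the caption conditioned on it. The sketch below illustrates that idea with Hugging Face transformers; it is not the exact architecture implemented in train_decoder.py, and the dimensions and prefix length are assumptions.

import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel

class PrefixCaptioner(nn.Module):
    """Illustrative ClipCap-style bridge between a CLIP encoder and GPT-2."""

    def __init__(self, clip_dim=512, prefix_len=10):
        super().__init__()
        self.gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
        gpt_dim = self.gpt2.config.n_embd
        self.prefix_len = prefix_len
        # Map one CLIP image vector to prefix_len GPT-2 token embeddings.
        self.project = nn.Linear(clip_dim, prefix_len * gpt_dim)

    def forward(self, clip_features, caption_ids):
        batch = clip_features.size(0)
        prefix = self.project(clip_features).view(batch, self.prefix_len, -1)
        token_embeds = self.gpt2.transformer.wte(caption_ids)
        inputs = torch.cat([prefix, token_embeds], dim=1)
        # Mask the prefix positions with -100 so they do not contribute to the loss.
        ignore = torch.full((batch, self.prefix_len), -100,
                            dtype=torch.long, device=caption_ids.device)
        labels = torch.cat([ignore, caption_ids], dim=1)
        return self.gpt2(inputs_embeds=inputs, labels=labels)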

Evaluation

To evaluate the model on every dataset and print the metrics, run:

python evaluate.py

Inference

To generate captions for a specific image, you can use the following command:

python caption.py --image_path path/to/image.jpg
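
To caption a whole folder, you can loop over the images and call the script once per file; the snippet below uses the RSICD folder as an example and assumes caption.py prints the generated caption to stdout.

import subprocess
from pathlib import Path

# Caption every RSICD image by invoking caption.py once per file.
for img in sorted(Path("data/rsicd/RSICD_images").glob("*.jpg")):
    subprocess.run(["python", "caption.py", "--image_path", str(img)], check=True)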
