### Image Captioning of Earth Observational Imagery

## An MDS-MDA Joint Capstone Project


<img src="../imgs/mda.png" alt="drawing" style="width:600px;"/>




###### Dora Qian, Fanli Zhou, James Huang, Mike Chen

<font size="2">[MDA logo](https://mdacorporation.com/corporate/)</font>

## MDA
#### A Canadian Aerospace Company

+ Developed Canadarm and Canadarm-2 on the ISS

<img src="../imgs/canadarm.jpg" alt="drawing" style="width:800px;" class="center"/>

<font size="2">Sources: 
[Canadarm](https://www.thecanadianencyclopedia.ca/en/article/canadarm), 
[Canadarm2](https://upload.wikimedia.org/wikipedia/commons/thumb/e/e5/STS-114_Steve_Robinson_on_Canadarm2.jpg/2560px-STS-114_Steve_Robinson_on_Canadarm2.jpg)</font>

## MDA

+ Access to a vast database of satellite images


<img src="../imgs/505.png" alt="drawing" style="width:400px;"/>

<font size="2">Sources: Image adapted from Qu, B. et al. (2016) [1].</font>

## The problem

+ These images are uncaptioned
    + Without captions, these images are difficult and computationally costly to work with
+ Captioning technology for satellite images is less mature than for "traditional" photographs
+ Due to the nature of these images, the model cannot be trained effectively on other types of images
    + Few existing resources are available to train the model

## Image Captioning: Motivation and Purpose

+ Associating an image with a caption makes it much more accessible:

    + Tag and sort images based on content

    + Return search queries
    
    + Evaluate image similarity
    
    + Downstream applications



### Training the Model

+ MDA images are uncaptioned
    + Train, validate, and test on public, captioned satellite images
        + Several different datasets
        + Assess cross-dataset performance
    + Final manual evaluation on uncaptioned MDA images

## Final Data Product
- End-to-end image captioning pipeline
- 3 independent modules

<img src="../imgs/dataproduct.png" width="800">

### Final Data Product: Database
- Non-relational database
- Stores both human-annotated and machine-generated image-caption pairs

<img src="../imgs/database.png" width="600">


<font size="2">Sources: [RSICD_optimal](https://github.com/201528014227051/RSICD_optimal)</font>
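
As a rough illustration of how one image-caption document might look in a non-relational store, here is a minimal sketch. The field names (`image_id`, `dataset`, `captions`, `source`), the file name, and the caption texts are all hypothetical and are not taken from the project's actual schema.

```python
import json


def make_record(image_id, dataset, captions, source):
    """Build one image-caption document (all field names are hypothetical)."""
    if source not in {"human", "model"}:
        raise ValueError("source must be 'human' or 'model'")
    return {
        "image_id": image_id,
        "dataset": dataset,
        # Each caption remembers whether a person or the model wrote it.
        "captions": [{"text": text, "source": source} for text in captions],
    }


# Made-up example values, for illustration only.
record = make_record(
    "example_00001.jpg",
    "RSICD",
    ["many planes are parked near a terminal"],
    "human",
)

# A machine-generated caption can later be appended to the same document.
record["captions"].append(
    {"text": "an airport with several planes", "source": "model"}
)

print(json.dumps(record, indent=2))
```

Keeping the `source` flag on each caption lets human-annotated and machine-generated pairs live in the same document while remaining distinguishable.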

### Final Data Product: Deep Learning Model
- Load images from database
- Model training
- Model prediction
- Easy to update

<img src="../imgs/aws.png" width="300">

<img src="../imgs/Pytorch_logo.png" width="400">

<font size="2">Sources: 
[AWS logo](https://commons.wikimedia.org/wiki/File:Amazon_Web_Services_Logo.svg), 
[Pytorch logo](https://commons.wikimedia.org/wiki/File:Pytorch_logo.png)</font>

### Final Data Product: Visualization & Database Updating Tool

- Caption randomly selected images from the database
- Upload your own images from outside the database
- Standardize the image, make a prediction, and save the result to the database

<img src="../imgs/tool.png" width="800">

<font size="2">Sources: [RSICD_optimal](https://github.com/201528014227051/RSICD_optimal)</font>

## Data Description

There are three labeled datasets:
- UCM_Captions 
- Sydney_Captions
- RSICD (Remote Sensing Image Captioning Dataset)

### UCM_Captions

- 21 Different Classes of Images
- 2100 Different Images 
- 256 X 256 Pixels
- .tif Format


<table><tr><td><img src='../imgs/ucm_1.jpg' width="200" height="80"></td><td><img src='../imgs/ucm_2.jpg' width="200" height="80"></td></tr></table>

<table><tr><td><img src='../imgs/ucm_3.jpg' width="200" height="80"></td><td><img src='../imgs/ucm_4.jpg' width="200" height="80"></td></tr></table>

<font size="2">Sources: [UC Merced Land Use Dataset](http://weegee.vision.ucmerced.edu/datasets/landuse.html)</font>

### Sydney_Captions

- 7 Different Classes of Images
- 613 Different Images 
- 500 X 500 Pixels 
- .tif Format

<table><tr><td><img src='../imgs/sydney_1.jpg' width="200" height="30"></td><td><img src='../imgs/sydney_2.jpg' width="200" height="30"></td></tr></table>


<table><tr><td><img src='../imgs/sydney_3.jpg' width="200" height="30"></td><td><img src='../imgs/syndey_4.jpg' width="200" height="30"></td></tr></table>

<font size="2">Sources: Image adapted from Qu, B. et al. (2016) [1].</font>

### RSICD (Remote Sensing Image Captioning Dataset)



- 10,922 Different Images 
- 224 X 224 Pixels
- .jpg Format 


<table><tr><td><img src='../imgs/rsicd_1.jpg' width="200" height="80"></td><td><img src='../imgs/rsicd_2.jpg' width="200" height="80"></td></tr></table>

<table><tr><td><img src='../imgs/rsicd_3.jpg' width="200" height="80"></td><td><img src='../imgs/rsicd_4.jpg' width="200" height="80"></td></tr></table>

<font size="2">Sources: [RSICD_optimal](https://github.com/201528014227051/RSICD_optimal)</font>

### RSICD Caption Example
<img src="../imgs/rsicd_caption.png" width="800">


<font size="2">Sources: Image adapted from Lu, X. et al. (2018) [2].</font>


### Exploratory Data Analysis 
- Train_Valid/Test = 80%/20%
- Train/Validation = 80%/20% (of the Train_Valid portion)
- Maximum caption length: 34 words
- Minimum caption length: 2 words
- Most common words:


<img src="../imgs/word_cloud1.png" width="500">
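
The two-stage split and the caption statistics above can be sketched in plain Python. This is a minimal illustration, not the project's code: the function names, the fixed random seed, the whitespace tokenization, and the toy captions are all assumptions.

```python
import random
from collections import Counter


def two_stage_split(items, test_frac=0.2, valid_frac=0.2, seed=0):
    """Hold out a test set first, then split the rest into train/validation."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    n_test = round(len(items) * test_frac)
    test, train_valid = items[:n_test], items[n_test:]
    n_valid = round(len(train_valid) * valid_frac)
    valid, train = train_valid[:n_valid], train_valid[n_valid:]
    return train, valid, test


def caption_stats(captions, top_k=3):
    """Return (min length, max length, most common words) for the captions."""
    tokens = [caption.lower().split() for caption in captions]
    lengths = [len(words) for words in tokens]
    counts = Counter(word for words in tokens for word in words)
    return min(lengths), max(lengths), counts.most_common(top_k)


train, valid, test = two_stage_split(range(100))
print(len(train), len(valid), len(test))  # 64 16 20

# Toy captions, made up for illustration.
toy_captions = ["a green field", "many green trees are in a forest"]
shortest, longest, common = caption_stats(toy_captions, top_k=2)
print(shortest, longest, common)
```

With 80%/20% applied twice, the final proportions are 64% train, 16% validation, and 20% test.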

## Data Science Techniques
### Baseline Model: CNN + RNN (LSTM)

<img src="../imgs/model_1.png" width="1200">

<font size="2">Sources: Image adapted from Lu, X. et al. (2018) [2].</font>

### Model II: CNN + Attention + LSTM

<img src="../imgs/model_2.png" width="1200">

<font size="2">Sources: Image adapted from Zhang, X. et al. (2019) [3].</font>
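
To give a feel for what the attention step adds between the CNN and the LSTM, here is a minimal sketch of generic soft attention in plain Python. It assumes dot-product scoring between image-region features and the decoder hidden state, which is a simplification chosen for illustration and not the attribute attention mechanism of Zhang et al. [3]; all names and numbers are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attend(region_features, hidden_state):
    """Weight each image region by its similarity to the decoder state."""
    # Assumption: dot-product scoring; other scoring functions are possible.
    scores = [
        sum(f * h for f, h in zip(region, hidden_state))
        for region in region_features
    ]
    weights = softmax(scores)
    dim = len(hidden_state)
    # The context vector is the attention-weighted average of the regions.
    context = [
        sum(w * region[d] for w, region in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


# Three toy region feature vectors and one toy decoder hidden state.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
hidden = [2.0, 0.0]
weights, context = attend(regions, hidden)
print(weights)
print(context)
```

At each decoding step the LSTM would receive a fresh context vector, so different words in the caption can draw on different parts of the image.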

### Model III: CNN + multi-level Attention + LSTM

<img src="../imgs/model_3.png" width="1000">

<font size="2">Sources: Image adapted from Li, Y. et al. (2020) [4].</font>

## Timeline and Evaluation
<img src="../imgs/timeline.png" width="900">

## References

<font size="4">1. Qu, B.; Li, X.; Tao, D.; Lu, X. Deep semantic understanding of high resolution remote sensing image. International Conference on Computer, Information and Telecommunication Systems 2016, 124–128.</font>

<font size="4">2. Lu, X.; Wang, B.; Zheng, X.; Li, X. Exploring models and data for remote sensing image caption generation. IEEE Trans. Geosci. Remote Sens. 2018, 56, 2183–2195.</font>

<font size="4">3. Zhang, X.; Wang, X.; Tang, X.; Zhou, H.; Li, C. Description Generation for Remote Sensing Images Using Attribute Attention Mechanism. Remote Sens. 2019, 11, 612.</font>

<font size="4">4. Li, Y.; Fang, S.; Jiao, L.; Liu, R.; Shang, R. A Multi-Level Attention Model for Remote Sensing Image Captions. Remote Sens. 2020, 12, 939.</font>


<img src="../imgs/thankyou.png" width="900">