# Classic vs remote sensing multispectral images
> A comprehensive comparison between classic RGB images and remote sensing multispectral images. 

- toc: true
- badges: true
- comments: true
- categories: images

# About
In this post, we will compare classic RGB images with multispectral remote sensing images in detail.

TODO: Write something in more detail

## Values
The [introduction to multi-channel images post](https://kai-tub.github.io/master-thesis-blog/images/2020/09/16/images-with-channels.html) explained in great detail how RGB values are stored on disk. In short, color images generally use three channels: red (R), green (G), and blue (B). Each channel has a predefined _range_. This range, the number of distinct tones a channel can take, is called the color-depth. In the deep-learning field, these RGB images usually have a color-depth of 8 bits, resulting in 256 (=2⁸) color tones per channel and ~16.7 million (=2⁸ x 2⁸ x 2⁸) different color values in total.
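The arithmetic behind these numbers can be checked with a few lines of Python. This is only a minimal sketch; the helper names `tones_per_channel` and `total_colors` are made up for this illustration.

```python
def tones_per_channel(bits: int) -> int:
    """Number of distinct values a single channel can take."""
    return 2 ** bits


def total_colors(bits: int, channels: int = 3) -> int:
    """Number of distinct colors when every channel has the same bit depth."""
    return tones_per_channel(bits) ** channels


print(tones_per_channel(8))  # 256
print(total_colors(8))       # 16777216
```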

In contrast to RGB images with three channels, multispectral remote sensing data often has a double-digit number of channels, usually called bands.
The exact number of bands varies heavily between measuring instruments, but for now we will focus on
images from the Sentinel-2 satellites with 13 bands (TODO: Add link).
The satellites sense electromagnetic waves ranging from the visible and near-infrared spectrum to the shortwave-infrared spectrum.
The main idea is that information outside of the visible spectrum helps to better differentiate between objects, even if they *look* the same in visible light. For multispectral images, the number of bits per channel is not called color-depth; instead, it is referred to as radiometric resolution.
The radiometric resolution of Sentinel-2 images is effectively 16 bits per band (the sensor records 12 bits, but the values are stored as 16-bit integers), resulting in much larger image files.


|                                      | Classic RGB images | Sentinel-2 multispectral images |
|--------------------------------------|:------------------:|:-------------------------------:|
| # Channels / Bands                   |          3         |                13               |
| Color-depth / Radiometric resolution |        8-bit       |              16-bit             |
| Size of 224px x 224px image          |       147 KiB      |            1,274 KiB            |
| Relative size to classic RGB image   |          1         |               8.67              |
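
The sizes in the table can be reproduced with a short calculation. The sketch below assumes uncompressed pixel data and that all 13 bands are stored on the same 224px x 224px grid; the helper name `raw_size_bytes` is made up for this illustration.

```python
def raw_size_bytes(width: int, height: int, channels: int, bits_per_channel: int) -> int:
    """Size of the uncompressed pixel data in bytes (no header, no compression)."""
    return width * height * channels * bits_per_channel // 8


rgb_bytes = raw_size_bytes(224, 224, channels=3, bits_per_channel=8)
s2_bytes = raw_size_bytes(224, 224, channels=13, bits_per_channel=16)

print(rgb_bytes / 1024)                 # 147.0 (KiB)
print(s2_bytes / 1024)                  # 1274.0 (KiB)
print(round(s2_bytes / rgb_bytes, 2))   # 8.67
```

Real files are usually smaller than this because formats such as JPEG compress the data.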

## Image content
Although it is important to know how the data is digitally represented, we mostly care about the images themselves and not the binary values. 
Most visual deep-learning applications use pretrained models tuned on the popular image dataset ImageNet {% cite Russakovsky2015 %}. ImageNet consists of about 1.3 million images of animals, vehicles, tools, and many other everyday objects. Most of them are photos in which the _interesting_ object lies in the center of the frame and dominates most of the image.

This is in stark contrast to satellite images.
Here, all regions are equally _important_ and contribute to the overall scene.
A popular alternative to ImageNet is the [Places](http://places2.csail.mit.edu/index.html) dataset {% cite Zhou2018 %}.
The Places dataset focuses on training deep-learning models on scenes instead of specific objects.
The main difference to remote sensing images is the spatial resolution. While each pixel in the Places dataset covers a region of some square centimeters, a pixel of a satellite image often covers tens to hundreds of square meters (a pixel of Sentinel-2's 10 m bands covers 100 m²)!
In other words, some remotely sensed regions may have very few pixels associated with them, even if they are reasonably large and of high interest.
As a result of this coarse spatial resolution, _smaller_ classes {% fn 1 %} such as airports, port areas, or burnt forests are severely underrepresented in datasets, while areas such as forests and water bodies dominate the data distribution.
Also, each image can contain various scenes/objects.
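
To get a feeling for how few pixels a region can end up with, here is a rough back-of-the-envelope sketch. It uses the ground sampling distances of the Sentinel-2 bands (10 m, 20 m, and 60 m) and a hypothetical field of one hectare (10,000 m²); the function names are made up for this illustration.

```python
def pixel_footprint_m2(gsd_m: float) -> float:
    """Ground area covered by one square pixel with the given ground sampling distance."""
    return gsd_m ** 2


def pixels_covering(area_m2: float, gsd_m: float) -> float:
    """Approximate number of pixels needed to cover an area (ignores region shape)."""
    return area_m2 / pixel_footprint_m2(gsd_m)


FIELD_AREA_M2 = 10_000  # hypothetical one-hectare field

for gsd in (10, 20, 60):
    footprint = pixel_footprint_m2(gsd)
    n_pixels = pixels_covering(FIELD_AREA_M2, gsd)
    print(f"{gsd} m band: {footprint} m² per pixel, ~{n_pixels:.1f} pixels")
```

At 60 m per pixel, the whole field shrinks to fewer than three pixels.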

Another interesting difference is the rotation invariance of such data.
Due to the bird's-eye view, there is no _top_ or _bottom_ in remote sensing images{% fn 2 %}.
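
One way to picture this property: every 90° rotation of a satellite patch is an equally plausible view of the same scene. The sketch below assumes NumPy is available and uses a small random array as a stand-in for a real 13-band patch; it is an illustration, not part of any particular data pipeline.

```python
import numpy as np

# Stand-in for a multispectral patch: height x width x bands, 16-bit values
rng = np.random.default_rng(seed=0)
patch = rng.integers(0, 2**16, size=(4, 4, 13), dtype=np.uint16)

# Rotate only the two spatial axes; the band axis stays untouched
rotations = [np.rot90(patch, k=k, axes=(0, 1)) for k in range(4)]

for k, rotated in enumerate(rotations):
    print(f"{k * 90}°: shape={rotated.shape}, dtype={rotated.dtype}")
```

Each rotated copy contains exactly the same band values, only at different spatial positions.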

|                                  |                  Classic RGB images                  |        Sentinel-2 multispectral images        |
|----------------------------------|:----------------------------------------------------:|:---------------------------------------------:|
| Spatial resolution of each pixel |                  from mm² up to cm²                  |                     ~XX m²                    |
| Class distribution               |                       balanced                       |               heavily imbalanced              |
| Rotation property                | Variant to rotation (there is a correct orientation) | Invariant (there is no "correct" orientation) |

## Visualizing
- Show how hard it is to create such images
- Ambiguity

We understand how the data is represented and what the values stand for.
But how do we work with these images?
As most images use the RGB representation by default, there is a plethora of libraries and tools we could use to
load, visualize and transform the images. 
As an example, we could use the `PIL` library we have seen in the previous introductory posts about [single](https://kai-tub.github.io/master-thesis-blog/images/2020/09/02/introduction-grayscale-images.html) and [multi-channel](https://kai-tub.github.io/master-thesis-blog/images/2020/09/16/images-with-channels.html) images.


```python
#collapse
from pathlib import Path

from PIL import Image

img_path = Path(".") / "2020-11-11" / "puppy.jpg"
opened_img = Image.open(img_path)
```

<figure>
        <div>
            <figure id="Fig1">
<img src="2020-11-11/puppy.jpg" alt="Image of a puppy">
            </figure>
        </div>
    <figcaption><center>Fig 1: Example RGB image (Image by <a href="https://pixabay.com/users/3194556-3194556/?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=1903313">Karen Warfel</a> from <a href="https://pixabay.com/?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=1903313">Pixabay</a>)</center></figcaption>
</figure>

```python
#collapse
def center_crop(img, crop_size=512):
    """Return the central crop_size x crop_size region of a PIL image."""
    width, height = img.size
    half_crop = crop_size // 2
    # PIL's coordinate system defines the pixel (0, 0)
    # as the upper-left corner of an image
    left = width // 2 - half_crop
    right = width // 2 + half_crop
    upper = height // 2 - half_crop
    lower = height // 2 + half_crop
    return img.crop((left, upper, right, lower))


img = Image.open(img_path)
# rotate() turns the image counter-clockwise by the given angle in degrees
cropped_rot_img = center_crop(img).rotate(90)
```

```python
#hide
cropped_rot_img.save(img_path.parent / "cropped_rot_puppy.jpg")
```

<figure>
        <div>
            <figure id="Fig2">
<img src="2020-11-11/cropped_rot_puppy.jpg" alt="Center cropped and rotated image of the previous puppy">
            </figure>
        </div>
    <figcaption><center>Fig 2: Example transformation on RGB image (Image by <a href="https://pixabay.com/users/3194556-3194556/?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=1903313">Karen Warfel</a> from <a href="https://pixabay.com/?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=1903313">Pixabay</a>)</center></figcaption>
</figure>


With more than three channels, only a few libraries are available. The main question we have to ask ourselves is: what do we want to do with these images?
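
One simple option is to treat the image as a plain NumPy array and pick three bands to build an RGB preview. The following is a minimal sketch under several assumptions: the bands are stacked in the order B01, B02, ..., B08, B8A, B09, ..., B12 (so red B04, green B03, and blue B02 sit at indices 3, 2, and 1), a random array stands in for real data, and the hypothetical helper `to_rgb_preview` uses a plain min-max stretch rather than any official scaling.

```python
import numpy as np

# Assumed band order: B01, B02, B03, B04, ... -> indices 0, 1, 2, 3, ...
RED, GREEN, BLUE = 3, 2, 1


def to_rgb_preview(cube: np.ndarray, band_indices=(RED, GREEN, BLUE)) -> np.ndarray:
    """Select three bands and stretch them to 8 bits for display."""
    rgb = cube[..., list(band_indices)].astype(np.float64)
    low, high = rgb.min(), rgb.max()
    if high == low:
        # A constant image carries no contrast; return black
        return np.zeros(rgb.shape, dtype=np.uint8)
    scaled = (rgb - low) / (high - low)  # now in [0, 1]
    return np.round(scaled * 255).astype(np.uint8)


# Stand-in for a real patch: height x width x 13 bands, 16-bit values
rng = np.random.default_rng(seed=0)
cube = rng.integers(0, 2**16, size=(8, 8, 13), dtype=np.uint16)

preview = to_rgb_preview(cube)
print(preview.shape, preview.dtype)  # (8, 8, 3) uint8
```

The resulting array has the layout that `PIL` expects for RGB data, so `Image.fromarray(preview)` would turn it back into an image we can display with the familiar tools.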

## References

{% bibliography --cited_in_order %}

{{ 'In the deep-learning field, classes refer to the objects/areas of interest. We hope to train the model in such a way that it can differentiate between them. We will go into more detail in the next post.' | fndetail: 1 }}

{{ 'This may seem obvious and irrelevant but it will play a bigger role in later posts, I promise 😉' | fndetail: 2 }}