# Introduction to deep learning

- Some useful resources
  - [Earth Data Science Courses](https://www.earthdatascience.org/courses/)

- Available satellite sensors for earth observation
    - Landsat (free, multi-spectral)
    - Pléiades Neo (commercial, high-res)
    - MODIS (free, multi-spec, continuous observation)
    - Sentinel-1 (C-band radar)
    - NASA SMAP (L-band)
    - NASA CYGNSS (sensing sea-surface wind speed in tropical cyclones)
        - Application in surface soil moisture

- Supervised learning
    - Classification
    - Regression

- Gradient boosting
    - A fast ML method that gives satisfactory results in most cases
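
To make the idea concrete, here is a minimal sketch of gradient boosting for 1-D regression with squared loss, using depth-1 trees (stumps) as weak learners. The function names (`fit_stump`, `gradient_boost`, `predict`) and the toy data are illustrative assumptions and are not part of the course material; real projects would normally use an established library instead.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals (squared error)."""
    best = None
    # Exclude the largest x so that the right-hand side is never empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return t, lm, rm


def gradient_boost(xs, ys, n_rounds=50, lr=0.1):
    """Fit stumps one after another, each on the residuals of the current ensemble."""
    base = sum(ys) / len(ys)          # start from the mean prediction
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))
        preds = [p + lr * (lm if x <= t else rm) for x, p in zip(xs, preds)]
    return base, stumps, lr


def predict(model, x):
    base, stumps, lr = model
    return base + sum(lr * (lm if x <= t else rm) for t, lm, rm in stumps)


# Toy step-shaped data (assumed for illustration)
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
model = gradient_boost(xs, ys)
```

Each round corrects only a fraction (`lr`) of the remaining error, which is why many small trees are combined rather than one large one.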

- Unsupervised learning
    - Clustering
    - Generative models
    - Inverse problems

- Difference between ML and DL
    - Feature extraction and classification/regression are combined in DL
    - Users do not need to perform feature extraction, which is a required step in classical ML

- State of the art
    - CNNs
    - RNN/LSTMs
    - Transformers

- Applications in remote sensing
    - Scene classification using aerial photos
        - AID: a benchmark dataset for performance evaluations of aerial scene classification
        - Can combine spectral and non-spectral data to improve classification results
        - [BigEarthNet](https://bigearth.net/)
            - Image patches with multiple labels
                - 43 imbalanced labels
            - A library to test a classification model
    - Object detection
        - Horizontal object detection and rotated object detection
        - [DIOR dataset](https://universe.roboflow.com/new-workspace-ghppr/dior-dataset-riv6b)
    - Cloud detection
    - Object recognition in an earth observation application context
        - Active fire, solar PV parks, floating plastic litter, etc.
    - Image segmentation (deal with pixels)
        - Assign every pixel a value
        - Land cover classification
    - Change detection
        - Challenges
            - Atmospheric conditions
            - Resolution
            - Modality
            - Lack of ground truth
    - Image denoising
    - Image super-resolution
    - Image compression
    - Image fusion
    - Forecasting for images

- Data source for machine learning in remote sensing
    - In-situ
    - Remote sensing images
    - Process models
- Data assimilation
    - Fusion of different data sources to estimate possible states of a system
    - Sparse observations
    - Measurement error
    - Indirect sensing
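
As a rough illustration of fusing two data sources, the sketch below combines a process-model forecast with a noisy observation, weighting each by its uncertainty. This is the scalar form of a Kalman-style update, chosen here as an assumed example; the function name `assimilate` and the soil-moisture numbers are hypothetical.

```python
def assimilate(forecast, forecast_var, obs, obs_var):
    """Blend a model forecast with an observation, trusting the less uncertain one more."""
    gain = forecast_var / (forecast_var + obs_var)   # scalar Kalman gain, between 0 and 1
    analysis = forecast + gain * (obs - forecast)    # updated state estimate
    analysis_var = (1.0 - gain) * forecast_var       # uncertainty shrinks after the update
    return analysis, analysis_var


# Hypothetical soil-moisture values (volumetric fraction)
analysis, analysis_var = assimilate(
    forecast=0.30, forecast_var=0.04,   # process model: uncertain
    obs=0.20, obs_var=0.01,             # measurement: more precise
)
```

Because the observation has the smaller variance in this example, the fused estimate lands closer to the observation than to the forecast.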

- Supervised learning
    - Exploit prior knowledge
        - Experts, crowdsourcing
        - Other instruments
        - Treat the problem as a forecasting problem

- Neural networks
    - Multiple layers in the neural network model: deep learning models
    - How does a deep neural network work conceptually?
        - Input as pixels, then identify edges, combinations of edges, features, and combinations of features
        - Activation functions link different layers
        - Apply weights to new samples
- Training DNN
    - Get batch of data
    - Forward thru the network and compute loss (e.g. MSE, categorical cross entropy)
        - Goal: reduce the error computed at the end of each estimation (quantified by a loss function)
    - Gradient descent
        - Reduce the loss
    - Backpropagation
        - Handled automatically by the framework, but two questions remain for the user
            - Which is the best loss function for our problem?
            - Which value to use for the learning rate?
        - Backpropagation aims to minimize the cost function by adjusting network’s weights and biases ([ref](https://towardsdatascience.com/understanding-backpropagation-algorithm-7bb3aa2f95fd))
    - Optimization algorithms
- Training characteristics
    - Under-fitting
    - Over-fitting
    - Test-error curves over training steps differ between under-fitting and over-fitting
        - With over-fitting, the test error increases significantly as training continues
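
The training steps listed above (get a batch, forward pass, loss, gradients, update) can be sketched for the simplest possible model, a single linear unit trained with MSE. This is an assumed toy problem, not the workshop's code; the gradients are written out by hand to show what backpropagation computes, whereas a deep learning framework would derive them automatically.

```python
import random

random.seed(0)

# Toy data from y = 2x + 1 plus small noise (an assumed example problem)
data = [(x / 10.0, 2.0 * (x / 10.0) + 1.0 + random.gauss(0.0, 0.05))
        for x in range(-20, 21)]

w, b = 0.0, 0.0        # trainable weight and bias
learning_rate = 0.1    # hyperparameter chosen by the user
batch_size = 8


def mse(w, b, batch):
    """Mean squared error of the linear model on a batch."""
    return sum((w * x + b - y) ** 2 for x, y in batch) / len(batch)


initial_loss = mse(w, b, data)

for epoch in range(200):
    random.shuffle(data)
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]                    # 1. get a batch of data
        errors = [(w * x + b - y, x) for x, y in batch]   # 2. forward pass: prediction error
        # 3. gradients of the MSE loss with respect to w and b
        grad_w = sum(2.0 * e * x for e, x in errors) / len(batch)
        grad_b = sum(2.0 * e for e, _ in errors) / len(batch)
        # 4. gradient descent: step against the gradient to reduce the loss
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

final_loss = mse(w, b, data)
```

After training, `w` and `b` should be close to the values used to generate the data, and `final_loss` should be far below `initial_loss`.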

# Discriminative models

- CNN
    - Intro
        - Convolution layer, sub-sampling layer, repeat, and fully connected MLP
        - Thru convolution and sub-sampling process, we generate feature maps
        - The number of feature maps typically increases with network depth
        - Convolution: a small filter slides across the entire image, processing one small patch at a time
    - Convolutional layers
        - Input: multi-channel 2D image
        - Apply: k filters
        - Get: k feature maps
        - Characteristics
            - Hierarchical features
            - Location invariance
        - Parameters
            - Number of filters
            - Filter size
            - Stride
            - Padding
    - Activation layer
        - Introduction of non-linearity
            - Thresholding: sigmoid
            - ReLU: gradient does not saturate for positive inputs
    - Subsampling (pooling) layers
        - A downscaling process
        - For example: max pooling, average pooling
    - Fully connected layers
        - Eventually, the feature maps are flattened into a single feature vector, which is connected to the output classes (classification problem)
        - Typically at the end
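
The convolution, activation, and pooling stages described above can be sketched in plain Python for a single-channel image and a single filter. This is a minimal illustration under assumed inputs (a 5x5 image with one vertical edge and a hand-picked 2x2 edge filter); in a real CNN the filter values are learned and the operations run on multi-channel tensors.

```python
def conv2d(image, kernel, stride=1):
    """Slide the kernel over the image (no padding) and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [[sum(image[i * stride + a][j * stride + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap):
    """Activation layer: keep positive responses, zero out the rest."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Subsampling layer: keep the maximum of each non-overlapping size x size block."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


# 5x5 image: dark on the left, bright on the right (vertical edge after column 1)
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
# Hand-picked filter that responds to dark-to-bright transitions
kernel = [[-1.0, 1.0],
          [-1.0, 1.0]]

feature_map = relu(conv2d(image, kernel))   # 4x4 map, non-zero only at the edge
pooled = max_pool(feature_map)              # 2x2 map after downscaling
```

The same filter is applied at every position, which is why the edge is detected wherever it appears in the image (location invariance).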

- Recurrent neural network
    - The hidden state from the previous step is fed back in during forward propagation
    - Deal with sequential data
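
A single-unit recurrent cell is enough to show how a hidden state carries information along a sequence. The weights below (`w_x`, `w_h`, `b`) are fixed, assumed values for illustration; in a trained RNN they would be learned, and the state would be a vector rather than a scalar.

```python
import math


def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a single-unit recurrent cell over a sequence and return every hidden state."""
    h = 0.0
    states = []
    for x in inputs:
        # The new state depends on the current input and on the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# Only the first time step has a non-zero input
states = rnn_forward([1.0, 0.0, 0.0])
```

Although the second and third inputs are zero, the corresponding hidden states stay positive: the cell still "remembers" the first input, with a fading effect.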

- LSTMs
    - ConvLSTM
- Vision Transformer (ViT)

- Dropout
    - Zeroing out some neurons in the connection
        - Dropout rate
    - Only applies during training
- Batch normalization
- Transfer learning
    - Use model trained on a similar task to solve another problem
    - Useful when data is limited for training
- Knowledge distillation
    - Teacher model knowledge transfers to student model
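
The dropout behaviour noted above (random zeroing during training, no effect at inference) can be sketched as follows. This uses the common "inverted dropout" formulation, in which the kept activations are rescaled during training; that choice, the function name, and the parameters are assumptions made for illustration.

```python
import random


def dropout(activations, rate, training, rng=random):
    """Inverted dropout: zero each activation with probability `rate` (requires rate < 1)."""
    if not training or rate == 0.0:
        return list(activations)          # inference: pass everything through unchanged
    keep = 1.0 - rate
    # Rescale the kept values so the expected activation stays the same.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)                    # seeded generator for a repeatable demo
acts = [1.0] * 10000
train_out = dropout(acts, rate=0.5, training=True, rng=rng)
eval_out = dropout(acts, rate=0.5, training=False)
```

With a dropout rate of 0.5, roughly half of `train_out` is zero and the rest is scaled to 2.0, while `eval_out` is identical to the input.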

- New challenges to work on after image classification problem has mostly been solved
    - Multi-class vs. Multi-label
        - Generate multi-label results for a given image
        - Can use CNN to independently determine if the label applies or not
            - SegNet, U-Net
    - Image segmentation
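
Deciding independently whether each label applies is usually done by passing every output through its own sigmoid and thresholding it, instead of using one softmax over all classes. The sketch below assumes a hypothetical label set and made-up network outputs (logits).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def multi_label_predict(logits, labels, threshold=0.5):
    """Each label gets its own yes/no decision, so several labels can apply at once."""
    return [lab for z, lab in zip(logits, labels) if sigmoid(z) >= threshold]


labels = ["forest", "water", "urban", "cropland"]   # hypothetical label set
logits = [2.1, 0.3, -1.7, -0.2]                     # hypothetical network outputs
predicted = multi_label_predict(logits, labels)
```

Here two labels pass the threshold for the same image, which a single-label (multi-class) classifier could not express.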

- Application in flood mapping
    - Flood mapping using deep learning models
        - [OMBRIA dataset](https://github.com/geodrak/OMBRIA) for flood mapping
            - [IEEE paper](https://ieeexplore.ieee.org/document/9723593)
    - Modification of U-Net models to meet different needs
        - Pre and post event
        - Multiple satellite image sources

- Application in object detection
    - Fast R-CNN

## Hands-on working session

- Refer to this [GitHub repository](https://github.com/gtsagkatakis/GRSS2023_Classification_Denoising_tutorial)

# Generative models and inverse problems

## Inverse problems

- Inverse problems: super-resolution (SR)
    - Metrics to measure model performance
        - MSE
        - PSNR
        - SSIM
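
For reference, MSE and PSNR can be computed as in the sketch below, assuming 8-bit images (peak value 255) stored as nested lists. SSIM is omitted because it involves local windowed statistics and is better taken from an image-processing library.

```python
import math


def mse(reference, estimate):
    """Mean squared error over all pixels of two equally sized images."""
    diffs = [(r - e) ** 2
             for row_r, row_e in zip(reference, estimate)
             for r, e in zip(row_r, row_e)]
    return sum(diffs) / len(diffs)


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    err = mse(reference, estimate)
    if err == 0:
        return math.inf                   # identical images
    return 10.0 * math.log10(max_value ** 2 / err)


# Tiny made-up 2x2 example: every pixel is off by one grey level
reference = [[100, 110], [120, 130]]
estimate = [[101, 109], [121, 129]]
quality_db = psnr(reference, estimate)
```

Since PSNR is logarithmic in the MSE, halving the error raises it by only about 3 dB.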

- Super Resolution CNN (SRCNN)
    - No max-pooling
    - Patch extraction, nonlinear mapping, reconstruction
    - An interpolation process

- Difficult to increase the PSNR even using advanced methods
    - Different from classification problems

- Multi-frame super resolution
    - Fusing a short sequence of low-quality images into a higher-quality one
    - Example: producing high-quality images for ESA's Proba-V mission
        - 300 m daily, 100 m 5-day
- Spatial and spectral channel attention network
    - Modification based on U-Net

## Generative models

### AutoEncoders (AE)
- Unsupervised feature learning
- Network is trained to output the input
- Compared with PCA, a deep autoencoder shows better performance
- Variational autoencoder
- Application: onboard change detection
    - Challenge: traditional satellite data transmission takes too much time. What if we can do the analysis on-board?
    - Use an AE to reconstruct the image so that it closely matches the original. If an event is missing from the reconstruction, a change has occurred, and a change mask is generated
    - Benefit: does not need the labeling process, an unsupervised process
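
The change-mask step can be sketched as a per-pixel comparison between the observed image and its autoencoder reconstruction. The reconstruction below is a hard-coded, hypothetical stand-in for the output of a trained autoencoder, and the threshold value is an assumption that would need tuning in practice.

```python
def change_mask(image, reconstruction, threshold):
    """Mark pixels whose reconstruction error exceeds the threshold as changed (1)."""
    return [[1 if abs(p - q) > threshold else 0
             for p, q in zip(row_img, row_rec)]
            for row_img, row_rec in zip(image, reconstruction)]


# Observed image with one unusually bright pixel in the centre (the "event")
image = [[0.2, 0.2, 0.2],
         [0.2, 0.9, 0.2],
         [0.2, 0.2, 0.2]]
# Hypothetical autoencoder output: it reproduces the background but not the event
reconstruction = [[0.21, 0.19, 0.20],
                  [0.20, 0.25, 0.20],
                  [0.20, 0.20, 0.21]]

mask = change_mask(image, reconstruction, threshold=0.3)
```

Only the centre pixel is flagged, and no labelled change examples were needed to produce the mask.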

- Forecasting

### Generative adversarial networks (GAN)
- Two neural networks
    - Generator
    - Discriminator
- Generator network
    - Generates a fake image
- Discriminator network
    - Predicts labels (fake or real)
- No need to provide ground truth
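
The opposing objectives of the two networks can be written as binary cross-entropy losses on the discriminator's output probability. The sketch below uses the common non-saturating generator loss, which is an assumed formulation; the function names are illustrative, and the networks themselves are left out.

```python
import math


def bce(prob, target):
    """Binary cross-entropy for one predicted probability and a 0/1 target."""
    eps = 1e-12                            # avoids log(0)
    return -(target * math.log(prob + eps)
             + (1.0 - target) * math.log(1.0 - prob + eps))


def discriminator_loss(d_real, d_fake):
    """The discriminator should output 1 for real images and 0 for generated ones."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


def generator_loss(d_fake):
    """The generator is rewarded when the discriminator takes its output for real."""
    return bce(d_fake, 1.0)


# d_real / d_fake: discriminator's "probability of real" for a real and a generated image
confident_d = discriminator_loss(d_real=0.9, d_fake=0.1)   # discriminator doing well
fooled_d = discriminator_loss(d_real=0.5, d_fake=0.5)      # discriminator cannot tell
```

The "real" and "fake" targets come for free from where each image originated, which is why no manually labelled ground truth is required.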

- GANs for super resolution
    - Example: use SAR image as input and generate multi-spectral image to avoid cloud coverage
- GANs for cloud removal
    - Spatiotemporal generative network

### Self-supervision

- Self-supervised learning
    - Predict (past or future)
    - Pretend to have some information and generate new data based on this information
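
Both ideas above amount to building training targets from the data itself. The sketch below shows two assumed ways of doing this for an unlabelled time series: predicting the next value from a window, and hiding one value so that it becomes the target. The helper names and the NDVI numbers are hypothetical.

```python
def make_forecast_pairs(series, window):
    """Turn an unlabelled sequence into (input window, next value) training pairs."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]


def make_masked_pairs(series, mask_value=None):
    """Hide one element at a time; the hidden element becomes the training target."""
    return [(series[:i] + [mask_value] + series[i + 1:], series[i])
            for i in range(len(series))]


ndvi = [0.31, 0.35, 0.42, 0.50, 0.47]   # hypothetical vegetation-index time series
pairs = make_forecast_pairs(ndvi, window=3)
masked = make_masked_pairs(ndvi)
```

Either set of pairs can then be fed to an ordinary supervised training loop, even though no human-provided labels were involved.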

## Hands-on demo

- Refer to this [Google Colab notebook](https://github.com/gtsagkatakis/GRSS2023_Classification_Denoising_tutorial/blob/main/Tutorial2023_Denoising_UCMerced.ipynb)

# Recent developments

- Reinforcement learning

- ClimaX
    - Earth system and weather model
    - [Blog link](https://www.microsoft.com/en-us/research/group/autonomous-systems-group-robotics/articles/introducing-climax-the-first-foundation-model-for-weather-and-climate/)
    - [Project link](https://microsoft.github.io/ClimaX/)