This project presents a custom convolutional autoencoder used for unsupervised anomaly detection in optical remote sensing imagery. The autoencoder is trained exclusively on background-only maritime scenes, allowing it to learn the "normal" patterns of sea environments. Ships and other man-made objects are then detected as structural anomalies during inference based on reconstruction errors.
In maritime monitoring tasks, detecting the presence of potentially dangerous or unknown vessels is critical. However, acquiring labeled ship data can be expensive and incomplete. This project introduces an unsupervised strategy to infer the presence of ships using an autoencoder trained only on ship-free images. During inference, regions with high reconstruction error are flagged as anomalies.
- Dataset: MASATI (Maritime Satellite Imagery)
- Used a subset of background-only images extracted manually from the MASATI training set
- Image size: All input images were resized to 256×256 for training
- Encoder:
  - Conv2D (in=3, out=16, kernel=3, stride=2, padding=1) + ReLU
  - Conv2D (16 → 32, stride=2) + ReLU
  - Conv2D (32 → 64, stride=2) + ReLU
- Decoder:
  - ConvTranspose2D (64 → 32, stride=2) + ReLU
  - ConvTranspose2D (32 → 16, stride=2) + ReLU
  - ConvTranspose2D (16 → 3, stride=2) + Sigmoid

The output is the same size as the input (256×256), which makes it suitable for pixel-wise comparison.
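The architecture above can be sketched in PyTorch as follows. This is a minimal illustration, not the project's actual source. The class name `ConvAutoencoder` is hypothetical. The kernel size and padding of the layers that only list a stride are assumed to match the first layer (kernel=3, padding=1). `output_padding=1` in the decoder is also an assumption, chosen so that each transposed convolution exactly doubles the spatial size.

```python
import torch
from torch import nn


class ConvAutoencoder(nn.Module):
    """Three-layer convolutional autoencoder (hypothetical reconstruction)."""

    def __init__(self):
        super().__init__()
        # Each stride-2 convolution halves height and width: 256 -> 128 -> 64 -> 32.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # kernel/padding assumed
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # kernel/padding assumed
            nn.ReLU(),
        )
        # Each transposed convolution doubles height and width: 32 -> 64 -> 128 -> 256.
        # output_padding=1 is an assumption needed to recover the exact input size.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid(),  # keeps reconstructed pixel values in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```

With these assumed settings, a batch of shape `(N, 3, 256, 256)` is compressed to `(N, 64, 32, 32)` and reconstructed at the original resolution. Because of the final Sigmoid, input images should be scaled to the range [0, 1].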
- Framework: PyTorch 2.0 with CUDA acceleration (A100-SXM4-40GB)
- Input: 256×256 RGB image patches
- Loss: Mean Squared Error (MSE)
- Optimizer: Adam
- Learning rate: 0.001
- Weight decay: 1e-5
- Epochs: 30
- Final training loss: 0.008609
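The training settings listed above could be wired together roughly as in the sketch below. It is an illustration under assumptions, not the project's training script. The function name `train_autoencoder` is hypothetical, the loader is assumed to yield plain image tensors of shape `(N, 3, H, W)` scaled to [0, 1], and the batch size is left to the caller because the source does not specify one.

```python
import torch
from torch import nn


def train_autoencoder(model, loader, epochs=30, lr=1e-3, weight_decay=1e-5, device="cpu"):
    """Train a reconstruction model with MSE loss and Adam.

    Returns the mean per-image training loss for each epoch.
    """
    model = model.to(device)
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    history = []

    for _ in range(epochs):
        model.train()
        total_loss, total_images = 0.0, 0
        for batch in loader:  # assumed: each batch is a tensor of background-only images
            batch = batch.to(device)
            recon = model(batch)
            loss = criterion(recon, batch)  # the target is the input itself

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            total_loss += loss.item() * batch.size(0)
            total_images += batch.size(0)
        history.append(total_loss / total_images)

    return history
```

Passing `device="cuda"` would move the model and batches to a GPU when one is available.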
The autoencoder was trained to reconstruct background-only scenes. Ships, which were never seen during training, appear as anomalies in the reconstruction due to their deviation from the learned patterns.
After training, the autoencoder processes images that may or may not contain ships. Two different anomaly map strategies were tested:
This method computes the per-pixel L1 norm of the difference between the input and the reconstructed image, averaged over the three color channels:
Error(i, j) = (|R_input - R_recon| + |G_input - G_recon| + |B_input - B_recon|) / 3
- Very sensitive to small differences
- Tends to highlight both real anomalies (ships) and natural variations (coastline textures)
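The absolute error formula above can be written as a short NumPy function. This is a sketch: the function name `absolute_rgb_error` and the channel-last `(H, W, 3)` layout are assumptions made for illustration.

```python
import numpy as np


def absolute_rgb_error(image, recon):
    """Per-pixel mean absolute difference over the R, G, B channels.

    image, recon: arrays of shape (H, W, 3) with values in [0, 1] (assumed layout).
    Returns an (H, W) anomaly map with values in [0, 1].
    """
    image = np.asarray(image, dtype=np.float64)
    recon = np.asarray(recon, dtype=np.float64)
    return np.abs(image - recon).mean(axis=-1)
```

Because every channel difference contributes a non-negative term, any mismatch raises the map value, which is consistent with the sensitivity noted above.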
This method computes the signed average of RGB differences:
Error(i, j) = (R_input - R_recon + G_input - G_recon + B_input - B_recon) / 3
- More balanced
- Better at suppressing minor variations and highlighting true structural anomalies (e.g., ships in open sea)
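The signed variant differs from the absolute one only in dropping the absolute value. The sketch below uses a hypothetical function name, `signed_rgb_error`, and assumes the same channel-last `(H, W, 3)` layout.

```python
import numpy as np


def signed_rgb_error(image, recon):
    """Per-pixel mean of the signed R, G, B differences (input minus reconstruction).

    image, recon: arrays of shape (H, W, 3) with values in [0, 1] (assumed layout).
    Returns an (H, W) map with values in [-1, 1].
    """
    image = np.asarray(image, dtype=np.float64)
    recon = np.asarray(recon, dtype=np.float64)
    return (image - recon).mean(axis=-1)
```

One consequence of this formula is that channel differences of opposite sign cancel each other, and the map is negative wherever the reconstruction is brighter than the input on average. How such negative values are handled when displaying or thresholding the map is not specified in this document.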
- Both methods successfully identified ship regions as anomalies
- The absolute RGB method often produced many false positives (shorelines falsely detected as anomalies)
- The signed RGB method provided more reliable and localized anomaly maps
- The model performed well on open-sea images with ships
- False positives increased near ports or coastlines due to the presence of buildings, piers, or vehicles
- Not a conventional object detector: it does not output bounding boxes or classifications
- May confuse complex man-made shoreline features with ships
- Attempts to use deeper encoders (e.g., VGG19) did not yield meaningful improvement under the available hardware constraints
- Integrate anomaly detection maps with object detectors
- Use more diverse background training data including ports
- Re-implement with deeper architectures using more powerful GPUs
- MASATI Dataset: https://www.iuii.ua.es/datasets/masati/
This project is for academic and research purposes only.