# Leveraging Big Multitemporal Multisource Satellite Data and Artificial Intelligence for the Detection of Complex and Invisible Features - the Case of Extensive Irrigation Mapping


Nazarij Buławka 1*, Hector A. Orengo 2, 3, Felipe Lumbreras Ruiz 4 , Iban Berganzo-Besga 3 and Ekta Gupta 5

1.	Non-Invasive and Digital Archeology Laboratory (PANiC), Faculty of Archaeology, University of Warsaw
2.	Catalan Institution for Research and Advanced Studies (ICREA)
3.	Computational Social Sciences and Humanities Department, Barcelona Supercomputing Center
4.	Department of Computer Science, Universidad Autónoma de Barcelona,
5.	Landscape Archaeology Research Group (GIAP), Catalan Institute of Classical Archaeology


## Levee detection demo

This notebook demonstrates the Swin UNETR model for levee detection. The original model runs in a Docker environment, which is published in the GitHub repository.


The demo consists of six steps:

1. Setup
2. Download the data and the model
3. Predict the levees 
4. Apply post-processing
5. Evaluate the results
6. Visualise the results


## Citation

Buławka, Nazarij, Hector A. Orengo, Felipe Lumbreras Ruiz, Iban Berganzo-Besga and Ekta Gupta. n.d. ‘Leveraging Big Multitemporal Multisource Satellite Data and Artificial Intelligence for the Detection of Complex and Invisible Features - the Case of Extensive Irrigation Mapping’.


## 1. Setup

### Clone the repository

In [1]:
#!git clone https://github.com/nazarb/2025_levees_DL.git

### Set up path for further processing

In [16]:
import os
basepath = os.getcwd()
print(basepath)
# set other paths for quick navigation
git_path = os.path.join(basepath, "2025_levees_DL")
print(git_path)
Swin_UNETR_path = os.path.join(git_path, "Swin_UNETR")
print(Swin_UNETR_path)
model_path = os.path.join(Swin_UNETR_path, "model/Swin_UNETR/Aug")
results_path = os.path.join(Swin_UNETR_path, "results/Swin_UNETR/Aug")
os.makedirs(results_path, exist_ok=True)
print(results_path)


/Workspace/levees_DL
/Workspace/levees_DL/2025_levees_DL
/Workspace/levees_DL/2025_levees_DL/Swin_UNETR
/Workspace/levees_DL/2025_levees_DL/Swin_UNETR/results/Swin_UNETR/Aug


### Install libraries

In [17]:
#!pip install --no-cache-dir monai==1.3.2


In [18]:
#!sudo apt install megatools
#!pip install rasterio


In [19]:

import warnings
warnings.filterwarnings("ignore")
import torch
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.cuda.is_available()
torch.cuda.device_count()
torch.cuda.current_device()
torch.cuda.get_device_name(0)

'NVIDIA L40S'

## 2. Download the data and the model

The raster data used in this study were calculated with the provided Google Earth Engine code. The model is published in the Dane Badawcze UW repository (published version: 2025-09-24):

```
Buławka, Nazarij; Orengo, Hector A.; Lumbreras Ruiz, Felipe; Berganzo-Besga, Iban; Gupta, Ekta, 2025, "Traces of ancient irrigation in central Iraq detected using deep learning model", https://doi.org/10.58132/MY8CCL, Dane Badawcze UW, V1
```


In [20]:
import os
import shutil
import requests

# Download the raster calculated with the provided Google Earth Engine code


# MEGA file URL
mega_url = "https://mega.nz/file/E9gBCCjR#UCDklOgOVOwQ0hRAIy5pvAdX9Y-VLBQktDlsrj4R1Ms"
sector = "CFE_a"
file_to_detect = "CFE_a_selected_L5_S2_S1_MSRM_PCA_GLO_N48.tif"

# Temporary download path (megadl saves the file with original name by default)
download_dir = basepath
downloaded_file = os.path.join(download_dir, file_to_detect)

# Target directory
target_dir1 = f'{Swin_UNETR_path}/data'
os.makedirs(target_dir1, exist_ok=True)

# Download file using megadl
!megadl "{mega_url}" --path "{download_dir}"


# Move the file to the target directory (rename if needed)
shutil.move(downloaded_file, os.path.join(target_dir1, file_to_detect))
print(f"File moved to: {target_dir1}")




/usr/bin/sh: 1: megadl: not found


FileNotFoundError: [Errno 2] No such file or directory: '/Workspace/levees_DL/CFE_a_selected_L5_S2_S1_MSRM_PCA_GLO_N48.tif'
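
Both messages above (`megadl: not found` and the following `FileNotFoundError`) trace back to the missing `megatools` package, whose installation line in the setup cell is commented out. A minimal sketch of a guard that checks for the command before any download is attempted; `require_tool` and `tool_available` are hypothetical helpers, not part of the repository:

```python
import shutil


def tool_available(name: str) -> bool:
    """Return True if the command-line tool can be found on PATH."""
    return shutil.which(name) is not None


def require_tool(name: str) -> str:
    """Return the full path of a command-line tool, or raise a clear error."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            f"'{name}' was not found on PATH; install it first "
            "(for megadl: sudo apt install megatools)"
        )
    return path
```

Calling `require_tool("megadl")` at the top of the download cell would stop execution with an explicit message instead of failing later at `shutil.move`.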

In [None]:
# Download the model from the Dane Badawcze UW repository (Published version: 2025-09-24):
## Buławka, Nazarij; Orengo, Hector A.; Lumbreras Ruiz, Felipe; Berganzo-Besga, Iban; Gupta, Ekta, 2025, "Traces of ancient irrigation in central Iraq detected using deep learning model", https://doi.org/10.58132/MY8CCL, Dane Badawcze UW, V1

## URL of the file
url = "https://danebadawcze.uw.edu.pl/api/access/datafile/17759"

## Download location
download_path = os.path.join(basepath, "Levees_SWINUNETR_48_best_dataset_F.pth")

## Target directory
target_dir2 = model_path

os.makedirs(target_dir2, exist_ok=True)  # create dir if it doesn't exist

## Download the file
print("Downloading...")
response = requests.get(url, stream=True)
response.raise_for_status()  # stop on HTTP errors instead of saving an error page
with open(download_path, "wb") as f:
    shutil.copyfileobj(response.raw, f)
print("Download complete.")

## Move to target directory
final_path = os.path.join(target_dir2, "Levees_SWINUNETR_48_best_dataset_F.pth")
shutil.move(download_path, final_path)
print(f"File moved to: {final_path}")
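
Before loading the checkpoint, it can be worth confirming that the file arrived intact. A minimal sketch that computes a SHA-256 digest in chunks; `sha256_of_file` is a hypothetical helper, and the expected digest would have to be obtained separately, since none is given in this notebook:

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file without loading it fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until an empty bytes object signals end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `print(sha256_of_file(final_path))` prints the digest of the downloaded model file so it can be compared with a known value.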



## 3. Predict the levees 
Area: southeast of Babylon


In [21]:
# Check the paths for further processing

model_path
# Swin_UNETR_pth = os.path.join(target_dir2, "Levees_SWINUNETR_48_best.pth")
Swin_UNETR_pth = os.path.join(model_path, "Levees_SWINUNETR_48_best_dataset_F.pth")

print(Swin_UNETR_pth)
#data_path =  os.path.join(target_dir1, "CFE_a_selected_L5_S2_S1_MSRM_PCA_GLO_N48.tif")
data_path =  os.path.join(basepath, "Rasters", "CFE_a_selected_L5_S2_S1_MSRM_PCA_N48_.tif")
print(data_path)


/Workspace/levees_DL/2025_levees_DL/Swin_UNETR/model/Swin_UNETR/Aug/Levees_SWINUNETR_48_best_dataset_F.pth
/Workspace/levees_DL/Rasters/CFE_a_selected_L5_S2_S1_MSRM_PCA_N48_.tif


In [22]:
#import monai
#from monai.inferers import sliding_window_inference
from monai.networks.nets import SwinUNETR
#import einops
#import warnings
#import json
#import numpy as np
import torch
#from torch.utils.data import Dataset, DataLoader
import os
#
#from rasterio.windows import Window
#import tifffile as tiff
from utils import get_filename_without_extension
from utils import predict_and_save
from utils import convert_tiles_to_npy
from utils import save_full_raster
from utils.datasets import NumpyDataset
from utils import load_large_image
from utils import prepare_test_loader, split_to_tiles, create_tfw_file


filename_to_detect = get_filename_without_extension(data_path)


def initialize_model():
    """Initialize the SwinUNETR model."""
    model = SwinUNETR(
        img_size=(96, 96),
        in_channels=48,
        out_channels=1,  # single-channel output
        use_checkpoint=True,
        feature_size=48,
        depths=(3, 9, 18, 3),
        num_heads=(4, 8, 16, 32),
        drop_rate=0.1,  # Added dropout
        attn_drop_rate=0.1,
        dropout_path_rate=0.2,
        spatial_dims=2
    )
    return model


def load_adapted_model(model, checkpoint_path):
    checkpoint = torch.load(checkpoint_path, map_location='cpu')
    model.load_state_dict(checkpoint, strict=False)
    return model




if __name__ == "__main__":
    image_path = data_path

    tile_size = 96
    batch_size = 12
    model = initialize_model()

    print("Loading and preprocessing the large image")
    large_image, transform = load_large_image(image_path)
    large_image = NumpyDataset.replace_nans_in_array(large_image)
    image_shape = large_image.shape
    print(f"Large image shape: {image_shape}")

    temp_dir = f'{Swin_UNETR_path}/results/temp/'
    npy_dir = f'{Swin_UNETR_path}/results/temp_npy/'
    pred_dir = f'{Swin_UNETR_path}/results/temp_pred/'
    result_path = f'{Swin_UNETR_path}/results/Swin_UNETR/Aug/{filename_to_detect}_swinunetr.tif'
    tfw_path = f'{Swin_UNETR_path}/results/Swin_UNETR/Aug/{filename_to_detect}_swinunetr.tfw'

    if not os.path.exists(temp_dir):
        os.makedirs(temp_dir)

    tile_paths = split_to_tiles(large_image, tile_size, temp_dir)
    npy_paths = convert_tiles_to_npy(tile_paths, npy_dir)
    test_loader = prepare_test_loader(npy_paths, batch_size)

    adapted_model_path = Swin_UNETR_pth
    model = load_adapted_model(model, adapted_model_path)

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    predictions = predict_and_save(model, test_loader, device, tile_paths, save_dir=pred_dir)

    # Adjust the image_shape to reflect the single-channel output
    single_channel_image_shape = (1, image_shape[1], image_shape[2])
    save_full_raster(predictions, single_channel_image_shape, tile_size, result_path)

    # Create the .tfw file using the transform from the original large image
    create_tfw_file(transform, tfw_path)

    print("Prediction, saving, and .tfw file creation complete")



Loading and preprocessing the large image
Loading large image from /Workspace/levees_DL/Rasters/CFE_a_selected_L5_S2_S1_MSRM_PCA_N48_.tif
Large image shape: (48, 2992, 1745)
Splitting image into tiles of size 96x96
Saving tile to /Workspace/levees_DL/2025_levees_DL/Swin_UNETR/results/temp/tile_0_0.tif
Saving tile to /Workspace/levees_DL/2025_levees_DL/Swin_UNETR/results/temp/tile_0_96.tif
Saving tile to /Workspace/levees_DL/2025_levees_DL/Swin_UNETR/results/temp/tile_0_192.tif
Saving tile to /Workspace/levees_DL/2025_levees_DL/Swin_UNETR/results/temp/tile_0_288.tif
Saving tile to /Workspace/levees_DL/2025_levees_DL/Swin_UNETR/results/temp/tile_0_384.tif
Saving tile to /Workspace/levees_DL/2025_levees_DL/Swin_UNETR/results/temp/tile_0_480.tif
Saving tile to /Workspace/levees_DL/2025_levees_DL/Swin_UNETR/results/temp/tile_0_576.tif
Saving tile to /Workspace/levees_DL/2025_levees_DL/Swin_UNETR/results/temp/tile_0_672.tif
Saving tile to /Workspace/levees_DL/2025_levees_DL/Swin_UNETR/result
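
The log above shows tiles named by their pixel offsets (`tile_0_0`, `tile_0_96`, and so on). A minimal sketch of how such a regular grid of tile origins can be enumerated for the 2992 x 1745 image with 96-pixel tiles; `tile_origins` is a hypothetical helper, and it assumes that partial tiles at the bottom and right edges are kept (for example by padding), which may differ from what `split_to_tiles` in the repository does:

```python
def tile_origins(height, width, tile_size):
    """List (row_offset, col_offset) for every tile in a regular grid.

    Assumption: edge tiles that extend past the image are still included.
    """
    return [
        (row, col)
        for row in range(0, height, tile_size)
        for col in range(0, width, tile_size)
    ]


origins = tile_origins(2992, 1745, 96)
print(origins[:3])    # [(0, 0), (0, 96), (0, 192)]
print(len(origins))   # 608, i.e. 32 tile rows x 19 tile columns
```

Under that assumption the image yields 608 tiles, which at `batch_size = 12` corresponds to 51 batches.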

## 4. Apply post-processing

In [23]:
import numpy as np
import cv2
import os
from skimage import io
from utils import create_tfw_file
from utils import load_large_binary_image

## Load the image for post-processing: reuse result_path from the previous step or enter a path manually
# raster_r = input("Enter path to raster data: ").strip('"')  # manual input

raster_r = result_path  # the result_path from the previous step
filename_to_detect = get_filename_without_extension(raster_r)  # get the file name for further processing
binary_raster, transform = load_large_binary_image(raster_r)  # load the predicted raster
binary_raster = (binary_raster > 0).astype(np.uint8)  # Ensure binary

# REMOVE SMALL OBJECTS
min_area = 350
num_labels, labels, stats, _ = cv2.connectedComponentsWithStats(binary_raster, connectivity=8)

# Keep only features above the selected min_area parameter
filtered_raster = np.zeros_like(binary_raster, dtype=np.uint8)
for i in range(1, num_labels):  # skip background
    if stats[i, cv2.CC_STAT_AREA] >= min_area:
        filtered_raster[labels == i] = 1


# Output
output_dir = results_path
os.makedirs(output_dir, exist_ok=True)
filtered_dir = os.path.join(output_dir, "Filtered")
os.makedirs(filtered_dir, exist_ok=True)

# Save
filtered_tif_path = os.path.join(filtered_dir, f'{filename_to_detect}_filtered.tif')
io.imsave(filtered_tif_path, (filtered_raster * 1).astype(np.uint8))
create_tfw_file(transform, os.path.splitext(filtered_tif_path)[0] + '.tfw')

print("Filtering complete. Small features removed and .tfw file saved.")

# CLOSING
kernel = np.ones((7,7),np.uint8)
closing_raster = cv2.morphologyEx(filtered_raster, cv2.MORPH_CLOSE, kernel)

# SAVE CLOSING results
closing_tif_path = os.path.join(filtered_dir, f'{filename_to_detect}_closing.tif')
cv2.imwrite(closing_tif_path, (closing_raster * 1).astype(np.uint8))
create_tfw_file(transform, os.path.splitext(closing_tif_path)[0] + '.tfw')



Loading large image from /Workspace/levees_DL/2025_levees_DL/Swin_UNETR/results/Swin_UNETR/Aug/CFE_a_selected_L5_S2_S1_MSRM_PCA_N48__swinunetr.tif
Creating .tfw file at /Workspace/levees_DL/2025_levees_DL/Swin_UNETR/results/Swin_UNETR/Aug/Filtered/CFE_a_selected_L5_S2_S1_MSRM_PCA_N48__swinunetr_filtered.tfw
Filtering complete. Small features removed and .tfw file saved.
Creating .tfw file at /Workspace/levees_DL/2025_levees_DL/Swin_UNETR/results/Swin_UNETR/Aug/Filtered/CFE_a_selected_L5_S2_S1_MSRM_PCA_N48__swinunetr_closing.tfw
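
Morphological closing is a dilation followed by an erosion, so gaps narrower than the kernel are bridged while wider gaps remain. A minimal pure-Python sketch on a single row of pixels with a 3-pixel window (the cell above applies `cv2.morphologyEx` with a 7x7 kernel to the full raster); `dilate_1d`, `erode_1d` and `close_1d` are illustrative helpers, and windows are simply clipped at the borders:

```python
def _window(row, i, radius):
    """Pixels within `radius` of position i, clipped at the row borders."""
    return row[max(0, i - radius): i + radius + 1]


def dilate_1d(row, radius=1):
    """Each pixel becomes the maximum of its neighbourhood."""
    return [max(_window(row, i, radius)) for i in range(len(row))]


def erode_1d(row, radius=1):
    """Each pixel becomes the minimum of its neighbourhood."""
    return [min(_window(row, i, radius)) for i in range(len(row))]


def close_1d(row, radius=1):
    """Closing = dilation followed by erosion."""
    return erode_1d(dilate_1d(row, radius), radius)


row = [1, 1, 0, 1, 1, 0, 0, 0, 1]
closed = close_1d(row)
print(closed)  # [1, 1, 1, 1, 1, 0, 0, 0, 1]
```

The one-pixel gap is filled, whereas the three-pixel gap is wider than the window and stays open.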


## 5. Evaluate the results


### Download the reference data


In [24]:
# Download the reference data



# MEGA file URL
mega_url2 = "https://mega.nz/file/5txRRQiZ#dDRqdI4F695r7yG465W5KIISVdQ1Iw2O8EWK2BIdoFs"

# Temporary download path (megadl saves the file with its original name by default)
download_dir3 = basepath
downloaded_file = os.path.join(download_dir3, "CFE_a_levee_reference.tif")

# Target directory
target_dir3 = f'{Swin_UNETR_path}/data'
os.makedirs(target_dir3, exist_ok=True)

# Download file using megadl
!megadl "{mega_url2}" --path "{download_dir3}"

# Move the file to the target directory (rename if needed)
shutil.move(downloaded_file, os.path.join(target_dir3, "CFE_a_levee_reference.tif"))
print(f"File moved to: {target_dir3}")

Reference_data_path = os.path.join(target_dir3, "CFE_a_levee_reference.tif")


/usr/bin/sh: 1: megadl: not found


FileNotFoundError: [Errno 2] No such file or directory: '/Workspace/levees_DL/CFE_a_levee_reference.tif'

### Compare the predicted levees and the reference data


In [15]:
import os
from rasterio.crs import CRS
from utils import read_tfw_coordinates
from utils import align_reference_with_crs
from utils import compare_rasters


reference_data_path_ =  "Reference/CFE_a_levee_reference.tif"
reference_data_path = os.path.join(basepath, reference_data_path_)
print(reference_data_path)
reference_tif = reference_data_path

#reference_tif = input("Enter path to reference data: ").strip('"') # If manual selection is needed
predicted_tif = closing_tif_path
#predicted_tif = input("Enter path to predicted data: ").strip('"') # If manual selection is needed


# Step 1: Read transform from TFW file
predicted_transform = read_tfw_coordinates(predicted_tif)

# Step 1b: Define CRS manually (replace EPSG code as needed!)
dst_crs = CRS.from_epsg(4326)  # WGS 84

# Step 2: Align reference raster
aligned_ref_path = os.path.splitext(predicted_tif)[0] + "_ref_aligned.tif"
aligned_ref = align_reference_with_crs(reference_tif, predicted_tif, predicted_transform, dst_crs, aligned_ref_path)


metrics = compare_rasters(
    reference_tif=aligned_ref,
    predicted_tif=predicted_tif,
    IoU_buf=4,
    output_path="custom_comparison.tif"
)

print(f"Precision: {metrics['precision']:.4f}")
print(f"Recall: {metrics['recall']:.4f}")
print(f"F1 Score: {metrics['f1_score']:.4f}")
print(f"IoU: {metrics['iou']:.4f}")


/Workspace/levees_DL/Reference/CFE_a_levee_reference.tif

[TFW] Affine transform read from file:

--- Evaluation Metrics ---
Precision:  0.6328
Recall:     0.5403
F1 Score:   0.5829
IoU:        0.4114
Comparison raster saved to:
custom_comparison.tif
Precision: 0.6328
Recall: 0.5403
F1 Score: 0.5829
IoU: 0.4114
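
The four scores follow from pixel counts of true positives (TP), false positives (FP) and false negatives (FN). A minimal sketch of the standard definitions; `segmentation_metrics` is a hypothetical helper and does not reproduce the buffering controlled by `IoU_buf` in `compare_rasters`:

```python
def segmentation_metrics(tp, fp, fn):
    """Precision, recall, F1 and IoU from pixel counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return {"precision": precision, "recall": recall, "f1_score": f1, "iou": iou}


example = segmentation_metrics(tp=6, fp=2, fn=4)
print(example["precision"], example["recall"], example["iou"])  # 0.75 0.6 0.5
```

With these definitions F1 and IoU are linked by IoU = F1 / (2 - F1); the values printed above (F1 0.5829, IoU 0.4114) agree with this relation to within rounding.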


In [None]:
from utils import reproject_raster

reprojected_ouput_file = f'{Swin_UNETR_path}/results/Swin_UNETR/Aug/{filename_to_detect}_swinunetr_closing_3857.tfw'

# Using TFW file for transform information
output = reproject_raster(
    closing_tif_path,
    "prediction_3857.tif",
    dst_crs="EPSG:3857",
    use_tfw=True
)

Evaluation of the latest version of the model after post-processing (min_area=350), using Copernicus DEM GLO-30:

| Dataset* | Model      | Test area | Prediction         | Precision | Recall | F1 Score | IoU    |
|----------|------------|-----------|--------------------|-----------|--------|----------|--------|
| F        | Swin UNETR | 1a        | Filtered + Closing | 0.4661    | 0.6142 | 0.5300   | 0.3605 |
| F        | Swin UNETR | 1b        | Filtered + Closing | 0.4675    | 0.6329 | 0.5377   | 0.3678 |
| F        | Swin UNETR | 1c        | Filtered + Closing | 0.4735    | 0.6265 | 0.5393   | 0.3692 |
| F        | Swin UNETR | 1d        | Filtered + Closing | 0.4389    | 0.6590 | 0.5269   | 0.3577 |
| F        | Swin UNETR | 1e        | Filtered + Closing | 0.3938    | 0.5753 | 0.4676   | 0.3051 |
| F        | Swin UNETR | 1f        | Filtered + Closing | 0.4231    | 0.6062 | 0.4984   | 0.3319 |
| Average  | Swin UNETR |           | Filtered + Closing | 0.4438    | 0.6190 | 0.5167   | 0.3487 |


## 6. Visualise the results



In [None]:
#!pip install leaflet
!pip install leafmap
!pip install localtileserver

In [None]:
import leafmap

# Map center coordinates
latitude = 31.52806
longitude = 65.24722

# Initialize map
m = leafmap.Map(center=[latitude, longitude], zoom=14)

# Raster path
raster_path = "/content/2025_levees_DL/Swin_UNETR/results/Swin_UNETR/Aug/Filtered/CFE_a_selected_L5_S2_S1_MSRM_PCA_GLO_N48_swinunetr_closing_comparison_buf1.tif"

# Define palette
palette = [
    "#e5350e",  # -1 = False Positives (red)
    "#0dff0d",  #  1 = True Positives (green)
    "#729bff",  #  2 = False Negatives (blue)
]

# Add the comparison raster to the map
m.add_raster(
    raster_path,    # source (positional)
    palette=palette,
    vmin=-1,
    vmax=2,
    nodata=0,
    layer_name="Detection Results",
)

# Add legend
legend_dict = {
    "False Positives (-1)": "#e5350e",
    "True Positives (1)": "#0dff0d",
    "False Negatives (2)": "#729bff",
}
m.add_legend(title="Detection Results", legend_dict=legend_dict)

# Add layer control for interactivity
m.add_layer_control()

# Show map
m
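
The palette and legend above imply a comparison raster coded as -1 (false positive), 1 (true positive) and 2 (false negative), with 0 treated as no data. A minimal sketch of that coding for a pair of binary masks; `encode_comparison` is a hypothetical helper, and mapping true negatives to 0 is an assumption based on `nodata=0`:

```python
def encode_comparison(reference, predicted):
    """Encode two binary masks (lists of rows) into one comparison grid.

    Codes: 1 = true positive, -1 = false positive, 2 = false negative,
    0 = true negative (assumed to be the no-data value).
    """
    codes = {(1, 1): 1, (0, 1): -1, (1, 0): 2, (0, 0): 0}
    return [
        [codes[(int(bool(r)), int(bool(p)))] for r, p in zip(ref_row, pred_row)]
        for ref_row, pred_row in zip(reference, predicted)
    ]


reference = [[1, 0], [1, 0]]
predicted = [[1, 1], [0, 0]]
print(encode_comparison(reference, predicted))  # [[1, -1], [2, 0]]
```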


In [None]:
import os
from utils import reproject_raster
from utils import get_filename_without_extension

# Reproject the closed prediction raster from EPSG:4326 to EPSG:3857
raster_r = os.path.normpath(os.path.abspath(closing_tif_path))  # normalised absolute path
#raster_r = input("Enter path to raster data: ").strip('"')  # if manual selection is needed
print(f"Input path: {raster_r}")

# Verify that the raster and its world file exist
tfw_path = os.path.splitext(raster_r)[0] + ".tfw"
print(f"TIF exists: {os.path.exists(raster_r)}")
print(f"TFW path: {tfw_path}")
print(f"TFW exists: {os.path.exists(tfw_path)}")

filename_to_detect = get_filename_without_extension(raster_r)  # get the file name for further processing

reprojected_output_file = f'{Swin_UNETR_path}/results/Swin_UNETR/Aug/Filtered/{filename_to_detect}_3857.tif'
#reprojected_output_file = input("Enter path to output raster: ").strip('"')  # if manual selection is needed

# Use the TFW file for transform information
output = reproject_raster(
    raster_r,
    reprojected_output_file,
    src_crs="EPSG:4326",
    dst_crs="EPSG:3857",
    use_tfw=True
)

print("Reprojected raster:", reprojected_output_file)
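
For orientation, the forward transform from EPSG:4326 (longitude and latitude in degrees) to EPSG:3857 (Web Mercator, metres) can be written out directly. A minimal sketch for single points, using the spherical radius of 6378137 m; `lonlat_to_web_mercator` is a hypothetical helper and does not replace the raster resampling that `reproject_raster` is expected to perform:

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert one longitude/latitude pair (degrees) to Web Mercator x/y (metres)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


x_max, _ = lonlat_to_web_mercator(180.0, 0.0)
print(round(x_max, 2))  # 20037508.34
```

Because the y coordinate grows non-linearly with latitude, pixel sizes in metres are not uniform after reprojection, which is why the raster has to be resampled rather than merely relabelled.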


In [None]:
# Connected-pixel filtering of a binary channel raster (remove small components)
# - Reads a binary raster (0/1 or 0/255 etc.)
# - Labels connected components
# - Removes components with fewer than `min_pixels` pixels
# - Writes a filtered binary raster with the same georeferencing

import os
import numpy as np
import rasterio


# ---- USER PARAMS ----
in_raster = skeldir       # path to the input binary (skeleton) raster; must be defined before running this cell
min_pixels = 50           # remove components smaller than this (in pixels)
connectivity = 8          # 4 or 8 (8 is usually better for channel networks)
out_raster = os.path.join(filtered_dir, f'{filename_to_detect}_skeleton_{min_pixels}_{connectivity}.tif')
skeldir_sm = out_raster

# If your raster is not strictly 0/1, define how to binarize it:
# e.g., channels are values > 0
binarize = lambda arr: (arr > 0)

# ---------------------
try:
    from scipy.ndimage import label
except ImportError as e:
    raise ImportError("This cell requires scipy. Install it with: pip install scipy (or conda install scipy)") from e

if connectivity not in (4, 8):
    raise ValueError("connectivity must be 4 or 8")

# Structuring element defines pixel connectivity
structure = np.array([[0,1,0],
                      [1,1,1],
                      [0,1,0]], dtype=np.uint8) if connectivity == 4 else np.ones((3,3), dtype=np.uint8)

with rasterio.open(in_raster) as src:
    prof = src.profile.copy()
    nodata = src.nodata

    band = src.read(1, masked=True)  # masked array if nodata exists
    # Build boolean binary mask for channels, respecting nodata
    binary = np.zeros(band.shape, dtype=bool)
    valid = ~band.mask if np.ma.isMaskedArray(band) else np.ones(band.shape, dtype=bool)
    binary[valid] = binarize(np.asarray(band)[valid])

    # Label connected components (only on True pixels)
    labels, nlab = label(binary, structure=structure)

    if nlab == 0:
        # Nothing to filter; just write an empty binary raster (or original)
        filtered = binary.astype(np.uint8)
    else:
        # Component sizes (labels start at 1; label 0 is background)
        sizes = np.bincount(labels.ravel())
        keep = sizes >= min_pixels
        keep[0] = False  # never keep background

        filtered = keep[labels].astype(np.uint8)

    # If you want to preserve nodata (rather than forcing nodata to 0), apply it back:
    if nodata is not None and np.any(~valid):
        filtered = filtered.astype(np.uint8)
        # set nodata pixels to nodata value (commonly 0, but not always)
        filtered = filtered.astype(np.float32) if prof["dtype"] in ("float32", "float64") else filtered
        filtered[~valid] = nodata

    # Write output: keep it binary uint8 unless you explicitly need another dtype
    prof.update(dtype=rasterio.uint8, count=1, nodata=0 if nodata is None else nodata, compress="deflate")
    # If nodata exists and is not 0, you may prefer to keep prof["dtype"] instead; adjust as needed.

    with rasterio.open(out_raster, "w", **prof) as dst:
        dst.write(filtered.astype(np.uint8), 1)

print(f"Done. Wrote: {out_raster}")
print(f"Removed components with < {min_pixels} pixels (connectivity={connectivity}).")
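
The choice between 4- and 8-connectivity matters for thin diagonal features: pixels that touch only at their corners form one component under 8-connectivity but separate components under 4-connectivity. A minimal pure-Python sketch that counts components both ways with a breadth-first search; `count_components` is an illustrative helper, not the `scipy.ndimage.label` call used above:

```python
from collections import deque


def count_components(grid, connectivity=8):
    """Count connected groups of non-zero cells in a list-of-rows grid."""
    if connectivity not in (4, 8):
        raise ValueError("connectivity must be 4 or 8")
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if connectivity == 8:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]

    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = set()
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or (r, c) in seen:
                continue
            # New component: flood it with a breadth-first search
            count += 1
            seen.add((r, c))
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for dr, dc in offsets:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] and (nr, nc) not in seen):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
    return count


diagonal = [[1, 0, 0],
            [0, 1, 0],
            [0, 0, 1]]
print(count_components(diagonal, 4), count_components(diagonal, 8))  # 3 1
```

With `min_pixels = 50`, a one-pixel-wide diagonal line would be split into single-pixel components and removed entirely under 4-connectivity, while 8-connectivity keeps it as one feature.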
